Did you take into account the number of transistors needed for each method?
Maybe the methods you mentioned require fewer steps, but because they are
more complex they require more transistors!
Since the simpler method requires fewer transistors, it can be implemented
more times in parallel in the same "chip space".
So to make the comparison fair, that factor has to be taken into the
equation as well, for example:
1. "Reaction time of simple method" versus "Reaction time of complex
method".
Plus:
2. "Throughput of all sequentially called parallel implemented simple
methods" versus "Throughput of all sequentially called parallel implemented
complex methods".
For example:
The complex method has a reaction time of 1000 nanoseconds.
The simple method has a reaction time of 50000 nanoseconds.
The complex method can be implemented 10 times on chip space.
The simple method can be implemented 1000 times on chip space.
1 second = 1 000 000 000 nanoseconds:
Throughput for one complex method: 1 000 000 000 / 1 000 = 1 000 000 calls/second
Throughput for one simple method: 1 000 000 000 / 50 000 = 20 000 calls/second
Total throughput, complex method: 1 000 000 * 10 = 10 000 000 calls/second
Total throughput, simple method: 20 000 * 1 000 = 20 000 000 calls/second
This example shows the simple method could outperform the complex method.
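The arithmetic in the example above, spelled out in Python; the latency and
copy-count figures are just the hypothetical numbers from the example, not
measurements of any real hardware:

```python
# Hypothetical figures from the example above.
NS_PER_SECOND = 1_000_000_000

complex_latency_ns, complex_copies = 1_000, 10
simple_latency_ns, simple_copies = 50_000, 1_000

# Per-unit throughput times the number of units that fit on the chip.
complex_throughput = NS_PER_SECOND // complex_latency_ns * complex_copies
simple_throughput = NS_PER_SECOND // simple_latency_ns * simple_copies

# complex: 10 000 000 calls/second, simple: 20 000 000 calls/second
```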
Throughput wasn't on the table before...
Throughput makes for a much more complex question. It may in some
cases be possible to build a very small, but very slow, multiplier,
and you might get the best throughput per transistor that way. For
example, a bit-serial multiplier will be of size N, but will take
N**2 time to multiply two numbers. A full multiplier (using the
basic algorithm) will be of size N**2, but will multiply in
approximately log(N)**2 time.
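A rough Python model of the bit-serial end of that tradeoff; the clock
accounting here is a coarse assumption (n bit-cycles per partial product),
not a gate-level count:

```python
def bit_serial_multiply(a, b, n):
    # Model of a bit-serial multiplier for two n-bit inputs: one
    # partial product per bit of b, each taking about n bit-serial
    # add cycles, for roughly n**2 clocks total.
    acc = 0
    clocks = 0
    for i in range(n):
        if (b >> i) & 1:
            acc += a << i   # the hardware would shift this in one bit per clock
        clocks += n
    return acc, clocks

# 4-bit example: 13 * 11 = 143, in about 4**2 = 16 clocks.
```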
Also, since we're talking about hardware implementations, pipelining
has to be considered. The simple full multiplier can easily be
pipelined so that about log(N) multiplications can be working their
way through it at any one time.
Now as to particular implementations. Building a full Karatsuba
multiplier would be straightforward, and would have about the same
advantage in transistor count (modulo some constant) over the basic
algorithm as does the software implementation. It would even be
pipelineable. You will definitely lose something in die area, since
the resulting structure will not be anywhere near as regular as that
for a basic full multiplier, so you wouldn't be able to fit as many on
a chip.
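For reference, the software Karatsuba being compared against is tiny; a
minimal Python sketch (the 2**32 base-case cutoff is an arbitrary choice
for illustration):

```python
def karatsuba(x, y):
    # Base case: fall back to a plain multiply for small operands.
    if x < 2**32 or y < 2**32:
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    xh, xl = x >> half, x & mask
    yh, yl = y >> half, y & mask
    # Three recursive multiplies instead of four.
    p1 = karatsuba(xh, yh)
    p2 = karatsuba(xl, yl)
    p3 = karatsuba(xh + xl, yh + yl) - p1 - p2
    return (p1 << (2 * half)) + (p3 << half) + p2
```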
The transforms needed for FFT multiplication would require a fair bit
of analysis before I'd be willing to speculate on how hard it would be
to implement a "full FFT multiplier." But as a basic rule, the amount
of hardware you need for a "full" implementation will be roughly
proportional to the number of operations performed by the algorithm.
But the question is, would you ever choose to build a 1024x1024 bit
multiplier in hardware directly? It's possible that might be the best
approach for some application, but it doesn't solve the problem for
numbers bigger than 1024 bits, and it will be very expensive in terms
of area. Almost certainly you'd build a high-precision multiplier out of
some smaller components, like a 32x32 multiplier, storage for the work
fields, and some control logic, etc. The control logic then replaces
the software algorithm, and other than some linear improvements in
efficiency (for example, the multiplier might pipeline, you might be
able to merge several add functions into a single clock cycle), runs
in exactly the same way. And mind you, you can fit quite a complex
sequencer in the same space as a 32x32 multiplier.
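The limb-at-a-time scheme such a sequencer would drive looks like this in
software; a Python sketch assuming 32-bit limbs and schoolbook order (both
illustrative choices, not a description of any particular design):

```python
def limb_multiply(a, b, limb_bits=32):
    # Schoolbook multiply built from limb-by-limb products, the way a
    # sequencer would drive a 32x32 hardware multiplier.  The limb
    # width and storage layout are illustrative assumptions.
    mask = (1 << limb_bits) - 1

    def to_limbs(x):            # little-endian limb vector
        out = []
        while x:
            out.append(x & mask)
            x >>= limb_bits
        return out or [0]

    al, bl = to_limbs(a), to_limbs(b)
    work = [0] * (len(al) + len(bl))    # the "storage for the work fields"
    for i, ai in enumerate(al):
        carry = 0
        for j, bj in enumerate(bl):
            t = work[i + j] + ai * bj + carry   # one 32x32 multiply per step
            work[i + j] = t & mask
            carry = t >> limb_bits
        work[i + len(bl)] += carry
    result = 0
    for limb in reversed(work):
        result = (result << limb_bits) + limb
    return result
```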
In any event, it seems difficult to justify any really large hardware
multipliers. At some point, the software overhead of an extended
precision calculation becomes insignificant. If the CPU actually
provided a reasonably fast 1024x1024 multiplier, you could probably
write a 16384x16384 multiplier in interpreted Basic (using the
hardware 1024x1024 multiplier as a primitive) that would be
indistinguishable in performance from a hardware or microcode
implementation of the same thing. So why burden the hardware with
that? Which brings us to your original point. A CPU should not be
expected to solve all of the world's computation problems directly. It
should, however, provide a good set of primitives that allow the
programmer to solve those problems. Why, for example, should a
processor have a primitive to compute hyperbolic trig functions, or
directly handle complex numbers (or any of an infinity of other
complex functions)? While at some point it may make sense to include
those, for the foreseeable future, implementing those directly in
hardware would bring modest performance gains (at best), for
considerable complexity, and be applicable only to a small user
community.
As a general comment, I can't think of a single CPU that's directly
implemented arbitrary precision arithmetic. A few cases of
specialized hardware, for example the modular exponentiation unit on
Niagara, do exist, but those are not for general purpose use.