
Kids can do math better than x86 CPUs.

Discussion in 'Electronic Design' started by Skybuck Flying, Mar 7, 2007.

  1. Hello,

    In principle it should be possible to add, subtract, multiply or divide a 1
    billion digit number by another 1 billion digit number; even kids learn
    this in school, with the standard math tables and algorithms for addition,
    subtraction, multiplication and division.

    These algorithms can be applied to arbitrary length numbers. (For positional
    number systems)

    (I think these algorithms were invented thousands of years ago by the
    Greeks).
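
    As a rough illustration, here is the schoolbook addition algorithm on
    arrays of decimal digits, a minimal C sketch (function and variable names
    are made up; digits are values 0-9, stored least-significant first):

    #include <stddef.h>

    /* Schoolbook addition on decimal digit arrays, least-significant digit
       first. sum[] must have room for n + 1 digits. This is exactly the
       column-by-column carry propagation kids learn in school. */
    void add_digits(const char *a, const char *b, char *sum, size_t n)
    {
        int carry = 0;
        for (size_t i = 0; i < n; i++) {
            int d = a[i] + b[i] + carry;   /* add one column plus carry-in */
            sum[i] = (char)(d % 10);       /* digit that stays in place    */
            carry  = d / 10;               /* carry into the next column   */
        }
        sum[n] = (char)carry;              /* possible final carry digit   */
    }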

    However, x86 CPUs today cannot do what kids, Greeks and common people can
    do, and that is math with arbitrarily long numbers.

    x86 CPUs only do math with a limited/fixed number of digits/bits... which
    is quite disappointing.

    Why is this? Sigh.

    Some possibilities:

    1. Add new instructions which can multiply arbitrarily long numbers in
    memory.

    (Example: multiply memory_pointer1, memory_length1, memory_pointer2,
    memory_length2, memory_pointer3, memory_length3)
    (something like that... it could even be a 2 memory location operation
    instead of 3)
    (A * B = C; see the sketch after this list)

    or

    2. Add high level chips which can use the low level chips to do these more
    complex math algorithms...
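
    What option 1 would have to do internally looks roughly like this in
    software: schoolbook multiplication over three (pointer, length) operand
    pairs. A hypothetical C sketch mirroring the proposed A * B = C
    instruction (digits are values 0-9, least-significant first):

    #include <stddef.h>
    #include <string.h>

    /* Schoolbook O(n*m) multiplication of decimal digit arrays.
       product[] needs room for len_a + len_b digits. */
    void mul_digits(const char *a, size_t len_a,
                    const char *b, size_t len_b,
                    char *product)
    {
        memset(product, 0, len_a + len_b);
        for (size_t i = 0; i < len_a; i++) {
            int carry = 0;
            for (size_t j = 0; j < len_b; j++) {
                int d = product[i + j] + a[i] * b[j] + carry;
                product[i + j] = (char)(d % 10); /* digit stays in column */
                carry = d / 10;                  /* carry moves one up    */
            }
            product[i + len_b] = (char)carry;    /* top digit of this row */
        }
    }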

    The point is: this would save programmers from having to reinvent the old
    Greek math wheel, saving them time ;) and of course giving them an easier
    coding platform / more productivity :D

    Just some thoughts.

    Bye,
    Skybuck.
     
  2. Tim Shoppa

    Tim Shoppa Guest

    There are numerous CPUs out there that deal with arbitrary precision
    arithmetic. Typically the instruction set was optimized for COBOL :),
    which should give you a hint that you want to look at ISAs from the
    60's and 70's to see examples.

    In the real world there are very few needs for so many digits of
    precision.

    But when the needs do arise, only the stupid programmers re-invent
    the wheel. The others use commonly available algorithms already out
    there. The GNU multi-precision library and the C bignum library come
    to mind immediately.

    And when the wheel has been re-invented, it's been re-invented for the
    better: for example there are better algorithms for multiplication,
    these algorithms borrowing ideas from FFT's. It should not be a
    surprise that a lot of computers have already been optimized for FFT
    performance :). Google for "Schoenhage-Strassen algorithm".
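
    For instance, a minimal GMP sketch (not production code; build with
    gcc demo.c -lgmp):

    #include <stdio.h>
    #include <gmp.h>

    /* Multiply two numbers far wider than any x86 register. GMP picks
       the multiplication algorithm (schoolbook, Karatsuba, FFT-based)
       itself, depending on operand size. */
    int main(void)
    {
        mpz_t a, b, c;
        mpz_inits(a, b, c, NULL);

        mpz_set_str(a, "123456789012345678901234567890", 10);
        mpz_set_str(b, "987654321098765432109876543210", 10);
        mpz_mul(c, a, b);

        gmp_printf("%Zd\n", c);
        mpz_clears(a, b, c, NULL);
        return 0;
    }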

    Tim.
     
  3. |> And when the wheel has been re-invented, it's been re-invented for the
    |> better.

    And again, if and when you need an implementation of these algorithms, get an
    existing one, e.g., contained in OpenSSL, instead of re-doing it from scratch
    (and likely getting it wrong in the process).
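
    With OpenSSL's bignum API, the same kind of multiplication looks roughly
    like this (a sketch assuming libcrypto is installed; build with -lcrypto):

    #include <stdio.h>
    #include <openssl/bn.h>

    /* Arbitrary precision multiply using OpenSSL's BIGNUM routines. */
    int main(void)
    {
        BN_CTX *ctx = BN_CTX_new();      /* scratch space for BN_mul */
        BIGNUM *a = NULL, *b = NULL, *r = BN_new();

        BN_dec2bn(&a, "123456789012345678901234567890");
        BN_dec2bn(&b, "987654321098765432109876543210");
        BN_mul(r, a, b, ctx);

        char *s = BN_bn2dec(r);
        printf("%s\n", s);

        OPENSSL_free(s);
        BN_free(a); BN_free(b); BN_free(r);
        BN_CTX_free(ctx);
        return 0;
    }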

    Jan
     
  4. John Larkin

    John Larkin Guest

    You really should do a little googling before you go off on rants like
    this. There are lots of extended math libraries and applications
    around, both binary and decimal. Some will compute to arbitrary
    precision. This issue was addressed decades ago.

    The most common math functions, 32 bit integer operations and floats
    out to 80 bits, are executed in x86 hardware. The oddball stuff, like
    arbitrary precision and decimal floats, is done as subroutines.

    John
     
  5. Guest

    which is basically how humans would do it: digit by digit

    -Lasse
     
  6. |>
    |> In the real world there are very few needs for so many digits of
    |> precision.
    |>
    |> But when the needs do arrive, only the stupid programmers re-invent
    |> the wheel. The others use commonly available algorithms already out
    |> there. The GNU multi-precision library and the C bignum library come
    |> to mind immediately.

    That's too strong. I am doing it at present, for good and sufficient
    reasons. I agree that your statement is true 99% of the time, perhaps
    more :)

    To the best of my knowledge, there is nothing that comes close to what
    I am currently doing, though there has been in the past. But I baulk
    at trying to FIND the old codes, though I would happily convert one of
    them if I could do so. Where would YOU look up things that dropped out
    of common practice in the 1960s?

    |> And when the wheel has been re-invented, it's been re-invented for the
    |> better: for example there are better algorithms for multiplication,
    |> these algorithms borrowing ideas from FFT's. It should not be a
    |> surprise that a lot of computers have already been optimized for FFT
    |> performance :). Google for "Schoenhage-Strassen algorithm".

    Not at all. It has very often been invented worse. I despair at the
    way that computer scientists reinvent technologies that predate their
    subject - and then get the number of sides on the wheel wrong :-(

    For example, how would YOU do division and modular multiplication, how
    do those packages do it, and are you quite sure that is the best way?
    Well, let's not go there - it would take all week :)


    Regards,
    Nick Maclaren.
     
  7. John Larkin

    John Larkin Guest


    Yeah. I posted a problem a while back about needing a fast 64-bit
    binary-to-ASCII conversion routine. Someone made a suggestion that
    helped solve the problem for me (divide by 1e9 and convert the chunks
    separately), and when I wrote the code I realized, for the first time,
    how "long division" actually works.
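
    Something like this, presumably (a sketch of the chunked approach with a
    made-up helper name; a real routine would emit digits by hand rather than
    call sprintf, but the point is that only two 64-bit divisions are needed
    and the rest is cheap 32-bit work):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    /* Convert a 64-bit value to decimal ASCII by splitting it into
       9-digit chunks, each of which fits comfortably in 32 bits. */
    void u64_to_ascii(uint64_t v, char *buf)
    {
        uint32_t lo = (uint32_t)(v % 1000000000u);     /* lowest 9 digits */
        uint64_t rest = v / 1000000000u;
        uint32_t mid = (uint32_t)(rest % 1000000000u); /* middle 9 digits */
        uint32_t hi  = (uint32_t)(rest / 1000000000u); /* top 1-2 digits  */

        if (hi)
            sprintf(buf, "%" PRIu32 "%09" PRIu32 "%09" PRIu32, hi, mid, lo);
        else if (mid)
            sprintf(buf, "%" PRIu32 "%09" PRIu32, mid, lo);
        else
            sprintf(buf, "%" PRIu32, lo);
    }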

    John
     
  8. John Larkin

    John Larkin Guest

    From "Dreaming in Code": The worst thing about programmers is that
    they love to write code.


    John
     
  9. Jim Thompson

    Jim Thompson Guest

    They're back to teaching "casting out 9's" in the schools here.
    Except they gave it some different name to confuse grandparents ;-)
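
    Whatever name the schools use now, the trick is plain arithmetic mod 9:
    a number is congruent to its digit sum modulo 9, so a product can be
    sanity-checked cheaply. A toy C illustration:

    #include <stdio.h>

    /* Casting out nines: (a * b) mod 9 must equal
       ((a mod 9) * (b mod 9)) mod 9, or the product is wrong.
       The converse does not hold; the check only catches errors. */
    int main(void)
    {
        long a = 12345, b = 6789, claimed = a * b;
        long check = ((a % 9) * (b % 9)) % 9;
        printf("nines check %s\n",
               (claimed % 9) == check ? "passes" : "fails");
        return 0;
    }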

    ...Jim Thompson
     
  10. You could download the source for bc, which compiles on Linux, but could
    probably be ported:
    http://directory.fsf.org/bc.html

    Good Luck!
    Rich
     
    It's not only about reinventing the wheel, it's also about
    efficiency/speed/performance/ease of use even.

    A CPU should be able to do it faster in hardware than a software library...

    Bye,
    Skybuck.
     
  12. Guest

    Highest efficiency (in terms of programmer hours) is almost always
    achieved by pulling in an off-the-shelf library that you know
    works :).

    There are exceptions, and the market finds them: crypto accelerators
    for example. The market even finds things that are not truly
    exceptions!

    Apply this for each of the bazillion things you might want your CPU to
    do, and you end up with a CPU with a bazillion instructions.

    Tim.
     
  13. MitchAlsup

    MitchAlsup Guest

    What makes you think this?
     
  14. |>
    |> > A CPU should be able to do it faster in hardware then a software library...
    |>
    |> What makes you think this?

    He is right, but not VERY right.

    The standard ISAs provide poor primitives for the purpose. For multiple
    precision integer arithmetic, there are only a few extra primitives
    needed (e.g. NxN->2N multiplication, add with carry etc.) For what I
    am doing (software binary floating-point), the ISA and HLL primitives
    stink. The actual arithmetic isn't the problem, as it is the same as
    for integers, but the housekeeping, normalisation, rounding etc. are
    a right royal pain in the, er, neck.

    My estimate from many years back, which is probably still true, is
    that a decent set of primitives would enable software floating-point
    to be 3-5 times slower than all-hardware (for a plausible precision).
    But current ones make it a LOT slower than that.

    However, when one is talking large integer arithmetic (hundreds of
    digits), all one needs is a small number of extra primitives, and there
    is no difficulty in doing it in software. Even without them, it is
    very rarely more than 3 times slower than the best hardware could be.
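
    To make those "few extra primitives" concrete, here is roughly what
    NxN->2N multiplication and add-with-carry look like in C on a compiler
    with unsigned __int128 (GCC or Clang on x86-64); a sketch under those
    assumptions, not Nick's actual code:

    #include <stdint.h>
    #include <stddef.h>

    /* NxN -> 2N multiply: one 64x64 -> 128-bit product.
       Returns the low limb; the high limb comes back via *hi. */
    uint64_t mul_wide(uint64_t a, uint64_t b, uint64_t *hi)
    {
        unsigned __int128 p = (unsigned __int128)a * b;
        *hi = (uint64_t)(p >> 64);
        return (uint64_t)p;
    }

    /* Add with carry across n limbs, least-significant limb first:
       r = a + b; returns the final carry out. */
    uint64_t add_n(uint64_t *r, const uint64_t *a, const uint64_t *b,
                   size_t n)
    {
        uint64_t carry = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned __int128 s = (unsigned __int128)a[i] + b[i] + carry;
            r[i] = (uint64_t)s;
            carry = (uint64_t)(s >> 64);
        }
        return carry;
    }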


    Regards,
    Nick Maclaren.
     
  15. Guest


    Only on the really slow implementations. While bignum addition and
    subtraction have to be done digit by digit (for some definition of
    digit), you do *not* want to do division and multiplication the way
    you learned them in school. Karatsuba multiplication (O(n**1.6) instead
    of O(n**2) for the classical algorithm) is useful for medium-sized
    numbers, and FFT-based multiplication (O(n ln(n) ln(ln(n)))) for large
    numbers. Division is usually performed by approximating a result and
    then refining it with an appropriate number of iterations of
    Newton-Raphson (which needs only multiplications).
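
    The division claim is worth unpacking: the Newton-Raphson reciprocal
    iteration is x_next = x * (2 - d * x), which uses only multiplication
    and subtraction and roughly doubles the number of correct digits per
    step. A toy demonstration in doubles (a real bignum package runs the
    same iteration at ever-growing precision):

    #include <stdio.h>

    /* Refine an approximation of 1/d using only multiply and subtract. */
    int main(void)
    {
        double d = 7.0;
        double x = 0.1;              /* rough initial guess at 1/7 */
        for (int k = 0; k < 6; k++) {
            x = x * (2.0 - d * x);   /* one Newton-Raphson step */
            printf("step %d: x = %.15f\n", k, x);
        }
        printf("d * x = %.15f (should approach 1)\n", d * x);
        return 0;
    }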
     
  16. John Larkin

    John Larkin Guest

    But there are few problems that need more than 64- or 80-bit
    floating-point math, and fewer still that need it fast.

    John
     
  17. Richard

    Richard Guest

    [Please do not mail me a copy of your followup]

    John Larkin <> spake the secret code
    ....which makes it perfect for reconfigurable computing.

    As I understand it, that's what people are doing these days. Where
    previously you would have been required to develop a custom ASIC at
    great NRE cost, nowadays you can develop custom hardware using a
    relatively inexpensive FPGA peripheral card. Configuring the FPGA
    circuitry is very much like writing software when you use something
    like VHDL or Verilog.

    In fact, you can even get these in a portable form factor -- there's
    an FPGA card you can buy that plugs into the Nintendo GameBoy Advance
    and it comes with a complete tool chain included. All for about $200
    from Charmed Labs <http://www.charmedlabs.com> as the Xport 2.0 which
    has 150,000 gates.
     
  18. Quadibloc

    Quadibloc Guest

    Also known as Schönhage-Strassen multiplication.

    John Savard
     
    That was me; I use that approach in all my binary arbitrary-precision libs.
    :)

    With me it was the opposite, sort of: The _very_ first non-obligatory
    program I wrote was a (Fortran-2) implementation of arbitrary precision,
    written to calculate as many digits of pi as my student cpu-seconds
    budget would allow.

    OTOH, I used to spend my high school physics classes playing around with
    Taylor series and arbitrary precision algorithms on paper, just because
    that allowed me to calculate with better precision than my big 5-digit
    log table book would allow. :)

    Terje
     
    That is only true if the core building blocks take so short a time to
    run that the decoding time becomes a limiting factor.

    With pipelined CPUs, lots of algorithms do in fact run faster in
    software than the corresponding high-level opcode, even on those CPUs
    which did bother to implement these CISCy operations.

    (Look for VAX polynomial evaluation, or the x86 BCD and string opcodes.)

    Terje
     