Maker Pro

Kids can do math better than x86 CPUs.

Skybuck Flying

Hello,

In principle it should be possible to add, subtract, multiply, or divide a
1-billion-digit number by another 1-billion-digit number; even kids learn
this in school, using the standard math tables and algorithms for addition,
subtraction, multiplication, and division.

These algorithms can be applied to numbers of arbitrary length. (For
positional number systems.)

(I think these algorithms were invented thousands of years ago by the
Greeks.)

However, x86 CPUs today cannot do what kids, Greeks, or common people can
do, and that is math with arbitrarily long numbers.

x86 CPUs only do math with a limited/fixed number of digits/bits... which
is quite disappointing.

Why is this? Sigh.

Some possibilities:

1. Add new instructions which can multiply arbitrarily long numbers in
memory.

(Example: multiply memory_pointer1, memory_length1, memory_pointer2,
memory_length2, memory_pointer3, memory_length3)
(Something like that... it could even be a two-memory-location operation
instead of three.)
(A * B = C; see the sketch after this list.)

or

2. Add higher-level chips which can use the low-level chips to do these
more complex math algorithms...
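
For the curious, this is roughly what such a multiply primitive would have
to compute internally: ordinary schoolbook long multiplication, just with
base-2^32 "digits". A minimal C sketch; the function name and limb layout
are illustrative only:

#include <stddef.h>
#include <stdint.h>

/* Schoolbook multiplication of arbitrary-length numbers stored as
   little-endian arrays of 32-bit limbs: the same digit-by-digit
   method taught in school. c must have room for la + lb limbs. */
void mul_schoolbook(const uint32_t *a, size_t la,
                    const uint32_t *b, size_t lb,
                    uint32_t *c)
{
    for (size_t i = 0; i < la + lb; i++)
        c[i] = 0;
    for (size_t i = 0; i < la; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < lb; j++) {
            uint64_t t = (uint64_t)a[i] * b[j] + c[i + j] + carry;
            c[i + j] = (uint32_t)t;     /* low 32 bits stay in place */
            carry = t >> 32;            /* high 32 bits carry onward */
        }
        c[i + lb] = (uint32_t)carry;
    }
}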

The point is: this would keep programmers from having to reinvent the old
Greek math wheel, saving them time ;) and of course giving them an easier
coding platform/more productivity :D

Just some thoughts.

Bye,
Skybuck.
 
Tim Shoppa

The point is: this would keep programmers from having to reinvent the old
Greek math wheel, saving them time ;) and of course giving them an easier
coding platform/more productivity :D

There are numerous CPUs out there that deal with arbitrary precision
arithmetic. Typically the instruction set was optimized for COBOL :),
which should give you a hint that you want to look at ISAs from the
'60s and '70s to see examples.

In the real world there are very few needs for so many digits of
precision.

But when the need does arise, only the stupid programmers re-invent
the wheel. The others use commonly available algorithms already out
there. The GNU multi-precision library and the C bignum library come
to mind immediately.
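
For instance, a few lines using GMP (the GNU multi-precision library)
multiply integers of any size; a minimal sketch (link with -lgmp):

#include <stdio.h>
#include <gmp.h>

/* Multiply two integers far too large for any x86 register.
   GMP picks the best algorithm for the operand sizes internally. */
int main(void)
{
    mpz_t a, b, c;
    mpz_init_set_str(a, "123456789012345678901234567890", 10);
    mpz_init_set_str(b, "987654321098765432109876543210", 10);
    mpz_init(c);
    mpz_mul(c, a, b);            /* c = a * b, arbitrary precision */
    gmp_printf("%Zd\n", c);
    mpz_clear(a);
    mpz_clear(b);
    mpz_clear(c);
    return 0;
}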

And when the wheel has been re-invented, it's been re-invented for the
better: for example there are better algorithms for multiplication,
these algorithms borrowing ideas from FFTs. It should not be a
surprise that a lot of computers have already been optimized for FFT
performance :). Google for "Schoenhage-Strassen algorithm".
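
The key observation is that the digits of a product are a convolution of
the digit sequences, which an FFT can compute in O(n log n log log n)
rather than O(n^2). A toy base-10 demonstration of that view (the FFT
would replace only the convolution loop):

#include <stdio.h>

int main(void)
{
    int a[] = {9, 9, 9}, b[] = {9, 9, 9}; /* 999 * 999, little-endian digits */
    int c[6] = {0};
    for (int i = 0; i < 3; i++)           /* O(n^2) convolution: this is */
        for (int j = 0; j < 3; j++)       /* the step an FFT accelerates */
            c[i + j] += a[i] * b[j];
    for (int i = 0; i < 5; i++) {         /* carry propagation */
        c[i + 1] += c[i] / 10;
        c[i] %= 10;
    }
    for (int i = 5; i >= 0; i--)
        printf("%d", c[i]);               /* prints 998001 */
    printf("\n");
    return 0;
}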

Tim.
 
Jan Vorbrüggen

And when the wheel has been re-invented, it's been re-invented for the
better: for example there are better algorithms for multiplication,
these algorithms borrowing ideas from FFTs. It should not be a
surprise that a lot of computers have already been optimized for FFT
performance :). Google for "Schoenhage-Strassen algorithm".

And again, if and when you need an implementation of these algorithms, get an
existing one, e.g., contained in OpenSSL, instead of re-doing it from scratch
(and likely getting it wrong in the process).

Jan
 
John Larkin

Hello,

In principle it should be possible to add, subtract, multiply, or divide a
1-billion-digit number by another 1-billion-digit number; even kids learn
this in school, using the standard math tables and algorithms for addition,
subtraction, multiplication, and division.

These algorithms can be applied to numbers of arbitrary length. (For
positional number systems.)

(I think these algorithms were invented thousands of years ago by the
Greeks.)

However, x86 CPUs today cannot do what kids, Greeks, or common people can
do, and that is math with arbitrarily long numbers.

x86 CPUs only do math with a limited/fixed number of digits/bits... which
is quite disappointing.

Why is this? Sigh.

Some possibilities:

1. Add new instructions which can multiply arbitrarily long numbers in
memory.

(Example: multiply memory_pointer1, memory_length1, memory_pointer2,
memory_length2, memory_pointer3, memory_length3)
(Something like that... it could even be a two-memory-location operation
instead of three.)
(A * B = C)

or

2. Add higher-level chips which can use the low-level chips to do these
more complex math algorithms...

The point is: this would keep programmers from having to reinvent the old
Greek math wheel, saving them time ;) and of course giving them an easier
coding platform/more productivity :D

Just some thoughts.

Bye,
Skybuck.

You really should do a little googling before you go off on rants like
this. There are lots of extended math libraries and applications
around, both binary and decimal. Some will compute to arbitrary
precision. This issue was addressed decades ago.

The most common math functions, 32-bit integer operations and floats
out to 80 bits, are executed in x86 hardware. The oddball stuff, like
arbitrary precision and decimal floats, is done as subroutines.

John
 
You really should do a little googling before you go off on rants like
this. There are lots of extended math libraries and applications
around, both binary and decimal. Some will compute to arbitrary
precision. This issue was addressed decades ago.

The most common math functions, 32-bit integer operations and floats
out to 80 bits, are executed in x86 hardware. The oddball stuff, like
arbitrary precision and decimal floats, is done as subroutines.

John

Which is basically how humans would do it: digit by digit.

-Lasse
 
Nick Maclaren

|>
|> In the real world there are very few needs for so many digits of
|> precision.
|>
|> But when the need does arise, only the stupid programmers re-invent
|> the wheel. The others use commonly available algorithms already out
|> there. The GNU multi-precision library and the C bignum library come
|> to mind immediately.

That's too strong. I am doing it at present, for good and sufficient
reasons. I agree that your statement is true 99% of the time, perhaps
more :)

To the best of my knowledge, there is nothing that comes close to what
I am currently doing, though there has been in the past. But I baulk
at trying to FIND the old codes, though I would happily convert one of
them if I could do so. Where would YOU look up things that dropped out
of common practice in the 1960s?

|> And when the wheel has been re-invented, it's been re-invented for the
|> better: for example there are better algorithms for multiplication,
|> these algorithms borrowing ideas from FFTs. It should not be a
|> surprise that a lot of computers have already been optimized for FFT
|> performance :). Google for "Schoenhage-Strassen algorithm".

Not at all. It has very often been invented worse. I despair at the
way that computer scientists reinvent technologies that predate their
subject - and then get the number of sides on the wheel wrong :-(

For example, how would YOU do division and modular multiplication, how
do those packages do it, and are you quite sure that is the best way?
Well, let's not go there - it would take all week :)


Regards,
Nick Maclaren.
 
John Larkin

Which is basically how humans would do it: digit by digit.

-Lasse


Yeah. I posted a problem a while back, about needing a fast 64-bit
binary-to-ascii conversion routine. Someone made a suggestion that
helped solve the problem for me (divide by 1e9 and convert the chunks
separately) and when I wrote the code I realized, for the first time,
how "long division" actually works.

John
 
John Larkin

Not at all. It has very often been invented worse. I despair at the
way that computer scientists reinvent technologies that predate their
subject - and then get the number of sides on the wheel wrong :-(

From "Dreaming in Code" : The worst thing about programmers is that
they love to write code.


John
 
Jim Thompson

Yeah. I posted a problem a while back, about needing a fast 64-bit
binary-to-ascii conversion routine. Someone made a suggestion that
helped solve the problem for me (divide by 1e9 and convert the chunks
separately) and when I wrote the code I realized, for the first time,
how "long division" actually works.

John

They're back to teaching "casting out 9's" in the schools here.
Except they gave it some different name to confuse grandparents ;-)
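
The check itself is just arithmetic mod 9: a number is congruent to its
digit sum modulo 9, so a*b = c requires (a mod 9)(b mod 9) = c (mod 9).
A toy example in C (it catches most slips, not all):

#include <stdio.h>

int main(void)
{
    unsigned a = 1234, b = 5678, c = a * b;
    /* Casting out nines: compare both sides of a*b = c modulo 9. */
    if ((a % 9) * (b % 9) % 9 == c % 9)
        printf("%u * %u = %u passes the nines check\n", a, b, c);
    else
        printf("error detected\n");
    return 0;
}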

...Jim Thompson
 
Richard The Dreaded Libertarian

And again, if and when you need an implementation of these algorithms, get
an existing one, e.g., contained in OpenSSL, instead of re-doing it from
scratch (and likely getting it wrong in the process).

You could download the source for bc, which compiles on Linux, but could
probably be ported:
http://directory.fsf.org/bc.html

Good Luck!
Rich
 
Skybuck Flying

It's not only about reinventing the wheel, it's also about
efficiency/speed/performance/ease of use even.

A CPU should be able to do it faster in hardware than a software library...

Bye,
Skybuck.
 
It's not only about reinventing the wheel, it's also about
efficiency/speed/performance/ease of use even.

Highest efficiency (in terms of programmer hours) is almost always
achieved by pulling in an off-the-shelf library that you know
works :).

There are exceptions, and the market finds them: crypto accelerators
for example. The market even finds things that are not truly
exceptions!
A CPU should be able to do it faster in hardware than a software library...

Apply this for each of the bazillion things you might want your CPU to
do, and you end up with a CPU with a bazillion instructions.

Tim.
 
Nick Maclaren

|>
|> > A CPU should be able to do it faster in hardware than a software library...
|>
|> What makes you think this?

He is right, but not VERY right.

The standard ISAs provide poor primitives for the purpose. For
multiple-precision integer arithmetic, there are only a few extra
primitives needed (e.g. NxN->2N multiplication, add with carry, etc.).
For what I am doing (software binary floating-point), the ISA and HLL
primitives stink. The actual arithmetic isn't the problem, as it is the
same as for integers, but the housekeeping, normalisation, rounding,
etc. are a right royal pain in the, er, neck.
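
(Add-with-carry is a good example: one instruction in hardware, but
awkward to express portably. A minimal C sketch of the multiple-precision
addition primitive; the function name and limb layout are illustrative:)

#include <stddef.h>
#include <stdint.h>

/* r = a + b over n little-endian 64-bit limbs; returns the final carry.
   The two comparisons recover the carry flag that C does not expose. */
uint64_t mp_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;
        carry = (s < carry);        /* carry out of a[i] + carry */
        r[i] = s + b[i];
        carry += (r[i] < s);        /* carry out of s + b[i]     */
    }
    return carry;
}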

My estimate from many years back, which is probably still true, is
that a decent set of primitives would enable a software floating-point
to be 3-5 times slower than all-hardware (for a plausible precision).
But current ones make it a LOT slower than that.

However, when one is talking large integer arithmetic (hundreds of
digits), all one needs is a small number of extra primitives, and there
is no difficulty in doing it in software. Even without them, it is
very rarely more than 3 times slower than the best hardware could be.


Regards,
Nick Maclaren.
 
Which is basically how humans would do it: digit by digit.


Only on the really slow implementations. While bignum addition and
subtraction have to be done digit by digit (for some definition of
digit), you do *not* want to do division and multiplication the way
you learned them in school. Karatsuba multiplication (O(n**1.6) instead
of O(n**2) for the classical algorithm) is useful for medium-sized
numbers, and FFT (O(n ln(n) ln(ln(n)))) for large numbers. Division
is usually performed by approximating a result and then refining it
with an appropriate number of iterations of Newton-Raphson (which just
needs multiplications).
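
The Newton-Raphson step for a reciprocal is x' = x*(2 - d*x), which
doubles the number of correct digits per iteration. A toy C illustration
using doubles rather than bignums (a real bignum version iterates at
growing precision; the 48/17 - 32/17*m seed is the textbook one):

#include <math.h>
#include <stdio.h>

/* Divide via Newton-Raphson on the reciprocal: only multiplies
   and subtractions in the refinement loop, no division at all. */
double nr_div(double a, double d)
{
    int e;
    double m = frexp(d, &e);              /* d = m * 2^e, m in [0.5, 1) */
    double x = 48.0/17.0 - 32.0/17.0 * m; /* linear seed for 1/m        */
    for (int i = 0; i < 5; i++)
        x = x * (2.0 - m * x);            /* quadratic convergence      */
    return a * ldexp(x, -e);              /* a/d = a * (1/m) * 2^-e     */
}

int main(void)
{
    printf("355/113 ~= %.15f\n", nr_div(355.0, 113.0));
    return 0;
}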
 
John Larkin

It's not only about reinventing the wheel, it's also about
efficiency/speed/performance/ease of use even.

A CPU should be able to do it faster in hardware than a software library...

Bye,
Skybuck.

But there are few problems that need more than 64- or 80-bit
floating-point math, and fewer still that need it fast.

John
 
Richard

[Please do not mail me a copy of your followup]

John Larkin <[email protected]> spake the secret code
But there are few problems that need more than 64- or 80-bit
floating-point math, and fewer still that need it fast.

...which makes it perfect for reconfigurable computing.

As I understand it, that's what people are doing these days. Where
previously you would have been required to develop a custom ASIC at
great NRE cost, nowadays you can develop custom hardware using a
relatively inexpensive FPGA peripheral card. Configuring the FPGA
circuitry is very much like writing software when you use something
like VHDL or Verilog.

In fact, you can even get these in a portable form factor -- there's
an FPGA card you can buy that plugs into the Nintendo GameBoy Advance
and it comes with a complete tool chain included. All for about $200
from Charmed Labs <http://www.charmedlabs.com> as the Xport 2.0 which
has 150,000 gates.
 
Terje Mathisen

John said:
Yeah. I posted a problem a while back, about needing a fast 64-bit
binary-to-ascii conversion routine. Someone made a suggestion that
helped solve the problem for me (divide by 1e9 and convert the chunks
separately) and when I wrote the code I realized, for the first time,
how "long division" actually works.

That was me, I use that approach in all my binary arbitrary-precision libs.

:)

With me it was the opposite, sort of: The _very_ first non-obligatory
program I wrote was a (Fortran-2) implementation of arbitrary-precision
arithmetic, written to calculate as many digits of pi as my student
CPU-seconds budget would allow.

OTOH, I used to spend my high school physics classes playing around with
Taylor series and arbitrary-precision algorithms on paper, just because
that allowed me to calculate with better precision than my big 5-digit
log table book would allow. :)

Terje
 
Terje Mathisen

Skybuck said:
It's not only about reinventing the wheel, it's also about
efficiency/speed/performance/ease of use even.

A CPU should be able to do it faster in hardware than a software library...

That is only true if the core building blocks take so short a time to
run that the decoding time becomes a limiting factor.

With pipelined CPUs, lots of algorithms do in fact run faster in software
than the corresponding high-level opcode, even on those CPUs which did
bother to implement these CISCy operations.

(Look up VAX polynomial evaluation, or the x86 BCD and string opcodes.)

Terje
 