
Kids can do math better than x86 CPUs.

Discussion in 'Electronic Design' started by Skybuck Flying, Mar 7, 2007.

  1. Hello,

    In principle it should be possible to add, subtract, multiply or divide a 1
    billion digit number by another 1 billion digit number; even kids learn
    this in school, with the standard math tables and algorithms for addition,
    subtraction, multiplication and division.

    These algorithms can be applied to arbitrary length numbers. (For positional
    number systems)

    (I think these algorithms were invented thousands of years ago by the
    Greeks).
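
    As a rough illustration, here is the schoolbook addition algorithm on
    arrays of decimal digits, a minimal C sketch (function and variable names
    are made up; digits are values 0-9, stored least-significant first):

    #include <stddef.h>

    /* Schoolbook addition on decimal digit arrays, least-significant digit
       first. sum[] must have room for n + 1 digits. This is exactly the
       column-by-column carry propagation kids learn in school. */
    void add_digits(const char *a, const char *b, char *sum, size_t n)
    {
        int carry = 0;
        for (size_t i = 0; i < n; i++) {
            int d = a[i] + b[i] + carry;   /* add one column plus carry-in */
            sum[i] = (char)(d % 10);       /* digit that stays in place    */
            carry  = d / 10;               /* carry into the next column   */
        }
        sum[n] = (char)carry;              /* possible final carry digit   */
    }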

    However, x86 CPUs today cannot do what kids, Greeks and common people can
    do, and that is math with arbitrarily long numbers.

    x86 CPUs only do math with a limited/fixed number of digits/bits... which
    is quite disappointing.

    Why is this? Sigh.

    Some possibilities:

    1. Add new instructions which can multiply arbitrarily long numbers in
    memory.

    (Example: multiply memory_pointer1, memory_length1, memory_pointer2,
    memory_length2, memory_pointer3, memory_length3)
    (something like that... it could even be a 2 memory location operation
    instead of 3)
    (A * B = C; see the sketch after this list)

    or

    2. Add high level chips which can use the low level chips to do these more
    complex math algorithms...
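
    What option 1 would have to do internally looks roughly like this in
    software: schoolbook multiplication over three (pointer, length) operand
    pairs. A hypothetical C sketch mirroring the proposed A * B = C
    instruction (digits are values 0-9, least-significant first):

    #include <stddef.h>
    #include <string.h>

    /* Schoolbook O(n*m) multiplication of decimal digit arrays.
       product[] needs room for len_a + len_b digits. */
    void mul_digits(const char *a, size_t len_a,
                    const char *b, size_t len_b,
                    char *product)
    {
        memset(product, 0, len_a + len_b);
        for (size_t i = 0; i < len_a; i++) {
            int carry = 0;
            for (size_t j = 0; j < len_b; j++) {
                int d = product[i + j] + a[i] * b[j] + carry;
                product[i + j] = (char)(d % 10); /* digit stays in column */
                carry = d / 10;                  /* carry moves one up    */
            }
            product[i + len_b] = (char)carry;    /* top digit of this row */
        }
    }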

    The point is: this would save programmers from having to reinvent the old
    Greek math wheel, saving them time ;) and of course giving them an easier
    coding platform / more productivity :D

    Just some thoughts.

    Bye,
    Skybuck.
     
  2. Tim Shoppa

    Tim Shoppa Guest

    There are numerous CPUs out there that deal with arbitrary precision
    arithmetic. Typically the instruction set was optimized for COBOL :),
    which should give you a hint that you want to look at ISAs from the
    60's and 70's to see examples.

    In the real world there are very few needs for so many digits of
    precision.

    But when the needs do arise, only the stupid programmers re-invent
    the wheel. The others use commonly available algorithms already out
    there. The GNU multi-precision library and the C bignum library come
    to mind immediately.

    And when the wheel has been re-invented, it's been re-invented for the
    better: for example there are better algorithms for multiplication,
    these algorithms borrowing ideas from FFT's. It should not be a
    surprise that a lot of computers have already been optimized for FFT
    performance :). Google for "Schoenhage-Strassen algorithm".
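
    For instance, a minimal GMP sketch (not production code; build with
    gcc demo.c -lgmp):

    #include <stdio.h>
    #include <gmp.h>

    /* Multiply two numbers far wider than any x86 register. GMP picks
       the multiplication algorithm (schoolbook, Karatsuba, FFT-based)
       itself, depending on operand size. */
    int main(void)
    {
        mpz_t a, b, c;
        mpz_inits(a, b, c, NULL);

        mpz_set_str(a, "123456789012345678901234567890", 10);
        mpz_set_str(b, "987654321098765432109876543210", 10);
        mpz_mul(c, a, b);

        gmp_printf("%Zd\n", c);
        mpz_clears(a, b, c, NULL);
        return 0;
    }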

    Tim.
     
  3. |> And when the wheel has been re-invented, it's been re-invented for the
    |> better.

    And again, if and when you need an implementation of these algorithms, get an
    existing one, e.g., contained in OpenSSL, instead of re-doing it from scratch
    (and likely getting it wrong in the process).
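
    With OpenSSL's bignum API, the same kind of multiplication looks roughly
    like this (a sketch assuming libcrypto is installed; build with -lcrypto):

    #include <stdio.h>
    #include <openssl/bn.h>

    /* Arbitrary precision multiply using OpenSSL's BIGNUM routines. */
    int main(void)
    {
        BN_CTX *ctx = BN_CTX_new();      /* scratch space for BN_mul */
        BIGNUM *a = NULL, *b = NULL, *r = BN_new();

        BN_dec2bn(&a, "123456789012345678901234567890");
        BN_dec2bn(&b, "987654321098765432109876543210");
        BN_mul(r, a, b, ctx);

        char *s = BN_bn2dec(r);
        printf("%s\n", s);

        OPENSSL_free(s);
        BN_free(a); BN_free(b); BN_free(r);
        BN_CTX_free(ctx);
        return 0;
    }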

    Jan
     
  4. John Larkin

    John Larkin Guest

    You really should do a little googling before you go off on rants like
    this. There are lots of extended math libraries and applications
    around, both binary and decimal. Some will compute to arbitrary
    precision. This issue was addressed decades ago.

    The most common math functions, 32 bit integer operations and floats
    out to 80 bits, are executed in x86 hardware. The oddball stuff, like
    arbitrary precision and decimal floats, is done as subroutines.

    John
     
  5. Guest

    which is basically how humans would do it: digit by digit

    -Lasse
     
  6. |>
    |> In the real world there are very few needs for so many digits of
    |> precision.
    |>
    |> But when the needs do arrive, only the stupid programmers re-invent
    |> the wheel. The others use commonly available algorithms already out
    |> there. The GNU multi-precision library and the C bignum library come
    |> to mind immediately.

    That's too strong. I am doing it at present, for good and sufficient
    reasons. I agree that your statement is true 99% of the time, perhaps
    more :)

    To the best of my knowledge, there is nothing that comes close to what
    I am currently doing, though there has been in the past. But I baulk
    at trying to FIND the old codes, though I would happily convert one of
    them if I could do so. Where would YOU look up things that dropped out
    of common practice in the 1960s?

    |> And when the wheel has been re-invented, it's been re-invented for the
    |> better: for example there are better algorithms for multiplication,
    |> these algorithms borrowing ideas from FFT's. It should not be a
    |> surprise that a lot of computers have already been optimized for FFT
    |> performance :). Google for "Schoenhage-Strassen algorithm".

    Not at all. It has very often been invented worse. I despair at the
    way that computer scientists reinvent technologies that predate their
    subject - and then get the number of sides on the wheel wrong :-(

    For example, how would YOU do division and modular multiplication, how
    do those packages do it, and are you quite sure that is the best way?
    Well, let's not go there - it would take all week :)


    Regards,
    Nick Maclaren.
     
  7. John Larkin

    John Larkin Guest


    Yeah. I posted a problem a while back about needing a fast 64-bit
    binary-to-ASCII conversion routine. Someone made a suggestion that
    helped solve the problem for me (divide by 1e9 and convert the chunks
    separately), and when I wrote the code I realized, for the first time,
    how "long division" actually works.
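
    Something like this, presumably (a sketch of the chunked approach with a
    made-up helper name; a real routine would emit digits by hand rather than
    call sprintf, but the point is that only two 64-bit divisions are needed
    and the rest is cheap 32-bit work):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    /* Convert a 64-bit value to decimal ASCII by splitting it into
       9-digit chunks, each of which fits comfortably in 32 bits. */
    void u64_to_ascii(uint64_t v, char *buf)
    {
        uint32_t lo = (uint32_t)(v % 1000000000u);     /* lowest 9 digits */
        uint64_t rest = v / 1000000000u;
        uint32_t mid = (uint32_t)(rest % 1000000000u); /* middle 9 digits */
        uint32_t hi  = (uint32_t)(rest / 1000000000u); /* top 1-2 digits  */

        if (hi)
            sprintf(buf, "%" PRIu32 "%09" PRIu32 "%09" PRIu32, hi, mid, lo);
        else if (mid)
            sprintf(buf, "%" PRIu32 "%09" PRIu32, mid, lo);
        else
            sprintf(buf, "%" PRIu32, lo);
    }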

    John
     
  8. John Larkin

    John Larkin Guest

    From "Dreaming in Code": The worst thing about programmers is that
    they love to write code.


    John
     
  9. Jim Thompson

    Jim Thompson Guest

    They're back to teaching "casting out 9's" in the schools here.
    Except they gave it some different name to confuse grandparents ;-)
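
    Whatever name the schools use now, the trick is plain arithmetic mod 9:
    a number is congruent to its digit sum modulo 9, so a product can be
    sanity-checked cheaply. A toy C illustration:

    #include <stdio.h>

    /* Casting out nines: (a * b) mod 9 must equal
       ((a mod 9) * (b mod 9)) mod 9, or the product is wrong.
       The converse does not hold; the check only catches errors. */
    int main(void)
    {
        long a = 12345, b = 6789, claimed = a * b;
        long check = ((a % 9) * (b % 9)) % 9;
        printf("nines check %s\n",
               (claimed % 9) == check ? "passes" : "fails");
        return 0;
    }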

    ...Jim Thompson
     
  10. You could download the source for bc, which compiles on Linux, but could
    probably be ported:
    http://directory.fsf.org/bc.html

    Good Luck!
    Rich
     
    It's not only about reinventing the wheel, it's also about
    efficiency/speed/performance/ease of use even.

    A CPU should be able to do it faster in hardware than a software library...

    Bye,
    Skybuck.
     
  12. Guest

    Highest efficiency (in terms of programmer hours) is almost always
    achieved by pulling in an off-the-shelf library that you know
    works :).

    There are exceptions, and the market finds them: crypto accelerators
    for example. The market even finds things that are not truly
    exceptions!

    Apply this for each of the bazillion things you might want your CPU to
    do, and you end up with a CPU with a bazillion instructions.

    Tim.
     
  13. MitchAlsup

    MitchAlsup Guest

    What makes you think this?
     
  14. |>
    |> > A CPU should be able to do it faster in hardware then a software library...
    |>
    |> What makes you think this?

    He is right, but not VERY right.

    The standard ISAs provide poor primitives for the purpose. For multiple
    precision integer arithmetic, there are only a few extra primitives
    needed (e.g. NxN->2N multiplication, add with carry etc.) For what I
    am doing (software binary floating-point), the ISA and HLL primitives
    stink. The actual arithmetic isn't the problem, as it is the same as
    for integers, but the housekeeping, normalisation, rounding etc. are
    a right royal pain in the, er, neck.

    My estimate from many years back, which is probably still true, is
    that a decent set of primitives would enable software floating-point
    to be 3-5 times slower than all-hardware (for a plausible precision).
    But current ones make it a LOT slower than that.

    However, when one is talking large integer arithmetic (hundreds of
    digits), all one needs is a small number of extra primitives, and there
    is no difficulty in doing it in software. Even without them, it is
    very rarely more than 3 times slower than the best hardware could be.
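
    To make those "few extra primitives" concrete, here is roughly what
    NxN->2N multiplication and add-with-carry look like in C on a compiler
    with unsigned __int128 (GCC or Clang on x86-64); a sketch under those
    assumptions, not Nick's actual code:

    #include <stdint.h>
    #include <stddef.h>

    /* NxN -> 2N multiply: one 64x64 -> 128-bit product.
       Returns the low limb; the high limb comes back via *hi. */
    uint64_t mul_wide(uint64_t a, uint64_t b, uint64_t *hi)
    {
        unsigned __int128 p = (unsigned __int128)a * b;
        *hi = (uint64_t)(p >> 64);
        return (uint64_t)p;
    }

    /* Add with carry across n limbs, least-significant limb first:
       r = a + b; returns the final carry out. */
    uint64_t add_n(uint64_t *r, const uint64_t *a, const uint64_t *b,
                   size_t n)
    {
        uint64_t carry = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned __int128 s = (unsigned __int128)a[i] + b[i] + carry;
            r[i] = (uint64_t)s;
            carry = (uint64_t)(s >> 64);
        }
        return carry;
    }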


    Regards,
    Nick Maclaren.
     
  15. Guest


    Only on the really slow implementations. While bignum addition and
    subtraction have to be done digit by digit (for some definition of
    digit), you do *not* want to do division and multiplication the way
    you learned them in school. Karatsuba multiplication (O(n**1.6) instead
    of O(n**2) for the classical algorithm) is useful for medium-sized
    numbers, and FFT-based multiplication (O(n ln(n) ln(ln(n)))) for large
    numbers. Division is usually performed by approximating a result and
    then refining it with an appropriate number of iterations of
    Newton-Raphson (which needs only multiplications).
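
    The division claim is worth unpacking: the Newton-Raphson reciprocal
    iteration is x_next = x * (2 - d * x), which uses only multiplication
    and subtraction and roughly doubles the number of correct digits per
    step. A toy demonstration in doubles (a real bignum package runs the
    same iteration at ever-growing precision):

    #include <stdio.h>

    /* Refine an approximation of 1/d using only multiply and subtract. */
    int main(void)
    {
        double d = 7.0;
        double x = 0.1;              /* rough initial guess at 1/7 */
        for (int k = 0; k < 6; k++) {
            x = x * (2.0 - d * x);   /* one Newton-Raphson step */
            printf("step %d: x = %.15f\n", k, x);
        }
        printf("d * x = %.15f (should approach 1)\n", d * x);
        return 0;
    }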
     
  16. John Larkin

    John Larkin Guest

    But there are few problems that need more than 64- or 80-bit
    floating-point math, and fewer still that need it fast.

    John
     
  17. Richard

    Richard Guest

    [Please do not mail me a copy of your followup]

    John Larkin <> spake the secret code
    ....which makes it perfect for reconfigurable computing.

    As I understand it, that's what people are doing these days. Where
    previously you would have been required to develop a custom ASIC at
    great NRE cost, nowadays you can develop custom hardware using a
    relatively inexpensive FPGA peripheral card. Configuring the FPGA
    circuitry is very much like writing software when you use something
    like VHDL or Verilog.

    In fact, you can even get these in a portable form factor -- there's
    an FPGA card you can buy that plugs into the Nintendo GameBoy Advance
    and it comes with a complete tool chain included. All for about $200
    from Charmed Labs <http://www.charmedlabs.com> as the Xport 2.0 which
    has 150,000 gates.
     
  18. Quadibloc

    Quadibloc Guest

    Also known as Schönhage-Strassen multiplication.

    John Savard
     
    That was me; I use that approach in all my binary arbitrary-precision libs.
    :)

    With me it was the opposite, sort of: The _very_ first non-obligatory
    program I wrote was a (Fortran-2) implementation of arbitrary precision,
    written to calculate as many digits of pi as my student cpu-seconds
    budget would allow.

    OTOH, I used to spend my high school physics classes playing around with
    Taylor series and arbitrary precision algorithms on paper, just because
    that allowed me to calculate with better precision than my big 5-digit
    log table book would allow. :)

    Terje
     
    That is only true if the core building blocks take so short a time to
    run that the decoding time becomes a limiting factor.

    With pipelined CPUs, lots of algorithms do in fact run faster in
    software than the corresponding high-level opcode, even on those CPUs
    which did bother to implement these CISCy operations.

    (Look for VAX polynomial evaluation, or the x86 BCD and string opcodes.)

    Terje
     