
ugly math

Discussion in 'Electronic Design' started by John Larkin, Nov 12, 2006.

  1. John  Larkin

    John Larkin Guest

    OK, I've got a 64-bit unsigned binary number, in a pair of 32-bit
    registers, in an MC68332 cpu. Its maximum value is 1e13, which is 10
    seconds measured in picoseconds, so the MS longword can only get up to
    about 2328 or something like that. I'm looking for an efficient way to
    turn this into a decimal ASCII string, so I basically need to convert it
    into bcd. The bcd output will need to have 14 digits. "Numerical
    Recipes" doesn't address mundane problems like this.

    One way is to create a BCD buffer, zero it, and map each nibble of the
    input through a 16-entry bcd lookup table, and sum all the bcd terms.
    The 68332 has a fairly efficient BCD add instruction, so adding a bcd
    table entry to the running sum isn't ghastly. The lookup table needs
    12 entries (one per active nibble) each holding 16 bcd numbers, each
    being a 14 digit bcd number, namely 8 bytes, grand total 1536 bytes.
    Creating this table would be a minor nuisance; I'd need a
    jillion-digit calculator. Well, PowerBasic does have a 64-bit integer
    type, so maybe I can write a little Basic program to gen the tables.
    There's even a BCD data type, up to 18 digits.
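    Generating the table doesn't strictly need PowerBasic; a short host-side C program can do the "jillion-digit" work. Here's a sketch (the helper name is mine, not from the thread): entry [k][n] of the table would be the packed-BCD form of n * 16^k.

```c
#include <stdint.h>

/* Sketch: convert a binary value to packed BCD, one decimal digit per
 * nibble. Assumes x has at most 16 decimal digits, so the result fits
 * in a uint64_t. */
uint64_t to_packed_bcd(uint64_t x)
{
    uint64_t bcd = 0;
    int shift = 0;
    while (x != 0) {
        bcd |= (x % 10) << shift;  /* drop each decimal digit into a nibble */
        x /= 10;
        shift += 4;
    }
    return bcd;
}
```

    Printing to_packed_bcd((uint64_t)n << (4*k)) for k = 0..11 and n = 0..15 as hex literals yields the whole 12x16 table, ready to paste into the assembler source.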

    One nice algorithm for binary-to-bcd is to just keep dividing by 10
    and posting the remainders as the bcd digits, in reverse order of
    course. The 68332 can divide a 64-bit value by 10, with a remainder,
    but the quotient is limited to 32 bits, not enough. I could use this
    mode to convert the low 32-bit part to bcd, then use four nibble table
    lookups on the high half to finish things off. Since I also need a
    32-bit-to-bcd thing somewhere else, maybe that isn't such a gross
    idea.

    If the raw input could be scaled to 64-bit fractional format,
    successive multiplies by 10 will pump out digits, ms digit first.
    Gotta think about that one... it may have embarrassing rounding
    problems.
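    For what it's worth, the rounding can be tamed: if the initial scaling rounds up instead of truncating, the error stays one-sided and is too small to ever cross a digit boundary, because 10^14 is far below 2^64. A sketch using GCC's __uint128_t to model the 64-bit fraction (the 1e14 scale and 14-digit width are assumptions from the 10-second picosecond spec):

```c
#include <stdint.h>

/* Sketch of the multiply-by-10 digit pump. f is a 64-bit binary fraction
 * of x/1e14; each multiply by 10 shifts the next decimal digit into the
 * integer part. Rounding the scale UP keeps all digits exact here. */
void digits_by_multiply(uint64_t x, char out[15])
{
    const uint64_t scale = 100000000000000ULL;  /* 1e14 */
    __uint128_t f = (((__uint128_t)x << 64) + scale - 1) / scale; /* ceil */
    int i;
    for (i = 0; i < 14; i++) {
        f *= 10;                                /* integer part = next digit */
        out[i] = (char)('0' + (int)(f >> 64));  /* MS digit first */
        f &= ~(__uint128_t)0 >> 64;             /* keep only the fraction */
    }
    out[14] = '\0';
}
```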

    Or maybe I could program our Xilinx FPGA to do a hardware divide,
    divide a 64 bit integer by 10 and feed me the remainders. Hell, maybe
    it could do the entire ascii conversion in hardware.

    Oh, I'm programming in bare-metal assembly.

    Anybody have any tricks here?

    (You comp.arch guys take it easy on me, please. I'm just a simple
    engineer.)

    John
     
  2. Something like this:-

    Divide the 64 bit value by 1 million.
    Then
    Convert the quotient and remainder to ascii strings.

    (Divide by 10, push remainder, divide quotient by 10, push remainder, ..., store quotient, pop remainder, store, pop remainder,
    store ...)
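    The scheme above can be sketched in C (names are mine; the 14-digit width assumes the OP's < 1e14 picosecond value). One 64-bit divide by 1e6 splits the value into two halves that plain 32-bit arithmetic can finish, and filling the buffer right-to-left takes the place of the pushes and pops:

```c
#include <stdint.h>

/* Sketch: split with one divide by 1e6, then peel digits with 32-bit
 * divide-by-10 loops, storing each half least-significant digit first. */
void u64_to_decimal(uint64_t x, char out[15])
{
    uint32_t hi = (uint32_t)(x / 1000000u);  /* at most 99,999,999: 8 digits */
    uint32_t lo = (uint32_t)(x % 1000000u);  /* low 6 digits */
    int i;
    for (i = 13; i >= 8; i--) {              /* low half */
        out[i] = (char)('0' + lo % 10);
        lo /= 10;
    }
    for (i = 7; i >= 0; i--) {               /* high half */
        out[i] = (char)('0' + hi % 10);
        hi /= 10;
    }
    out[14] = '\0';
}
```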
     
  3. Andrew Holme

    Andrew Holme Guest

    I think you can use the same trick which the FORTH word M/MOD uses. This
    divides a 32-bit value by a 16-bit value, giving a 32-bit quotient and
    16-bit remainder; but it does it using the U/ primitive, which is 32/16 =
    16q+16r.


    Say you want to work out (a + 2^32*b) / n using your 64/32 --> 32q, 32r
    instruction

    Divide your high word first (set the top 32-bits to zero and move b into the
    lower 32)
    b/n = q1 + r1/n

    Now do the low word, adding in the remainder from the above:
    (a + 2^32*r1) / n = q2 + r2/n

    You can now form a 64-bit quotient and 32-bit remainder from the two
    results:
    (a + 2^32*b) / n = 2^32q1 + q2 + r2/n
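    The same two steps in C (the struct and names are mine), modelling hardware like the 68332's 64/32 divide whose quotient must fit in 32 bits. The key point: because r1 < n, the second dividend a + 2^32*r1 is below n*2^32, so its quotient q2 is guaranteed to fit in 32 bits.

```c
#include <stdint.h>

/* Sketch of long division by parts: (a + 2^32*b) / n with only divides
 * whose quotients fit in 32 bits. */
typedef struct { uint32_t q_hi, q_lo, r; } divmod_t;

divmod_t divmod64by32(uint32_t b, uint32_t a, uint32_t n)
{
    divmod_t out;
    uint64_t t;
    out.q_hi = b / n;                   /* b/n = q1 + r1/n                 */
    t = ((uint64_t)(b % n) << 32) | a;  /* a + 2^32*r1, below n*2^32       */
    out.q_lo = (uint32_t)(t / n);       /* q2: fits 32 bits, see above     */
    out.r    = (uint32_t)(t % n);       /* final remainder r2              */
    return out;
}
```

    Example: 1e13 split as b = 2328, a = 1316134912, divided by 10, gives the 64-bit quotient 1e12 (q_hi = 232, q_lo = 3567587328) and remainder 0.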



    Here's the code in FORTH:

    $Colon MDivMod, 'M/MOD'
    ; Unsigned 32/16 -> 32q 16r
    ; >R 0 R U/ ROT SWAP R> U/ ROT SWAP ( l h d -- ql qh r )
    ;
    ;>R l h d
    ;0 l h 0 d
    ;R l h 0 d d
    ;U/ l q r d
    ;ROT q r l d
    ;SWAP q l r d
    ;R> q l r d
    ;U/ q1H q2L r2
    ;ROT q2L r2 q1H
    ;SWAP q2L q1H r2
     
  4. John  Larkin

    John Larkin Guest

    Gadzooks, my 68332 can do that beautifully. That splits the number
    into two chunks I can process with the available 64/32 divide opcodes.

    Don't need no stinkin' pushes and pops... I've got 16 registers!

    Thanks!

    John
     
  5. Thanks,
    Interesting. Maybe I'll write the (32 bit) code in Macro-11 for reference later on.
    Should be useful for microcontrollers with h/w integer divide
    Richard
     
  6. Arlet

    Arlet Guest

    Given the 64/32 bit divide, this is the best solution. Instead of
    storing all the BCD nibbles on the stack, or in registers, just write
    them to memory as you produce them. The trick is to write them
    'backwards'. Set up an address register with the end of your ASCII
    buffer, write a zero byte with autodecrement, and write each nibble
    with autodecrement. When you're done, the register holds the start of
    the string in the correct order, terminated by zero.
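    In C the same trick looks like this (a sketch; the function name is mine). It's the C analogue of the 68332's -(A0) addressing mode: the caller passes a pointer one past the end of a buffer big enough for the digits plus the terminator.

```c
#include <stdint.h>

/* Sketch of the backwards-store trick: write the terminator first, then
 * predecrement for each digit; the returned pointer is the start of the
 * finished string. */
char *u32_to_string_backwards(uint32_t x, char *buf_end)
{
    char *p = buf_end;
    *--p = '\0';                      /* zero terminator first */
    do {
        *--p = (char)('0' + x % 10);  /* digits emerge LS first... */
        x /= 10;
    } while (x != 0);
    return p;                         /* ...but p ends up at the MS digit */
}
```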
     
  7. Robert Baer

    Robert Baer Guest

    Attached is a FORTRAN program that converts internal binary back to
    ASCII decimal, for a FHT (Hartley) multi-million digit multiply routine.
    As such, the input (large) number was "sampled" into 7-digit or less
    length sequences, converted to binary, then FHT, convolve, inverse FHT,
    and this used to convert back to ASCII in the same-sized sequences.
    The secret to this speedy code (attached) is a 579Kbyte look-up table
    (actually two tables ASK4 and ASK5) placed in (labelled) common by an
    ASM "program" which I could send to you if you wish.



    CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
    C C
    C SUBROUTINE BIN2ASC(G,An,ASK,IS10) C
    C G: input numeric array from multiply C
    C An: output char array, each element ISMP long C
    C ASK: input ASCII lookup array (ASK4 or ASK5) C
    C IS10: output, =1 if result overflows, i.e. == 10^ISMP C
    C *Converts internal format to ASCII, 4-7 characters. C
    C 10/29/98 C
    C C
    CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

    SUBROUTINE BIN2ASC(G,An,ASK,IS10)

    c$noreference
    C * Common to all programs *
    IMPLICIT NONE
    COMMON IP,N,N2,N4,N7,IP7,ISMP,JFLAG,TCA,TCB,IRecSz1,IRecSz2,
    1 IRecSz3
    C Table lookup numeric to ASCII, 4 & 5 digits via ASCII arrays
    C Note: these arrays are filled in DGROUP.asm data declarations
    COMMON /NAM/ ASK4,ASK5
    INTEGER*4 IP,N,N2,N4,N7,IP7,ISMP,JFLAG,IRecSz1,IRecSz2,IRecSz3
    CHARACTER TCA*1,TCB*1,ASK4(0:9999)*4,ASK5(0:99999)*5
    c$reference

    INTEGER*4 I,IS10,T,U
    CHARACTER An*(*)(0:N7),ASK*(*)(0:10**ISMP-1),TMP*4
    DOUBLE PRECISION G(0:N7)

    C Table lookup numeric to ASCII, 4 & 5 digits via ASCII array ASK
    ! get integer overflow somewhere..
    N2=N4+N4
    IF (INT(G(N7)/10**ISMP) .EQ. 1) IS10=1 ! yup; fix exponent

    SELECT CASE (ISMP)
    CASE (3)
    DO I=0, N7 ! uses ASK4 or ASK5
    TMP=ASK(INT(G(I)))
    An(N7-I)=TMP(2:4)
    END DO
    CASE (4,5)
    DO I=0, N7 ! uses ASK4 or ASK5
    An(N7-I)=ASK(INT(G(I)))
    END DO
    CASE (6) ! uses ASK5 for top 5 digits
    DO I=0, N7 ! 6 digit table would take 6 megs
    T=INT(G(I)*1.0D-1) ! remove divide
    U=G(I)-T*10 ! remove modulo; 10 percent improvement
    An(N7-I)=ASK(T)//CHAR(U+48)
    END DO
    CASE (7) ! uses ASK4
    DO I=0, N7
    T=INT(G(I)*1.0D-4) ! remove divide
    U=G(I)-T*10000 ! remove modulo; 10 percent improvement
    TMP=ASK(T)
    An(N7-I)=TMP(2:4)//ASK(U)
    END DO
    END SELECT

    RETURN
    END
     
    Why does it need to be efficient? Code-space efficient or execution
    time efficient? For displaying purposes, a refresh rate of 10 times
    per second is good enough. A lookup table with 1, 10, 100, 1000, up to
    1000000000000 converted to 64 bit values, and then subtract those
    values until you hit 0. Or a larger table, with 1,2,3,4,5,6,7,8,9,10,
    20,30,40,50 and so on, with compares and subs. Such a table would
    hold 9x16x8 bytes (1152) of data.

    Or use a different hardware counter, to count the picoseconds,
    lsb part of it in binary, msb part of it in decimal.
     
  9. You can of course use a 64/32->32,32 (result/remainder) in a loop to
    divide an arbitrarily long bitstring by a 32-bit number.

    If this needs to be efficient, then it is much faster to do the long
    divisions by 1e9 (which fits in 32 bits), then use regular ascii
    conversion code to convert each 9-digit part into ascii/bcd digits.

    If time is _really_ critical, then you should search for my _fast_
    10-digit 32-bit binary to ascii conversion algorithm, which I posted on
    usenet (in an asm newsgroup) many years ago. It is fast enough that AMD
    'borrowed' it without attribution for their code optimization manual! :-(

    Terje
     
  10. dave

    dave Guest

    You don't provide much context for this problem, so I'll just drop this
    into the thread. If you have a C compiler available, why not just port
    the Calc program (supports arbitrary precision arithmetic) to your
    computer? (http://isthe.com/chongo/tech/comp/calc/)

    Dave Feustel
     
  11. Arlet

    Arlet Guest

    Seems overkill to use arbitrary precision when only 64 bits are needed.
    Most modern C compilers have a 64 bit type, usually implemented as
    'long long'.

    In that case, a simple one line call:

    sprintf( asc, "%lld", bin )

    will do the requested conversion.

    Of course, OP said he wants to use assembler, presumably because the
    available memory does not allow a few kB for the library calls. For
    code efficiency, it's hard to beat the solution proposed by another
    poster where you do an initial 64/32 bit division by 1 million,
    followed by repeated 32-bit divisions by 10 to get the BCD digits.
     
  12. dave

    dave Guest

    I agree. I have found calc to be quite useful on OpenBSD, although it's
    probably not useful if the OP is developing code for an embedded system.
     
  13. colin

    colin Guest

    ha, I've just implemented such a thing in C to display the speed of my
    motor controller on a dsPIC30,
    but it uses a modulus and divide per digit, which is probably too slow
    for you if you are doing it in assembler, as you presumably need speed,
    else why do it in assembler.

    but here's an idea I just made up to do it quickly with no divides or
    multiplications, but instead using lots of code.
    it should be very fast, only executing about 4 compares and 1 subtract
    per decimal digit (78 ops).

    it's arranged like a binary decision tree, each decision splitting the
    problem into two approximate halves.

    pseudo code :-
    (well ok, it's C actually, and totally untested !)

    unsigned64 result = 0;
    x = input;
    if (x >= 5e12)
        if (x >= 7e12)
            if (x >= 9e12)
            {
                result |= (unsigned64)9 << 48; /* 1e12 digit is nibble 12 */
                x -= 9e12;
            }
            else if (x >= 8e12)
            {
                result |= (unsigned64)8 << 48;
                x -= 8e12;
            }
            else
            {
                result |= (unsigned64)7 << 48;
                x -= 7e12;
            }
        else if (x >= 6e12)
        {
            result |= (unsigned64)6 << 48;
            x -= 6e12;
        }
        else
        {
            result |= (unsigned64)5 << 48;
            x -= 5e12;
        }
    else if (x >= 2e12) etc ......

    if (x >= 5e11)
        if (x >= 7e11)
            if (x >= 9e11)
            {
                result |= (unsigned64)9 << 44; /* next nibble down */
                x -= 9e11;
            }
    etc ....


    Colin =^.^=
     
  14. colin

    colin Guest

    I spotted a few mistakes already, as I made some changes just before
    posting:
    x is an unneeded temp and should be replaced by input,
    and I assumed input is actually < 1e13.
    you could make a macro for each digit and call it once per digit,
    so the decimal exponent would be text-replaced by the macro parameter,
    and the binary shift <- parameter*4

    Colin =^.^=
     
  15. John  Larkin

    John Larkin Guest

    My thoughts exactly.

    Here's my code, based on Richard's suggestion, but dividing by 1e9
    instead of 1e6. I haven't read it through enough, so there's likely a
    bug or so. I usually code bugs here and there, but catch them on
    read-throughs, before I actually run the code.

    ACORN is separately callable to convert 32-bit integers.

    John




    .SBTTL . ASCII CONVERSIONS

    ; GIVEN A 64-BIT TIME IN PICOSECONDS, CONVERT THAT INTO AN ASCII
    ; STRING IN OUR "BCD" STRING BUFFER.

    ; INPUT IS IN D0:D1, MAX VALUE 99,999,999,999,999, 14 DIGITS!

    ; WE'LL BREAK D0:D1 INTO TWO CHUNKS BY FIRST DIVIDING BY 1E9

    ATIME: MOVE.L # 1000000000, D3 ; ONE WHOLE BILLION! WOW.
    DIVU.Q D3, D0:D1 ; DO THAT DIVIDE

    ; D0 (REMAINDER) HOLDS THE LOW 9 DECIMAL DIGITS, 999,999,999 MAX
    ; D1 (QUOTIENT) NOW HOLDS THE NUMBER OF BILLIONS, 99,999 MAX

    MOVEA.W # BCX, A0 ; AIM AT END OF 'BCD' BUFFER, +1
    MOVE.W # 9-1, D7 ; REQUEST 9 DIGITS
    MOVE.L D0, D6 ; COPY CORRESPONDING CHUNK
    BSR.S ACORN ; CONVERT THEM, POKE INTO BUFFER

    MOVE.W # 5-1, D7 ; MAKE READY FOR BIG 5 DIGITS
    MOVE.L D1, D6 ; COPY THE GIGAS
    ; AND FALL INTO FINAL CONVERSION

    ; THIS LITTLE SUB CONVERTS THE CONTENTS OF D6 INTO
    ; AN ASCII STRING.

    ; D7 IS NUMBER OF DIGITS TO OUTPUT, DBF STYLE
    ; D6 HOLDS UNSIGNED LONG TO CONVERT
    ; A0 AIMS AT START, THE *LS* DIGIT OF THE STRING+1,
    ; MEANING WE'LL WORK BACKWARDS, -(A0) MODE

    ACORN: MOVE.L # 10, D3 ; RADIX CONSTANT FOR HUMANOIDS

    ACRID: CLR.L D5 ; MAKE INPUT INTO 64-BIT INT
    DIVU.Q D3, D5:D6 ; DIVIDE, REMAINDER IN D5
    ADDI.W # "0", D5 ; CONVERT TO ASCII 0..9
    MOVE.B D5, -(A0) ; POKE A NEW DIGIT

    DBF D7, ACRID ; AND LOOP
    RTS

    ; ON EXIT, A0 AIMS AT THE MS CHARACTER
     
  16. John  Larkin

    John Larkin Guest

    I'd love to see that; could you possibly repost here?

    Is there a competition for the *slowest* binary to decimal-ascii
    conversion? I think I may have written a candidate, for the 6802 uP, a
    while ago.

    John
     
  17. John  Larkin

    John Larkin Guest

    Partly for elegance. The box has a 38.4 kbaud serial interface, and
    users can interrogate it for current settings, so I would like to not
    be speed-limiting the communications. I guess it would be OK to do the
    64-bit binary to 14-char ASCII in a couple of milliseconds, which
    ain't hard on a 68332 no matter how you do it. The other efficiency is
    my coding time.

    I posted what looks like a nice routine, based on Richard's divide
    breakup, followed by using the 68332's very nice 64/32 divide with
    remainder opcode.

    I've done that, on machines without the divide opcode, but the divide
    thing just spits out ASCII digits in a 5-opcode loop, a few
    microseconds per character.

    I considered having my FPGA guy do the whole thing for me in VHDL,
    sort of as a math coprocessor, but the split divide thing looks fine.

    John
     
  18. John  Larkin

    John Larkin Guest

    Yeah, I expect the entire application (managing an embedded digital
    delay generator) to come in around 6 kilobytes or so, including serial
    command parser, 'help' text, FPGA loader, flash manager, and of course
    the hardware management.

    John
     
  19. one simple way is to have a powers of ten table:

    dq 10000000000000,1000000000000,100000000000,10000000000
    dq 1000000000,100000000,10000000,1000000,100000,10000,1000,100,10,1

    Start by seeing how many times you can subtract the first entry--
    that's your top digit.
    continue doing the same for each succeeding table entry. Quite fast,
    and no divides needed.
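    A sketch of this in C (function name mine; the 14-digit width assumes the OP's < 1e14 value). Each digit costs at most 9 compare-and-subtract passes, and no divides at all:

```c
#include <stdint.h>

/* Sketch of digit extraction by repeated subtraction of powers of ten:
 * for each table entry, the subtraction count is that decimal digit. */
void u64_to_digits_by_subtraction(uint64_t x, char out[15])
{
    static const uint64_t pow10[14] = {
        10000000000000ULL, 1000000000000ULL, 100000000000ULL,
        10000000000ULL, 1000000000ULL, 100000000ULL, 10000000ULL,
        1000000ULL, 100000ULL, 10000ULL, 1000ULL, 100ULL, 10ULL, 1ULL
    };
    int i;
    for (i = 0; i < 14; i++) {
        char d = '0';
        while (x >= pow10[i]) {  /* at most 9 iterations per digit */
            x -= pow10[i];
            d++;
        }
        out[i] = d;
    }
    out[14] = '\0';
}
```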
     
  20. David Brown

    David Brown Guest

    I know you're using assembly, but I can't be bothered writing it out in
    assembly (certainly not tight and fast code) at this stage. But here's
    an idea in rough C, that you should be able to translate well enough.

    static const uint64 bcds[8][256] = {
    { 0, 0x0001, 0x0002, ..., 0x0255 },
    { 0, 0x00000256, 0x00000512, 0x00000768, ..., 0x00065535 },
    { 0, 0x00065536, 0x00131072, ..., 0x16711425 },
    ...
    { 0, 0x0281474976710656, ... }, // Overflows after 35th entry
    { ... } // Row 8 is all overflows, and can be skipped altogether
    };


    uint64 intToBcd(uint64 x) {
    uint64 sum = 0;
    int i;
    uint8 *p = (uint8 *)&x + 7; // Point to LSB (the 68332 is big-endian)

    for (i = 0; i < 7; i++) { // Skip row 8; it always overflows
    sum = bcd64add(sum, bcds[i][*p--]);
    }
    return sum;
    }

    The idea is to have 8 lookup tables, one for each byte position in the
    64-bit number (excluding the msb, as that will certainly overflow - you
    could skip some of the other bytes too as you don't need the full
    64 bits). Each table gives the bcd value of each of the 256 values for a
    byte in that place. You decompose the original 64-bit int into bytes,
    and build up the bcd sum. If you want to save space, you could use 16
    4-bit lookup tables instead of 8 8-bit tables.
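    The bcd64add above is assumed but not shown. On the 68332 the ABCD instruction does packed-BCD addition two digits at a time via the X flag; as a readable C model (a sketch, not the fast version), a plain nibble loop shows what it must compute:

```c
#include <stdint.h>

/* Sketch of bcd64add: add two 16-digit packed-BCD values nibble by
 * nibble with a decimal carry. A carry out of the 16th digit is dropped. */
uint64_t bcd64add(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    unsigned carry = 0;
    int i;
    for (i = 0; i < 16; i++) {
        unsigned d = (unsigned)((a >> (4 * i)) & 0xF)
                   + (unsigned)((b >> (4 * i)) & 0xF) + carry;
        carry = (d >= 10);        /* decimal, not binary, carry */
        if (carry)
            d -= 10;
        sum |= (uint64_t)d << (4 * i);
    }
    return sum;
}
```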
     