Terje,
Your x86 BCD reference takes me back to the '80s and BYTE
Magazine's BIX (Byte Information eXchange) teleconferencing system
(CoSy).
Some of us processor bigots in the CPU (or CPUs) conference had an
informal my-CPU-is-better-than-yours speed competition: multiplying
two 72-bit unsigned binary integers. The 9-byte size was picked, in
theory, so as not to give an arbitrary advantage to CPUs of any
particular word size.
My entry for the Apple /// (8-bit CPU, Mostek 6502B) used only the
lowest-level bit-test/add/shift/carry instructions.
The code first computed the 7 10-byte results of multiplying one of
the operands (the one on the left {smile}) by 2, 4, 8, 16, 32, 64 and
128. Together with the operand itself (multiplying by 1 was already
given), that made 8 precalculated values, one per bit position in a
byte; multiplying by 0 was obvious and needed no entry. The 7 new
values took nothing more than simple shifts, carries and adds. Once
that was done, the other operand was processed bit by bit, right to
left, and the answer was adjusted/accumulated accordingly, just as a
schoolchild would do in decimal on paper.
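To make the data flow concrete, here is a minimal Python sketch under a few stated assumptions: numbers are little-endian byte lists (as they might sit in zero page), Python stands in for 6502 assembler, and the names `multiply72`, `shift_left_one` and `add_at` are hypothetical. It is not the original code and says nothing about cycle counts; it only shows the table of shifted copies followed by byte-aligned adds.

```python
# Hypothetical model of an 8-bit-style 72 x 72 -> 144-bit multiply.
# Only one-bit shifts and byte adds with carry are used, as on a CPU without MUL.

OPERAND_BYTES = 9    # 72-bit operands
TABLE_BYTES = 10     # a * 2**7 < 2**79, so every table entry fits in 10 bytes
PRODUCT_BYTES = 18   # 72 + 72 = 144 bits


def shift_left_one(src):
    """Return src * 2, passing each byte's top bit along as a carry (ASL, then ROLs)."""
    out, carry = [], 0
    for byte in src:
        value = (byte << 1) | carry
        out.append(value & 0xFF)
        carry = value >> 8
    return out


def add_at(acc, addend, offset):
    """acc += addend shifted up by `offset` whole bytes, one byte at a time (like ADC)."""
    carry = 0
    for i, byte in enumerate(addend):
        total = acc[offset + i] + byte + carry
        acc[offset + i] = total & 0xFF
        carry = total >> 8
    i = offset + len(addend)
    while carry:  # ripple any leftover carry into higher bytes
        total = acc[i] + carry
        acc[i] = total & 0xFF
        carry = total >> 8
        i += 1


def multiply72(a, b):
    """Multiply two 9-byte little-endian operands; return an 18-byte product."""
    assert len(a) == OPERAND_BYTES and len(b) == OPERAND_BYTES
    # Table of a*1, a*2, ..., a*128 (a*0 needs no entry).
    table = [list(a) + [0] * (TABLE_BYTES - OPERAND_BYTES)]
    for _ in range(7):
        table.append(shift_left_one(table[-1]))

    acc = [0] * PRODUCT_BYTES
    # Scan the other operand right to left. Adding table[bit] at byte offset
    # byte_index contributes a * 2**(8*byte_index + bit), so every add is byte aligned.
    for byte_index, byte in enumerate(b):
        for bit in range(8):
            if byte & (1 << bit):
                add_at(acc, table[bit], byte_index)
    return acc


x = 2**72 - 1
product = multiply72(x.to_bytes(9, "little"), x.to_bytes(9, "little"))
print(int.from_bytes(bytes(product), "little") == x * x)  # True
```

Keeping eight pre-shifted copies means the accumulator itself never has to be shifted: each set bit costs exactly one multi-byte add at a byte offset.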
Choosing which operand (OK, multiplier vs. multiplicand) to scan bit
by bit, based on which one had fewer 1 bits, was sometimes useful.
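A minimal sketch of that choice, again in Python with hypothetical names (`popcount`, `order_operands`) and little-endian byte sequences assumed: every 1 bit in the scanned operand costs one multi-byte add, so scanning the operand with fewer 1 bits saves adds. Counting the bits is not free either, which may be why it paid off only sometimes.

```python
def popcount(operand):
    """Count the 1 bits in a byte sequence."""
    return sum(bin(byte).count("1") for byte in operand)


def order_operands(x, y):
    """Return (table_operand, scanned_operand), scanning the one with fewer 1 bits.

    The table operand is the one whose shifted copies get precomputed; the
    scanned operand's set bits each trigger one multi-byte add.
    """
    if popcount(y) <= popcount(x):
        return x, y
    return y, x


all_ones = bytes([0xFF] * 9)       # 72 one bits: 72 adds if scanned
one = bytes([0x01] + [0x00] * 8)   # a single one bit: 1 add if scanned
table_op, scanned = order_operands(one, all_ones)
print(popcount(scanned))  # 1
```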
This simple-minded, low-level approach actually outperformed entries
for other CPUs, including the 8086, 8088, Z80 and 8080, lovingly
handcrafted in assembler making use of, ahem, higher-level instructions.
The code used maybe a half dozen rather simple macros. Of course,
the fact that the 6502B's "zero page" (256 bytes) had ample room to do
all the work was a big boost. In effect, zero-page accesses are
shorter and faster (they need only a one-byte address), so you could
treat the zero page as 256 registers.
Cheers, - Jim
Jim Keohane, Multi-Platforms, Inc.