Fixed point Vs Floating point

skilambi@gmail.com · Jun 15, 2008

Hi All,

I am trying to understand how fixed point and floating point are
implemented in hardware. This is what I understand so far

1) Fixed point means the decimal point is fixed. Floating point has
some bits set for mantissa and some for the exponent ie the decimal
point "floats".
2) Floating point has more dynamic range for the given number of bits
as compared to fixed.
3) Floating point is harder to implement in hardware than fixed point
but is closer to representing real world values.
4) Fixed point is better when ur application has power consumption
requirements but doesn care as much about the precision.

All of this is good. But i am trying to visualize fixed and floating
in hardware. What i am trying to ask is, in fixed point is there a
seprate datapath for fraction and for the the integer part. I wouldnt
think so because at the end of the day its just bits and its upto us
how we interpret them. I am sure fixed point and floating point
hardware are architecturally different.

So one good question to ask is...how is a fixed point multiplier
implemented and how is a floating point multiplier implemented. Can
someone give a real world processor example of each. It would be much
appreciated.

Frank Buss · Jun 15, 2008

So one good question to ask is...how is a fixed point multiplier
implemented and how is a floating point multiplier implemented. Can
someone give a real world processor example of each. It would be much
appreciated.

I don't know how it is implemented in hardware, but I know how you can
implement it in software. Below is an implementation I've written some time
ago for MIDP 1.0 (Java on mobile phones, because MIDP 1.0 doesn't have
float). The algorithms are described by Knuth in The Art of Computer
Programming.

I guess it can be implemented in hardware like this, too. There are some
tricks with fast hardware multipliers, try Google, which can be used for
fixed point, too.

public static int mul(int u, int v) {
int fu = u & 0x7FFFFF;
int eu = ((u >> 23) & 0xff) - 127;

int fv = v & 0x7FFFFF;
int ev = ((v >> 23) & 0xff) - 127;

if (u < 0)
fu = -fu;
if (v < 0)
fv = -fv;

return normalize(((long) fu) * ((long) fv), eu + ev);
}

Frank Buss · Jun 15, 2008

Tim said:
Where floating point _really_ starts to slow down is for subtraction,
and for exceptions.

I don't see why subtraction is slow, e.g. in my Java implementation (and in
Knuth's description) it is implemented with add, which works with shifts
and integer add and sub, only:

public static int add(int u, int v) {
int fu = u & 0x7FFFFF;
int eu = ((u >> 23) & 0xff) - 127;

int fv = v & 0x7FFFFF;
int ev = ((v >> 23) & 0xff) - 127;

if (fv == 0)
return u;
if (fu == 0)
return v;

if (u < 0)
fu = -fu;
if (v < 0)
fv = -fv;

if (eu < ev) {
if (ev - eu > 24)
return v;
fu >>= ev - eu;
return normalize(fu + fv, ev);
} else {
if (eu - ev > 24)
return u;
fv >>= eu - ev;
return normalize(fu + fv, eu);
}
}

public static int sub(int u, int v) {
return add(u, v ^ 0x80000000);
}

Just for completness, the normalize method, including the TODO

private static int pack(int f, int e) {
boolean s = f < 0;
if (s)
f = -f;
int z = f | ((e + 127) << 23);
if (s)
z |= 0x80000000;
return z;
}

private static int normalize(long f, int e) {
if (f == 0)
return pack(0, 0);
boolean s = f < 0;
if (s)
f = -f;
int c = 0; // TODO: check break
while (f < (0x400000)) { // 1 << 22
f *= 2;
e--;
if (c++ > 64)
break;
}
c = 0; // TODO: check break
while (f > 0x7fffff) {
f /= 2;
e++;
if (c++ > 64)
break;
}
// TODO: rounding
if (s)
f = -f;
return pack((int) f, e);
}

skilambi@gmail.com · Jun 15, 2008

Hi tim,
Thanks for the quick response. I totally understand what you have
explained. But lets take the same example you have given. Let us
assume we have a micrprocessor that does only 4 bit multiplies. Lets
say the numers are now, 11.01 and 01.01. For the hardware, the binary
point is not really a concern, so really its doing 1101 * 0101. Now
the answer is 0100.0001. This is where I lose my wheels. What comes
out of the microprosessor (assuming we have enough data word bits to
obtain the result) is 11010101. How do i know where the decimal point
is. That number can be interpreted in a myriad number of ways.

Can it be that we first scale the numbers to a known place for the
binary point, so in this case .1101 and .0101 and now WE KNOW that the
answer is .01000001?
Is this how the hardware design is done?

Regards,
Sai.

Jonathan Kirwan · Jun 15, 2008

Thanks for the quick response. I totally understand what you have
explained. But lets take the same example you have given. Let us
assume we have a micrprocessor that does only 4 bit multiplies. Lets
say the numers are now, 11.01 and 01.01. For the hardware, the binary
point is not really a concern, so really its doing 1101 * 0101. Now
the answer is 0100.0001. This is where I lose my wheels. What comes
out of the microprosessor (assuming we have enough data word bits to
obtain the result) is 11010101.

Where did you get that last number? By pasting 1101 to the left side
of 0101?

How do i know where the decimal point
is.

1101
x 0101
-------
1101
1101
-------
1000001

Since you happen to know, a priori, that there are two bits to the
right of the radix point on the multiplier and multiplicand, you then
also know, a priori, that there are four (sum of 2 and 2) bits to the
right of the new number's radix point.

So, start out assuming the radix point at the far right as though they
truly were simple integers:
1000001.
and then move it leftwards four points:
100.0001

This much is grade school math, by the way. 6th or 7th, if I recall.

That number can be interpreted in a myriad number of ways.

Not so.

Can it be that we first scale the numbers to a known place for the
binary point, so in this case .1101 and .0101 and now WE KNOW that the
answer is .01000001?

Is this how the hardware design is done?

The only complexity at all for fixed-point multiplication is handling
signed values. And even that isn't complicated. You might look up
Booth's algorithm, memory serving. There should discussions covering
both unsigned and signed situations. It's not complex.

Also read:
http://www.eed.usv.ro/~andy_t/pns/documentatii/des_sol/gp_sol/16_bit/Apps_1/Chap_2.pdf

Which may help you in understanding some ways of fixed point notation.

Floating point can be complicated if you are looking for a purely
combinatorial approach. The only folks I know of, whom I think did it
for floating point division, was Bipolar Integrated Technologies in
Beaverton, Oregon. But I could be wrong about it.

Jon

Frank Buss · Jun 15, 2008

Lets
say the numers are now, 11.01 and 01.01. For the hardware, the binary
point is not really a concern, so really its doing 1101 * 0101. Now
the answer is 0100.0001. This is where I lose my wheels. What comes
out of the microprosessor (assuming we have enough data word bits to
obtain the result) is 11010101. How do i know where the decimal point
is. That number can be interpreted in a myriad number of ways.

This is basic mathematic. Assuming the same number of fraction bits n it
looks like this:

a*2^n * b*2^n = a*b*2^(2*n) = (a*b*2^n)/2^n

So you have to shift the result by n bits right after the multiplication.

If you have only n bits, but want to multiply bigger numbers, it's again
basic mathematic. Assuming b and d are the lower bits and a and c are the
higher bits, you can calculate it like this (where *2^n means simply
shifting)

(a*2^n+b)*(c*2^n+d)=2^(2*n)*a*c+2^n*b*c+2^n*a*d+b*d

PS: Please quote this newsgroup postings when answering your homework

krw · Jun 15, 2008

Hi All,

I am trying to understand how fixed point and floating point are
implemented in hardware. This is what I understand so far

1) Fixed point means the decimal point is fixed. Floating point has
some bits set for mantissa and some for the exponent ie the decimal
point "floats".

The "decimal" point isn't necessarily fixed in "fixed point"
arithmetic. There is no exponent though.

2) Floating point has more dynamic range for the given number of bits
as compared to fixed.

Because of the exponent.

3) Floating point is harder to implement in hardware than fixed point
but is closer to representing real world values.

Neither are "hard", given an unlimited transistor budget. ;-) In
fact the fixed point unit I worked on was every bit as complicated
as the floating point unit.

4) Fixed point is better when ur application has power consumption
requirements but doesn care as much about the precision.

Not at all clear.

All of this is good. But i am trying to visualize fixed and floating
in hardware. What i am trying to ask is, in fixed point is there a
seprate datapath for fraction and for the the integer part. I wouldnt
think so because at the end of the day its just bits and its upto us
how we interpret them. I am sure fixed point and floating point
hardware are architecturally different.

No, there aren't two paths. As you note, bits are bits.

So one good question to ask is...how is a fixed point multiplier
implemented and how is a floating point multiplier implemented. Can
someone give a real world processor example of each. It would be much
appreciated.

A floating point unit must treat the mantissa and exponent
differently. A multiply would add the exponent and multiply the
mantissa. What sort of "example" do you want?

Frank Buss · Jun 16, 2008

Tim said:
It's the normalizations that slow it down, so any time you end up
subtracting the mantissas (i.e. any like-signed subtraction, or
differently-signed add), then the mantissa has to do the rumba (shift
left, cha cha cha) while the exponent counts the beat.

There are fast algorithms to calculate the log2:

http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious

Of course, in hardware it is easy to implement it in parallel:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity test is
port(
log2: out integer range 0 to 3;
argument: in unsigned(7 downto 0)
);
end entity test;

architecture rtl of test is

begin

log2 <= 8 when argument(7) = '1' else
7 when argument(6) = '1' else
6 when argument(5) = '1' else
5 when argument(4) = '1' else
4 when argument(3) = '1' else
3 when argument(2) = '1' else
2 when argument(1) = '1' else
1 when argument(0) = '1' else
0;

end architecture rtl;

Altera Quartus can implement this with 5 LEs and a propagation delay of 2
LEs for a Cyclone 3. The same for 32 bit needs 46 LEs and a propagation
delay of 7, but I'm sure with some more special gates the propagation delay
can be reduced in ASICs, e.g. with OR gates with up to 32 inputs.

MooseFET · Jun 16, 2008

(HDL code snipped)

I think your fast is my slow, and your small is my big.

On a fixed point machine that doesn't have hardware acceleration for
normalization, 32-bit IEEE floating point hovers around 50x slower than
integer math -- and that's a machine that does multi-cycle integer
multiplies.

If you do a very wasteful base 256 exponent, you can get the speed up
quite a bit. It may seem awful but when you have the extra RAM, it is
a lot better than doing IEEE floats.

john@jjdesigns.fsnet.co.uk · Jun 17, 2008

On Tue, 17 Jun 2008 07:47:17 -0400, Phil Hobbs [clip]

I was just playing with the minimal amount of assembly code that will
make a nice 4096-point sine table. Looks like maybe a dozen lines or
so, but the rounding errors are annnoying. Maybe I'll just stick with
the 2048-word lookup table, which is accurate and fast.

John

I wanted a small 256 sine lookup for an AVR. Being lazy I pulled one
from the web. Didn't look right on the scope or spectrum analyser, so
pulled another which for some reason had slightly different values.
That wasn't right either. Curious I then pulled another 3, all
different all crap!. The annoyance was that they all came with their
own righteous mathematical analyses.
Eventually had to do it myself. Being lazy is sometimes hard work.

Vladimir Vassilevsky · Jun 17, 2008

John Larkin wrote:

For small tables with short word lengths, there must be some subtle
arrangement of rounding that nets the purest spectrum. Hurts me head
to think about that one.

I suppose you could Fourier it and find the amplitude/phase of all the
harmonics, one by one, and try to poke the inverse correction into the
table at the sub-LSB level, sort of buried in the rounding noise.
Hurts me head.

I needed fast and accurate to 16 bits approximations of Sin/Cos. Below
is what I ended up with. The LUT was optimized for the minimax criteria.

//-----------------------
// Sin/Cos approximation (max_error ~ 3.7e-5, avg_error ~ 1e-5)
//

static const u16 sin16_table[257] =
{
0, 402, 804, 1206, 1608, 2010, 2412, 2814,
3216, 3617, 4019, 4420, 4821, 5222, 5623, 6023,
6424, 6824, 7223, 7623, 8022, 8421, 8820, 9218,
9616, 10014, 10411, 10808, 11204, 11600, 11996, 12391,
12785, 13179, 13573, 13966, 14359, 14751, 15142, 15533,
15924, 16313, 16703, 17091, 17479, 17866, 18253, 18639,
19024, 19408, 19792, 20175, 20557, 20939, 21319, 21699,
22078, 22456, 22834, 23210, 23586, 23960, 24334, 24707,
25079, 25450, 25820, 26189, 26557, 26925, 27291, 27656,
28020, 28383, 28745, 29106, 29465, 29824, 30181, 30538,
30893, 31247, 31600, 31952, 32302, 32651, 32999, 33346,
33692, 34036, 34379, 34721, 35061, 35400, 35738, 36074,
36409, 36743, 37075, 37406, 37736, 38064, 38390, 38715,
39039, 39361, 39682, 40001, 40319, 40635, 40950, 41263,
41575, 41885, 42194, 42500, 42806, 43109, 43411, 43712,
44011, 44308, 44603, 44897, 45189, 45479, 45768, 46055,
46340, 46624, 46905, 47185, 47464, 47740, 48014, 48287,
48558, 48827, 49095, 49360, 49624, 49885, 50145, 50403,
50659, 50913, 51166, 51416, 51664, 51911, 52155, 52398,
52638, 52877, 53113, 53348, 53580, 53811, 54039, 54266,
54490, 54713, 54933, 55151, 55367, 55582, 55794, 56003,
56211, 56417, 56620, 56822, 57021, 57218, 57413, 57606,
57797, 57985, 58171, 58356, 58537, 58717, 58895, 59070,
59243, 59414, 59582, 59749, 59913, 60075, 60234, 60391,
60546, 60699, 60850, 60998, 61144, 61287, 61429, 61567,
61704, 61838, 61970, 62100, 62227, 62352, 62475, 62595,
62713, 62829, 62942, 63053, 63161, 63267, 63371, 63472,
63571, 63668, 63762, 63853, 63943, 64030, 64114, 64196,
64276, 64353, 64428, 64500, 64570, 64638, 64703, 64765,
64826, 64883, 64939, 64992, 65042, 65090, 65136, 65179,
65219, 65258, 65293, 65327, 65357, 65386, 65412, 65435,
65456, 65475, 65491, 65504, 65515, 65524, 65530, 65534,
65535
};

s16 Sin_16(u32 phase)
{
u32 tmp_phase;

if(phase & 0x40000000) tmp_phase = 0x80000000 - (phase&0x7FFFFFFF);
else tmp_phase = phase&0x7FFFFFFF;

u16 phase_hi = tmp_phase >> 22;
u32 phase_lo = tmp_phase & 0x3FFFFF;

s32 s0 = sin16_table[phase_hi];
s32 s1 = sin16_table[phase_hi+1];

s0 += ((s1 - s0)*phase_lo)>>22;

if(phase&0x80000000) s0 = -s0;

return (s16)(s0>>1);
}

s16 Cos_16(u32 phase)
{
return Sin_16(phase + 0x40000000);
}
//-----------------------------------

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Spehro Pefhany · Jun 17, 2008

Spehro said:
Spehro said:

It doesn't take much numerical ill-conditioning before you see quite
visible and evil effects from IEEE single-precision floats even in
straightforward operations like relatively low-order poly evaluation.

Of course if you understand the problem you'll just compute the ratio
of max/min singular values or whatever.

Click to expand...

Sure, I do all my EM simulations in single precision because of the
memory, more than the speed (which nowadays is usually quite
comparable--transistors being free).

Polynomials are surprisingly bad if you evaluate them from the
coefficients of powers of X, because of the creeping linear dependence
between your basis functions. If you use chebyshev polynomials instead,
especially with Clenshaw's summation rule, it's much much better
conditioned and almost as fast as Horner's rule for powers of X. [You
already use Horner's rule, it's P(x) = a+x*(b+x*(c+x*(...)))].

Yup, anything other than Horner's would be insane.

Cheers,

Phil Hobbs

Best regards,
Spehro Pefhany

Tim Williams · Jun 17, 2008

Vladimir Vassilevsky said:
I needed fast and accurate to 16 bits approximations of Sin/Cos. Below is
what I ended up with. The LUT was optimized for the minimax criteria.

Not wanting to get into FPU yet, I wrote a sine/cosine table for my various
assembly programming explorations. 0 to 359 degrees (plus same range for
cosine), signed integer so needs a shift after the multiply to get nearly
correct values (SIN(angle) * 65534 is what, 20ppm shy of correct).

I generated it with QBasic, with some string processing to put in the spaces
and "word" and lowercase hex conventions that I prefer. It compiles in
MASM, probably among others.

Sine word 00h, 23ch, 478h, 6b3h, 8eeh, 0b28h, 0d61h, 0f99h, 11d0h
word 1406h, 163ah, 186ch, 1a9dh, 1ccbh, 1ef7h, 2121h, 2348h, 256ch
word 278eh, 29ach, 2bc7h, 2ddfh, 2ff3h, 3203h, 3410h, 3618h, 381ch
word 3a1ch, 3c17h, 3e0eh, 3fffh, 41ech, 43d4h, 45b6h, 4793h, 496ah
word 4b3ch, 4d08h, 4ecdh, 508dh, 5246h, 53f9h, 55a5h, 574bh, 58eah
word 5a82h, 5c13h, 5d9ch, 5f1fh, 609ah, 620dh, 6379h, 64ddh, 6639h
word 678dh, 68d9h, 6a1dh, 6b59h, 6c8ch, 6db7h, 6ed9h, 6ff3h, 7104h
word 720ch, 730bh, 7401h, 74eeh, 75d2h, 76adh, 777fh, 7847h, 7906h
word 79bbh, 7a67h, 7b0ah, 7ba2h, 7c32h, 7cb7h, 7d33h, 7da5h, 7e0dh
word 7e6ch, 7ec0h, 7f0bh, 7f4bh, 7f82h, 7fafh, 7fd2h, 7febh, 7ffah
Cosine word 7fffh, 7ffah, 7febh, 7fd2h, 7fafh, 7f82h, 7f4bh, 7f0bh, 7ec0h
word 7e6ch, 7e0dh, 7da5h, 7d33h, 7cb7h, 7c32h, 7ba2h, 7b0ah, 7a67h
word 79bbh, 7906h, 7847h, 777fh, 76adh, 75d2h, 74eeh, 7401h, 730bh
word 720ch, 7104h, 6ff3h, 6ed9h, 6db7h, 6c8ch, 6b59h, 6a1dh, 68d9h
(And so on . . .)

It makes a reasonable circle, which is good enough for me.

Generating tables in HLL is pretty typical business.

Tim

john@jjdesigns.fsnet.co.uk · Jun 17, 2008

On 17 Jun, 14:26, John Larkin
On Tue, 17 Jun 2008 07:47:17 -0400, Phil Hobbs
[clip]
I was just playing with the minimal amount of assembly code that will
make a nice 4096-point sine table. Looks like maybe a dozen lines or
so, but the rounding errors are annnoying. Maybe I'll just stick with
the 2048-word lookup table, which is accurate and fast.
John
I wanted a small 256 sine lookup for an AVR. Being lazy I pulled one
from the web. Didn't look right on the scope or spectrum analyser, so
pulled another which for some reason had slightly different values.
That wasn't right either. Curious I then pulled another 3, all
different all crap!. The annoyance was that they all came with their
own righteous mathematical analyses.
Eventually had to do it myself. Being lazy is sometimes hard work.

Click to expand...

Click to expand...

For small tables with short word lengths, there must be some subtle
arrangement of rounding that nets the purest spectrum. Hurts me head
to think about that one.

Click to expand...

I suppose you could Fourier it and find the amplitude/phase of all the
harmonics, one by one, and try to poke the inverse correction into the
table at the sub-LSB level, sort of buried in the rounding noise.
Hurts me head.

Click to expand...

John

Click to expand...

What does "small 256 sine lookup" mean? 256 points? How many bits
per point?

...Jim Thompson
--
| James E.Thompson, P.E. | mens |
| Analog Innovations, Inc. | et |
| Analog/Mixed-Signal ASIC's and Discrete Systems | manus |
| Phoenix, Arizona 85048 Skype: Contacts Only | |
| Voice480)460-2350 Fax: Available upon request | Brass Rat |
| E-mail Icon athttp://www.analog-innovations.com| 1962 |

America: Land of the Free, Because of the Brave

A list of 256 values, each value 8 bits. A 256 point approximation of
one complete sine cycle, rather than the accurate, 16 bit, 1/4 cycle
shown by Vladimir.

JosephKK · Jun 18, 2008

Hi tim,
Thanks for the quick response. I totally understand what you have
explained. But lets take the same example you have given. Let us
assume we have a micrprocessor that does only 4 bit multiplies. Lets
say the numers are now, 11.01 and 01.01. For the hardware, the binary
point is not really a concern, so really its doing 1101 * 0101. Now
the answer is 0100.0001. This is where I lose my wheels. What comes
out of the microprosessor (assuming we have enough data word bits to
obtain the result) is 11010101. How do i know where the decimal point
is. That number can be interpreted in a myriad number of ways.

Can it be that we first scale the numbers to a known place for the
binary point, so in this case .1101 and .0101 and now WE KNOW that the
answer is .01000001?
Is this how the hardware design is done?

Regards,
Sai.

Most of what is puzzling you is some lacking of some fundamentals, and
never having had to wade through both integer and floating point at
the gate level, like i did. As always, for both integer and floating
point, divide is the slowest. For multiply Wallace trees can provide
great acceleration, they just do not help all that much for divide.
Perhaps you should start with fundamental theory (gate and algorithm
level) of computer arithmetic, it is not all that difficult. Then
proceed to Booth's algorithm(s) and Wallace trees. Do integer first,
then learn barrel shifters. They are crucial to decent FP
add/subtract performance. At that point everything is just extensions
and combinations of what you already know.

JosephKK · Jun 18, 2008

Fair enough in that case--time and frequency is a teensy little bit more
precise than ADCs, it's true.

Still, though, you know in advance the step size you'll be using, which
helps a lot in designing the arithmetic.

IEEE 754 floating point is great in theory, but there's no hardware
support on any CPU I know of for some of its features--especially
denormalized numbers.(*)

Actually both the 8087 and its successors, and the 68881 and its
successors, as well as later MIPS, SPARCs, and HP-PARISC, and IBM
POWER processors all do correct and complete IEEE 754 floating point.
Most of cause interrupts on each denormal result.

The unimplemented features have to be emulated
by the runtime handling the EM_DENORMAL exception in software on
_every_ _single_ _operation_, which takes *forever*. On my big EM
simulator, I have to fill the entire volume with noise of a few times
FLT_MIN before starting to avoid it running like molasses for several
hundred iterations until all the denormals get flushed out.

Now that is an inefficient implementation issue.

Now
*that's* stupid and inefficient. (Some compilers support turning off
denormals (its' usually called "flush to zero", but gcc doesn't.)

IEEE 754 assumes that there'll be hardware acceleration, which there is
on higher-end CPUs--they've got all those transistors available, so they
might as well do something useful with them. POWER 6 CPUs dispatch 2
floating point and one integer operation per clock, iirc.

For the x86 series the FPU was brought onboard with the 486, and with
the 68020 in that series. I think it was incorporated in MIPS in the
4000 series and later, and the first SPARCs. Superscalar came later.

If the guy designing the instrument is also designing the arithmetic,
all is copacetic--though laborious. Otherwise, give me hardware FP any day.

So I think that maxim of yours goes in the 'curmudgeonly half-truth'
category, along with my favourite one of Rutherfords: "If your
experiment needs statistics, you ought to have done a better experiment."

Cheers,

Phil Hobbs

(*) Denormals are numbers whose exponents are more negative than can be
represented in the IEEE format--so the exponent stays pegged low and
digits shift off to the right as the number shrinks. It's a
theoretically cute way of reducing rounding error due to underflow when
summing terms of widely different sizes. In practice it's a big nuisance.

Correct description of denormals.

Richard Henry · Jun 18, 2008

Most of what is puzzling you is some lacking of some fundamentals, and
never having had to wade through both integer and floating point at
the gate level, like i did. As always, for both integer and floating
point, divide is the slowest. For multiply Wallace trees can provide
great acceleration, they just do not help all that much for divide.
Perhaps you should start with fundamental theory (gate and algorithm
level) of computer arithmetic, it is not all that difficult. Then
proceed to Booth's algorithm(s) and Wallace trees. Do integer first,
then learn barrel shifters. They are crucial to decent FP
add/subtract performance. At that point everything is just extensions
and combinations of what you already know.

In some cases, especially in floating point math, division can be
treated as a variant of multiplication and therefore will run at
multiplier speeds.

Spehro Pefhany · Jun 18, 2008

In some cases, especially in floating point math, division can be
treated as a variant of multiplication and therefore will run at
multiplier speeds.

With constants, the reciprocal can be used, but not for variables:

http://www-vlsi.stanford.edu/papers/dh_arith_97.pdf

Best regards,
Spehro Pefhany

MooseFET · Jun 18, 2008

For small tables with short word lengths, there must be some subtle
arrangement of rounding that nets the purest spectrum. Hurts me head
to think about that one.

I suppose you could Fourier it and find the amplitude/phase of all the
harmonics, one by one, and try to poke the inverse correction into the
table at the sub-LSB level, sort of buried in the rounding noise.
Hurts me head.

Write a bit of code:

Fill a table with all ones

loop:

save a copy of the table
randomly increment or decrement one element
best fit a sine to the new table
if the fit is better keep the new table
if NOT bored yet goto loop

Vladimir Vassilevsky · Jun 18, 2008

Spehro said:
With constants, the reciprocal can be used, but not for variables:
http://www-vlsi.stanford.edu/papers/dh_arith_97.pdf

The floating point division by a variable X is usually done as
multiplication by 1/X. The 1/X can be computed either sequentially by
Newton-Raphson iteration or as a parallel computation using series. Both
methods are quicker then the direct division. BTW, the mistake in the
series computation was the cause of the famous Pentium division bug.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Moore's Lobby Podcast

Menu

Categories

Platforms

Content

Connect With Us

Network

Fixed point Vs Floating point

Fixed point Vs Floating point

[email protected]

Frank Buss

Frank Buss

[email protected]

Jonathan Kirwan

Frank Buss

krw

Frank Buss

MooseFET

[email protected]

Vladimir Vassilevsky

Spehro Pefhany

Tim Williams

[email protected]

JosephKK

JosephKK

Richard Henry

Spehro Pefhany

MooseFET

Vladimir Vassilevsky

Similar threads