Maker Pro
Maker Pro

Protecting 3x16 bits with 16 bits ?

S

Skybuck Flying

Jan 1, 1970
0
Hello,

Soon I will attempt some gpgpu development/shader development... I have
already done some tests and so forth the gpu and it's memory seem to be
working just fine... (and fast ;) :)) No bit errors so far.

However the thought of bit errors creeping into it is indeed a bit scary...
The hardware is older from 2006: nvidia 7900 gtx with 512 MB of RAM. (OpenGL
will be used for it's development and cg shaders).

It seems this gpu works with 4 floating point fields per register. Either 16
bit or 32 bit. Maybe it's always 32 bit ? I am not sure... probably not
because 16 bit performance is twice as high.

I plan on using: 3x16 bits so that needs a "vector" register with 4x16 bits.

This means only 48 bits are used for example the .x, .y, .z and this leaves
the .w to be used for something else.

For me it could be interesting to make two modes of operations for the final
product:

First mode which uses 3x16 bits in registers and memory.
Second mode which uses 4x16 bits in registers and memory.

This would make the second mode slightly less memory efficient.

I have a feeling that the gpu is very fast and has plenty of processing
power/instruction power.

So I might get away with adding some extra "data integrity checks" to the
..x, .y, .z components and store them in .w component.

So the idea is basically to apply a 16 bit "data integrity check" to 48 bits
of data. Using Shader Model 3.0 instructions/CG language for now...

I wonder what is a good data integrity checking algorithm to detect bit
errors in this situation ?

The data integrity algorithm should not use to many branches... hopefully
just one branch to compare results ?! ;)

It shouldn't use too many memory look ups... that would be bad for
performance ?! ;)...

The 3x16 bits could be unpacked into 6x8 bit quantities which are then
stored in 32 bit floating point registers to do further calculations on...

If the algorithm is in floating point format and limits itself to 8 bit to
16 bit precision than that should work just fine... algorithms can be done
with integers... I can convert it to 16 bit floating points ;) I could even
convert it to 64 bit floating point but that would require 64 bit software
math and would slow things done so better not to do that...

The following operations are definetly available for such an algorithm:
32 bit floating point addition, can be used as 8 bit or 16 bit integer
addition, (max 24 bit)
32 bit floating point subtraction, can be used as 8 bit or 16 bit integer
subtraction, (max 24 bit)
32 bit floating point multiplication, can be used as 8 bit or 16 bit integer
multiplication, (max 24 bit)
32 bit floating point division, can be used as 8 bit or 16 bit integer
division, (max 24 bit)

and ofcourse special graphics instructions:
dot operations
interpolation operations. (But I have never used them and don't quite
understand on the gpu at least but I am willing to learn ;) :))

Some idea's in my head for now:

1. A simply weak "checksum" where everything is summed together... seems
like a very bad algorithm, since bit flips might go undetected.
2. A crc32 ? But the algorithm I have requires a large table and thus memory
lookups... doesn't seem to smart... and crc32 is like a large division ?
maybe overkill for just 48 bits of data ?
3. I can vaguely remember something about parity ? Is that the same as a
checksum or different ? I think that's different... parity counts the bits
set and then stores that ? Doesn't seem so strong ?
4. I can vaguely remember an error correcting code which could correct 1 bit
error ? by using two parities or so ? one vertical, one horizontal ?

So I ask you software programmers/developers and hardware designers and
algorithm designers :) out there the following question:

What kind of error detection algorithms, or maybe even error correction
algorithms are out there that you think would be suited for this special
situation ?

Also maybe you can design something specially for this situation ?

(The algorithm could later also be applied to slightly smaller bits like
only 16 bits, or 32 bits, or maybe even just 8 bits, but stored in 16
bits... that would be nice.)

Bye,
Skybuck.
 
S

Skybuck Flying

Jan 1, 1970
0
Here is a interesting thread about error detection on gpu's:

http://gpgpu.org/forums/viewtopic.php?p=18648&sid=c7bf701c2deed980c0a8745f15a630b8

The first guy says: "run twice see if same results..." I think this is
dangerous and wastefull, if a bit is truely damaged, the same bit error
might simply occur twice. It's also wasting resources big time ;) :)

Another guy says: people do weird things to their systems: like overclock,
or not cool properly.

Another guy says: memory chips might become warmer over time and might start
producing bit errors.

So I am thinking: It's now winter.. the computer is cool... thus no bit
errors... but what happens in the summer when it's fricking hot ? Maybe bit
errors will creep in... I will probably not be running my pc intensively
during the summer, but my "future" products users might be... it's good to
protect them from possible bit errors me thinks ;) :) So I kinda like this
idea of adding some bit error detection capabilities... just in case ! ;)

Then user can decide if he wants it or not by chosing the mode... ;) :) So I
hope by protecting the bits like that it will detect most if not all
problems with gpu/memory corruption ?

Bye,
Skybuck.
 
S

Skybuck Flying

Jan 1, 1970
0
Also I just realized something... CRC32 is too big... it requires 32 bits.

There are only 16 bits available for storing an integrity code...

Also it seems crc32 does 1 memory lookup per byte... so there are 3x2 bytes
is 6 bytes, which would mean 6 memory lookups... which is a bit much for my
taste... it might be acceptable anyway for a first version... but I would
rather have something better... something that doesn't require a memory
lookup and which fits in 16 bits... that would be nice ! ;) :)

Bye,
Skybuck.
 
S

Skybuck Flying

Jan 1, 1970
0
This one is kinda interesting, raptor codes:

http://tools.ietf.org/html/rfc5053

It seems to be something new... it seems to using some floating points and
matrices... normally I don't like matrices but in this case I might make an
exception since the gpu has support for it ;)

But it seems to also use very large tables ?! ;)

It might be overkill for this case... but it might be interesting for other
applications in the future...

I wonder if it's patented ? ;) I would guess so ;) but in Europe that
probably doesn't apply anyway ! ;) :)

Bye,
Skybuck ! ;) :)
 
S

Skybuck Flying

Jan 1, 1970
0
For me it could be interesting to make two modes of operations for the
final product:

First mode which uses 3x16 bits in registers and memory.
Second mode which uses 4x16 bits in registers and memory.

I want to expand on these modes.

There would be three modes:

1. "Casual/Normal mode" without error checking
2. "Paranoid/Strong Error Checking mode" with crc16 error checking.
3. "Cry like a baby mode/Error correcting mode" with ??? error correction.

Mode 1 would give maximum speed.
Mode 2 would be paranoid mode to detect if hardware fails.
Mode 3 would allow people with damaged hardware to still do some decent
calculations.

Also mode 3 would allow the program/product to simply continue as good as
possible, while mode 2 would break off which is kinda harsh but ok that's
what mode 3 is for if you want to continue as good as possible ;) :)

The remaining question is therefore:

What error correcting code to use for mode 3 ?

Bye,
Skybuck.
 
S

Skybuck Flying

Jan 1, 1970
0
Dave -Turner said:
so use crc16

I will try...

However I still need an error correcting code for mode 3 (See new
posting)...

Any idea's ?

I have 0% experience with error correcting codes I am afraid ! ;) :)

Bye,
Skybuck.
 
S

Skybuck Flying

Jan 1, 1970
0
The document I linked to in another sub thread had something
extra/interesting:

It suggested to use a number of extra bits to do a parity check on the
memory address ?! To check/test if it came indeed from the correct memory
cell.

Because if the bits on the address wires would screw up it would fetch the
wrong memory address, but the results of it, the memory/data, would still
pass the tests with flying colors ;) :)

So extra bits would be needed to protect against that. I haven't completely
read the document yet... so I am a bit fuzzy how to do that... it probably
does a parity bit on the address bits... and then simply stores that parity
bit with it... then when it's fetched the cpu could check that... with the
address it has in it's cpu registers.

For me I have 16 bits available... It would need 6 to 7 bits or so... that
means plenty of bits are left to do something else... I was thinking...
maybe do some extra hamming code on maybe the reversed bits or so ?

Or maybe even better...: use those extra bits to do a full hamming code on
the address bits and store that in it as well... it's possible me thinks...
very interesting idea, since this would protect against memory corruption
and address lines corruption.

I wonder... is there any other corruption ?

The extra memory checks are just sanity/paranoya checking... it's very
important that the results can be trusted :) One document mentioned 3 out of
300 gpu's had problems, but these were "beta" products.

Also this is the first time I will be using gpu's and it's memory so I want
to be sure that nothing weird is going on... now or in the future... it's
also a "feel good thing"... "user doesn't have to worry about it"... and
users with bad hardware are protected as well and might be able to get some
results even with bad hardware ;)

And finally it's always interesting to learn new programming techniques ;)
:) and make products more reliable if wanted :) I can add it too my toolbelt
of skills ! ;) :)

Bye,
Skybuck.
 
S

Skybuck Flying

Jan 1, 1970
0
I investigated hamming a bit and wondered what would happen if I left out 1
bit from all data bits in a parity calculation, all data bits would receive
a parity bit... from that "experiment" it was pretty clear that this hamming
idea can only correct single bit errors ?!? Well maybe not because the
wikipedia link says it could correct more but doesn't explain how...

Anyway I think it might be necessary to "parity check" the parity bits as
well in a way... which reminds me of LDPC codes...

It seems LDPC code calculations happen on module 2 and matrices which might
just be up the gpu's alley ?! ;)

Here some links:

Link to hamming code on wikipedia:

http://en.wikipedia.org/wiki/Hamming_code

Link to LDPC code on wikipedia:

http://en.wikipedia.org/wiki/Low-density_parity-check_code

I am also looking at some LDPC document which came with some LPDC c software
which is on the wikipedia links section as well... seeing if it's any good
;) :) so far seems interesting ;)

Though I am not sure if LDPC's would apply to my case because they do seem
to require as many bits as data bits to be effective ?

Bye,
Skybuck.
 
S

Skybuck Flying

Jan 1, 1970
0
Coding the hamming code thing... I think I can manage that ;) It might take
some time but that's ok :)

LDPC codes is a different matter though ;)

For now it seems LDPC has large overhead ?! so that's kinda shitty... but do
correct me if I am wrong ;) :)

If the overhead ain't that bad I might be willing to help some "scientist"
with coding up something "good" in delphi ;) :) You do the thinking, I do
the programming =D

When it comes to saving time maybe some people could code for me but I doubt
there are any good "Delphi" codes out there to work for absolutely "budget"
hours ;) :)

And I am very demanding when it comes to Delphi code quality ! ;) :) It most
be top notch ! ;) :) =D

And that's exactly what's gonna happen:

1. First generic Delphi code.
2. Then maybe optimized Delphi code for this specific case.
3. Then gpu code.
4. Then maybe further optimized gpu code if possible ;) :)

However shader model 3.0 /cg doesn't seem to good at bit fiddling... so that
might get a bit tricky ;) :)

Also error correction is something I wanted to do for some time now ;) :)

Thanks to better documentation explaining the idea's/concepts I should be
able to program it now and ofcourse thanks to my increased programming skill
level ;) :)

Bye,
Skybuck ;) :)
 
S

Skybuck Flying

Jan 1, 1970
0
However suppose this goes through, then there is a weird funny situation:

The GPU and it's code/memory is actually better protected then the rest of
the system ?! ;) :)

Though most of it should happen in the gpu anyway... but it's kinda odd...
as soon as it comes out it could still be affected by the cpu/main memory...
but that's an issue for the user he will have to deal with himself I
guess... or maybe I could do some extra checking on the outputs as well and
include them on the cpu side as well... it wouldn't be that much of a big
deal...

I think that's actually a cool idea too... since I did have one bad memory
chip not so long ago =D

So it should be total protection for cpu and gpu ! ;) :)

Bye,
Skybuck ;) :)
 
I

Ian Bell

Jan 1, 1970
0
mpm said:
Trust me (and everyone else here).
If Skybuck were to ever pay for a specialist, that person should have
nothing to do with electronics!!


The first thing you need to do is decide:

1. What type of errors are likely to occur - single or multiple bit
errors in a single packet of 4x16 bits or bursts in a stream of packets.

2. Whether to reformat groups of packets so as to spread bursts of
errors amongst a number of packets.

3. How many bits in error do you want to be able to detect and how many
of these do you want to be able to correct

Only then can you begin to determine a suitable coding scheme.

Cheers

Ian
 
S

Skybuck Flying

Jan 1, 1970
0
Ian Bell said:
The first thing you need to do is decide:

1. What type of errors are likely to occur - single or multiple bit errors
in a single packet of 4x16 bits or bursts in a stream of packets.

2. Whether to reformat groups of packets so as to spread bursts of errors
amongst a number of packets.

3. How many bits in error do you want to be able to detect and how many of
these do you want to be able to correct

Only then can you begin to determine a suitable coding scheme.

Not really lol ! ;) :)

One first has to detect the errors ! LOL.

Bye,
Skybuck ;) :)

P.S.: Sorry to burst your bubble ! ;) :)
 
I

Ian Bell

Jan 1, 1970
0
Skybuck said:
Not really lol ! ;) :)

One first has to detect the errors ! LOL.

Bye,
Skybuck ;) :)

P.S.: Sorry to burst your bubble ! ;) :)


YAWN
 
S

Skybuck Flying

Jan 1, 1970
0
Another interesting idea could be to do the following:

Spent 8 bits on a crc8 for strong error detection.
Spent 6 bits on a hamming code for 1-bit error correction.
Spent 1 bit on hamming code extension for 2 bit error detection.

Total of 15 bits used... one spare bit (?).

This would probably create a very strong mode which detects lot's of errors,
and has error correcting capabilities !?! ;)

And uses the bits most efficiently...

A CRC16 might be overkill for only 48 bits ?

Maybe CRC8 is good enough as well ? ;)

I haven't looked yet at CRC8 but gonna do that now ! ;)

Bye,
Skybuck.
 
S

Skybuck Flying

Jan 1, 1970
0
Hello,

I just had a better idea, since I forgot about the address check...

Since a hamming code has a certain range I might as well use the full range
to "ham" more bits ! ;) :)

So there are 48 "gpu data bits" and the maximum addressing range is probably
"16 bits" So that creates a total of:

48 true data bits + 16 artificial data bits, for a total of 64 bits to be
"hammed" :)

Including the extension bit for "double error detection" this leads to a
perfect 8 bit code for the hamming code !

And thus the full 16 bits of additional space in the .w component can be
fully used for maximum error checking:

8 bits for CRC8 + 8 bits for hamming code.

The CRC8 could be done on full 64 bits, and/or maybe even the 8 bit hamming
code for cross reference.
The HAM8 could be done on full 64 bits, and/or maybe even the crc8.

However this ofcourse can't be because this creates a little paradox... One
or the other has to be done first ! ;) :)

Maybe it's better to do the ham8 last, so it could correct the crc8 ? and
then figure out if there was an error ?

Or maybe the ham8 should only be done on the data bits... since correcting a
crc8 doesn't seem that usefull ? actually it does some usefull for stronger
error detection not so much error correction ;) :)

Actually with ham8 it's possible to "ham" 120 bits or so ! So even more
"artificial data bits" could be hammed ! :)

However does that make sense... this "artifical address bits" ?

How would that work... the cpu "pretends the data is there" and calculates
the ham8's... and crc8's and then simply compares those to stored in
memory...

So I guess the actually address doesn't need to be stored in memory... to me
it does seem to make sense ! ;) :)

Well mabe I shouldn't over do it and add more bits to be hammed to the
already 64 bits and maybe even 8 bits of crc32... so that would leave:
roughly 120-72 = 48 bits or so for even more artificial data...

Leaving some room in there could be good if the address range ever needs to
be expanded... or something else needs to be in there... so for now I will
let it be ! ;) :)

Now me go investigate crc8 ! ;) :)

Bye,
Skybuck.
 
S

Skybuck Flying

Jan 1, 1970
0
Not really, Today I programmed a first try of the hamming code, and it's
working, overall bit still needs to be added, maybe tomorrow hamming code
will be done.

I also found many many many crc8 routines.

I am not sure what code/polynomial to use, but I think this can be tested to
see which ones work best.

So ultimately thanks to other people, my time is way shortened.

Also crc's are more of a black art and so is probably ldpc's there is no
really good answer, so that will drive many people nuts ;) :)

I thank people spending use ammounts of time analyzing polynomials so I
don't have too ! ;) :)

Ultimately I might have to roll my own crc too so I understand it enough to
implement it in floating points... though I think I already found some code
with floating points so I might be done real fast if it's any good ;) :)

And for LDPC's I still don't understand those... too wacky for me... do you
understand those ? ;)

Bye,
Skybuck =D
 
Top