Connect with us

Protecting 3x16 bits with 16 bits ?

Discussion in 'Electronic Design' started by Skybuck Flying, Feb 11, 2010.

Scroll to continue with content
  1. Hello,

    Soon I will attempt some gpgpu development/shader development... I have
    already done some tests and so forth the gpu and it's memory seem to be
    working just fine... (and fast ;) :)) No bit errors so far.

    However the thought of bit errors creeping into it is indeed a bit scary...
    The hardware is older from 2006: nvidia 7900 gtx with 512 MB of RAM. (OpenGL
    will be used for it's development and cg shaders).

    It seems this gpu works with 4 floating point fields per register. Either 16
    bit or 32 bit. Maybe it's always 32 bit ? I am not sure... probably not
    because 16 bit performance is twice as high.

    I plan on using: 3x16 bits so that needs a "vector" register with 4x16 bits.

    This means only 48 bits are used for example the .x, .y, .z and this leaves
    the .w to be used for something else.

    For me it could be interesting to make two modes of operations for the final

    First mode which uses 3x16 bits in registers and memory.
    Second mode which uses 4x16 bits in registers and memory.

    This would make the second mode slightly less memory efficient.

    I have a feeling that the gpu is very fast and has plenty of processing
    power/instruction power.

    So I might get away with adding some extra "data integrity checks" to the
    ..x, .y, .z components and store them in .w component.

    So the idea is basically to apply a 16 bit "data integrity check" to 48 bits
    of data. Using Shader Model 3.0 instructions/CG language for now...

    I wonder what is a good data integrity checking algorithm to detect bit
    errors in this situation ?

    The data integrity algorithm should not use to many branches... hopefully
    just one branch to compare results ?! ;)

    It shouldn't use too many memory look ups... that would be bad for
    performance ?! ;)...

    The 3x16 bits could be unpacked into 6x8 bit quantities which are then
    stored in 32 bit floating point registers to do further calculations on...

    If the algorithm is in floating point format and limits itself to 8 bit to
    16 bit precision than that should work just fine... algorithms can be done
    with integers... I can convert it to 16 bit floating points ;) I could even
    convert it to 64 bit floating point but that would require 64 bit software
    math and would slow things done so better not to do that...

    The following operations are definetly available for such an algorithm:
    32 bit floating point addition, can be used as 8 bit or 16 bit integer
    addition, (max 24 bit)
    32 bit floating point subtraction, can be used as 8 bit or 16 bit integer
    subtraction, (max 24 bit)
    32 bit floating point multiplication, can be used as 8 bit or 16 bit integer
    multiplication, (max 24 bit)
    32 bit floating point division, can be used as 8 bit or 16 bit integer
    division, (max 24 bit)

    and ofcourse special graphics instructions:
    dot operations
    interpolation operations. (But I have never used them and don't quite
    understand on the gpu at least but I am willing to learn ;) :))

    Some idea's in my head for now:

    1. A simply weak "checksum" where everything is summed together... seems
    like a very bad algorithm, since bit flips might go undetected.
    2. A crc32 ? But the algorithm I have requires a large table and thus memory
    lookups... doesn't seem to smart... and crc32 is like a large division ?
    maybe overkill for just 48 bits of data ?
    3. I can vaguely remember something about parity ? Is that the same as a
    checksum or different ? I think that's different... parity counts the bits
    set and then stores that ? Doesn't seem so strong ?
    4. I can vaguely remember an error correcting code which could correct 1 bit
    error ? by using two parities or so ? one vertical, one horizontal ?

    So I ask you software programmers/developers and hardware designers and
    algorithm designers :) out there the following question:

    What kind of error detection algorithms, or maybe even error correction
    algorithms are out there that you think would be suited for this special
    situation ?

    Also maybe you can design something specially for this situation ?

    (The algorithm could later also be applied to slightly smaller bits like
    only 16 bits, or 32 bits, or maybe even just 8 bits, but stored in 16
    bits... that would be nice.)

  2. Here is a interesting thread about error detection on gpu's:

    The first guy says: "run twice see if same results..." I think this is
    dangerous and wastefull, if a bit is truely damaged, the same bit error
    might simply occur twice. It's also wasting resources big time ;) :)

    Another guy says: people do weird things to their systems: like overclock,
    or not cool properly.

    Another guy says: memory chips might become warmer over time and might start
    producing bit errors.

    So I am thinking: It's now winter.. the computer is cool... thus no bit
    errors... but what happens in the summer when it's fricking hot ? Maybe bit
    errors will creep in... I will probably not be running my pc intensively
    during the summer, but my "future" products users might be... it's good to
    protect them from possible bit errors me thinks ;) :) So I kinda like this
    idea of adding some bit error detection capabilities... just in case ! ;)

    Then user can decide if he wants it or not by chosing the mode... ;) :) So I
    hope by protecting the bits like that it will detect most if not all
    problems with gpu/memory corruption ?

  3. Also I just realized something... CRC32 is too big... it requires 32 bits.

    There are only 16 bits available for storing an integrity code...

    Also it seems crc32 does 1 memory lookup per byte... so there are 3x2 bytes
    is 6 bytes, which would mean 6 memory lookups... which is a bit much for my
    taste... it might be acceptable anyway for a first version... but I would
    rather have something better... something that doesn't require a memory
    lookup and which fits in 16 bits... that would be nice ! ;) :)

  4. This one is kinda interesting, raptor codes:

    It seems to be something new... it seems to using some floating points and
    matrices... normally I don't like matrices but in this case I might make an
    exception since the gpu has support for it ;)

    But it seems to also use very large tables ?! ;)

    It might be overkill for this case... but it might be interesting for other
    applications in the future...

    I wonder if it's patented ? ;) I would guess so ;) but in Europe that
    probably doesn't apply anyway ! ;) :)

    Skybuck ! ;) :)
  5. For me it could be interesting to make two modes of operations for the
    I want to expand on these modes.

    There would be three modes:

    1. "Casual/Normal mode" without error checking
    2. "Paranoid/Strong Error Checking mode" with crc16 error checking.
    3. "Cry like a baby mode/Error correcting mode" with ??? error correction.

    Mode 1 would give maximum speed.
    Mode 2 would be paranoid mode to detect if hardware fails.
    Mode 3 would allow people with damaged hardware to still do some decent

    Also mode 3 would allow the program/product to simply continue as good as
    possible, while mode 2 would break off which is kinda harsh but ok that's
    what mode 3 is for if you want to continue as good as possible ;) :)

    The remaining question is therefore:

    What error correcting code to use for mode 3 ?

  6. I will try...

    However I still need an error correcting code for mode 3 (See new

    Any idea's ?

    I have 0% experience with error correcting codes I am afraid ! ;) :)

  7. The document I linked to in another sub thread had something

    It suggested to use a number of extra bits to do a parity check on the
    memory address ?! To check/test if it came indeed from the correct memory

    Because if the bits on the address wires would screw up it would fetch the
    wrong memory address, but the results of it, the memory/data, would still
    pass the tests with flying colors ;) :)

    So extra bits would be needed to protect against that. I haven't completely
    read the document yet... so I am a bit fuzzy how to do that... it probably
    does a parity bit on the address bits... and then simply stores that parity
    bit with it... then when it's fetched the cpu could check that... with the
    address it has in it's cpu registers.

    For me I have 16 bits available... It would need 6 to 7 bits or so... that
    means plenty of bits are left to do something else... I was thinking...
    maybe do some extra hamming code on maybe the reversed bits or so ?

    Or maybe even better...: use those extra bits to do a full hamming code on
    the address bits and store that in it as well... it's possible me thinks...
    very interesting idea, since this would protect against memory corruption
    and address lines corruption.

    I wonder... is there any other corruption ?

    The extra memory checks are just sanity/paranoya checking... it's very
    important that the results can be trusted :) One document mentioned 3 out of
    300 gpu's had problems, but these were "beta" products.

    Also this is the first time I will be using gpu's and it's memory so I want
    to be sure that nothing weird is going on... now or in the future... it's
    also a "feel good thing"... "user doesn't have to worry about it"... and
    users with bad hardware are protected as well and might be able to get some
    results even with bad hardware ;)

    And finally it's always interesting to learn new programming techniques ;)
    :) and make products more reliable if wanted :) I can add it too my toolbelt
    of skills ! ;) :)

  8. I investigated hamming a bit and wondered what would happen if I left out 1
    bit from all data bits in a parity calculation, all data bits would receive
    a parity bit... from that "experiment" it was pretty clear that this hamming
    idea can only correct single bit errors ?!? Well maybe not because the
    wikipedia link says it could correct more but doesn't explain how...

    Anyway I think it might be necessary to "parity check" the parity bits as
    well in a way... which reminds me of LDPC codes...

    It seems LDPC code calculations happen on module 2 and matrices which might
    just be up the gpu's alley ?! ;)

    Here some links:

    Link to hamming code on wikipedia:

    Link to LDPC code on wikipedia:

    I am also looking at some LDPC document which came with some LPDC c software
    which is on the wikipedia links section as well... seeing if it's any good
    ;) :) so far seems interesting ;)

    Though I am not sure if LDPC's would apply to my case because they do seem
    to require as many bits as data bits to be effective ?

  9. Coding the hamming code thing... I think I can manage that ;) It might take
    some time but that's ok :)

    LDPC codes is a different matter though ;)

    For now it seems LDPC has large overhead ?! so that's kinda shitty... but do
    correct me if I am wrong ;) :)

    If the overhead ain't that bad I might be willing to help some "scientist"
    with coding up something "good" in delphi ;) :) You do the thinking, I do
    the programming =D

    When it comes to saving time maybe some people could code for me but I doubt
    there are any good "Delphi" codes out there to work for absolutely "budget"
    hours ;) :)

    And I am very demanding when it comes to Delphi code quality ! ;) :) It most
    be top notch ! ;) :) =D

    And that's exactly what's gonna happen:

    1. First generic Delphi code.
    2. Then maybe optimized Delphi code for this specific case.
    3. Then gpu code.
    4. Then maybe further optimized gpu code if possible ;) :)

    However shader model 3.0 /cg doesn't seem to good at bit fiddling... so that
    might get a bit tricky ;) :)

    Also error correction is something I wanted to do for some time now ;) :)

    Thanks to better documentation explaining the idea's/concepts I should be
    able to program it now and ofcourse thanks to my increased programming skill
    level ;) :)

    Skybuck ;) :)
  10. However suppose this goes through, then there is a weird funny situation:

    The GPU and it's code/memory is actually better protected then the rest of
    the system ?! ;) :)

    Though most of it should happen in the gpu anyway... but it's kinda odd...
    as soon as it comes out it could still be affected by the cpu/main memory...
    but that's an issue for the user he will have to deal with himself I
    guess... or maybe I could do some extra checking on the outputs as well and
    include them on the cpu side as well... it wouldn't be that much of a big

    I think that's actually a cool idea too... since I did have one bad memory
    chip not so long ago =D

    So it should be total protection for cpu and gpu ! ;) :)

    Skybuck ;) :)
  11. Ian Bell

    Ian Bell Guest

    The first thing you need to do is decide:

    1. What type of errors are likely to occur - single or multiple bit
    errors in a single packet of 4x16 bits or bursts in a stream of packets.

    2. Whether to reformat groups of packets so as to spread bursts of
    errors amongst a number of packets.

    3. How many bits in error do you want to be able to detect and how many
    of these do you want to be able to correct

    Only then can you begin to determine a suitable coding scheme.


  12. Not really lol ! ;) :)

    One first has to detect the errors ! LOL.

    Skybuck ;) :)

    P.S.: Sorry to burst your bubble ! ;) :)
  13. Ian Bell

    Ian Bell Guest

  14. Another interesting idea could be to do the following:

    Spent 8 bits on a crc8 for strong error detection.
    Spent 6 bits on a hamming code for 1-bit error correction.
    Spent 1 bit on hamming code extension for 2 bit error detection.

    Total of 15 bits used... one spare bit (?).

    This would probably create a very strong mode which detects lot's of errors,
    and has error correcting capabilities !?! ;)

    And uses the bits most efficiently...

    A CRC16 might be overkill for only 48 bits ?

    Maybe CRC8 is good enough as well ? ;)

    I haven't looked yet at CRC8 but gonna do that now ! ;)

  15. Hello,

    I just had a better idea, since I forgot about the address check...

    Since a hamming code has a certain range I might as well use the full range
    to "ham" more bits ! ;) :)

    So there are 48 "gpu data bits" and the maximum addressing range is probably
    "16 bits" So that creates a total of:

    48 true data bits + 16 artificial data bits, for a total of 64 bits to be
    "hammed" :)

    Including the extension bit for "double error detection" this leads to a
    perfect 8 bit code for the hamming code !

    And thus the full 16 bits of additional space in the .w component can be
    fully used for maximum error checking:

    8 bits for CRC8 + 8 bits for hamming code.

    The CRC8 could be done on full 64 bits, and/or maybe even the 8 bit hamming
    code for cross reference.
    The HAM8 could be done on full 64 bits, and/or maybe even the crc8.

    However this ofcourse can't be because this creates a little paradox... One
    or the other has to be done first ! ;) :)

    Maybe it's better to do the ham8 last, so it could correct the crc8 ? and
    then figure out if there was an error ?

    Or maybe the ham8 should only be done on the data bits... since correcting a
    crc8 doesn't seem that usefull ? actually it does some usefull for stronger
    error detection not so much error correction ;) :)

    Actually with ham8 it's possible to "ham" 120 bits or so ! So even more
    "artificial data bits" could be hammed ! :)

    However does that make sense... this "artifical address bits" ?

    How would that work... the cpu "pretends the data is there" and calculates
    the ham8's... and crc8's and then simply compares those to stored in

    So I guess the actually address doesn't need to be stored in memory... to me
    it does seem to make sense ! ;) :)

    Well mabe I shouldn't over do it and add more bits to be hammed to the
    already 64 bits and maybe even 8 bits of crc32... so that would leave:
    roughly 120-72 = 48 bits or so for even more artificial data...

    Leaving some room in there could be good if the address range ever needs to
    be expanded... or something else needs to be in there... so for now I will
    let it be ! ;) :)

    Now me go investigate crc8 ! ;) :)

  16. Not really, Today I programmed a first try of the hamming code, and it's
    working, overall bit still needs to be added, maybe tomorrow hamming code
    will be done.

    I also found many many many crc8 routines.

    I am not sure what code/polynomial to use, but I think this can be tested to
    see which ones work best.

    So ultimately thanks to other people, my time is way shortened.

    Also crc's are more of a black art and so is probably ldpc's there is no
    really good answer, so that will drive many people nuts ;) :)

    I thank people spending use ammounts of time analyzing polynomials so I
    don't have too ! ;) :)

    Ultimately I might have to roll my own crc too so I understand it enough to
    implement it in floating points... though I think I already found some code
    with floating points so I might be done real fast if it's any good ;) :)

    And for LDPC's I still don't understand those... too wacky for me... do you
    understand those ? ;)

    Skybuck =D
Ask a Question
Want to reply to this thread or ask your own question?
You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.
Electronics Point Logo
Continue to site
Quote of the day