EEPROM checksum error

Discussion in 'Electronic Basics' started by Dummy, Mar 2, 2005.

  1. Dummy

    Dummy Guest

    What are the possible causes of an EEPROM checksum error?
    Could a magnetic field corrupt the EEPROM data? Are there any design
    guidelines to prevent this potential failure?
     
  2. Luhan Monat

    Luhan Monat Guest

    Eeproms, as far as I know, do not have this feature. Something else
    (the programming device) must be making one up and storing it in the
    eeprom - somewhere.

    You will need to provide more specific information.
     
  3. Ken Taylor

    Ken Taylor Guest

    When did this occur? After a long time in circuit? After programming?

    Ken
     
  4. Dummy

    Dummy Guest

    The units had been in use for between 6 months and 2 years. The EEPROM
    is used in mobile radios, all installed in cars or trucks. We suspected
    that noise on the supply line caused the EEPROM checksum error, but
    experiments showed that noise injected at the supply line is filtered
    by the regulator circuit. The power supply line to the EEPROM is
    confirmed to be clean regardless of the amount of noise on the main
    supply. What could have caused the checksum error?
     
  5. Make sure you follow all the recommendations of the manufacturer of the
    eeprom.

    Is the eeprom programmed in the field, or is it just programmed once at
    the factory and then used from then on? Is it possible there is a
    software error causing this? EEPROMs usually use keyed programming
    sequences to prevent inadvertent corruption.

    Make sure you lock out interrupts while programming the thing.

    --
    Regards,
    Robert Monsen

    "Your Highness, I have no need of this hypothesis."
    - Pierre Laplace (1749-1827), to Napoleon,
    on why his works on celestial mechanics make no mention of God.
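
The two suggestions above (a keyed write sequence and locking out interrupts while programming) can be mirrored in firmware. The following is only a minimal sketch of a software-level analogue, not the EEPROM's own command sequence. Every name in it (`fake_eeprom`, `irq_lock`, `eeprom_arm`, `eeprom_guarded_write`, the key values) is hypothetical, and the array stands in for the real part so the logic can be exercised on a PC.

```c
#include <stddef.h>
#include <stdint.h>

#define EEPROM_SIZE 256u      /* assumed size of the stand-in device */
#define WRITE_KEY_1 0x55u     /* arbitrary key values chosen for this sketch */
#define WRITE_KEY_2 0xAAu

static uint8_t fake_eeprom[EEPROM_SIZE];   /* stands in for the real EEPROM */
static volatile uint8_t write_key_1;
static volatile uint8_t write_key_2;
static int irq_locked;

/* Placeholders: on real hardware these would disable/enable interrupts. */
static void irq_lock(void)   { irq_locked = 1; }
static void irq_unlock(void) { irq_locked = 0; }

/* Arm exactly one write. Only the legitimate update path should call this. */
void eeprom_arm(void)
{
    write_key_1 = WRITE_KEY_1;
    write_key_2 = WRITE_KEY_2;
}

/* Write one byte only if both keys are set; always disarm afterwards.
 * Returns 0 on success, -1 if the write was refused. */
int eeprom_guarded_write(uint16_t addr, uint8_t value)
{
    int result = -1;

    irq_lock();
    if (write_key_1 == WRITE_KEY_1 && write_key_2 == WRITE_KEY_2 &&
        addr < EEPROM_SIZE) {
        fake_eeprom[addr] = value;
        result = 0;
    }
    write_key_1 = 0;   /* a stray jump into this routine finds no keys */
    write_key_2 = 0;
    irq_unlock();

    return result;
}

uint8_t eeprom_read(uint16_t addr)
{
    return fake_eeprom[addr % EEPROM_SIZE];
}
```

With this arrangement, code that crashes into `eeprom_guarded_write` without having called `eeprom_arm` first gets a refusal instead of a write. It does nothing against glitches on the EEPROM pins themselves.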
     
  6. Ken Taylor

    Ken Taylor Guest

    All that good stuff....

    Also, are these two-way radios? Do the EEPROMs get altered during normal
    use, in which case is it possible RF is causing problems?

    Ken
     
  7. Pooh Bear

    Pooh Bear Guest

    *What* checksum ? How do you calculate 'your' checksum ?

    What type of Eeprom ? 24Cxx family for example ?

    Far too little info supplied to meaningfully respond.


    Graham
     
  8. Pooh Bear

    Pooh Bear Guest

    The serial interface is timing tolerant IME. Never seen false data as a result of background
    interrupts.


    Graham
     
  9. Harold Ryan

    Harold Ryan Guest

    Checksum is just the addition of each byte of data. At the end of the file,
    another byte or word is added that will total all of the bytes to zero. If
    any of the bytes are corrupt, the total sum of all the bytes will not be
    zero. A loose wire or strong magnetic field may cause this problem.
    Harold
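
The sum-to-zero scheme described above can be sketched in a few lines of C. This assumes an 8-bit sum and a single trailing check byte; the OP has not said which checksum the radio actually uses, so the function names and the width are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Add up all bytes, keeping only the low 8 bits. */
static uint8_t sum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Byte to append so that data plus this byte sums to zero (mod 256). */
static uint8_t checksum_byte(const uint8_t *data, size_t len)
{
    return (uint8_t)(0u - sum8(data, len));
}

/* Returns 1 if the image (data followed by its check byte) sums to zero. */
static int image_is_valid(const uint8_t *image, size_t len)
{
    return sum8(image, len) == 0;
}
```

For the four data bytes 0x12, 0x34, 0x56, 0x78 the check byte comes out as 0xEC. Note that a plain sum cannot see two errors that cancel each other, or two bytes that have swapped places.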
     
  10. What kind of EEPROM? Data corruption in EEPROMs is not uncommon. It can
    be caused directly by electrical noise, or by faulty design of the
    controlling microprocessor system, with respect to either EMI or power
    supply supervision. Redesign to decrease EMI susceptibility and PS
    issues, and then (and ONLY then) tweaks to add redundancy to the
    non-volatile storage can reduce the issue to insignificance even for
    large quantities of units in challenging applications.


    Best regards,
    Spehro Pefhany
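
One common form of the redundancy mentioned above is to keep two copies of each record, each with its own check byte, and to fall back to the backup when the primary fails verification. The sketch below assumes that layout; the record size, the 8-bit sum, and all names (`record_t`, `record_seal`, `record_load`) are hypothetical and not taken from the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define RECORD_DATA_LEN 16u   /* hypothetical payload size */

typedef struct {
    uint8_t data[RECORD_DATA_LEN];
    uint8_t check;            /* makes all bytes of the record sum to zero */
} record_t;

/* Sum of the payload bytes only, modulo 256. */
static uint8_t payload_sum(const record_t *r)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < RECORD_DATA_LEN; i++) {
        sum = (uint8_t)(sum + r->data[i]);
    }
    return sum;
}

/* Fill in the check byte before the record is written out. */
void record_seal(record_t *r)
{
    r->check = (uint8_t)(0u - payload_sum(r));
}

/* Returns 1 if payload plus check byte sums to zero. */
int record_is_valid(const record_t *r)
{
    return (uint8_t)(payload_sum(r) + r->check) == 0;
}

/* Pick the first copy that verifies.
 * Returns 0 and fills *out on success, -1 if both copies are bad. */
int record_load(const record_t *primary, const record_t *backup,
                record_t *out)
{
    if (record_is_valid(primary)) {
        *out = *primary;
        return 0;
    }
    if (record_is_valid(backup)) {
        *out = *backup;
        return 0;
    }
    return -1;
}
```

A caller would typically rewrite the bad copy from the good one after a successful fallback. As the post says, this only helps once the underlying EMI and supply problems are dealt with; otherwise both copies can be hit.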
     
  11. Pooh Bear

    Pooh Bear Guest

    I'm broadly familiar with this thanks. I'm less familiar with why Eprom
    programmers of old seemed to produce different checksums according to
    manufacturer.

    The OP still hasn't explained *what checksum* he's talking about under what
    conditions.

    Can he even validate the file ?


    Graham
     
  12. Lord Garth

    Lord Garth Guest

    Maybe a CRC was used rather than a checksum. How old is the code in the
    EPROM?
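
For comparison with a simple sum, here is a bitwise CRC-16 in C. It uses the widely documented CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF, no reflection, no final XOR). That choice is an assumption made for illustration; nothing in the thread says which CRC, if any, the radio firmware uses.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, MSB first, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);   /* feed next byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

The standard check input, the nine ASCII characters "123456789", gives 0x29B1 with these parameters. Unlike an additive checksum, the CRC changes when two bytes swap places, for example {0x01, 0x02} versus {0x02, 0x01}.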
     
  13. Dummy

    Dummy Guest


    The EEPROM is programmed in the factory before shipping to the
    customer.
    Every time the radio is turned on, the checksum is verified. A checksum
    error occurs when any byte in the EEPROM is corrupted. If data is
    corrupted while the radio is on, the checksum error won't be detected
    until the next power-off and power-on cycle.

    The corrupted bytes are at random EEPROM addresses.
    Some of the parts could be recovered by re-programming, while some
    could not. For the parts that were permanently damaged, failure
    analysis showed overwritten cells. Injecting noise onto the EEPROM data
    or supply lines during a write operation can cause a checksum
    error. But all the voltages supplied to the EEPROM are clean in
    normal use. The filter and regulator have taken care of the noise. So
    it's not right to point to the noise as the culprit.

    Most of the radios failed after being in the field from 6 months to 2
    years.
     
  14. You say they are preprogrammed, but this implies that you are writing
    them during normal operation. Which is it?

    "The filter and regulator have taken care of the noises. So it's not
    right to point to the noise as the culprit."
    ^ Famous last words. :)

    If the eeproms aren't being reprogrammed in the field during normal use,
    then a software error is unlikely, unless the magic write sequence is
    stumbled upon during a freak crash. If they *are* being reprogrammed
    (ie, you are saving some value when the user retunes the radio) then
    I'll again say software. I'm telling you, lock out those interrupts!

    The other possibility is a bad batch of eeproms. This is fairly
    unlikely, but not without precedent*. Attempt to correlate the bad ones
    with some lot. Talk to the manufacturer, and ensure that they don't have
    a 'known' problem. Also, I wouldn't reuse the corrupted ones just
    because you managed to program them. I'd swap them out as soon as practical.

    * A company I used to work for decided to save 10 cents a ram chip and
    forgo individual testing of the chips by the manufacturer. Sadly, it
    turned out that those chips were bad 5 to 10 percent of the time. They
    were selling high availability purple ethernet switches for hundreds of
    thousands of dollars each. The engineer responsible was of course
    promoted to VP, and given vast new responsibilities.

    --
    Regards,
    Robert Monsen

     
  15. Okay. As I suspected.
    Well, what about "abnormal" use, say something that might happen only
    rarely? Are you claiming that the supply voltage on these parts was
    maintained at 5.0V +/- 5% constantly, never straying lower or higher,
    from factory to failure? And noise injected from the supply or other
    pins could cause the micro's PC to point to random bits of code.
    I sure don't think you can conclude that.
    My original comments definitely apply to this situation. Can you post
    a link to the schematic of the power supply, micro and EEPROM?


    Best regards,
    Spehro Pefhany
     
  16. Take a GOOD look at power up and power down sequences. A few years ago,
    a vendor of mine was having problems with a similar situation, where an
    EEPROM kept getting programmed to random bits here and there. Seemed
    that on start up (this was on a parallel port) there were voltage
    glitches that JUST HAPPENED to mimic the programming sequence on the
    device, which was not supposed to be field programmable! Since this was
    a security dongle, and the bits were sometimes the security ID codes,
    this was considered a very bad thing!

    So, take a look at what occurs during start up and shut downs, and see
    if there are any glitches then that can cause you problems!
     
  17. Were they actually able to observe this, or was it assumed? Dealing with
    hardware/software interfaces, it is quite common for programmers to
    blame software bugs on hardware 'glitches'. I've seen this again and
    again. It is usually a bug that just seems to come and go, possibly due
    to some unrelated change in the software that changes the timing or
    place in memory where a random pointer is hitting. I have made a living
    out of consulting on these kinds of issues.
    Yet another goblin to beware of. Thanks.

    --
    Regards,
    Robert Monsen

     
  18. From their rep, it had definitely been observed. Device had worked for
    years, then they came out with a new package. New package also came at
    same time they went to a new fab, which had different processes. New
    processes made the programming sequence MUCH MORE sensitive, so that
    random glitches now created random bits programmed in their devices.
    They needed to replace a whole lot of devices in the field, and got a
    lot of bad will because of the random failures. We are still replacing
    these as they go bad...

    You see, we only use one small field of the whole EEPROM. The problem
    doesn't happen every time, and may be worse on some systems and less on
    others. Also, some people just don't use the things often enough to
    break them!
     
  19. Dummy

    Dummy Guest

    Introducing random noise on the supply line doesn't cause any
    glitches on the EEPROM lines, because the noise only rides on the
    supply. However, when glitches are introduced on the main supply line
    by creating a temporary voltage dip for a certain period, they can
    pass through to the EEPROM lines.

    Previously, we were able to reproduce the EEPROM checksum error by
    introducing noise to the EEPROM directly, bypassing the regulator. So I
    reckon that if the glitches get through the regulator, a checksum error
    will most probably occur. We are checking on that. We thought of
    noise, but overlooked glitches.

    If that's the root cause, is there any method to prevent glitches? I
    guess the regulator is only able to filter noise that rides on Vcc. It
    cannot recover from a sudden dip in voltage.
     
  20. For you, this might be a design issue. As power ramps up/ramps down,
    different components react differently. Some have internal caps that
    make them hold state a little longer than others, or are just more
    sensitive to power supply levels. Think about the programming sequence.
    What could provide it in your circuit? What could PREVENT it in your
    circuit?

    I have often found that start up conditions are not fully considered in
    design. You just assume that the power comes up all at once, smoothly.
    In reality, different voltage rails come up differently. Filter caps
    take time to charge up to voltage. Good design takes that into account,
    sometimes adding POR circuits to make sure that power is steady before
    starting things up, and quick shut down sequences to turn everything off
    before the power goes below limits. It's like preventing race
    conditions and logic glitches. Sometimes, you just have to take a good
    look at the failure modes...
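
On the firmware side, the "make sure power is steady before starting things up" idea above can be approximated by refusing EEPROM writes until the supply has read good for several consecutive samples, and revoking permission on the first low reading. This is a minimal sketch under stated assumptions: the supply is sampled by an ADC in millivolts, the 4750 mV threshold is just 5.0 V minus 5% as an example figure, and all names (`supply_gate_t`, `supply_gate_sample`, and so on) are hypothetical. It does not replace a hardware POR or supervisor circuit.

```c
#include <stdint.h>

#define VCC_MIN_WRITE_MV 4750u   /* assumed threshold: 5.0 V minus 5% */

typedef struct {
    uint16_t min_mv;         /* lowest supply reading that counts as good */
    uint8_t  required_good;  /* consecutive good samples needed */
    uint8_t  good_count;     /* consecutive good samples seen so far */
} supply_gate_t;

void supply_gate_init(supply_gate_t *g, uint16_t min_mv,
                      uint8_t required_good)
{
    g->min_mv = min_mv;
    g->required_good = required_good;
    g->good_count = 0;
}

/* Call periodically with the latest supply reading in millivolts. */
void supply_gate_sample(supply_gate_t *g, uint16_t vcc_mv)
{
    if (vcc_mv >= g->min_mv) {
        if (g->good_count < g->required_good) {
            g->good_count++;          /* saturate at required_good */
        }
    } else {
        g->good_count = 0;            /* any dip restarts the count */
    }
}

/* Returns 1 only after enough consecutive good samples. */
int supply_gate_write_allowed(const supply_gate_t *g)
{
    return g->good_count >= g->required_good;
}
```

A write routine would check `supply_gate_write_allowed` immediately before touching the EEPROM. A dip shorter than the sampling interval will still slip past this check, which is why the hardware measures discussed above come first.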
     