Confused about Flash

Discussion in 'Electronic Design' started by John Larkin, Nov 3, 2006.

  1. John Larkin

    John Larkin Guest

    One of my engineers left to go to Indonesia and teach, or something, and
    I have inherited 17,000 lines of really ghastly, buggy, ugly assembly
    code for an embedded product. It looks easier to rewrite it from
    scratch than to try to fix it, so that keeps me off the streets for
    the rest of the month.

    This thing has an ST flash chip, M29W400BB, which is 4M bits, used in
    256kx16 mode. The datasheet is typically confusing. So please check me
    on this:

    If I write a secret combination of words to a secret list of addresses
    in the chip, six writes total, I can tell it to erase one of its 11
    sub-blocks of memory. Apparently I can't do normal reads during erase,
    so I can't run the code out of the same flash I'm erasing. I have to
    erase a block (to all 1's, like an eprom) before I can program it. A
    block erase can take up to 6 seconds, but I can poll it to see when
    it's done. Apparently I select which block is to be erased by writing
    0x30 into any address of that block, as the last operation of the
    erase command.

    (The datasheet is cute. It's not obvious whether writing to address
    "BA" means "write to address 0xBA" or "write to an address in the
    block". Seems like the latter makes sense.)
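    If "BA" does mean "any address inside the block", a small lookup of block start addresses is handy. The sketch below is only an illustration: the block sizes are an assumption based on the usual bottom-boot layout for 4 Mbit parts (16K, 8K, 8K, 32K bytes, then seven 64K-byte blocks), expressed in 16-bit words, and must be checked against the datasheet. The names `block_words` and `flash_block_start` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_NUM_BLOCKS 11u

/* ASSUMED block sizes in 16-bit words for a bottom-boot part in x16 mode:
   16K, 8K, 8K, 32K bytes, then seven 64K-byte blocks. Verify in the datasheet. */
static const uint32_t block_words[FLASH_NUM_BLOCKS] = {
    0x2000, 0x1000, 0x1000, 0x4000,
    0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000
};

/* Return the first word address of a block. Any address inside the block
   should serve equally well as the "BA" of the erase command. */
uint32_t flash_block_start(unsigned block)
{
    uint32_t addr = 0;
    assert(block < FLASH_NUM_BLOCKS);
    for (unsigned i = 0; i < block; i++)
        addr += block_words[i];   /* sum the sizes of all lower blocks */
    return addr;
}
```

    Under that assumed layout the eleven blocks add up to 0x40000 words, which matches the 256kx16 organisation.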

    Write 0xAA to 0x555
    0x55 to 0x2AA
    0x80 to 0x555
    0xAA to 0x555
    0x55 to 0x2AA
    0x30 to any address in block to be erased

    wait 6 secs or poll for erase done
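    The six writes listed above can be sketched in C as follows. This is only a sketch: `flash_write_word` is a hypothetical bus hook that logs each write so the example runs on a PC; on the real board it would be a volatile 16-bit store into the flash address space, executed from RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One logged bus write: word address and 16-bit data. */
typedef struct { uint32_t addr; uint16_t data; } bus_write_t;

bus_write_t bus_log[16];
size_t bus_log_len = 0;

/* Hypothetical bus hook. Real hardware: a volatile store to flash_base + addr. */
void flash_write_word(uint32_t word_addr, uint16_t data)
{
    assert(bus_log_len < sizeof bus_log / sizeof bus_log[0]);
    bus_log[bus_log_len].addr = word_addr;
    bus_log[bus_log_len].data = data;
    bus_log_len++;
}

/* Issue the block erase command exactly as listed in the post.
   block_addr is any word address inside the block to be erased. */
void flash_block_erase_cmd(uint32_t block_addr)
{
    flash_write_word(0x555, 0xAA);
    flash_write_word(0x2AA, 0x55);
    flash_write_word(0x555, 0x80);
    flash_write_word(0x555, 0xAA);
    flash_write_word(0x2AA, 0x55);
    flash_write_word(block_addr, 0x30);   /* last write selects the block */
}
```

    After the last write the caller still has to wait or poll for completion, as described above.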

    Programming flash is less clear. Apparently I execute a chunk of
    secret writes, one for each word I want to load, each with three
    command code writes followed by an address+data word write. "The final
    write operation... starts the write state machine." I assume from this
    that the actual burn of a single word begins after each
    poke-a-write-word command sequence, and it seems to take 10 us typ,
    200 us max, and is again pollable for done.

    Write 0xAA to 0x555
    0x55 to 0x2AA
    0xA0 to 0x555
    data to target address

    wait 200 usec or poll for write done
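    A rough C sketch of that word-program sequence, run against a tiny simulated chip so it can be tried on a PC. Everything named here (`bus_write`, `bus_read`, `sim_mem`, `flash_program_word`) is hypothetical. The simulation completes instantly and the poll simply compares the read-back word with the data, which is a simplification: the real part has its own status-bit polling algorithm in the datasheet.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_WORDS 0x1000u          /* tiny stand-in array, not the real 256K words */

static uint16_t sim_mem[SIM_WORDS];
static int sim_step;               /* progress through the command sequence */

/* Put the simulated array into the erased state (all 1s). */
void sim_erase_all(void)
{
    for (uint32_t i = 0; i < SIM_WORDS; i++)
        sim_mem[i] = 0xFFFF;
    sim_step = 0;
}

/* Simulated bus write: tracks the three command writes, then programs. */
void bus_write(uint32_t addr, uint16_t data)
{
    assert(addr < SIM_WORDS);
    switch (sim_step) {
    case 0: sim_step = (addr == 0x555 && data == 0xAA) ? 1 : 0; break;
    case 1: sim_step = (addr == 0x2AA && data == 0x55) ? 2 : 0; break;
    case 2: sim_step = (addr == 0x555 && data == 0xA0) ? 3 : 0; break;
    default:
        sim_mem[addr] &= data;     /* programming can only clear bits */
        sim_step = 0;
        break;
    }
}

uint16_t bus_read(uint32_t addr)
{
    assert(addr < SIM_WORDS);
    return sim_mem[addr];
}

/* Program one word; returns 0 on success, -1 if the word never reads back. */
int flash_program_word(uint32_t addr, uint16_t data)
{
    bus_write(0x555, 0xAA);
    bus_write(0x2AA, 0x55);
    bus_write(0x555, 0xA0);
    bus_write(addr, data);         /* final write starts the burn */

    for (uint32_t tries = 0; tries < 1000; tries++) {
        if (bus_read(addr) == data)
            return 0;
    }
    return -1;                     /* timed out, e.g. word was not erased */
}
```

    On the real board the poll limit would be derived from the 200 us maximum rather than a bare loop count.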

    It sounds like here, once I erase a whole block, I can program any
    addresses within that block, as many or as few as I like, at any
    desired addresses, at any time. There seem to be no time constraints
    on how long it takes me to do this.

    During erase or program, I again can't execute code out of flash, so
    I'll have to relocate the flash erase and write routines into CPU ram
    and run them from there.

    Of course the datasheet has no straightforward "to write a block, do
    this..." stuff, or any examples.

    Oh well, even if nobody answers this post, just typing it has helped
    me figure out what's probably going on.

  2. I can sympathize. I wrote an assembly procedure for an AT161B Atmel
    data flash ... same idea. Lots of command codes, etc. I'll send it to
    you if you wish. It's commented ... sorta.

  3. Along the same idea, the datasheets and appnotes for the Atmel dataflash
    are pretty clear, and give good details and even flow diagrams. While
    the actual commands are probably different from the ST one, the other
    info might give you some clues.


    Adrian Jansen adrianjansen at internode dot on dot net
    Design Engineer J & K Micro Systems
    Microcomputer solutions for industrial control
    Note reply address is invalid, convert address above to machine form.
  4. Fred Bloggs

    Fred Bloggs Guest

    Right- well it is clear you lack any capability for hierarchical
    partitioning of information, as I always suspected, and this will make
    the job doubly difficult for you. However, I applaud your openness and
    can't help but acknowledge your status as a symbol of hope for other
    untalented overachievers...
  5. The Atmel Dataflash has RAM buffers on chip. Meaning you first fill
    the RAM then tell the chip to copy the RAM to the flash. Plus it is
    much, much faster.

  6. mkaras

    mkaras Guest


    Many of the x8 and x16 FLASH devices in the market place use very
    similar programming algorithms. As such a search for App notes at other
    vendors sites can lead you to good insights for your ST part. For
    example look at this web link for some AMD algorithm flow charts that
    may help you:

    It is also possible to find some C and assembly language code samples
    that perform the standardized programming algorithms. Please let me
    know if you cannot find any and I can send you some samples to look at.
    My code is in x86 assembly language so should not be that hard to
    translate to other microcontroller platforms.

    - mkaras
  7. John,
    yes, the hard part is to have the Code in the RAM while
    doing the flash and hope the power stays on. Copy the
    code from Flash to the RAM, jump to it, and when it is
    done, whatever was done, do a clean reset. Since the code
    copied from the flash to the RAM is on a different address,
    the code shouldn't contain any absolute jumps. Hmm, yes,
    the interrupts... This procedure used to be done 20 years ago.

    Perhaps there is a data line that reflects the (busy-)

    Nowadays the controllers have boot code sections in the
    flash from where the application code flash can be handled.
    I know you already checked whether a redesign with modern
    hardware would make sense. Apparently it didn't.

  8. mkaras

    mkaras Guest

    You may also want to reference this link:

    Intel has been a champion of the CFI (_C_ommon _F_lash _I_nterface) for
    a long time now.

    - mkaras
  9. mkaras

    mkaras Guest

  10. Arlet

    Arlet Guest

    This is easily done by adding an appropriate section in the linker
    script file. You'll have to set the address to a RAM address, and the
    storage to somewhere in the flash. During code init, you initialize the
    RAM by copying the code, using symbolic labels that the linker
    provides, or that you add to the linker script yourself. The exact
    mechanism depends on the tool chain. It is very similar to the section
    that holds the initialized data in RAM, so it could be just a matter of
    copy/paste a few lines in the existing linker script, and make a few

    This way, you can use absolute addresses, and just call the functions
    as normal code. The compiler/linker will take care of the details.

    Vital interrupt handlers can be programmed in RAM, or another available
    memory area in the same way. Less critical interrupts can be disabled
    during flash programming/erasing.

    If the CPU has cache support, you may be able to run the code from
    flash, as long as you can guarantee all the code is actually in the
    cache (some CPUs allow code to be locked in the cache).
  11. John Larkin

    John Larkin Guest

    I'm going to completely rewrite and test 17,000 lines of realtime
    assembly code (figure it'll be about 6-8 Kloc when I'm done) in the
    remaining days of November, and integrate it into an FPGA-managed,
    Ethernet equipped, DDS-clocked, picosecond-resolution timing box, and
    I'm going to get it right and elegant besides. One of my guys is
    concurrently redesigning the FPGA mess, in tight sync with the new uP
    code. Today's task is to redo the main program loop, the serial
    interrupt handler, and the command parser, taking time out only for
    beers and burgers with a couple of guys at the Beach Chalet. I've got
    a number of OEM customers waiting to design this box into their
    systems, and we have great hopes for this one.

    What are you up to lately?

  12. Jim Thompson

    Jim Thompson Guest

    Sheeesh! I doubt that Fred can tie his own shoes. He's just a
    blow-hard. Really good at criticizing others, but anything over 5
    transistors is well beyond his skill set ;-)

    ...Jim Thompson
  13. Arlet,
    that sounds somewhat familiar. Did you have a look
    at this stuff in the past 15 years ? And the manuals
    still around ?

  14. John Larkin

    John Larkin Guest

    I'm programming in absolute assembly, no linkers or anything like
    that. The CPU is a 68K, so writing position-independent routines and
    relocating/running them dynamically is easy. The amount of code that
    has to be run in ram is actually tiny. If I intend to reflash, I'll do
    a hard reset and kill everything first.

    We bought a clamshell adapter for our programmer so production can
    program the entire flash chip and then solder it to the board. If it's
    OK, we ship it. The first flash block will be a boot manager, so if we
    ever need to change the app code, we can connect it serially to a
    laptop, start up a ping program, power cycle the box, and the pc can
    seize control of the boot-block program and potentially reflash the
    application code. If the boot program doesn't get pinged, it starts up
    the application. The intent is that the boot block itself never

    The ultimate fallback is to connect the bdm pod, which would let us
    reflash everything, boot block too, and make a clean start.
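    The boot-manager decision described above (wait briefly for a ping, otherwise start the application) might look roughly like this in C. It is a sketch only: `boot_decide`, the `ping_poll_fn` callback and the two stub pollers are invented for illustration, and the poll count stands in for whatever timeout the real boot block uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns nonzero once a ping has arrived. */
typedef int (*ping_poll_fn)(void);

typedef enum { BOOT_RUN_APP, BOOT_ENTER_LOADER } boot_action_t;

/* Poll for a ping up to max_polls times; fall through to the app if none. */
boot_action_t boot_decide(ping_poll_fn poll, uint32_t max_polls)
{
    assert(poll != NULL);
    for (uint32_t i = 0; i < max_polls; i++) {
        if (poll())
            return BOOT_ENTER_LOADER;   /* PC seized control: allow reflash */
    }
    return BOOT_RUN_APP;                /* nobody pinged: start the application */
}

/* Demo stubs standing in for the serial port. */
int never_pinged(void)
{
    return 0;
}

static int polls_until_ping = 3;

int ping_on_third_poll(void)
{
    return --polls_until_ping <= 0;
}
```

    Keeping the decision in one small function like this makes it easy to exercise on a PC before it goes into a boot block that is meant to stay untouched.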

    All this makes me nostalgic for plugging eproms into sockets. But, as
    Fred says, I am incapable of hierarchical thinking.

  15. John Larkin

    John Larkin Guest

    Back when RTL gates were just starting to be available (rotten crap
    they were, too) my boss told me that on IC's, someday transistors
    would cost less than a penny each. I thought he was nuts. The 4 mbit
    flash chip is fairly expensive by current standards, maybe 1e7
    transistors for $2.85.

  16. Arlet

    Arlet Guest


    Actually, yes. Even for a current project, using GCC, I needed to
    modify the linker scripts. For the GCC toolchain, there's the binutils

    Like I said, it's easiest to grab one of the supplied linker scripts
    that come with the tool installation, make a local copy, and modify
    that for a particular project.

    I've also done this for the ARM ADS suite a couple of years ago. The
    syntax is a bit different, but the concepts are the same.

    Unfortunately, there's no standard linker script definition, so you'll
    have to consult the documentation of your particular tools.
  17. Nico Coesel

    Nico Coesel Guest

    Hmm, sounds like a walk in the park. Ever tried to modify a similar
    sized program written by a lumber-jack?
    I think you have already figured the whole thing out by yourself. The
    whole way of erasing / programming is kind of standard anyway. You can
    try to look into the AMD or Intel datasheets for similar devices to
    see if they make more sense to you. I recall the AMD datasheets have
    some examples.
  18. Ben Jackson

    Ben Jackson Guest

    You mention FPGA later, so if you are hooking up pairs of these to
    get x32, keep in mind you get to program both halves in parallel.

    Also, beware of the various block- and chip-level write protects.
    The commands to turn these on are typically short, and easy for runaway
    code (17,000 lines of asm!) to lock a few blocks and leave you pulling
    your hair out when erase/write cycles don't verify.
    Well, like most flash, you can only program 1 bits to 0 bits. 0 bits
    can only be "erased" back to 1s. So you can always go back and clear
    bits with subsequent writes.
    I've never seen one take remotely that long, but maybe this one is
    It's right below the table.
    These sequences are common to many flash chips, and you can google
    yourself a raft of examples by searching for 0xAA 0x555 (for C, try
    hAA and h555 for asm ;-)
    There's a "fast" mode where you can enable a shorter write sequence.
    That's what most bulk programming routines use.
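    The 1-to-0 rule mentioned above can be written down in a few lines of C. This is a minimal sketch with made-up names (`flash_cell_after_program`, `flash_needs_erase`): programming behaves like an AND with the existing contents, so a block only needs erasing if some wanted bit is 1 where the cell already holds 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a cell holds after programming: bits can only go from 1 to 0. */
uint16_t flash_cell_after_program(uint16_t old, uint16_t data)
{
    return (uint16_t)(old & data);
}

/* Return 1 if any wanted word needs a bit set back to 1 (erase required),
   0 if the wanted image can be reached by programming alone. */
int flash_needs_erase(const uint16_t *cur, const uint16_t *wanted, size_t n)
{
    assert(cur != NULL && wanted != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((cur[i] & wanted[i]) != wanted[i])
            return 1;
    }
    return 0;
}
```

    A check like this lets an updater skip the slow erase when the new data only clears bits.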
  19. Tim Shoppa

    Tim Shoppa Guest

    At the other end is excessively hierarchical partitioning of
    information. For example, John's application would be, in most of the
    industrial-military complex, "solved" with a cluster of 40 Windows
    servers each running their own special version of some relational
    database and developed by independent teams of foreign contractors :).

    In other words, if his worst problem is stupid FLASH, he's a winner in
    my book!

  20. John Larkin

    John Larkin Guest

    I have deliberately chosen to do deep-embedded products that have no
    user interface, limited connectivity, and bog-simple microprocessor
    code. Given a choice of selling...

    A benchtop instrument with front panel, display, user interface,
    serial and network connections, power supply, enclosure, fan, six PC
    boards, Windows drivers, LabView drivers, lead-free, UL/FCC/CE
    stickers, five man-years of engineering, and that sells for $900,

    or

    A VME module: one PC board, four LEDs, user interface = dipswitch, one
    manual summarizing register functions, over in three months, and that
    sells for $5200,

    we have chosen to go the "stupid" route.

    I wonder what sort of sophisticated stuff Fred designs.
