Best Async FIFO Implementation

Discussion in 'Electronic Basics' started by Davy, Oct 16, 2005.

  1. Davy

    Davy Guest

    Hi all,

    Is there a best implementation of an asynchronous FIFO?

    Any suggestions will be appreciated!
    Best regards,
  2. I guess it depends on what you're looking for.
    At minimum, it should *work* ...
    Beyond that, it is a compromise among resources, speed, features (such
    as almost-empty/almost-full flags), and so on (reliability?).
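
As a rough illustration of the feature trade-off mentioned above, here is a minimal behavioral sketch in Python of a FIFO with programmable almost-empty and almost-full flags. It models a single clock domain only and says nothing about the asynchronous case. The class name `FlagFifo` and the threshold parameters are hypothetical, chosen for illustration.

```python
from collections import deque


class FlagFifo:
    """Behavioral single-clock FIFO model with programmable flag thresholds.

    Hypothetical names: almost_empty_at and almost_full_at are word counts
    and are not taken from any vendor primitive.
    """

    def __init__(self, depth, almost_empty_at=1, almost_full_at=None):
        self.depth = depth
        self.almost_empty_at = almost_empty_at
        self.almost_full_at = depth - 1 if almost_full_at is None else almost_full_at
        self._words = deque()

    @property
    def empty(self):
        return len(self._words) == 0

    @property
    def full(self):
        return len(self._words) == self.depth

    @property
    def almost_empty(self):
        return len(self._words) <= self.almost_empty_at

    @property
    def almost_full(self):
        return len(self._words) >= self.almost_full_at

    def write(self, word):
        if self.full:
            raise OverflowError("write to full FIFO")
        self._words.append(word)

    def read(self):
        if self.empty:
            raise IndexError("read from empty FIFO")
        return self._words.popleft()


fifo = FlagFifo(depth=8, almost_empty_at=2, almost_full_at=6)
for word in range(6):
    fifo.write(word)
print(fifo.almost_full, fifo.full)  # almost full, but not yet full
```

Every extra flag costs comparison logic in hardware, which is the resources-versus-features compromise the post refers to.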

  3. Peter Alfke

    Peter Alfke Guest

    All members of the Virtex-4 family from Xilinx have a
    (hard-coded=full-custom) FIFO controller in each of their BlockRAMs. It
    accepts different clocks for read and write (called "asynchronous
    operation") at any frequency up to 500 MHz. Capacity is 18 Kbits, the
    width is 4 to 36 bits, and the depth is accordingly from 4K to 512
    addresses (depth and width can easily be expanded with additional
    There is an EMPTY and a FULL flag, and also an ALMOST EMPTY and an
    ALMOST FULL flag, both fully programmable (with 1-address granularity).
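
The width and depth figures above follow from dividing a fixed block of memory among wider or narrower words. The sketch below tabulates that, assuming the four aspect ratios 4K x 4, 2K x 9, 1K x 18 and 512 x 36. That list is an assumption (it is not given in the post), as is the reading that the 18 Kbit total counts parity bits that the 4-bit-wide mode leaves unused.

```python
# Assumed width -> depth configurations (not listed in the post itself).
CONFIGS = {4: 4096, 9: 2048, 18: 1024, 36: 512}
TOTAL_BITS = 18 * 1024  # 18 Kbits, assumed to include parity bits


def bits_used(width, depth):
    """Number of storage bits occupied by a width x depth configuration."""
    return width * depth


for width, depth in CONFIGS.items():
    used = bits_used(width, depth)
    share = used / TOTAL_BITS
    print(f"{depth:>5} x {width:<2} = {used} bits ({share:.0%} of 18 Kbits)")
```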

    I designed the crucial asynchronous empty-arbitration logic, and it
    works perfectly: we tested it by writing data into the FIFO at
    ~200 MHz and reading it out at ~500 MHz, and the asynchronous
    empty-detect logic had worked flawlessly for all of those >10^14
    operations when we stopped the test after a week.
    Probably no real FIFO application will ever go empty 200 million times
    a second...
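
The operation count can be sanity-checked with a little arithmetic. Because the reader is faster than the writer, the FIFO drains after nearly every write, so the empty-detect logic is exercised roughly once per write. The sketch below assumes the test ran continuously at the nominal ~200 MHz write rate for exactly seven days, which the post does not state precisely.

```python
WRITE_RATE_HZ = 200e6            # ~200 MHz write clock, from the post
SECONDS_PER_WEEK = 7 * 24 * 3600  # assumes a full, uninterrupted week


def writes_in(seconds, rate_hz=WRITE_RATE_HZ):
    """Number of writes at a constant rate over the given time."""
    return rate_hz * seconds


total = writes_in(SECONDS_PER_WEEK)
print(f"{total:.3e} writes in one week")
```

The result is on the order of 10^14, consistent with the figure quoted for the week-long test.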
    The high performance is due to very fast and compact full-custom logic,
    and our long experience in analyzing and dealing with the effects of

    Peter Alfke, Xilinx Applications (posting from home)
  4. raul

    raul Guest

    For simulation, are the Xilinx FIFO models any faster than before?
    Just recently I had to write fully-synchronous FIFO models to
    accelerate the simulations and achieved 100X (one hundred times)

  5. Peter Alfke

    Peter Alfke Guest

    Simulating asynchronous clocking must be very difficult and time
    consuming (I dare not use the word "impossible" for fear of being
    flamed). How do you cover all clock phase relationships, down to the
    femtosecond level? Synchronizers operate with that kind of timing
    Peter Alfke, speaking for himself.
  6. raul

    raul Guest

    Event-based simulation allows you to have very fine resolutions. Just
    make sure that all your signals crossing clock domains are flopped and
    that there are no Clock-to-Q delays involved in your model. I have run
    the fast FIFO models in ModelSim PE 6.1a and Veritak 1.75A and they
    behave identically to the Xilinx models.
  7. Peter Alfke

    Peter Alfke Guest

    Raul, this may just reveal my ignorance, but anyhow:

    How do you model metastability, which needs sub-femtosecond resolution?
    How do you model that an asynchronous FIFO generates its EMPTY flag in
    time, even under the most adverse timing conditions between the two
    incoming clocks?
    Those have been things that kept me awake at night :-(

    Peter Alfke
  8. Usually in RTL simulations you don't even want to model things like that.
    The most important thing is to get fast simulation times for the whole
    design. At least in the past, the Xilinx models were overly complex for
    pure RTL simulations, and you usually needed your own simulation models
    to get the speed.

    The correctness of the async fifos must come from the design, reviews
    etc. It's impossible to simulate all the cases.

    Of course, netlist simulations need timing-accurate models, but those
    are a small part of the simulation effort. They are usually run to check
    timing constraints and catch synthesis bugs (if formal verification tools
    are not part of the user's toolset). Asynchronous portions are almost
    impossible to simulate. Nowadays there are also formal tools that check
    clock-domain-crossing correctness and so on. Those tools can even inject
    errors during simulation that metastability could cause (the formal
    portion finds the places to inject them).
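
The error-injection idea in the last paragraph can be sketched behaviorally. The Python model below is a toy, not a description of how any particular tool works: when the asynchronous input changes within an assumed `window` around the sampling edge, the first flip-flop randomly captures either the old or the new value. The function name, the `window` parameter, and the time units are all hypothetical.

```python
import random


def capture(old, new, change_time, edge_time, window, rng):
    """Value captured by the first synchronizer flop at edge_time.

    Outside the window the result is deterministic. Inside it, the outcome
    is picked at random to mimic a metastable event resolving either way.
    """
    if abs(edge_time - change_time) < window:
        return rng.choice((old, new))
    return new if change_time < edge_time else old


rng = random.Random(1)
# The input toggles 0 -> 1 at t=100; sweep the sampling edge across the change.
for edge in (90, 99.95, 100.05, 110):
    value = capture(0, 1, change_time=100, edge_time=edge, window=0.1, rng=rng)
    print(edge, value)
```

A model like this only exercises how downstream logic reacts to either outcome. It does not model where the metastable delay itself comes from.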

  9. Peter Alfke

    Peter Alfke Guest

    Kim, thank you for that clarification. That means I was right in
    considering any simulation of metastability-causing asynchronous
    clocking impossible. There is no substitute for creativity, circuit
    analysis, some deep thinking, and experimentation. All of that we have
    done to verify the metastable behavior of our flip-flops, and to verify
    the behavior of our asynchronous FIFO in Virtex-4.
    Obviously, one can always simulate the effect that a given metastable
    delay has on the rest of the circuitry, but one cannot simulate the
    origin of the metastable delay.
    Peter Alfke, Xilinx Applications
  10. raul

    raul Guest


    There is no need to simulate metastability. The RTL simulations are
    functional. All conditions of empty and full have been verified with
    directed and random behavior over long simulations with clocks sliding
    past each other. The FIFOs are as asymmetrical as 128 bits in and 16
    bits out and with clocks as different as 37.125 MHz and 100 MHz.
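
For the two clock frequencies mentioned above, an idealized simulation with exact periods only ever visits a finite set of phase relationships, which bears on the earlier question of covering all phases. The sketch below counts them with exact rational arithmetic. It assumes ideal, jitter-free clocks that start aligned at time zero, which a real simulation or lab setup need not match, and `write_edge_phases` is a hypothetical helper.

```python
from fractions import Fraction


def write_edge_phases(f_write_hz, f_read_hz):
    """Distinct phases of write edges, as fractions of one read period."""
    periods_ratio = Fraction(f_read_hz, f_write_hz)  # T_write / T_read
    repeat = periods_ratio.denominator  # write cycles until the pattern repeats
    return sorted({(k * periods_ratio) % 1 for k in range(repeat)})


F_WRITE_HZ = 37_125_000
F_READ_HZ = 100_000_000

phases = write_edge_phases(F_WRITE_HZ, F_READ_HZ)
# Convert the phase step (a fraction of the read period) to picoseconds.
step_ps = float(phases[1] - phases[0]) * 1e12 / F_READ_HZ
print(len(phases), "distinct phases, spaced about", round(step_ps, 2), "ps apart")
```

Clocks that deliberately drift ("sliding past each other") visit more phases than this, but still only at the simulator's time resolution.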

    The simulations have been proven correct in the lab on Virtex-2 Xilinx
    FPGAs running for several hours with real data.

    ModelSim PE's code profiler said that time was being spent mostly in
    the Xilinx FIFOs.

  11. Guest

    Hi, Davy -

    You may want to browse a number of papers on my web page for coding
    guidelines and coding styles related to multi-clock design and
    asynchronous FIFO design.

    At the web page:

    Look for the San Jose SNUG 2001 paper:
    Synthesis and Scripting Techniques for Designing Multi-Asynchronous
    Clock Designs

    Look for the San Jose SNUG 2002 paper:
    Simulation and Synthesis Techniques for Asynchronous FIFO Design

    Look for the second San Jose SNUG 2002 paper (co-authored with Peter
    Alfke of Xilinx):
    Simulation and Synthesis Techniques for Asynchronous FIFO Design with
    Asynchronous Pointer Comparisons

    Peter likes the second FIFO style better but the asynchronous nature of
    the design does not lend itself well to timing analysis and DFT.

    I prefer the more synchronous style of the first FIFO paper.

    I hope to have another FIFO paper on my web page soon that uses Peter's
    clever quadrant-based full-empty detection with a more synchronous
    coding style.

    We spend hours covering multi-clock and Async FIFO design in my
    Advanced Verilog Class. These are non-trivial topics that are poorly
    covered in undergraduate training. I have had engineers email me to
    tell me that their manager told them to run all clock-crossing signals
    through a pair of flip-flops and everything should work! WRONG!

    Regards - Cliff Cummings
    Verilog & SystemVerilog Guru
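
One commonly cited reason the "pair of flip-flops on everything" advice fails is multi-bit values such as FIFO pointers: each bit gets its own synchronizer, and bits that change together can resolve differently. The post does not spell this out, so treat the sketch below as an illustrative assumption rather than a summary of the papers. It enumerates what a receiver could capture if every changing bit independently lands on its old or new value, first for a binary count and then for a Gray-coded one. `possible_samples` is a hypothetical helper.

```python
from itertools import product


def to_gray(n):
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def possible_samples(old, new, width):
    """All values a receiver might capture while old changes to new.

    Each bit that differs is assumed to resolve independently to its old
    or new value. Bits that do not change are always captured correctly.
    """
    changing = [i for i in range(width) if ((old ^ new) >> i) & 1]
    results = set()
    for choice in product((0, 1), repeat=len(changing)):
        value = old
        for bit, take_new in zip(changing, choice):
            if take_new:
                value ^= 1 << bit  # flip this bit to its new value
        results.add(value)
    return results


binary = possible_samples(7, 8, width=4)
gray = possible_samples(to_gray(7), to_gray(8), width=4)
print(len(binary), "possible captures for binary 7 -> 8")
print(sorted(gray), "possible captures for Gray-coded 7 -> 8")
```

With a Gray-coded count only one bit changes per increment, so the receiver sees either the old or the new value and never an unrelated one.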