
What's Your Favorite Processor on an FPGA?

Discussion in 'Electronic Design' started by rickman, Apr 20, 2013.

  1. rickman

    rickman Guest

    I have been working on designs of processors for FPGAs for quite a
    while. I have looked at the uBlaze, the picoBlaze, the NIOS, two from
    Lattice and any number of open source processors. Many of the open
    source designs were stack processors since they tend to be small and
    efficient in an FPGA. J1 is one I had pretty much missed until lately.
    It is fast and small, and it looks like it wasn't too hard to design
    (although looks may be deceiving); I'm impressed. There is also the b16
    from Bernd Paysan, the uCore, the ZPU and many others.

    Lately I have been looking at a hybrid approach that combines
    register-style addressing with a stack CPU in order to access parameters.
    It looks interesting.
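
One way that hybrid idea might look (a hypothetical sketch only, since the actual design isn't shown here) is a classic stack machine plus an addressing mode that reads a parameter at a fixed stack depth, the way a register-file access would:

```python
# Hypothetical sketch of a hybrid stack/register CPU idea:
# a plain data stack, plus a "load at depth n" operation that
# fetches a parameter without the usual stack shuffling.
# Class and method names are invented for illustration.

class HybridStackCPU:
    def __init__(self):
        self.data = []      # data stack, top of stack at the end

    def push(self, v):
        self.data.append(v)

    def add(self):          # plain stack op: ( a b -- a+b )
        b = self.data.pop()
        a = self.data.pop()
        self.data.append(a + b)

    def load(self, n):      # register-style access: push a copy
        self.data.append(self.data[-1 - n])  # of the n-th element

cpu = HybridStackCPU()
cpu.push(10)    # parameter 1
cpu.push(20)    # parameter 0
cpu.load(1)     # fetch parameter 1 without popping: stack is 10 20 10
cpu.add()       # 20 + 10: stack is 10 30
```

The `load` step is what a pure stack machine would need an `over`/`pick` sequence for; addressing it by depth is the register-like shortcut.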

    Anyone else here doing processor designs on FPGAs?
  2. Bill Sloman

    Bill Sloman Guest

    Sounds like something where you'd get more responses on comp.arch.fpga.

    Are you cross-posting?
  3. rickman

    rickman Guest

    The choice of an internal vs. an external CPU is a systems design
    decision. If you need so much memory that external memory is warranted,
    then I guess an external CPU is warranted. But that all depends on your
    app. Are you running an OS? If so, why?

    The sort of stuff I typically do doesn't need a USB or Ethernet
    interface, both great reasons to use an ARM... free, working software
    that comes with an OS like Linux. (By free I mean you don't have to
    spend all that time writing or debugging a TCP/IP stack, etc.)

    But there are times when an internal CPU works even for high level
    interfaces. In fact, the J1 was written because they needed a processor
    to stream video over Ethernet and the uBlaze wasn't so great at it.

    I get the impression your projects are about things other than the
    FPGA/CPU you use, and cost/size really aren't so important. In that
    case you have less reason to squeeze on size, power, and unit cost,
    and more reason to minimize development cost. If so, that only makes
    sense.

    My next project will be similar in hardware requirements to a digital
    watch, but with more processing...
  4. rickman

    rickman Guest

    That is not a useful way to look at RAM unless you are talking about
    buying a larger chip than you otherwise need just to get more RAM. That
    is like saying the routing in an FPGA is "expensive" compared to the
    PCB. It is there as part of the device; use it or it goes to waste.

    If you need Ethernet, then Ethernet is useful. But adding Ethernet to
    an FPGA is no big deal. Likewise for nearly any peripheral.

    No point in discussing this very much. Every system has its own
    requirements. If external ARMs are what works for you, great!

    What do you do for the networking code? If you write your own, then you
    are typically doing a lot of work for naught, unless you have special
    requirements.
    So you are using networking code, but no OS?

    The soft cores I work with don't bother with that sort of stuff. The
    apps are much smaller and don't need that level of complexity. In fact,
    that is what they are all about, getting rid of unneeded complexity.

    Ethernet comms can be a hunk of code, but the rest of what you describe
    is pretty simple stuff. I'm not sure there is even a need for a
    processor. Lots of designers are just so used to doing everything in
    software they think it is simple.

    Actually, I think everything you listed above is simple enough for a
    uBlaze. What is the issue with that?

    I find HDL to be the "simple" way to do stuff like I/O and serial comms,
    even signal processing. In fact, my bread and butter is a product with
    signal processing in an FPGA, not because of speed, it is just an audio
    app. But the FPGA *had* to be there. An MCU would just be a waste of
    board space, of which this board has very little.

    Xilinx has that now, you know. What do they call it, Z-something? Zynq?

    How about 144 processors running at 100's of MIPS each? Enough
    processing power that you can devote one to a serial port, one to an SPI
    port, one to flash a couple of LEDs and still have 140 left over. Check
    out the GreenArrays GA144. Around $14 the last time I asked. You won't
    like the development system though. It is the processor equivalent of
    an FPGA. I call it an FPPA, Field Programmable Processor Array. It can
    be *very* low power too if you let the nodes idle when they aren't doing
    anything.
  5. Guest

    ZYNQ. There is a rather low-cost eval board, the ZedBoard ($395),
    which comes with Linux pre-installed on an SD card. The Zynq chip
    onboard contains a hard dual-core Cortex-A9 and ~1M gates' worth of
    7th-generation logic.

  6. rickman

    rickman Guest

    Everyone is entitled to their opinion, but this is *far* from fact. The
    CPUs in my designs have so far been *free* in recurring price. They fit
    in a small part of the lowest priced device I can find.

    Most people think of large, complex code that requires lots of RAM and
    big, fast external CPUs. I think in terms of small, internal processors
    that run fast in a very small code space. So they fit inside an FPGA
    very easily, likely not much bigger than the state machines John talks
    about.

    BTW, have you looked at any of the soft cores? The J1 is pretty amazing
    in terms of just basic simplicity, and fast too at 100 MHz. They talk
    about the source being just 200 lines of Verilog. I don't know how many
    LUTs the design takes, but from the block diagram I expect it is not
    very big. I'm not sure I can improve on it in any significant way.
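
Part of why the source is so short is the instruction format. A simplified decoder sketch, based on the instruction classes described in the published J1 paper (this is an illustration in Python, not the actual Verilog, and the field layout is simplified):

```python
# Rough decode of J1-style 16-bit instructions. Bit 15 set means
# "push literal"; otherwise bits 14:13 select jump, conditional
# jump, call, or ALU operation. Simplified from the J1 paper.

def decode(insn):
    if insn & 0x8000:               # bit 15 set: push 15-bit literal
        return ("lit", insn & 0x7fff)
    op = (insn >> 13) & 0x3         # bits 14:13 pick the class
    target = insn & 0x1fff          # 13-bit target / ALU field
    if op == 0:
        return ("jmp", target)      # unconditional jump
    if op == 1:
        return ("jz", target)       # conditional jump, pops a flag
    if op == 2:
        return ("call", target)     # call, pushes return address
    return ("alu", target)          # ALU op plus stack deltas

assert decode(0x8042) == ("lit", 0x42)
assert decode(0x4005) == ("call", 5)
```

With so few instruction classes, the decode collapses to a handful of muxes, which is consistent with a very small LUT count.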
  7. Guest

    Xilinx Zynq: an ARM Cortex-A9 with an FPGA on the side

  8. Guest

    At one point it did crash a lot, but I haven't had many problems with
    it for the past few years.

  9. You've just described PCI Express.

    - Industry standard fast serial interface.
    - AC-coupled CML (rather than LVDS, but still differential).
    - scalable bandwidth: 2.5, 5.0, or 8.0 Gbps per lane, across 1, 2, 4,
    8, or 16 lanes.
    - allows single access as well as bursts.
    - multi-master (allows DMA).
    - Fabric can be point-to-point (e.g. CPU-FPGA) or can use switches for
    larger networks.
    - in-band interrupts (saves pins).
    - Peripherals (typically) just appear as chunks of memory in the CPU
    address space.
    - Widely supported by operating systems.
    - Supports hot plug.
    - Many FPGAs have hard cores for PCIe.
    - Supported by ARM SoCs (but not the very cheapest ones).
    - compatible with loads of off the shelf chips and cards.
    - Easy to use (although that might be an "eye of the beholder" type of
    thing).

    I wouldn't recommend PCIe for the lowest cost or lowest power products,
    but it's great for the stuff that I do.

  10. I agree about it being designed for throughput, not latency. However,
    with a fairly simple design, we can do 32 bit non-bursting reads or
    writes in about 350ns over a single lane of gen 1 through 1 layer of
    switching. I suspect there's some problem with your implementation
    (unless your 2 microsecond figure was just hyperbole).

    I found the spec clear. It's rather large, though, and a textbook serves
    as a friendlier introduction to the subject than the spec itself.

    One of my co-workers was confused by the way addresses come most
    significant octet first, whilst the data come least significant octet
    first. It makes sense on a little-endian machine, once you get over the
    initial surprise.
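
A quick illustration of that mixed byte ordering (the address value here is an arbitrary example):

```python
import struct

# A 32-bit PCIe memory address travels most-significant octet
# first (big endian on the wire), while 32-bit payload data on
# a little-endian host goes least-significant octet first.

value = 0x12345678
addr_bytes = struct.pack(">I", value)   # on the wire: 12 34 56 78
data_bytes = struct.pack("<I", value)   # in memory:   78 56 34 12

assert addr_bytes == b"\x12\x34\x56\x78"
assert data_bytes == b"\x78\x56\x34\x12"
```

On a little-endian CPU the data side needs no byte swapping at all, which is the sense it makes once the initial confusion wears off.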

    Hot plug is the only thing that gives us headaches. PCIe Hot plug is
    needed when reconfiguring the FPGA while the system is running.
    OS support for hot plug is patchy.
    Partial FPGA reconfiguration is one workaround (leaving the PCIe up while
    reconfiguring the rest of the FPGA), although I haven't tried that in any
    production design yet.


  11. I thought it was faster than that. If I remember, I'll measure some in
    the lab tomorrow.

    BTW, the write requires two packets as well.

    I don't know anything about hot plug support on Windows. On Linux,
    however, there are two ways to do it:

    - True hot plug. You need to use a switch (or root complex) that has
    hardware support for the hot plug signals (particularly "Presence Detect"
    that indicates a card is plugged in). The switch turns these into
    special messages that get sent back to the RC, and the OS should honour
    these and do the right thing. This should work on Windows too, as it's
    part of the standard.

    - Fake hot plug. With the Linux "fakephp" driver you can fake the hot
    plug messages if you don't have hardware support for them. This isn't
    supported in all kernel versions though. Read more here:

    In both cases there can be address space fragmentation that can stop the
    system from working. By that I mean that the OS can't predict what will
    be plugged in, so it can't know to reserve a contiguous chunk of address
    space for your FPGA. The OS may do something stupid like put your
    soundcard right in the middle of the space you wanted. Grrr.
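
The fragmentation problem can be shown with a toy first-fit allocator (all addresses and sizes below are made up for illustration):

```python
# Toy illustration of address-space fragmentation: after a small
# device lands mid-window, a later request for a large contiguous
# region fails even though enough total space remains free.

def first_fit(free, size):
    """Allocate `size` bytes from a list of (base, length) holes."""
    for i, (base, length) in enumerate(free):
        if length >= size:
            free[i] = (base + size, length - size)
            return base
    return None                      # no single hole is big enough

# Two 8 KiB holes left after earlier allocations (e.g. a soundcard
# grabbed the middle of the window): 16 KiB free in total.
free = [(0x0000, 0x2000), (0x3000, 0x2000)]

bar = first_fit(free, 0x3000)        # FPGA wants 12 KiB contiguous
assert bar is None                   # fails despite 16 KiB free
```

The OS has the same constraint: BARs must be contiguous (and naturally aligned), so free space that exists only in scattered pieces doesn't help.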

    Recent versions of the Linux kernel allow you to specify rules regarding
    address allocation to avoid the fragmentation problem, but I've never
    used them, and I'm not a kernel hacker, so I don't know anything about
    them.

  12. Guest

    why is it any more or less evil than big endian?

  13. Nico Coesel

    Nico Coesel Guest

    Most entry level scopes consist of an FPGA running a soft processor.
    You mean PCI express? :)
  14. On Mon, 22 Apr 2013 09:27:04 -0700, John Larkin wrote:

    [ snip pcie hot plug discussion ]
    With fakephp, you should just need to rescan that slot. With proper hot
    swap hardware support, it should just happen automatically. (As if
    anything would go wrong with that!)

    When the hot plug removal event happens, the OS is meant to unload the
    driver.

    The drivers get reloaded after the hot plug insertion event. Possibly
    not the same drivers as before, if the FPGA contains something else.

    Your higher level application needs to be aware that the driver can come
    and go with the hot plug events. You'll need some sort of mechanism to
    inform the application (e.g. a signal).
    Presumably the application is the actual cause of the FPGA
    reconfiguration, in which case it knows when the FPGA is there or not and
    doesn't need to be told.

    I found that just the presence detect was needed for reliable hot plug.
    All the others are optional.


  15. I looked at a trace on a board at work. I was surprised - the writes
    were fast(ish) - about 100 ns was the smallest gap I saw between writes.
    The reads were slower. I didn't see reads closer together than about 2us.

    This seems consistent with Larkin's measurements.

    I'm still surprised though - 2 us is 20000 bit times on a 4 lane gen 1.
    Ok, it's only 16000 bit times before the 8B10B coding.
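
The arithmetic behind those figures checks out like this:

```python
# Sanity check of the bit-time numbers above: a 2 us read
# turnaround on 4 lanes of gen 1 PCIe (2.5 Gb/s raw per lane).

lanes = 4
line_rate = 2.5e9            # raw bits/s per lane, gen 1
latency = 2e-6               # observed read turnaround

raw_bits = lanes * line_rate * latency   # bit times on the wire
data_bits = raw_bits * 8 / 10            # strip 8B10B overhead

assert raw_bits == 20000
assert data_bits == 16000
```

Twenty thousand bit times to move one 32-bit word is why store-and-forward buffering in the switch, rather than the link itself, looks like the suspect.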

    Maybe the switch is configured for store-and-forward rather than cut
    through, or something equally silly.

  16. rickman

    rickman Guest

    A *few* years? PCI has been around for 20 years!

    I think PCI was just a hard intermediary bus. All the other busses
    were tertiary to it and went through it to get to the CPU.

    I think PCIe is a hard intermediary bus too, but it practically has its
    own API, and I would call that pretty advanced. PCI-X is likely fully
    superseded by PCIe, but elements of the original PCI paradigm decidedly
    remain. PCI only had about 11 command codes.
  18. rickman

    rickman Guest

    Sure, for that matter PCs are going away for the mainstream. In 10
    years it will literally be like working on the Enterprise... the space
    ship Enterprise. Everyone will be using tablets and pads, there just
    won't be a need for the traditional PC except for specialties... like
    PCB layout, lol

    There won't be any busses really. It will all be wireless. Maybe it
    will all be powered by a Tesla type power source too. lol

    That doesn't change the fact that PCI was mainstream for well over a
    decade, more like 15 years!

    BTW, are you capable of learning? Or have you reached your learning
    limit?
  19. Robert Baer

    Robert Baer Guest

    What the hell is wrong with PARALLEL?
    You get the _whole_ byte/word/whatever each possible I/O cycle and do
    not have to wait 20+ cycles for preamble bits, 16 data bits, and stop
    bits (maybe more for stupid "framing" because the designer was too lazy
    to enforce assumptions that would speed things up).
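
The cycle count being complained about works out roughly like this (the frame layout here is a generic example, not any particular standard):

```python
# Rough cycle comparison: moving one 16-bit word over a framed
# serial link vs. a parallel bus. Frame fields are a generic
# example (one start bit, one stop bit, no preamble).

start_bits, data_bits, stop_bits = 1, 16, 1
serial_cycles = start_bits + data_bits + stop_bits   # per word
parallel_cycles = 1                                  # whole word at once

assert serial_cycles == 18
assert serial_cycles // parallel_cycles == 18
```

Of course, the serial link's clock usually runs orders of magnitude faster per pin, which is the trade the parallel-bus complaint leaves out.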
  20. Robert Baer

    Robert Baer Guest

    Just give me his feather headband..