
Recommend Book on Basic Video Card Design?

Discussion in 'Electronic Design' started by Jeff Walther, Nov 26, 2003.

  1. Jeff Walther

    Jeff Walther Guest

    As a hobbyist project I would like to design and build a simple video
    card. I would appreciate any recommendations of books (or other
    references) that address this topic. Ideally, I would like something
    which is thorough about all the things that need to be addressed, without
    bogging down in too much specific detail--at least not in the first few
    chapters.

    This is for fun and education, so no pressure. I have a few reference
    books and have read the relevant sections in a few text books. I have an
    EE degree and about a year of experience doing logic design. I've laid
    out a few PCBs and I can do SM soldering on fine pitched parts by hand.

    I don't have embedded programming experience, but I have done some
    assembly and machine language programming in a couple of classes.

    The problem I'm having is that the materials I can find address how to
    design cards for a specific platform, but seem to assume that the reader
    is familiar with interface card design in general. So they just fill in
    the details one needs to work in a specific environment without explaining
    the bigger picture of why these details are needed to make the larger
    device work.

    The materials I've looked at are kind of like learning to program the
    first time by reading a book that only covers the syntax of 'C' without
    explaining anything about the theory behind programming and how programs
    are put together.

    One book I have even has an abbreviated video card example, but it assumes
    that the reader knows everything there is to know about how computer video
    works. For example, I've gathered that some or all video cards generate
    an interrupt at the end of a vertical refresh cycle, but I have no idea why.

    Specifically, (when you stop laughing, please suggest a book) I'm trying
    to design a simple video card for the old (ca. 1989) Macintosh SE/30. I
    have Apple's "Designing Cards and Drivers for the Macintosh Family", 3rd
    Edition, and the Inside Macintosh volumes. And a few text books from my
    EE schooling that touch on computer architecture and such. But I haven't
    seen something that will just flat out explain the logical blocks one
    needs to make a video card work, what a monitor expects to come out of the
    video card, the concepts behind interfacing a card to host, the chunks of
    code (functional descriptions, not listings) that are needed to tie it
    together and why they're needed, etc.

    The SE/30 has a PDS slot to a 68030 but the software interface is based on
    the Mac's NuBus system.

    I can probably hack it out with what I know and what I can find, but it
    sure would be nice to have an instructional text...
  2. Tim Shoppa

    Tim Shoppa Guest

    It goes back many years, but Don Lancaster's _Cheap Video Cookbook_ and
    _TV Typewriter Cookbook_ are where I learned the ropes. No, they don't
    tell you about any modern bus, nor do they tell you about any modern
    video standard, but the principles are the same. And they're real fun
    to read.

  3. rms

    rms Guest

    It goes back many years, but Don Lancaster's _Cheap Video Cookbook_ and
    _TV Typewriter Cookbook_ ...
    I still own both of these. Hard to admit :(

  4. Jeff Walther

    Jeff Walther Guest

    Thank you, Tim. I will try to track down copies.
  5. I'm a software guy, not a hardware guy, but I can tell you why this is.
    It's to allow software to synchronize screen drawing to screen refresh.
    If the program updates the screen image while the electron beam is
    halfway through drawing it, you will momentarily see the old image in
    the upper part of the screen, and the new image in the lower part, and
    this discontinuity produces a "tearing" effect. This can be particularly
    noticeable where a lot of screen updates are going on (animation, games).

    These days, CPUs are fast enough that the programs can typically
    generate new frames at much higher than the screen refresh rate. So
    synchronizing to the screen refresh can actually slow you down. Also
    since the refresh rates themselves are higher, the tearing effect is
    less noticeable anyway. So very few people worry about synchronizing to
    video refresh these days.
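    A minimal sketch of that synchronization, with the raster position simulated in software (the 342 visible lines match the SE/30's internal display; the total line count, and the idea of reading the beam position from a status register, are assumptions for illustration):

```c
/* Sketch of vsync-synchronized updating. VISIBLE_LINES matches the
   SE/30's internal display; TOTAL_LINES (visible + blanking) is an
   assumed figure, and the raster counter is simulated in software
   where real hardware would expose a status register or interrupt. */

#define VISIBLE_LINES 342
#define TOTAL_LINES   370   /* assumed, not a real timing figure */

int beam = 0;                   /* simulated raster line counter */

int current_scanline(void) {    /* stand-in for a hardware register */
    beam = (beam + 1) % TOTAL_LINES;
    return beam;
}

/* Spin until the beam enters vertical blanking; only then is it
   safe to rewrite the visible frame without tearing. */
int wait_for_vblank(void) {
    int line;
    do {
        line = current_scanline();
    } while (line < VISIBLE_LINES);
    return line;
}
```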
  6. Alex Gibson

    Alex Gibson Guest

    Have you had a look at ?

    opensource hdl - hardware description language projects

    You could do what you're after in a reasonably sized FPGA, or in parts.
    You could use a small FPGA or a large CPLD for different parts of the interfacing.

  7. That is so that programs can write to screen memory while the video card
    is not reading from it.

    Older hardware does not have dual-ported RAM; if one were to write to a
    scan line at the time the video refresh hardware tried to read it, one
    would see awful display artifacts. A clear example of this was the
    Sinclair ZX80. Its Basic had 'Fast' and 'Slow' commands that toggled
    whether screen output was sync'ed with the vertical refresh interrupt.

    With dual-ported RAM, one still may want to sync with screen refresh: if
    one updates the screen while it is being drawn, users may briefly see
    half the old and half the new image.

    You will find that many cards can also generate interrupts at the end of
    each scan line. These can be used to improve the capabilities of the
    video card, for instance by switching color tables or video modes
    mid-screen. (IIRC the original Atari game console had only a single
    scan line of video memory; it had to be updated in software while the
    electron beam moved to the start of the next scan line.)

    Both kinds of interrupts also are useful if one wants to support a light
    pen (NB: these probably will only work with CRT systems)

    I suggest searching for 8-bit computer hobbyist sites (especially the
    Atari 400 and 800) to get an idea about the symbiosis of hard- and
    software in video card capabilities.
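    The mid-screen color-table switching described above can be sketched roughly like this; the 16-line bands, the palette count, and the load_palette callback are all invented for illustration, not taken from any real chip:

```c
/* Sketch of switching color tables mid-screen from a scan-line
   interrupt (the Atari 8-bit "display list interrupt" idea). */

#define BAND_HEIGHT  16
#define NUM_PALETTES 4

int palette_for_line(int line) {
    /* each horizontal band of the screen gets its own palette */
    return (line / BAND_HEIGHT) % NUM_PALETTES;
}

/* Called from the end-of-scan-line interrupt. */
void hsync_handler(int line, void (*load_palette)(int)) {
    if (line % BAND_HEIGHT == 0)    /* reload only at band boundaries */
        load_palette(palette_for_line(line));
}
```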

  8. This may perhaps not be completely correct:
    the ZX80 in fast mode had no display AT ALL.
    It would calculate / process the BASIC commands or keyboard presses, then
    when finished the processor would jump to the display routine (the same one
    as in slow mode).
    It would flash on key entry in fast mode because of that.
    When in slow mode, the NMI was called every 1/50th of a second, and the
    display routine activated every frame.
    (And the ZX80 / ZX81 would be really slow because of all those display
    routine calls.)
    I just looked it up, ZX81 ROM disassembly by Dr. Ian Logan
    & Dr. Frank Ohara:
    the NMI routine:

    This routine is entered whenever a 'slow' NMI occurs.

    0066 NMI:      ex af, af'
                   inc a
                   jp m, 006D   ; NMI-RET
                   jr z, 006F   ; NMI-CONT
    006D NMI-RET:  ex af, af'
                   ret

    At 006F is the 'prepare for slow' display routine.

    This issue is very different from the one at issue here:
    in modern cards you can write to one part of memory
    while the other part is displayed, then switch the display
    memory (the address, actually) and write to the other part
    while displaying the first.
    (As opposed to writing to the card's memory only during vertical
    flyback, which would be invisible, but slow, as there are only a few
    milliseconds available every frame.)
    This way writes do not appear on the screen (they could show up as
    small black stripes if writing directly into the display).
    If you also sync the display-memory toggle with the monitor's
    vertical scanning, you do not get horizontally cut picture
    parts, as already pointed out by Mr. D'Oliviero.
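    That address-toggle scheme can be sketched as follows; the buffers are sized for the SE/30's 1-bit internal screen, and the scan-out base "register" is modeled as a plain pointer:

```c
/* Sketch of double buffering by address swap: two full frames of
   memory, with the display base address modeled as a pointer.
   Sized for the SE/30's 512 x 342, 1-bit internal screen. */
#include <stdint.h>

#define FB_BYTES (512 * 342 / 8)    /* 21,888 bytes per frame */

uint8_t buffer_a[FB_BYTES], buffer_b[FB_BYTES];
uint8_t *display = buffer_a;    /* what the video hardware scans out */
uint8_t *draw    = buffer_b;    /* what software renders into */

/* At vertical retrace, swap the roles: no pixels move, only addresses. */
void flip(void) {
    uint8_t *t = display;
    display = draw;
    draw = t;
}
```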

    Actually I once designed a display card, maybe I should write
    a book....
  9. Jeff Walther

    Jeff Walther Guest

    It sounds like there are a number of methods of avoiding this problem.
    That's one of the things I'd really like a reference to cover...

    When switching display memory as mentioned above, does one actually have
    two separate video frames and twice the nominally needed VRAM, or is
    something fancier going on?

    And what did folks do before there was dual ported memory--or if one does
    not wish to use dual ported memory? Was writing to the frame buffer
    consigned completely to the vertical flyback interval?

    For my first cut, I'm considering only supporting grayscale on the
    internal monitor (512 X 342). In a stock machine it's 1 bit, so this is a
    modest improvement. The host CPU/68030 bus runs at 16 MHz and is 32 bits
    data and 32 bits address. So it's not running very fast. Of course, my
    video processor may run considerably faster.

    Anyway, to simplify my first cut, I'm considering just using SRAM for the
    video memory. That would save me needing to multiplex the addresses and
    worry about refresh cycles. But dual-port SRAM is *expensive* so I'd
    rather avoid going dual-port if I can.

    Now, I could just install two frame buffers I guess, because SRAM in small
    capacities like this is fairly inexpensive. But in later revisions I may
    wish to support an external Apple and/or VGA monitor and supply something
    like 4 MB of VRAM and that would get kind of expensive to double.

    On the other hand, I want good performance. On the gripping hand,
    performance must have some ceiling because of the inherent limits of the
    system (16 MHz X 32 bits).
    Pretty please....
  10. Jeff Walther

    Jeff Walther Guest

    Thanks, Alex. I just took a look. There's an open core for a VGA/LCD
    driver for a Wishbone bus host. I hope to learn a thing or three by
    looking over the documentation and verilog.

    I was planning to use an FPGA and parts. What is the difference between
    an FPGA and a CPLD, or is there a reference that covers the finer points
    of PLDs?

    I was thinking my main components would be an FPGA, VRAM of some flavor, a
    Flash for the firmware, and a RAMDAC. I might be able to integrate the
    RAMDAC into the FPGA I guess; it looks like the opencore design does that,
    but I haven't examined it closely yet, so perhaps not.

    I may also need some buffers for the interface to the host, but I may be
    able to put that on the FPGA as well. Oh, and I bet I'll need some kind
    of programmable clock generator.
  11. Joel Kolstad

    Joel Kolstad Guest

    FPGAs are -- canonically -- a 'fine grained' architecture ... bazillions of,
    e.g., 4 input look-up tables, registers and tons of routing. CPLDs are,
    then, large 'sum of products' architectures -- a bazillion inputs can be
    ANDed together, some large number (dozens) of these product terms feed an OR
    gate, it gets fed to a register, of which there are a large number (but not
    bazillions as in an FPGA).

    For the 'old school' definitions of PLDs vs. FPGAs, look at, e.g., PAL20V8
    data sheets vs. Xilinx 4000 series data sheets. Check out the
    comp.arch.fpga newsgroup for more than you ever wanted to know on the
    subject... In the past 5 years or so there's been a significant rise in the
    density, speed, and flexibility of FPGAs and CPLDs -- FPGA vendors are very
    much trying to take over designs that have historically required ASICs
    whereas CPLD vendors (note that many vendors make both FPGAs and CPLDs) are
    often after what used to be FPGA territory.

    Very VERY broadly, CPLDs are better for designs requiring fewer registers
    but large logic 'expressions' that can't easily be pipelined. FPGAs are
    better for designs requiring simpler logic expressions, lots of registers...
    or pipelineable designs with complex logic.

    Video controllers are an almost classic example of a highly pipelineable
    design since -- considering a 2D display -- there's usually nothing
    whatsoever that's 'non-deterministic' in rendering a screen, and it's only
    non-determinism that makes pipelining challenging!
    If you can still find VRAM and RAMDACs any more, by all means use them! :)
    If not, SDRAM and regular DACs work fine too.

    ---Joel Kolstad
  12. This sounds like double-buffering, which is a technique used to avoid
    flicker.

    The problem happens when the program is rendering complex graphics, such
    that it happens slowly enough that you can just barely make out the code
    erasing parts of the image and drawing other bits on top, thus producing
    a flickering effect.

    To avoid this, the program does all its drawing into an offscreen buffer
    that is not visible on-screen at all. Then, when the image is complete,
    it tells the video hardware to exchange the offscreen buffer with the
    onscreen one, which can be done simply by exchanging a couple of address
    pointers, rather than swapping the entire contents of the buffers. Thus,
    the new image appears on screen instantly in its entirety, without
    any flicker.

    Funnily enough, even this kind of hardware support for double-buffering
    isn't so important these days. Systems are now fast enough that a simple
    QuickDraw CopyBits call (or its equivalent on other platforms) can move
    the entire contents of the offscreen buffer into video RAM faster than
    you can blink. Thus, you don't really need the hardware capability to
    switch address pointers: just maintain your own offscreen buffers in
    software. Which also means that the number of offscreen buffers you're
    allowed to have is only limited by available RAM, not by numbers of
    available video hardware mapping registers or whatever.
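    A minimal sketch of that software-only approach, with the card's video RAM simulated by an ordinary array:

```c
/* Sketch of the software-only scheme: compose the frame in an
   offscreen buffer, then blast it into video RAM in one bulk copy
   (the moral equivalent of a full-screen CopyBits). The vram array
   here just stands in for the card's memory. */
#include <string.h>
#include <stdint.h>

#define FB_BYTES (512 * 342 / 8)

uint8_t vram[FB_BYTES];         /* simulated frame buffer on the card */
uint8_t offscreen[FB_BYTES];    /* ordinary RAM, drawn into at leisure */

void present(void) {
    memcpy(vram, offscreen, FB_BYTES);  /* one copy per finished frame */
}
```

    Note that the number of offscreen buffers is then limited only by RAM, exactly as described above.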
    The original compact-form-factor Macintoshes didn't have dual-ported
    memory. Instead, there was some kind of hardware bandwidth-reservation
    scheme, such that the CPU was allowed RAM access for so many cycles,
    after which the next so many cycles were reserved for video, and then
    sound, then the floppy drive, then back to the CPU again.

    As a result of this, even though the nominal clock speed of those
    original Macs was 7.8336 MHz, their effective clock speed was closer to
    6 MHz. Though the Mac SE raised this to 7 MHz as a result of (if I
    recall correctly) allowing the CPU to access RAM 32 bits instead of 16
    bits at a time.

    Even after the Mac II introduced proper dual-ported VRAM in 1987, some
    later-form-factor Mac models still kept reverting to using ordinary DRAM
    for video, notably the IIci and IIsi. (Was this to keep costs down? Yet
    the even cheaper Mac LC could afford to have proper VRAM.) However, the
    video in these models was designed to confine its accesses to just one
    bank of DRAM. Coincidentally (or not), the filesystem cache also resided
    in the same bank. So by setting the cache size large enough (and
    arranging your RAM expansion SIMMs appropriately), you could force all
    other RAM use into the other bank, where it wouldn't be slowed down by
    video accesses.
  13. Even at low-resolution TV modes, that's still something over 15,000
    interrupts per second. No mass-market CPU of 1980s vintage or earlier
    could cope with such a rate of interrupts. And these days, I don't think
    anybody would bother with horizontal-retrace interrupts.
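    The figure checks out for NTSC-style timing (262.5 lines per field at roughly 59.94 fields per second):

```c
/* Back-of-envelope check of the ">15,000 interrupts per second"
   figure for a horizontal-retrace interrupt at NTSC line rate. */
double ntsc_line_rate(void) {
    return 262.5 * 59.94;   /* about 15,734 lines per second */
}
```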

    The only mass-market machine I know of that could do useful things
    between scan lines was the Commodore Amiga. It did this by delegating
    the job to a separate video coprocessor called the "copper" (not to be
    confused with the "blitter", which sped up various raster operations).
    The Amiga didn't have an explicit video buffer as such. Instead, you
    built a linked list of descriptors, each pointing to a segment of main
    memory, called a "copper list". Each item in the list effectively said
    "for the next n scan lines of the display, the pixels come from
    such-and-such a block of memory". Switching pointers in this list could
    be done a lot more quickly than physically moving pixels around.
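    A rough sketch of a copper-list-style structure; the field names and the 640-pixel, 1-bit-per-pixel line width are assumptions for illustration, not the Amiga's actual layout:

```c
/* Each entry says where the next n scan lines' pixels come from;
   retargeting the display means editing pointers, not moving pixels. */
#include <stdint.h>
#include <stddef.h>

#define LINE_BYTES (640 / 8)    /* assumed line width, 1 bpp */

struct copper_entry {
    int nlines;                  /* scan lines covered by this entry */
    uint8_t *pixels;             /* source block in main memory */
    struct copper_entry *next;
};

/* Which memory block feeds a given scan line? */
uint8_t *source_for_line(struct copper_entry *list, int line) {
    for (struct copper_entry *e = list; e != NULL; e = e->next) {
        if (line < e->nlines)
            return e->pixels + (size_t)line * LINE_BYTES;
        line -= e->nlines;
    }
    return NULL;    /* past the end of the display */
}
```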

    (These days, we can physically move pixels around so quickly that nobody
    cares about copper-style cleverness any more.)
    That sounds fairly unlikely to me. The only piece of hardware I've had
    first-hand experience with, that was able to get away with having just
    one scan line of video memory similar to what you describe, was a
    homebrew video-capture system for tracking pucks on an air table for a
    first-year physics experiment. That was connected to a 1980s-vintage BBC
    microcomputer. Even that had custom hardware to detect the brightness
    level transitions within the scan line and latch up to 4 of them into
    counter registers. A machine-language loop running at full speed on the
    CPU could just about read off all these counters once per scan line.
  14. Joel Kolstad

    Joel Kolstad Guest

    There are 'cool video demos' out there for the Commodore 64 that do this.
    But other than creating some fancy video display and playing music, that's
    about _all_ they do since at 1MHz you're getting all of about 65 CPU cycles
    per line to poke at the video chip. (Hence the actual 'processing' is done
    during VBI, etc.)

    Also keep in mind the dozens of PIC/Scenix/etc.-based projects I mentioned
    in the last post that do 'draw' the display in real time. (PICs get ~4
    MIPS, Scenix CPUs get up to ~75!)
    This was absolutely true! However, the Atari 2600's video IC did have a
    'crude' idea of what sprites were (e.g., the CPU didn't have to render every
    single pixel on every single line) and it would just sit around and repeat
    the same thing line after line if you didn't change any of its settings --
    hence many games would only update the display every _other_ scan line,
    dropping the effective CPU update rate to ~7500 times/second. (And most
    games still did their 'processing' in the VBI.)

    A Google search will produce the data sheet for the Atari's video IC if
    you're curious...

    ---Joel Kolstad
  15. Chris Hanson

    Chris Hanson Guest

    Here are some of the things you'll need for a NuBus video card:

    * A declaration ROM, so the card is recognized for what it is. This
    will also include a basic video driver, so the operating system can
    interact with the card.

    * A NuBus bus interface. I know there used to be ICs that managed
    interaction with NuBus, since it's a more complex bus protocol than
    (say) ISA. I think Texas Instruments made some, since they and the MIT
    Lisp Machine project invented NuBus...

    * Some dual-ported VRAM or other memory, and support infrastructure for it.

    * Some sort of digital-to-analog converter that can handle video rates.

    * Information about the timing, shape etc. of the signals you'll need
    to generate for video, and hardware to generate them, possibly based on
    values set by your driver (if you're not just going to support a single
    resolution and bit depth).

    Conceptually, your video card is just going to be ripping through your
    VRAM once per frame, outputting a signal to the DAC based on what it
    reads and conforming to what a monitor expects.
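    The per-frame scan-out, reduced to its essence: march through VRAM in raster order, one value per pixel clock. In this sketch the loop accumulates a checksum instead of driving a real DAC, so it can be sanity-checked in software; 8-bit grayscale at the SE/30's 512 x 342 is an assumption:

```c
/* Software model of the card "ripping through" VRAM once per frame. */
#include <stdint.h>

#define WIDTH  512
#define HEIGHT 342

uint8_t vram[WIDTH * HEIGHT];   /* one byte per pixel, 8-bit grayscale */

unsigned long scan_out(void) {
    unsigned long sum = 0;
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            sum += vram[y * WIDTH + x];   /* this byte would go to the DAC */
    return sum;
}
```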

    I think a good place to start might be with TTL video and an old (I
    mean *old*) PC-style black-and-white monitor. Such a monitor will do
    720 by 348 black-and-white video with non-square pixels, and the
    timings and hardware are very widely understood. You won't need much
    RAM (about 32KB) and all the timings should be pretty manageable. It'd
    be the equivalent of the original Hercules Graphics Card for the IBM PC.

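    The "about 32KB" figure is easy to verify: 720 x 348 at 1 bit per pixel.

```c
/* Frame buffer size for a Hercules-style 720x348 monochrome mode. */
int hercules_bytes(void) {
    return 720 * 348 / 8;    /* 31,320 bytes, just under 32 KB */
}
```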
    -- Chris

  16. Scott Alfter

    Scott Alfter Guest


    Another use of scanline interrupts showed up on the Apple IIGS, where they
    got used most frequently to do 3200-color graphics. You had sixteen
    palettes of 16 colors each, and each scanline could get its colors from one
    palette. By rewriting the palette data once every 16 lines, you could
    display more colors on-screen than usual. There wasn't much CPU time left
    for other stuff, though, so 3200-color mode tends to be used only for image
    viewers and similar types of apps.
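    The arithmetic behind the mode's name: without the trick you get 16 palettes of 16 colors; with palette rewriting, each of the 200 visible scan lines can effectively have 16 colors of its own.

```c
/* Apple IIGS super-hires color counts, plain vs. per-line palettes. */
int colors_plain(void)   { return 16 * 16; }    /* 256 */
int colors_perline(void) { return 200 * 16; }   /* 3200 */
```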

  17. Funny, I have a Spartan II 200k gates FPGA here, as a video test generator /
    video display.
    I use a r2r ladder directly on the 3.3 V output driving a 75 Ohm cable via an
    emitter follower.
    You can play plenty with the thing to make all sorts of things....
    I was digitizing video with another r2r ladder, using the input comparator of
    the FPGA and successive approximation.
    Speed depends on the r2r settle time...
    On you can find some PAL color bar generator (or
    was it on, maybe another one).
    It is great to play.
    A real RGB DA is much better, and so is a real flash AD; I have some TDA7808
    ADs here.
    To play with composite out using the FPGA, you can stick it into a normal TV;
    you need no ISA or PCI card, and can use an FPGA proto board (I use a Digilab).
    Then once you know what you want, you can make a board.
    One thing I want to do with this is digitize component video from a DVD
    player, then do the conversion to 32 kHz 50 Hz and RGB DA to a PC monitor.
    This will use the FPGA block RAM. Also on the 'experiment' list is a simple
    time base corrector (to make really straight lines from old VHS tapes):
    delay composite by one half line (sample at 27 MHz with the TDA7808, then,
    say, store a line in block RAM, and correct the line start on output). It is
    easy to measure sync start differences, as the sync is also digitized, so
    maybe no external memory is needed for a simple horizontal timing corrector
    (to get rid of the waving head switching in old VHS).
    I have most of the design worked out for that (filters etc. with inductors;
    still need to make a board).
    You need a low pass before digitizing...
    In theory one could perhaps make a 4 chip full field timebase corrector..
    Just a hobby project; it proceeds very slowly.
  18. Actually, I think it was the video/sound/floppy system that was able to
    access 32 bits at a time. The CPU in the Mac SE was still 16 bits.
  19. Note that VRAM isn't truly dual-ported. The second port is little more
    than a shift register feeding a continuous stream of bits to the DACs
    for producing the video signal. This is because it needs to provide
    fast, sequential read-only access, which is less demanding than fast,
    random read/write access, which is what the CPU interface needs. It also
    means less slowdown due to contention with the latter.

    By the way, I think you'll find that the video output port is still
    sending out VRAM pixels during the horizontal and vertical
    retrace--these parts of the VRAM are in fact wasted, since their
    contents never appear on-screen. This is because it is cheaper to
    provide extra VRAM for this purpose, than to add special circuitry to
    stop and restart the shift register during the retrace intervals. Don't
    you just love the tradeoffs in VLSI design :).
  20. There is no need to be finished with your drawing at the end of the
    vertical flyback interval. It is not a problem if you are still painting
    line x > 1 at that time.

    This observation leads to the following trick:

    - wait for the vertical interrupt
    - update video RAM, working from top to bottom

    Also, I believe some videochips had a register from which one could read
    out the current scan line. One could use that to check whether it was
    'safe' to write a given scan line.
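    That "safe to write" test against a current-scan-line register reduces to a one-line comparison; this is a deliberate simplification that ignores the beam wrapping into the next frame:

```c
/* A row may be rewritten without tearing once the beam has already
   drawn past it in the current frame (scan-line register polled by
   the caller; the wrap to the next frame is ignored here). */
int safe_to_write(int row, int beam_line) {
    return beam_line > row;
}
```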

    You may get more detailed (and more correct) info on ancient video
    hardware on <>
