Connect with us

32bit multiplication using TTL logic

Discussion in 'Electronic Basics' started by logjam, Feb 16, 2006.

Scroll to continue with content
  1. logjam

    logjam Guest

    I want to design what would have been a super computer in 1975, using
    parts that would have been "easily" available. Several people have
    done this already, just not on an insane scale. My favorite is the
    Magic-1, which I'm currently designing PCBs for. It has been a GREAT
    learning experience studying the schematics from the Magic-1. I never
    thought much about what a computer did during the mysterious clock
    cycles between instruction cycles, and now with the microcode muxes,
    registers, it all just makes sense.

    Any way, I need my computer to be very good at math. The magic-1 is a
    16bit Add/sub/compare machine. I want to make mine capable of 32bit
    operations and have a 32 bit data path.

    I'm starting with 32bit multiplication because it gets 32bit addition
    out of the way as well.

    I threw together a diagram of a 32*32 multiplier using around 1040 AND
    gates and 400 74283 Adder circuits. This schematic would be what I
    think is called "asynchronous".

    Once I completed my diagrams I remembered something horrible. There is
    a LOT of time spent waiting for the carry-in-out propagation in the
    hundreds of adders which leads me to building an adder out of XOR and
    AND gates. This comes at the added expense of power consumption board
    space, but should speed things up by about 250x. Could anyone suggest
    the best 1 bit addition "block" with carry? Its pretty late here and
    I'm starting to loose brain function.

    Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to
    make a display...which I haven't finished yet... (building the drivers)

    One last thought for those of you who, well, you know who you are.
    Idle, what would the power consumption of some 600 odd ICs be? 20ma
    per device? This just might make more heat than a pentium!

    Now...for floating point... ;)
     
  2. You really need (in this order) a girlfriend and an FPGA development
    kit.
     
  3. zoot

    zoot Guest

    In the good old days, as I remmeber, Mulipications /dvisions were done
    by sofwtare ( OS libraries ),
    Some Aritmetics unit did it Async, most of the time using some
    microcode or a hardwwre ( that's even older ) but none did it Sync ( 32
    bits in parall )
    Not sure butseems to to me 32bits* 32 bits = 64bits ??

    Regards
     
  4. Bipolar Integrated Technologies, I think they were in Beaverton,
    Oregon or nearby, used to make pure high-speed floating point ALU
    chips. Might see if there is a white paper or two floating around
    from them.

    Also, the PDP-11 had some entries added later on which supported
    floating point. It's possible that someone has already developed some
    Verilog or VHDL for such a beast and you might be able to look at the
    code for ideas.

    Finally, this was posted over on comp.arch.embedded and it may help
    you think more closely about the class of individuals you are soon to
    join:

    http://www.cs.pdx.edu/~harry/Relay/index.html

    ;)

    Jon
     
  5. Chris

    Chris Guest

    The fabled "Machina Analytica"! Terminal blocks for the address and
    data busses! Relays everywhere! St. Leibowitz and the Venerable
    Boedullos are smiling somewhere, Jonathan.

    But is that a 32K X 8 SRAM I see?...

    "...He turned to the community, spread his hands, shrugged eloquently.

    " 'It's still not Him', he told them sourly, then hobbled away.

    "Afterwards, there was little formality."

    Chris
     
  6. Chris

    Chris Guest

    The fabled "Machina Analytica"! Terminal blocks for the address and
    data busses! Relays everywhere! St. Leibowitz and the Venerable
    Boedullos are smiling somewhere, Jonathan.

    But is that a 32K X 8 SRAM I see?...

    "...He turned to the community, spread his hands, shrugged eloquently.

    " 'It's still not Him', he told them sourly, then hobbled away.

    "Afterwards, there was little formality."

    Chris
     
  7. logjam

    logjam Guest

    I found his web page just a month or two ago also. But the relatively
    short life of a relay worries me. I found a few auctions on ebay for
    3000 relays at ~$200, but I don't want to hand wire such a thing. :)
    SOMEONE needs to make a relay based web server. Maybe it only says
    "Hello World", which would probably take about a minute or two...

    When I thought about it last night I came up with a new solution. A
    compromise between speed and board space. This is my first experience
    with binary multiplication, so my first implimentation was very
    wasteful!

    I wanted to have a 64bit adder circuit with parallel carry generation,
    but I forgot about fan-in/out. Getting to the end of the adder circuit
    would require a LOT of components too. Maybe 16bit parallel by 4
    serial is more reasonable.

    If I try to multiply 1111 by 1111 I should get 11100001.

    Carry bit | multiplicand | multiplier

    Start:
    0 0000 1111 - Next Op, P0=1, add 1111 0000 to 0000 1111
    0 1111 1111 - Next Op, shift right 1 bit
    0 0111 1111 - Next Op, P0=1, add 1111 0000 to 0111 1111
    1 0110 1111 - Next Op, shift right 1 bit
    0 1011 0111 - Next Op, P0=1, add 1111 0000 to 1011 0111
    1 1010 0111 - Next Op, shift right 1 bit
    0 1101 0011 - Next Op, P0=1, add 1111 0000 to 1101 0011
    1 1100 0011 - Next Op, shift right 1 bit
    0 1110 0001 - 4 repetitions? yes
    Done

    Requirements are a register to store the 32bit multiplicand, 64bit flip
    flop shift right register (carry out from adder gets shifted in),
    possibly a 64bit register to store the product/working value.

    This would require the least amount of parts, but limit me to
    multiplication with the circuit. It would be pretty fast since it only
    needs a 32bit adder instead of a 64bit.

    In the end it might be worth having a separate multiply / shift right
    circuit and then an additional add / subtract circuit. It all depends
    on the speed path of the full 64bit adder.

    ALU flow would probably come from a fast ROM or even SRAM loaded by the
    host at boot-up.

    Anything I've missed that could speed up this operation? I'm
    interested in reducing components, but not at a huge speed drop. The
    process will take 32 loops, so 10ns more per loop is a lot.
     
  8. What's really neat for me, I think... it's right here in my city!! So
    I may hope to meet him and watch it work.

    Jon
     
  9. Nor would I.
    Well, remember that each and every "bit" of memory requires a lot of
    space and handiwork to make. It would be excessively large, memory
    wise. I think anyone doing something like this, with relays, is going
    to be very conscious of getting the most from the least effort. And
    so a lot of thought will go into that aspect. (Which is why I'd like
    to visit, because I'm sure I'd learn a few things developed out of
    that kind of close thinking about where each additional relay added
    has the most payoff in the result.)
    I think a lot of folks have been down the same path you are going, as
    well. Trade-offs between combinatorial and sequential approaches.
    Particularly, in this area of division, multiplication, and so on.

    When thinking about an adder, though, why not take this to an extreme
    here and think about doing it with a single full adder and sequential
    logic? One bit at a time. Between that and a fully combinatorial
    solution lays a lot of fun.
    First off, just by way of interesting information on the side, in
    multipliers there is a trade-off in combinatorial vs synchronous
    implementations. A 4x4's combinatorial equivalent gate area just
    about equals a sequential implementation of the same thing, in fact.
    Below a 4x4, combinatorial is actually less gates. Above 4x4,
    sequential is less gates. Your 4x4 suggestion here is right on the
    cusp. You can go either way.
    Since you are talking sequential here, the common approach I've
    studied is Booth's algorithm and "shift-and-add." For shift and add,
    the structure looks like:

    4
    X ================================\
    \\
    4 ,-------, ||
    Y =======>| YR | ||
    '-------' /===\ ||
    ||4 // \\ ||
    || || || ||
    \/ \/ || ||
    ,-----, ,-----, || ||
    \ \ / / || ||
    \ \ / / || ||
    \ \/ / || ||
    /\ / || ||
    / \ / || ||
    1/ '----' // ||
    / || // ||
    / ||4 // 4 || 8
    / || /====== || =========> ANSWER
    | || // || //
    | || // || //4
    v \/ // \/ //
    ,---, ,--------, ,--------,
    0-->| C |--->| ALUR |--->| XR |
    '---' '--------' '--------'
    /\
    ||
    0000

    You start out through your execution control logic (not shown), by
    latching X into XR, Y into YR, 0000 into ALUR, 0 into C, and some
    sequential step counter (not shown) to an initial value of 4. That
    counter will have to be able to decrement and test for 0 (not shown.)

    That is step (0).

    Then the following logic is followed:

    (1) if LSB of XR is 1, latch C:ALUR from ALU.
    (2) shift right, "0":C:ALUR:XR
    (3) decrement counter
    (4) if counter not zero, go to step (1)
    (5) Result is ALUR:XR

    YR C ALUR XR counter
    (0) 1111 0 0000 1111 4
    (1) 1111 0 1111 1111 4
    (2) 1111 0 0111 1111 4
    (3) 1111 0 0111 1111 3
    (1) 1111 1 0110 1111 3
    (2) 1111 0 1011 0111 3
    (3) 1111 0 1011 0111 2
    (1) 1111 1 1010 0111 2
    (2) 1111 0 1101 0011 2
    (3) 1111 0 1101 0011 1
    (1) 1111 1 1100 0011 1
    (2) 1111 0 1110 0001 1
    (3) 1111 0 1110 0001 0
    (5) ============

    Is that about what you are talking about here? It only requires a
    4-bit adder to produce an 8-bit output.
    Of course.
    Have you looked at Booth's?
    I don't want to spend a lot of time thinking more about this. It's
    all been done, somewhere. Have you looked over the literature on the
    subject?

    Jon
     
  10. John Larkin

    John Larkin Guest


    The IBM 704 did floating point with tubes!

    http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP704.html


    John
     
  11. Chris

    Chris Guest

    I'd like to see it in operation, too. Obviously a rather extensive
    object lesson in building an ALU -- arcane, but very appropriate to his
    classwork. No one will ever accuse him of not taking his job
    seriously. ;-)

    There are four relays for the clock -- I'm wondering if he's got a
    three-phase or two phase clock.

    Clock speed = 10Hz? Mirth still lives in our world -- if nothing else,
    Professor Porter has a sense of humor.

    Chris
     
  12. logjam

    logjam Guest

    If you guys want to see/hear a much bigger relay computer, look here:

    http://www.relaiscomputer.de/

    Click on Downloads and then listen to all of the sounds. In one of
    them you can here the electric typewriter banging out the answer! :)

    1500 relays vs 400
     
  13. logjam

    logjam Guest

    Well, I've drawn the 4 bit adder in my schematic program and I'm not
    sure the board space consumption from a 32bit parallel carry adder are
    worth the very little speed increase.

    I want to make sure that I'm using parts that were available in the
    late 70s. Does anyone know of a list of parts and their introduction
    date? Or an old databook in PDF format?
     
  14. Rich Grise

    Rich Grise Guest

    http://www.google.com/search?hl=en&q=ttl+data+book

    Cheers!
    Rich
     
  15. Rich Grise

    Rich Grise Guest

    Look up "Carry Lookahead". :)

    Have Fun!
    Rich
     
  16. Guest

    Guest Guest

    : I want to design what would have been a super computer in 1975, using
    : parts that would have been "easily" available. Several people have
    : done this already, just not on an insane scale. My favorite is the
    : Magic-1, which I'm currently designing PCBs for. It has been a GREAT
    : learning experience studying the schematics from the Magic-1. I never
    : thought much about what a computer did during the mysterious clock
    : cycles between instruction cycles, and now with the microcode muxes,
    : registers, it all just makes sense.

    : Any way, I need my computer to be very good at math. The magic-1 is a
    : 16bit Add/sub/compare machine. I want to make mine capable of 32bit
    : operations and have a 32 bit data path.

    : I'm starting with 32bit multiplication because it gets 32bit addition
    : out of the way as well.

    : I threw together a diagram of a 32*32 multiplier using around 1040 AND
    : gates and 400 74283 Adder circuits. This schematic would be what I
    : think is called "asynchronous".

    : Once I completed my diagrams I remembered something horrible. There is
    : a LOT of time spent waiting for the carry-in-out propagation in the
    : hundreds of adders which leads me to building an adder out of XOR and
    : AND gates. This comes at the added expense of power consumption board
    : space, but should speed things up by about 250x. Could anyone suggest
    : the best 1 bit addition "block" with carry? Its pretty late here and
    : I'm starting to loose brain function.

    : Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to
    : make a display...which I haven't finished yet... (building the drivers)

    : One last thought for those of you who, well, you know who you are.
    : Idle, what would the power consumption of some 600 odd ICs be? 20ma
    : per device? This just might make more heat than a pentium!

    : Now...for floating point... ;)

    While some others have suggested changes like using a
    carry-lookahead adder for the final carry-propagate adder in the
    multiplier, I would suggest that you also look into Booth encoding (the
    two techniques can both be used.) At the expense of a little bit of extra
    complexity to generate the partial products, you can cut the number of
    rows of partial products approximately in half. As far as using a CLA for
    the final adder, I think that that's a good idea, because, if you are
    hell-bent on building this out of TTL parts,there is a 4-bit TTL
    carry-lookahead block (74x182?, if I recall.)

    So, a Booth-Array Carry-Save portion, followed by a
    Carry-Lookahead Carry Propagate portion, will be fast, and do-able, if
    you're really hell-bent on making this out of TTL parts. I also agree
    with the other poster that said that you really should get yourself a FPGA
    development platform, but whatever floats your boat!

    Have Fun,

    Joe
     
  17. Rich Grise

    Rich Grise Guest

    Yup. Crazy. Nuts. Certifiable. Bonkers. Wacko. Insane. ;-P
    Stupid? Remains to be seen. ;-D ;-D ;-D
    Get the integers going first! ;-)
    [enlightening discussion of carry techniques snipped]

    I've drawn a schematic using Xilinx's WebPack, of a 4x4 multiplier:
    http://www.abiengr.com/~sysop/images/mult-4x4-sch.gif

    The ANDs are just plain ANDs, and the ALU-looking thingies are nothing
    but 1-bit full adders.

    And I've been procrastinating on doing enough of their tutorials to
    figure out how to do the guzintas[1] and the guzoutas[2] and schtuff. ;-)

    Cheers!
    Rich
    [1] Goes Into
    [2] Goes [comes] Out Of
     
  18. logjam

    logjam Guest

    So, a Booth-Array Carry-Save portion, followed by a
    Do you think I will gain anything useful while trying to use the fewest
    parts to generate the result fast? From what I've seen so far of some
    FPGA desings they're pretty wastful "because you can".
     
  19. logjam

    logjam Guest

    I think I've fallen in love with 74S274 and 74S275. I don't think the
    130ns for a 32 bit product includes addition, maybe it does. I wonder
    why these parts had to be killed off???

    The parts are pretty expensive...$1.99 for a 274 and 2.99 for a 275,
    but compared to a few hundred logic gates??? Using these chips I might
    be able to get more than a million multiplications per second.
     
  20. logjam

    logjam Guest

    Here some links I found helpful:
    http://www.ece.ucsb.edu/~kastner/ece154/slides/ece154-5.pdf
    http://www.ecs.umass.edu/ece/koren/arith/slides/
    http://www.iwu.edu/~wzhu/classes/cs256/

    I think I like the combinational method used in the 1976 databook.
    Very fast, exceeds everything else I've found.

    I have two choices...use parts FROM 1975 for the circuit, or make it
    out of modern 283s, etc. Using the origional 74S parts I estimate 2A
    power consumption.

    I've found list prices at:
    74S274s for 1.99, I would need 64 of them
    74S275s for 2.99, (I haven't quite figured this one out yet, somewhere
    around 15)

    I would need 4 74F283s for every 74S274, but they're pretty cheap. In
    the end it would be smarter to go with 74S274s even though 256 16 pin
    devices on a PCB would look cool. I could go either way on this one.
    :) According to the 1975 ttl book a 74S283 completes a carry in 11ns
    and a 74S274 4x4 in 45ns, so I assume the 74S274 is pretty much 4
    74S283 and 16 AND gates The critical speed path is carry-in-out on all
    the 64 partial products is typ 5ns for all the AND gates and then 34ns
    through the adders (maximum of 7.5ns AND and 48ns adders).

    As long as the circuit would work fine with the parts from the 75 I'm
    happy. I don't need to buy a bunch of NOS chips, but I want to make
    sure I account for the delay of the old chips.

    I'll work a little more on this and post some schematics that hopefully
    one of you can look over. :) This is pretty exciting for me. :)
     
Ask a Question
Want to reply to this thread or ask your own question?
You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.
Electronics Point Logo
Continue to site
Quote of the day

-