32bit multiplication using TTL logic

Discussion in 'Electronic Basics' started by logjam, Feb 16, 2006.

1. logjamGuest

I want to design what would have been a super computer in 1975, using
parts that would have been "easily" available. Several people have
done this already, just not on an insane scale. My favorite is the
Magic-1, which I'm currently designing PCBs for. It has been a GREAT
learning experience studying the schematics from the Magic-1. I never
thought much about what a computer did during the mysterious clock
cycles between instruction cycles, and now with the microcode muxes,
registers, it all just makes sense.

Any way, I need my computer to be very good at math. The magic-1 is a
16bit Add/sub/compare machine. I want to make mine capable of 32bit
operations and have a 32 bit data path.

I'm starting with 32bit multiplication because it gets 32bit addition
out of the way as well.

I threw together a diagram of a 32*32 multiplier using around 1040 AND
gates and 400 74283 Adder circuits. This schematic would be what I
think is called "asynchronous".

Once I completed my diagrams I remembered something horrible. There is
a LOT of time spent waiting for the carry-in-out propagation in the
AND gates. This comes at the added expense of power consumption board
space, but should speed things up by about 250x. Could anyone suggest
the best 1 bit addition "block" with carry? Its pretty late here and
I'm starting to loose brain function.

Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to
make a display...which I haven't finished yet... (building the drivers)

One last thought for those of you who, well, you know who you are.
Idle, what would the power consumption of some 600 odd ICs be? 20ma
per device? This just might make more heat than a pentium!

Now...for floating point...

2. Anthony FremontGuest

You really need (in this order) a girlfriend and an FPGA development
kit.

3. zootGuest

In the good old days, as I remmeber, Mulipications /dvisions were done
by sofwtare ( OS libraries ),
Some Aritmetics unit did it Async, most of the time using some
microcode or a hardwwre ( that's even older ) but none did it Sync ( 32
bits in parall )
Not sure butseems to to me 32bits* 32 bits = 64bits ??

Regards

4. Jonathan KirwanGuest

Bipolar Integrated Technologies, I think they were in Beaverton,
Oregon or nearby, used to make pure high-speed floating point ALU
chips. Might see if there is a white paper or two floating around
from them.

floating point. It's possible that someone has already developed some
Verilog or VHDL for such a beast and you might be able to look at the
code for ideas.

Finally, this was posted over on comp.arch.embedded and it may help
you think more closely about the class of individuals you are soon to
join:

http://www.cs.pdx.edu/~harry/Relay/index.html

Jon

5. ChrisGuest

The fabled "Machina Analytica"! Terminal blocks for the address and
data busses! Relays everywhere! St. Leibowitz and the Venerable
Boedullos are smiling somewhere, Jonathan.

But is that a 32K X 8 SRAM I see?...

"...He turned to the community, spread his hands, shrugged eloquently.

" 'It's still not Him', he told them sourly, then hobbled away.

"Afterwards, there was little formality."

Chris

6. ChrisGuest

The fabled "Machina Analytica"! Terminal blocks for the address and
data busses! Relays everywhere! St. Leibowitz and the Venerable
Boedullos are smiling somewhere, Jonathan.

But is that a 32K X 8 SRAM I see?...

"...He turned to the community, spread his hands, shrugged eloquently.

" 'It's still not Him', he told them sourly, then hobbled away.

"Afterwards, there was little formality."

Chris

7. logjamGuest

I found his web page just a month or two ago also. But the relatively
short life of a relay worries me. I found a few auctions on ebay for
3000 relays at ~\$200, but I don't want to hand wire such a thing.
SOMEONE needs to make a relay based web server. Maybe it only says
"Hello World", which would probably take about a minute or two...

When I thought about it last night I came up with a new solution. A
compromise between speed and board space. This is my first experience
with binary multiplication, so my first implimentation was very
wasteful!

I wanted to have a 64bit adder circuit with parallel carry generation,
but I forgot about fan-in/out. Getting to the end of the adder circuit
would require a LOT of components too. Maybe 16bit parallel by 4
serial is more reasonable.

If I try to multiply 1111 by 1111 I should get 11100001.

Carry bit | multiplicand | multiplier

Start:
0 0000 1111 - Next Op, P0=1, add 1111 0000 to 0000 1111
0 1111 1111 - Next Op, shift right 1 bit
0 0111 1111 - Next Op, P0=1, add 1111 0000 to 0111 1111
1 0110 1111 - Next Op, shift right 1 bit
0 1011 0111 - Next Op, P0=1, add 1111 0000 to 1011 0111
1 1010 0111 - Next Op, shift right 1 bit
0 1101 0011 - Next Op, P0=1, add 1111 0000 to 1101 0011
1 1100 0011 - Next Op, shift right 1 bit
0 1110 0001 - 4 repetitions? yes
Done

Requirements are a register to store the 32bit multiplicand, 64bit flip
flop shift right register (carry out from adder gets shifted in),
possibly a 64bit register to store the product/working value.

This would require the least amount of parts, but limit me to
multiplication with the circuit. It would be pretty fast since it only

In the end it might be worth having a separate multiply / shift right
circuit and then an additional add / subtract circuit. It all depends
on the speed path of the full 64bit adder.

ALU flow would probably come from a fast ROM or even SRAM loaded by the
host at boot-up.

Anything I've missed that could speed up this operation? I'm
interested in reducing components, but not at a huge speed drop. The
process will take 32 loops, so 10ns more per loop is a lot.

8. Jonathan KirwanGuest

What's really neat for me, I think... it's right here in my city!! So
I may hope to meet him and watch it work.

Jon

9. Jonathan KirwanGuest

Nor would I.
Well, remember that each and every "bit" of memory requires a lot of
space and handiwork to make. It would be excessively large, memory
wise. I think anyone doing something like this, with relays, is going
to be very conscious of getting the most from the least effort. And
so a lot of thought will go into that aspect. (Which is why I'd like
to visit, because I'm sure I'd learn a few things developed out of
has the most payoff in the result.)
I think a lot of folks have been down the same path you are going, as
well. Trade-offs between combinatorial and sequential approaches.
Particularly, in this area of division, multiplication, and so on.

When thinking about an adder, though, why not take this to an extreme
here and think about doing it with a single full adder and sequential
logic? One bit at a time. Between that and a fully combinatorial
solution lays a lot of fun.
First off, just by way of interesting information on the side, in
multipliers there is a trade-off in combinatorial vs synchronous
implementations. A 4x4's combinatorial equivalent gate area just
about equals a sequential implementation of the same thing, in fact.
Below a 4x4, combinatorial is actually less gates. Above 4x4,
sequential is less gates. Your 4x4 suggestion here is right on the
cusp. You can go either way.
Since you are talking sequential here, the common approach I've
the structure looks like:

4
X ================================\
\\
4 ,-------, ||
Y =======>| YR | ||
'-------' /===\ ||
||4 // \\ ||
|| || || ||
\/ \/ || ||
,-----, ,-----, || ||
\ \ / / || ||
\ \ / / || ||
\ \/ / || ||
/\ / || ||
/ \ / || ||
1/ '----' // ||
/ || // ||
/ ||4 // 4 || 8
/ || /====== || =========> ANSWER
| || // || //
| || // || //4
v \/ // \/ //
,---, ,--------, ,--------,
0-->| C |--->| ALUR |--->| XR |
'---' '--------' '--------'
/\
||
0000

You start out through your execution control logic (not shown), by
latching X into XR, Y into YR, 0000 into ALUR, 0 into C, and some
sequential step counter (not shown) to an initial value of 4. That
counter will have to be able to decrement and test for 0 (not shown.)

That is step (0).

Then the following logic is followed:

(1) if LSB of XR is 1, latch C:ALUR from ALU.
(2) shift right, "0":C:ALUR:XR
(3) decrement counter
(4) if counter not zero, go to step (1)
(5) Result is ALUR:XR

YR C ALUR XR counter
(0) 1111 0 0000 1111 4
(1) 1111 0 1111 1111 4
(2) 1111 0 0111 1111 4
(3) 1111 0 0111 1111 3
(1) 1111 1 0110 1111 3
(2) 1111 0 1011 0111 3
(3) 1111 0 1011 0111 2
(1) 1111 1 1010 0111 2
(2) 1111 0 1101 0011 2
(3) 1111 0 1101 0011 1
(1) 1111 1 1100 0011 1
(2) 1111 0 1110 0001 1
(3) 1111 0 1110 0001 0
(5) ============

Is that about what you are talking about here? It only requires a
4-bit adder to produce an 8-bit output.
Of course.
Have you looked at Booth's?
all been done, somewhere. Have you looked over the literature on the
subject?

Jon

10. John LarkinGuest

The IBM 704 did floating point with tubes!

http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP704.html

John

11. ChrisGuest

I'd like to see it in operation, too. Obviously a rather extensive
object lesson in building an ALU -- arcane, but very appropriate to his
classwork. No one will ever accuse him of not taking his job
seriously. ;-)

There are four relays for the clock -- I'm wondering if he's got a
three-phase or two phase clock.

Clock speed = 10Hz? Mirth still lives in our world -- if nothing else,
Professor Porter has a sense of humor.

Chris

12. logjamGuest

If you guys want to see/hear a much bigger relay computer, look here:

http://www.relaiscomputer.de/

Click on Downloads and then listen to all of the sounds. In one of
them you can here the electric typewriter banging out the answer!

1500 relays vs 400

13. logjamGuest

Well, I've drawn the 4 bit adder in my schematic program and I'm not
sure the board space consumption from a 32bit parallel carry adder are
worth the very little speed increase.

I want to make sure that I'm using parts that were available in the
late 70s. Does anyone know of a list of parts and their introduction
date? Or an old databook in PDF format?

Cheers!
Rich

Have Fun!
Rich

16. GuestGuest

: I want to design what would have been a super computer in 1975, using
: parts that would have been "easily" available. Several people have
: done this already, just not on an insane scale. My favorite is the
: Magic-1, which I'm currently designing PCBs for. It has been a GREAT
: learning experience studying the schematics from the Magic-1. I never
: thought much about what a computer did during the mysterious clock
: cycles between instruction cycles, and now with the microcode muxes,
: registers, it all just makes sense.

: Any way, I need my computer to be very good at math. The magic-1 is a
: 16bit Add/sub/compare machine. I want to make mine capable of 32bit
: operations and have a 32 bit data path.

: I'm starting with 32bit multiplication because it gets 32bit addition
: out of the way as well.

: I threw together a diagram of a 32*32 multiplier using around 1040 AND
: gates and 400 74283 Adder circuits. This schematic would be what I
: think is called "asynchronous".

: Once I completed my diagrams I remembered something horrible. There is
: a LOT of time spent waiting for the carry-in-out propagation in the
: AND gates. This comes at the added expense of power consumption board
: space, but should speed things up by about 250x. Could anyone suggest
: the best 1 bit addition "block" with carry? Its pretty late here and
: I'm starting to loose brain function.

: Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to
: make a display...which I haven't finished yet... (building the drivers)

: One last thought for those of you who, well, you know who you are.
: Idle, what would the power consumption of some 600 odd ICs be? 20ma
: per device? This just might make more heat than a pentium!

: Now...for floating point...

While some others have suggested changes like using a
multiplier, I would suggest that you also look into Booth encoding (the
two techniques can both be used.) At the expense of a little bit of extra
complexity to generate the partial products, you can cut the number of
rows of partial products approximately in half. As far as using a CLA for
the final adder, I think that that's a good idea, because, if you are
hell-bent on building this out of TTL parts,there is a 4-bit TTL
carry-lookahead block (74x182?, if I recall.)

So, a Booth-Array Carry-Save portion, followed by a
Carry-Lookahead Carry Propagate portion, will be fast, and do-able, if
you're really hell-bent on making this out of TTL parts. I also agree
with the other poster that said that you really should get yourself a FPGA
development platform, but whatever floats your boat!

Have Fun,

Joe

17. Rich GriseGuest

Yup. Crazy. Nuts. Certifiable. Bonkers. Wacko. Insane. ;-P
Stupid? Remains to be seen. ;-D ;-D ;-D
Get the integers going first! ;-)
[enlightening discussion of carry techniques snipped]

I've drawn a schematic using Xilinx's WebPack, of a 4x4 multiplier:
http://www.abiengr.com/~sysop/images/mult-4x4-sch.gif

The ANDs are just plain ANDs, and the ALU-looking thingies are nothing

And I've been procrastinating on doing enough of their tutorials to
figure out how to do the guzintas[1] and the guzoutas[2] and schtuff. ;-)

Cheers!
Rich
[1] Goes Into
[2] Goes [comes] Out Of

18. logjamGuest

So, a Booth-Array Carry-Save portion, followed by a
Do you think I will gain anything useful while trying to use the fewest
parts to generate the result fast? From what I've seen so far of some
FPGA desings they're pretty wastful "because you can".

19. logjamGuest

I think I've fallen in love with 74S274 and 74S275. I don't think the
130ns for a 32 bit product includes addition, maybe it does. I wonder
why these parts had to be killed off???

The parts are pretty expensive...\$1.99 for a 274 and 2.99 for a 275,
but compared to a few hundred logic gates??? Using these chips I might
be able to get more than a million multiplications per second.

20. logjamGuest

http://www.ece.ucsb.edu/~kastner/ece154/slides/ece154-5.pdf
http://www.ecs.umass.edu/ece/koren/arith/slides/
http://www.iwu.edu/~wzhu/classes/cs256/

I think I like the combinational method used in the 1976 databook.
Very fast, exceeds everything else I've found.

I have two choices...use parts FROM 1975 for the circuit, or make it
out of modern 283s, etc. Using the origional 74S parts I estimate 2A
power consumption.

I've found list prices at:
74S274s for 1.99, I would need 64 of them
74S275s for 2.99, (I haven't quite figured this one out yet, somewhere
around 15)

I would need 4 74F283s for every 74S274, but they're pretty cheap. In
the end it would be smarter to go with 74S274s even though 256 16 pin
devices on a PCB would look cool. I could go either way on this one.
According to the 1975 ttl book a 74S283 completes a carry in 11ns
and a 74S274 4x4 in 45ns, so I assume the 74S274 is pretty much 4
74S283 and 16 AND gates The critical speed path is carry-in-out on all
the 64 partial products is typ 5ns for all the AND gates and then 34ns