how to implement the "how many" logic function?

Tony Roe · Aug 28, 2003

I have no idea of the correct name for this function, but I need to generate a
3bit output corresponding to how many of 8 inputs are high. Eg if the inputs
were 11010110, then 5 of these are "1", so the output would be 5 (101 binary).
I'd like to implement the function in a CUPL device if possible. Any
tips/suggestions?
Regards,
Tony (remove "_" from email address to reply)

Jim Thompson · Aug 28, 2003

I have no idea of the correct name for this function, but I need to generate a
3bit output corresponding to how many of 8 inputs are high. Eg if the inputs
were 11010110, then 5 of these are "1", so the output would be 5 (101 binary).
I'd like to implement the function in a CUPL device if possible. Any
tips/suggestions?
Regards,
Tony (remove "_" from email address to reply)

I think what you want was called "majority logic" back in the '60's,
although what you describe is really just a 8-input (each input
one-bit) adder.

...Jim Thompson

Sir Charles W. Shults III · Aug 28, 2003

Jim is correct - this is majority logic. You can emulate it any number of
ways, but the most basic is to program a pattern into an EPROM or other memory
device.

Cheers!

Chip Shults
My robotics, space and CGI web page - http://home.cfl.rr.com/aichip

Jim Thompson · Aug 28, 2003

Jim is correct - this is majority logic. You can emulate it any number of
ways, but the most basic is to program a pattern into an EPROM or other memory
device.

Cheers!

Chip Shults
My robotics, space and CGI web page - http://home.cfl.rr.com/aichip

Yep. I was puzzling over how the original was made... it was done in
CML, but used current imbalance to set a "1" or "0".

But the OP wants a value versus a yeh/nay output.

My initial thoughts were a crude D-A followed by a D-A, but addressing
in a PROM is probably the most succinct.

...Jim Thompson

MarkyMark · Aug 28, 2003

Tony,
Here's a C code fragment that I use whenever I need to count ones. I assume
it should be reasonable to convert to a logic implementation. It would be
more efficient than an LUT.
Hope it helps.

unsigned char char_count_bits( unsigned char x )
{
x = ((0xAA & x)>>1) + (0x55 & x);
x = ((0xCC & x)>>2) + (0x33 & x);
x = ((0xF0 & x)>>4) + (0x0F & x);
return x; /* returns number of ones found in x */
}

Regards Mark

GPE · Aug 28, 2003

Actually, a Majority vote circuit looks at three (typically) values and give
you a 'vote of the majority'. If 50%+ bits are set - you get a one, if 50%+
are clear - you get a zero. These are widely used in comm equipment.

What Tony is after is a summation of bits set. I have a 64 bit summation in
VHDL. But at 150Mbps - the result is pipelined.

The way I did it - I first break up initial into 4-bit chunks. Do a quick
summation of four bits -- 0 to 3 plus carry. I then have outputs from
summation stage going into adders. In my case, I register the results of
each stage -- and delay the output data by a matching number of cycles.
Tony's input width will determine how many adders he needs.

-- Ed

Lowly Engineer · Aug 28, 2003

If you don't need full logic speed, a Microchip PIC microcontroller
can easily handle the task

By my estimate, you could latch the 8 bits and produce a result in
about 5uSec.

Fred Bloggs · Aug 28, 2003

I have no idea of the correct name for this function, but I need to generate a
3bit output corresponding to how many of 8 inputs are high. Eg if the inputs
were 11010110, then 5 of these are "1", so the output would be 5 (101 binary).
I'd like to implement the function in a CUPL device if possible. Any
tips/suggestions?
Regards,
Tony (remove "_" from email address to reply)

How about a multi-bit "bubble swap" followed by thermometer code to
binary conversion- seems fun if nothing else. Adders are for drudges
with more macrocells than brains.

John Fields · Aug 28, 2003

I have no idea of the correct name for this function, but I need to generate a
3bit output corresponding to how many of 8 inputs are high. Eg if the inputs
were 11010110, then 5 of these are "1", so the output would be 5 (101 binary).
I'd like to implement the function in a CUPL device if possible. Any
tips/suggestions?

Here's a brute-force way to do it in 6800 assembler:

inreg contains the the input word and sumit will contain the number of
1's in inreg after the return from the call.

jsr sumones

Grog · Aug 28, 2003

Tony Roe said:
I have no idea of the correct name for this function, but I need to generate a
3bit output corresponding to how many of 8 inputs are high. Eg if the inputs
were 11010110, then 5 of these are "1", so the output would be 5 (101 binary).
I'd like to implement the function in a CUPL device if possible. Any
tips/suggestions?
Regards,
Tony (remove "_" from email address to reply)

Perhaps latch the parallel data into a shift register and
use the serial data out to clock a binary counter.
Use the load signal to reset the counter and data latch the
counter output.

There's a bit more to it but you get the idea I trust.
HTHs
Greg the Grog.

Greetings MarkyMark the lurker...

Zak · Aug 28, 2003

Tony said:
I have no idea of the correct name for this function, but I need to generate a
3bit output corresponding to how many of 8 inputs are high. Eg if the inputs
were 11010110, then 5 of these are "1", so the output would be 5 (101 binary).
I'd like to implement the function in a CUPL device if possible. Any
tips/suggestions?

It is called population count or popcount. The latter term returns
useful references in google groups, it seems.

Some processors inplement it for cryptographic/cryptanalisis stuff, and
I remember seeing at least one clever shortcut method. Don't remember
how it worked though, but it was much faster than counting the bits in a
loop.

However the software solutions you will find will not be optimal for
hardware, but they may give interesting ideas.

A fast software solutions seems to be using bit masks to do adding 'in
parallel' by preventing overflows.

I past from Van:Tim Peters ([email protected])'s post in
comp.lang.python:

3) Cleverness via tricking long-int operations into doing many adds "in
parallel":

# Build mask whose value is >= widerthan, of binary form
# width 1's, width 0's, width 1's, ..., width 0's, width 1's.
# E.g., width=1 leads to 0x555555555...55L,
# width=2 leads to 0x333333333...33L,
# width=4 leads to 0xF0F0F0F0F...0FL
# width=8 leads to 0xFF00FFF00...FFL
# and so on.

def buildmask(width, widerthan):
answer = (1L << width) - 1
while answer < widerthan:
width = width << 1
answer = (answer << width) | answer
return answer

def popcount2(x):
width = 1
while x >> width:
lomask = buildmask(width, x)
x = (x & lomask) + ((x >> width) & lomask)
width = width << 1
return int(x)

popcount2 does a number of long-int operations proportional to the *log* of
the (total, not just "set") number of bits in x. There are many ways to
speed it up (e.g., "x >> width" is an expensive common subexpression in
popcount2's loop, and lots of things could be done to cache results and/or
otherwise speed up buildmask -- believe it or not, the code above was
written for clarity <wink>).

I guess the optimal result will be a cascade of adders, leading to a
result in 2log(n) time.

To do 8 bits, you need 4 adders of 1 bit each.

These give 4 results of 2 bits each. Add these with 2 adders with 2 bits
input width.

Finally, an adder with 3 bits in produces the popcount for your input byte.

Thomas

kansas_ray · Aug 28, 2003

MarkyMark said:
Tony,
Here's a C code fragment that I use whenever I need to count ones. I assume
it should be reasonable to convert to a logic implementation. It would be
more efficient than an LUT.
Hope it helps.

unsigned char char_count_bits( unsigned char x )
{
x = ((0xAA & x)>>1) + (0x55 & x);
x = ((0xCC & x)>>2) + (0x33 & x);
x = ((0xF0 & x)>>4) + (0x0F & x);
return x; /* returns number of ones found in x */
}

Regards Mark

Does this really work? I tried 0x35 and calculated 0x02 instead of 0x04.
What am I missing?
Regards,
Ray

x = ((0xAA & 0x35)>>1) + (0x55 & 0x35) = 0x25
x = ((0xCC & 0x25)>>2) + (0x33 & 0x25) = 0x22
x = ((0xF0 & 0x22)>>4) + (0x0F & 0x22) = 0x02

Frank Bemelman · Aug 28, 2003

kansas_ray said:
Does this really work? I tried 0x35 and calculated 0x02 instead of 0x04.
What am I missing?
Regards,
Ray

x = ((0xAA & 0x35)>>1) + (0x55 & 0x35) = 0x25
x = ((0xCC & 0x25)>>2) + (0x33 & 0x25) = 0x22
x = ((0xF0 & 0x22)>>4) + (0x0F & 0x22) = 0x02

Let's do that first line again, 0xAA & 0x35 -> 0x20
and after the shift it's 0x10? + 0x55 & 0x35 makes
0x35, not 0x25?

I'm going to bed now.

Tony Roe · Aug 29, 2003

Unfortunately I DO need the logic's speed (ideally want to do 4 complete sets of
these every 300ns, plus lots of other logic-type stuff). But you reminded me
just how quick the faster PICs can be.

Less than that if you waste a little memory:-

addwf pc,1
retlw 0
retlw 1
retlw 1
retlw 2

Inelegant, but only 2 instruction cycles, that's around 200nsecs with a
40MHz clock

Al

Regards,
Tony (remove "_" from email address to reply)

Tony Roe · Aug 29, 2003

Thanks for the alt.binaries.schematics.electronic tip, and the "ones counter"
name (I replied to that post as well). And thanks to all for the stimulating
replies.

Here's a brute-force way to do it in 6800 assembler:

inreg contains the the input word and sumit will contain the number of
1's in inreg after the return from the call.

jsr sumones
.
.
.

sumones: clr sumit

hot7: brclr 7,inreg,hot6
inc sumit
hot6: brclr 6,inreg,hot5
inc sumit
hot5: brclr 5,inreg,hot4
inc sumit
hot4: brclr 4,inreg,hot3
inc sumit
hot3: brclr 3,inreg,hot2
inc sumit
hot2: brclr 2,inreg,hot1
inc sumit
hot1: brclr 1,inreg,hot0
inc sumit
hot0: brclr 0,inreg,out
inc sumit
out: rts

A way to do it in hardware at alt.binaries.schematics.electronic under
"Ones counter".

Regards,
Tony (remove "_" from email address to reply)

Tony Roe · Aug 29, 2003

Conceptually intriguing, but as a beginner in programmable logic, even the lowly
adder still holds my interest. And it doesn't seem that any alternative to the 3
stage adder will take very much less resources or stages.

How about a multi-bit "bubble swap" followed by thermometer code to
binary conversion- seems fun if nothing else. Adders are for drudges
with more macrocells than brains.

Regards,
Tony (remove "_" from email address to reply)

Tony Roe · Aug 29, 2003

As far as I know (not very far, really), the problem with the straightforward
approach is that CUPL chips (and I suspect all the alternatives as well) are
limited in how many terms may be combined. Eg, output bit 1 should be set if
there are 2, 3, 6 or 7 "1"s in the input byte. And there are 120 different bytes
containing 2, 3, 6 or 7 "1"s, much wider than any PLD I have seen. If I've
misunderstood you, please explain.

Wouldn't that be a simple eight-bit under three-bit Karnaugh map solution
gizmo?

Or better yet, some kind of fancy Venn diagram source diagram???

Just checking. . .

Regards,
Tony (remove "_" from email address to reply)

John Fields · Aug 29, 2003

Unfortunately I DO need the logic's speed (ideally want to do 4 complete sets of
these every 300ns, plus lots of other logic-type stuff). But you reminded me
just how quick the faster PICs can be.

Fred Bloggs · Aug 29, 2003

Fred said:
Tony said:

Conceptually intriguing, but as a beginner in programmable logic, even
the lowly
adder still holds my interest. And it doesn't seem that any
alternative to the 3
stage adder will take very much less resources or stages.

Click to expand...

This nut can be cracked with a 5ns conversion, and with an obtainable
device, if you take it one bit at a time. Here is one way it can be done:

If you start with two 1 of 16 decoders, each operating on half the input
byte, then you end up with two sets of outputs, say DU4, DU3, DU2, DU1,
DU0, and DL4, DL3, DL2, DL1, and DL0 with obvious notation DXn means n
bits are "1" in nibble X. Then the final output can be derived like so:

F7=DU4.DL3+DU3.DL4
F6=DU4.DL2+DU3.DL3+DU2.DL4
F5=DU4.DL1+DU3.DL2+DU2.DL3+DU1.DL4
F4=DU4.DL0+DU3.DL1+DU2.DL2+DU1.DL3+DU0.DL4
F3=DU3.DL0+DU2.DL1+DU1.DL2+DU0.DL3
F2=DU2.DL0+DU1.DL1+DU0.DL2
F1=DU1.DL0+DU0.DL1

-and this is encoded into binary as [ OUT2 OUT1 OUT 0]:

OUTO=F7+F5+F3+F1
OUT1=F7+F6+F3+F2
OUT2=F7+F6+F5+F4

The 1 of 16 decoder operating on [ B3 B2 B1 B0 ] would look something like:

Please view in a fixed-width font such as Courier.

B1
B3 \ B0
B2 \ 00 01 11 10
+-----+-----+-----+-----+
00 | 0| 1| 3| 2|
| D0 | D1 | D2 | D1 |
+-----+-----+-----+-----+
01 | 4| 5| 7| 6|
| D1 | D2 | D3 | D2 |
+-----+-----+-----+-----+
11 | 12| 13| 15| 14|
| D2 | D3 | D4 | D3 |
+-----+-----+-----+-----+
10 | 8| 9| 11| 10|
| D1 | D2 | D3 | D2 |
+-----+-----+-----+-----+

__ __ __ __
D0= B3.B2.B1.B0
__ __ __ __ __ __ __
D1= B3.B2.(B1.B0+B1.B0)+B3.B2.B1.B0
__ __ __ __ __ __ __ __ __ __
D2= B3.B2.B1.B0+B3.B2.(B1.B0+B1.B0)+B3.B2.B1.B0+B3.B2.(B1.B0+B1.B0)
__ __ __ __
D3=B3.B2.B1.B0+B3.B2.(B1.B0+B1.B0)+B3.B2.B1.B0

D4=B3.B2.B1.B0

-so the maximum complexity is 6 minterms with D2.

Then this diagram looks like so:
Please view in a fixed-width font such as Courier.

b b b b b b b b
7 6 5 4 3 2 1 0
/ / / / \ \ \ \
+----------- ---------+
| |
/4 /4
| |
+---------------------+ +--------------------+
| | | |
| DECODER | | DECODER |
| | | |
| DU4 DU3 DU2 DU1 DU0 | |DL4 DL3 DL2 DL1 DL0 |
+---------------------+ +--------------------+
| 5 5 |
+----/----+ +-----/-----+
| |
+-----------------------+
| |
| LOGIC COMBINE |
| |
| F7 F6 F5 F4 F3 F2 F1 |
+-----------------------+
7|
/
|
+----------------------+
| |
| BINARY ENCODE |
| |
+----------------------+
|
|
\|/

3-BIT BINARY

[ OUT2 OUT1 OUT0 ]

So it's down to three gate delays...simple array logic with plain
combinatorial feedthrough- no interstage pipelining/synchronization
required. I leave it to you to double check the equations.

Technically that's a 1 of 16 decoder with outputs combined to yield
number of bits set etc....all in one...

kansas_ray · Aug 29, 2003

MarkyMark said:
Tony,
Here's a C code fragment that I use whenever I need to count ones. I assume
it should be reasonable to convert to a logic implementation. It would be
more efficient than an LUT.
Hope it helps.

unsigned char char_count_bits( unsigned char x )
{
x = ((0xAA & x)>>1) + (0x55 & x);
x = ((0xCC & x)>>2) + (0x33 & x);
x = ((0xF0 & x)>>4) + (0x0F & x);
return x; /* returns number of ones found in x */
}

Regards Mark

Does this really work? I tried 0x35 and calculated 0x02 instead of 0x04.
What am I missing?
Regards,
Ray

x = ((0xAA & 0x35)>>1) + (0x55 & 0x35) = 0x25
x = ((0xCC & 0x25)>>2) + (0x33 & 0x25) = 0x22
x = ((0xF0 & 0x22)>>4) + (0x0F & 0x22) = 0x02

Moore's Lobby Podcast

Menu

Categories

Platforms

Content

Connect With Us

Network

how to implement the "how many" logic function?

how to implement the "how many" logic function?

Tony Roe

Jim Thompson

Sir Charles W. Shults III

Jim Thompson

MarkyMark

GPE

Lowly Engineer

Fred Bloggs

John Fields

Grog

Zak

kansas_ray

Frank Bemelman

Tony Roe

Tony Roe

Tony Roe

Tony Roe

John Fields

Fred Bloggs

kansas_ray

Similar threads