Maker Pro

How to develop a random number generation device

M

MooseFET

Jan 1, 1970
0
[email protected] says... [...]
The way I had imagined it was that the registers of the virtual CPUs
that are not currently running would be in a different place than the
ones that are actually being used. My concern was to avoid increasing
the fan-in and fan-out of the buses on the ALU, so that there would be
no increase in the loading, and hence no added delay, in those circuits.

If they're "somewhere" else, they have to be un/re/loaded. That
takes substantial time.

Yes, it may take a clock cycle to do the register swapping. Reducing
the number of registers on the bus allows those clock cycles to be at
a higher frequency, so I think the advantage will outweigh the
disadvantage. BTW: I'm assuming several CPUs and lots of sets of
registers are on one chip.
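
To make the idea concrete, here's a minimal C sketch of that arrangement
(a toy model invented for illustration, not any particular chip): each
virtual CPU has its own register bank, only the active bank is "wired"
to the ALU, and a virtual-CPU switch is a bank select rather than a bulk
register copy.

#include <stdint.h>
#include <stdio.h>

#define NUM_VCPUS 4
#define NUM_REGS  8

/* One register bank per virtual CPU.  Only the active bank is
 * (conceptually) wired to the ALU, so the bus fan-in/fan-out is set by
 * one bank's worth of registers no matter how many banks the chip has. */
typedef struct {
    uint16_t reg[NUM_REGS];
    uint16_t pc;
} reg_bank_t;

static reg_bank_t banks[NUM_VCPUS];
static reg_bank_t *active = &banks[0];  /* bank currently on the buses */

/* A virtual-CPU switch is a bank select, not a copy: in hardware this
 * is a mux select line changing, roughly one clock cycle.             */
static void switch_vcpu(unsigned next)
{
    active = &banks[next % NUM_VCPUS];
}

/* A toy ALU operation that only ever sees the active bank's registers. */
static void alu_add(unsigned dst, unsigned a, unsigned b)
{
    active->reg[dst] = active->reg[a] + active->reg[b];
}

int main(void)
{
    banks[0].reg[1] = 2;  banks[0].reg[2] = 3;
    banks[1].reg[1] = 10; banks[1].reg[2] = 20;

    alu_add(0, 1, 2);        /* virtual CPU 0: r0 = 2 + 3   */
    switch_vcpu(1);          /* the one-cycle bank swap     */
    alu_add(0, 1, 2);        /* virtual CPU 1: r0 = 10 + 20 */

    printf("%u %u\n", (unsigned)banks[0].reg[0],
                      (unsigned)banks[1].reg[0]);   /* prints "5 30" */
    return 0;
}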

You're going to have to figure out which
registers to un/re/load at that point. Remember, if you want to switch
virtual CPUs at any time, you're going to have to save/re/load not only
all architected registers but the renamed registers too, unless
you plan on quiescing/flushing the execution unit between virtual
CPU switches.

I was thinking in terms of a not-very-pipelined CPU, so that the
switchover could happen in a few cycles. The registers currently being
written would have to stay in place until the write finished. This is
part of why I'm assuming a fairly simple CPU.

More busses => more register file ports, which is worse than adding
registers to the file.

I don't see how you come to that conclusion.


A lot of things change when transistors become less expensive than
the wires between them. ;-)

Yes and when a multiply doesn't draw an amp.


Pipelines lose quickly because you have to subtract (clock_jitter +
setup/hold) * pipe_stages from throughput. The P-IV is a good
example of this. ...about the only thing it's a good example of,
other than how *not* to architect a processor.

Yes, pipelines don't solve everything but for some operations like the
1/sqrt(), they can make a lot of sense. The process can be broken
into three steps or four if you twist things about a bit:

Step 1:
Take the input number and look in a table to get an initial estimate.

Step 2:
ShouldBeOne = Y * Y * X
Y = Y * 0.5 * (3 - ShouldBeOne) * (1 - K1*(ShouldBeOne-1)^2)

Step 3:
Drop the 2nd-order part and repeat the plain Newton step several times.
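
For anyone who wants to play with it, here is a small C sketch of the
table-seed-plus-Newton scheme. This is my own illustration, not anybody's
hardware: it only handles inputs in [1, 4), the 16-entry seed table is an
arbitrary choice, and the refinement is the standard first-order step
y = y * 0.5 * (3 - x*y*y).

#include <math.h>
#include <stdio.h>

/* Step 1: crude seed table for 1/sqrt(x) over [1, 4), indexed by the
 * top bits of x.  A real pipeline would index on exponent and mantissa
 * bits; 16 entries is just an assumption for this sketch.             */
#define SEED_ENTRIES 16
static double seed[SEED_ENTRIES];

static void init_seed(void)
{
    for (int i = 0; i < SEED_ENTRIES; i++) {
        double x = 1.0 + 3.0 * (i + 0.5) / SEED_ENTRIES;  /* bin centre */
        seed[i] = 1.0 / sqrt(x);
    }
}

/* Steps 2/3: refine with the first-order Newton step.  Each iteration
 * roughly doubles the number of correct bits, so a handful suffices.  */
static double rsqrt(double x, int iterations)
{
    int idx = (int)((x - 1.0) * SEED_ENTRIES / 3.0);
    if (idx < 0) idx = 0;
    if (idx >= SEED_ENTRIES) idx = SEED_ENTRIES - 1;
    double y = seed[idx];

    for (int i = 0; i < iterations; i++) {
        double should_be_one = y * y * x;    /* -> 1 at convergence     */
        y = y * 0.5 * (3.0 - should_be_one); /* Newton step for 1/sqrt  */
    }
    return y;
}

int main(void)
{
    init_seed();
    printf("%.12f vs %.12f\n", rsqrt(2.0, 4), 1.0 / sqrt(2.0));
    return 0;
}

With a reasonable seed the initial error is already small, so one or two
pipelined stages plus a couple of cheap repeats reach full precision,
which is the three-or-four-step split described above.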


Like many problems, start with a lookup table.

My 32 bit -> 16 bit integer sqrt() for the 8051 doesn't use a lookup
table and yet is fairly quick about it. It uses two observations:

1 - The sum of the first N odd numbers is N^2

2 - If you multiply X * 4, sqrt(X) doubles and both are shifts.
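
The 8051 routine itself isn't reproduced here, but those two observations
are exactly what the classic shift-and-subtract integer square root
exploits. A C sketch of that general method:

#include <stdint.h>
#include <stdio.h>

/* 32-bit -> 16-bit integer square root, shift-and-subtract style.     */
uint16_t isqrt32(uint32_t x)
{
    uint32_t root = 0;
    uint32_t bit = 1UL << 30;   /* highest power of 4 in a 32-bit word */

    while (bit > x)             /* observation 2: scale by powers of 4 */
        bit >>= 2;

    while (bit != 0) {
        /* observation 1: root + bit plays the role of the "next odd
         * number", rescaled by a power of 4 at each step.             */
        if (x >= root + bit) {
            x -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)root;
}

int main(void)
{
    /* 63245^2 <= 4000000000 < 63246^2, so this prints 63245 */
    printf("%u\n", (unsigned)isqrt32(4000000000u));
    return 0;
}

Note that there are no multiplies at all - only shifts, adds, subtracts
and compares - which is what makes it practical on an 8051.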
 
J

JosephKK

Jan 1, 1970
0
MooseFET [email protected] posted to sci.electronics.design:
On Wed, 19 Sep 2007 05:04:34 -0700, Martin Brown


But in proper synchronous design, controlled by
state machines, immensely complex stuff just works. It's sort of
ironic that in a big logic design, 100K gates and maybe 100 state
machines, everything happens all at once, every clock, across the
entire chip, and it works. Whereas with software, there's only one
PC, only one thing happens, at a single location, at a time, and
usually nobody can predict the actual paths, or write truly
reliable code.

4G of RAM * 8 bits is a lot more bits than 100K gates. You need
to keep your sizes equal if you want to make a fair comparison.




On 18 Sep, 17:12, John Larkin
On Sep 17, 7:55 pm, John

Programmers have pretty much proven that they cannot write
bug-free large systems.
In every other area, humans make mistakes and yet we seem
surprised that programmers do too.
In most other areas of endeavour small tolerance errors do not
so often lead to disaster. Boolean logic is less forgiving. And
fence
Software programming hasn't really had the true transition to a
hard engineering discipline yet. There hasn't been enough
standardisation
Compare a software system to an FPGA. Both are complex, full of
state machines (implicit or explicit!), both are usually
programmed in a hierarchical language (C++ or VHDL) that has a
library of available modules, but the FPGAs rarely have bugs
that get to the field, whereas most software rarely is ever
fully debugged.
I think that hardware engineers get a better grounding in logical
design (although I haven't looked at modern CS syllabuses so I may
be out of date).

Hardware can be spaghetti too, and can be buggy and nasty, if one
does asynchronous design. But in proper synchronous design,
controlled by state machines, immensely complex stuff just works.
It's sort of ironic that in a big logic design, 100K gates and
maybe 100 state machines, everything happens all at once, every
clock, across the entire chip, and it works. Whereas with software,
there's only one PC, only one thing happens, at a single location,
at a time, and usually nobody can predict the actual paths, or
write truly reliable code.




But it is mostly a cultural thing. Software houses view minimum
time to market and first-mover advantage to gain maximum market
share as more important than correct functionality. And it seems
they are right. Just look at Microsoft Windows vs IBM's OS/2 - a
triumph of superb marketing over technical excellence!
And I have bought my fair share of hardware that made it onto the
market, bugs and all, too. My new fax machine caught fire. Early V90
modems that only half work, etc.
So, computers should use more hardware and less software to
manage resources. In fact, the "OS kernel" of my multiple-CPU
chip could be entirely hardware. Should be, in fact.
You are treating the symptoms and not the disease. Strongly typed
languages already exist that would make most of the classical
errors of C/C++ programmers go away. Better tools would help in
software development, but until the true cost of delivering faulty
software is driven home the suits will always go for the quick
buck.

No, I am making the true observation that complex digital logic
designs are usually bug-free, simple software systems have a chance
of being so, and complex software systems never are.

John

May I introduce you to a concept called cyclomatic complexity. The
cyclomatic complexity of 100's of interacting state machines is on
the order of 10^5 to 10^6. A memory array of regular blocks of
storage accessed by a regular decoder has a cyclomatic complexity
on the order of 10 to 10^2. In the memory there is much
self-similarity across several orders of magnitude in size.
 
J

John Larkin

Jan 1, 1970
0
John said:
John Larkin wrote:
On Mon, 17 Sep 2007 23:04:03 +0200, David Brown

John Larkin wrote:
On Mon, 17 Sep 2007 18:40:35 +0200, David Brown

John Larkin wrote:
On Sun, 16 Sep 2007 22:07:42 +0200, David Brown

John Larkin wrote:
On Sep 15, 11:09 am, John Larkin
[....]
architecture. In a few years we'll have, say, 1024 processors on a
chip, and something new will be required to manage them. It will be a
thousand times simpler and more reliable than Windows.
I think that the number of virtual cores will grow faster than the
number of real cores. With extra register banks and a bit of clever
design, a single ALU can look like two slightly slower ones.

I expect to see multicore machines with fewer actual floating-point
ALUs than actual integer ALUs.

Sounds sort of like Sun's Niagara chips, which have (IIRC) 8 cores, each
with 4 threads, but only a few floating point units. For things like
web serving, it's ideal.

Yup. Low-horsepower tasks can just be a thread on a multithread core,
and many little tasks don't need a dedicated floating-point unit.

My point/fantasy is that OS design should change radically if many,
many real or virtual CPUs are available. One CPU would be the manager,
and every task, process, or driver could have its own, totally
confined and protected, CPU, and there would be no context switching
ever, and few interrupts in fact.

That's not going to work for Linux, anyway - there is a utility thread
spawned per cpu at the moment (work is underway to avoid this, because
it is a bit of a pain when you have thousands of cpus in one box).

However, there is no point in having a cpu (or even a virtual cpu)
dedicated to each task. Many sorts of tasks spend a lot of time
sleeping while waiting for other events - a cpu in this state is a waste
of resources.
Only if you think of a CPU as a valuable resource. As silicon shrinks,
a CPU becomes a minor bit of real estate. It makes sense to use it
when there's something to do, and put it to sleep when there's not.
Lots of power gets saved by not doing context switches.

CPUs *are* a valuable resource - modern cpu cores take up a lot of
space, even when you exclude things like the cache (which take more
space, but cost less per mm^2 since you can design in a bit of
redundancy and thus tolerate some faults).

The more CPUs you have, the more time and space it costs to keep caches
and memory accesses coherent. There are some sorts of architectures
which work well with multiple CPU cores, but these are not suitable for
general purpose computing.

My point is that large numbers of CPU cores *will* become common and
cheap, and we need a new type of OS to take advantage of this new
reality. Done right, it could be simple and astoundingly secure and
reliable.

I would be very surprised to see a system where the number of CPU cores
was greater than the number of processes. I expect to see the number of
cores increase, especially for server systems, but I don't expect to see
systems where it is planned and expected that most cores will sleep most
of the time.

Well, I remember 64-bit static rams, and 256-bit DRAMS. I can't see
any reason we couldn't have 256 or 1024 cpu's on a chip, especially if
a lot of them are simple integer RISC machines.

You can certainly get 1024 CPUs on a chip - there are chips available
today with hundreds of cores. But there are big questions about what
you can do with such a device - they are specialised systems. To make
use of something like that - you'd need a highly parallel problem (most
desktop applications have trouble making good use of two cores - and it
takes a really big web site or mail gateway to scale well beyond about
16 cores). You also have to consider the bandwidth to feed these cores,
and be careful that there are no memory conflicts (since cache coherency
does not scale well enough).
No, no, NO. You seem to be assuming that we'd use multiple cores the
way Windows would use multiple cores. I'm not talking about solving
big math problems; I'm talking about assigning one core to be a disk
controller, one to do an Ethernet/stack interface, one to be a printer
driver, one to be the GUI, one to run each user application, and one
to be the system manager, the true tiny kernel and nothing else.
Everything is dynamically loadable, unloadable, and restartable. If a
core is underemployed, it sleeps or runs slower; who cares if
transistors are wasted? This would not be a specialized system, it
would be a perfectly general OS with applications, but no process
would hog the machine, no process could crash anything else, and it
would be fundamentally reliable.

That would be an absurd setup. There is some justification for wanting
multiple simple cores in server systems (hence the Sun Niagara chips),
but not for a desktop system. The requirements for a disk controller, a
browser, and Doom are totally different. With a few fast cores like
today's machines, combined with dedicated hardware (on the graphics
card), you get a pretty good system that can handle any of these. With
your system, you'd get a chip with a couple of cores running flat out
(but without a hope of competing with a ten year old PC, as they could
not have comparable bandwidth, cache, or computing resources in each
core), along with a few hundred cores doing practically nothing. In
fact, most of the cores would *never* be used - they are only there in
case someone wants to do a few extra things at the same time since you
need a core per process.

This is not about performance; hardly anybody needs gigaflops. It's
all about reliability.
Until you can come up with some sort of justification, however vague, as
to why you think one cpu per process is more reliable than context
switches, this whole discussion is useless.

You define yourself by the ideas you refuse to consider. So I suppose
you'll still be running Windows 20 years from now.

I run windows (on desktops) and Linux (on a desktop, a laptop, and a
bunch of servers, and on a fairly high-reliability automation system I
am working on), and I'd use something else if I needed an OS in my
embedded systems. If something better came along, I'd use that -
whatever is the right tool for the job.

The relevant saying is "keep an open mind, but not so open that your
brains fall out". I'm happy to accept that doing things in hardware is
often more reliable than doing things in software (I work with small
embedded systems - I know when reliability is important, and I know
about achieving it in practical systems). But what I am not willing to
accept is claims that you alone understand the way to make all computers
reliable,

I have made no such claims.



using a hardware design that is obviously (to me, anyway)
impractical,

Can't help what's obvious to you


and you offer no justification beyond repeating claims that
"hardware is always more reliable than software",

Isn't it?


and therefore you can
practically guarantee that the future of computing will be dominated by
single task per core processors.

I can't guarantee it. My ideas are necessarily simplistic, and would
get more complex in a real system. Like, for example, my multicore
chip would probably have a core+GPU or three optimized for graphics,
and maybe some crypto or compression/decompression gadgets. There's no
point sacrificing performance to intellectual purity.

But the trend towards multiple cores, running multiple threads each,
is a steamroller. So far, it's been along the Microsoft "big OS"
model, but when we get to scores of processors running hundreds of
threads, wouldn't a different OS design start to make sense? The IBM
Cell is certainly another direction.
I believe I have been open minded - I've tried to point out the problems
with your ideas, and why I think it is impractical to design such chips,

Sorry, I missed that part. Why is it, or more significantly, why *will
it* be impractical to design a chip that will contain, or act like it
contains, a couple hundred CPU cores, all interfaced to a central
cache?
and why they would be impractical for general purpose computing even if
they were made.

Why? Because Windows, and other "big" OS's like Linux, don't support
it?
I've repeatedly asked for justification for your
claims, and received none of relevance. I am more than willing to
discuss these ideas more if you can justify them - but until then, I'll
continue to view massively multi-core chips as useful for some
specialised tasks but inappropriate for general purpose (and desktop in
particular) computing.

It's generally accepted that a microkernel-based OS will be more
reliable than a macrokernel system, because of its simplicity, but the
microkernel needs too many context switches to be efficient.

http://en.wikipedia.org/wiki/Microkernel

So, let's get rid of the context switches by running each process in
its own real or virtual (i.e., multithreaded) CPU. Then nobody can crash
the kernel. A little hardware protection for DMA operations makes even
device drivers safe.

I seem to remember previous discussions reaching similar conclusions -
you had a pretty way-out theory, leading to an interesting discussion
but ending with me giving up in frustration, and you calling me
closed-minded.

Deja vu, I guess.
These sorts of ideas are good for making people think,
but scientific minds are naturally sceptical until given solid evidence
and justification.

"Scientific minds" are often remarkably ready to attack new ideas,
rather than playing with, or contributing to them. I take a lot of
business away from people like that.

And I'm no dreamer: I build stuff that works, and people buy it.

John
 
J

John Larkin

Jan 1, 1970
0
MooseFET [email protected] posted to sci.electronics.design:
On Wed, 19 Sep 2007 05:04:34 -0700, Martin Brown


But in proper synchronous design, controlled by
state machines, immensely complex stuff just works. It's sort of
ironic that in a big logic design, 100K gates and maybe 100 state
machines, everything happens all at once, every clock, across the
entire chip, and it works. Whereas with software, there's only one
PC, only one thing happens, at a single location, at a time, and
usually nobody can predict the actual paths, or write truly
reliable code.

4G of RAM * 8 bits is a lot more bits than 100K gates. You need
to keep your sizes equal if you want to make a fair comparison.




On Wed, 19 Sep 2007 05:04:34 -0700, Martin Brown



On 18 Sep, 17:12, John Larkin
On Tue, 18 Sep 2007 07:05:25 -0700, Martin Brown

On Sep 17, 7:55 pm, John

[....]

Programmers have pretty much proven that they cannot write
bug-free large systems.

In every other area, humans make mistakes and yet we seem
surprised that programmers do too.

In most other areas of endeavour small tolerance errors do not
so often lead to disaster. Boolean logic is less forgiving. And
fence

Software programming hasn't really had the true transition to a
hard engineering discipline yet. There hasn't been enough
standardisation

Compare a software system to an FPGA. Both are complex, full of
state machines (implicit or explicit!), both are usually
programmed in a hierarchical language (C++ or VHDL) that has a
library of available modules, but the FPGAs rarely have bugs
that get to the field, whereas most software rarely is ever
fully debugged.

I think that hardware engineers get a better grounding in logical
design (although I haven't looked at modern CS syllabuses so I may
be out of date).

Hardware can be spaghetti too, and can be buggy and nasty, if one
does asynchronous design. But in proper synchronous design,
controlled by state machines, immensely complex stuff just works.
It's sort of ironic that in a big logic design, 100K gates and
maybe 100 state machines, everything happens all at once, every
clock, across the entire chip, and it works. Whereas with software,
there's only one PC, only one thing happens, at a single location,
at a time, and usually nobody can predict the actual paths, or
write truly reliable code.





But it is mostly a cultural thing. Software houses view minimum
time to market and first-mover advantage to gain maximum market
share as more important than correct functionality. And it seems
they are right. Just look at Microsoft Windows vs IBM's OS/2 - a
triumph of superb marketing over technical excellence!

And I have bought my fair share of hardware that made it onto the
market, bugs and all, too. My new fax machine caught fire. Early V90
modems that only half work, etc.

So, computers should use more hardware and less software to
manage resources. In fact, the "OS kernel" of my multiple-CPU
chip could be entirely hardware. Should be, in fact.

You are treating the symptoms and not the disease. Strongly typed
languages already exist that would make most of the classical
errors of C/C++ programmers go away. Better tools would help in
software development, but until the true cost of delivering faulty
software is driven home the suits will always go for the quick
buck.

No, I am making the true observation that complex digital logic
designs are usually bug-free, simple software systems have a chance
of being so, and complex software systems never are.

John

May I introduce you to a concept called cyclomatic complexity. The
cyclomatic complexity of 100's of interacting state machines is on
the order of 10^5 to 10^6. A memory array of regular blocks of
storage accessed by a regular decoder has a cyclomatic complexity
on the order of 10 to 10^2. In the memory there is much
self-similarity across several orders of magnitude in size.

So *that's* why Windows is so reliable! It's a single state machine
that traverses a simple linear array of self-similar memory.

Thanks.

John
 
M

Martin Brown

Jan 1, 1970
0
No, I am making the true observation that complex digital logic
designs are usually bug-free, simple software systems have a chance of
being so, and complex software systems never are.

The problem is that complex software systems are orders of magnitude
more complex than the hardware upon which they run. And the pain in
checking all the interactions between subsystems scales with N!

(10N)! >> N! for all N >= 1

It should not be a surprise that complexity (especially when combined
with bad planning and requirements creep) can kill projects stone
dead. Or have them released on an unsuspecting world in a state of
total disarray.

There are design methods and languages to support them that could
deliver more reliable software by making it harder to write buggy
code. The trouble is that too few programmers bother to learn how to
do it.

Ship it and be damned is the business paradigm today (and then get
paid again to put it right).

Regards,
Martin Brown
 
M

MooseFET

Jan 1, 1970
0
MooseFET [email protected] posted to sci.electronics.design:


4G of RAM * 8 bits is a lot more bits than 100K gates. You need
to keep your sizes equal if you want to make a fair comparison.
On 18 Sep, 17:12, John Larkin
On Sep 17, 7:55 pm, John
[....]
Programmers have pretty much proven that they cannot write
bug-free large systems.
In every other area, humans make mistakes and yet we seem
surprised that programmers do too.
In most other areas of endeavour small tolerance errors do not
so often lead to disaster. Boolean logic is less forgiving. And
fence
Software programming hasn't really had the true transition to a
hard engineering discipline yet. There hasn't been enough
standardisation
Compare a software system to an FPGA. Both are complex, full of
state machines (implicit or explicit!), both are usually
programmed in a hierarchical language (C++ or VHDL) that has a
library of available modules, but the FPGAs rarely have bugs
that get to the field, whereas most software rarely is ever
fully debugged.
I think that hardware engineers get a better grounding in logical
design (although I haven't looked at modern CS syllabuses so I may
be out of date).
Hardware can be spaghetti too, and can be buggy and nasty, if one
does asynchronous design. But in proper synchronous design,
controlled by state machines, immensely complex stuff just works.
It's sort of ironic that in a big logic design, 100K gates and
maybe 100 state machines, everything happens all at once, every
clock, across the entire chip, and it works. Whereas with software,
there's only one PC, only one thing happens, at a single location,
at a time, and usually nobody can predict the actual paths, or
write truly reliable code.
But it is mostly a cultural thing. Software houses view minimum
time to market and first-mover advantage to gain maximum market
share as more important than correct functionality. And it seems
they are right. Just look at Microsoft Windows vs IBM's OS/2 - a
triumph of superb marketing over technical excellence!
And I have bought my fair share of hardware that made it onto the
market, bugs and all, too. My new fax machine caught fire. Early V90
modems that only half work, etc.
So, computers should use more hardware and less software to
manage resources. In fact, the "OS kernel" of my multiple-CPU
chip could be entirely hardware. Should be, in fact.
You are treating the symptoms and not the disease. Strongly typed
languages already exist that would make most of the classical
errors of C/C++ programmers go away. Better tools would help in
software development, but until the true cost of delivering faulty
software is driven home the suits will always go for the quick
buck.
No, I am making the true observation that complex digital logic
designs are usually bug-free, simple software systems have a chance
of being so, and complex software systems never are.
John

May I introduce you to a concept called cyclomatic complexity. The
cyclomatic complexity of 100's of interacting state machines is on
the order of 10^5 to 10^6. A memory array of regular blocks of
storage accessed by a regular decoder has a cyclomatic complexity
on the order of 10 to 10^2. In the memory there is much
self-similarity across several orders of magnitude in size.


So what exactly is the definition? It seems to me that just because
the memory is a repeated array in physical space, it needn't be in
logical space.
 
M

MooseFET

Jan 1, 1970
0
May I introduce you to a concept called cyclomatic complexity. The
cyclomatic complexity of 100's of interacting state machines is on
the order of 10^5 to 10^6. A memory array of regular blocks of
storage accessed by a regular decoder has a cyclomatic complexity
on the order of 10 to 10^2. In the memory there is much
self-similarity across several orders of magnitude in size.

So *that's* why Windows is so reliable! It's a single state machine
that traverses a simple linear array of self-similar memory.

Beware of anything that is claimed to lead to better programming.
When Intel introduced segmentation on the 8086, they said it improved
program modularity etc. At that time I suggested that the program
counter should have been made such that it decremented to help with
top down program design.

One thing about Windows and products like that is that they have to
make buggy code in order to be able to sell upgrades. At one time I
had a toaster with a bug in it. I got tired of turning it upside down
and jiggling the lever to get the system back to its "idle state". I
upgraded to a new toaster. Without the bug, I'd still be using that
old toaster.
 
J

John Larkin

Jan 1, 1970
0
You are treating the symptoms and not the disease. Strongly typed
languages already exist that would make most of the classical
errors of C/C++ programmers go away. Better tools would help in
software development, but until the true cost of delivering faulty
software is driven home the suits will always go for the quick
buck.
No, I am making the true observation that complex digital logic
designs are usually bug-free, simple software systems have a chance
of being so, and complex software systems never are.

May I introduce you to a concept called cyclomatic complexity. The
cyclomatic complexity of 100's of interacting state machines is on
the order of 10^5 to 10^6. A memory array of regular blocks of
storage accessed by a regular decoder has a cyclomatic complexity
on the order of 10 to 10^2. In the memory there is much
self-similarity across several orders of magnitude in size.

So *that's* why Windows is so reliable! It's a single state machine
that traverses a simple linear array of self-similar memory.

Beware of anything that is claimed to lead to better programming.
When Intel introduced segmentation on the 8086, they said it improved
program modularity etc. At that time I suggested that the program
counter should have been made such that it decremented to help with
top down program design.

I was impressed by the COPS processor that used a pseudo-random shift
register as the program counter. That was all-over-the-place program
design.
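
For anyone who hasn't met the trick: a maximal-length linear-feedback
shift register steps through every non-zero state using only a shift and
a conditional XOR, which is why it made such a cheap (if unreadable)
program counter. Here is a C sketch of an 8-bit Galois LFSR with a
commonly used maximal-length tap set (taps 8,6,5,4); the COPS part's
actual width and taps aren't given here, so this is purely illustrative.

#include <stdint.h>
#include <stdio.h>

/* 8-bit Galois LFSR, tap mask 0xB8 (x^8 + x^6 + x^5 + x^4 + 1), which
 * cycles through all 255 non-zero states.  "Incrementing" this program
 * counter is one shift and one conditional XOR - no ripple adder.      */
static uint8_t lfsr_step(uint8_t pc)
{
    uint8_t lsb = pc & 1u;
    pc >>= 1;
    if (lsb)
        pc ^= 0xB8u;
    return pc;
}

int main(void)
{
    uint8_t pc = 0x01;
    unsigned count = 0;
    do {
        pc = lfsr_step(pc);
        count++;
    } while (pc != 0x01);
    printf("sequence length: %u\n", count);  /* 255 for a maximal LFSR */
    return 0;
}

The assembler (or mask programmer) then has to place consecutive
instructions at consecutive LFSR states - hence the all-over-the-place
program layout.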

John
 
M

Martin Brown

Jan 1, 1970
0
I agree entirely. People don't seem to appreciate how complex modern
software is. And for that matter just how difficult it is to write
absolutely bullet proof code that will never fail no matter what the
provocation.
So what exactly is the definition? It seems to me that just because
the memory is a repeated array in physical space, it needn't be in
logical space.

It is actually a better metric for deciding on the number of test
cases needed to exercise every path in a complex decision network at
least once. Essentially it gives a path complexity count of all the
control flows through the code.

http://www.sei.cmu.edu/str/descriptions/cyclomatic_body.html
and
http://en.wikipedia.org/wiki/Cyclomatic_complexity

It should be better known in the industry.
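
As a toy illustration of the counting (my own example, not taken from
either reference): for a single routine the metric works out to the
number of binary decision points plus one.

#include <stdio.h>

/* Three decision points (two ifs and the loop condition), so the
 * cyclomatic complexity of classify() is 3 + 1 = 4: four linearly
 * independent paths, and hence four test cases to cover them all.     */
static const char *classify(int n, const int *v, int len)
{
    if (len <= 0)                       /* decision 1 */
        return "empty";
    for (int i = 0; i < len; i++) {     /* decision 2 */
        if (v[i] == n)                  /* decision 3 */
            return "found";
    }
    return "absent";
}

int main(void)
{
    int data[] = { 3, 1, 4 };
    printf("%s\n", classify(4, data, 3));   /* prints "found" */
    return 0;
}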

I like McCabe's CCI, which I find a *very* good indicator of code likely
to contain bugs. It comes from a graph-theory analysis of the
decision nodes in a routine, although I disagree with the proponents
of this metric about exactly where the thresholds should be placed. It
is a good way to find dangerous spaghetti-code sections in an
inherited large project without having to read through everything,
and a good way to check for future maintenance traps.

I can pretty much guarantee that above a certain size or complexity
there will be bugs in a given routine. You will get more hits Googling
with the longer "Tom McCabe" and "cyclomatic complexity index". Sadly
it is yet another useful tool ignored by the mainstream. "McCabe's
CCI" gets mostly my own postings and a medical usage.

Regards,
Martin Brown
 
R

Richard Henry

Jan 1, 1970
0
The problem is that complex software systems are orders of magnitude
more complex than the hardware upon which they run. And the pain in
checking all the interactions between subsystems scales with N!

(10N)! >> N! for all N >= 1

It should not be a surprise that complexity (especially when combined
with bad planning and requirements creep) can kill projects stone
dead. Or have them released on an unsuspecting world in a state of
total disarray.

There are design methods and languages to support them that could
deliver more reliable software by making it harder to write buggy
code. The trouble is that too few programmers bother to learn how to
do it.

Ship it and be damned is the business paradigm today (and then get
paid again to put it right).

Regards,
Martin Brown

If Microsoft followed their own advice (see Code Complete by Steve
McConnell), they could develop acceptably bug-free versions of their
operating systems. However, that would require them to follow a
pattern of testing and bug repair before release that would mean we
would still be waiting for the release of Windows 98.
 
M

Martin Brown

Jan 1, 1970
0
"Complex" hardware systems are bug free for small values of complex.
If Microsoft followed their own advice (see Code Complete by Steve
McConnell), they could develop acceptably bug-free versions of their
operating systems. However, that would require them to follow a
pattern of testing and bug repair before release that would mean we
would still be waiting for the release of Windows 98.

The OS/2 Team more or less followed that model and did ship an OS that
after a short while was virtually bullet proof and bug free. IBM
delayed shipping the Presentation Manager GUI until it was (nearly)
right.

MS Windows shipped to the mass market bugs and all - the rest is
history...

IBM also confused the market by linking OS/2 to their new brand of
PS/2 hardware with a proprietary lock-in MCA bus (anyone remember that?).
We only bought Compaqs & Dells afterwards.

I do believe that the MS programmers are for the most part very bright
guys, but that the process is flawed. Senior management claim to
endorse bug free quality, but their bonuses will always depend on the
bottom line.

Regards,
Martin Brown
 
J

John Larkin

Jan 1, 1970
0
"Complex" hardware systems are bug free for small values of complex.

Excellent. Let's make a computer system from a largish number of small
blocks, each being a RISC cpu running one task in one contiguous
address space. Each program owns its cpu and is usually written by one
person. Bugs could be punishable by flogging or mandatory attendance
at the opera.

John
 
R

Rich Grise

Jan 1, 1970
0
I agree entirely. People don't seem to appreciate how complex modern
software is. And for that matter just how difficult it is to write
absolutely bullet proof code that will never fail no matter what the
provocation.

Difficult, yes, but not impossible, and exceedingly satisfying when
completed.

That's what the "engineering" part is. You don't just sit down at a
console and start entering your program before it's even written!

Then again, I grew up with hardware, where when you blow stuff up,
it costs money. ;-)

Hope This Helps!
Rich
 
R

Rich Grise

Jan 1, 1970
0
... Bugs could be punishable by flogging or mandatory attendance
at the opera.

Yikes! Does the perp get a choice of punishment? I'll take the
flogging over opera any day. ;-)

But then again, I'll be safe, because I write bug-free code. ;-)

Speaking of opera, I hear that some store had a problem with
teenagers hanging out in front of the place, blocking traffic
and just being a general nuisance, as is the wont of teenagers;
the store installed external speakers and played opera through
them, and the teenagers dispersed. ;-)

Cheers!
Rich
 
R

Rich Grise

Jan 1, 1970
0
In a statically-typed language, that isn't an option (unless you're
seriously suggesting that X would be declared as a floating-point variable).

Well, that's where the "design" part comes in. :)
No, it's a simple example of something which can't easily be proven to
terminate. Most static analysis tools don't even try to address
non-termination.

OK, fair enough.
Pointing out that some bugs can't be eliminated by static analysis isn't
the same thing as suggesting that they can't be caught at all.

Having said that, most of the bugs which occur in the wild are of a kind
which could easily be caught using better tools. More powerful type
systems (e.g. those typically found in functional languages) would go a
long way, as would design-by-contract (as in Eiffel).

Well, better tools might help, but the best solution would be to hire
programmers who actually give a hoot about the quality of their work
product. :)

Cheers!
Rich
 
D

David Brown

Jan 1, 1970
0
Rich said:
Well, duh. People, in general, make some percentage of mistakes, i.e., the
absolute number of mistakes will be proportional to the amount of work
performed, regardless of whether it's programming or building a boat.

With hardware, there's a lot more rigorous checking on the way - their
guidelines aren't as nebulous as software specs. ;-)

That's one of the key points - hardware designs are tested far better.
The other is that hardware designs typically have far fewer lines of code (a
ram device might have millions of gates - but it is generated by a small
loop of code). Thus there are typically fewer bugs to start with, and
better testing to eliminate those that have popped in.
But software that's designed right shouldn't have any bugs; saying
"Oh, software _always_ has bugs" is just a lame excuse for sloppy design.

Yes - unless, of course, your business model relies on selling upgrades!
 
D

David Brown

Jan 1, 1970
0
John said:
I have made no such claims.

You have repeatedly said that current OS's (software OS's running on one
or a few cores) are inherently unreliable, while your idea of a massively
multi-core cpu running a task per core would be totally reliable. As
far as I can see, you are the only person who believes this. If I've
misunderstood (either about your claims, or if you can show that others
share the idea), please correct me.
using a hardware design that is obviously (to me, anyway)

Can't help what's obvious to you


and you offer no justification beyond repeating claims that

Isn't it?

No it isn't. At best, you can compare apples and oranges and note that
a RAM chip is more reliable than Windows, despite the former having more
transistors than the latter has lines of source code.

We agree that typical hardware design processes are more geared to
producing reliable and well-tested designs than common software design
processes. But that does not translate into a generalisation that a
given task can be performed more reliably in hardware than software.
and therefore you can

I can't guarantee it. My ideas are necessarily simplistic, and would

Perhaps "guarantee" was a bit strong - but you stated confidently that
your 1024-core one-core-per-task devices were "gonna happen".
get more complex in a real system. Like, for example, my multicore
chip would probably have a core+GPU or three optimized for graphics,
and maybe some crypto or compression/decompression gadgets. There's no
point sacrificing performance to intellectual purity.

This is beginning to sound a lot more like a practical system - devices
exist today with several specialised cores, particularly in the embedded
market. Arguably graphics cards fall into this category, as do high-end
network cards with offload engines. But that's a far cry from your
cpu-per-thread idea, and it is done for performance reasons - *not*
reliability.
But the trend towards multiple cores, running multiple threads each,
is a steamroller. So far, it's been along the Microsoft "big OS"
model, but when we get to scores of processors running hundreds of
threads, wouldn't a different OS design start to make sense? The IBM
Cell is certainly another direction.

Forget Windows - it's a bad example of an OS, and it's an extreme
example of unreliable software. There is no "Microsoft big OS" model -
they just have a bad implementation of a normal monolithic kernel OS.

There are uses for computers based on running large numbers of threads
in parallel - the Sun Niagara processors can handle 64 threads in
hardware (running on 8 cores). But these do not use a core (or even a
virtual core) per thread - the cores have context switches as threads
and processes come and go, or sleep and resume. Clearly you will get
better *performance* when you can minimise context switching - but no
one would plan for a system where context switching did not happen.
There is nothing to suggest that the system could be made more reliable
by avoiding context switches, except in the sense of reliably being able
to complete tasks at the required speed - it's a performance issue.
Sorry, I missed that part. Why is it, or more significantly, why *will
it* be impractical to design a chip that will contain, or act like it
contains, a couple hundred CPU cores, all interfaced to a central
cache?

Perhaps I didn't explain it well, or perhaps you didn't read these posts
- it's hard to follow everything on s.e.d.

The problem with so many cores accessing a shared cache is that you have
huge contention for the cache resources. RAM cells get bigger, slower
and more complex the more ports they have - it's rare to get more than
dual-ported RAM blocks. So if you have 1000 cores all trying to access
the same cache, you're going to have huge latencies. You also need
complex multiplexing hierarchies for your cross-switches - as each cpu
needs to access the cache, you basically require a 1000:1 multiplexer.
Assuming your cache has multiple banks and access to some IO or other
buses, you'd need something like a 1000:10 cross-switch. That would be
really horrible to implement - you'd need to find a compromise between
vast switching circuits and multiple levels introducing delays and
bottlenecks.

Here's a brief view of the Niagara II - your device would face similar
challenges, but greatly multiplied:
http://www.theinquirer.net/?article=42256

If each core has an L1 cache to relieve some of the pressure (without
it, the system would crawl), you then have a very nasty problem of
tracking cache coherency. Current cache coherency strategies do not
scale well - they are a big problem on multicore systems.

With existing multiprocessor systems, it is the cache and memory
interconnection systems that are the big problem. If you look at
high-end motherboards with 8 or 16 sockets, the cross-bar switches that
keep memory coherent and provide fast access for all the cores cost more
than the processors themselves. Building it all in one device does not
make it significantly easier (although it saves on some buffers).

There are alternative ways to connect up large numbers of cores - a NUMA
arrangement with cores passing memory requests between each other would
almost certainly be easier. But you would have very significant
latencies and bottlenecks, a very large number of inter-core buses, and
you'd still have trouble with the L1 cache coherence.

With a new OS, and certain significant restraints on the software, you
could perhaps avoid many of the L1 cache coherence problems. In
particular, being even more restrictive about memory segments would
allow you to assume that L1 data is private, and thus always coherent.
For example, if all memory came from either a read-only source for code,
or was private to the task using it, then you'd have coherency. You'd
need a system for read and write locks for memory areas, with a central
controller responsible for dishing out these locks and broadcasting
cache invalidations when these changed, but it might work.
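
A very rough C sketch of the bookkeeping such a central lock controller
might do. This is entirely hypothetical - the region granularity, the
names, and the broadcast mechanism are invented here purely to
illustrate the idea:

#include <stdbool.h>
#include <stdint.h>

#define MAX_REGIONS 1024

/* Hypothetical per-region record: a region is either shared read-only
 * (readers > 0, no writer) or owned for writing by exactly one core.  */
typedef struct {
    uint32_t readers;   /* cores holding a read-only copy              */
    int      writer;    /* core id of the exclusive owner, or -1       */
} region_lock_t;

static region_lock_t table[MAX_REGIONS];

static void locks_init(void)
{
    for (unsigned i = 0; i < MAX_REGIONS; i++) {
        table[i].readers = 0;
        table[i].writer  = -1;
    }
}

/* Stand-in for the hardware broadcast that tells cores to drop their
 * L1 lines for this region.                                           */
static void broadcast_invalidate(unsigned region) { (void)region; }

static bool acquire_read(unsigned region, int core)
{
    (void)core;
    if (table[region].writer != -1)
        return false;               /* being written: caller retries    */
    table[region].readers++;
    return true;
}

static bool acquire_write(unsigned region, int core)
{
    if (table[region].writer != -1)
        return false;               /* another core owns it: retry      */
    if (table[region].readers > 0) {
        broadcast_invalidate(region);   /* revoke read-only L1 copies   */
        table[region].readers = 0;
    }
    table[region].writer = core;
    return true;
}

static void release_region(unsigned region, int core)
{
    if (table[region].writer == core)
        table[region].writer = -1;
    else if (table[region].readers > 0)
        table[region].readers--;
}

int main(void)
{
    locks_init();
    acquire_read(7, 2);      /* core 2 takes a read-only copy           */
    acquire_write(7, 5);     /* core 5 revokes it and takes ownership   */
    release_region(7, 5);
    return 0;
}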

However, you've lost out on a range of requirements here. First off,
your cores are now far from simple, and the glue logic is immense. Thus
you have lost all hope of making the device cheap and reliable.
Secondly, you've still got significant latencies for all memory access,
slowing down the throughput of any given core, crippling your maximum
thread speed. The bottlenecks don't matter so much in the grand view of
the device - the total bandwidth to the cpus should still be more than
if it were a normal multi-core device. Thirdly, you've lost
compatibility with all existing software - it won't run most programs,
as they rely on being able to have shared data access.

Why? Because Windows, and other "big" OS's like Linux, don't support
it?

Yes, that's about it. To be more precise, it will be impractical for
general purpose computing because it won't run common general purpose
programs. Even with the required major changes to the software and
compilation tools, and without the cache restrictions mentioned earlier,
it would run common programs painfully slowly.
It's generally accepted that a microkernel-based OS will be more
reliable than a macrokernel system, because of its simplicity, but the
microkernel needs too many context switches to be efficient.

A microkernel *may* be more reliable because of its modular design -
each part is relatively simple and communicates through limited,
controlled ports. That's far from saying it always *will* be more
reliable. Much of the theoretical reliability gains of a microkernel do
not actually help in practice. For example, the ability of low-level
services to be restarted if they crash is useless when the service in
question is essential to the system. Thus there are no reliability
benefits from putting your memory management, task management, virtual
file system, or interrupt system outside the true kernel - if one of
these services dies, you're buggered whether it kills the kernel or not.
A similar situation is found in Linux - because X is separate from the
kernel, it can die and restart independently of the OS itself. But to
the desktop user, their system has died - they don't know or care if the
OS itself survived.

Most of the benefits of a microkernel can actually be achieved in a
monolithic kernel - you keep your services carefully modularised,
developed and tested as separate units with clear and clean interfaces.
It's a good development paradigm - it does not matter in practice if
the key services are directly linked with the kernel or not, since they
are all essential to the working of the OS. About the only way a
microkernel improves reliability is by enforcing this model - you are
not able to cheat.

What *does* make sense is keeping as many device drivers as possible out
of the kernel itself. Non-essential services should not be in the kernel.

http://en.wikipedia.org/wiki/Microkernel

So, let's get rid of the context switches by running each process in
its own real or virtual (i.e., multithreaded) CPU. Then nobody can crash
the kernel. A little hardware protection for DMA operations makes even
device drivers safe.

You underestimate the power of software bugs - you'll *always* be able
to crash the kernel!

The context switches in this case are completely irrelevant to
reliability. The issue with microkernels and context switches is purely
a matter of performance - they cost a lot of time, especially since they
involve jumps to a different processor mode or protection ring. If you
want to produce a cpu that minimises the cost of a context switch
through hardware acceleration, then it would definitely be a good idea
and would benefit microkernel OS's in particular. But it's a
performance improvement, not a reliability improvement. Other hardware
for accelerating key OS concepts such as locks or IPC would help too.

Deja vu, I guess.


"Scientific minds" are often remarkably ready to attack new ideas,
rather than playing with, or contributing to them. I take a lot of
business away from people like that.

And I'm no dreamer: I build stuff that works, and people buy it.

So do I - but we both make and sell practical solutions which are a step
beyond our competitors. We would not try to sell something that seems
a revolutionary new idea at first sight but turns out to be terribly
impractical to implement and to lack the very benefits we first imagined.

There's nothing wrong with dreaming, quite the opposite. But you have
to be able to see when it is nothing but a dream.

mvh.,

David
 
J

John Larkin

Jan 1, 1970
0
You have repeatedly said that current OS's (software OS's running on one
or a few cores) are inherently unreliable, while your idea of a massively
multi-core cpu running a task per core would be totally reliable. As
far as I can see, you are the only person who believes this. If I've
misunderstood (either about your claims, or if you can show that others
share the idea), please correct me.


No it isn't. At best, you can compare apples and oranges and note that
a RAM chip is more reliable than Windows, despite the former having more
transistors than the latter has lines of source code.

We agree that typical hardware design processes are more geared to
producing reliable and well-tested designs than common software design
processes. But that does not translate into a generalisation that a
given task can be performed more reliably in hardware than software.


Perhaps "guarantee" was a bit strong - but you stated confidently that
your 1024-core one-core-per-task devices were "gonna happen".

That's probably true. Sun will soon be shipping 8-core, multithread
processors. Looks like the number of cores per chip is at least
doubling every year, now that clock speeds are no longer the holy
grail.

So, in 5 years, with 8 * 2^5 = 256 cores, running maybe 1K threads,
why context switch?




This is beginning to sound a lot more like a practical system - devices
exist today with several specialised cores, particularly in the embedded
market. Arguably graphics cards fall into this category, as do high-end
network cards with offload engines. But that's a far cry from your
cpu-per-thread idea, and it is done for performance reasons - *not*
reliability.

Well, the world is ready for OS reliability.



Forget Windows - it's a bad example of an OS, and it's an extreme
example of unreliable software. There is no "Microsoft big OS" model -
they just have a bad implementation of a normal monolithic kernel OS.

There are uses for computers based on running large numbers of threads
in parallel - the Sun Niagara processors can handle 64 threads in
hardware (running on 8 cores). But these do not use a core (or even a
virtual core) per thread - the cores have context switches as threads
and processes come and go, or sleep and resume. Clearly you will get
better *performance* when you can minimise context switching - but no
one would plan for a system where context switching did not happen.
There is nothing to suggest that the system could be made more reliable
by avoiding context switches, except in the sense of reliably being able
to complete tasks at the required speed - it's a performance issue.


Perhaps I didn't explain it well, or perhaps you didn't read these posts
- it's hard to follow everything on s.e.d.

The problem with so many cores accessing a shared cache is that you have
huge contention for the cache resources. RAM cells get bigger, slower
and more complex the more ports they have - it's rare to get more than
dual-ported RAM blocks. So if you have 1000 cores all trying to access
the same cache, you're going to have huge latencies. You also need
complex multiplexing hierarchies for your cross-switches - as each cpu
needs to access the cache, you basically require a 1000:1 multiplexer.
Assuming your cache has multiple banks and access to some IO or other
buses, you'd need something like a 1000:10 cross-switch. That would be
really horrible to implement - you'd need to find a compromise between
vast switching circuits and multiple levels introducing delays and
bottlenecks.

Here's a brief view of the Niagara II - your device would face similar
challenges, but greatly multiplied:
http://www.theinquirer.net/?article=42256

If each core has an L1 cache to relieve some of the pressure (without
it, the system would crawl), you then have a very nasty problem of
tracking cache coherency. Current cache coherency strategies do not
scale well - they are a big problem on multicore systems.

With existing multiprocessor systems, it is the cache and memory
interconnection systems that are the big problem. If you look at
high-end motherboards with 8 or 16 sockets, the cross-bar switches that
keep memory coherent and provide fast access for all the cores cost more
than the processors themselves. Building it all in one device does not
make it significantly easier (although it saves on some buffers).

There are alternative ways to connect up large numbers of cores - a NUMA
arrangement with cores passing memory requests between each other would
almost certainly be easier. But you would have very significant
latencies and bottlenecks, a very large number of inter-core buses, and
you'd still have trouble with the L1 cache coherence.

With a new OS, and certain significant restraints on the software, you
could perhaps avoid many of the L1 cache coherence problems. In
particular, being even more restrictive about memory segments would
allow you to assume that L1 data is private, and thus always coherent.
For example, if all memory came from either a read-only source for code,
or was private to the task using it, then you'd have coherency. You'd
need a system for read and write locks for memory areas, with a central
controller responsible for dishing out these locks and broadcasting
cache invalidations when these changed, but it might work.

However, you've lost out on a range of requirements here. First off,
your cores are now far from simple, and the glue logic is immense. Thus
you have lost all hope of making the device cheap and reliable.
Secondly, you've still got significant latencies for all memory access,
slowing down the throughput of any given core, crippling your maximum
thread speed. The bottlenecks don't matter so much in the grand view of
the device - the total bandwidth to the cpus should still be more than
if it were a normal multi-core device.

Exactly! Except there is no context switch overhead.


Thirdly, you've lost
compatibility with all existing software - it won't run most programs,
as they rely on being able to have shared data access.


Exactly! We can't run .NET forever.


Yes, that's about it. To be more precise, it will be impractical for
general purpose computing because it won't run common general purpose
programs.

Circular reasoning. Why aren't we still running 1401 code?
Even with the required major changes to the software and
compilation tools, and without the cache restrictions mentioned earlier,
it would run common programs painfully slowly.

If the cache throughput is the limit, you get the same amount of
computing no matter how many CPUs are running. CPUs can also have a
little bit of local instruction cache, since code does not have to be
kept globally coherent.
A microkernel *may* be more reliable because of its modular design -
each part is relatively simple and communicates through limited,
controlled ports. That's far from saying it always *will* be more
reliable. Much of the theoretical reliability gains of a microkernel do
not actually help in practice. For example, the ability of low-level
services to be restarted if they crash is useless when the service in
question is essential to the system. Thus there are no reliability
benefits from putting your memory management, task management, virtual
file system, or interrupt system outside the true kernel - if one of
these services dies, you're buggered whether it kills the kernel or not.
A similar situation is found in Linux - because X is separate from the
kernel, it can die and restart independently of the OS itself. But to
the desktop user, their system has died - they don't know or care if the
OS itself survived.

Most of the benefits of a microkernel can actually be achieved in a
monolithic kernel - you keep your services carefully modularised,
developed and tested as separate units with clear and clean interfaces.
It's a good development paradigm - it does not matter in practice if
the key services are directly linked with the kernel or not, since they
are all essential to the working of the OS. About the only way a
microkernel improves reliability is by enforcing this model - you are
not able to cheat.

What *does* make sense is keeping as many device drivers as possible out
of the kernel itself. Non-essential services should not be in the kernel.



You underestimate the power of software bugs - you'll *always* be able
to crash the kernel!

No. Not if it's small and correct, and it's absolutely protected by
the hardware, and it runs on a CPU that runs nothing else. I've
written RTOS's that never crashed.

There's nothing wrong with dreaming, quite the opposite. But you have
to be able to see when it is nothing but a dream.

Do I have to give all that money back? Roughly $200 million so far.

John
 
N

Nobody

Jan 1, 1970
0
Well, better tools might help, but the best solution would be to hire
programmers who actually give a hoot about the quality of their work
product. :)

The programmers don't normally get to choose the budget or deadlines.

The choice of "done now" vs "done right" is usually based upon which one
is more likely to result in you keeping your job. Most of the time,
"done now" wins.
 
D

David Brown

Jan 1, 1970
0
John said:
That's probably true. Sun will soon be shipping 8-core, multithread
processors. Looks like the number of cores per chip is at least
doubling every year, now that clock speeds are no longer the holy
grail.

So, in 5 years, with 8 * 2^5 = 256 cores, running maybe 1K threads,
why context switch?

Cores don't scale like that - Sun have done pretty well with their 8
core 64 thread cpu. I don't imagine we'll see so very many more cores
on a device, because the interconnections and cache scaling get too
difficult, and the whole device is too limited in memory bandwidth. It
could get more, but why would anyone bother when there is a better way?
The sort of application that works well with these devices is
multi-process (or multi-thread) server software, such as web, email and
database serving. These also scale well in clusters - there is little
point in trying to run one OS on a 64-core machine when you can just as
easily run 8 OS's on eight 8-core machines and get the same performance
without anything more sophisticated than standard network connections.

If you are looking for a high performance server today, you can buy rack
units with 4 separate PC's, each with 1 or 2 sockets for 4 or 8 core
SMP. Load them all up with Linux in a cluster, and you have the same
processing power as a 32 core system at a fraction of the cost.

And unlike in a single massively multi-core chip, such clusters can have
redundancy built in - thus greatly increasing their reliability.

As a future prediction, I would expect to see motherboards with multiple
independent PC's on the one board, designed specifically for this sort
of cluster. I also expect to see virtual Ethernet links on these boards.

And as for your context switch obsession, you do realise that in an SMP
server system, context switches waste only a fraction of a percent of
the processing power? On a web server with 64 virtual cores, you'd
expect something like a few thousand processes to be alive at a time,
but most of them will be sleeping - the main worker threads will occupy
the cores with very little context switching, while the sleeping threads
can run as needed.

In other words, the organisation that exists today works perfectly well.
There are plenty of things that can be improved in the software and
hardware, but nothing is fundamentally broken (except perhaps the
software development methods of many companies).

On the desktop side, there will be a gradual shift towards more
multi-threading software for processor intensive applications like games
and media converters.
Well, the world is ready for OS reliability.

There *are* reliable OS's available today - they just don't begin with
the letter "W". There is plenty of unreliable software that runs on
these OS's, but *nix systems designed for servers are solid (as are VMS,
Netware, and many embedded OS's).

What you are trying to say, I think, is that the world is ready for
reliable desktop software - that's a very different matter, and one I'd
agree with.
Exactly! Except there is no context switch overhead.

So what? In all practical measurements, there is no context switch
overhead in SMP systems today - it all drowns out in comparison to
delays in I/O. If you are running cpu-intensive work on a desktop with
one core and fast pre-emptive switching, then the switch overhead can be
noticeable - but not on a server.
Thirdly, you've lost


Exactly! We can't run .NET forever.

Who would want to run .NET at all, especially on a server?
Circular reasoning. Why aren't we still running 1401 code?

I'm talking about compatibility at the source code level and above
(i.e., the design of the software, and the way it works), not the object
code. Many essential building blocks of the internet are based on code
15 years old or more - the *nix architecture is 30 years old or so. Any
hardware that can't run this *kind* of software (even after
modifications) can't run common general purpose software.
If the cache throughput is the limit, you get the same amount of
computing no matter how many CPUs are running. CPUs can also have a
little bit of local instruction cache, since code does not have to be
kept globally coherent.

Code does have to be kept globally coherent (though it is easier to do
so than for data), and cores can't keep running without data.

But you are right about the bandwidth limitations being a similar
problem for having a few cores or many. It will be less of an issue for
the few core device, since you'd have fewer latencies in switching all
the data around the device. And if your 256 cores cannot do more real
work than 4 cores could - what is the point in having them? Please
don't just repeat that it avoids context switches - the tiny advantage
that might give does not outweigh the costs of the rest of the device.
No. Not if it's small and correct, and it's absolutely protected by
the hardware, and it runs on a CPU that runs nothing else. I've
written RTOS's that never crashed.

You can make systems small and correct when they are doing a limited and
well-understood job - that's why we can make embedded systems that are
reliable. It is also possible to make big systems that are correct and
reliable, if you do it well enough (look at mainframes). But dividing a
complex system into parts does not by itself make it more reliable - it
only makes it easier for the developer to use solid development and test
methodologies.

I'm not saying that smaller kernels and better protection through more
advanced hardware are not helpful - merely that they are not a magic
bullet (who cares if your RTOS never crashes if the application running
on it dies? It's the whole system's reliability that is important), nor
are they essential.
Do I have to give all that money back? Roughly $200 million so far.

Only if you've sold them nothing more than dreams!
 