VON NEUMANN VS HARVARD

UFO Joe · Apr 10, 2006

Excerpted from AVR Butterfly Site at: http://retrodan.tripod.com

VON NEUMANN vs. HARVARD ARCHITECTURE

If you are new to microcontrollers, one of the arguments you are going to
hear bandied about is Harvard architecture versus Von Neumann
architecture.

THE VON NEUMANN ARCHITECTURE

Most computers we are familiar with use an architecture called Von Neumann.
The term arose out of von Neumann's 1945 draft report on the EDVAC computer.
He was not, however, the original inventor of it.

                          +---------+
                          |   RAM   |
                          | - - - - |
         BottleNeck       | PROGRAM |
  [CPU] <==========>      | - - - - |
                          |  DATA   |
                          +---------+

A Von Neumann machine has one large monolithic RAM structure that contains
both program memory and data memory mixed together. Since both program steps
and data must be loaded from the same place, it can create a problem called
the Von Neumann Bottle-Neck.

THE HARVARD ARCHITECTURE

Most microcontrollers use a different system called Harvard architecture.
The larger program storage and the smaller data memory are separated. The
first such machine, the Harvard Mark I, had its programs hard-coded on
paper tape, and the volatile data was loaded into electric relays.

+----------+                     +------+
| PROGRAM  |                     | DATA |
|   ROM    | <----> [CPU] <----> | RAM  |
+----------+                     +------+

Harvard-style machines allow program steps to be fetched at the same time as
data, potentially giving faster through-put and less of a bottle-neck. They
also have the benefit that runaway processes can't damage the program stored
in the non-volatile program area, so they're more stable. Many C programs
lack proper bounds checking, and a null pointer or an overrun buffer can
overwrite and crash a program that shares RAM with its data.

If you are new to this architecture you need to keep this in mind: when
creating a routine that needs a few bytes of storage, I would normally
create that space within the routine itself. On a Harvard machine, those
bytes would not be in volatile RAM but part of the hard-coded program memory
stored in ROM (or Flash).

AVR BUTTERFLY PROGRAM SPACE (FLASH)

The AVR Butterfly (ATmega169) program space is 16K long and is divided into
two main areas. The top 1-2K is usually loaded with a bootloader that is
protected from being overwritten, and the rest is available for your
programs. At the beginning of each area is a table of interrupt vectors. The
first such vector is the Power-Up/Reset vector and should contain an RJMP to
the first line of your program.

.---------------------------------------------------.
| POWER-UP VECTOR (NORMALLY POINTS TO YOUR PROGRAM) |
|              OTHER INTERRUPT VECTORS              |
|- - - - - - - - - - - - - - - - - - - - - - - - - -|
|                                                   |
|            YOUR PROGRAM SPACE (14-15K)            |
|                                                   |
|                                                   |
.---------------------------------------------------.
|              BOOT:INTERRUPT VECTORS               |
|- - - - - - - - - - - - - - - - - - - - - - - - - -|
|          BOOTLOADER PROGRAM SPACE (1-2K)          |
`---------------------------------------------------'

AVR BUTTERFLY DATA SPACE (SRAM)

The data space on the AVRs is a little unusual. The 32 internal accumulators
(working registers) are memory-mapped as the first 32 bytes of data memory,
followed by the I/O ports, which are also mapped into the data space, and
then by the 1K of actual SRAM starting at $100.

.---------------------------------------------------.
|$00-$1F: 32 Internal Accumulator/Working-Registers |
+---------------------------------------------------+
|$20-$5F: 64 Input/Output Ports                     |
|- - - - - - - - - - - - - - - - - - - - - - - - - -|
|$60-$FF: 160 Extra I/O Ports                       |
+---------------------------------------------------+
|$100                                               |
|           1024 Bytes SRAM $100 to $4FF            |
|                                                   |
|                                                   |
|- - - - - - - - - - - - - - - - - - - - - - - - - -|
|$4FF (Normal Location for Stack)                   |
`---------------------------------------------------'

Fred Bloggs · Apr 10, 2006

So what? This is so mundane and boring.

Rene Tschaggelar · Apr 10, 2006

UFO said:
Excerpted from AVR Butterfly Site at: http://retrodan.tripod.com

VON NEUMANN vs. HARVARD ARCHITECTURE

If you are new to microcontrollers, one of the arguments you are going to
hear bandied about is Harvard architecture versus Von Neumann
architecture.

[snip]

There are theories and there are actual devices that
have some performance, some features and a price tag.
At one point, when you have to actually solve a problem,
get some embedded system doing a job, then you just
compare the available devices, their prices, the
development tools and the support.

And you won't get the ultimate mega machine for
$1 apiece.

Rene

Nico Coesel · Apr 10, 2006

UFO Joe said:
Excerpted from AVR Butterfly Site at: http://retrodan.tripod.com

VON NEUMANN vs. HARVARD ARCHITECTURE

If you are new to microcontrollers, one of the arguments you are going to
hear bandied about is Harvard architecture versus Von Neumann
architecture.

In my opinion everything that uses the Harvard architecture is very,
very outdated.

The main problem with the HA is that you have separate memory spaces.
C doesn't like that. You'll have to test which memory space a pointer
is pointing to. Functions like memcpy, printf, etc. won't work
with just any pointer. Of course you can kludge things together, but in the
end you'll find yourself locked to that target. With ever shorter
product lifetimes you'll want to move your code to other platforms or
develop on a PC and then transfer the algorithm to an embedded device.

Modern processors use separate instruction and data caches, which basically
'emulate' the Harvard architecture without its downsides.

Ken Smith · Apr 11, 2006

Nico Coesel said:
In my opinion everything that uses the Harvard architecture is very,
very outdated.

The main problem with the HA is that you have separate memory spaces.
C doesn't like that.

In my opinion everything that uses C is very, very outdated. The C
language doesn't support the passing of or the returning of complex data
types. It is also stuck with the dangerous and silly notion of the
"pointer".

High performance processors like DSPs often have different classes of
memories. Instructions, integers and floats can all have their own
address spaces. Writing C compilers for such things is a real nightmare
and even with a good compiler writing in C is a sickening task.

Ken Smith · Apr 11, 2006

UFO Joe said:
Harvard style machines allow program steps to be fetched at the same time as
data, thereby creating potentially faster through-put and less of a
bottle-neck.

Actually, you can do better by having two busses, and two mixed
instruction/data spaces. This way, if no data is to be fetched, two
instructions can be fetched at the same time.

Many C programs lack proper boundary checking and a null pointer or an

I'd say "practically every", not just "many". If a C program overwrites its
data space the results can be, and usually are, just as bad as if it overwrote
its code space. The Harvard really gains you nothing in this case.

[...]

The data space on the AVRs is a little unusual. The 32 internal accumulators
are memory mapped as the first 32 bytes of data memory, followed by all the
I/O ports which are mapped into memory space followed by the 1K of actual
SRAM starting at $100.

That sounds like a really silly order to put them in. All the IO ports
sharing the RAM space also sounds like a good way to let a wild pointer in
a C program physically break the attached hardware.

"Gee every Feb 29th, the heater circuit comes on full blast and our
chemical plant catches on fire. I wonder why?"

Nico Coesel · Apr 11, 2006

In my opinion everything that uses C is very, very outdated. The C
language doesn't support the passing of or the returning of complex data
types. It is also stuck with the dangerous and silly notion of the
"pointer".

A pointer is as dangerous as the programmer using it. By your remark
about complex data it seems you have little experience with C. It is
not a problem to pass a pointer to complex data types or objects (most
programs do this all the time) and alter the contents. C even offers
ways to make function parameters read-only so you'll get a compiler
warning if you overwrite arguments.

Besides, modern DSPs like the Analog Devices Blackfin series have an
MMU which can be used to make certain memory areas read-only. Since it
has a single address space, you can run an embedded OS like micro
Linux if you like.

High performance processors like DSPs often have different classes of
memories. Instructions, integers and floats can all have their own
address spaces. Writing C compilers for such things is a real nightmare
and even with a good compiler writing in C is a sickening task.

Old style thinking, old style hardware. All these performance tuning
kludges can be replaced by using a cache memory which allows for
efficient code generation by a C/C++ compiler on a single memory space
CPU. Keep in mind that on CPUs with separate memory spaces, a lot more
instruction codes are wasted which could be used for nifty things like
read-modify-write bit set/toggle/clear operations.

Ken Smith · Apr 12, 2006

A pointer is as dangerous as the programmer using it.

No, a pointer is by its nature dangerous. Because of it, you have to do
all sorts of strange stuff to do bounds checking in C. In other languages,
range checking is much easier to do.

By your remark
about complex data it seems you have little experience with C. It is
not a problem to pass a pointer to complex data types or objects (most
programs do this all the time) and alter the contents.

Yes, you can pass a pointer but like I said, you can neither pass nor
return a complex type. You have to pass a pointer if you wish your
subroutine to work with a complex type. You seem to claim to have a lot
of experience with C and yet you missed that point.

The syntax of C hides the fact that it is a pass by reference in the
call statement so you have to look at other lines of the code to know if
the routine modifies the value.

C even offers
ways to make function parameters read-only so you'll get a compiler
warning if you overwrite arguments.

Yes and this has zero to do with the statements in my post.

[....]

Old style thinking, old style hardware. All these performance tuning
kludges can be replaced by using a cache memory which allows for
efficient code generation by a C/C++ compiler on a single memory space
CPU. Keep in mind that on CPUs with separate memory spaces, a lot more
instruction codes are wasted which could be used for nifty things like
read-modify-write bit set/toggle/clear operations.

The cache uses much more silicon to get the same performance. You can get
better performance by having multiple address spaces.

Rich Grise · Apr 12, 2006

Ken Smith said:

[....]
In my opinion everything that uses the Harvard architecture is very,
very outdated.

The main problem with the HA is that you have separate memory spaces.
C doesn't like that.

In my opinion everything that uses C is very, very outdated. The C
language doesn't support the passing of or the returning of complex data
types.

Nonsense. It's trivially easy to pass and return structures - the only
limit is your stack size:
------------------------
/* complex_type.c */
/* to demonstrate passing and returning structures from a C function */
#include <stdio.h>
#include <stdlib.h>   /* declares exit() */
struct FourItems {
    int item1;
    int item2;
    int item3;
    int item4;
} A;

struct FourItems manipulate(struct FourItems Q) {
    printf("Q.1 = %d, Q.2 = %d, Q.3 = %d, Q.4 = %d\n",
           Q.item1, Q.item2, Q.item3, Q.item4);

    Q.item1 = 5;
    Q.item2 = 12;
    Q.item3 = 3;
    Q.item4 = 7;
    return Q;
}

int main(int argc, char *argv[]) {
    struct FourItems B;

    A.item1 = 1;
    A.item2 = 2;
    A.item3 = 3;
    A.item4 = 4;

    printf("A.1 = %d, A.2 = %d, A.3 = %d, A.4 = %d\n",
           A.item1, A.item2, A.item3, A.item4);

    B = manipulate(A);
    printf("B.1 = %d, B.2 = %d, B.3 = %d, B.4 = %d\n",
           B.item1, B.item2, B.item3, B.item4);

    exit(0);
}

-----------------------------
Here's its output:

$ complex_type
A.1 = 1, A.2 = 2, A.3 = 3, A.4 = 4
Q.1 = 1, Q.2 = 2, Q.3 = 3, Q.4 = 4
B.1 = 5, B.2 = 12, B.3 = 3, B.4 = 7

-----------------------------
And here's a list of the generated code:
$ cat complex_type.lst
GAS LISTING /tmp/ccQA6n9P.s page 1

1 .file "complex_type.c"
2 .file 1 "complex_type.c"
10 .Ltext0:
11 .file 2 "/usr/lib/gcc-lib/i486-slackware-linux/3.3.6/include/stddef.h"
12 .file 3 "/usr/include/bits/types.h"
13 .file 4 "/usr/include/stdio.h"
14 .file 5 "/usr/include/wchar.h"
15 .file 6 "/usr/include/_G_config.h"
16 .file 7 "/usr/include/gconv.h"
17 .file 8 "/usr/lib/gcc-lib/i486-slackware-linux/3.3.6/include/stdarg.h"
18 .file 9 "/usr/include/libio.h"
19 .section .rodata
20 .align 32
21 .LC0:
22 0000 512E3120 .string "Q.1 = %d, Q.2 = %d, Q.3 = %d, Q.4 = %d\n"
22 3D202564
22 2C20512E
22 32203D20
22 25642C20
23 .text
24 .globl manipulate
26 manipulate:
27 .LFB3:
1:complex_type.c **** /* complex_type.c */
2:complex_type.c **** /* to demonstrate passing and returning structures from a C function */
3:complex_type.c **** #include <stdio.h>
4:complex_type.c ****
5:complex_type.c **** struct FourItems {
6:complex_type.c **** int item1;
7:complex_type.c **** int item2;
8:complex_type.c **** int item3;
9:complex_type.c **** int item4;
10:complex_type.c **** } A;
11:complex_type.c ****
12:complex_type.c **** struct FourItems manipulate(struct FourItems Q) {
28 .loc 1 12 0
29 0000 55 pushl %ebp
30 .LCFI0:
31 0001 89E5 movl %esp, %ebp
32 .LCFI1:
33 0003 53 pushl %ebx
34 .LCFI2:
35 0004 83EC04 subl $4, %esp
36 .LCFI3:
37 0007 8B5D08 movl 8(%ebp), %ebx
13:complex_type.c **** printf("Q.1 = %d, Q.2 = %d, Q.3 = %d, Q.4 = %d\n",
38 .loc 1 13 0
39 000a 83EC0C subl $12, %esp
40 000d FF7518 pushl 24(%ebp)
41 0010 FF7514 pushl 20(%ebp)
42 0013 FF7510 pushl 16(%ebp)
43 0016 FF750C pushl 12(%ebp)
44 0019 68000000 pushl $.LC0
44 00
45 .LCFI4:
46 001e E8FCFFFF call printf
46 FF

47 0023 83C420 addl $32, %esp
14:complex_type.c **** Q.item1, Q.item2, Q.item3, Q.item4);
15:complex_type.c ****
16:complex_type.c **** Q.item1 = 5;
48 .loc 1 16 0
49 0026 C7450C05 movl $5, 12(%ebp)
49 000000
17:complex_type.c **** Q.item2 = 12;
50 .loc 1 17 0
51 002d C745100C movl $12, 16(%ebp)
51 000000
18:complex_type.c **** Q.item3 = 3;
52 .loc 1 18 0
53 0034 C7451403 movl $3, 20(%ebp)
53 000000
19:complex_type.c **** Q.item4 = 7;
54 .loc 1 19 0
55 003b C7451807 movl $7, 24(%ebp)
55 000000
20:complex_type.c **** return Q;
56 .loc 1 20 0
57 0042 8B450C movl 12(%ebp), %eax
58 0045 8903 movl %eax, (%ebx)
59 0047 8B4510 movl 16(%ebp), %eax
60 004a 894304 movl %eax, 4(%ebx)
61 004d 8B4514 movl 20(%ebp), %eax
62 0050 894308 movl %eax, 8(%ebx)
63 0053 8B4518 movl 24(%ebp), %eax
64 0056 89430C movl %eax, 12(%ebx)
21:complex_type.c **** }
65 .loc 1 21 0
66 0059 89D8 movl %ebx, %eax
67 005b 8B5DFC movl -4(%ebp), %ebx
68 005e C9 leave
69 005f C20400 ret $4
70 .LFE3:
72 .section .rodata
73 0028 00000000 .align 32
73 00000000
73 00000000
73 00000000
73 00000000
74 .LC1:
75 0040 412E3120 .string "A.1 = %d, A.2 = %d, A.3 = %d, A.4 = %d\n"
75 3D202564
75 2C20412E
75 32203D20
75 25642C20
76 0068 00000000 .align 32
76 00000000
76 00000000
76 00000000
76 00000000
77 .LC2:
78 0080 422E3120 .string "B.1 = %d, B.2 = %d, B.3 = %d, B.4 = %d\n"
78 3D202564
78 2C20422E

78 32203D20
78 25642C20
79 .text
80 .globl main
82 main:
83 .LFB5:
22:complex_type.c ****
23:complex_type.c **** int main(int argc, char *argv[]) {
84 .loc 1 23 0
85 0062 55 pushl %ebp
86 .LCFI5:
87 0063 89E5 movl %esp, %ebp
88 .LCFI6:
89 0065 83EC18 subl $24, %esp
90 .LCFI7:
91 0068 83E4F0 andl $-16, %esp
92 006b B8000000 movl $0, %eax
92 00
93 0070 29C4 subl %eax, %esp
24:complex_type.c **** struct FourItems B;
25:complex_type.c ****
26:complex_type.c **** A.item1 = 1;
94 .loc 1 26 0
95 .LBB2:
96 0072 C7050000 movl $1, A
96 00000100
96 0000
27:complex_type.c **** A.item2 = 2;
97 .loc 1 27 0
98 007c C7050400 movl $2, A+4
98 00000200
98 0000
28:complex_type.c **** A.item3 = 3;
99 .loc 1 28 0
100 0086 C7050800 movl $3, A+8
100 00000300
100 0000
29:complex_type.c **** A.item4 = 4;
101 .loc 1 29 0
102 0090 C7050C00 movl $4, A+12
102 00000400
102 0000
30:complex_type.c ****
31:complex_type.c **** printf("A.1 = %d, A.2 = %d, A.3 = %d, A.4 = %d\n",
103 .loc 1 31 0
104 009a 83EC0C subl $12, %esp
105 009d FF350C00 pushl A+12
105 0000
106 00a3 FF350800 pushl A+8
106 0000
107 00a9 FF350400 pushl A+4
107 0000
108 00af FF350000 pushl A
108 0000
109 00b5 68400000 pushl $.LC1
109 00
110 .LCFI8:

111 00ba E8FCFFFF call printf
111 FF
112 00bf 83C420 addl $32, %esp
32:complex_type.c **** A.item1, A.item2, A.item3, A.item4);
33:complex_type.c ****
34:complex_type.c **** B = manipulate(A);
113 .loc 1 34 0
114 00c2 8D45E8 leal -24(%ebp), %eax
115 00c5 83EC0C subl $12, %esp
116 00c8 FF350C00 pushl A+12
116 0000
117 00ce FF350800 pushl A+8
117 0000
118 00d4 FF350400 pushl A+4
118 0000
119 00da FF350000 pushl A
119 0000
120 00e0 50 pushl %eax
121 00e1 E8FCFFFF call manipulate
121 FF
122 00e6 83C41C addl $28, %esp
35:complex_type.c **** printf("B.1 = %d, B.2 = %d, B.3 = %d, B.4 = %d\n",
123 .loc 1 35 0
124 00e9 83EC0C subl $12, %esp
125 00ec FF75F4 pushl -12(%ebp)
126 00ef FF75F0 pushl -16(%ebp)
127 00f2 FF75EC pushl -20(%ebp)
128 00f5 FF75E8 pushl -24(%ebp)
129 00f8 68800000 pushl $.LC2
129 00
130 00fd E8FCFFFF call printf
130 FF
131 0102 83C420 addl $32, %esp
36:complex_type.c **** B.item1, B.item2, B.item3, B.item4);
37:complex_type.c ****
38:complex_type.c **** exit(0);
132 .loc 1 38 0
133 0105 83EC0C subl $12, %esp
134 0108 6A00 pushl $0
135 .LCFI9:
136 010a E8FCFFFF call exit
136 FF
39:complex_type.c **** }
137 .loc 1 39 0
138 .LBE2:
139 .LFE5:
141 .comm A,16,4
212 .Letext0:
2232 .ident "GCC: (GNU) 3.3.6"

DEFINED SYMBOLS
*ABS*:0000000000000000 complex_type.c
/tmp/ccQA6n9P.s:26 .text:0000000000000000 manipulate
/tmp/ccQA6n9P.s:82 .text:0000000000000062 main
*COM*:0000000000000010 A

UNDEFINED SYMBOLS
printf
exit
-------------------------

I have no idea why it's claiming 'printf' and 'exit' are undefined. ?:-/

Cheers!
Rich

slebetman@yahoo.com · Apr 13, 2006

Nico said:
In my opinion everything that uses the Harvard architecture is very,
very outdated.

Are you kidding me? Every single modern superscalar CPU is internally a
Harvard CPU. x86, PowerPC, Sparc, MIPS, ARM9 and Alpha are
microarchitecturally Harvard, only at the point of memory controllers
do they look like traditional Von Neumann CPUs. What do you think the
separation of icache and dcache is all about? It's about performance:
being able to decode the next instruction while manipulating the current
data. It's about being able to fetch and commit data and instructions
simultaneously.

C has absolutely no problem with Harvard architecture. In fact, C does
not fully support Von Neumann and the C runtime is designed as a
Harvard machine. Tell me where in the C standard does the language
allow you to write self-modifying code. There are some system calls
that allow you to load data (for example, from a file stream) into the
instruction stream/process table. Exec is an example. But such
functions are not covered by, and are NOT part of, the C standard.

Pure Harvard CPUs like the PIC and AVR have supported C for a long
time. In fact, the AVR was not only designed specifically to support C
but was designed with the input of C compiler writers/developers. I
have 5 C compilers designed for pure Harvard machines so don't tell me
that C "doesn't like" Harvard. OK, yeah, the PIC is difficult to write a
compiler for, not because of the Harvard architecture but because of its
convoluted memory map. The AVR was designed from the ground up from the
point of view of the C compiler.

slebetman@yahoo.com · Apr 13, 2006

Nico Coesel wrote:
single memory space

Keep in mind that on CPUs with separate memory spaces, a lot more
instruction codes are wasted which could be used for nifty things like
read-modify-write bit set/toggle/clear operations.

Keep in mind not all CPUs look like x86. Most pure Harvard machines
like the PIC and AVR and some Von Neumann CPUs (PowerPC I believe?)
have built-in bit set/toggle/clear instructions and can do it in
hardware in a single instruction cycle - far fewer wasted instructions
and much more nifty. Besides, bit read-modify-write is done on the data
memory, nobody's touching the program memory, so it doesn't matter if
it's Harvard or Von Neumann - both behave the same and, if the
instruction sets are similar, require exactly the same number of
instruction codes. So what exactly are you talking about?

slebetman@yahoo.com · Apr 13, 2006

Ken said:
Yes, you can pass a pointer but like I said, you can neither pass nor
return a complex type. You have to pass a pointer if you wish your
subroutine to work with a complex type. You seem to claim to have a lot
of experience with C and yet you missed that point.

You seem to miss the point that C CAN IN FACT pass and return a complex
data type. You don't need to use a pointer though lots of C programs
do. If you want you can pass the struct itself:

struct foo {
    int x;
    int y;
};

struct foo doubleFoo(struct foo f) {
    f.x = f.x * 2;
    f.y = f.y * 2;
    return f;
}

int main(void) {
    struct foo bar;
    bar.x = 1;
    bar.y = 1;
    bar = doubleFoo(bar);
    return 0;
}

Keith · Apr 13, 2006

Are you kidding me? Every single modern superscalar CPU is internally a
Harvard CPU. x86, PowerPC, Sparc, MIPS, ARM9 and Alpha are
microarchitecturally Harvard, only at the point of memory controllers
do they look like traditional Von Neumann CPUs.

Certainly the internals of these are Harvard; to the external world they
are Princeton. The L2 caches and bus controllers of all of these
processors are classic Von Neumann. Self-modifying code can be an issue
though.

What do you think the separation of icache and dcache is all about?
It's about performance: being able to decode the next instruction while
manipulating the current data. It's about being able to fetch and commit
data and instructions simultaneously.

No, the separate I and D caches are all about simplicity, thus
performance. The I-Caches can be relatively simple since they can only
have one write and one read port and snooping isn't an issue. D-Caches
are far more complex and L2s even more so. Simplicity => fast.

C has absolutely no problem with Harvard architecture. In fact, C does
not fully support Von Neumann and the C runtime is designed as a Harvard
machine. Tell me where in the C standard does the language allow you to
write self-modifying code.

Where does it not allow this? OTOH, who cares?

slebetman@yahoo.com · Apr 13, 2006

Keith said:
No, the separate I and D caches are all about simplicity, thus
performance. The I-Caches can be relatively simple since they can only
have one write and one read port and snooping isn't an issue. D-Caches
are far more complex and L2s even more so. Simplicity => fast.

Actually Yes. And yes you are also right. The speed gain achieved
through simplicity mainly affects three or more operand instruction
sets. For single and double operand instruction sets the dcache will
only need one read and one write port. Well, double operand in terms of
one read and one write. Of course an accumulator based two operand
instruction set may require two read ports.

The first significant gain of separating dcache and icache is that you
can now achieve a one instruction cycle per clock cycle execution
because you are no longer constrained by having to multiplex between
instruction access and data access on the same bus. This affects even
single operand machines. The second gain is as you stated for
multi-operand CPUs the reduced complexity of the icache allows for
faster execution.

Where does it not allow this?

Neither the original K&R C nor C89 nor C90 nor C99 allows you to modify
your source at run time. There is no "eval" function in C or anything
similar which allows you to touch the instruction stream
programmatically.

OTOH, who cares?

I was responding to Nico's statement:

The main problem with the HA is that you have separate memory spaces.
C doesn't like that.

which in my opinion is false since C virtually *assumes* a Harvard
architecture in the language specs. By this I mean that C doesn't
support the main extra feature of a Von Neumann CPU: being able to
access your instruction stream. C doesn't allow that.

Keith · Apr 13, 2006

Actually Yes. And yes you are also right. The speed gain achieved
through simplicity mainly affects three or more operand instruction
sets. For single and double operand instruction sets the dcache will
only need one read and one write port.

No, the D-caches must be multi-ported for a number of other
reasons. At least two ports (read/write) are needed for load/store
operations and two more for the L2. Of course there may be more
than one load/store unit (and D-Cache ports to go with them) as
well. The I-cache doesn't need the write port for the load/store
nor the read port for the L2.

Well, double operand in terms of
one read and one write. Of course an accumulator based two operand
instruction set may require two read ports.

How do you propose to load the D-Cache?

The first significant gain of separating dcache and icache is that you
can now achieve a one instruction cycle per clock cycle execution
because you are no longer constrained by having to multiplex between
instruction access and data access on the same bus. This affects even
single operand machines. The second gain is as you stated for
multi-operand CPUs the reduced complexity of the icache allows for
faster execution.

No, this can be done with a shared cache (and was for years) but
the number of ports, thus the complexity goes up.

Neither the original K&R C nor C89 nor C90 nor C99 allows you to modify
your source at run time. There is no "eval" function in C or anything
similar which allows you to touch the instruction stream
programmatically.

Source? That's syntax. There is nothing preventing you from
writing anywhere in memory.

I was responding to Nico's statement:
which in my opinion is false since C virtually *assumes* a Harvard
architecture in the language specs. By this I mean that C doesn't
support the main extra feature of a Von Neumann CPU: being able to
access your instruction stream. C doesn't allow that.

How does it disallow it? Instructions aren't tagged any
differently than data. Well, they are in some architectures but
that's not reflected in the language. ISTM that if C disallowed
executing data there wouldn't be all the stack overflow holes in
Windows we see.

Ken Smith · Apr 13, 2006

Rich Grise said:
Nonsense. It's trivially easy to pass and return structures - the only
limit is your stack size:

Yes, you are right, it works on gcc. Some years back I tried it on quite a
different C compiler and it choked. Is this a case of a bad compiler, or
is it something added in, let's say, the last 7 years?

BTW: Can this be done with an array?

slebetman@yahoo.com · Apr 13, 2006

Keith said:
No, the D-caches must be multi-ported for a number of other
reasons.

Not for single operand CPU. You just need a single read/write port
because no instruction can both read and write at the same time.

How do you propose to load the D-Cache?

By loading it? I don't understand your question.

No, this can be done with a shared cache (and was for years) but
the number of ports, thus the complexity goes up.

Of course it can. But before people tried doing it with a shared cache it
had already been done with separate I and D caches, in part because in the
olden days multi-port memory chips were rare and expensive, so the main
reason was to be able to fetch data and instructions at the same time.

Source? That's syntax. There is nothing preventing you from
writing anywhere in memory.

Ah but writing "code" like that is not really C is it? It is writing
direct machine code (which is even lower than assembly programming).
That is way outside the scope of the C language and is highly
platform/compiler/implementation/CPU specific.

How does it disallow it? Instructions aren't tagged any
differently than data. Well, they are in some architectures but
that's not reflected in the language. ISTM that if C disallowed
executing data there wouldn't be all the stack overflow holes in
Windows we see.

OK, disallow is probably the wrong word. It doesn't support that. If you
want this sort of thing you have to write the machine code yourself and
pack it in some data in your C program. The compiler won't do it for
you. Scripting languages, on the other hand, often allow you to write
self-modifying code, and it is supported directly by the language.

# Tcl example:
# This function will return 1 only once and then
# modifies itself to always return zero:
proc once {} {
    proc once {} {return 0}
    return 1
}

But regardless of whether it allows it or not, the C language is perfectly
happy with the Harvard architecture. It is only when you go
outside the C language and directly write hand-assembled
binary machine code to the instruction stream that there is a
difference between Harvard and Von Neumann machines. Nico's statement
is still wrong.
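
To illustrate the point, the closest portable C analogue of the Tcl `once` example is probably something like the sketch below. It is not self-modifying code: it only rebinds a function pointer, which lives in data memory, so it should behave the same on a Harvard or a Von Neumann machine. The names `once_first` and `once_later` are made up for illustration.

```c
/* Forward declarations for the two behaviours. */
int once_first(void);
int once_later(void);

/* `once` is a function pointer, not a function. Rebinding it is an
   ordinary data write and never touches instruction memory. */
int (*once)(void) = once_first;

/* Behaviour for the first call: return 1 and switch to the other one. */
int once_first(void)
{
    once = once_later;
    return 1;
}

/* Behaviour for every later call. */
int once_later(void)
{
    return 0;
}
```

Calling `once()` repeatedly yields 1 the first time and 0 afterwards. Both function bodies are fixed at compile time; only the pointer changes at run time.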

Keith · Apr 13, 2006

Not for a single-operand CPU. You just need a single read/write port
because no instruction can both read and write at the same time.

Baloney. You need a write and a read port for the processor and
another set for the L2/bus. Just because one instruction can't
read and write at the same time doesn't mean there is only one
instruction executing. Time multiplexing ports is still creating
ports, and complexity.

By loading it? I don't understand your question.

You have a read and a write port for the processor. How does the
data get into the cache? A: Two more ports (at least).

Of course it can. But before people tried doing it with a shared cache it
had already been done with separate I and D caches. In part because in the
olden days multi-port memory chips were rare and expensive, so the main
reason was to be able to fetch data and instructions at the same time.

Actually it was done that way before because the I and D units were
too far apart.

Ah but writing "code" like that is not really C is it? It is writing
direct machine code (which is even lower than assembly programming).
That is way outside the scope of the C language and is highly
platform/compiler/implementation/CPU specific.

Using that definition, one can't even write self modifying code in
assembler either.

OK, disallow is probably the wrong word. It doesn't support that. If you
want this sort of thing you have to write the machine code yourself and
pack it in some data in your C program. The compiler won't do it for
you. Scripting languages, on the other hand, often allow you to write
self-modifying code, and it is supported directly by the language.

Your definition makes "self modifying code" a useless term.

# Tcl example:
# This function will return 1 only once and then
# modifies itself to always return zero:
proc once {} {
    proc once {} {return 0}
    return 1
}

Ok, so a language must be interpreted to allow "self modifying"
code. Strange definition.

But regardless of whether it allows it or not, the C language is perfectly
happy with the Harvard architecture. It is only when you go
outside the C language and directly write hand-assembled
binary machine code to the instruction stream that there is a
difference between Harvard and Von Neumann machines. Nico's statement
is still wrong.

I didn't say he was right. ...and I still don't like C. ;-)

Rich Grise · Apr 13, 2006

Yes you are right, it works on gcc. Some years back I tried it on quite a
different C compiler and it choked. Is this a case of a bad compiler, or
is it something added in, let's say, the last 7 years?

BTW: Can this be done with an array?

Dang! I don't know! How do you declare the function whose return value is
an array?

int[] foo(int[]); // ?

You have to hard-code the size, I think - or maybe union it onto a
structure? Then you'd still have to have a fixed size.

Probably not. Anybody want to try?
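
For what it's worth, standard C does not let a function return an array type, so a declaration like the one above will not compile. Wrapping a fixed-size array in a structure does work, as guessed. A minimal sketch, where `ARRAY_LEN`, `int_array` and `doubled` are names made up for illustration:

```c
#define ARRAY_LEN 4  /* the size must be fixed at compile time */

/* Hypothetical wrapper: a struct whose only member is an array. */
struct int_array {
    int v[ARRAY_LEN];
};

/* Takes the wrapped array by value and returns a modified copy.
   The caller's original is left untouched. */
struct int_array doubled(struct int_array a)
{
    int i;
    for (i = 0; i < ARRAY_LEN; i++)
        a.v[i] *= 2;
    return a;  /* the whole array is copied back inside the struct */
}
```

The whole array is copied on the way in and again on the way out, so for anything large the usual approach is to pass a pointer and a length instead.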

I'm gonna crosspost this to comp.lang.c, but set followups to
sci.electronics.design, because those c.l.c folks are a little touchy
about marginally off-topic posts and banter and stuff.

Thanks,
Rich

Rich Grise · Apr 13, 2006

Nico Coesel wrote:
single memory space

Keep in mind not all CPUs look like x86. Most pure Harvard machines
like the PIC and AVR and some Von Neumann CPUs (PowerPC I believe?)
have built-in bit set/toggle/clear instructions and can do it in
hardware in a single instruction cycle - far fewer wasted instructions
and much more nifty. Besides, bit read-modify-write is done on the data
memory, nobody's touching the program memory, so it doesn't matter if
it's Harvard or Von Neumann - both behave the same and, if the
instruction sets are similar, require exactly the same number of
instructions. So what exactly are you talking about?
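
As a rough illustration of the bit operations being discussed, this is how set, clear and toggle are usually written in C. The helper names are made up, and they work on a plain value rather than a real hardware register. Whether a compiler turns each one into a single bit instruction or into a load, modify, store sequence depends on the target and is not something the C source decides.

```c
#include <stdint.h>

/* Each helper takes an 8-bit value and a bit number (0..7) and
   returns the modified value. Only data is read and written. */

/* Set one bit: OR with a mask that has only that bit on. */
uint8_t bit_set(uint8_t reg, unsigned bit)
{
    return (uint8_t)(reg | (1u << bit));
}

/* Clear one bit: AND with a mask that has only that bit off. */
uint8_t bit_clear(uint8_t reg, unsigned bit)
{
    return (uint8_t)(reg & ~(1u << bit));
}

/* Toggle one bit: XOR with a mask that has only that bit on. */
uint8_t bit_toggle(uint8_t reg, unsigned bit)
{
    return (uint8_t)(reg ^ (1u << bit));
}
```

In all three cases the operand is data, so the split between program and data memory makes no difference to the result.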

The one with the two busses can fetch the next instruction while it's
executing the first one, simultaneously. Nowadays, with these insanely
fast processors, it's pretty much unnecessary with all that queueing
going on.

Cheers!
Rich
