Maker Pro

Re: Intel details future Larrabee graphics chip

Wilco Dijkstra

Nick Maclaren said:
I am getting tired of simply pointing out factual errors, and this
will be my last on this sub-thread.

Which factual errors? :)
|> Most compilers, including the highly optimizing ones, do almost all
|> optimization at a far lower level. This not only avoids most of the issues
|> you're talking about, but it also ensures badly behaved programs are
|> correctly optimized, while well behaved programs are still optimized
|> aggressively.

I spent 10 years managing a wide range of HPC machines (and have advised
on such uses for much longer). You are wrong in all respects, as you
can find out if you look. Try Sun's and IBM's compiler documentation,
for a start, and most of the others (though I can't now remember which).

Your claims that it isn't a problem would make anyone with significant
HPC experience laugh hollowly. Few other people use aggressive
optimisation on whole, complicated programs. Even I don't, for most
code.

And I laugh in their faces about their claims of creating a "highly optimizing
compiler" that generates incorrect code! Any idiot can write a highly
optimizing compiler if it doesn't need to be correct... I know that many
of the issues are caused by optimizations originally written for other
languages (e.g. Fortran has pretty loose aliasing rules) which
require more checks to be safe in C.

My point is that compilers have to compile existing code correctly - even
if it is written badly. It isn't hard to recognise the nasty cases; for example,
it's common to do *(T*)&var to convert between integer and floating point.
Various compilers treat this as an idiom and use direct int<->FP moves,
which are more efficient. So this particular case wouldn't even show up
when doing type-based alias analysis.
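
Concretely, the idiom and its well-defined spelling look like this (a minimal
sketch, assuming a 32-bit float; good compilers turn both into the same
direct move):

#include <stdint.h>
#include <string.h>

/* The common idiom: formally a strict-aliasing violation, but widely
   recognised and compiled to a direct FP->int register move. */
uint32_t float_bits_idiom(float f)
{
    return *(uint32_t *)&f;
}

/* The well-defined spelling of the same conversion. */
uint32_t float_bits_safe(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
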
|> S370, Alpha and PA-RISC all support arithmetic right shifts. There
|> is no information available on the S-3600.

All or almost all of those use only the bottom few bits of the shift.

That is typical of all implementations, but it is not a big issue, and the
standard is correct in this respect.
I can't remember the recent systems that had only unsigned shifts, but
they may have been in one or other of the various SIMD extensions to
various architectures.

Even if you only have unsigned shifts, you can still emulate arithmetic
ones. My point is there is no excuse for getting them wrong, even if
your name is Cray and you can improve cycle time by not supporting
them in hardware.
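
For example, a 32-bit arithmetic right shift can be built from unsigned
shifts alone (a sketch, assuming two's complement and 0 <= n <= 31; the
final conversion back to int32_t is implementation-defined in C, but does
the obvious thing on every two's complement machine):

#include <stdint.h>

int32_t asr32(int32_t x, unsigned n)
{
    uint32_t u  = (uint32_t)x;
    uint32_t hi = (x < 0) ? ~(UINT32_MAX >> n) : 0;  /* sign-fill top n bits */
    return (int32_t)((u >> n) | hi);
}
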
|> > Signed left shifts are undefined only if they overflow; that is undefined
|> > because anything can happen (including the CPU stopping). Signed right
|> > shifts are only implementation defined for negative values; that is
|> > because they might be implemented as unsigned shifts.
|>
|> No. The standard is quite explicit that any left shift of a negative value
|> is undefined, even if there is no overflow. This is an inconsistency, as
|> compilers change multiplies by a power of 2 into a left shift and vice
|> versa. There is no similar undefined behaviour for multiplies, however.

From the standard:

[#4] The result of E1 << E2 is E1 left-shifted E2 bit
positions; vacated bits are filled with zeros. If E1 has an
unsigned type, the value of the result is E1×2^E2, reduced
modulo one more than the maximum value representable in the
result type. If E1 has a signed type and nonnegative value,
and E1×2^E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined.

Exactly my point. It clearly states that ALL left shifts of negative values are
undefined, EVEN if the result would be representable. The "and nonnegative value"
excludes negative values! The correct wording should be something like:

"If E1 has a signed type and E1×2^E2 is representable in the result type, then
that is the resulting value; otherwise, the behavior is implementation defined."
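
The inconsistency in concrete form (a sketch; on typical two's complement
targets both functions compile to the same shift instruction):

int times2(int a) { return a * 2;  }  /* defined whenever the result fits     */
int shift1(int a) { return a << 1; }  /* undefined for any negative a, per #4 */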

Wilco
 
Nick Maclaren

|>
|> > [#4] The result of E1 << E2 is E1 left-shifted E2 bit
|> > positions; vacated bits are filled with zeros. If E1 has an
|> > unsigned type, the value of the result is E1×2^E2, reduced
|> > modulo one more than the maximum value representable in the
|> > result type. If E1 has a signed type and nonnegative value,
|> > and E1×2^E2 is representable in the result type, then that is
|> > the resulting value; otherwise, the behavior is undefined.
|>
|> Exactly my point. It clearly states that ALL left shifts of negative values are
|> undefined, EVEN if the result would be representable. The "and nonnegative value"
|> excludes negative values! The correct wording should be something like:

Yes, you are correct there, and I was wrong. I apologise.


Regards,
Nick Maclaren.
 
Wilco Dijkstra

Nick Maclaren said:
|>
|> Byte addressability is still uncommon in DSP world. And no, C
|> compilers for DSPs do not emulate char in a manner that you suggested
|> below. They simply treat char and short as the same thing, on 32-bit
|> systems char, short and long are all the same. I am pretty sure that
|> what they do is in full compliance with the C standard.

Well, it is and it isn't :-( There was a heated debate on SC22WG14,
both in C89 and C99, where the UK wanted to get the standard made
self-consistent. We failed. The current situation is that it is in
full compliance for a free-standing compiler, but not really for a
hosted one (think EOF). This was claimed not to matter, as all DSP
compilers are free-standing!

Even though the standard is vague as usual about the relative sizes of integer
types beyond the minimum sizes, it is widely accepted that int must be larger
than char and long long larger than int. That means a 32-bit DSP must support
at least 3 different sizes. Even so, making short=int=long is bound to cause
trouble: a lot of software can deal with short=int or int=long, but not with both.
|> > Putting in extra effort to allow for a theoretical system with
|> > sign-magnitude 5-bit char or a 31-bit one-complement int is
|> > completely insane.
|>
|> Agreed

However, allowing for ones with 16- or 32-bit chars, or signed
magnitude integers is not. The former is already happening, and there
are active, well-supported attempts to introduce the latter (think
IEEE 754R). Will they ever succeed? Dunno.

32-bit wchar_t is OK, but 32-bit char is a bad idea (see above).
C99 already allows sign magnitude integers. Or do you mean BCD integers?
That would be a disaster of unimaginable proportion...
|> It seems you overlooked the main point of Nick's concern - sized types
|> prevent automagical forward compatibility of the source code with
|> larger problems on bigger machines.

That's not true. Most problems do not get "larger" over time. Since DSPs
are mentioned, imagine implementing a codec like AMR. You need a certain
minimum size to process the fixed-point samples. Larger types do not help
at all (one often needs to saturate to a certain width; in other cases you can
precalculate the maximum width needed for the required precision).
For this kind of problem sized types are the most natural.
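
For instance (a sketch - the function name is mine, but this saturation
step is one such codecs perform constantly, and it wants an exact
16/32-bit split, not "whatever int happens to be"):

#include <stdint.h>

int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;        /* exact in 32 bits, cannot overflow */
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}
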

Now there are of course cases where the problem does get larger. That's
why we've got ptrdiff_t - there is no reason to fix its size. I never said that we
should completely abolish variable-sized types, but that the standard should
*mandate* that all implementations support the sized types int8, int16 etc.
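
For what it's worth, C99 went halfway there: <stdint.h> defines the names,
but the exact-width typedefs remain optional, which is exactly the gap being
complained about. A sketch:

#include <stdint.h>    /* C99 */

int16_t       sample;  /* exactly 16 bits - but this typedef is optional     */
int_least16_t accum;   /* at least 16 bits - every C99 implementation has it */
ptrdiff_t     span;    /* scales with the machine, for problems that do grow */
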

One of the key advantages of sized types is that software needs less
porting effort. Even though Nick will claim his software runs on any system
ever made, in reality it's nontrivial to ensure software works on systems
with different integer sizes. I bet a lot of C code fails on this 32-bit-only DSP.
However, if the sized types were supported, any such code would work unchanged.
Java uses sized types for the same reason.

Wilco
 
Nick Maclaren

|>
|> Even though Nick will claim his software runs on any system
|> ever made,

Please don't be ridiculous. I have never made such a claim, and
have used some systems so tricky that I had trouble writing even
simple Fortran that worked on them.


Regards,
Nick Maclaren.
 
For "modern", besides the C6000 series, I'd include the ADI/Intel
Blackfin and the VLSI ZSP series.

Why don't you count C55? It is relatively new and, according to my
understanding of the market, by far the most popular general purpose
DSP in the world.
I haven't used either of those in
anger, but I believe that they're both more-or-less "C" compliant.

Of those you mentioned I only used Blackfin. Its support for 'C' is,
indeed, idiomatic as you call it.
The
main other "newness" in DSP-land are all of the DSP-augmented RISC
processors, and they're all essentially pure "C" machines, too (alignment
issues can be worse though, and you often have to use asm or intrinsics
to get at the DSP features.)

IMHO, the main newness in the DSP world is that on the "simple algorithms,
high throughput" front, classic programmable Von Neumann or Harvard
machines are less and less competitive with FPGAs. The appearance of HW
multipliers in the cost-oriented Spartan and Cyclone series changed the
game once and for all. So traditional DSP vendors, esp. TI and ADI,
should look for new niches. IMHO, it also means that the C6000 and, to a
lesser extent, TigerSharc lines don't have a bright future. On the other hand,
C55, Blackfin and the flash-based C28 and similar Freescale products are
not in danger.
Oh, quite off topic...
 
Nick Maclaren

|>
|> > I haven't used either of those in
|> > anger, but I believe that they're both more-or-less "C" compliant.
|>
|> Of those you mentioned I only used Blackfin. Its support for 'C' is,
|> indeed, idiomatic as you call it.

It sounds interesting - thanks for the pointer.

|> IMHO, the main newness in the DSP world is that on the "simple algorithms,
|> high throughput" front, classic programmable Von Neumann or Harvard
|> machines are less and less competitive with FPGAs.

Unfortunately, FPGAs have some pretty serious restrictions on the
classes of programming paradigm that are appropriate. And some of
the trickier cases are important for that market - not typically
DSP as such, but controllers. That is the main reason that the FPGA
fanatics are wrong that they are going to take over the world, even
if they do take over several important markets.

Personally, I would like to see FPGAs become cheap enough for the
ordinary hobbyist to use for large projects. We might see some
progress with getting away from the domination of the current subset
of the Von Neumann model. I still think that dataflow deserves a
fresh look, now we have got away from the constraints of the 1980s.

|> Oh, quite off topic...

Totally. You are ordered to stand in the corner for posting something
on computer architecture. I will come and join you shortly.


Regards,
Nick Maclaren.
 
Personally, I would like to see FPGAs become cheap enough for the ordinary hobbyist to use for large projects.

I can't quite figure out how "large projects" belong in the same
statement with "cheap enough" and "ordinary hobbyist".

"Large projects" aside FPGA evaluation boards and development tools
are certainly cheap enough for the ordinary hobbyist right now.
 
Nick Maclaren

|> >
|> > Personally, I would like to see FPGAs become cheap enough for the
|> > ordinary hobbyist to use for large projects.
|>
|> I can't quite figure out how "large projects" belong in the same
|> statement with "cheap enough" and "ordinary hobbyist".

Think Linux. Think gcc.

|> "Large projects" aside FPGA evaluation boards and development tools
|> are certainly cheap enough for the ordinary hobbyist right now.

Hmm. The last time I looked, the cheap versions were so restrictive
as to be implausible.


Regards,
Nick Maclaren.
 
|> >
|> > Personally, I would like to see FPGAs become cheap enough for the
|> > ordinary hobbyist to use for large projects.
|>
|> I can't quite figure out how "large projects" belong in the same
|> statement with "cheap enough" and "ordinary hobbyist".

Think Linux. Think gcc.

I'd rather call RMS and Linus extraordinary hobbyists ;)
And the projects themselves didn't retain true self-financing hobbyist
status for too long.
|> "Large projects" aside FPGA evaluation boards and development tools
|> are certainly cheap enough for the ordinary hobbyist right now.

Hmm. The last time I looked, the cheap versions were so restrictive
as to be implausible.

Regards,
Nick Maclaren.

Nick, after all these years you should have learned that people are rarely
able to read your thoughts. Want us to understand you? Be more
specific! Restrictive in what sense? For around $2K/year you can get
the tools that are sufficient for 95% of commercial users. Why
wouldn't they be good enough for the ordinary hobbyist?

If you are buying a dev. kit you typically get a 1-year software license
for free.

http://www.altera.com/products/devkits/kit-dev_platforms.jsp
http://www.xilinx.com/products/boards/virtex_boards_feature.htm
http://www.xilinx.com/products/boards/s3_sk_promo.htm

I am sure that being at a super-prestigious uni you can get an even better
deal from your local Altera or Xilinx representative. IMHO, if it
wasn't illegal, they would be glad to pay you for spreading their
message in the Cambridge labs.
 
Nick Maclaren

|>
|> > |> "Large projects" aside FPGA evaluation boards and development tools
|> > |> are certainly cheap enough for the ordinary hobbyist right now.
|> >
|> > Hmm. The last time I looked, the cheap versions were so restrictive
|> > as to be implausible.
|>
|> Nick, after all these years you should have learned that people are rarely
|> able to read your thoughts. Want us to understand you? Be more
|> specific! Restrictive in what sense?

I started to be, but it got complicated :-( The problems varied
with time and company.

|> For around $2K/year you can get
|> the tools that are sufficient for 95% of commercial users. Why
|> wouldn't they be good enough for ordinary hobbyist?
|>
|> If you are buying dev. kit you typically get 1-year software license
|> for free.
|>
|> http://www.altera.com/products/devkits/kit-dev_platforms.jsp
|> http://www.xilinx.com/products/boards/virtex_boards_feature.htm
|> http://www.xilinx.com/products/boards/s3_sk_promo.htm

Thanks for the update. If I get a moment, I will take a look at
that. My personal problem is, of course, that it means learning
a completely new skill set - which takes me longer at 60 than it
used to!

I know that people work on minor tweaks, but they aren't the
really interesting possibilities. A faster error function is
useful, and effectively impossible in software, but doesn't lead
to any breakthroughs.


Regards,
Nick Maclaren.
 
MooseFET

On Aug 20, 12:35 am, [email protected] (Nick Maclaren) wrote:
[....]
Personally, I would like to see FPGAs become cheap enough for the
ordinary hobbyist to use for large projects. We might see some
progress with getting away from the domination of the current subset
of the Von Neumann model. I still think that dataflow deserves a
fresh look, now we have got away from the constraints of the 1980s.

I think that the biggest problem keeping the FPGAs from being used by
small companies and hobbyists is the problem of tools. If there were a
compiler for FPGAs that was like "gcc", they would be a lot more
useful.

The tools are too huge and include lots of things you don't really
want.

They are rented to you for perhaps as little as nothing per year but
you can't own the tools.

Many of the tools require way too much knowledge about the internal
details of the chip to make your source code really portable.
 
Torben Ægidius Mogensen

I still think that dataflow deserves a fresh look, now we have got
away from the constraints of the 1980s.

I agree.

The forwarding network inside modern OOO processors is very much like
dataflow, so it should not be a big step to make a "real" dataflow
CPU, where the visible programming model is dataflow. It would
probably simplify a lot of things, such as renaming, since it is
explicit when a value is dead and its holder can be re-used.

And compiling to dataflow is not really that difficult. The SSA form
used in many modern compilers is not far from a dataflow model, with
the phi nodes acting as merge nodes, and it should be no major effort
to convert SSA into true dataflow (or generate dataflow directly
instead).
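
A sketch of the correspondence (the C function and its SSA form line up
almost one-to-one, with the phi node acting as the dataflow merge):

int f(int p)
{
    int x = 0;
    if (p)
        x = 1;
    return x + 2;
}

/* SSA form (notation only):
         x1 = 0
         br p, THEN, JOIN
   THEN: x2 = 1
   JOIN: x3 = phi(x2, x1)   -- merge node: fires on whichever value arrives
         ret x3 + 2
*/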

Torben
 
Nick Maclaren

|>
|> > I still think that dataflow deserves a fresh look, now we have got
|> > away from the constraints of the 1980s.
|>
|> I agree.
|>
|> ...
|>
|> And compiling to dataflow is not really that difficult. The SSA form
|> used in many modern compilers is not far from a dataflow model, with
|> the phi nodes acting as merge nodes, and it should be no major effort
|> to convert SSA into true dataflow (or generate dataflow directly
|> instead).

Yes. The area which interests me, and which is so far unsolved, is
how to design a dataflow language that is suitable for the majority
of applications currently programmed in Von Neumann ones. The
ridiculous thing is that a lot of application requirements fit very
naturally into a dataflow paradigm (e.g. GUIs) - the problem is
almost entirely in the programming of their components.


Aside: does anyone know why the "Harvard" approach was promoted from
being a trivial but important variation of Von Neumann to being of
equal rank, starting about 20 years ago? Because it assuredly ain't
so, despite the nonsense in Wikipedia, and almost all programming
languages have used separate code and data "address spaces" since
the invention of COBOL and FORTRAN, and were/are always talked about
as using the Von Neumann model (as they do).


Regards,
Nick Maclaren.
 
Nick Maclaren

|>
|> Unfortunately, Nick sometimes writes really good insights, and
|> sometimes just blows opinions out his ass. I was more than a little
|> annoyed when I found out that his definition of "Intel didn't do real
|> VM until the 386" really meant "Nick has no clue what Intel 286 VM
|> looked like".

Whereas I wasn't at all annoyed when I found out that you didn't
know the difference between real virtual memory, and the rudimentary
mechanisms which were almost totally abandoned in the UK in the
1960s, and were not called virtual memory by their inventors.

But I am annoyed when you make assertions about me that are false.

I knew what Intel 286 so-called virtual memory looked like,
and I don't call it virtual memory. Nor, interestingly, did most
of the people in IBM I talked to - they took a HELL of a long time
to learn about virtual memory, but did eventually learn. Other
people seem slower.


Regards,
Nick Maclaren.
 
Michel Hack

It would be correct as long as there is no overflow. I.e. 0xffffffff << 1
becomes 0xfffffffe as expected.

Ah yes -- but is that high-order one-bit the ORIGINAL one-bit, or is
it a one-bit that was shifted into it? ;-)

Michel.
 
AnimalMagic

|>
|> Even though Nick will claim his software runs on any system
|> ever made,

Please don't be ridiculous. I have never made such a claim, and
have used some systems so tricky that I had trouble writing even
simple Fortran that worked on them.


NetHack runs on almost anything.
 
JosephKK

On Aug 20, 12:35 am, [email protected] (Nick Maclaren) wrote:
[....]
Personally, I would like to see FPGAs become cheap enough for the
ordinary hobbyist to use for large projects. We might see some
progress with getting away from the domination of the current subset
of the Von Neumann model. I still think that dataflow deserves a
fresh look, now we have got away from the constraints of the 1980s.

I think that the biggest problem keeping the FPGAs from being used by
small companies and hobbyists is the problem of tools. If there were a
compiler for FPGAs that was like "gcc", they would be a lot more
useful.

There are a number of algorithms I know that would benefit greatly
from some dedicated bit-shifting in hardware, as long as the FPGA
is not too far removed from the CPU and of acceptable speed. It
usually involves scanning a few hundred to a few tens of thousands
of bytes and generating data.

Not fully sure what you mean by dedicated bit-shifting in hardware.
If you mean normal shifts and rotates, most modern CPUs have those. If
you mean something else, please state what you want to do.
 
JosephKK

|>
|> > I still think that dataflow deserves a fresh look, now we have got
|> > away from the constraints of the 1980s.
|>
|> I agree.
|>
|> ...
|>
|> And compiling to dataflow is not really that difficult. The SSA form
|> used in many modern compilers is not far from a dataflow model, with
|> the phi nodes acting as merge nodes, and it should be no major effort
|> to convert SSA into true dataflow (or generate dataflow directly
|> instead).

Yes. The area which interests me, and which is so far unsolved, is
how to design a dataflow language that is suitable for the majority
of applications currently programmed in Von Neumann ones. The
ridiculous thing is that a lot of application requirements fit very
naturally into a dataflow paradigm (e.g. GUIs) - the problem is
almost entirely in the programming of their components.


Aside: does anyone know why the "Harvard" approach was promoted from
being a trivial but important variation of Von Neumann to being of
equal rank, starting about 20 years ago? Because it assuredly ain't
so, despite the nonsense in Wikipedia, and almost all programming
languages have used separate code and data "address spaces" since
the invention of COBOL and FORTRAN, and were/are always talked about
as using the Von Neumann model (as they do).


Regards,
Nick Maclaren.

Well, gosh, the idea came from Harvard, it must be right.
 
Nick Maclaren

|> On Wed, 20 Aug 2008 10:09:58 +0200, Morten Reistad <[email protected]>
|> wrote:
|>
|> >There are a number of algorithms I know that would benefit greatly
|> >from some dedicated bit-shifting in hardware, as long as the FPGA
|> >is not too far removed from the CPU and of acceptable speed. It
|> >usually involves scanning a few hundred to a few tens of thousands
|> >of bytes and generating data.
|>
|> Not fully sure what you mean by dedicated bit-shifting in hardware.
|> If you mean normal shifts and rotates, most modern CPUs have those. If
|> you mean something else, please state what you want to do.

Try writing code to unpick an IEEE 754 floating-point number; once
you have done that, try doing that with an IEEE 754R decimal one :)

With the current ISAs, doing that sort of bit-munging in software
can be a hundred times as expensive as it would be in a very
small amount of dedicated hardware.
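
Even the easy binary case takes a copy plus shifts and masks per field (a
sketch, assuming a 32-bit float); the decimal formats are far worse, since
their fields are not bit-aligned this neatly:

#include <stdint.h>
#include <string.h>

void unpick_binary32(float f, unsigned *sign, unsigned *expo, uint32_t *frac)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined type pun */
    *sign = bits >> 31;
    *expo = (bits >> 23) & 0xFFu;
    *frac = bits & 0x7FFFFFu;
}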

Some of the cryptographic algorithms are similar. Reversing bits
(as used in FFTs) is, too, but I don't know of any algorithms where
that is a major bottleneck.
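
For reference, a full 32-bit bit reversal in portable C takes five rounds of
mask-and-shift where one small piece of hardware would do it in a single
step (a sketch):

#include <stdint.h>

uint32_t bitrev32(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}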


Regards,
Nick Maclaren.
 
MooseFET

On Aug 20, 12:35 am, [email protected] (Nick Maclaren) wrote:
[....]
Personally, I would like to see FPGAs become cheap enough for the
ordinary hobbyist to use for large projects. We might see some
progress with getting away from the domination of the current subset
of the Von Neumann model. I still think that dataflow deserves a
fresh look, now we have got away from the constraints of the 1980s.
I think that the biggest problem keeping the FPGAs from being used by
small companies and hobbyists is the problem of tools. If there were a
compiler for FPGAs that was like "gcc", they would be a lot more
useful.
There are a number of algorithms I know that would benefit greatly
from some dedicated bit-shifting in hardware, as long as the FPGA
is not too far removed from the CPU and of acceptable speed. It
usually involves scanning a few hundred to a few tens of thousands
of bytes and generating data.

Not fully sure what you mean by dedicated bit-shifting in hardware.
If you mean normal shifts and rotates, most modern CPUs have those. If
you mean something else, please state what you want to do.

I can think of several cases where having custom bit-shifting
instructions would be very handy.

The simplest case is when you have to swap the byte order of multi-byte
numbers. Many processors can exchange two bytes, but I don't know
of any that can reverse the order of a 4-byte or 8-byte field quickly.
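
In plain C the 4-byte case looks like this (a sketch; many compilers do
recognise the pattern and emit a single byte-swap instruction where the
ISA has one, e.g. x86's BSWAP, available since the 486):

#include <stdint.h>

uint32_t bswap32(uint32_t x)
{
    return  (x >> 24)
          | ((x >>  8) & 0x0000FF00u)
          | ((x <<  8) & 0x00FF0000u)
          |  (x << 24);
}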

Compressing and decompressing often need numbers to be combined or
split along non-byte boundaries. The old TI speech chip used a
compressed data stream that was not byte oriented.
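
A sketch of the kind of non-byte-aligned (MSB-first) bit reader such codecs
need - the names are mine, and the one-shift-per-bit loop is exactly the
work that a little dedicated hardware would make free:

#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *buf; size_t bitpos; } BitReader;

uint32_t get_bits(BitReader *br, unsigned n)   /* n <= 32 */
{
    uint32_t v = 0;
    while (n--) {
        size_t b = br->bitpos++;
        v = (v << 1) | ((uint32_t)(br->buf[b >> 3] >> (7u - (b & 7u))) & 1u);
    }
    return v;
}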
 