Intel Opens Up About Larrabee

Rob Williams

Editor-in-Chief
Intel's Larrabee architecture has been on the minds of many enthusiasts for the past few months, but sadly, Intel hasn't released any specific performance data today. What they have revealed are the base mechanics of the architecture and other tasty tidbits to whet our appetites.

Intel today pulls back a portion of the veil on its upcoming Larrabee architecture, so we can better understand its implementation, how it differs from a typical GPU, why it benefits from taking the 'many cores' route, how its performance scales and, of course, what else it has in store.

You can read the full article here!
 

Kougar

Techgage Staff
Just wow. IF the hardware is even close to delivering, then this is going to be the future. No doubt about it: fully programmable supercomputing cards.

GPUs got people's attention with their supercomputing ability for specialized tasks, NVIDIA's CUDA especially, which languished as a marketing slide and demo video for over a year before turning into a major deal almost overnight. Universities everywhere have built single computers with multiple GPUs and now get better performance than the supercomputing clusters they used previously, at much lower prices. Folding@home has exploded: NVIDIA and EVGA GPU-only teams appeared out of the blue, climbed through the ranks of teams that had been folding since F@H's inception eight years ago, and single-handedly upset the rankings. It made the PS3 look mediocre, despite teams of PS3 folders having done the same thing just before CUDA launched with F@H support.

Long paragraph short, GPUs got people to realize CPUs are no longer the way to go for raw performance. With a fully programmable "GPU" set to debut, there will not be a single hurdle left for people to begin porting applications over. A card that's fully programmable in one of the better-known coding languages... this is the start of really big things. :)

Rob, because you mentioned SIGGRAPH in your article, I found this tidbit hidden in the Larrabee wiki. They give sources for it as well.

On August 12, 2008, Intel will present a paper describing Larrabee at SIGGRAPH. [17] The paper is said to contain a comparison of performance between Larrabee and Core 2 Duo, which reveals that the single-threaded performance of one of Larrabee's cores is roughly half that of a "Core 2" core, while the overall performance per watt of a Larrabee chip is 20× better than a Core 2 Duo chip.[18]


Now, some of my thoughts. A 1,024-bit bus is rather insane... no wonder they are having difficulties with the PCB. Just look at what adding a third memory controller to Nehalem did to X58 motherboards: manufacturers have to add two extra PCB layers if they wish to make use of the third controller with six banks of DDR3 memory.

Fudzilla (not the most reliable, I know) has this article out about how Larrabee is currently a 12-layer, 300 W card. This seems reasonable considering Larrabee isn't going to launch for another year; emphasis on cutting it down to production size comes last in the engineering process.

But one only needs to look at ATI's first GPU with a 512-bit ring-bus memory controller, the R600, to get an idea of what I am thinking about here. If that was not scary enough, look at NVIDIA's first GPU with a 512-bit memory bus, the GTX 280. Larrabee is going to be a huge "multi-core" die regardless of whether it launches on a 45 nm or 32 nm process, and the card complexity is still going to dwarf the GTX 280.

The performance may certainly be worth it. I can't wait. :)
 

Kougar

Techgage Staff
I can't resist a double-post. I apologize if this is considered hijacking, since technically this section is for discussing Techgage articles... :D

After reading Techgage's article I went over to AnandTech, since I particularly enjoy Anand Lal Shimpi's hardware and design dissection articles.

Above I was saying that a 1,024-bit ring bus is going to mean an incredibly huge amount of die area by itself, so I was already wondering whether Larrabee would make the GT200 core look small. Partway into his article I got to thinking about the actual number of cores Larrabee would likely launch with, for this very reason...

Ironically (and partly why I enjoy Anand's articles so much), he explores this idea on page six. Surprisingly, it is more cores than I was expecting, but as he isn't factoring in other things like the 1,024-bit ring bus that y'all mention, I think 64 cores is likely going to be the absolute max, and it could very well be less. Then again, with Intel having made chips the size of Itanium (~596 mm² at 90 nm), which is significantly larger than GT200, it's possible. Larrabee is going to be a beast. :)
 

Unregistered

Larrabee - DUD

I am wondering why Intel is trying to extend their x86 architecture into the GPGPU arena. It has too many components that a GPGPU doesn't need.

They already have a key component to build a better GPGPU than Nvidia or AMD.

One word, Itanium.

It might be worth looking into.


It may need some reworking, but Intel is just sitting on it, messing around trying to extend x86 beyond its reach.


Rictor
 

Unregistered

Larrabee - DUD - Retraction

I am wondering why Intel is trying to extend their x86 architecture into the GPGPU arena. It has too many components that a GPGPU doesn't need.

They already have a key component to build a better GPGPU than Nvidia or AMD.

One word, Itanium.

It might be worth looking into.


It may need some reworking, but Intel is just sitting on it, messing around trying to extend x86 beyond its reach.


Rictor




I think I see how x86 can be extended into the GPGPU (General Purpose Graphics Processing Unit) arena by combining VLIW (Very Long Instruction Word) and SIMD (Single Instruction Multiple Data).

Note: Might want to come up with a better name than GPGPU, it's a bit of a misnomer now. Maybe General Purpose Media Processing Unit (GPMPU)

I don't think I was totally off base, but Intel may want to create an entirely separate processing unit, like the FPU from the days of the 386. It would be composed primarily of multiple SIMD units.

I'm unclear if Intel should offload all SIMD units from the CPU... come to think of it... I'm wondering... I don't know... maybe they could look into combining the SIMD units on multi-core CPUs into a single unit first, starting with the dual cores and moving up.

Why is this important? Multiple reasons. The first is to test whether it can improve SIMD performance without incurring too much complexity, and to see whether coordinating SIMD utilization (combining SIMD units) is even possible and whether the performance increase is sufficient to justify the number of design changes... they might try software to coordinate the SIMD units first, to check what design changes (if any) are required... observing that software in operation would point to the changes in processor design needed to optimize SIMD utilization. It would be the first stage towards eventually moving SIMD functions off onto a dedicated chip for media processing, with the CPU issuing VLIW to the GPMPU to maximize performance. I could maybe come up with more reasons for doing it this way, but I would end up rambling on and on and on... Ah, hyperthreading, but that can come later... then move up to ultrathreading (more than 2 threads per processing unit)... hyperthreading and ultrathreading are mostly just register renaming, though... maybe I should talk to AMD first... ah-ha... oh no... I had another idea, but it escaped me for the moment.
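For anyone who hasn't poked at the SIMD units directly, here is a minimal C++ sketch of what a single 4-wide SSE operation looks like today. It's purely illustrative (just the standard SSE intrinsics, nothing Intel-confidential):

#include <xmmintrin.h>   // standard SSE intrinsics
#include <cstdio>

int main()
{
    // Two vectors of four packed single-precision floats.
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);

    // One instruction adds all four lanes at once: the
    // "single instruction, multiple data" part.
    __m128 sum = _mm_add_ps(a, b);

    float out[4];
    _mm_storeu_ps(out, sum);
    std::printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
    return 0;
}

What I'm musing about above is basically whether several of these four-wide pipelines could be kept fed from one scheduler instead of one per core.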


I once told a visiting Intel engineer that Merced was a dud too. I just didn't know, at the time, that you could use VLIW to issue multiple SIMD operations to the GPMPU. <SIGH>

I feel like such a dud; I guess I'll go by that name now.

Dud, formerly Rictor of Gothic Terror
 

Kougar

Techgage Staff
To chime in, Intel's goal with Larrabee has been to offer GPU-level computational performance while allowing developers to code in C++ or other languages that target x86, something you cannot do on a modern GPU.

I'll just borrow a quote from Tim Sweeney, the guy behind the Unreal Engine:

Tim Sweeney said:
"I see the instruction set and mixed scalar/vector programming model of Larrabee as the ultimate computing model, delivering GPU-class numeric computing performance and CPU-class programmability with an easy-to-use programming model that will ultimately crush fixed-function graphics pipelines. The model will be revolutionary whether it's sold as a PCI Express add-in card, an integrated graphics solution, or part of the CPU die.

To focus on Teraflops misses a larger point about programmability: Today's GPU programming models are too limited to support large-scale software, such as a complete physics engine, or a next-generation graphics pipeline implemented in software. No quantity of Teraflops can compensate for a lack of support for dynamic dispatch, a full C++ programming model, a coherent memory space, etc."

Source
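To make Sweeney's point about "dynamic dispatch" and "a full C++ programming model" a bit more concrete, here's a trivial C++ sketch of the sort of thing he means (the class names are just made up for illustration). This is completely ordinary code on any x86 core, and presumably would be on Larrabee, but it's exactly what a fixed-function or shader-model pipeline of today can't express:

#include <cstddef>
#include <cstdio>
#include <vector>

// An abstract pipeline stage chosen at run time: ordinary virtual dispatch.
struct Stage {
    virtual float process(float x) const = 0;
    virtual ~Stage() {}
};

struct Blur   : Stage { float process(float x) const { return x * 0.5f; } };
struct Invert : Stage { float process(float x) const { return 1.0f - x; } };

int main()
{
    // A pipeline assembled at run time from whatever stages we like.
    std::vector<Stage*> pipeline;
    pipeline.push_back(new Blur);
    pipeline.push_back(new Invert);

    float value = 0.8f;
    for (std::size_t i = 0; i < pipeline.size(); ++i)
        value = pipeline[i]->process(value);   // which process() runs is decided at run time

    std::printf("result: %f\n", value);

    for (std::size_t i = 0; i < pipeline.size(); ++i)
        delete pipeline[i];
    return 0;
}

Nothing exotic, but that kind of run-time flexibility is exactly what Sweeney is saying no amount of teraflops can substitute for.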
 

Unregistered

Larrabee

Kougar said:
To chime in, Intel's goal with Larrabee has been to offer GPU-level computational performance while allowing developers to code in C++ or other languages that target x86, something you cannot do on a modern GPU.

I'll just borrow a quote from Tim Sweeney, the guy behind the Unreal Engine:

Source


Tim Sweeney is just saying Larrabee looks like a good concept (on paper); actually working out the details into a workable system design is much more difficult, especially if you are taking the wrong approach. Intel canceling their first attempt to take Larrabee from concept to product shows they are still working out the details, even though the concept looks solid.

Sadly, they already have most of the technology and patents to move ahead of the pack, but it will require, as one of their people once put it, "a paradigm shift".

Nvidia does offer GPGPU programmability using C (or C++) for their products through their proprietary CUDA language, and will probably support more open GPGPU languages like OpenCL and Microsoft's DirectCompute. Nvidia's purchase of PhysX precipitated the whole move towards general-purpose GPU computing and is their key advantage in pioneering GPGPU. Intel may want to look at that model: the physics engine as an add-on to the computer system.

The possibility I see is Intel trimming down the CPU to a minimum of 2 cores with hyperthreading or ultrathreading (>2 threads per core), a memory and I/O hub, and huge amounts of cache (trace, L1, L2 and maybe even more), while the "physics engine" might comprise various concepts, designs and technologies used in SIMD and Merced.

The simplest model would be a dual-core CPU (maybe with hyperthreading) issuing VLIW to an Itanium-like architecture composed primarily of SIMD units. But I'm sure the engineers at Intel will be able to come up with something better.

The CPU would handle the scheduling (I/O, thread-ordering, memory access, etc.) and management while the Itanium-like physics engine is freed to do the number-crunching.
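In software terms, the split I'm describing is nothing more exotic than a scheduler feeding batches of work to a number-crunching engine. Here's a toy C++ sketch of that division of labor; all the names are invented, and a real design would obviously issue VLIW bundles in hardware rather than pushing structs through a queue:

#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A "batch" stands in for whatever the CPU would hand the engine.
struct Batch {
    std::vector<float> data;
};

static std::queue<Batch>        work;
static std::mutex               mtx;
static std::condition_variable  cv;
static bool                     done = false;

// The "physics engine" side: pull batches and crunch numbers.
static void engine()
{
    for (;;) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [] { return done || !work.empty(); });
        if (work.empty() && done)
            return;
        Batch b = work.front();
        work.pop();
        lock.unlock();

        float sum = 0.0f;                 // stand-in for real number-crunching
        for (std::size_t i = 0; i < b.data.size(); ++i)
            sum += b.data[i];
        std::printf("crunched batch, sum = %f\n", sum);
    }
}

// The "CPU" side: schedule work, never crunch it.
int main()
{
    std::thread crunching(engine);
    for (int i = 0; i < 4; ++i) {
        Batch b;
        b.data.assign(1024, float(i));
        {
            std::lock_guard<std::mutex> lock(mtx);
            work.push(b);
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(mtx);
        done = true;
    }
    cv.notify_one();
    crunching.join();
    return 0;
}

Obviously a real implementation lives in silicon and microcode, not in std::queue, but the division of labor is the same: one side schedules, the other side crunches.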

The concept harkens back to the days of having a separate FPU and CPU.

The reason for trimming down the CPU is that there may not be a whole lot of gain from having more than 2 cores, especially if they can do hyperthreading or ultrathreading, over a quad- or 6-core part, but the option to add additional cores is there. My experience leads me to believe that 2 cores is more than sufficient if you can offload some of the computing to the GPGPU (or, as I like to call it, the General Purpose Media Processing Unit or GPMPU; physics engine is another term you could use). Most processors are working out of their cache rather than main memory, and going with a larger cache might prove more advantageous than more cores: lower power, fewer cache misses, and a large cache to feed the physics engine. Of course it's just one approach, and Intel just needs to take a look at what they have towards building a physics engine.

One of the problems in designing systems is that when some key technologies reach a certain level of maturity, the design of the whole system needs to evolve to move forward. A paradigm shift is required. Whole avenues of design improvements open up but people often get locked into one way of looking at something or doing something and they're unable to see the whole world of possibilities that's piled up around them.


l1h4x0r
 

Unregistered

The Intel Approach or Larrabee Redux

I think Intel is taking the wrong approach in trying to build a design that competes directly with Nvidia's or AMD's GPGPUs. Nvidia GPUs are designed primarily as compute shaders moving towards a "PhysX (physics) engine" GPGPU.

Intel might want to take a different approach... they should be building from their position of strength, the CPU. I'm thinking they should take a look at the SIMD unit in the CPU and move towards the concept of a "media processing unit", then maybe move it out of the CPU and push towards a "physics engine" design. Try to develop a complementary design to Nvidia's GPU (explained later). If they really want a leading-edge GPGPU, they should just buy Nvidia outright. Of course, that could be good and bad. It would save them a considerable amount of time and they would have a graphics processor to compete with AMD/ATI. But by buying Nvidia they may end up missing an opportunity to revolutionize the CPU industry and run afoul of antitrust laws. The GPU and CPU make up about 90% of the computer, and even if they did, they would end up with antitrust fines without even trying. Just look at Microsoft... and the recent Intel antitrust fine involving Dell... Dell favors Intel, and for that Intel got fined. Dell didn't even bother with AMD until AMD finally got a solid design win with the Athlon. Intel just rewarded Dell for being a faithful customer, and bam, Intel gets hit with a fine.

Complementary design to Nvidia's GPU:
In order for Nvidia to use PhysX, some of the GPU's resources have to be allocated away from shader operations... It's an opportunity for Intel to push Nvidia back to their area of expertise, 3D graphics. And... this is important... use it as a starting point to build their own PhysX engine.

If you have multiple GPUs, Nvidia even makes the option available to dedicate one of the GPUs to PhysX instead of rendering video. That makes no sense to me; it defeats the whole purpose of having SLI.

Wouldn't it be much better if the CPU (let's say a quad core) could be organized to utilize all four SIMD units to do PhysX instead, and free up the GPU to do what it's supposed to do... shading/rendering? Maybe I don't understand SIMD all that well, or the limitations of attempting this feat... but it might be worth a look; it can't hurt.
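As a rough sketch of what I mean (one worker thread per core, each core's SSE unit integrating its own slice of the particles), something like the following would do it. The names and numbers are made up, and this obviously isn't how PhysX itself is implemented; it's just to show the mechanics:

#include <xmmintrin.h>   // SSE intrinsics
#include <cstddef>
#include <thread>
#include <vector>

// Integrate positions for one slice of the particle arrays:
// pos += vel * dt, four particles per SSE instruction.
static void integrate_slice(float* pos, const float* vel, std::size_t count, float dt)
{
    const __m128 vdt = _mm_set1_ps(dt);
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128 p = _mm_loadu_ps(pos + i);
        __m128 v = _mm_loadu_ps(vel + i);
        p = _mm_add_ps(p, _mm_mul_ps(v, vdt));
        _mm_storeu_ps(pos + i, p);
    }
    for (; i < count; ++i)   // scalar tail for the leftovers
        pos[i] += vel[i] * dt;
}

int main()
{
    const std::size_t kParticles = 1 << 20;   // made-up particle count
    const unsigned    kCores     = 4;         // the "quad core" in question
    std::vector<float> pos(kParticles, 0.0f);
    std::vector<float> vel(kParticles, 1.0f);

    // One worker thread per core (roughly), each hammering on its own SSE unit.
    std::vector<std::thread> workers;
    const std::size_t chunk = kParticles / kCores;
    for (unsigned c = 0; c < kCores; ++c) {
        std::size_t begin = c * chunk;
        std::size_t len   = (c == kCores - 1) ? kParticles - begin : chunk;
        workers.push_back(std::thread(integrate_slice,
                                      pos.data() + begin, vel.data() + begin,
                                      len, 0.016f));
    }
    for (std::size_t c = 0; c < workers.size(); ++c)
        workers[c].join();
    return 0;
}

Real collision detection is a lot messier than pos += vel * dt, of course, but the point is that all four SSE units can be crunching physics while the GPU sticks to rendering.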

What-if scenario... the University of Antwerp already built a supercomputer using 13 Nvidia GPUs (SLI not required; SLI is only necessary for rendering 3D graphics to a display) for about 6,000 EUR, and it beats their previous supercomputing cluster soundly (about 4x, according to the report).

Intel should be seriously worried...

WHAT IF... Nvidia starts selling a complete solution... replacing the Intel processor with their own processor, the Ion. First the supercomputing world then it would start to trickle into the home... The CPU doesn't need to be super fast, just very good at managing/scheduling I/O...

The backup plan for Intel is... if Nvidia's lead becomes too big... design a better CPU for managing/scheduling I/O... Intel will have to do this anyway whenever they get around to building their own PhysX engine... which is why I think they will move the SIMD units out of the CPU... Intel has a lot more experience with system design than anybody... PCI, chipsets, cache, CPUs, process technology, etc... unfortunately, Nvidia has been branching out into chipset and CPU design too. Nvidia is a bigger threat than AMD/ATI combined... laugh...

OK, you can stop laughing... AMD/ATI can coordinate from both ends. That's the good part; the bad part is that their resources will be stretched thin, and being able to accommodate the other isn't necessarily a good thing... an engineer might be able to tell you why.

A home supercomputer might be a strong selling point... to the point of making Intel's "conventional" processors irrelevant, however fast they may be.

Nvidia may have inadvertently wandered into an area where Intel could be excelling... supercomputing...

I feel Nvidia's lead is not insurmountable, though.

The plus side for Intel: they already have the pieces for building their own PhysX engine, a better PhysX engine. Better than Nvidia's, since a large chunk of Nvidia's transistor real estate is used for graphics, while Intel can build a pure PhysX engine, read: dedicate more transistors to the PhysX engine. OK, now you know what I have in mind... Once Nvidia figures this out, if they haven't already... they may opt to build a pure PhysX engine.
The kicker: they might have the patents to the PhysX engine... and the other shoe... they can't patent physics. So there you go, don't build a "PhysX engine"... build a multimedia processing engine or whatever you want to call it. Patents... feh.

Intel may want to take another look at Merced technology and combine it with SIMD. Hopefully, it's not a wild goose chase. Technically, I'm not an engineer... so if it does work, it's not my fault... =)

+---------------------------+
|           VLIW            |
+---------------------------+
| SIMD | SIMD | SIMD | SIMD |
+---------------------------+
| SIMD | SIMD | SIMD | SIMD |
+---------------------------+
| SIMD | SIMD | SIMD | SIMD |
+---------------------------+
| SIMD | SIMD | SIMD | SIMD |
+---------------------------+

What would you end up with... how about a world-class supercomputer for well under $3000 for the home? Well, technically... today's personal computers are probably faster than most of the supercomputers built less than a decade ago. Today's PC technology is outpacing yesterday's supercomputers, and tomorrow's supercomputers will be personal home computers. It's been done... 'nuff said
 

Rob Williams

Editor-in-Chief
Unregistered said:
Note: Might want to come up with a better name than GPGPU, it's a bit of a misnomer now. Maybe General Purpose Media Processing Unit (GPMPU)

I'm not sure that's entirely appropriate, because there have been numerous non-media-related uses for GPGPU as well, such as crunching complex algorithms and password cracking, and just earlier this week we learned that Kaspersky is using GPGPU for faster virus detection and heuristics. Not all of these seem that exciting, but there's a reason some companies have begun building "supercomputers" with GPUs, and they're not doing it for media purposes.

Unregistered said:
I don't think I was totally off base, but Intel may want to create an entirely separate processing unit, like the FPU from the days of the 386. It would be composed primarily of multiple SIMD units.

That's one option, but that wasn't exactly the smartest design, either. I once was talking to an Intel engineer about that exact design, and he was embarrassed to even talk about it (I am uncertain if he was one of the engineers on that project). Things could be different today, though.

Unregistered said:
It would be the first stage towards eventually moving SIMD functions off onto a dedicated chip for media processing, with the CPU issuing VLIW to the GPMPU to maximize performance. I could maybe come up with more reasons for doing it this way, but I would end up rambling on and on and on...

I admit I'm intrigued by the idea, but I'd be hard-pressed to believe Intel hasn't taken a look at such options. I'm no engineer, and you seem to be better versed on the entire subject than I am, but it seems to me that Intel was doomed from the start with an x86 direction. I can understand why they took that route, but I shudder to think of all the R&D dollars that were "wasted" during the entire Larrabee project, and what do we have for it now? Not much. Intel will supposedly release a software counterpart, but I don't see that having much use, either. What use could it have? What game developers are going to want to use some library built around a non-existent GPU architecture, from a company that's never been known to produce quality GPUs?

Unregistered said:
Nvidia does offer GPGPU programmability using C (or C++) for their products through their proprietary CUDA language, and will probably support more open GPGPU languages like OpenCL and Microsoft's DirectCompute.

The problem is that NVIDIA's GPUs aren't native x86, while Larrabee would be. As a native x86 card, developers would be able to code in the manner they're used to, while getting the best performance possible. NVIDIA offers support for C/C++ and others, but not all of them deliver the same level of performance, as far as I'm aware. If you take a look at the latest version of SANDRA (2010) and benchmark using the GPGPU tests, you can test using CUDA, OpenCL, Stream, DirectCompute, et cetera, and you'll see that all of these perform entirely differently. For both AMD and NVIDIA, their respective Stream and CUDA perform the best.

Intel doesn't have a "Stream" or a "CUDA", but they do have IA, which developers are already familiar with, and as such, Larrabee would be able to offer great performance for OpenCL and perhaps others, with perfect C/C++ support, because it's not a different architecture. I'm also not too sure that Intel would be building Larrabee with a lot of what makes an x86 chip a desktop chip... I'd expect a lot to be tweaked and altered in order to fit as many of these cores as possible into a single chip for graphics use.

Unregistered said:
The possibility I see is Intel trimming down the CPU to a minimum of 2 cores with hyperthreading or ultrathreading (>2 threads per core), a memory and I/O hub, and huge amounts of cache (trace, L1, L2 and maybe even more), while the "physics engine" might comprise various concepts, designs and technologies used in SIMD and Merced.

Are you saying that ultra-threading would be ideal for game use? I admit it's not something I thought about before, and I'm not quite sold that it would improve gaming. I still believe we need a lot of fast cores, not just a handful with HT/UT. The sad thing is that because Intel's x86 architecture is not designed for gaming, they have a serious roadblock: the inefficiency is just insane, and I believe that's what killed the project for now. Seeing the Larrabee demo at this past IDF was depressing... we all hoped to see something a lot cooler by that point in time.

Unregistered said:
One of the problems in designing systems is that when some key technologies reach a certain level of maturity, the design of the whole system needs to evolve to move forward. A paradigm shift is required.

One side of my brain tells me that Intel should have started completely fresh from the beginning, foregoing an x86 design, but the other side says that what it was doing was cool, because they had compatibility in mind... a GPU that could offer excellent gaming performance and exceptional computational performance.

Unregistered said:
WHAT IF... Nvidia starts selling a complete solution... replacing the Intel processor with their own processor, the Ion. First the supercomputing world then it would start to trickle into the home... The CPU doesn't need to be super fast, just very good at managing/scheduling I/O...

If this happens, it's going to be extremely interesting, because it will be a major shift from what we're used to. I can't see that happening for a while, because GPGPU hasn't evolved to such a point yet. Plus, just as Intel has little experience with GPUs (compared to the others), it's unlikely that NVIDIA could produce a CPU competent for even modest use any time soon. I could be wrong, and if I am, it's probably because NVIDIA has been working on something for a while (but few outside of NVIDIA would be aware of it).

Unregistered said:
What would you end up with... how about a world-class supercomputer for well under $3000 for the home? Well, technically... today's personal computers are probably faster than most of the supercomputers built less than a decade ago. Today's PC technology is outpacing yesterday's supercomputers, and tomorrow's supercomputers will be personal home computers. It's been done... 'nuff said

I'd love to see this happen, but I think it's going to take a while if it does. I have a feeling that the next couple of years are going to be extremely interesting where processors in general are concerned.
 