Intel's "Skulltrail 2" to Feature 16 Cores?

Rob Williams

Editor-in-Chief
Staff member
Moderator
From our front-page news:
It looks as though Intel's next dual-socket platform might not be a straightforward dual desktop CPU setup like the original Skulltrail. Bright Side of News is reporting that "Skulltrail 2" will instead use two eight-core processors based on Nehalem-EX. That's right... 16 cores and 32 threads.

When I questioned Intel about this claim, I received an answer that neither confirmed nor denied it, so I'd be willing to believe that Intel is indeed considering a 16-core Skulltrail 2 rather than an 8-core version. Why is unknown, but chances are it's simply because a) it's not going to be in high demand, and b) that is a lot of power to brag about. It's important to note, though, that even if Intel is considering that particular move, it doesn't mean it will happen.

The question of course arises... "who could touch all that power?" and in truth, I don't think the answer is a simple one. After all, even with our original Skulltrail article, we had a difficult time finding ways to properly push all 8 Cores / 8 Threads. Just imagine how challenging 32 threads would be! If the original Skulltrail was the "ultimate" multi-tasker's PC, I think we're going to need to invent a new word to describe Skulltrail 2.

Here's Intel's official stance: "We have not announced any plans to bring a new ‘Skulltrail’ board to market. We are always researching and looking at new technologies for various segments, so we are not saying we would never come out with this board. But, at this time we have no public plans to do so."

intel_skulltrail_2_060909.jpg

Many OS and application code paths would now fit nicely in that huge cache, but high-speed memory would still be useful for streaming and HPC apps. Every Nehalem-EX "Beckton" processor has a quad-channel memory controller, i.e., a 256-bit interface. With DDR3-1333, a two-socket system gets 85.3 GB/s; with DDR3-1600, it would get 102.4 GB/s, i.e., more than 100 GB/s of system bandwidth for the first time in history!
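For reference, here is the arithmetic behind those figures as a short Python sketch. It assumes the numbers are theoretical peaks for a two-socket system with a 256-bit (4 × 64-bit channel) interface per socket; that is the reading under which 85.3 GB/s works out. Real-world throughput will be lower.

```python
def peak_bandwidth_gbs(mt_per_s: float, bus_width_bits: int, sockets: int = 1) -> float:
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8  # 256 bits -> 32 bytes per transfer
    return mt_per_s * 1e6 * bytes_per_transfer * sockets / 1e9


# DDR3-1333 actually runs at 1333 1/3 MT/s (a 666 2/3 MHz clock, double data rate).
for speed, label in ((4000 / 3, "DDR3-1333"), (1600, "DDR3-1600")):
    per_socket = peak_bandwidth_gbs(speed, 256)
    total = peak_bandwidth_gbs(speed, 256, sockets=2)
    print(f"{label}: {per_socket:.1f} GB/s per socket, {total:.1f} GB/s for two sockets")
```

In other words, a single socket tops out around 42.7 GB/s (DDR3-1333) or 51.2 GB/s (DDR3-1600); the "over 100GB/s" figure needs both sockets.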


Source: Bright Side of News
 

Kougar

Techgage Staff
Staff member
People said the same thing of an 8 core, 16 thread Skulltrail. :)

I can use all 16 cores and 32 threads... but I think I might run out of RAM long before then! Right now I am running:

Folding@home GPU client
Folding@home SMP 4-thread client
Folding@home Linux SMP client (inside a 64-bit Ubuntu virtual machine in VMware; VMware caps it at 2 threads)

So far I'm only using 7 of the 8 threads; Task Manager says about 15% of my CPU is free. Hmm, maybe a second VM with another copy of the Linux SMP client... :D
 

madstork91

The One, The Only...
The ePeen is strong with this one...

Speaking of, is there an ePoon? And if not, having a large ePeen just seems even more ridiculous.
 

Merlin

The Tech Wizard
If they had a double-sided board, that would cut down the size and heat.
Then they could add more cores, though of course it would take a different case.
But really, how far can you push it till the limit is met?
 

Kougar

Techgage Staff
Staff member
Merlin said:
If they had a double-sided board, that would cut down the size and heat.
Then they could add more cores, though of course it would take a different case.
But really, how far can you push it till the limit is met?

I think this is the limit. This kind of system will require two separate banks of memory. I presume regular DDR3 will work, as I had heard the FB-DIMM chip was built directly into the motherboard (it makes sense: instead of one controller per RAM module, you'd just need one FB-DIMM controller per memory bank).

Either way, because both banks have to match, a user who wanted 6GB of memory would have to buy 12GB. But this is a 16-core system... 6GB won't be anywhere near enough RAM. So 12GB might be enough, but that means the user must buy 24GB. DDR3 prices have crashed compared to a year ago, but 24GB would still put a huge dent in anyone's wallet.
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
Kougar, you are really passionate about Folding! That's a lot going on at one time. Folding is an idea I hadn't considered for a machine like this, but it's a good one. But Folding is one scenario where people are purposely pushing their machines to the brink... I'm wondering about a non-Folding scenario. Is it remotely possible for a workstation application to take advantage of more than 8 threads?

I've not had the opportunity to use more than 8 threads on a machine before, so I'm completely lost as to what in the world could possibly touch that. Back when we did the Skulltrail article, I had trouble even using all 8 threads that were available. I wonder if that was more of a RAM issue though? Are you supposed to have a certain number of channels available per sets of threads or something?

Oh, and where Folding is concerned, I have a feeling given what Skulltrail 2 or even a "custom" 8 core Xeon machine would cost... it'd make much more sense to set up a GPU Folding rig. Seems far, far more efficient.
 

Kougar

Techgage Staff
Staff member
Rob Williams said:
Kougar, you are really passionate about Folding! That's a lot going on at one time. Folding is an idea I hadn't considered for a machine like this, but it's a good one.

When a single PC can compare to several servers running Folding@home from just a few years ago, it's hard NOT to put that computational power to good use. :) I don't build a high-end system just to leave it idle or off; I want it to be doing something and living up to its capabilities... or at the very least, getting full use out of it.

[Signature image: Kougar's Folding@home stats]


This sort of score was unfathomable on a Pentium 4 machine; it'd take almost a datacenter's worth of P4s to eclipse it. Yet most of that score comes from this single PC, although I do still have my Q6600 and an underclocked 8800GTS 320MB folding.

Honestly, I'm giddy just imagining what NVIDIA's next generation of GPUs will be capable of outputting, because compared to today's GTX 285 it's much more tailored for compute processing thanks to its completely new MIMD design. Then they rolled that into a larger brute of a GPU core... Today, GT200 already outfolds a Core i7 running two Linux SMP clients by more than double, and by some accounts by well over 3x if you use the higher-clocked GTX 285s. And I'm sure a single GTX 295 could outfold a Skulltrail 2 ;)

Rob Williams said:
But Folding is one scenario where people are purposely pushing their machines to the brink... I'm wondering about a non-Folding scenario. Is it remotely possible for a workstation application to take advantage of more than 8 threads?

Yeah, there aren't many consumer-level programs that would, except for rendering, encoding, and content creation software. Still, there are programs that simulate real-world workloads, and there are complex particle simulation programs for fluid dynamics, aerospace engineering, etc. Tech Report didn't use many of them for their Istanbul testing, but it's a good start, and I've played with some of that benchmark software.

Rob Williams said:
I wonder if that was more of a RAM issue though? Are you supposed to have a certain number of channels available per sets of threads or something?

Beyond what you already know about keeping the number of memory modules evenly divisible by the number of channels, not really. On dual-socket motherboards, both AMD and Intel platforms now require identical RAM configurations in each memory bank. So if you have three 2GB modules in one bank, you will need three 2GB modules in the second bank; that's just how the NUMA design works. But I'm sure ya know that already. Beyond that, there isn't any strict hardware rule.
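Those two population rules are easy to check mechanically. Below is a minimal Python sketch; the function name is hypothetical, module sizes are given in GB, and the default of three channels per socket is an assumption chosen to match the three-module example above (a quad-channel platform would pass 4).

```python
from collections import Counter


def check_dual_socket_population(bank_a, bank_b, channels_per_socket=3):
    """Return a list of problems with a two-bank DIMM layout (empty list = OK).

    bank_a / bank_b: module sizes in GB for the memory attached to each socket.
    channels_per_socket: assumed channel count; adjust for the actual platform.
    """
    problems = []
    # Rule 1: each bank should populate its channels evenly.
    for name, bank in (("bank A", bank_a), ("bank B", bank_b)):
        if len(bank) % channels_per_socket != 0:
            problems.append(
                f"{name}: {len(bank)} modules not divisible by {channels_per_socket} channels"
            )
    # Rule 2: both banks need the same set of modules (order doesn't matter).
    if Counter(bank_a) != Counter(bank_b):
        problems.append("banks are not populated identically")
    return problems


print(check_dual_socket_population([2, 2, 2], [2, 2, 2]))  # []
print(check_dual_socket_population([2, 2, 2], [2, 2]))     # two problems reported
```

The matching-banks rule is also why the total to buy doubles: whatever one socket gets, the other socket has to get too.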

There is an IT rule of thumb for how much memory to use to balance the system, but I don't remember it and don't think it matters too much here. For testing purposes, you would be more concerned with keeping each running process locked to a specific socket (CPU affinity). Otherwise, Windows in all its wisdom will still sometimes move the process over to the other CPU. Not only does this incur L2/L3 cache penalties, but the data is still located in the memory bank attached to the other processor, incurring a further performance penalty on every access. If you read Tech Report's review, they underscore this with their Euler3D benchmark.
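As an illustration of the affinity idea, here is a minimal Python sketch for Linux (not Windows, where Task Manager's "Set Affinity" does the CPU part by hand). It reads a NUMA node's CPU list from sysfs (`/sys/devices/system/node/nodeN/cpulist`) and pins a process to those CPUs with `os.sched_setaffinity`. The helper names are hypothetical, and note that this only pins the CPUs; it does not move memory the process has already allocated on the other node (a tool such as `numactl --cpunodebind=N --membind=N` handles both).

```python
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse the Linux sysfs cpulist format, e.g. '0-3,8-11' -> {0, 1, 2, 3, 8, 9, 10, 11}."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node: int, pid: int = 0) -> set[int]:
    """Restrict a process (pid 0 = the calling process) to the CPUs of one NUMA node.

    Linux-only: relies on sysfs and os.sched_setaffinity.
    """
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(pid, cpus)
    return cpus
```

Pinning before the benchmark starts allocating memory matters: Linux by default places a page on the node of the CPU that first touches it, so a process pinned early tends to keep its data in the local bank.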

Rob Williams said:
Oh, and where Folding is concerned, I have a feeling given what Skulltrail 2 or even a "custom" 8 core Xeon machine would cost... it'd make much more sense to set up a GPU Folding rig. Seems far, far more efficient.

Taking the FahMon numbers (these generally overshoot the actual PPD) and estimating for the Linux SMP client, this Core i7 is hitting about 6,000 PPD as a very rough guess on the high side. My GTX 260 can hit that high by itself at stock. So yes, GPUs are definitely the way to go as far as folding is concerned; a single GTX 285 can supposedly top out around 10,000 PPD.
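Monitoring tools like FahMon estimate PPD by extrapolating from the current frame time. The sketch below shows that style of calculation in Python under stated assumptions: a work unit is split into 100 frames (an assumption, adjust per core), and the point values and frame times for the three clients are hypothetical placeholders, not measured numbers. Because it assumes the current frame time holds all day, anything that slows a client later (another client ramping up, a slower work unit) is not reflected, which is one way such estimates end up on the high side.

```python
SECONDS_PER_DAY = 86_400


def estimated_ppd(wu_points: float, seconds_per_frame: float, frames_per_wu: int = 100) -> float:
    """Extrapolate points per day from the current frame time.

    Assumes the current frame time holds for the whole day and that a work
    unit consists of `frames_per_wu` frames (100 here is an assumption).
    """
    seconds_per_wu = seconds_per_frame * frames_per_wu
    return wu_points * SECONDS_PER_DAY / seconds_per_wu


# Hypothetical clients on one machine: (points per work unit, seconds per frame).
clients = {
    "GPU client": (480, 60),
    "Windows SMP client": (1900, 900),
    "Linux SMP client (VM)": (1900, 1200),
}
for name, (points, spf) in clients.items():
    print(f"{name}: {estimated_ppd(points, spf):,.0f} PPD")
total = sum(estimated_ppd(points, spf) for points, spf in clients.values())
print(f"Estimated rig total: {total:,.0f} PPD")
```

Summing per-client estimates like this is how a rig can show a higher total with everything running, even though each individual client slows down a bit.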

The only problem with these figures is that they are highly variable. They depend on the work unit in question, the core used to compute it, what else is running on the GPU/CPU, and just about anything else, from GPU driver version to memory optimizations for the CPU clients. If I shut down all CPU clients the GPU would fold a little faster, and if I shut down the Linux CPU client the Windows SMP client would gain at least 500 PPD... but I still attain a higher total PPD with everything running. These numbers also change as Stanford makes design changes and optimizations to the GPU clients themselves, such as having the code actually use the additional shader units that GT200 added. For a while, GT200 cards would fold about as fast as a 9800GTX+ because the expanded parts of the core just weren't being used. From my understanding, that was also a huge issue on ATI 4000 series cards, which they fixed recently.
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
Kougar said:
When a single PC has the ability to compare to several servers running Folding@home from just a few years ago, it's hard NOT to put that computational power to good use.

As a non-Folder, I really can't voice a real opinion, but I refuse to stress the CPU when it's so inefficient for this purpose. Having seen just how much more crunching GPUs can do, the CPU seems entirely useless to me. It's not worth increasing the heat in my room, nor the power bill, that's for sure. The same kind of goes for GPUs, though... Folding in general increases the power bill a fair deal (I've done my own tests)... it'd be nice to see cards that can crunch their hearts out without drawing 150W+ constantly.
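To put the 150W figure in perspective, here is a quick Python sketch of what a constant extra load adds per month. It assumes the card draws that much 24/7 for a 30-day month, and the $0.10/kWh electricity rate is a hypothetical placeholder; plug in your own rate.

```python
def monthly_energy_and_cost(watts: float, price_per_kwh: float,
                            hours_per_day: float = 24, days: int = 30) -> tuple[float, float]:
    """Energy (kWh) and cost for a constant extra load over one billing period."""
    kwh = watts * hours_per_day * days / 1000  # W x h = Wh; divide by 1000 for kWh
    return kwh, kwh * price_per_kwh


# 150 W is the figure from the post; $0.10/kWh is a hypothetical rate.
kwh, cost = monthly_energy_and_cost(150, 0.10)
print(f"{kwh:.0f} kWh per month, about ${cost:.2f}")  # 108 kWh per month, about $10.80
```

Nearly all of that energy ends up as heat in the room, which is the other half of the complaint.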

Kougar said:
Still there are programs that simulate or run loads that match actual workloads, and there are some complex particle simulation programs such as fluid dynamics, aerospace engineering, etc.

That's not a consumer-level test ;-) When I say workstation, I mean video encoding, rendering, et cetera... something that's realistic of a professional, but not one who's working with unlimited servers at their disposal.

Kougar said:
Otherwise Windows in all its wisdom will still sometimes move the process over to the other CPU. Not only does this incur L2/L3 cache penalties, but the data is still located in the memory bank attached to the other processor, incurring a further performance penalty during accesses.

From what I understand (and from what Intel's told me), Windows 7 is a lot smarter when handling things like this, so I'd expect to see more efficiency on that OS.

Kougar said:
Taking the FahMon numbers (these generally overshoot the actual PPD) and estimating for the Linux SMP client, this Core i7 is hitting about 6,000 PPD as a very rough guess on the high side. My GTX 260 can hit that high by itself at stock. So yes, GPUs are definitely the way to go as far as folding is concerned; a single GTX 285 can supposedly top out around 10,000 PPD.

From what I recall, even my PS3 scored a bit more than 6,000 PPD...

Thanks for all the input man, very interesting stuff. I'm curious though, what caused you to become such a hardcore Folder?
 

Hawke

Obliviot
Intel will not make a "Skulltrail 2", but as I recall, Intel stated that there is nothing stopping other companies from doing so...

I myself am happy with my current Skulltrail rig, and I think it will last a good 6 years at least.
 

2Tired2Tango

Tech Monkey
Merlin said:
If they had a double-sided board, that would cut down the size and heat.
Then they could add more cores, though of course it would take a different case.
But really, how far can you push it till the limit is met?

SkyNet.... Coming soon to a desktop near you!

:)
 

Hawke

Obliviot
Merlin said:
If they had a double-sided board, that would cut down the size and heat.
Then they could add more cores, though of course it would take a different case.
But really, how far can you push it till the limit is met?

I don't think it will cut down the heat at all. I think it may increase the overall temps on the board itself, unless you move the board from the side to the middle, and that would cause problems fitting standard PC cards.
 

Psi*

Tech Monkey
crunch power

With that kind of processing power, they will be able to figure out how to handle the "thermal solution"
 

madstork91

The One, The Only...
Hawke said:
I don't think it will cut down the heat at all. I think it may increase the overall temps on the board itself, unless you move the board from the side to the middle, and that would cause problems fitting standard PC cards.
That's a simple answer: make the box bigger, or just move the board out far enough to fit a fan blowing directly on the components on the other side.

Or they could bring the PCI-E slots out and put them at an L so that they run parallel to the board? Which would also require a redesign of your standard box.

Another idea is a Non flat board... and vertical mount the processors. 0.o

What would be really interesting is if they put a processor on both sides... How would they suspend the board in the box, and keep it from bending or breaking when you push a card or memory in? 0.o You can't easily mount the board from the middle if it is suspended even more than usual, though I suppose you could just use a really long mount?
 

Kougar

Techgage Staff
Staff member
madstork91 said:
Another idea is a Non flat board... and vertical mount the processors. 0.o

That's what the Mac Pro does: they stuck half the motherboard onto a daughter card that mounts horizontally in the chassis, so both quad-core Xeons are oriented with the coolers facing the vertical.

Whoever designs a Skulltrail 2 board would probably stick to a 2D layout though; servers still get away with it just fine.
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
Psi* said:
With that kind of processing power, they will be able to figure out how to handle the "thermal solution"

Hah, that made me LOL.

Kougar said:
Whoever designs a Skulltrail 2 board would probably stick to a 2D layout though; servers still get away with it just fine.

That's just it. Apple is allowed to do special things because they don't expect people to poke around inside the computer, but for enthusiasts, it'd be totally different. To be honest, I have absolutely no idea why companies like ASUS and Gigabyte aren't creating an overclocker's server board just for "fun". I'll touch base and see if I can get any sort of answer.
 