Our Processor Testing Suite

Rob Williams · Jun 23, 2009

Hi all:

As you're probably all aware by now, we're in the midst of upgrading all of our various methodologies on the website, and now that our graphics card test suite is all sorted out, it's time to tackle the next one: processors.

For the most part, we've been using the exact same base for our CPU test suite for close to two years, with some minor alterations along the way. There are a few tests listed here I'm ready to drop, and a whole lot more I'd like to add, so as always, please don't hesitate to drop your two cents in... I really appreciate any comments/suggestions.

Our current test suite looks like this:

SYSmark 2007 Preview

Workstation
Autodesk 3ds Max 2009
Cinebench 10
POV-Ray 3.7

Multi-Media
Adobe Lightroom 2
TMPGEnc Xpress 4.7
ProShow Gold 3.2
Sandra 2009 Multi-Media

Mathematics
Sandra 2009 Arithmetic
Sandra 2009 Cryptography
Microsoft Excel 2007

System Specific
Sandra 2009 Memory
Sandra 2009 Multi-Core Efficiency

Gaming
Call of Duty: World at War
Half-Life 2: Episode Two
Crysis Warhead
3DMark Vantage

The first thing I want to drop is SYSmark 2007, because I find it rather useless. The results it delivers don't scale as they really should, so it's a bit misleading. I think it's obvious that a Q9650 is far more powerful than an E8600, but SYSmark doesn't tell us that. Plus, running the suite is a very patience-testing process, and it requires a completely fresh install of Windows. That's all on top of the fact that running it twice in a row, even with two iterations, can give differing results, and when we're dealing with scores that range from 1 to 250-ish, any variance can screw up the true scaling.

That said, I have an idea for what we can replace SYSmark with, but I can't talk about it right now, and I'm not even sure we'll be going ahead with it. Once I know more and look at it from all angles, I'll let you guys know.

Other benchmarks that could go? For the most part, I hate to chuck some of these, because they all serve a purpose. So rather than do that, I think I'd prefer to just add more, to give an even more comprehensive overview of a CPU's performance. I would likely drop ProShow Gold, since it heavily favors Intel (due to SSE4 support). I'd love to introduce another video encoder tool, to pair up with our TMPGEnc Xpress results, but I'm not sure off-hand what should be used.

SPEC's CPU2006 is another I'm dying to get in there (thanks to Psi* for the idea).

Audio encoding is another option, since dBpoweramp can convert up to four songs at a time.

For gaming, I'd like to retain 3DMark Vantage, but other than that, I'm not quite sure. Intel has given me a list of multi-threaded games, and we happen to have most of them here, so I'm going to test each one and track the CPU core usage and see which should be used. This would be in addition to World in Conflict... since there seems to be a fair amount of demand for that one.

To keep this short, I'll stop here. Once again, please don't hesitate to offer up opinions. We slave over our content for you guys, so we want to make sure we're delivering the best possible reviews and articles out there.

Rob Williams · Jul 9, 2009

Quick update as posted on our front-page:

Guess what's tiring? If you said, "Doing a complete overhaul on all of Techgage's methodologies and test suites", you'd be absolutely correct! It feels like forever that I've been working on things around here, but we're inching ever closer to completion, and I can't wait. As mentioned before, our GPU scheme of things is all wrapped-up, and we're in the process now of benchmarking our entire gamut of cards, so you can expect a review or two within the next few weeks with our fresh results.

As it stands right now, we're eagerly working on revising our CPU test suite, and that's proving to be a lot more complicated than originally anticipated. There's obviously more to consider where CPUs are concerned, so we're testing out various scenarios and applications to get an idea of what makes most sense to include. The slowest part is proving to be getting prompt responses from all the companies we're contacting, but that's the nature of the business!

If there's one thing you can expect to see in our upcoming CPU suite iteration, it's a beefed-up number of tests. I don't believe we're lacking in that regard as it stands, and I certainly don't want to go the opposite direction where we have too many results, but I do want to make sure we offer up the most robust set of results out there, to cater to not only the regular consumer or enthusiast, but also the professional. So in addition to tests such as 3ds Max 2010, we're also going to add in Maya 2009. There are other professional and workstation apps I'm considering, but I won't talk about those just yet.

Given that Linux is more popular than ever (and a decent number of our visitors are using the OS), I'd like to also introduce the OS again into our CPU content, and also possibly our motherboard content (not for performance, but rather compatibility). More on that later though, as we're still looking to see which benchmarks would make the most sense with that OS (you can be sure one test would be application compilation).

Oh, and that machine in the photo above? That's our revised GPU testing machine. Because the six titles we use for our testing total 55GB, we've opted to stick to a speedy Seagate Barracuda 500GB hard drive, while for memory, we're using Corsair's 3x2GB DDR3-1333 7-7-7 kit. Huge thanks to Gigabyte for supplying us with a brand-new motherboard for the cause, their EX58-EXTREME, and also Corsair for their HX1000W power supply. Other components include Intel's Core i7-975 Extreme Edition and Thermalright's Ultra-120 CPU cooler. And yes, that's a house fan in the background, and no, it's not part of our active cooling!

evilives34 · Jul 9, 2009

thanks for the info rob, i really love to see Linux getting some limelight. i dont use linux much for day to day stuff but i always try to keep a copy installed

Doomsday · Jul 9, 2009

Sweeet!!

Which case is it?!

Rob Williams · Jul 13, 2009

evilives34 said:
i really love to see Linux getting some limelight.

It's a growing OS, to say the least, so it definitely should get some attention. The problem, though, is I have no sweet clue what to even benchmark there. On Windows, there are many common applications, like 3ds Max, Adobe Lightroom, et cetera, so if I use those, I know people are going to have heard of them. But on Linux, each distro bundles different apps, and most of the ones I can think of that would be worthy of benchmarking aren't open-source, and aren't that popular (Google Picasa is one application that comes to mind).

One Linux test I'll include is application compiling, since that's a scenario many users will find themselves in at some point. Plus, the Linux kernel can be optimized for both AMD and Intel, as well as the compiler itself, so it would be a good test to use. I also found out the other night that CPU performance does scale rather well, although I haven't tested Dual-Cores yet. Another huge bonus, though, is that the way we have the machine set up (it's a heavily-optimized Gentoo install), our test results don't sway much from run to run.
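For anyone who wants to check run-to-run stability on their own machine, here is a minimal sketch of a timing harness in Python. It is not the script used for the testbed; the names `time_runs` and `spread` are made up for illustration, and the command in the example is just a placeholder to swap for a real compile.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, runs=2):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True stops the benchmark if the command itself fails.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def spread(timings):
    """Return (mean, max - min) in seconds for a list of timings."""
    return statistics.mean(timings), max(timings) - min(timings)


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with the real build command.
    cmd = [sys.executable, "-c", "pass"]
    mean, delta = spread(time_runs(cmd, runs=3))
    print(f"mean {mean:.3f}s, spread {delta:.3f}s")
```

Reporting the spread alongside the mean makes it easy to see whether a benchmark is stable enough to be worth including.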

For fun, I decided to compile all of the GNOME desktop environment, which consists of 376 separate tools and applications. Now, you'd imagine a test like this would sway in the final result some. I mean, some tests we do take 4 minutes total, and the end result could vary by about 5s and we wouldn't think of it as a big deal. But look at how stable this massive test is:

Core i7-965
3.2GHz First Run: 106 minutes, 14.341 seconds
3.2GHz Second Run: 106 minutes, 15.487 seconds
3.6GHz First Run: 96 minutes, 15.064 seconds
3.6GHz Second Run: 96 minutes, 13.741 seconds
3.6GHz (Overclocked RAM) First Run: 95 minutes, 52.440 seconds
3.6GHz (Overclocked RAM) Second Run: 95 minutes, 50.550 seconds

Despite the test taking over 100 minutes at the stock CPU setting, the end result was within about 1-2s between each run. That's quite impressive. This is of course a ridiculously-long benchmark though, so I might consider another application to use (I'll see how long OpenOffice takes), but I'll also include a modest application as well, such as Wine. That also proved quite stable from run to run (always <=0.5s variance), so it seems like a good one to include.
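To put the GNOME numbers in perspective, here is a short Python sketch that works out the gap between each pair of runs, both in seconds and as a share of the total run time. The timings are copied from the list above; the helper names are only for illustration.

```python
# Timings copied from the post, as (minutes, seconds) pairs per configuration.
RUNS = {
    "3.2GHz": [(106, 14.341), (106, 15.487)],
    "3.6GHz": [(96, 15.064), (96, 13.741)],
    "3.6GHz (Overclocked RAM)": [(95, 52.440), (95, 50.550)],
}


def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


def run_to_run(pairs):
    """Return (absolute gap in seconds, gap as a percentage of the mean run)."""
    first, second = (to_seconds(*p) for p in pairs)
    diff = abs(first - second)
    mean = (first + second) / 2
    return diff, 100 * diff / mean


for label, pairs in RUNS.items():
    diff, pct = run_to_run(pairs)
    print(f"{label}: {diff:.3f}s apart ({pct:.3f}% of the run)")
```

The gaps come out to 1.146s, 1.323s and 1.890s, which is well under 0.05% of the run time in every case.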

If anyone has other Linux benchmarking ideas, please let me have em. I'm still of course looking for as much input regarding the rest of our Windows benchmarking as well though. I hope to update this thread very soon with some proposed changes myself...

Doomsday said:
Which case is it?!

It's SilverStone's TJ10.

Rob Williams · Jul 17, 2009

Just to give a quick update on this... I'm still chugging away, and I'm *really* hoping to be finished wrapping up the CPU test suite within the next two weeks. Thanks to some folks at the Gentoo forums, I think I've pretty much found the perfect collection of real-world Linux benchmarks to use, so I'm set there. Just need to test it and test it again to make sure they're completely worthy of inclusion.

Oh, and CPU2006... unbelievably difficult to figure out. At least on our particular configuration.

Rob Williams · Jul 19, 2009

I've been goofing around with the Linux install on what will be our Core i7 testbed, and I found something quite interesting. Up until now, it's been rare to see any scenario ever top-out a processor, but apparently compiling applications under Linux can. In the shot I attached here, you can see that at one point, it actually kept consistently at almost 100% (we're dealing with eight threads here!).

Looks like a good benchmark to me!
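For tracking per-core usage on Linux without a graphical monitor, here is a minimal Python sketch that compares two snapshots of /proc/stat. It is not the tool used for the screenshot; the function names are hypothetical, and it assumes the usual field order on per-core lines (user, nice, system, idle, iowait, irq, softirq, steal), counting idle and iowait as not busy.

```python
import os
import time


def parse_proc_stat(text):
    """Map 'cpuN' -> (busy_ticks, total_ticks) from the contents of /proc/stat."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-core lines such as "cpu0 ..."; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:9]]  # user .. steal
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        counters[fields[0]] = (total - idle, total)
    return counters


def utilization(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    result = {}
    for core, (busy1, total1) in after.items():
        busy0, total0 = before[core]
        elapsed = total1 - total0
        result[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return result


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(1)  # sampling interval; an arbitrary choice
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    for core, pct in sorted(utilization(first, second).items()):
        print(f"{core}: {pct:.1f}%")
```

Logging these figures during a benchmark run would show whether a test really keeps all eight threads busy or only touches a few of them.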

Kougar · Jul 20, 2009

The LinX program does the same thing in Windows... but yes, it is interesting to see the Windows Performance Monitor completely flatlined at 100% across all eight displays. And compiling applications definitely has more use than a simple Linpack GFLOP test, although it was interesting that LinX's GFLOP reading will change by around 10 GFLOPs depending on the memory system configuration on a Core i7 system.

Even running 4 Ubuntu 64bit virtual machines each fully loaded, plus the F@H GPU client, CPU usage is around 96-98% depending on browser, background apps, and my music player. Once it's pegged at 100% the machine definitely loses that responsive feel and commands start to get randomly delayed.

Psi* · Jul 21, 2009

What does it all mean???

Geez ... where has the time gone? Time must be OC-ed.

Pegged CPUs are a wondrous thing. But, like with my number crunchers, since I did not write the code I don't know what it means. One of my programs always pegs all of the cores/processors. Another will use 75% to 85% at best ... each are multi-threaded with bells & whistles, yadda yadda yadda.

But does it mean that the program that gets 100% CPU utilization is better written, has less overhead due to algorithm, or what. FWIW It is written in FORTRAN.

The other program is all re-written for maximum Windows complacency and written in C++. Or, is there something else that is the weak channel? These programs can sit in the machine for hours/days crunching away & seldom writing anything to the HDD.

Back to, "what does it all mean???" What do the CPU utilization programs show?

Rob Williams · Jul 22, 2009

Psi* said:
But does it mean that the program that gets 100% CPU utilization is better written, has less overhead due to algorithm, or what.

See, that's what I'd love to know as well. Although I've dabbled in coding before, I'm in no way experienced in understanding the mechanics, although I wish I were. It's something I'll have to talk to a few companies or developers about, because they'd know. In fact, I should be meeting with Intel sometime next month, so I'll talk to their performance gurus then and see what they say.

I've wondered it for a while, though. Because I've done some tests where I thought the CPU would top out, but didn't. I'm talking video encoding, 3D rendering, et cetera. On our eight core Skulltrail machine, I've never surpassed 6/8 cores for a render. Seeing that my Linux compile managed to hit all eight threads here though, I assume that it's insanely multi-threaded in all the right ways.

From what I've seen in the past, algorithms that are heavily math-based have been great for multi-threading, but that's as much as I know. I've even seen Microsoft Excel almost top out a multi-core processor, so it's odd to see a video encoder top out at around 80%.
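One possible contributor, and this is only a simplified model rather than a confirmed explanation for any of the applications mentioned, is that part of a job may run on a single thread while the rest is spread over all cores (the assumption behind Amdahl's law). The Python sketch below uses hypothetical function names and assumes the parallel part scales perfectly; it shows how little serial work it takes to pull average usage down to around 80% on eight threads.

```python
def average_utilization(serial_fraction, cores):
    """Average CPU usage (0..1) when `serial_fraction` of the single-threaded
    run time can only use one core and the rest is split evenly across `cores`.

    With single-threaded time normalised to 1, wall time on n cores is
    s + (1 - s) / n, and usage is 1 / (n * wall time).
    """
    s, n = serial_fraction, cores
    wall_time = s + (1.0 - s) / n
    return 1.0 / (n * wall_time)


def serial_fraction_for(utilization, cores):
    """Invert the model: serial fraction that would give the observed usage."""
    return (1.0 / utilization - 1.0) / (cores - 1)


if __name__ == "__main__":
    for s in (0.0, 0.01, 0.05, 0.10):
        print(f"serial {s:4.0%} -> usage {average_utilization(s, 8):.0%} on 8 threads")
    print(f"80% usage on 8 threads ~ {serial_fraction_for(0.80, 8):.1%} serial work")
```

Under these assumptions, a job that is only about 3.6% serial would already average 80% usage on eight threads. Real applications can fall short for other reasons too, such as waiting on disk or memory.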

This is something I'll have to look into a lot more, because it'd make choosing benchmarks a little bit easier I'd imagine ;-)

Psi* · Jul 23, 2009

perfmon

I used to use perfmon to get more details, but haven't looked at it for quite a while. Maybe i will start playing with it again. It offers the ability to see what is going on in networked machines ... although I can't get that to work now ... it was really simple once upon a time.

And uh ... I have wondered if all CPU warmers are created equal? Do some get the CPU warmer than others even if the CPU is maxxed?

Meeting with Intel? Excellent ... you the man.

Kougar · Jul 26, 2009

Psi* said:
And uh ... I have wondered if all CPU warmers are created equal? Do some get the CPU warmer than others even if the CPU is maxxed?

Yes. Run the same number of threads in Prime95 and then in LinX and compare, just as one example.

I totally don't have any basis right now on this, but I'm pretty sure it has to do with the type of workload, what execution units in the CPU are utilized, and definitely how well the program was written.

If a program isn't written well, the CPU has to idle while it waits for more data to be fed to it, whereas well-written programs are optimized to keep the CPU as busy as possible. Additionally, to optimize a program even further for a CPU, it needs to be arranged so that every stage in the CPU pipeline is busy performing some type of task, and all of them at once when you start getting into more than one core. It takes some serious programming, I'd imagine.

Rob Williams · Jul 28, 2009

Psi* said:
I used to use perfmon to get more details, but haven't looked at it for quite a while. Maybe i will start playing with it again. It offers the ability to see what is going on in networked machines ... although I can't get that to work now ... it was really simple once upon a time.

Under Performance Monitor, the Networked section should note all of the activity. What version of Windows are you using? If you need help, let me know and I can look at our installations here. I never refer to the networking parts of a machine, so I might not be thinking of the same spot.

Psi* · Aug 26, 2009

I have to remember to set up thread subscriptions ...

Kougar, & I am just thinking out loud, no basis for this lengthy comment other than writing assembly long ago.

Given the differing CPU architectures of the cores, the number of registers, pipelining, hyperthreading ... yadda yadda yadda ... it is reasonable to me that free or cheap CPU warming programs are never really optimized for any processor. Or, depending on the compiler options, the program may hit a particular processor more severely than another.

I do know that both AMD & Intel offer optimized libraries for their chips. I have not paid much attention to this for a while, so given the large number of chips available this could be complex if used.

A solid state device uses power, as in generates heat, during transitions or switching state between high & low. Of course nothing happens in a digital system without a shifting between highs & lows. In my mind, I see CPU A versus CPU B. CPU A takes 15 clock cycles (or shifts, to oversimplify) to do a multiply, for example. CPU B takes 10 for a multiply, but takes more cycles to shift data into a register. But, CPU B may make 2 shifts for every cycle and that may make more heat! And, nothing happens without being in a register; the use and ordering of registers is a part of optimization ... and these are different for almost every variation of CPU.

Oh, let's not forget the 45 nm versus 32 nm (coming) sort of technology improvements either. Less distance for a charge to travel, the quicker it can get there == less heat, but more speed == more heat.

So, that is why I am not surprised at the frustrating lack of apparent consistency between benchmarking programs. I *think* that the SPEC guys offer compiler option suggestions for particular CPUs. So guys like Rob just have to look it up cook book style. What's the big deal?

Rob, thanks for the perf monitor pointer ... I'll look at it.

Rob Williams · Dec 18, 2009

I've made a new post on the front-page, opening the floor for more discussion. I'm so swamped in work, so it's highly unlikely that I'll be able to put any new suite to use before Westmere, but I'm hoping things will pan out and allow it. That assumes that we can wrap this up soon...

http://techgage.com/news/work-in-progress_updating_our_cpu_test_suite/
