CUDA n-Body

Krazy K

Partition Master
Rob,

I have a rather large favor to ask of your staff. CUDA is now available on most nVidia GPUs. Problem is, it doesn't run the same way as video, and you can't size up cards based off the big fat table or Tom's tables. I had a GTX 260 and got an n-body score of 20fps/320 gflop/s. Not knowing what I know now, I bought a 9800GX2 and my score dropped to 37fps/200 gflop/s. Nowhere out there is a table of these n-body scores. If you catch my drift, can you have someone get the CUDA toolkit and run the n-body on some cards you have on hand?
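In case it saves whoever gets stuck with this some digging: the scores I quoted come from the n-body sample in the CUDA SDK, which has a benchmark mode. The flag names have shifted between toolkit releases, so check the sample's readme, but the run looks roughly like this:

nbody -benchmark -numbodies=16384

Running it without -benchmark gives the interactive view with the fps counter instead.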
 

Kougar

Techgage Staff
Staff member
I don't believe directly translating those scores into real-world use would give any real or usable info. For example, the 9800GX2 will get a much higher Folding@home PPD score than a single GTX 260. It would just depend on the CUDA-based program in question, and whether it can properly utilize both cores in a dual-GPU card such as the GX2.
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
I agree with Kougar on this one, in that the metric really wouldn't deliver meaningful information on its own. I should ask, though: why do you want to verify that one card is better than another? Is it for Folding@home? GPGPU? If I can find the time, I'd be happy to take a look at this, but it might take me a week or two. My current queue is a little... foolish.

For what it's worth, we might be able to add Sandra's GPU tests to our GPU reviews. Would anyone be interested in that?
 

Kougar

Techgage Staff
Staff member
Sorry to hear about your queue issues, Rob! Hoping that works itself out quickly for ya.

I would have to go look at the SiSoft Sandra tests, as I don't remember anything about their GPU tests, so to be honest it doesn't make a difference to me. Then again, come to think of it, didn't they build a GPGPU test into the thing now? I might wonder if it was any good, but I would still say for GPGPU (i.e. CUDA) purposes, any interested user should find performance results from users already running that specific CUDA or GPGPU program on their hardware. From my experience with F@H, it mostly comes down to how well the GPGPU software is written... such as whether it can fully utilize all of the hardware, whether it can run on dual-GPU hardware or even multi-GPU setups, etc. F@H has had issues with all of the above until only recently...
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
Hah, no worries on the queue situation. It's always a little "foolish" as I said, but lately it's simply been made worse by the fact that I've been running into a lot of issues with every large article I've been doing. As a result, I have a few things kicking around that should have been dealt with long ago. That includes a few X58 boards, a few GPUs, a notebook, some flash memory (roundup) and some other stuff I can't really talk about right now. I'm catching up, but it's been fun trying to do so at an efficient pace :-D

With GPGPU results, Sandra utilizes both CUDA and ATI Stream, and like the CPU tests, it will essentially stress the processor to deliver scalable results. I forget the metrics used there, to be honest, but either way, it might be cool to see how each GPU scales. I'd believe that whatever metric is used there could be carried over to give an example of GPGPU performance anywhere.

As for Folding@home, I wish there was a real benchmark out there for this. It would be great to be able to include that information in our GPU, and even CPU reviews. I'm not sure why something like this doesn't exist. The workloads out there exist, so it would be a matter of making sure the same exact workload is computed each time.
 

Krazy K

Partition Master
I run Seti@home. When I had the 260 installed, I could tear through two 7-hour tasks in about 23 minutes. Now with this GX2, it takes almost 25 minutes to get through the same 7-hour task. I started looking into it, and it is based almost exclusively on how fast single-precision floating-point calculations are processed. How it's written has some effect, but it's the same program and the same tasks. I got the SDK pack because the forums told me that's how it was measured and how I should benchmark the cards. I don't know how it all works, but I can tell you with a high degree of certainty that the 260 is faster than the 9800GX2.
 

Kougar

Techgage Staff
Staff member
Rob Williams said:
As for Folding@home, I wish there was a real benchmark out there for this. It would be great to be able to include that information in our GPU, and even CPU reviews. I'm not sure why something like this doesn't exist. The workloads out there exist, so it would be a matter of making sure the same exact workload is computed each time.

Technically such a thing does exist, but The Tech Report is the only site that does it. One of their forum members modified an old F@H benchmark, and with some more tweaking they've got a fairly decent CPU-based F@H benchmark. I looked into using it at one time to quantify and performance-tune my systems for F@H, but I believe you would need to ask them directly for the program to use it now.

Krazy K said:
I don't know how it all works, but I can tell you with a high degree of certainty that the 260 is faster than the 9800GX2.

That is my point exactly. I bet you anything the seti program was only using a single core of your GX2. ;) A single 9800 core is marginally slower than a GTX 260, but two of them on a GX2 will easily outfold a single GTX 260 if the program is able to use both cores. Most CUDA/GPGPU programs can't.
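The rough spec-sheet math backs that up, too. Using the usual theoretical-peak counting (shader count x shader clock x 3 FLOPs per clock for the MAD+MUL dual issue, so take these as ceilings, not real-world numbers):

GTX 260: 192 SPs x 1.242 GHz x 3 = ~715 GFLOP/s
9800 GX2, one G92: 128 SPs x 1.5 GHz x 3 = ~576 GFLOP/s
9800 GX2, both G92s: ~1152 GFLOP/s

One core of the GX2 loses to the GTX 260, but the whole card should win comfortably whenever the software can actually feed both cores.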
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
Kougar said:
Technically such a thing does exist, but The Tech Report is the only site that does it. One of their forum members modified an old F@H benchmark, and with some more tweaking they've got a fairly decent CPU-based F@H benchmark. I looked into using it at one time to quantify and performance-tune my systems for F@H, but I believe you would need to ask them directly for the program to use it now.

Oh, I'm aware of that (since I frequent their site), but that application was built for them, not us. I just think it would be nice if Folding@home themselves could release a benchmarking feature or special client for the sake of benchmarking. It doesn't seem that difficult to me. Just choose a workload that's fair to both AMD and Intel.

Also, I agree with Kougar on the single-core aspect of Seti@home. It's too bad that the applications weren't designed to run two entirely different workloads at a time, one on each core (or even more for multi-GPU setups). With that in mind, our results with SANDRA or something else would be skewed, because the applications people are looking to run would not really equate with the results we give. I have a good feeling that things will be much different a year from now, when GPGPU is much more popular.
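Just to illustrate what the application side would have to do (a bare-bones sketch I put together of the general pattern, not how any actual client is written): CUDA exposes each core of a GX2 as a separate device, and the usual approach is one host thread per device, with each thread binding to its own GPU before launching work. Something along these lines:

// Sketch: one host thread per CUDA device, each running its own workload.
// Illustrative only; a real client would add error checks, checkpoints, etc.
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

__global__ void crunch(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = out[i] * 2.0f + 1.0f;   // stand-in for real work
}

void workOnDevice(int dev) {
    cudaSetDevice(dev);                          // bind this thread to one GPU
    const int n = 1 << 20;
    float *d = 0;
    cudaMalloc((void **)&d, n * sizeof(float));
    crunch<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();                     // wait for this GPU to finish
    cudaFree(d);
    std::printf("device %d finished its chunk\n", dev);
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);                  // a GX2 reports 2 here
    std::vector<std::thread> workers;
    for (int dev = 0; dev < count; ++dev)
        workers.emplace_back(workOnDevice, dev);
    for (auto &t : workers) t.join();
    return 0;
}

It's simple enough in principle, but splitting one workload into independent chunks like that is exactly the part most of these clients haven't been written to do.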
 

Kougar

Techgage Staff
Staff member
Folding@home actually DID have their own GPU benching tool, back when the 7800 GTX and 7900 GTX were in their heyday. They stated the tool is not only outdated but was extremely basic and completely inapplicable for use today, but there were links to it somewhere in their forums if you want to read up on it.

That said, Folding@home still needs a major redesign and overhaul of their SMP programs. It's extremely easy to misinstall them and almost too complicated for average users. Also, the 32-bit clients were migrated to the much more stable DeinoMPI code, which brought some performance boosts... but the 64-bit clients still use the very old holdover MPICH code, which is not exactly stable, is slower, and is a bit trickier to install.

It has taken a long time just for F@H to "grow" to support multi-GPU setups, let alone dual-GPU single cards. True, they stated part of the delay was due to GPU makers needing to change and/or clean up their drivers, but either way it has taken F@H quite a while to get to where it is today... and they have a ways to go before it is a simple program to use regardless of client.

Their GPU client is evidence they are heading in the right direction as it is significantly easier to use than their SMP CPU client... but they're not there yet. It's my opinion that they were ahead of many other GPGPU programs in this regard, but as I don't use any others I couldn't say with any certainty.
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
Kougar said:
That said, Folding@home still needs a major redesign and overhaul of their SMP programs. It's extremely easy to misinstall them and almost too complicated for average users.

I can almost guarantee that it's even more complicated under Linux. Well, at least with the BOINC client I used in the past to handle things. The command-line F@h was fine, but it obviously leaves a bit to be desired since it runs as a daemon. I've since given up on folding, though, at least until a GPU-based client becomes available for Linux. I just can't bring myself to add to my power bill when a GPU would be far superior.

As for the client, I had no idea it had so many issues on the Windows side as well. Seems odd to me since it's actively maintained.

Kougar said:
Their GPU client is evidence they are heading in the right direction as it is significantly easier to use than their SMP CPU client... but they're not there yet.

For the most part, I'd have to agree. I think it's because we see real results there (as in the overall figures). You don't see it so much with other applications right now. I think that will change though. The GPGPU world is still rather new.
 

Kougar

Techgage Staff
Staff member
Well, as they say, there are certain simulations that simply aren't worth running on a GPU, as they aren't inherently parallel in nature and wouldn't see any speed gains. Either CPU or GPU folding is going to add noticeably to the total system power consumption, so if I'm going to pay for having the system running 24/7, I might as well make sure all of it is being fully utilized. Or that's my take on it, anyway. :)

Rob Williams said:
As for the client, I had no idea it had so many issues on the Windows side as well. Seems odd to me since it's actively maintained.

The 5.91 and 5.92 clients make the current SMP clients look 100% stable and easy as 1-2-3 by comparison, so maybe that tells you how bad it really used to be! Stanford doesn't seem to have an answer to updating how SMP runs on 64-bit systems; mostly it's just the 32-bit program that has seen improvements over the last six or more months.

Edit: Does it usually matter for command-line programs if you close them via the red X in the corner of the window instead of Ctrl+C? That's another thing about the SMP and single-core CPU clients... if you close them by ANY method except Ctrl+C, they do not shut down properly and often lose data.

Rob Williams said:
Kougar said:
Their GPU client is evidence they are heading in the right direction as it is significantly easier to use than their SMP CPU client... but they're not there yet.

For the most part, I'd have to agree. I think it's because we see real results there (as in the overall figures). You don't see it so much with other applications right now. I think that will change though. The GPGPU world is still rather new.

Well, I was coming from the approach that the GPU client is now as simple as downloading an exe, running the setup for it, and then typing in the most basic config information.

This is a far cry from their SMP clients, which must be installed from the exe, then require an administrator-level command prompt for command-line installation, and then a command-line configuration of all settings. And as you aptly point out, it still requires running several processes 24/7 in the background as a daemon even when the program is not in use under Windows (or Linux too, from what you said).
 

Krazy K

Partition Master
Kougar said:
That is my point exactly. I bet you anything the seti program was only using a single core of your GX2. ;) A single 9800 core is marginally slower than a GTX 260, but two of them on a GX2 will easily outfold a single GTX 260 if the program is able to use both cores. Most CUDA/GPGPU programs can't.

So this is why I get this message:

CUDA Devices found
Coprocessor: GeForce 9800 GX2 (1)

It only sees one processor, not both?
That would explain the whole thing, but then what happens if you had, say, two 9800 GTs in SLI? Would it just treat them as one?
 

Kougar

Techgage Staff
Staff member
CUDA should see both GPU cores... the program using CUDA may not, and that is an important distinction. Folding@home uses CUDA and is now able to use both cores on a GX2 (which allows it to outfold a GTX 260). You would need to check with the seti client and forums to see if their program is capable of it or not, as I do not know offhand.
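If you want to see what CUDA itself reports on your card, a few lines against the runtime API will do it. This is essentially a trimmed-down version of the SDK's deviceQuery sample:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    std::printf("CUDA devices found: %d\n", count);   // a 9800 GX2 should report 2
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("  device %d: %s, %d multiprocessors\n",
                    dev, prop.name, prop.multiProcessorCount);
    }
    return 0;
}

If that prints two devices but the seti client still only lists one coprocessor, then it's the client, not CUDA or your drivers.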
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
Kougar said:
Well, as they say, there are certain simulations that simply aren't worth running on a GPU, as they aren't inherently parallel in nature and wouldn't see any speed gains.

Are you saying that CPU clients aren't a total waste of time, or that they are and the GPU makes much more sense? I'm not keen on increasing the heat in this room with Folding, but I'd consider testing out the GPU client if it's ever made available for Linux (I'm doubtful that will be anytime soon).

Kougar said:
Does it usually matter for command-line programs if you close them via the red X in the corner of the window instead of Ctrl+C?

I'm not entirely sure what you mean here. If you ran the client as normal, pushing CTRL+C would kill it off, but that's where running it as a daemon comes into play. If you run it that way, it's essentially run as a service, so it frees up the command-line and lets you do whatever else you need to... or close it entirely if you are running a desktop. You'd just use various commands to take control of it, or refer to the log files it outputs to make sure it's running fine. As for whether or not using CTRL+C would clear out the data, I have no idea.

Kougar said:
Well, I was coming from the approach that the GPU client is now as simple as downloading an exe, running the setup for it, and then typing in the most basic config information.

Again, I have to agree. When I tested it out, I was actually blown away by how simple it was. I was seriously expecting something really complicated after dealing with the original client and BOINC.

Krazy K said:
That would explain the whole thing, but then what happens if you had, say, two 9800 GTs in SLI? Would it just treat them as one?

People have built machines with four or more GPUs for Folding, so this issue might be quite simple to fix. You'd want to make sure that you have the latest drivers, and also the latest Folding software.
 

Kougar

Techgage Staff
Staff member
Rob Williams said:
Are you saying that CPU clients aren't a total waste of time, or that they are and the GPU makes much more sense? I'm not keen on increasing the heat in this room with Folding, but I'd consider testing out the GPU client if it's ever made available for Linux (I'm doubtful that will be anytime soon).

I'm saying they are not a total waste of time, specifically because F@H has several classes of simulations that are not inherently parallel... they would be faster or just as fast to run on a dual- or quad-core CPU.

Rob Williams said:
I'm not entirely sure what you mean here. If you ran the client as normal, pushing CTRL+C would kill it off, but that's where running it as a daemon comes into play. If you run it that way, it's essentially run as a service, so it frees up the command-line and lets you do whatever else you need to... or close it entirely if you are running a desktop. You'd just use various commands to take control of it, or refer to the log files it outputs to make sure it's running fine. As for whether or not using CTRL+C would clear out the data, I have no idea.

What I mean is, if you use Ctrl+C to close any of the F@H CPU clients, it shuts down properly. If you close the command-line window by clicking the red X, the program shuts down incorrectly and often loses data/checkpoints. The underlying service daemon is always running whether F@H SMP is or not, but thankfully it has a very small footprint. Stanford has officially stated the only way users should close F@H is via the Ctrl+C method, which seemed odd to me.
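As far as I understand it, the mechanics aren't F@H-specific (this is a generic sketch, not Stanford's actual code): Ctrl+C raises a signal the program can catch and use to finish writing its checkpoint, while clicking the red X kills the console process with little or no chance to clean up:

// Generic sketch of a Ctrl+C-friendly console loop. Ctrl+C raises SIGINT,
// which we catch and turn into a clean shutdown; closing the console window
// kills the process before this path can run.
#include <csignal>
#include <cstdio>

static volatile std::sig_atomic_t stopRequested = 0;

static void onInterrupt(int) { stopRequested = 1; }   // Ctrl+C lands here

int main() {
    std::signal(SIGINT, onInterrupt);
    while (!stopRequested) {
        // ... crunch a slice of the work unit ...
    }
    std::printf("caught Ctrl+C: writing checkpoint, exiting cleanly\n");
    return 0;
}

Windows does send a console-close event that a program can technically catch as well (via SetConsoleCtrlHandler), but the process only gets a few seconds before it is terminated, so a client caught mid-write can still lose its checkpoint. That would square with Stanford's Ctrl+C-only advice.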

Rob Williams said:
Again, I have to agree. When I tested it out, I was actually blown away by how simple it was. I was seriously expecting something really complicated after dealing with the original client and BOINC.

It used to be very complicated! And it used to require 100% of a CPU core in overhead to run the GPU client... today, regardless of whether you use Vista or XP or Windows 7, it uses <1% of the CPU to manage GPU driver overhead. :cool:

Krazy K said:
That would explain the whole thing, but then what happens if you had, say, two 9800 GTs in SLI? Would it just treat them as one?

I am fuzzy on the details (I honestly don't recall them anymore), but I believe you would need two instances of the GPU client, each configured differently. There are some very good FAQs out there that walk users through the process.
 