Report: NVIDIA Throttles PhysX Performance on the CPU

Brett Thomas

Senior Editor
It's been long rumored that NVIDIA designs PhysX to run in a throttled state on a CPU in order to have more impressive results on its own GeForce hardware, and after seeing the research done by David Kanter of Real World Technologies, it's going to be hard to continue calling it a rumor. By analyzing the finest details of a running PhysX-enabled application, David was able to better understand how it was doing its work on the CPU.


Read the full news story here, and share your thoughts on it right here!
 

Brett Thomas

Senior Editor
Wow, there are a lot of issues here. It's one thing to optimize for YOUR platform; it's another to intentionally hobble all others. That being said, this is a BIG accusation, and one that needs more consideration.

What the author DOESN'T go on to talk about in the initial report is what instruction sets the Ageia and NV PhysX chips can actually HANDLE. Just because CPUs have access to the SSE and SSE2 instruction sets does not mean that every chip in your computer uses them. It could very well be that these are essentially limited FPUs that do not support SSE, or that handle x87 instructions quite differently and are optimized solely for that. After all, physics is math and there's really not much that SSE does that a straight-up x86 chip can't do on its own, just a little differently. Basically, NV is treating the PhysX chip as a big math co-processor.
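
(To illustrate the "same math, done differently" point, here's a minimal sketch, nothing taken from PhysX itself: the same multiply-accumulate written once as plain scalar C, which an x87-targeting compiler runs one float at a time, and once with SSE intrinsics, which handles four floats per instruction.)

```c
/* Illustrative only -- not PhysX code. The same multiply-accumulate, two ways. */
#include <xmmintrin.h>  /* SSE intrinsics */

/* Plain scalar C: an x87-targeting compiler runs this one float at a time. */
float dot_scalar(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* SSE: the same math, four floats per instruction (n assumed divisible by 4). */
float dot_sse(const float *x, const float *y, int n)
{
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(&x[i]), _mm_loadu_ps(&y[i])));

    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```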

So the question really isn't "Is NVIDIA intentionally hobbling gamers who aren't using its stuff?" but more "Is coding for a discrete but limited FPU making the code less efficient at the CPU level?" The answer is probably "of course," but it's far less criminal, and it actually makes quite a bit of sense.

Sadly, that's not what's being asked.
 

Rob Williams

Editor-in-Chief
Staff member
Moderator
Brett Thomas said:
What the author DOESN'T go on to talk about in the initial report is what instruction sets the Ageia and NV PhysX chips can actually HANDLE.

The thing is, though, it doesn't matter which instructions the GPU can handle, because that's not the argument. The argument is that PhysX, to a limited degree, *will* use SSE, but at roughly a 1:100 ratio compared to x87. The CPU *can* use SSE, obviously, but NVIDIA chooses to use inefficient code to get the job done. Does PhysX also utilize x87 code via CUDA on its GPUs? I'm not sure, but I do know that the GPUs should be able to handle the instructions, given that NVIDIA has said its GPUs could run x86 code without a problem if need be.

In the article, it's shown that if the PhysX code *were* written to utilize SSE, then we'd see a tremendous speed boost. Does this mean that NVIDIA would have to compile two different binaries? Probably, but that's a minor issue, as the end DLL or .exe or whatever it is that PhysX uses would only weigh a few megabytes at worst. PhysX turned on and GeForce card detected? Run the physics on the GPU. Hardware PhysX not detected? Run the binary with SSE enhancements on the CPU.
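
(Roughly what I mean, as a sketch only; every name here is made up, not anything NVIDIA actually exposes:)

```c
/* A sketch of the dual-path idea -- all functions here are hypothetical stubs. */
#include <stdio.h>

static void step_physics_gpu(float dt) { printf("GPU path, dt=%f\n", dt); }
static void step_physics_sse(float dt) { printf("SSE CPU path, dt=%f\n", dt); }

/* Stand-in for a real hardware check. */
static int geforce_present(void) { return 0; }

int main(void)
{
    /* Hardware PhysX detected? Use the GPU. Otherwise, run the
       SSE-optimized CPU build of the same solver. */
    void (*step)(float) = geforce_present() ? step_physics_gpu
                                            : step_physics_sse;
    step(0.016f);
    return 0;
}
```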

Companies like NVIDIA and AMD already bundle a whack of extra stuff with their drivers, so adding a dual binary like this isn't a completely ridiculous idea. NVIDIA's cards wouldn't use SSE, but I've been told by the company numerous times in the past that its CUDA architecture is so robust that it could almost emulate a CPU; all of the CUDA-based apps currently available are more the beginning than the limit of where things could go.

Brett Thomas said:
After all, physics is math and there's really not much that SSE does that a straight-up x86 chip can't do on its own, just a little differently.

I don't think that's accurate. As far back as SSE2, I believe there have been instructions that could be used for physics, but up to this point they've mostly been used for non-gaming purposes. In SSE4, the dot product instructions (forget the names atm) could easily be used for physics. Of course, not every gamer has SSE4, but arguably few people are going to run PhysX on an old PC, or want to.
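
(For what it's worth, SSE4.1 added a packed dot-product instruction; here's a minimal example using the compiler intrinsic, which has nothing to do with how PhysX itself is written:)

```c
/* SSE4.1 dot product of two 4-float vectors -- build with -msse4.1. */
#include <smmintrin.h>  /* SSE4.1 intrinsics */
#include <stdio.h>

int main(void)
{
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f);

    /* 0xFF: multiply all four lanes, broadcast the sum to every lane. */
    __m128 dp = _mm_dp_ps(a, b, 0xFF);

    printf("dot product = %f\n", _mm_cvtss_f32(dp)); /* 1*5 + 2*6 + 3*7 + 4*8 = 70 */
    return 0;
}
```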

Either way, I still think NVIDIA could create a dual binary, but it won't. It makes money on its GeForce cards, and opening up PhysX to be run elsewhere could hurt sales. It sure wouldn't be able to sell enough licenses to developers to make up for the loss in GPUs, because just how important *are* physics that a developer might pay $50,000 or more for a license? Hard to say.

I'd be interested to hear NVIDIA's take on this, but we likely won't get one. So from here on out, it looks like nothing will change, but I'm not sure anyone else thought otherwise.
 

Tharic-Nar

Senior Editor
Staff member
Moderator
I don't know many people (if any) who buy an Nvidia GPU solely for PhysX acceleration, mainly because very few games actually support it. Most examples of games supporting PhysX amount to tech demos, single-level add-ons and a couple of snazzy effects, hardly game-changing. The other problem is that you really need a dedicated GPU for PhysX, since trying to run both graphics and PhysX on the same GPU can thrash it and result in an overall poor experience, so you need two GPUs for a few fancy effects.

The fact that Nvidia deliberately disables hardware PhysX when a non-Nvidia GPU is detected also points to this possible throttled CPU processing being another trick up their sleeve to force people to buy their GPUs for PhysX. The flaw in the plan is that it's very hard to convince someone to buy a product for a niche feature they neither need nor want. Does physics enhance a game? Yes, most certainly, when done correctly. Is it required to make a decent game? No, not at all; a lot of games get by fine without it. The fact that most games these days are made for the console market, which makes absolutely no use of PhysX hardware acceleration, is another nail in the coffin. PhysX is software, and as software it needs to be made as efficient as possible, not made efficient through the use of an expensive add-on.
 

Optix

Basket Chassis
Staff member
I miss the good ol' days when you could truly say my card could beat your card without having any extra processes going on in the background.
 

Brett Thomas

Senior Editor
@Optix - it's so, so true.

@Rob -
You're failing to understand my point. If NV's cards cannot handle SSE2 instructions, I see absolutely no reason why it should then create an entirely separate compiler to utilize SSE2 (or anything else) for people to run it as a separate thread on a NON-NV product.

The issue we have here is one of specific intent. Is NV actually HOBBLING it on other platforms, intentionally, to make PhysX look better? Or is NV simply not going out of its way to ENHANCE the experience for non-PhysX users?

My argument is that I bet the GPU chips don't support SSE, because that's something quite CPU-specific (and would further require a tiny royalty to Intel, actually, for every chip manufactured, if memory serves!). These are not full-on Pentium II x86 processors sitting on our graphics cards - their entire makeup is very different. Therefore, it doesn't make a lot of sense for NV to code for instructions its chips do not support. The DLL that functions as a HAL for streaming to the GPU for CUDA or even regular processing (wow, enough acronyms there?!) can utilize SSE instructions, which is undoubtedly the part they're seeing as the 1:100 code. In fact, that would actually make perfect sense, because bus width means that one instruction sent to the GPU pipeline would come with arguments for up to 128 OTHER instructions right after it, depending on instruction size. If there's no GPU, all of those subsequent instructions instead have to be processed by the CPU.

It's great to say that NV could compile a second binary to process these instructions as vectorized SSE for users who dedicate a CPU core instead of their GPU. But frankly, this is NV's baby, and last I checked, they don't OWE optimization to other platforms.
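
(Sure, you could even skip the second binary entirely and branch at runtime on a CPUID check, something like the sketch below, which is just an illustration and nothing like how PhysX is actually built. But my point stands: that's work NV doesn't owe anyone.)

```c
/* Sketch only: one binary choosing a code path via CPUID at runtime (GCC/Clang). */
#include <cpuid.h>
#include <stdio.h>

static int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 26) & 1;  /* CPUID leaf 1, EDX bit 26 = SSE2 */
}

int main(void)
{
    if (cpu_has_sse2())
        puts("SSE2 present: take the vectorized path");
    else
        puts("No SSE2: fall back to the scalar/x87 path");
    return 0;
}
```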

I guess that's the whole point where I find this misleading - it's pitched right from the source as a direct intent to hobble, without considering that it could very well be just plain unoptimized for modern i686...and that would make sense, since there are no i686 chips sitting on our GPUs. Should it be discussion fodder? Certainly. But I hardly think that the original writers were correct in their conclusion - unless the GPU processors handle SSE...and then it's a whole different ball game.
 

Kougar

Techgage Staff
Staff member
Brett, if I may jump in here, as I think you've misunderstood the issue itself. The core issue is that NVIDIA defaults to x87 code when a PhysX-capable GPU is not present. So this has nothing to do with the GPU, and everything to do with the PhysX code being designed to use the least efficient instruction set available on the CPU for physics work whenever an NVIDIA GPU is not in the system.

The x87 instruction set was effectively replaced by SSE. There isn't a single reason to use x87 over SSE, unless you want to compare the performance of a modern processor against that of, say, a Pentium II. One of the excuses given for still using x87 is that the underlying codebase in Ageia's PhysX is so old (somewhere around 2002) and full of other issues that it wasn't worth rewriting it properly.
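
(To put that in concrete terms, the exact same scalar C compiles to either x87 or SSE code depending on nothing more than a compiler flag; a rough example with GCC on 32-bit x86, where x87 is still the default float model:)

```c
/* kernel.c -- the same source, two GCC invocations (32-bit x86):
 *   gcc -O2 -m32 -mfpmath=387        kernel.c   -> x87 code (roughly fld/fmul/fstp)
 *   gcc -O2 -m32 -msse2 -mfpmath=sse kernel.c   -> scalar SSE code (roughly mulss/addss)
 * Nothing in the source changes; only the instructions the compiler emits. */
float euler_step(float pos, float vel, float dt)
{
    return pos + vel * dt;  /* basic integration step: same math either way */
}
```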

Ars Technica pretty much nails it on the head, again: http://arstechnica.com/gaming/news/...cpu-gaming-physics-library-to-spite-intel.ars

If y'all recall, this is the same style of BS that NVIDIA touted when it disabled GPU-based PhysX on systems that had both NVIDIA and AMD GPUs installed simultaneously. There's no actual reason to do so, but they spun the issue as not wishing to undertake additional QA testing to make sure that having an AMD GPU running the game didn't somehow cause problems, despite the fact that it worked perfectly fine.
 