Dolphin, the GameCube and Wii emulator - Forums

Full Version: Low FPS but low GPU load with 4xSSAA; not CPU limited

DolphinRocks

On SSBB I can run LLE, DX9 4xIR, no SSAA at 59-60 fps with my specs at 31% GPU Load.
As soon as I set 4xSSAA, fps drops to ~40 at 58% GPU load.
Why isn't my GPU stressed more to get higher FPS? I can't be CPU limited from the first example, can I?
Running Dolphin x64 3.0-787

Specs:
i7-920 OC to 3.6GHz
ATI HD5870 1GB OC to 900/1300MHz
Your GPU is not strong enough to handle 4xSSAA + 4xIR. That's it.

The CPU is not the problem.

DolphinRocks

I thought that was the case but why is it being underutilised? Shouldn't it be at 100% load for it to struggle?
It can handle up to 2.5xIR and 4xSSAA, I just checked. This is better than 4xIR without SSAA, right?
It's always better to add IR than SSAA, because they're basically implemented the same, but SSAA gets scaled down more times than IR, so loses more quality.

Anyway, sometimes GPUs don't show full load, even when they're the bottleneck. This is because you aren't necessarily using all the GPU's resources (for example the Graphics RAM isn't in high demand by dolphin), but part of the actual chip is all in use.
My guess is at that resolution that 32 ROPs can't keep up with the 80 TMUs and/or 1600 shaders.
Quote:It's always better to add IR than SSAA, because they're basically implemented the same, but SSAA gets scaled down more times than IR, so loses more quality.

wtf am I reading. No it doesn't.

Quote:Anyway, sometimes GPUs don't show full load, even when they're the bottleneck. This is because you aren't necessarily using all the GPU's resources (for example the Graphics RAM isn't in high demand by dolphin), but part of the actual chip is all in use.

Video ram capacity and bandwidth aren't measured at all when the GPU load is calculated.

Quote:My guess is at that resolution that 32 ROPs can't keep up with the 80 TMUs and/or 1600 shaders.

It's a possibility. If I recall correctly, shader throughput is the only thing that is measured by GPU-Z (which is what I assume he's using).

It's more likely that he's running out of memory bandwidth. There is an easy way to test that. He can change his video memory clock rates and retest for changes in performance.

Quote:I thought that was the case but why is it being underutilised? Shouldn't it be at 100% load for it to struggle?
It can handle up to 2.5xIR and 4xSSAA, I just checked. This is better than 4xIR without SSAA, right?

2.5xIR + 4xSSAA = (2.5^2) x 4 = 25x
4xIR + no SSAA = (4^2) x 1 = 16x

4x IR + 4xSSAA = 64 times the native resolution. That's a crazy high internal resolution of about 21 megapixels per frame.
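
A quick Python sketch of the arithmetic above, in case anyone wants to try other combinations. The 640x528 native size is an assumption (the GameCube/Wii EFB size, not stated in this thread), and `sample_multiplier` and `megapixels` are made-up helper names.

```python
# Assumed native internal resolution (GameCube/Wii EFB); not stated in the thread.
NATIVE_WIDTH, NATIVE_HEIGHT = 640, 528

def sample_multiplier(ir_scale, ssaa_samples):
    """IR scales both axes, so it counts squared; SSAA multiplies samples directly."""
    return ir_scale ** 2 * ssaa_samples

def megapixels(ir_scale, ssaa_samples):
    """Rendered samples per frame, in millions, under the assumed native size."""
    return NATIVE_WIDTH * NATIVE_HEIGHT * sample_multiplier(ir_scale, ssaa_samples) / 1e6

# "No SSAA" is treated as 1 sample per pixel.
for ir, ssaa in [(2.5, 4), (4, 1), (4, 4)]:
    print(f"{ir}xIR + {ssaa}xSSAA: {sample_multiplier(ir, ssaa):g}x native, "
          f"{megapixels(ir, ssaa):.1f} MP per frame")
```

Under that assumption, 4xIR + 4xSSAA lands in the same ballpark as the "about 21 megapixels" figure.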
How would I go about testing where my bottleneck is in the actual GPU (such as whether it is memory bandwidth or GPU clock or shader clock)?
Quote: How would I go about testing where my bottleneck is in the actual GPU (such as whether it is memory bandwidth or GPU clock or shader clock)?

1. Pick a clock rate
2. Adjust it
3. Benchmark the scenario
4. Go to step 1

Or alternatively adjust settings and benchmark (although this is tricky to do in a reliable way since many settings affect more than one part of the GPU and/or 3D graphics pipeline).
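
One way to keep score while doing that loop, as a rough Python sketch. The fps numbers below are made up for illustration, not measurements from this thread, and `scaling_ratio` is a hypothetical helper. The idea is that if fps moves almost in proportion to a clock, that clock is probably the limit; if fps barely moves, the limit is probably somewhere else.

```python
def scaling_ratio(clock_before, clock_after, fps_before, fps_after):
    """Fraction of a clock change that shows up as an fps change.

    Near 1.0: fps tracks this clock, so it is likely the bottleneck.
    Near 0.0: fps ignores this clock, so the limit is elsewhere.
    """
    clock_change = clock_after / clock_before - 1.0
    fps_change = fps_after / fps_before - 1.0
    return fps_change / clock_change

# Hypothetical runs: (what was changed, MHz before, MHz after, fps before, fps after)
runs = [
    ("memory clock", 1300, 1170, 40.0, 36.4),
    ("core clock", 900, 810, 40.0, 39.6),
]

for name, c0, c1, f0, f1 in runs:
    print(f"{name}: scaling ratio {scaling_ratio(c0, c1, f0, f1):.2f}")
```

Change only one clock per run and repeat each benchmark a few times, since a single fps reading can be noisy.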
Ok. Basically I should change things and test to see if it makes a speed difference, and if it doesn't, move on to something else?