Dolphin, the GameCube and Wii emulator - Forums

Full Version: GPU Bottleneck?
(11-23-2013, 07:39 PM)Anti-Ultimate Wrote: [ -> ]You can't measure how well Dolphin will perform by looking at CPU usage.

Dolphin needs good IPC (instructions per cycle). Your CPU has pretty low IPC compared to newer i3, i5 and i7 processors.
That's why it is so slow, and not because "Dolphin isn't making it run at 100% load".
Hi Anti-Ultimate, I don't know about an actual comparison in terms of IPC, but from a MIPS (million instructions per second) perspective, new-generation Core i7s (~14k MIPS) are not THAT big a leap from the Core 2 Duo (~11k MIPS). By my numbers it's approximately 30% more.

I don't know the exact process flow of Dolphin, but a slow processor should not limit how well Dolphin utilizes the CPU. And as I pointed out in the original post, release 3.5 does fully utilize my SLOW processor. Why would that be the case?

(11-24-2013, 04:22 AM)KHg8m3r Wrote: [ -> ]Older versions are not as precise as the latest versions, thus they aren't as demanding.
However, you may be able to eke out a couple more FPS by changing some things in your settings. Could you post pictures of your graphics config pages?
Yeah, could it be that newer versions of Dolphin are more demanding on the GPU, such that my nVidia GT 8600M bottlenecks the whole process?

Because release 3.5 can fully utilize my CPU, while newer versions (4.0-rXXX) give a lower FPS and at the same time cannot fully utilize my CPU.

Here is my setup for Dolphin 4.0 r414:
[Image: dQ9a4ZG.png]

[Image: V3tEu18.png]


[Image: 3KjZCG8.png]


[Image: Tx85BDs.png]
(11-24-2013, 04:39 AM)anaid Wrote: [ -> ]Here is my setup for Dolphin 4.0 r414:
http://i.imgur.com/dQ9a4ZG.png
http://i.imgur.com/V3tEu18.png
http://i.imgur.com/3KjZCG8.png
http://i.imgur.com/Tx85BDs.png
I see a problem: turn off V-sync.
Try enabling the OpenMP option. It can give a speedup.

Also, with regard to processor speed, what matters to Dolphin is per-core performance. For single-core performance, Passmark gives your T9300 a score of 999. The i7-4770k at base speed has a score of 2165. That's about 2.17 times the T9300's score, an increase of roughly 117% per core.
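The score comparison above is simple arithmetic, so it is easy to check. A minimal sketch, using only the two Passmark scores quoted in this post (the helper name `percent_increase` is made up for illustration):

```python
def percent_increase(old: float, new: float) -> float:
    """Return how much larger `new` is than `old`, in percent."""
    return (new - old) / old * 100.0


t9300_score = 999       # Passmark single-thread score quoted above
i7_4770k_score = 2165   # Passmark single-thread score quoted above

ratio = i7_4770k_score / t9300_score
increase = percent_increase(t9300_score, i7_4770k_score)

print(f"ratio: {ratio:.2f}x, increase: {increase:.0f}%")
# prints "ratio: 2.17x, increase: 117%"
```

Note the difference between the two figures: the newer chip scores about 217% of the older one, which is an increase of about 117%.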
(11-24-2013, 05:12 AM)zaude93 Wrote: [ -> ]
I see a problem: turn off V-sync.
Thanks, I will give it a try.

(11-24-2013, 05:13 AM)KHg8m3r Wrote: [ -> ]Try enabling the OpenMP option. It can give a speedup.

Also, with regard to processor speed, what matters to Dolphin is per-core performance. For single-core performance, Passmark gives your T9300 a score of 999. The i7-4770k at base speed has a score of 2165. That's about 2.17 times the T9300's score, an increase of roughly 117% per core.
I tried it before and OpenMP doesn't really help. As the Dolphin config page states, the speedup comes "especially on CPUs with more than two cores".

My earlier statement about the performance difference was aimed at IPC. That is why I used the numbers for the i7-2860QM. Putting a single core of an i7-4770k at 3.5GHz up against a T9300 at 2.5GHz is hardly a fair comparison.

Nonetheless, Passmark does score the i7-2860QM at 1,653 compared to the T9300 at 999, a good 65% increase. This tells us that the Sandy Bridge architecture is simply more efficient than the Penryn architecture, but not why an earlier release of Dolphin can fully utilize the T9300 while newer versions cannot...
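One rough way to separate clock speed from per-clock efficiency is to divide each score by the clock speed quoted in this thread. This is only a sketch under loud assumptions: that the benchmark score scales linearly with clock speed, and that each chip actually ran at the quoted clock (turbo boost is ignored). Neither is guaranteed, and `score_per_ghz` is an illustrative helper, not a Passmark metric.

```python
def score_per_ghz(score: float, clock_ghz: float) -> float:
    """Benchmark score normalized by clock speed (assumes linear scaling)."""
    return score / clock_ghz


# Scores and clocks as quoted in the thread.
t9300_per_ghz = score_per_ghz(999, 2.5)      # T9300 at 2.5GHz
i7_4770k_per_ghz = score_per_ghz(2165, 3.5)  # i7-4770k at 3.5GHz

per_clock_gain = (i7_4770k_per_ghz / t9300_per_ghz - 1) * 100
print(f"per-clock gain: {per_clock_gain:.0f}%")
# prints "per-clock gain: 55%"
```

Under those assumptions, part of the raw score gap comes from the higher clock and the rest from the newer architecture doing more work per cycle in this particular benchmark.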

Huh

(11-24-2013, 05:12 AM)zaude93 Wrote: [ -> ]
I see a problem: turn off V-sync.
Thanks! OK, just tried it. Same issue with Dolphin 4-rXXX: 30 FPS, while only utilizing 70% of my CPU. But better results with Dolphin 3.5: I am getting an improvement of close to 60 FPS, with full 100% CPU utilization...
anaid Wrote:Your latter statement is true, the desktop i3 does indeed perform better than the laptop Core 2 Duo. But as for your statement about a CPU bottleneck in my laptop, is there any way to confirm that? Because the CPU utilization monitor sure tells a different story....

Assuming "LLE on thread" is off, Dolphin uses two threads/cores: main and video. The main thread could be bottlenecked by CPU performance while the video thread is not, or vice versa. This would result in one core at 100% usage and another core with low utilization, giving a total utilization that is not 90-100% despite a CPU bottleneck being present.
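To illustrate why the overall CPU figure can hide a bottleneck, here is a minimal sketch. The per-core readings are hypothetical numbers chosen for illustration, not measurements from this thread, and `total_utilization` is just an illustrative helper.

```python
from typing import Sequence


def total_utilization(per_core_percent: Sequence[float]) -> float:
    """Average utilization across cores, like a monitor's overall CPU figure."""
    return sum(per_core_percent) / len(per_core_percent)


# Hypothetical dual-core readings: one thread saturates its core,
# while the other spends much of its time waiting.
per_core = [100.0, 40.0]

print(f"total: {total_utilization(per_core):.0f}%")
# prints "total: 70%"
```

With one core pinned at 100% and the other at 40%, the overall figure is 70%, even though the saturated thread is what limits the frame rate.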

It could also be a GPU bottleneck. Newer builds are generally more demanding on GPUs.

If we still had the "lock threads to cores" option, that would be perfect for diagnosing this... sigh. Since the threads bounce back and forth between cores without that option, I don't see a simple way to measure actual per-thread cycle usage.

anaid Wrote:Hi Anti-Ultimate, I don't know about an actual comparison in terms of IPC, but from a MIPS (million instructions per second) perspective, new-generation Core i7s (~14k MIPS) are not THAT big a leap from the Core 2 Duo (~11k MIPS). By my numbers it's approximately 30% more.

MIPS is entirely relative to the application. As is IPC.

If you're comparing two microprocessors, processor B could have 30% higher IPC than processor A in one application while having 210% higher IPC in another, depending on what kind of code those applications execute and how the microarchitecture is set up to deal with it.

anaid Wrote:I don't know the exact process flow of Dolphin, but a slow processor should not limit how well Dolphin utilizes the CPU.

That's not entirely true. See above.
(11-25-2013, 08:56 AM)NaturalViolence Wrote: [ -> ]Assuming "LLE on thread" is off, Dolphin uses two threads/cores: main and video. The main thread could be bottlenecked by CPU performance while the video thread is not, or vice versa. This would result in one core at 100% usage and another core with low utilization, giving a total utilization that is not 90-100% despite a CPU bottleneck being present.
Hey NaturalViolence, thanks for your input. I learned a lot from what you said. I totally get now why CPU utilization cannot fully reflect a CPU bottleneck.

On a side note, I ran some tests with the OpenGL Driver Monitor from Xcode. Dolphin 4.0-rXXX is indeed more demanding in terms of GPU core utilization. However, for some reason it also almost doubled the "CPU wait for GPU" time compared to Dolphin 3.5.

Not sure if it is due to poor hardware performance on certain OpenGL function calls, or poor optimization in Dolphin.

[Image: 57qSMfC.png]
Makes sense given your symptoms. Disable external frame buffer and turn on the vertex streaming hack. Then run it again.
(11-26-2013, 09:18 AM)NaturalViolence Wrote: [ -> ]Makes sense given your symptoms. Disable external frame buffer and turn on the vertex streaming hack. Then run it again.
External frame buffer was already turned off for those results. The vertex streaming hack results in a black screen, no video, on my system.
Gah. OK, then I blame either Apple's or NVIDIA's poor OpenGL implementation. At this point I would say just use Boot Camp. The vertex streaming hack should fix the issue, but clearly you can't use it for some reason.
(11-26-2013, 09:51 AM)NaturalViolence Wrote: [ -> ]Gah. OK, then I blame either Apple's or NVIDIA's poor OpenGL implementation. At this point I would say just use Boot Camp. The vertex streaming hack should fix the issue, but clearly you can't use it for some reason.
Could what DJBarry004 said about the GeForce 9xxx series be true, that my GeForce GT 8600M doesn't have the "loophole" in the graphics driver? I'll try on Boot Camp and see.
Quote:If your GPU belonged to the GeForce 9xxx series or above, you could use the Vertex Streaming Hack.

Also, your CPU is kind of slow. But even being slow, it shouldn't give you speed issues.