Dolphin, the GameCube and Wii emulator - Forums

Full Version: Is EFB copies to RAM GPU dependant in DX11 and OpenGL?
I'm using Dolphin 3.5-1878 version (it's pretty recent). In D3D11 and OpenGL, the EFB copies to RAM option becomes slower and slower as I increase the internal resolution. No AA is used. For example, in TLOZ:TP, Kakariko Village:

DX11, EFB to RAM, 4xIR -> 35fps
DX11, EFB to RAM, 2xIR -> 40fps
DX11, EFB to RAM, 1xIR -> 54fps
DX11, EFB to Texture, 4xIR -> 54fps
DX9, EFB to RAM, 4xIR -> 54fps
DX9, EFB to Texture, 4xIR -> 54fps
OpenGL does pretty much the same as DX11.
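To put the numbers above in per-frame terms, here is a small sketch that converts the reported frame rates into frame times. It uses only the figures from the list; picking 54fps as the baseline is an assumption made for illustration, since that is what every unaffected configuration reports.

```python
# Frame rates exactly as reported in the list above (label -> fps).
measurements = {
    "DX11, EFB to RAM, 4xIR": 35,
    "DX11, EFB to RAM, 2xIR": 40,
    "DX11, EFB to RAM, 1xIR": 54,
    "DX11, EFB to Texture, 4xIR": 54,
    "DX9, EFB to RAM, 4xIR": 54,
    "DX9, EFB to Texture, 4xIR": 54,
}

def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps

# Assumption: 54fps is treated as the "no slowdown" baseline.
baseline_ms = frame_time_ms(54)

# Extra time per frame compared with the baseline.
extra_ms = {label: frame_time_ms(fps) - baseline_ms
            for label, fps in measurements.items()}

for label, fps in measurements.items():
    print(f"{label}: {frame_time_ms(fps):.1f} ms/frame "
          f"({extra_ms[label]:+.1f} ms vs. baseline)")
```

Seen this way, DX11 with EFB to RAM at 4xIR spends roughly 10 ms more per frame than the same scene at 1xIR.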

I usually play at 4xIR in D3D9 and that's not a problem even with EFB to RAM enabled in Twilight Princess and The Last Story. EFB to RAM in D3D9 makes only some difference in some situations when playing TLS.
Does this happen to you?
Yes, this happens to everybody. It's normal.
Any idea why it happens though? I'm curious.
Anyone who says that DX11 and OGL are more accurate and therefore slower is just wrong. This shouldn't increase the upscaling payload. But what else?

I guess it's because of the CPU-GPU latency. Since efb2ram has to stall the GPU, it has to wait out this full latency on every EFB copy. This latency is useful for async rendering (everything but efb2ram, EFB access, perf queries and real XFB), as it allows the GPU to process more rendering calls in parallel. But I'll profile this again.
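
The latency argument above can be illustrated with a toy timeline model. This is only a sketch: the function `simulate_frame`, the batch counts and all millisecond values are made up for illustration and are not measurements from Dolphin or any driver.

```python
def simulate_frame(num_batches, copies, cpu_ms, gpu_ms, latency_ms, stall_on_copy):
    """Return the time (ms) until both CPU and GPU have finished one frame.

    All parameters are hypothetical:
      cpu_ms     - CPU time to build and submit one batch of draw calls
      gpu_ms     - GPU time to execute one batch
      latency_ms - delay between submitting a batch and the GPU starting it
      copies     - number of EFB copies per frame, spread evenly
    """
    cpu_t = 0.0  # time at which the CPU is free again
    gpu_t = 0.0  # time at which the GPU finishes its queued work
    copy_every = num_batches // copies
    for i in range(1, num_batches + 1):
        cpu_t += cpu_ms
        # The GPU can start only after the commands arrive and earlier work is done.
        gpu_t = max(gpu_t, cpu_t + latency_ms) + gpu_ms
        if stall_on_copy and i % copy_every == 0:
            # Synchronous readback: the CPU blocks until the GPU has caught up.
            cpu_t = max(cpu_t, gpu_t)
    return max(cpu_t, gpu_t)

params = dict(num_batches=100, copies=10, cpu_ms=0.1, gpu_ms=0.05, latency_ms=1.0)
async_ms = simulate_frame(stall_on_copy=False, **params)
stalled_ms = simulate_frame(stall_on_copy=True, **params)
print(f"no stalls: {async_ms:.2f} ms, stall on every copy: {stalled_ms:.2f} ms")
```

In this model the latency is paid once per frame when nothing stalls, but once per copy when every copy waits for the GPU, which is the effect described in the paragraph above.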

btw: Have you set your GPU driver option to max performance? The driver often reclocks the GPU when it stalls too often :-(
btw2: Can you also try to disable the "threaded optimization" of your GPU driver? It usually speeds things up (a lot), but it also increases the latency.
I think the reason why it is slower at higher IR is that Dolphin is simply reading more pixels to copy back to RAM.
(08-28-2013, 05:25 PM)degasus Wrote: btw: Have you set your GPU driver option to max performance? The driver often reclocks the GPU when it stalls too often :-(
btw2: Can you also try to disable the "threaded optimization" of your GPU driver? It usually speeds things up (a lot), but it also increases the latency.
Setting the GPU to max performance helped, thanks. Now the minimum fps in Kakariko Village is around 40~42. Disabling threaded optimization (it was set to "Auto" before) didn't make a difference.
I also noticed that Dolphin doesn't trigger my overclocking profile in MSI Afterburner, maybe because Dolphin is a 64 bit application. After applying overclocking manually, I got about 1fps more. Barely noticeable.
Btw, antialiasing affects performance A LOT. 8 samples at quality level 32 dropped my fps to 30. That made the GPU core clock rise up to 1320MHz, but GPU usage was only about 50%.

Edit: anyway, CPU usage is about the same when frame limiter is off (80~86%), even with AA, so this "issue" must be related to the CPU instead of the GPU, as degasus and skid said.
I think it's more because of the GPU, but we don't use the GPU in the usual way. We don't use it fully, so the driver clocks it down. On the other hand, we often start a rendering call and immediately wait for it, which is just silly.

If you want, you can try this build: http://dl.dolphin-emu.org/wips/degasus-dolphin-master-3.5-2143+-x64.7z
There the order of efb2ram and efb2tex is switched, so the latency should be a bit lower for efb2ram. I've tested the opengl backend, the d3d backends are untested there.

skidau: as efb2ram doesn't upscale, we always read back the same amount of data.
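
A quick arithmetic sketch of that point: the render target grows with the internal resolution, while the data read back stays the same. The 640x528 size is the native full-size EFB; treating every copy as full-size and 4 bytes per pixel are assumptions for illustration, since real copies can cover smaller regions and use other formats.

```python
EFB_WIDTH, EFB_HEIGHT = 640, 528  # native EFB size
BYTES_PER_PIXEL = 4               # assumption: 32-bit pixels

def rendered_pixels(scale):
    """Pixels the GPU has to render at a given internal resolution scale."""
    return (EFB_WIDTH * scale) * (EFB_HEIGHT * scale)

def readback_bytes(scale):
    """Bytes copied to RAM per full EFB copy; independent of the scale,
    because efb2ram reads back at native resolution."""
    return EFB_WIDTH * EFB_HEIGHT * BYTES_PER_PIXEL

for scale in (1, 2, 4):
    print(f"{scale}xIR: {rendered_pixels(scale):>9} pixels rendered, "
          f"{readback_bytes(scale)} bytes read back per copy")
```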

edit: As switching threaded optimization from Auto to off didn't change anything, have you also tried to force it on?
(08-28-2013, 08:33 PM)degasus Wrote: If you want, you can try this build: http://dl.dolphin-emu.org/wips/degasus-dolphin-master-3.5-2143+-x64.7z
There the order of efb2ram and efb2tex is switched, so the latency should be a bit lower for efb2ram. I've tested the opengl backend, the d3d backends are untested there.
I've just tried your build and the official 3.5-2143. I get the same fps as before. Switching audio to HLE gives me a bit more fps sometimes.

I'm only worried about this because D3D9 is now deprecated, and I want to know why this doesn't happen in D3D9 and why it does in D3D11 and OpenGL.

Wow, I just tried in D3D9, EFB to RAM and HLE sound instead of LLE, and I get like 65fps LOL (but it sounds worse, of course).

Edit: threaded optimization doesn't help.
Hi, after some profiling, I now have the timeline of one frame in ZTP with efb2ram:
[Image: efb2ram-profile-2.png] OK, it's not that easy to understand, but you should see some green "glReadPixels" in the "API Calls" row. This is the stalling work of efb2ram.

At the bottom, you see the GPU workload (both Transfers and Draw Calls) grouped into blocks in the "GPU0" row. All blocks except block 6016 are executed while glReadPixels is waiting. The driver wants to group as much work as possible, but as we stall the GPU for every efb2ram copy, the rendering is delayed until we ask for the EFB copy.
Fun fact: the last "Transfer" block during each glReadPixels is the readback from VRAM to RAM.
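
One possible reading of this timeline, written as a tiny model: the wait inside a synchronous readback is the queued rendering plus the transfer. That the queued rendering time grows with the pixel count is an assumption, and the function name and millisecond values are invented for illustration. The model also says nothing about why D3D9 behaves differently.

```python
def readback_wait_ms(scale, queued_draw_ms_at_1x, transfer_ms):
    """Hypothetical CPU-side wait for one synchronous readback.

    Assumption: the queued draw work is pixel-bound, so it scales with
    scale * scale, while the transfer of the native-size copy is constant.
    """
    pending_render_ms = queued_draw_ms_at_1x * scale * scale
    return pending_render_ms + transfer_ms

# Made-up example values: 0.2 ms of queued drawing at 1xIR, 0.1 ms transfer.
waits = {scale: readback_wait_ms(scale, 0.2, 0.1) for scale in (1, 2, 4)}
for scale, wait in waits.items():
    print(f"{scale}xIR: about {wait:.1f} ms per readback")
```

Under these assumptions the readback gets slower at higher IR even though the amount of data read back does not change.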

I also dumped the list of glCalls:
[Image: efb2ram-profile-1.png]
Same result: glReadPixels is by far the "slowest" call. It isn't much work by itself, but it has to wait for _all_ previous calls.
ChoosePixelFormat - WTF? But shouldn't matter at all.
btw: glGenTextures also seems to stall. I think I have to use a pool here :-)
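
A minimal sketch of what such a pool could look like, so that textures are reused instead of being created again for every copy. `TexturePool` and `fake_create` are hypothetical names; this is not Dolphin's implementation, and the allocator below only hands out integer IDs in place of a real glGenTextures call.

```python
import itertools

class TexturePool:
    """Reuses texture handles, keyed by (width, height, format)."""

    def __init__(self, create):
        self._create = create  # callback that allocates a new texture
        self._idle = {}        # key -> list of handles not currently in use
        self.created = 0       # how often the allocator was really called

    def acquire(self, width, height, fmt):
        idle = self._idle.get((width, height, fmt))
        if idle:
            return idle.pop()  # reuse: no allocation, so no driver stall
        self.created += 1
        return self._create(width, height, fmt)

    def release(self, width, height, fmt, handle):
        self._idle.setdefault((width, height, fmt), []).append(handle)

# Stand-in allocator (assumption): returns increasing integer IDs.
_ids = itertools.count(1)
def fake_create(width, height, fmt):
    return next(_ids)

pool = TexturePool(fake_create)
first = pool.acquire(640, 528, "RGBA8")
pool.release(640, 528, "RGBA8", first)
second = pool.acquire(640, 528, "RGBA8")  # same handle comes back
print(first, second, pool.created)
```

Keying the pool by size and format means a released texture is only handed out again for a request it can actually satisfy.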
I've read it multiple times trying to understand what you said. My conclusion: the API call that takes the most time is glReadPixels, which means "read a block of pixels from the frame buffer" (definition found on the internet). So that's why it's slower at higher resolutions: it has to read more pixels. I wonder why it's so much faster in DX9. Magic?