Dolphin, the GameCube and Wii emulator - Forums

Full Version: Might Dolphin profit from another Thread?
Hi All,
I have been looking through the Dolphin source code for some time now and it's still a bit crazy (at least for a Java developer ;-))

Anyway, I think Dolphin might profit from another thread that splits the current GPU part in two. Right now the GPU part is clearly the bottleneck. However, it might be possible to isolate the GPU op decoding, cache its output, and hand it to a new thread that contains the actual GPU plugin (such as OGL or DX9). As I have no experience with OpenGL or DirectX I can hardly guess the benefits, but even dual cores should get better results, as they could use their cores more evenly.
(04-30-2009, 05:24 AM)lenny12 Wrote: Anyway, I think Dolphin might profit from another thread that splits the current GPU part in two. Right now the GPU part is clearly the bottleneck. However, it might be possible to isolate the GPU op decoding, cache its output, and hand it to a new thread that contains the actual GPU plugin (such as OGL or DX9).
AFAIK ATI and NVIDIA have already multi-threaded their drivers, so you won't get much of a speed boost by isolating the GL/DX calls in their own thread. I am pretty sure that right now the emulator is CPU bound, since most modern cards can handle GC/Wii graphics quite easily. In that case the challenge would be to optimize the JIT compiler so that it generates faster native code and also generates it quickly.

A possible way of taking advantage of multi-core PCs could be to separate the block compiling from the CPU thread. The CPU thread could interpret all code by default, but send requests to compile blocks that run more than once. This way the code generator can try to make more optimized code without the speed loss being noticeable, since it runs in a separate thread. It is also scalable: if you have a quad core machine, you can create more block compiling threads and alternate sending requests to them.
A similar technique is used in the IBM JVM; you can read the publication here, under "Selective Compilation".
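To make the idea above a bit more concrete, here is a minimal C++ sketch of that kind of selective compilation. Everything in it is made up for illustration (`SelectiveCompiler`, `Execute`, `Interpret`, `Compile`, `WaitForBlock`); none of it is Dolphin's actual code or API. The "interpreter" and the "compiler" are trivial stand-ins that compute the same value, and only a single compiler thread is used.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical types: a block is identified by its address, and a "compiled"
// block is just a callable standing in for generated native code.
using BlockAddr = std::uint32_t;
using CompiledBlock = std::function<int(int)>;

class SelectiveCompiler {
public:
    SelectiveCompiler() : worker_([this] { CompileLoop(); }) {}

    ~SelectiveCompiler() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        work_ready_.notify_all();
        worker_.join();
    }

    // Called from the CPU thread every time a block is about to run.
    int Execute(BlockAddr addr, int input) {
        CompiledBlock native;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = compiled_.find(addr);
            if (it != compiled_.end()) {
                native = it->second;
                ++native_runs_;
            } else if (++run_counts_[addr] == 2) {
                // The block ran more than once: request compilation exactly once.
                requests_.push(addr);
                work_ready_.notify_one();
            }
        }
        // Until the compiled version is ready, keep interpreting.
        return native ? native(input) : Interpret(addr, input);
    }

    // Blocks until the given block has been compiled (used here for testing).
    void WaitForBlock(BlockAddr addr) {
        std::unique_lock<std::mutex> lock(mutex_);
        block_done_.wait(lock, [&] { return compiled_.count(addr) != 0; });
    }

    int NativeRuns() {
        std::lock_guard<std::mutex> lock(mutex_);
        return native_runs_;
    }

private:
    // Stand-in for interpreting a block of guest code.
    static int Interpret(BlockAddr addr, int input) {
        return input + static_cast<int>(addr);
    }

    // Stand-in for the (slow) optimizing code generator.
    static CompiledBlock Compile(BlockAddr addr) {
        const int offset = static_cast<int>(addr);
        return [offset](int input) { return input + offset; };
    }

    void CompileLoop() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            work_ready_.wait(lock, [this] { return stopping_ || !requests_.empty(); });
            if (stopping_) return;
            assert(!requests_.empty());
            const BlockAddr addr = requests_.front();
            requests_.pop();
            lock.unlock();
            CompiledBlock block = Compile(addr);  // slow work runs without the lock
            lock.lock();
            compiled_[addr] = std::move(block);
            block_done_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable work_ready_, block_done_;
    std::queue<BlockAddr> requests_;
    std::unordered_map<BlockAddr, int> run_counts_;
    std::unordered_map<BlockAddr, CompiledBlock> compiled_;
    int native_runs_ = 0;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

The CPU thread never waits for the compiler here: it keeps interpreting a block until the compiled version shows up in the map. On a quad core one could presumably start several such worker threads pulling from the same request queue, but that is left out of the sketch.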
Well, I was thinking more about games like SSBB, where the CPU part is idle skipping most of the time. Right now the core running the GPU part is at 100% and the other one is only at 50% for me.

Also, as far as I know, the opcode decoding has nothing to do with the JIT used in the CPU thread.
I heard from one of the coders that the GPU part is the bottleneck atm.
I just can't remember who it was... maybe ector or f|res himself ^^
Well, I think that it would probably help speed, but dual core is already not stable due to synchronization and memory locks and such. If you're decent at code, I would suggest looking into stabilizing the current dual core solution before creating an additional fail point...

However, maybe you could do what you're talking about... split the GPU into two threads and then remove the current dual core option, then perhaps the synchronization problems will dissolve and it will still have the speed that the current dual core solution has.

I don't know much about it... I can read the code and kind of understand it, but I'm a long way from signing up to help.
I've done some tests to check this. It seems that most games are CPU bound unless you have a dual core at 2.5 GHz or more, in which case you won't be bound at all (unless your graphics settings are set high). Trying SSBM and MKDD, for example, I notice an almost linear drop in FPS as I step both of my cores' frequencies down from 2.2 GHz to 1.6, 1.2, and 0.8 GHz, which suggests that the requirement for full-speed CPU emulation is somewhere around a 2.5 GHz dual core.
I think that some functions that are calculated by the GPU on real hardware are not calculated by the GPU when using Dolphin (maybe T&L, for example...).

I was talking about the emulated GPU.
It is already clear that you need a fast CPU for Dolphin, but this is because the GPU part is not well optimized.

Just compare GameCube and Wii games.

Both run at about the same speed on my machine, but the Wii's CPU clock speed is about 50% higher (729 MHz versus the GameCube's 486 MHz), without an impact on speed.
So for me it is clear that it must be the GPU part that needs further optimization (or a better implementation in OpenGL).
This could also be the reason why the DirectX plugin is so much faster than the OpenGL one.
You should be aware that the GPU used in the GC or Wii is incompatible with most, if not all, of the operations provided by OpenGL or DirectX. So the GPU thread's main purpose is to translate textures, vertices and so on into a format that can be used by OGL and DX. The graphics plugin gets its data from the OpEncoder through an interface. (At least it should, but with all the tweaks and hacks... sigh ;-( ) So in theory we could just create an extra thread for the video plugin. However, we only get a decent speed gain if calls to the graphics plugin are expensive enough to at least cover the cost of caching the output of the OpEncoder...
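For what it's worth, the extra thread discussed here could be sketched roughly as a bounded producer/consumer queue, as in the C++ below. All names (`DecodedCommand`, `CommandQueue`, `RunPipeline`) are hypothetical and not taken from the Dolphin source, and the "video plugin" is only a stand-in that counts the vertex data it receives instead of calling OpenGL or DirectX.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical output of the decoding step: already translated into a form
// the graphics API could consume.
struct DecodedCommand {
    std::uint32_t opcode;
    std::vector<float> vertex_data;
};

// Bounded queue between the decoding thread and the video plugin thread.
class CommandQueue {
public:
    explicit CommandQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Decoder side: blocks while the queue is full, so the cache cannot grow forever.
    void Push(DecodedCommand cmd) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(std::move(cmd));
        not_empty_.notify_one();
    }

    // Decoder side: signals that no more commands will arrive.
    void Close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Plugin side: returns false once the queue is closed and drained.
    bool Pop(DecodedCommand& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        not_full_.notify_one();
        return true;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
    std::deque<DecodedCommand> queue_;
    bool closed_ = false;
};

// Runs the "decoder" on the calling thread and a stand-in "video plugin" on a
// second thread; returns how many floats of vertex data the plugin received.
std::size_t RunPipeline(const std::vector<DecodedCommand>& stream, std::size_t capacity) {
    CommandQueue queue(capacity);
    std::size_t floats_submitted = 0;
    std::thread plugin([&] {
        DecodedCommand cmd;
        while (queue.Pop(cmd)) floats_submitted += cmd.vertex_data.size();
    });
    for (const DecodedCommand& cmd : stream) queue.Push(cmd);  // copy = the caching cost
    queue.Close();
    plugin.join();  // join makes the plugin thread's writes visible here
    return floats_submitted;
}
```

The copy into the queue plus the locking is exactly the caching expense mentioned above: the split only pays off if the work done per command on the plugin side costs more than that.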