Dolphin works more like a pipeline than a truly parallel program. It can do several things at once, but if one stage runs slower than the others it stalls the other "stages" and slows everything down.
I don't have time to elaborate on how everything works so I'll just try and explain your results.
In the first example:
Quote: efb copies to texture:
Ir x1___________223fps
Ir x4___________220fps
Ir x2 SSAA x9___99fps
We see that the framerate goes down as the rendering load goes up: it barely moves from IR x1 to IR x4, but drops by more than half at IR x2 with SSAA x9. The logical conclusion to draw from this data is that the cpu thread is spending lots of time stalled waiting for the video thread because the fifo buffer is full, and the video thread is spending lots of time stalled waiting for the gpu to complete its shaders. The result is a "gpu bottleneck". In other words, the gpu is not keeping up with the rest of the system and is acting as the weakest link. The amount of time spent stalled goes up as IR goes up because the shaders take longer to complete. More time stalled = lower framerate.
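To make the "fifo is full, so the cpu thread stalls" idea concrete, here is a rough toy model, not Dolphin's actual fifo code. It assumes a producer (cpu thread) that needs a fixed `cpu_ms` per frame, a consumer (video thread plus gpu) that needs a fixed `gpu_ms` per frame, and a buffer with `fifo_slots` entries. The function name and all parameters are made up for illustration.

```python
def simulate_fifo(frames, cpu_ms, gpu_ms, fifo_slots):
    """Toy two-stage pipeline with a bounded FIFO between the stages.

    Returns (total_ms, cpu_stall_ms). All names here are hypothetical;
    this is not how Dolphin implements its fifo.
    """
    take = []       # time at which the consumer pulled each frame out of the FIFO
    done = 0        # time at which the consumer finished the previous frame
    cpu_free = 0    # time at which the producer can start its next frame
    stall = 0       # total time the producer spent waiting for a free slot
    for i in range(frames):
        ready = cpu_free + cpu_ms
        # A slot frees up once frame (i - fifo_slots) has been taken out.
        slot_free = take[i - fifo_slots] if i >= fifo_slots else 0
        put_time = max(ready, slot_free)
        stall += put_time - ready
        cpu_free = put_time
        # The consumer starts a frame when it is both queued and the gpu is idle.
        take_time = max(put_time, done)
        done = take_time + gpu_ms
        take.append(take_time)
    return done, stall


if __name__ == "__main__":
    for cpu_ms, gpu_ms in [(4, 1), (1, 4), (1, 8)]:
        total, stall = simulate_fifo(100, cpu_ms, gpu_ms, fifo_slots=2)
        print(f"cpu={cpu_ms}ms gpu={gpu_ms}ms -> "
              f"{100 * 1000 / total:.1f} fps, cpu stalled {stall}ms")
```

In this model the producer never waits while the gpu is the faster stage. Once the gpu is the slower stage, the buffer fills up, the stall time grows, and the framerate follows the gpu time per frame instead of the cpu time.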
Quote: efb copies to ram (cache):
Ir x1___________128fps
Ir x4___________85fps
Ir x2 SSAA x9___55fps
Now efb copy to ram has been turned on. This makes the stalls even longer, as the cpu and video/audio threads not only have to wait on the gpu but also on the memory transfers and the texture/ram encoding/decoding. There are now two sources of stalling in the video/audio thread. The gpu is still the bottleneck, so the framerate still goes down as IR goes up. Note that the stall introduced by ram copy emulation is constant and does not scale with IR. If it did, we would see something closer to an exponential decay of framerate (think 128 fps -> 32 fps -> 2 fps).
Quote: efb copies to ram (no cache):
Ir x1___________67fps
Ir x4___________53fps
Ir x2 SSAA x9___36fps
The situation gets even worse when the cache functions are disabled. With the cache option turned off, the texture copies are updated every single time they are used instead of only when changes are made to the ram copies. Thus the stalls from efb copy emulation are more frequent and the framerate is even lower. This does not change the fact that the gpu is still producing stalls as well, so higher IR still decreases framerate.
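A small sketch of the difference between "update on every use" and "update only when the ram copy changed" might look like the following. It is only an illustration: the class name, the hash-based change check, and the fake "decode" step are all assumptions, and the way Dolphin really detects changes is not described here.

```python
class EfbCopyStore:
    """Hypothetical store for textures decoded from ram copies."""

    def __init__(self, use_cache):
        self.use_cache = use_cache
        self.decodes = 0      # how many times the expensive decode step ran
        self._entries = {}    # address -> (digest of ram data, decoded texture)

    def get_texture(self, address, ram_bytes):
        digest = hash(ram_bytes)
        entry = self._entries.get(address)
        # With the cache on, reuse the texture while the ram data is unchanged.
        if self.use_cache and entry is not None and entry[0] == digest:
            return entry[1]
        # Otherwise pay for a decode (reversing the bytes is just a stand-in).
        self.decodes += 1
        texture = bytes(reversed(ram_bytes))
        self._entries[address] = (digest, texture)
        return texture


def count_decodes(use_cache):
    """Use one ram copy 10 times, change it once, then use it once more."""
    store = EfbCopyStore(use_cache)
    for _ in range(10):
        store.get_texture(0x1000, b"frame-a")
    store.get_texture(0x1000, b"frame-b")
    return store.decodes


if __name__ == "__main__":
    print("cache on: ", count_decodes(True), "decodes")
    print("cache off:", count_decodes(False), "decodes")
```

With the cache on, the decode only runs when the data actually changes. With it off, every single use pays for a decode, which is where the extra stalls come from in this simplified picture.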
Quote: hle audio
Ir x1___________87fps
Ir x4___________82fps
Ir x2 SSAA x9___55fps
lle audio (not on thread)
Ir x1___________73fps
Ir x4___________70fps
Ir x2 SSAA x9___50fps
Remember how I said video/audio emulation were done by the same thread? Well, you have now introduced a THIRD stall. Since the LLE dsp emulator runs in sync with the cpu thread, the cpu thread stalls whenever the lle dsp emulator is doing any sort of work. Framerate is now very low as there are three sources of cpu thread stalling. In other words:
1. CPU thread does some work
2. CPU thread waits for video/audio thread to finish shaders (stall, scales with IR)
3. CPU thread waits for video/audio thread to emulate efb copies (stall, depends on how many efb copies the game needs, how big they are, how often the game engine uses them, and what the game engine does with them)
4. CPU thread waits for video/audio thread to finish emulating audio with the lle dsp emulator (stall, depends on how many sounds are running at once and what the game engine is doing with them)
5. When all of this is done the cpu thread can do some more work, so we go back to stage 1
The above sequence is not how dolphin actually works; it's just to illustrate some of your performance bottlenecks.
Note that with dsp lle on thread the cpu thread still stalls when dsp lle is doing work, but the video thread can now be doing work while the audio thread is doing work, which it could not before. In other words, stages 2 and 4 can run in parallel, potentially improving performance.
Stages 1, 2, and 3 can run in parallel only if the cpu thread is the bottleneck (the fifo buffer isn't filling up).
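As a back-of-the-envelope version of the five-step sequence above, the sketch below adds up made-up per-frame costs for each stage. Like the sequence itself, it is not how dolphin actually works. The function names and the millisecond values are purely hypothetical, and the "on thread" variant simply assumes that stages 2 and 4 overlap completely.

```python
def frame_ms_serial(cpu, shaders, efb, audio):
    """Stages 1-4 run one after another (lle audio not on thread)."""
    return cpu + shaders + efb + audio


def frame_ms_audio_on_thread(cpu, shaders, efb, audio):
    """Stages 2 and 4 overlap, so only the longer of the two is paid."""
    return cpu + efb + max(shaders, audio)


def fps(frame_ms):
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


if __name__ == "__main__":
    # Made-up stage costs in milliseconds, for illustration only.
    costs = dict(cpu=2, shaders=5, efb=3, audio=4)
    print("not on thread:", round(fps(frame_ms_serial(**costs)), 1), "fps")
    print("on thread:    ", round(fps(frame_ms_audio_on_thread(**costs)), 1), "fps")
```

The point of the model is only that each extra stall adds to the time per frame, and that overlapping two stalls hides the shorter one behind the longer one.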