Sorry to rez an old thread, but I didn't think it was appropriate to open a new thread for this.
I have heard many reasons why Dolphin will never use more than two cores. However, programming APIs have come a long way since Dolphin began. Today it is possible to add a few pragmas and run many of the math loops in parallel, or even on the GPU.
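To show what "adding a pragma" means in practice, here is a minimal sketch using OpenMP in C++. It is not taken from Dolphin or from my build; `scale_buffer` is a made-up helper, and the only assumption is a loop whose iterations do not depend on each other.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from Dolphin's source): multiplies every
// element of a buffer by a constant gain.
void scale_buffer(std::vector<float>& data, float gain) {
    // A signed index keeps the loop acceptable to older OpenMP versions.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());

    // Each iteration touches only data[i], so the iterations are
    // independent and can be split across threads. If OpenMP is not
    // enabled at compile time, the pragma is ignored and the loop
    // simply runs serially.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        data[i] *= gain;
    }
}
```

The pragma only takes effect when OpenMP is switched on in the compiler (for example `-fopenmp` with GCC or Clang, `/openmp` with MSVC). Loops whose iterations share state need more care than this.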
As for the claim that the performance increase would not be that substantial, I beg to differ. I built my own version of Dolphin using parallel pragmas where the Intel compiler suggested them, threw in several of my own in long or intensive loops, and the result was doubled performance in very heavy-load games such as Epic Mickey.
To be exact, optimizing for multicore brought my frame rate from a dismal 8 FPS in heavy-load regions of Epic Mickey (unplayable) to full speed in many areas. This is simply a result of performing loops and math operations in parallel where possible. I believe this should become a standard option (it could be a separate code path enabled from the hacks window).
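One possible way to make this an opt-in code path is OpenMP's `if` clause, which decides at run time whether a loop is actually split across threads. The sketch below is only an illustration: `HackSettings`, `parallel_loops` and `multiply_add` are hypothetical names, not anything that exists in Dolphin.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a checkbox in the hacks window.
struct HackSettings {
    bool parallel_loops = false;
};

// Hypothetical helper: applies data[i] = data[i] * mul + add to a buffer.
void multiply_add(const HackSettings& cfg, std::vector<float>& data,
                  float mul, float add) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    const bool go_parallel = cfg.parallel_loops;

    // When go_parallel is false, the region runs on a single thread,
    // so the same source serves as both the serial and parallel path.
    #pragma omp parallel for if(go_parallel)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        data[i] = data[i] * mul + add;
    }
}
```

With the setting off, the loop behaves like the original serial code, which would make it easier to compare the two paths when chasing down differences.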
I, however, am neither a coder nor a developer. I'll grant that some of the performance was simply a result of heavily optimized ICC code. However, even optimized code left Epic Mickey in an unplayable state without the addition of multicore support. In its current state my build does have many synchronization issues, but it is still very playable, and they are mainly audio sync issues that were already present.
You can see the results of the added parallel processing in the YouTube video linked with this post (once it has been processed).
I would very much appreciate either:
a.) some assistance with addressing the synchronization issues, as I only decided to tinker with my source at random about a week ago and found myself redoing bits and pieces and swapping out libraries for performance, or
b.) having someone more skilled than I, and more familiar with the design of the emulator, re-evaluate the assessment that using four cores rather than two would give dismal performance gains.
OpenMP pragmas could be used where I used Intel pragmas (in the right hands) to recreate the gains in portable, platform-independent code that Linux, OS X and Windows users could enjoy equally.
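As a rough sketch of what "portable" could look like, the code below builds with or without OpenMP support by checking the standard `_OPENMP` macro. The function names `available_threads` and `add_buffers` are made up for the example and do not come from Dolphin.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

#ifdef _OPENMP
#include <omp.h>  // Only included when the compiler has OpenMP switched on.
#endif

// Hypothetical helper: how many threads a parallel region may use.
// Falls back to 1 when the build has no OpenMP support.
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: element-wise sum of two buffers, truncated to
// the shorter of the two inputs.
std::vector<float> add_buffers(const std::vector<float>& a,
                               const std::vector<float>& b) {
    const std::ptrdiff_t n =
        static_cast<std::ptrdiff_t>(std::min(a.size(), b.size()));
    std::vector<float> out(static_cast<std::size_t>(n));

    // Each thread writes to different elements of out, so no locking
    // is needed. Compilers without OpenMP ignore the pragma.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

The same source should then compile on any of the three platforms, with the parallel path enabled only where the compiler flag is set (`-fopenmp` on GCC and Clang, `/openmp` on MSVC).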
Another optimization technique I'd like to try for AMD cores is compiling with Open64 (Linux only, unfortunately QQ).