Yes, the CPU thread. I actually don't think that is the cause of the slowdowns in SMG though. It is almost all audio related. Maybe with HLE the CPU thread is the problem, but I think it is the HLE/GPU thread. Is there any way to test this?
With HLE audio the cpu thread is almost always the primary bottleneck. With LLE audio (not on thread) the video thread has a much greater chance of becoming the primary bottleneck. With LLE on thread I have no idea.
If there isn't much of a difference in performance between HLE and LLE it might be because the cpu thread is acting as the bottleneck most of the time. However if that were the case you would expect to see the opposite behavior of what we're seeing.
It looks like as video thread load increases the cpu thread and DSP thread performance is becoming less of an issue. Which I guess makes sense. If we had a developer here to talk about the synchronization done between threads that would help this investigation a lot.
I know that LLE audio is tightly synchronized with the cpu thread even when it's running on a separate thread. The main reason that LLE on thread should be boosting performance is because it allows the video thread to do work while the DSP emulator is doing work. But once again this defies what I'm seeing here......unless.....
What I really need from you guys now is benchmarks with LLE DSP emulation but with LLE on thread off using the same scenarios that you did with HLE and LLE on thread.
The general GC/Wii architecture is that the CPU uploads a program (microcode) to the DSP which the DSP processes in parallel to the game code that the CPU executes. The CPU sends graphics commands to the GPU via the FIFO in a stream.
The DSP program can be written in a couple ways that I have seen.
The first way is fully asynchronous, which is where both the DSP program and game code on the CPU run continuously in parallel. The DSP and CPU talk to each other by querying their hardware registers to check each other's status, or via DMA for quick memory copies. DMA might be used for streaming music from the CPU to the DSP, for example.
The second way is synchronous where the CPU gets the DSP to execute a bit of audio then wait for more commands from the CPU. The CPU might also wait for the DSP to finish before continuing. The DSP and CPU usually use interrupts when this way is used.
Next, the GPU will pause if there are no more commands to process in the FIFO or if an explicit pause (called a "breakpoint") command is encountered. The CPU can also potentially pause while GPU commands are being executed. This might be done to make sure the GPU doesn't overflow with graphics commands (FIFO overflow).
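As a rough illustration of those pause conditions, here is a toy model in Python. It is only a sketch and not Dolphin's actual code: the `CommandFifo` class, its method names, and the idea of representing the breakpoint as a marker sitting in the queue are all assumptions made for the example.

```python
from collections import deque


class CommandFifo:
    """Toy CPU-to-GPU command FIFO (hypothetical, not Dolphin's code)."""

    BREAKPOINT = "BREAKPOINT"  # assumed marker standing in for an explicit pause

    def __init__(self, capacity):
        self.capacity = capacity
        self.commands = deque()

    def cpu_write(self, command):
        """Queue a command. Returns False if the FIFO is full and the CPU must wait."""
        if len(self.commands) >= self.capacity:
            return False  # writing now would overflow the FIFO
        self.commands.append(command)
        return True

    def gpu_step(self):
        """Return the next command, or None if the GPU has to pause."""
        if not self.commands:
            return None  # nothing left to process
        if self.commands[0] == self.BREAKPOINT:
            return None  # paused at the breakpoint until it is cleared
        return self.commands.popleft()

    def clear_breakpoint(self):
        """Remove a breakpoint at the head of the FIFO so the GPU can resume."""
        if self.commands and self.commands[0] == self.BREAKPOINT:
            self.commands.popleft()
```

With `CommandFifo(capacity=2)`, a third `cpu_write` in a row returns `False` (the CPU side would have to stall), and `gpu_step` returns `None` both when the queue is empty and when a breakpoint is at the head.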
So, how does this translate to Dolphin code?
Keep in mind the DSP runs at a clock rate that is 6 times slower than the CPU. So, right off the bat, the CPU thread needs 6 times more processing time than the DSP. Due to heavy optimisation in the JIT, the CPU thread in practice uses around 3 times more processing time. Therefore, the DSP thread is never the bottleneck. If "LLE on thread" is disabled, both the CPU and DSP processing will be on one core which is slower than having the CPU thread on a core by itself. Secondly, if the game is using the synchronous method above, both the CPU and DSP threads will be spending a fair bit of time waiting and slow down the overall emulation again.
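To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a deliberately simple cost model: the function `frame_time`, the `sync_wait` parameter, and the wait figure of 1.5 units are made up for illustration. Only the roughly 3:1 ratio of CPU to DSP processing time comes from the paragraph above.

```python
def frame_time(cpu_cost, dsp_cost, separate_threads, sync_wait=0.0):
    """Estimate time per frame in arbitrary units (hypothetical model).

    separate_threads=False: CPU and DSP work share one core, so costs add up.
    separate_threads=True:  they overlap, so the slower one sets the pace,
                            plus any time spent waiting on each other.
    """
    if separate_threads:
        return max(cpu_cost, dsp_cost) + sync_wait
    return cpu_cost + dsp_cost


# CPU thread uses about 3 times the processing time of the DSP.
same_core = frame_time(3.0, 1.0, separate_threads=False)                 # 4.0
on_thread = frame_time(3.0, 1.0, separate_threads=True)                  # 3.0
# Assumed wait of 1.5 units for a game using the synchronous method.
on_thread_sync = frame_time(3.0, 1.0, separate_threads=True, sync_wait=1.5)  # 4.5
```

In this model the DSP cost never sets the pace on its own, but enough waiting between the two threads can still make the threaded case slower than running everything on one core.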
Finally, the GPU emulation can slow down the emulation again as the GPU thread can cause the CPU to momentarily pause while graphics commands are completed.
Moral of the story? Dolphin is slow, but not necessarily because the Dolphin developers can't code.
Holy mother of god. I got skid to post. I can now claim to be a miracle worker.
I had always thought that DSP emulation was done on the video thread not the cpu thread.
Quote: the CPU thread in practice uses around 3 times more processing time. Therefore, the DSP thread is never the bottleneck.
I also didn't know that. This is why you guys should start writing blogs and participating with the community more.
Quote: Moral of the story? Dolphin is slow, but not necessarily because the dolphin developers can't code.
Nobody ever said that. We all respect what you and the rest of the developers do. Between you, tino, and marcos, the three of you were adding more miracles to the emulator per day than we could keep up with back when the three of you were active.
And I read in your commits you still have more miracles to work on the GPU thread 0_o
That's very interesting info, skid.

Although I am confused by one thing you said, that the DSP can never be the bottleneck. In my understanding that would mean that HLE and LLE should always have exactly the same performance: if the DSP can't be the bottleneck, there is no reason LLE should ever be slower than HLE, not even a little, which clearly is not the case.

Shonumi, those are very interesting results you got. I read somewhere, from one of Dolphin's devs (can't remember who), that LLE on thread is not effective in the SMG games, for a reason I forgot.

Maybe someone can remember this to help us figure out why exactly LLE on thread is slower in SMG2.
Also, we should definitely move to testing at least version 3.0-784. I just tested it a little bit and the performance difference is huge in SMG2 (around 20-30% speedup for me).
It might also fix these GPU bottlenecks you see.
I just haven't had the time to test it a lot and provide screenshots so far...
(10-25-2012, 07:46 PM)rpglord Wrote: that HLE and LLE should always have exact same perfomance since
HLE is a re-write of the DSP hardware and microcode combination using fewer instructions. That is why HLE is faster.
(10-25-2012, 07:46 PM)rpglord Wrote: one of dolphin's devs ( cant remember who ) that lle on thread is not effective in SMG games for reason I forgot
SMG is using the synchronous method. Since there is an overhead with thread communication, the overall result is slower.
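For anyone who wants to see that overhead for themselves, here is a small Python sketch. It is not how Dolphin implements anything: `dsp_process`, `run_inline`, and `run_threaded` are hypothetical stand-ins, and the queues are just one simple way to model a CPU that sends a command and then blocks until the DSP replies.

```python
import queue
import threading


def dsp_process(block):
    """Stand-in for one small chunk of DSP work (hypothetical)."""
    return block * 2


def run_inline(blocks):
    """Like 'LLE on thread' off: the CPU thread runs each DSP chunk itself."""
    return [dsp_process(b) for b in blocks]


def run_threaded(blocks):
    """Like 'LLE on thread' on with a synchronous game: hand off, then wait."""
    requests, replies = queue.Queue(), queue.Queue()

    def dsp_worker():
        while True:
            block = requests.get()
            if block is None:  # shutdown signal from the CPU side
                return
            replies.put(dsp_process(block))

    worker = threading.Thread(target=dsp_worker)
    worker.start()
    results = []
    for b in blocks:
        requests.put(b)                # CPU sends a command to the DSP thread...
        results.append(replies.get())  # ...and blocks until that chunk is done
    requests.put(None)
    worker.join()
    return results
```

Both functions return the same results, but in `run_threaded` the two threads never actually work at the same time, and every chunk pays for two queue handoffs. Timing the two with `time.perf_counter` over a large list of small blocks is one way to check how much that costs on a given machine.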
(10-25-2012, 01:55 PM)NaturalViolence Wrote: I had always thought that DSP emulation was done on the video thread not the cpu thread.
Wait wut how could you not know that? Even I knew that, and I could have sworn you were the one who told me XD
Wait... NV failed? OMG, end of the world
