Dolphin, the GameCube and Wii emulator - Forums

Full Version: LLE is not much slower in WII games ! ( lle benchmark thread )
I can confirm Shonumi's results that something is wrong with LLE in the latest revision.
HLE performs the same in 3.0-776 as in 3.0; however, LLE takes a pretty significant performance hit going from 3.0 to 3.0-776.
I have already posted my SMG2 results in Yoshi Star with 3.0-776, where I got 86 fps with LLE.
In exactly the same spot, with exactly the same settings (I even disabled fast mipmaps so everything is the same), I get 97 fps with 3.0.
That's 11 fps more!
But there is only 1 fps more when using HLE...
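
To put the LLE figures above in relative terms, here is a quick sketch in Python. The function name is only illustrative, and only the two LLE numbers quoted in this post (97 fps and 86 fps) are used; the absolute HLE figures are not given here, so they are left out.

```python
def relative_drop(before_fps: float, after_fps: float) -> float:
    """Fraction of the frame rate lost going from before_fps to after_fps."""
    return (before_fps - after_fps) / before_fps


# LLE in the same Yoshi Star spot: 97 fps on 3.0, 86 fps on 3.0-776.
lle_drop = relative_drop(97, 86)
print(f"LLE lost {lle_drop:.1%} of its frame rate")  # LLE lost 11.3% of its frame rate
```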

[Image: iEHz4l.jpg]

[Image: VH7IHl.jpg]
[color=#FF0000]Resident Evil 0 GC EUR (GBZP08) using 60Hz mode on HLE[/color]

[Image: 61708052.jpg]

[Image: 68041228.jpg]

[Image: 31796171.jpg]



[color=#32CD32]Resident Evil 0 GC EUR (GBZP08) using 60Hz mode on LLE with LLE on THREAD OFF[/color]

[Image: 74797128.jpg]

[Image: 45684611.jpg]

[Image: 24199278.jpg]



[color=#9370DB]Resident Evil 0 GC EUR (GBZP08) using 60Hz mode on LLE with LLE on THREAD ON[/color]

[Image: 88237387.jpg]

[Image: 69769310.jpg]

[Image: 49493349.jpg]
Shonumi, are you going to share that interesting discovery you mentioned? :)
Sorry, I am not a kid anymore, so work kind of sidetracks a lot of things now :) I wish I could spend more time with Dolphin though.

Anyway, I'll post an update later tonight (hopefully I'll get off early). Basically, I think I've found certain areas in SMG2 where my GPU becomes the bottleneck for my system (yup, even on the lowest settings) and this of course causes the FPS to become relatively even with HLE and LLE audio. Note, these are what I call temporary GPU bottlenecks, since they only seem to happen in specific places. I'd like to have you guys test some spots that I'll post later. I'm curious if other games could exhibit temporary GPU bottlenecks in other places.
I think rather the GPU thread becomes the bottleneck instead of the DSP or CPU thread. The GPU I don't think becomes the bottleneck, but the GPU thread does. That is what rodolfo's recent commit addresses (D3D only for now).
(10-24-2012, 06:33 AM)Shonumi Wrote: [ -> ]Anyway, I'll post an update later tonight (hopefully I'll get off early). Basically, I think I've found certain areas in SMG2 where my GPU becomes the bottleneck for my system (yup, even on the lowest settings) and this of course causes the FPS to become relatively even with HLE and LLE audio. Note, these are what I call temporary GPU bottlenecks, since they only seem to happen in specific places. I'd like to have you guys test some spots that I'll post later. I'm curious if other games could exhibit temporary GPU bottlenecks in other places.

Something related to the GPU is certainly bottlenecking. I have known this for a long time: just look at my Xenoblade benchmark, where I was able to run it above full speed with 1x IR, but the speed dropped a lot as soon as I increased IR or AA.

I already tested the spot you mentioned in Yoshi Star... the difference between 3.0 and 3.0-776 is huge (I haven't gotten around to testing the latest build with the speed improvements).

So, if there are GPU bottlenecks, they are not present in 3.0, only in the latest revisions?

(10-24-2012, 06:36 AM)Axxer Wrote: [ -> ]I think rather the GPU thread becomes the bottleneck instead of the DSP or CPU thread. The GPU I don't think becomes the bottleneck, but the GPU thread does. That is what rodolfo's recent commit addresses (D3D only for now).

Do you mean that the CPU is still the bottleneck, just while it is working on the GPU thread?
Quote: I think rather the GPU thread becomes the bottleneck instead of the DSP or CPU thread. The GPU I don't think becomes the bottleneck, but the GPU thread does. That is what rodolfo's recent commit addresses (D3D only for now).

Wouldn't the emulated GPU thread only be bottlenecked if your hardware can't keep up with its demands, assuming the emulated CPU thread is not bottlenecking anything? If you could explain this for me, I'd appreciate that, since I'm more familiar with the DSP side of Dolphin's emulation.

@rpglord - There are different spots that I have not posted yet, where the FPS drops roughly the same for both HLE and LLE. There is a lot drawn in the background when Mario is standing in these spots. Moving away from them returns the FPS to their previous levels, so this leads me to believe it's GPU-related.
I'm not an expert on it myself, but I know that thread is the problem since I got a gigantic speed boost with 784 on D3D9.

The GPU thread on the CPU decodes GPU instructions and sends them to the actual GPU to run. Before 784 (and currently with OpenGL) the vertex buffer is emulated using an array, and calculations on it are done on the GPU thread. In SMG, this results in pretty big slowdowns.
Ah yes, decoding GPU instructions must have been what NV was talking about recently, that not everything on the emulated GPU thread runs as a shader. That makes sense. This seems like CPU-bound performance, so theoretically, overclocking (and underclocking for that matter) would affect the performance of the decoding. In the areas I noted, any IR above 1x resulted in some loss in framerate, while in every other area, the framerate didn't decrease until I got to 4x. That's why I want you guys to check this out as well on the latest revs with DX9, to see exactly where the bottleneck occurs.

The only reason I think this is relevant to this thread is due to the fact that these areas are the only points in my testing so far where LLE and HLE audio came really close to each other (like less than 5%). I'd like to know why they occur and if they occur in other games. I'm thinking of coming up with a good method of finding these types of spots, as they should be just as easy to detect with either HLE or LLE. So yeah, pictures coming of those aforementioned spots once I switch computers.
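
One possible way to look for such spots, as a minimal sketch only: record the FPS with HLE and with LLE at each spot, then flag the spots where the two come within 5% of each other. The function name, the spot names, and every number below are made up for illustration; none of them are measurements from this thread.

```python
from typing import Dict, List, Tuple


def find_converging_spots(
    samples: Dict[str, Tuple[float, float]], threshold: float = 0.05
) -> List[str]:
    """Return the spots where LLE fps is within `threshold` of HLE fps.

    `samples` maps a spot name to (hle_fps, lle_fps).
    """
    flagged = []
    for spot, (hle_fps, lle_fps) in samples.items():
        if hle_fps <= 0:
            continue  # skip unusable readings
        gap = abs(hle_fps - lle_fps) / hle_fps
        if gap < threshold:
            flagged.append(spot)
    return flagged


# Hypothetical readings, not real benchmark data.
samples = {
    "spot_a": (100.0, 80.0),  # 20% apart: audio emulation still matters here
    "spot_b": (60.0, 58.0),   # about 3.3% apart: something else is the limit
}
print(find_converging_spots(samples))  # ['spot_b']
```

A flagged spot only says that the audio backend is no longer the limiting factor there; it does not say whether the GPU itself or the video thread is.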
I wouldn't really call it "decoding". That's either a gross oversimplification or completely wrong, depending on how you look at it. Probably the former, since you could kind of consider the first part of what the video thread does to be decoding. Regardless, that part isn't very demanding, and the bulk of the work the thread does is actually carrying out the required behavior.

The video thread is responsible for emulating the GC/Wii GPU, and it does a lot of different things to emulate different parts of it. Registers, for example, have to be kept track of using variables (which are obviously held on the CPU side). Some of the things it does run on the CPU side via C++ code, and some run on the GPU side via HLSL, Cg, or GLSL (there is even some GPU assembly in there for some reason). Most of the stuff that can be efficiently hardware accelerated obviously runs on the GPU side.

The thread spends a fairly significant amount of time just moving big data structures between main memory and video memory, and it stalls whenever this happens. The commit Axxer is referring to introduced multiple vertex buffers, which allow the GPU to work on data from an older buffer while a new buffer is being filled from a draw call. The result is increased parallelism between the CPU side and the GPU side of the video thread, and therefore less stalling. In situations where the video thread spent a lot of time stalling on draw calls, this can result in a significant performance improvement.
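
The effect of multiple vertex buffers can be illustrated with a toy timing model. This is not Dolphin's code, only a sketch under simplifying assumptions: every draw call has a CPU "fill" time and a GPU "draw" time, a buffer cannot be refilled until the GPU has finished the draw that used it, and the GPU handles draws in order. The function name and the timings are invented for the example.

```python
from typing import List, Sequence


def total_time(fill_ms: Sequence[float], draw_ms: Sequence[float], num_buffers: int) -> float:
    """Time until the last draw finishes, given `num_buffers` vertex buffers."""
    fill_done: List[float] = []
    draw_done: List[float] = []
    for i, (fill, draw) in enumerate(zip(fill_ms, draw_ms)):
        # The CPU side fills one buffer at a time.
        cpu_free = fill_done[i - 1] if i > 0 else 0.0
        # Buffers are reused round-robin: call i reuses the buffer of call
        # i - num_buffers, which must have been drawn already.
        buffer_free = draw_done[i - num_buffers] if i >= num_buffers else 0.0
        fill_done.append(max(cpu_free, buffer_free) + fill)
        # The GPU starts once the data is there and the previous draw is done.
        gpu_free = draw_done[i - 1] if i > 0 else 0.0
        draw_done.append(max(fill_done[i], gpu_free) + draw)
    return draw_done[-1] if draw_done else 0.0


# Four draw calls, 2 ms to fill and 3 ms to draw each (made-up numbers).
fills, draws = [2, 2, 2, 2], [3, 3, 3, 3]
print(total_time(fills, draws, num_buffers=1))  # 20.0
print(total_time(fills, draws, num_buffers=2))  # 14.0
```

With a single buffer the two sides take turns, so the total is simply the sum of all fill and draw times. With a second buffer the fills overlap with earlier draws, and in this example the GPU becomes the only limit after the first fill.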

I really don't want to read through this thread and start correcting every little error I find and commenting on everything else, since I'll be here all night. Not to mention the last time I tried to explain how the CPU, GPU, and the three threads were all related to performance, it ended up being a long and futile gesture. So could someone summarize for me what exactly the new "issue" that you're investigating is?