Dolphin, the GameCube and Wii emulator - Forums

Full Version: Dolphin Progress Report
Will we soon be seeing the October progress report? I find these reports very entertaining.
[Image: Progressreportheader-October2014Mini.jpg]

https://dolphin-emu.org/blog/2014/10/31/...ober-2014/

The October Dolphin Progress Report is live! Feel free to discuss this month's update below.
"Disabling BAT resolution results in a 10% speedboost in MMU titles"
Sorry, but that must be absolute bull-honkey. The reasoning behind disabling the BAT resolution code was that regular memory accesses weren't hitting it (so disabling it shouldn't give any speed-up), and there's no way page-fault handling should account for anything close to 10% of execution time, let alone the much larger share it would need for BAT resolution alone to account for 10%.
No mention of GameTDB's title.txt support? =/
I measured quite a bit more than 10% actually; a full ~1/3 of the time spent in the MMU was spent in BAT handling, because it would proceed to check all of the BATs in a loop and none of them would apply (and repeat this check on every single MMU access). In the Rebel Strike intro easily half of the CPU-thread runtime is spent in MMU/TLB handling (it's that inefficient).

Now, yeah, there are definitely ways to make the code way faster without just turning it off (I wrote a patch to make it faster, even), but since no currently-working games actually used it we kinda figured it'd be better to just turn it off for now.
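To make the "check all of the BATs in a loop and none of them would apply" case concrete, here is a rough sketch of a linear BAT scan. It is not Dolphin's actual code: `BatEntry`, `TranslateViaBat` and the `checks` counter are made-up names, and the entry layout is simplified (real PowerPC BATs pack these fields into upper/lower register pairs).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified BAT entry (assumption: not Dolphin's real layout).
struct BatEntry {
    bool valid;
    uint32_t ea_base;     // start of the effective-address block
    uint32_t block_mask;  // bits that select the offset inside the block
    uint32_t phys_base;   // start of the physical block
};

// Linear scan over the BATs. `checks` counts how many entries were examined,
// so the cost of a miss is visible: a miss always examines every entry.
bool TranslateViaBat(const BatEntry* bats, std::size_t count, uint32_t ea,
                     uint32_t* phys, int* checks) {
    assert(bats != nullptr && phys != nullptr && checks != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        ++*checks;
        const BatEntry& bat = bats[i];
        if (bat.valid && (ea & ~bat.block_mask) == bat.ea_base) {
            *phys = bat.phys_base | (ea & bat.block_mask);
            return true;
        }
    }
    return false;  // no BAT applies; the caller falls through to the page tables
}
```

With a single valid entry in the first slot, a hit costs one check, while an address that no BAT covers costs one check per slot before the page-table path even starts.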
Right, but that's time spent in the MMU, which is (or should be) a small portion of overall execution time; not a 10% speedup overall.
If Rebel Strike is spending that much time handling page-faults you really should fix the recording bits in the PTEs, because it shouldn't be anywhere near that much.
"Time spent in the MMU" is every single time a memory access is made to a location that isn't in the default BATs or requires page translation, which can easily be an average of like ~50 clock cycles per load or store. It's not just faults, it's all accesses to virtual addresses, as gross as that is.
But how many of those accesses are faulting when they shouldn't be (due to the page replacement algorithm malfunctioning), causing instructions to be replayed (resulting in another call to TranslateAddress) when they should actually hit the TLB the first time without even needing a hashed lookup?
I don't *think* that's an issue? I don't remember measuring many faults -- It's just that even going through to the TLB is a huge cost when you're doing it so often. Checking all of the BATs could easily take 20-30 cycles, and you have to do it tens of millions of times every second. I mean, this is calling out to multiple layers of C code for every single load or store instead of executing it with a single instruction; the cost is kind of enormous no matter what happens in the C code.
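For a feel of the scale, here is a back-of-the-envelope sketch of that arithmetic. The function name is made up, and the inputs in the note below (25 cycles, 20 million accesses per second, a 3 GHz host core) are illustrative assumptions picked from the ranges mentioned above, not measurements.

```cpp
#include <cassert>

// Fraction of one host core eaten by a fixed per-access overhead.
// All three inputs are supplied by the caller; nothing here is measured.
double OverheadFraction(double cycles_per_access, double accesses_per_second,
                        double host_hz) {
    assert(host_hz > 0.0);
    return cycles_per_access * accesses_per_second / host_hz;
}
```

Under those assumed numbers, `OverheadFraction(25.0, 20e6, 3e9)` comes out to about 0.17, i.e. roughly a sixth of one core spent just on the BAT checks.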
Can you generate a log of faulting addresses (when TranslatePageAddress returns 0) and their requested access flags, combined with the addresses passed to InvalidateTLBEntry (assuming it's called by tlbie instructions)?
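Something along these lines would be enough to capture that kind of trace. This is only a sketch: `MmuTrace`, `MmuEvent` and the `Access` enum are hypothetical names, not anything in Dolphin, and the calls would have to be wired in by hand at the point where TranslatePageAddress returns 0 and inside InvalidateTLBEntry.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical access kinds; the emulator's own flag type may differ.
enum class Access { Read, Write, Fetch };

struct MmuEvent {
    bool is_invalidate;  // true: TLB invalidation, false: failed translation
    uint32_t address;
    Access access;       // only meaningful for failed translations
};

// Records faults and invalidations in the order they happen, so a fault on a
// page can be lined up against earlier invalidations of the same page.
class MmuTrace {
public:
    void RecordFault(uint32_t address, Access access) {
        events_.push_back({false, address, access});
    }
    void RecordInvalidate(uint32_t address) {
        events_.push_back({true, address, Access::Read});
    }

    // One text line per event, oldest first.
    std::string Dump() const {
        std::string out;
        char line[64];
        for (const MmuEvent& e : events_) {
            if (e.is_invalidate) {
                std::snprintf(line, sizeof(line), "tlbie %08x\n",
                              static_cast<unsigned>(e.address));
            } else {
                std::snprintf(line, sizeof(line), "fault %08x %s\n",
                              static_cast<unsigned>(e.address), Name(e.access));
            }
            out += line;
        }
        return out;
    }

private:
    static const char* Name(Access a) {
        switch (a) {
            case Access::Read:  return "read";
            case Access::Write: return "write";
            case Access::Fetch: return "fetch";
        }
        return "unknown";
    }

    std::vector<MmuEvent> events_;
};
```

Keeping both event types in one ordered list is deliberate: the question is whether faults repeat on pages that should already be in the TLB, and that is only visible when faults and invalidations are interleaved in time.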