I did some testing, and for me the performance constantly goes up and down when testing different builds. You have to consider that sometimes game X gets faster by a change, while game Y gets slower. This is most likely what happens in those 2 d3d changes.
In New Super Mario Bros it's always something between 85 and 98 fps for me, with 4.0-5378 being the fastest of the various builds tested.
My system:
Pentium G3258 @ 4.2 GHz, Radeon HD 7790, 8 GB RAM 1600, 7-8-8-24, Windows 7 64-bit, Aero disabled
My test:
New Super Mario Bros, D3D, 3xIR, 1xAF, no AA, no vsync, efb2ram, World 1-4 1st screen
On Dolphin 4.0-5319 I'm getting 95 fps, and on Dolphin 4.0-5315 I'm getting 98 fps. So there might be something going on, but it doesn't make any sense. And it's far from the 25% you are reporting, so I can't confirm your issue.
Anyways, if you just want to play New Super Mario Bros:
Find the TextureCache.cpp file in the source code (the D3D one) and change the line:
if (!g_ActiveConfig.bCopyEFBToTexture)
to:
if (!g_ActiveConfig.bCopyEFBToTexture || bpmem.copyMipMapStrideChannels == 256)
compile, and start the game in EFB to Texture. For me it's working nicely with spinning coins at 135 fps.
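For anyone who wants to see what the changed condition evaluates to without building Dolphin, here is a minimal standalone sketch. The two structs and the function name are hypothetical stand-ins for g_ActiveConfig and bpmem, and only the two fields named above are modeled. It says nothing about what the guarded branch does inside Dolphin, only when it is entered.

```cpp
// Hypothetical stand-ins for Dolphin's g_ActiveConfig and bpmem.
// Only the two fields used by the patched condition are modeled.
struct VideoConfigSketch {
    bool bCopyEFBToTexture;  // true = EFB to Texture, false = EFB to RAM
};

struct BPMemorySketch {
    unsigned copyMipMapStrideChannels;
};

// Original condition: the branch is entered only in EFB to RAM mode.
bool entersBranchOriginal(const VideoConfigSketch& cfg) {
    return !cfg.bCopyEFBToTexture;
}

// Patched condition: the branch is also entered in EFB to Texture mode
// whenever copyMipMapStrideChannels equals 256.
bool entersBranchPatched(const VideoConfigSketch& cfg, const BPMemorySketch& bp) {
    return !cfg.bCopyEFBToTexture || bp.copyMipMapStrideChannels == 256;
}
```

So in EFB to Texture mode the patched check differs from the original only for copies where copyMipMapStrideChannels is exactly 256; every other copy takes the same path as before.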
(01-31-2015, 09:05 AM)mimimi Wrote: [ -> ]I did some testing, and for me the performance constantly goes up and down when testing different builds. You have to consider that...
(01-31-2015, 09:05 AM)mimimi Wrote: [ -> ]On 4.0-5319 i'm getting 95 fps, and on 4.0-5315 i'm getting 98 fps. So there might be something going on...
Are you aware that there's another variable: a +/- 5% difference between runs, even when using the exact same build? You're probably experiencing that one.
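A quick way to check whether two readings fall inside that run-to-run noise is sketched below. The function names are made up for illustration, and it assumes the +/- 5% is meant relative to the first reading.

```cpp
#include <cmath>

// Relative difference of fpsB versus fpsA, in percent.
double percentDiff(double fpsA, double fpsB) {
    return (fpsB - fpsA) / fpsA * 100.0;
}

// True when the two readings differ by no more than noisePercent,
// i.e. the difference cannot be told apart from run-to-run variation.
bool withinNoise(double fpsA, double fpsB, double noisePercent) {
    return std::fabs(percentDiff(fpsA, fpsB)) <= noisePercent;
}
```

For the 95 fps and 98 fps readings quoted above, the difference is about 3.2%, so withinNoise(95.0, 98.0, 5.0) is true and the two builds can't be told apart from a single run each.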
BTW, about PR #1952: Why was this PR closed? It looked promising, giving a major speed boost to many demanding titles, but it had one nasty issue: at IRs higher than native, the textures were downgraded / weren't scaled properly.
There's some work on making efb copies that are paletted textures work in efb2tex. If that works, it will make efb2ram obsolete for some games. After this is done, or proven to be impossible, I'm going to take another look at making efb2ram faster.
And without those "paletted efb copy games", PR 1952 will affect too few games with too little difference to be merged. You have to keep in mind that it's a hack; it might break things in unexpected, or at least subtle, ways. The gains just wouldn't outweigh the support trouble it would cause.
PR 1952 made nsmb 10% faster; the above hack makes it go 35% faster. On The Last Story it didn't work, but I have a hack for it that makes it run at almost efb2tex speed (around +50%). The hack is posted in the Last Story thread. Sadly, both hacks are worse than PR 1952 and are highly unlikely to ever be merged to master.
About low res textures with PR 1952: I'm currently not working on it, but I don't remember this issue. Low res textures when using efb2ram happen when Dolphin can't find the texture in the cache (nsmb spinning coins, old paletted efb copy code), or if there's a hash problem (new paletted efb copy code + efb2ram). I'm not sure if your CPU supports crc32 natively, so your system might use a different hashing algorithm than mine. Set texture cache to safe to remove this from the list of possible reasons for low res textures.
@ Tino: Thanks for the test. So, the latest AMD FX CPUs are not affected and perform just like the latest Pentium / Core iX series (this is not surprising, since they support the modern SSSE3 and SSE4.1 instructions, which are used in Dolphin's vertex JIT and in other places for a nice speed boost).
Now what happens when you run Dolphin on an older CPU without SSSE3 support? *That* would be interesting. What makes the newer builds (after those commits) so slow?
Is there some kind of issue that makes Dolphin fall back to using 'no optimizations at all' instead of 'some optimizations' as before?
It would be nice to have a custom build based on the latest 'master' or 'Ishiiruka', but with the 3 commits (PR #1817, PR #1503 and PR #1414) reverted. Just for testing.
(01-31-2015, 10:01 AM)mimimi Wrote: [ -> ]PR1952 made nsmb ~10% faster.
The speed boost was much higher than that (>20% @ 6xIR).
Bulldozer was released in 2011. It's far from "latest".
(02-01-2015, 03:34 AM)delroth Wrote: [ -> ]Bulldozer was released in 2011. It's far from "latest".
Fixed (previous post updated).
(01-31-2015, 10:01 AM)mimimi Wrote: [ -> ]About low res textures with PR 1952, i'm currently not working on it, but i don't remember this issue. Low res textures when using efb2ram happen, when Dolphin can't find the texture in the cache, or if there's a hash problem(new paletted efb copy code + efb2ram). Set texture cache to safe to remove this from the list of possible reasons for low res textures.
Even with this line added to the .ini file(s):
Code:
[Video_Settings]
SafeTextureCacheColorSamples = 1024
the textures are still low-res.
(01-31-2015, 10:01 AM)mimimi Wrote: [ -> ]nsmb (D3D, 3xIR, efb2ram, W1-4):
On Dolphin 4.0-5319 i'm getting 95~98 fps. The latest dev build is pretty fast.
Anyways, if you change that line, compile and start nsmb with EFB2Tex, you'll get a big speedup. Works nicely here at ~135 fps.
Try W2-4 (4P) at 6xIR for a change. You'll see what's the opposite of fast.
Stable 60fps (min., not avg.) is almost impossible.
(02-01-2015, 03:12 AM)kirbypuff Wrote: [ -> ]@ Tino: Thanks for the test. So, the latest AMD FX CPUs are not affected and perform just like the latest Pentium / Core iX series (this is not surprising, since they support the modern SSSE3 and SSE4.1 instructions, which are used in Dolphin's vertex JIT and in other places for a nice speed boost).
Now what happens when you run Dolphin on an older CPU without SSSE3 support? *That* would be interesting. What makes the newer builds (after those commits) so slow?
Is there some kind of issue that makes Dolphin fall back to using 'no optimizations at all' instead of 'some optimizations' as before?
It would be nice to have a custom build based on the latest 'master' or 'Ishiiruka', but with the 3 commits (PR #1817, PR #1503 and PR #1414) reverted. Just for testing.
(01-31-2015, 10:01 AM)mimimi Wrote: [ -> ]PR1952 made nsmb ~10% faster.
The speed boost was much higher than that (>20% @ 6xIR).
Have you tested the issue in Ishiiruka? PR #1817 is not merged there, and the vertex loader JIT falls back to precompiled loaders, so the only possible causes for your issue could be #1503 or #1414.
"Extensive" nsmb benchmark (UPDATED)
------------------------------------------------------
Legend:
title / cake / menu1 / menu2 / w2-overview / w-select / w1-overview / w1-1 / w2-4 / w2-[#] / w4-overview
NOTE1: D3D, 6xIR, no AA, 1xAF, EFB2RAM.
NOTE2: These are stable / minimum framerates, not average.
NOTE3: Demanding parts are shown with an asterisk (*).
Master (4.0-5390) with the perf. regressions:
57 / 48* / 64 / 73 / 49* / 37* / 45 / 49* / 34* / 35* / 38*
Older Master (4.0-5315) (before the latest performance regression):
61 / 58* / 65 / 74 / 54* / 41* / 51 / 50* / 35* / 40* / 42*
Older Master (4.0-4575) (before the 3 performance regressions):
67 / 62* / 65 / 75 / 56* / 42* / 53 / 51* / 35* / 41* / 44*
PR Test : Async Events (4.0-5390 with PR #1992):
61 / 58* / 65 / 74 / 54* / 41* / 51 / 50* / 35* / 40* / 42*
PR Test : EFB2RAM SpeedBoost (4.0-5254 with PR #1952):
70 / 64* / 71 / 82 / 59* / 43* / 54 / 62* / 42* / 44* / 45*
Latest Ishiiruka-250:
74 / 60* / 55 / 64 / 52* / 41* / 49 / 59* / 34* / 44* / 39*
Older Ishiiruka-215 :
79 / 64* / 56 / 64 / 54* / 42* / 50 / 61* / 35* / 46* / 41*
Experimental Ishiiruka-250+:
87 / 64* / 56 / 64 / 51* / 40* / 47 / 65* / 33* / 47* / 39*
--------------------------------------------------------------------------
EDIT: * Fixed * in Master 4.0-5402:
61 / 58* / 65 / 74 / 54* / 41* / 51 / 50* / 35* / 40* / 42*
---------------------------------------------------------------------------
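To put numbers on the rows above, here is a small sketch that computes the per-scene change between two builds and picks out the scene with the largest drop. The two rows are copied from the benchmark (Master 4.0-4575 and Master 4.0-5390); the names and the helper functions are just for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kScenes = 11;  // one entry per scene in the legend
using Row = std::array<int, kScenes>;

// Minimum framerates copied from the benchmark rows above.
constexpr Row kMaster4575 = {67, 62, 65, 75, 56, 42, 53, 51, 35, 41, 44};
constexpr Row kMaster5390 = {57, 48, 64, 73, 49, 37, 45, 49, 34, 35, 38};

// Percent change of each scene in `after` relative to `before`.
// Negative values mean the newer build is slower.
std::array<double, kScenes> percentChange(const Row& before, const Row& after) {
    std::array<double, kScenes> out{};
    for (std::size_t i = 0; i < kScenes; ++i)
        out[i] = (after[i] - before[i]) * 100.0 / before[i];
    return out;
}

// Index (in legend order) of the scene with the largest drop.
std::size_t worstScene(const Row& before, const Row& after) {
    const auto change = percentChange(before, after);
    std::size_t worst = 0;
    for (std::size_t i = 1; i < kScenes; ++i)
        if (change[i] < change[worst]) worst = i;
    return worst;
}
```

With these two rows, worstScene(kMaster4575, kMaster5390) returns index 1, the "cake" scene, which goes from 62 to 48 fps (about a 22.6% drop). Keep in mind that differences of a few percent are within the run-to-run variation mentioned earlier in the thread.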
Just tried the latest dev build (4.0-5408 as of this writing) and guess what - the regression is gone!
Fixed in 4.0-5402 (4.0-5400 still has the issue).
Hard to believe a PR like 'map pad buttons to hotkeys' could fix such a stupid bug.
Performance is now back to normal [as with 4.0-5315 and older].
That's 1 regression down, 3 more to go (2 for D3D, 1 for OGL).