Dolphin, the GameCube and Wii emulator - Forums

Full Version: [PATCH] DSP LLE faster masked math
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

CruDeLioS

Can anyone compile this new patch, please?

I would do it myself, but I am drowning in dozens of linker errors. :(
My understanding is that more sounds just means more work for LLE to do. In that sense making it faster would require making LLE faster.

Here is a build with the last patch I posted:
6482m x64 (with lle comp4 patch)


I think that LLE on thread should be improved too. That option would be great for those of us with less powerful rigs that still have three or four cores, where one or two cores sit idle while Dolphin is running.
(11-25-2010, 03:55 PM)Mylek Wrote: [ -> ]There was a pretty big bug in my jit dec code that should be fixed now with this version.

Without the mul hack it should sound identical to the original svn version if it's working correctly. For me, Zelda seems to hang if I try to use JIT + LLE on thread, with or without the patch.

It is extremely likely that the hardware does not loop here, but instead just checks once:
Code:
+        while ((s16)ar > (s16)mask)
+            ar -= wr + 1;
+        while ((s16)ar < (s16)(~wr & mask))
+            ar += wr + 1;
I know that the old code works like that, but I think nobody actually checked what happens when multiple wraparounds would be needed.

By the way, I have my own little implementation of the increase/decrease:
Code:
inline u16 dsp_add_addr_reg(u16 ar, s16 ix, u16 wr) {
       u16 tmb = ToMask(wr);
       // for a negative ix, add the buffer length (wr+1) up front
       u16 nar = ar + ix + ((ix<0)?wr+1:0);
       // a bit above the mask changed: take the buffer length off again
       if ((nar ^ ar) & ~tmb)
               nar -= wr+1;
       return nar;
}
And it is wrong for the wr == 0 (wr < ix?) case at least, like most of the implementations (if the comments in Source/DSPSpy/tests/[id]r_test.ds are correct).

I'd really like to have some more tests done with ix > wr, like wr == 0, ix == 2, 3 and wr == 2, ix == 8.
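
For reference, here is a minimal sketch that runs the single-check function above on exactly those inputs, so the emulated values can be compared against hardware results. ToMask is not shown in the thread; the version below (smear the highest set bit downward) is an assumption, and Case and print_cases are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

typedef uint16_t u16;
typedef int16_t s16;

// Assumed helper (not shown in the thread): smear the highest set bit
// of the wrap register downward, e.g. 0x0005 -> 0x0007.
inline u16 ToMask(u16 a) {
    a = a | (a >> 8);
    a = a | (a >> 4);
    a = a | (a >> 2);
    return a | (a >> 1);
}

// The single-check version from the post, unchanged in behavior.
inline u16 dsp_add_addr_reg(u16 ar, s16 ix, u16 wr) {
    u16 tmb = ToMask(wr);
    u16 nar = ar + ix + ((ix < 0) ? wr + 1 : 0);
    if ((nar ^ ar) & ~tmb)
        nar -= wr + 1;
    return nar;
}

// Hypothetical test-case record: start address, index, wrap register.
struct Case { u16 ar; s16 ix; u16 wr; };

// Print what the emulated function returns for the ix > wr cases
// requested above. Whether these values match hardware is unknown here.
inline void print_cases() {
    const Case cases[] = { {0, 2, 0}, {0, 3, 0}, {0, 8, 2} };
    for (const Case& c : cases) {
        unsigned result = dsp_add_addr_reg(c.ar, c.ix, c.wr);
        std::printf("ar=%04x ix=%d wr=%04x -> %04x\n",
                    (unsigned)c.ar, (int)c.ix, (unsigned)c.wr, result);
    }
}
```

Starting from ar == 0, this version returns 1 and 2 for wr == 0 with ix == 2 and 3, and 5 for wr == 2 with ix == 8. Only a hardware run can say which of those, if any, is right.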
Nice.

If we can ignore the case of multiple wraparounds then I think your code is a better approach. Thinking about it, it could be optimized further to even work without ToMask which should give a big speedup:
Code:
inline u16 dsp_add_addr_reg(u16 ar, s16 ix, u16 wr) {
   u16 nar = ar + ix + ((ix<0)?wr+1:0);
   if (((nar ^ ar) & (wr << 1)) > wr)
           nar -= wr+1;
   return nar;
}

Sketching it in pseudocode, this would reduce the entire add function to ~15 assembly instructions.

Code:
MOV AX, ar
MOV BX, ix
MOV CX, wr

LEA DX, [AX + BX]

TEST BX, BX
JNS noadd
LEA DX, [DX + CX + 1]

noadd:
XOR AX, DX
LEA BX, [CX*2]
AND AX, BX
CMP AX, CX
JBE done
SUB DX, CX
SUB DX, 1

done:
MOV ret, DX

If the logic is sound it could be used to simplify all the other masked functions.
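
One way to check whether the logic is sound is a brute-force comparison of the two versions. The sketch below is not part of the patch: ToMask is an assumed implementation (it is not shown in the thread), and add_with_mask, add_without_mask and count_mismatches are hypothetical names. It only compares inputs with |ix| <= wr, since the ix > wr case is the open question.

```cpp
#include <cassert>
#include <cstdint>

typedef uint16_t u16;
typedef int16_t s16;

// Assumed helper (not shown in the thread): smear the highest set bit down.
inline u16 ToMask(u16 a) {
    a = a | (a >> 8);
    a = a | (a >> 4);
    a = a | (a >> 2);
    return a | (a >> 1);
}

// Version that uses ToMask, as posted earlier in the thread.
inline u16 add_with_mask(u16 ar, s16 ix, u16 wr) {
    u16 tmb = ToMask(wr);
    u16 nar = ar + ix + ((ix < 0) ? wr + 1 : 0);
    if ((nar ^ ar) & ~tmb)
        nar -= wr + 1;
    return nar;
}

// Version without ToMask, as posted above.
inline u16 add_without_mask(u16 ar, s16 ix, u16 wr) {
    u16 nar = ar + ix + ((ix < 0) ? wr + 1 : 0);
    if (((nar ^ ar) & (wr << 1)) > wr)
        nar -= wr + 1;
    return nar;
}

// Count inputs on which the two versions disagree, for every
// ar in [0, max_ar], wr in [0, max_wr] and ix in [-wr, wr].
// max_wr must stay below 32768 so that ix fits in an s16.
inline long count_mismatches(int max_ar, int max_wr) {
    long mismatches = 0;
    for (int wr = 0; wr <= max_wr; ++wr)
        for (int ix = -wr; ix <= wr; ++ix)
            for (int ar = 0; ar <= max_ar; ++ar)
                if (add_with_mask(ar, ix, wr) != add_without_mask(ar, ix, wr))
                    ++mismatches;
    return mismatches;
}
```

Calling count_mismatches(1023, 63) covers about four million inputs and should report no disagreement in that range. Outside it the two versions do differ: for ar == 0, ix == 2, wr == 0 the ToMask version returns 1 and the version without ToMask returns 2.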
Implemented the minimal versions of the functions without ToMask based on the above code. Seems to work without any hitches from brief testing. The weakness with this implementation is the add/sub can go out of bounds if ix > wr but I'm not sure if this ever happens.

Then again, since we don't have any testing data on hardware for when ix > wr this could even be correct behavior.
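
To make the ix > wr weakness concrete, here is a small sketch. It takes "out of bounds" to mean that the result leaves the aligned region of ToMask(wr) + 1 addresses that ar started in; that reading is an assumption, as are the ToMask implementation and the helper name leaves_region.

```cpp
#include <cassert>
#include <cstdint>

typedef uint16_t u16;
typedef int16_t s16;

// Assumed helper (not shown in the thread): smear the highest set bit down.
inline u16 ToMask(u16 a) {
    a = a | (a >> 8);
    a = a | (a >> 4);
    a = a | (a >> 2);
    return a | (a >> 1);
}

// Minimal add without ToMask, as posted earlier in the thread.
inline u16 dsp_add_addr_reg(u16 ar, s16 ix, u16 wr) {
    u16 nar = ar + ix + ((ix < 0) ? wr + 1 : 0);
    if (((nar ^ ar) & (wr << 1)) > wr)
        nar -= wr + 1;
    return nar;
}

// Hypothetical check: true if any address bit above the mask changed,
// i.e. the new address is outside the aligned region ar started in.
inline bool leaves_region(u16 ar, u16 nar, u16 wr) {
    return ((nar ^ ar) & ~ToMask(wr)) != 0;
}
```

With wr == 3 and ix == 2, starting at ar == 3, the add wraps to 1 and stays inside the region. With wr == 2 and ix == 8, starting at ar == 0, it returns 8, which is outside it.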

[attachment=4722]
Mylek, any chance that this patch will be included in the official release, since you are a Dolphin developer? I am especially curious about what Xtreme2damax said, that it fixes "garbage noise, robotic audio and static from before" (they made LLE unusable imo). Even if it lowers compatibility, those screeching high-pitched noises make compatibility very low as things are now, so I can't see how this would make things worse.
There is still some relatively minor static with some games (I could have sworn this was all gone except for one game with a minor issue, maybe with the latest patch?), but the audio is more audible and clear. Just about all of the garbage noise, static, crackling and screeching is gone though.

With regard to performance improvements, as I said, the real performance killer is multiple samples/effects playing at the same time. These can decrease FPS by 10 - 25 or more depending on what is happening.

By the way, my latest build has the patched LLE plugin but it isn't the latest patch:

http://www.xtemu.com/forum/files/category/25-dolphin-svn-builds/
(11-29-2010, 08:44 AM)Mylek Wrote: [ -> ]Then again, since we don't have any testing data on hardware for when ix > wr this could even be correct behavior.

http://home.amis.net/mpuljar/dolphin/AR_crap.7z

Here you have a bunch of test data (DOLs and Wii results).