Part 3: AX HLE in Dolphin, previous vs. new
DSP HLE was developed at a time when people did not know much about how the GameCube DSP worked. It was basically a hack to get sound in games, and more hacks were added on top of that hack to try to fix bugs. The AX UCode emulation was probably the hackiest part of the DSP HLE code. For example, some of the code looked like this:
Code:
// TODO: WTF is going on here?!?
// Volume control (ramping)
static inline u16 ADPCM_Vol(u16 vol, u16 delta)
{
    int x = vol;
    if (delta && delta < 0x5000)
        x += delta * 20 * 8; // unsure what the right step is
        //x += 1 * 20 * 8;
    else if (delta && delta > 0x5000)
        //x -= (0x10000 - delta); // this is to small, it's often 1
        x -= (0x10000 - delta) * 20 * 16; // if this was 20 * 8 the sounds in Fire Emblem and Paper Mario
                                          // did not have time to go to zero before the were closed
        //x -= 1 * 20 * 16;

    // make lower limits
    if (x < 0) x = 0;
    //if (pb.mixer_control < 1000 && x < pb.mixer_control) x = pb.mixer_control; // does this make
                                                                                 // any sense?

    // make upper limits
    //if (mixer_control > 1000 && x > mixer_control) x = mixer_control; // maybe mixer_control also
                                                                        // has a volume target?
    //if (x >= 0x7fff) x = 0x7fff; // this seems a little high
    //if (x >= 0x4e20) x = 0x4e20; // add a definitive limit at 20 000
    if (x >= 0x8000) x = 0x8000; // clamp to 32768;
    return x; // update volume
}
I don't even know how this code evolved into what is shown here; I just know that it is not a good way to implement AX HLE. Also, some of the design choices in the previous implementation simply couldn't allow for accurate HLE. The biggest issue was the timing on which AX HLE operated. On real hardware, the DSP runs on its own clock. At some point the CPU sends commands to it; the DSP processes all of these commands as fast as possible and sends a message back to the CPU when it's done. The CPU copies the processed data, then when it needs more data (in most cases, 5ms later) it sends new commands to the DSP. In the previous AX HLE implementation, none of that was right. Basically, what the emulated AX did was:
- As soon as we get the command that specified the sounds that should be mixed, copy the sound data address somewhere.
- Every 5ms, send a message to the CPU saying that we processed the commands (even though no commands were processed).
- When the audio backend (ALSA, XAudio, DirectSound) requires more data, mix the sound and return audio data.
In short, nothing about the timing was right. That implementation allows for some cool hacks (like having the audio run at full speed even though the game is not running at 100% speed), but it is inaccurate and bug-prone.
When trying to fix the "missing instruments" bug, I noticed all these timing issues and thought about rewriting AX HLE (once again... I have wanted to rewrite AX HLE every time I looked at the code). The hack fix (re4d18e3a8b7c) that I found to compensate for the timing issues really did not satisfy me, and knowing more about AX HLE, I realized that rewriting it was actually not as hard as I thought it would be. After working for 24h straight on new-ax-hle, I finally got a first working version which had OK sounds and music in Tales of Symphonia.
The design in new-ax-hle is in my opinion a lot better than the design used in the previous AX HLE:
- A DSP Thread is created when the UCode is loaded. This thread will be responsible for all the sound mixing work the DSP does.
- When we get commands from the CPU, we copy the command list to a temporary buffer and wake up the DSP Thread to tell it we have commands to process.
- The DSP Thread handles the commands, sends a message to the CPU when it's done, and goes back to sleep.
It is basically the exact same model DSP LLE on Thread uses, with less synchronization (LLE tries to match the number of cycles executed on CPU and DSP, which causes some extra performance hit). This also kind of matches what happens on the real hardware, using 2 chips instead of 2 threads. However, this also means the audio processing speed is tied to the CPU speed: if the CPU cannot keep up, it won't send commands often enough and the audio backend won't receive enough data to avoid stuttering.
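The thread model described above can be sketched in a few lines of standard C++. This is only an illustration of the wake/process/notify cycle, not the new-ax-hle code: the class name `DspThreadSketch`, its methods, and the "count the words" placeholder standing in for real command handling are all assumptions made for the example.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: names and behavior are illustrative, not Dolphin's code.
class DspThreadSketch
{
public:
    // The worker thread starts when the object is created ("UCode loaded").
    DspThreadSketch() : m_thread(&DspThreadSketch::Run, this) {}

    ~DspThreadSketch()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_work_cv.notify_all();
        m_thread.join();
    }

    // CPU side: copy the command list to a temporary buffer and wake the worker.
    void SubmitCommandList(const std::vector<std::uint16_t>& cmdlist)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_pending = cmdlist;
            m_has_work = true;
            m_done = false;
        }
        m_work_cv.notify_all();
    }

    // CPU side: stand-in for receiving the "commands processed" message.
    void WaitUntilDone()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_done_cv.wait(lock, [this] { return m_done; });
    }

    std::size_t WordsProcessed()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_words_processed;
    }

private:
    void Run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;)
        {
            // Sleep until there is a command list to handle or we are told to exit.
            m_work_cv.wait(lock, [this] { return m_has_work || m_stop; });
            if (m_stop)
                return;

            std::vector<std::uint16_t> cmdlist = std::move(m_pending);
            m_has_work = false;

            lock.unlock();
            // Placeholder for the real mixing work, done without holding the lock.
            const std::size_t handled = cmdlist.size();
            lock.lock();

            m_words_processed += handled;
            m_done = true;          // "send a message to the CPU"
            m_done_cv.notify_all(); // then loop back and go to sleep
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_work_cv;
    std::condition_variable m_done_cv;
    std::vector<std::uint16_t> m_pending;
    bool m_has_work = false;
    bool m_done = true;
    bool m_stop = false;
    std::size_t m_words_processed = 0;
    std::thread m_thread; // declared last so the other members exist before Run() starts
};
```

A caller would construct one `DspThreadSketch`, call `SubmitCommandList` each time the emulated CPU sends commands, and treat the return of `WaitUntilDone` as the DSP's completion message. Because the worker only runs when a list is submitted, audio production in this sketch is paced by the submitter, which is the same CPU-speed dependency noted above.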
Another change, this time not exactly linked to overall design, is that the new-ax-hle now handles most AX commands instead of only the one specifying the first parameter block address like the old AX does. Some of these other commands are used to set up global volume ramping, send data back to the main RAM, mix additional data from the RAM, or output samples to the buffers used by the audio interface. This means new-ax-hle now follows the correct audio emulation pipeline: ARAM -> DSP -> RAM -> AI -> Output (instead of the pipeline used previously: ARAM -> DSP -> Output). This also means some CPU sound effects like echo, reverb, etc. should work fine.
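To make the idea of "handling most commands" concrete, here is a minimal sketch of a command list dispatcher. The opcode values, the fixed two-word address argument, and the stage names are all invented for illustration; the real AX command numbers and argument layouts are different and are not given in this article.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical opcodes: NOT the real AX command numbers.
constexpr std::uint16_t CMD_SET_PB_ADDR = 0;   // process voices from the first parameter block
constexpr std::uint16_t CMD_MIX_FROM_RAM = 1;  // mix additional data from main RAM
constexpr std::uint16_t CMD_UPLOAD_TO_RAM = 2; // send mixed data back to main RAM
constexpr std::uint16_t CMD_OUTPUT = 3;        // write samples to the audio interface buffers
constexpr std::uint16_t CMD_END = 4;           // terminates the list, takes no arguments

struct Step
{
    std::string stage;  // which stage of the pipeline would run
    std::uint32_t addr; // address argument carried by the command
};

// Walks a command list and records, in order, what each command would do.
// Assumption for this sketch: every command except CMD_END carries a
// 32-bit address split into two 16-bit words (high half first).
std::vector<Step> HandleCommandList(const std::vector<std::uint16_t>& list)
{
    std::vector<Step> steps;
    std::size_t i = 0;
    while (i < list.size())
    {
        const std::uint16_t cmd = list[i++];
        if (cmd == CMD_END)
            break;
        if (i + 2 > list.size())
            break; // truncated list: stop rather than read out of bounds

        const std::uint32_t addr = (std::uint32_t(list[i]) << 16) | list[i + 1];
        i += 2;

        switch (cmd)
        {
        case CMD_SET_PB_ADDR:
            steps.push_back({"process voices", addr});
            break;
        case CMD_MIX_FROM_RAM:
            steps.push_back({"mix from RAM", addr});
            break;
        case CMD_UPLOAD_TO_RAM:
            steps.push_back({"upload to RAM", addr});
            break;
        case CMD_OUTPUT:
            steps.push_back({"output to AI buffers", addr});
            break;
        default:
            steps.push_back({"unknown", addr});
            break;
        }
    }
    return steps;
}
```

An implementation that handles only the first of these opcodes has to skip the RAM stages, which is roughly the ARAM -> DSP -> Output shortcut described above. A dispatcher that handles all of them lets the command list itself drive the ARAM -> DSP -> RAM -> AI order.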
Overall, the more bugs I fix in new-ax-hle, the less I understand how the previous AX HLE could work so well. It is a pile of hacks, implementing only 2 of the 19 AX commands (and one of these is not even implemented correctly), with completely wrong timing and some ugly code that makes no sense.
At the time I'm writing this article, new-ax-hle works a lot better than the previous AX HLE in most GameCube games, and only a few remaining bugs are known in GC games. The Wii AX code is a bit less mature and is more of a proof of concept: I haven't really worked a lot on it, but after one or two weeks of bug fixing it should also become pretty much perfect, including Wiimote audio emulation (which was only supported with LLE previously). I'm hoping this code will be merged for 4.0, and I'll most likely be working on Zelda UCode HLE next (which has a less ugly implementation but has the same design issues as AX).
I hope you enjoyed this series of articles about DSP and AX emulation! The next time I work on something interesting, I'll try to write about it too. I like talking about what I do.