PoC of an LLVM-based JIT compiler
12-02-2019, 02:02 AM
#1
aguinet
Unregistered
 
Hello all,

Mainly to see what benefits it could bring (if any), I started implementing a PoC of an LLVM-based JIT compiler for Dolphin. Some Google searches did not turn up previous attempts, but I have to admit that only while writing this message did I find this thread: https://forums.dolphin-emu.org/Thread-i-d-like-to-make-a-llvm-jit-for-dolphin-but-i-don-t-know-where-to-start. It is a bit old, so I don't know whether it led to anything interesting (does anyone have any information?).

As this was my first time working with the Dolphin code, I wanted to start with something really simple that works. Moreover, since LLVM's code generation is known to be quite slow compared to custom solutions (like Dolphin's), I looked for a "fail fast" approach that would *not* involve writing all the PPC semantics in LLVM IR by hand. My approach has thus been the following:
  • take the CachedInterpreter and, instead of generating a list of callbacks, use LLVM to generate a sequence of "call XX" instructions, where XX is an immediate representing the function to call (what we will call v1)
  • verify that this works
  • replace these calls with their actual LLVM IR implementation (generated from the existing C++ code), inline everything and see what happens (what we will call v2)
  • and also verify that it actually works :)
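To make the v1 idea concrete, here is a minimal, hypothetical C++ sketch of the cached-interpreter style it starts from. `CpuState`, `Inst`, `op_addi`, `op_mr`, `run_block` and `demo_cached_block` are invented for illustration and are not Dolphin's actual types or handlers. A block is stored as a list of (handler, operands) pairs; v1 would, in effect, have LLVM emit the same sequence as straight-line call instructions instead of walking this list.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified CPU state (not Dolphin's real state struct).
struct CpuState {
  std::uint32_t gpr[32] = {};
};

// Hypothetical pre-decoded instruction: only the operand fields we need.
struct Inst {
  std::uint8_t rd;
  std::uint8_t ra;
  std::int16_t imm;
};

using Handler = void (*)(CpuState&, const Inst&);

// Two toy handlers standing in for the interpreter's per-opcode functions.
void op_addi(CpuState& s, const Inst& i) {
  // Sign-extend the immediate, then add with 32-bit wraparound.
  s.gpr[i.rd] = s.gpr[i.ra] + static_cast<std::uint32_t>(static_cast<std::int32_t>(i.imm));
}
void op_mr(CpuState& s, const Inst& i) { s.gpr[i.rd] = s.gpr[i.ra]; }

// One cached entry: which handler to call and with which operands.
struct CachedOp {
  Handler fn;
  Inst inst;
};
using Block = std::vector<CachedOp>;

// Running a cached block is just walking the callback list.
// A v1-style JIT would replace this loop with emitted "call fn" instructions.
void run_block(const Block& block, CpuState& s) {
  for (const CachedOp& op : block) op.fn(s, op.inst);
}

// Builds and runs a two-instruction block: r3 = r0 + 5; r4 = r3.
std::uint32_t demo_cached_block() {
  CpuState s;
  Block b;
  b.push_back(CachedOp{op_addi, Inst{3, 0, 5}});
  b.push_back(CachedOp{op_mr, Inst{4, 3, 0}});
  run_block(b, s);
  return s.gpr[4];
}
```

Calling `demo_cached_block()` runs the two cached handlers in order and returns the value left in the toy register r4.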
For reference:
  • v1 is implemented here: https://github.com/aguinet/dolphin/tree/feature/llvm_jit_simple
  • v2 is implemented here: https://github.com/aguinet/dolphin/tree/feature/llvm_jit
  • note that this has only been tested and compiled under Linux, and that there are some nasty hacks in v2 to generate the LLVM IR of the interpreter to get it back at runtime (see https://github.com/aguinet/dolphin/blob/feature/llvm_jit/Source/Core/Core/CMakeLists.txt#L670 for instance).
So v1 works on the few games I tried and, as you would expect, performs close to the CachedInterpreter (and is even slower because of the extra compilation involved just to generate a bunch of mov and call instructions). But that was still a first successful test :)

As for v2, it also works on these same games, and is really slow. I optimized the code a bit so that the LLVM IR generation step takes about 100µs for "small" blocks (<10 instructions) and up to 1ms for "big" blocks. Actual code generation, as expected, is what takes most of the time and can go up to 2ms.

My questions are:
  • inlining the full interpreter is a bit overkill. What is the right trade-off between inlined asm code and runtime calls in this case?
  • I have seen a "block linking" feature in the current JIT implementations. My guess is that it tries to merge blocks together to avoid a "round trip" to the main "scheduler", but I might be completely wrong. Is there any documentation on this process?
  • (Somewhat related to the previous one) The jitting process seems to happen at basic-block granularity. Would it be feasible to link blocks together so that full functions can be optimized? (which is where the LLVM optimizer would be really good)
  • and, more generally, do you think that we could achieve something usable with this approach?
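For readers unfamiliar with the term, block linking in JITs generally means letting a finished block continue straight into its successor instead of returning to the dispatcher for a lookup. The sketch below illustrates that general technique only: `Block`, `BlockCache`, `run` and `demo_lookups` are hypothetical names, it is not taken from Dolphin's JIT, and a real JIT would typically patch a jump in the generated machine code rather than follow a pointer.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical compiled block with a single static exit.
struct Block {
  std::uint32_t start_pc;
  std::uint32_t exit_pc;  // guest address the block branches to
  Block* linked_next;     // direct link to the successor, once resolved
};

// Hypothetical block cache that counts how often the dispatcher is consulted.
struct BlockCache {
  std::unordered_map<std::uint32_t, Block> blocks;
  int lookups = 0;

  Block* find(std::uint32_t pc) {
    ++lookups;
    auto it = blocks.find(pc);
    return it == blocks.end() ? nullptr : &it->second;
  }
};

// Executes `steps` blocks starting at `pc` and returns the number of
// dispatcher lookups. With `link` set, each resolved exit is remembered
// so later passes skip the lookup entirely.
int run(BlockCache& cache, std::uint32_t pc, int steps, bool link) {
  cache.lookups = 0;
  Block* cur = cache.find(pc);
  for (int i = 0; i < steps && cur != nullptr; ++i) {
    // (the block's guest code would execute here)
    Block* next = cur->linked_next;
    if (next == nullptr) {
      next = cache.find(cur->exit_pc);    // round trip to the dispatcher
      if (link) cur->linked_next = next;  // "link" the two blocks
    }
    cur = next;
  }
  return cache.lookups;
}

// Two blocks that branch to each other, executed for 10 steps.
int demo_lookups(bool link) {
  BlockCache cache;
  cache.blocks[0x100] = Block{0x100, 0x200, nullptr};
  cache.blocks[0x200] = Block{0x200, 0x100, nullptr};
  return run(cache, 0x100, 10, link);
}
```

In this toy loop the unlinked run goes back to the dispatcher after every block, while the linked run only does so until both exits have been resolved. Note that linking on its own removes dispatch overhead; it does not merge blocks into one unit that an optimizer can see as a whole function.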
Another idea I had in mind is to use LLVM only for "hot" basic blocks/functions, as is already done in some browsers for JS IIRC. The idea would be to fire up the compilation process in a background thread once a function has been hit X times, and atomically switch to the LLVM-generated code when it's ready. This way, we still get the nice performance of the current JIT, and optimize only hot functions with LLVM (to get better performance for these).
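A minimal sketch of that hot-path scheme, under stated assumptions: `TieredBlock`, `execute`, `slow_optimizing_compile`, `kHotThreshold` and the two toy code functions are all hypothetical, the "compilers" are plain C++ functions standing in for the existing JIT and for LLVM, and only one thread is assumed to call `execute`.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

using BlockFn = int (*)(int);

// Stand-ins for generated code: same result, different "tiers".
int baseline_code(int x) { return x + x; }   // output of the fast existing JIT
int optimized_code(int x) { return x * 2; }  // output of the slow optimizing JIT

// Hypothetical threshold: the "X" hits after which a block counts as hot.
constexpr std::uint32_t kHotThreshold = 1000;

// Stand-in for the expensive LLVM compilation step.
BlockFn slow_optimizing_compile() { return optimized_code; }

struct TieredBlock {
  std::atomic<BlockFn> entry{baseline_code};  // code currently used for this block
  std::atomic<std::uint32_t> hits{0};
  std::thread worker;                         // background compile, if started

  ~TieredBlock() {
    if (worker.joinable()) worker.join();
  }
};

// Runs the block once. When the hit counter reaches the threshold, a
// background thread compiles the optimized version and publishes it with a
// single atomic store; until then the baseline code keeps running.
int execute(TieredBlock& b, int arg) {
  if (b.hits.fetch_add(1, std::memory_order_relaxed) + 1 == kHotThreshold) {
    b.worker = std::thread([&b] {
      b.entry.store(slow_optimizing_compile(), std::memory_order_release);
    });
  }
  return b.entry.load(std::memory_order_acquire)(arg);
}

// Executes a block 2000 times, waits for the background compile, and checks
// that results stayed correct and the entry point was swapped.
bool demo_tiering() {
  TieredBlock b;
  int sum = 0;
  for (int i = 0; i < 2000; ++i) sum += execute(b, 1);
  b.worker.join();
  return sum == 4000 && b.entry.load() == optimized_code;
}
```

The emulation thread never waits on the compile: it keeps calling whatever `entry` points to, so the slow compilation shows up as delayed speed-up rather than as a stall. A real implementation would also have to deal with invalidating blocks while a compile is in flight, which this sketch ignores.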

Thanks everyone for your help and remarks!

Regards,
12-02-2019, 02:47 AM
#2
Billiard26
Developer
When I attempted this years ago the stuttering caused by LLVM compilation times made it unusable.
It looks like you're claiming the same with "up to 2ms".

Once the JIT cache filled up games would run at decent speeds but the occasional compilation stutter was horrible.
From my googling at the time it seemed the general agreement was that LLVM was not suitable for JITing because of compilation times.
12-06-2019, 07:10 AM
#3
aguinet
Unregistered
 
(12-02-2019, 02:47 AM)Billiard26 Wrote: When I attempted this years ago the stuttering caused by LLVM compilation times made it unusable.
It looks like you're claiming the same with "up to 2ms".

Once the JIT cache filled up games would run at decent speeds but the occasional compilation stutter was horrible.
From my googling at the time it seemed the general agreement was that LLVM was not suitable for JITing because of compilation times.

Time obviously depends on the complexity of what we are jitting. LLVM is indeed generally used as an "N-tier" JIT, that is, one that optimizes in the background code that has already been JITed by a custom JIT with less compile-time overhead (but which may produce less efficient code than LLVM).

That's why I was proposing to try using LLVM this way in Dolphin as well, to generate more efficient code for hot functions/code paths. Also, out of curiosity, do you have any answers to the questions I asked?

Thanks!