Technical question I have no business asking :P
01-04-2015, 05:53 PM (This post was last modified: 01-04-2015, 05:55 PM by cammelspit. Edit Reason: Typo? )
#1
cammelspit Offline
Member
***
Posts: 116
Threads: 28
Joined: Jan 2010
I didn't know where this question should go, so I put it here; I hope this is a good place. I subscribed to the GitHub repository with email alerts because I enjoy keeping abreast of the goings-on in Dolphin development. I saw a commit by Fiora, "JIT: implement forward in-block branch support", and I got curious. Usually I at least have an idea of what's going on, but since my meager understanding of any useful code is SORELY insufficient, I looked this up on Google, as I do. I was wondering if anyone would be willing to break this down for me and my childish understanding of these things, because there doesn't seem to be any good reference for exactly what is going on here. I assume this has something to do with branch prediction, because that's the only reference I could find. I can see how code can take different paths, and I have seen other features with multiple paths, such as falling back to a slower version of the code if a feature isn't supported by certain hardware. I guess I am asking: does Dolphin need to predict what's coming up, so that this is just a type of prediction, or am I way off base here? Fiora's mention of benchmarking makes me think this is some sort of optimization.


I guess my sheer curiosity has gotten the best of me this time, and it's just bugging me knowing I have simply no idea. :)
01-04-2015, 07:38 PM (This post was last modified: 01-05-2015, 09:29 AM by Fiora.)
#2
Fiora Offline
x86 JIT Princess
**********
Developers (Some Administrators and Super Moderators)
Posts: 237
Threads: 0
Joined: Aug 2014
The JIT recompiles code by dividing it into blocks. The rules basically are:

1. When the emulated CPU jumps to a location in the code, start the block.
2. Continue the block until an instruction that ends the block (typically an unconditional branch (b) or return instruction (blr) or call instruction (bl)).
3. Along the way, there might be many conditional branch instructions, representing code like if(cond) {do thing} or loops. Conditional continue is a JIT feature that lets us keep going in the block if the branch isn't taken, but if the branch is taken, we leave the block and go to a new one.
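
The three rules above can be sketched with a toy model. This is only an illustration, not Dolphin's code: instructions are reduced to bare mnemonic strings, and the `form_block` helper and the sample program are made up for the example.

```python
# Toy model (hypothetical): an instruction is just its mnemonic.
BLOCK_ENDERS = {"b", "bl", "blr"}  # unconditional branch, call, return


def form_block(program, start):
    """Return the instruction indices of the block that begins at `start`."""
    block = []
    pc = start
    while pc < len(program):
        block.append(pc)
        if program[pc] in BLOCK_ENDERS:
            break  # rule 2: these instructions end the block
        # Rule 3: a conditional branch ("bc") does not end the block.
        # The not-taken path simply keeps going (conditional continue).
        pc += 1
    return block


program = ["add", "cmpw", "bc", "addi", "lwz", "blr", "add", "b"]
print(form_block(program, 0))  # [0, 1, 2, 3, 4, 5]
print(form_block(program, 3))  # [3, 4, 5]
```

Note that the two blocks overlap: entering at index 3 recompiles instructions that the block starting at index 0 already covers.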

"Block linking" is a feature that links up block exits (like the end of a block, or a taken conditional branch) and the entrance to the next block, so we don't have to go through the slow dispatcher to get there.

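As a rough illustration of why block linking helps, the sketch below counts how often a toy "dispatcher" has to look up the next block. Everything here is an assumption made for the example (the `Block` class, `run`, the addresses); a real JIT patches jumps in the generated machine code rather than storing object references.

```python
class Block:
    def __init__(self, start, exit_target):
        self.start = start
        self.exit_target = exit_target  # address jumped to when the block ends
        self.linked = None              # directly linked successor, if any


def make_blocks():
    # Two blocks that jump to each other, like the body of a loop.
    return {0x100: Block(0x100, 0x200), 0x200: Block(0x200, 0x100)}


def run(blocks, entry, link, max_blocks=10):
    """Execute up to max_blocks blocks; return the number of dispatcher lookups."""
    lookups = 1  # the first block always comes from the dispatcher
    block = blocks[entry]
    for _ in range(max_blocks - 1):
        if block.linked is not None:
            block = block.linked  # fast path: jump straight to the next block
            continue
        lookups += 1              # slow path: ask the dispatcher
        nxt = blocks[block.exit_target]
        if link:
            block.linked = nxt    # remember the successor for next time
        block = nxt
    return lookups


print(run(make_blocks(), 0x100, link=False))  # 10
print(run(make_blocks(), 0x100, link=True))   # 3
```
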
There are many optimizations to be had here: after all, we want to leave the block as little as possible. Leaving the block means flushing the register cache (losing all the PowerPC registers we were keeping around in x86 ones), which wastes time. Additionally, if we have lots of overlapping blocks of code starting at similar places, we waste instruction cache, because we're compiling redundant code. So whatever optimizations we do (e.g. inlining), we also want to limit the amount of code generated, as historically instruction cache misses have been a major performance problem due to the sheer volume of x86 code generated.

A few possible such optimizations that weren't yet implemented include:

1. Instead of ending the block on an unconditional branch, follow the branch.
2. Instead of ending the block on a call instruction (bl), let it call out to the function and then return to and continue the block. Perhaps take advantage of the calling convention and put the function parameters into registers, since most functions probably use the normal PowerPC calling convention.
3. Inline small functions instead of calling out to them.
4. Support backward jumps that go to a previous position in the block -- typically loops.
5. Support forward jumps that go to a later position in the block -- typically if/else statements.
6. Allow starting in the middle of the block, by compiling appropriate setup code and jumping straight in.

"Conditional continue", as described earlier, is one such block-jump-ish optimization that has been implemented.

I implemented 5), which is probably the easiest here. My guess is 4) is probably the hardest, or maybe 2) or 3).

Implementing this meant:

1. Keeping track of in-block branch targets, so we know where the branches in the block are going to.
2. Making sure we don't reorder instructions across branch targets.
3. Saving enough of the JIT state so that we can create code to land on when it's time to actually set up the destination for a forward branch.
4. Creating code to negotiate the register cache state differences, so that both code paths (the forward branch, and the main execution path) agree on what values should be in what registers. We also have to negotiate all the other differences in the execution paths too, but the rest are trivial by comparison.
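
Steps 1 and 2 above can be sketched in a few lines. This is a simplified model under stated assumptions, not the actual recompiler: each instruction is a hypothetical `(mnemonic, target_index)` pair, and `forward_branch_targets` and `may_swap` are names invented for the example.

```python
def forward_branch_targets(block):
    """Map each in-block forward branch target to the branches that jump to it."""
    targets = {}
    for i, (op, target) in enumerate(block):
        if op == "bc" and target is not None and i < target < len(block):
            targets.setdefault(target, []).append(i)
    return targets


def may_swap(i, targets):
    """Instructions i and i+1 may trade places only if i+1 is not a branch target.

    If i+1 is a target, the taken path skips instruction i; moving i past the
    target would make the taken path execute it too. Only this one rule is
    checked here; a real scheduler has many more constraints.
    """
    return (i + 1) not in targets


# Roughly `if (x) y += 5;` followed by a store and a return.
block = [
    ("cmpwi", None),  # 0: test x
    ("bc", 3),        # 1: skip the add when x is zero
    ("addi", None),   # 2: y += 5
    ("stw", None),    # 3: branch target
    ("blr", None),    # 4: return
]

targets = forward_branch_targets(block)
print(targets)               # {3: [1]}
print(may_swap(2, targets))  # False
print(may_swap(3, targets))  # True
```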

The main dangers (speed-wise) in this optimization are:

1. If the branch and main execution paths disagree on what register values are immediates (known constants), we have to flush them, losing that knowledge, because we no longer are sure of what the value is. This could result in significantly worse code in some situations, especially if the branch is rarely taken and thus doesn't benefit much from the optimization.
2. The code to convert the branch's register state to the main register state could get nasty. We don't have a real register allocator, so I'm basically just hoping that it's Not That Bad.
3. Let's hope the way I've handled branches merging back into the main control flow isn't too gross.
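
To make dangers 1 and 2 concrete, here is a minimal sketch of merging two register cache states at a forward branch target. The representation is an assumption for illustration only: a state maps a guest register to either `("imm", value)` or `("host", host_register)`, and `merge_states` is a made-up helper, not a function from the Dolphin source.

```python
def merge_states(main, branch):
    """Return (merged_state, fixups) for the point where the branch path joins.

    The merged state follows the main path. Registers cached only on the
    branch path are ignored here; they would simply be flushed.
    """
    merged, fixups = {}, []
    for reg, loc in main.items():
        other = branch.get(reg)
        if other == loc:
            merged[reg] = loc  # both paths already agree
        elif loc[0] == "imm" or (other is not None and other[0] == "imm"):
            # The paths disagree about a known constant: that knowledge is lost.
            fixups.append(("flush", reg))
        else:
            # Same value, different place: the branch path must move it.
            merged[reg] = loc
            fixups.append(("move_into", reg, loc[1]))
    return merged, fixups


main = {"r3": ("imm", 5), "r4": ("imm", 0), "r5": ("host", "rbx")}
branch = {"r3": ("imm", 5), "r4": ("imm", 7), "r5": ("host", "rsi")}

merged, fixups = merge_states(main, branch)
print(merged)  # {'r3': ('imm', 5), 'r5': ('host', 'rbx')}
print(fixups)  # [('flush', 'r4'), ('move_into', 'r5', 'rbx')]
```

In this model, r4 is the costly case: it was a known constant on both paths, but after the merge it has to be treated as an ordinary unknown value.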

The main advantages (speed-wise) in this optimization are:

1. Way less code. C code like if(x) y += 5; could result in the generation of an entirely new block just because of hopping over one add instruction with a branch; this completely avoids that. In Rebel Strike it's easily generating 2/3 as many blocks towards the top of the profile.
2. Less block exit code; way less reloading the register cache.
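
The block-count saving in advantage 1 can be reproduced on the `if(x) y += 5;` case with a toy block counter. It is a sketch under assumptions: instructions are hypothetical `(mnemonic, target_index)` pairs, `count_blocks` is invented for the example, and the program is assumed to end with a block-ending instruction.

```python
ENDERS = {"b", "bl", "blr"}


def count_blocks(program, entry, forward_support):
    """Count the blocks compiled when starting execution at `entry`."""
    pending, compiled = [entry], set()
    while pending:
        start = pending.pop()
        if start in compiled:
            continue
        compiled.add(start)
        # Find where this block ends.
        end = start
        while program[end][0] not in ENDERS:
            end += 1
        for pc in range(start, end + 1):
            op, target = program[pc]
            if op != "bc":
                continue
            in_block = pc < target <= end
            if not (forward_support and in_block):
                pending.append(target)  # taken branch exits to a separate block
    return len(compiled)


program = [
    ("cmpwi", None),  # 0: test x
    ("bc", 3),        # 1: skip the add when x is zero
    ("addi", None),   # 2: y += 5
    ("stw", None),    # 3
    ("blr", None),    # 4
]

print(count_blocks(program, 0, forward_support=False))  # 2
print(count_blocks(program, 0, forward_support=True))   # 1
```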

If anyone is interested in some of the "known gotchas" for the other things in the list that haven't been done yet, I can try to list some. (Also feel free to bother me on IRC ^^)
01-05-2015, 07:14 AM
#3
Aleron Ives Offline
Senior Member
****
Posts: 662
Threads: 7
Joined: Apr 2014
If this forum had a thanks/like/thumbs up feature of some kind, I would use it right now.
01-05-2015, 07:40 AM (This post was last modified: 01-05-2015, 07:42 AM by cammelspit.)
#4
cammelspit Offline
Member
***
Posts: 116
Threads: 28
Joined: Jan 2010
First off, thanks for responding! I am honored you took the time to write to us lowly "NON CODE GENIUS" types. I have read every single Dolphin progress update since the first one, and I often see your name up there for hammering out some of the nasty JIT stuff. I have no doubt you will nail this one down too! I am grateful for the slightly dumbed-down explanation; it makes perfect sense now. I was just having no luck finding an exact description of what was being done. The last code I wrote was in BASIC back in the late 80s, but my little brother is a professional coder, so I can at least wrap my brain around what's going on to some extent. :) I can't say thank you enough times for being so willing to put up with my stupid question. It's not like I have any use for any of this stuff anyway, so really, why do I care? :P Oh, and the gotchas? Yes, yes, please do go on!

Thanks again. You are now top 5 in my pantheon... :D
01-05-2015, 08:58 AM
#5
Kurausukun Offline
Zeitgenössischer Wurst
*******
Posts: 1,034
Threads: 62
Joined: Mar 2014
What IRC channel are you talking about? Is there a Dolphin-specific IRC channel that I'm unaware of?
01-05-2015, 09:24 AM
#6
Fiora Offline
x86 JIT Princess
**********
Developers (Some Administrators and Super Moderators)
Posts: 237
Threads: 0
Joined: Aug 2014
#dolphin-dev on Freenode.
01-05-2015, 11:20 PM (This post was last modified: 01-06-2015, 12:47 AM by kirbypuff.)
#7
kirbypuff Offline
The Original White Marshmallow
*****
Posts: 825
Threads: 37
Joined: Aug 2010
Related to this commit / PR (issue report):

Fiora Wrote:"(WIP) JIT: implement forward in-block branch support"

It seems to work...

...but it also seems to break NSMB (black background)
01-06-2015, 01:56 AM
#8
Fiora Offline
x86 JIT Princess
**********
Developers (Some Administrators and Super Moderators)
Posts: 237
Threads: 0
Joined: Aug 2014
Yup, just woke up and heard about that, going to see if I can figure out what it is.

I do 100% expect a JIT change this significant to break at least something at first; I'd be almost suspicious if it didn't.
01-06-2015, 04:29 AM (This post was last modified: 01-06-2015, 04:49 AM by Fiora.)
#9
Fiora Offline
x86 JIT Princess
**********
Developers (Some Administrators and Super Moderators)
Posts: 237
Threads: 0
Joined: Aug 2014
Should be fixed now. That was a surprisingly subtle bug; thanks for the report.

Technical info: if the recompiler knows that a compare + branch is always true, it unconditionally takes the branch and leaves the block. But this breaks if we have a forward branch set up earlier whose target hasn't been set up yet.
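
The failure described above can be reproduced in a toy model. Treat this as a guess at the shape of the problem, not the actual fix: the thread does not say how the commit resolved it, so the `guard` flag below (do not leave the block early while a forward branch target is still pending) is purely an assumption, as are `compile_block` and the `(mnemonic, target_index)` instruction format.

```python
def compile_block(block, known_true, guard):
    """Return (emitted instruction indices, forward targets never set up).

    known_true: indices of branches the recompiler has proven are always taken.
    """
    pending = set()
    emitted = []
    for pc, (op, target) in enumerate(block):
        pending.discard(pc)  # reaching a target sets up its landing code
        emitted.append(pc)
        if op != "bc":
            continue
        if pc in known_true and not (guard and pending):
            return emitted, pending  # always taken: leave the block right here
        if target is not None and target > pc:
            pending.add(target)      # forward in-block branch, set up later
    return emitted, pending


block = [
    ("cmpwi", None),  # 0
    ("bc", 4),        # 1: forward in-block branch to index 4
    ("cmpwi", None),  # 2: compares known constants
    ("bc", None),     # 3: always taken, jumps out of the block
    ("stw", None),    # 4: target of the branch at index 1
    ("blr", None),    # 5
]

print(compile_block(block, known_true={3}, guard=False))  # ([0, 1, 2, 3], {4})
print(compile_block(block, known_true={3}, guard=True))   # ([0, 1, 2, 3, 4, 5], set())
```

Without the guard, the branch at index 1 is left pointing at a target that was never generated.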

I also fixed a few more possible things to be a bit more paranoid, e.g. flushing knowledge about previous registers at forward branch entry points.