Dolphin, the GameCube and Wii emulator - Forums

Multithreading
Quote:Also, I noticed some texture artifacts (some textures blinking every 10 frames or so) in 3.0-415-dirty that I didn't have in the 3.0-370-dirty Lectrode build. They also appear if I download the prebuilt version from here.

Create an issue report and figure out what revision this started with.

Quote:I also managed to get 100% speed when affinity is set to both my cores, without the frame skipping. Speed still drops to 90-95% sometimes, but compared to the 75-85% it was before it's still an improvement.

In Fifo_Player_Thread() I changed:
Code:
if (_CoreParameter.bLockThreads)
    Common::SetCurrentThreadAffinity(1); // Force to first core

to:

Code:
if (_CoreParameter.bLockThreads)
{
    Common::SetCurrentThreadAffinity(1); // Force to first core

    // I know raising thread priority is a bad idea, so only do it when locking threads
    SetThreadPriority("FIFO player thread", THREAD_PRIORITY_HIGHEST);
}

Of course, this will fail if dual-core mode isn't enabled, because the thread would have a different name. I'll fix that later (personally, I never use single-core mode anyway, and I have no intention of making my builds publicly available at the moment). So far I haven't had any stability issues, but I haven't really played that much either.

Switching just the affinity code around didn't work. In fact, it caused both threads to pile onto my second core until I opened Task Manager to check Dolphin's affinity, which was set to all cores; I clicked OK and suddenly both of my cores were in use again. (I checked it twice; I probably switched something wrong.)

Very interesting. I would not think that increasing the video (fifo player) thread priority would improve performance for most games. Would you mind testing some other games to see if this is only faster for this game or in general?

Quote:Of course, this will fail if Dual-core mode isn't enabled, because the Thread would get another name.

If I'm not mistaken, that code should not even run if dual-core mode isn't enabled, so it should work fine.

Quote:A quick question, does the Lock Threads to Cores option lock to physical cores only? I honestly can't remember, but I've noticed the emulator much slower when the affinity is set to use the virtual threads instead of physical cores.

Yes and no. I think you're getting your terminology mixed up a bit.
Physical cores = the physical processing units (hardware) on the die that process instructions independently and simultaneously.
Hardware threads = frontends for the cores; they act as destinations to which the OS schedules software (logical) threads. The number of hardware threads is the number of logical threads that can run simultaneously at any one time.
Logical cores = a layer of abstraction created by the OS that acts as an interface between hardware threads and software threads.
Virtual cores = "fake" cores hidden behind a layer of abstraction in a VM.
Logical threads = software threads. These are the threads that processes like dolphin.exe spawn. They are streams of instructions that can be assigned an affinity, which is a set of one or more logical cores.

Technically it's impossible to map a software thread to a physical core.

For some reason, some people seem to think that Core i7 CPUs have 4 physical or "real" cores and 4 virtual or "fake" cores. Those people do not understand the terminology properly.
Your CPU has 4 physical cores and 8 hardware threads (2 hardware threads per core instead of 1; that's essentially what HT is). The OS creates 8 logical cores that applications can assign threads to, and it manages scheduling between the logical (software) threads of the applications and the hardware threads of the CPU. It essentially acts as a middleman that hides all of the very complex hardware-level threading from application programmers so that they don't have to deal with it.

Dolphin has 1-3 major threads depending on your settings and a dozen or so minor threads. Your processor and OS are capable of running 8 threads simultaneously even though you only have 4 physical cores.

In older revisions, "Lock threads to cores" would lock the affinity of the major threads to the first available logical cores: 1, 2, and 3. However, we quickly discovered that if you map the two major threads to core 1 and core 2 (keep in mind these are logical cores, sometimes called OS cores), the OS schedules those two threads to the first two hardware threads, which are processed by the same physical core. So only one physical core was used even though the OS showed activity split between two cores (because logical and physical cores are different). Changing the affinity of the second major thread from 2 to 4 fixed that.
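A minimal sketch of the arithmetic behind changing the second thread's affinity from 2 to 4. It assumes that the value passed to Common::SetCurrentThreadAffinity() is a bitmask (bit k = logical core k, counting from 0, which the 1/2/4 values suggest), that the OS numbers the two hardware threads of each physical core consecutively, and that HT gives 2 hardware threads per core. Both helper functions are hypothetical, not part of Dolphin.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: physical core that a logical core belongs to, assuming the
// OS numbers sibling hardware threads consecutively (logical 0 and 1 share
// physical core 0, logical 2 and 3 share physical core 1, and so on).
unsigned PhysicalCoreOf(unsigned logicalCore, unsigned threadsPerCore)
{
    assert(threadsPerCore > 0);
    return logicalCore / threadsPerCore;
}

// Hypothetical helper: affinity bitmask (bit k = logical core k) that pins the
// Nth major thread to the first hardware thread of the Nth physical core.
std::uint32_t MaskForMajorThread(unsigned threadIndex, unsigned threadsPerCore)
{
    assert(threadsPerCore > 0);
    const unsigned logicalCore = threadIndex * threadsPerCore;
    assert(logicalCore < 32);  // must fit in a 32-bit mask
    return std::uint32_t{1} << logicalCore;
}
```

With threadsPerCore = 2, MaskForMajorThread() gives 1, 4 and 16 for the first three major threads, so each lands on its own physical core; with threadsPerCore = 1 (no HT) it gives 1, 2 and 4. Masks 1 and 2 under HT are logical cores 0 and 1, which PhysicalCoreOf() maps to the same physical core. Real code would query the topology (on Windows, for example, via GetLogicalProcessorInformation) rather than assume this layout.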

Here, I made this quick drawing (excuse the quality) to show you what I'm talking about:
[Image: threadstuff.png]

Quote:Semi-offtopic: I hope that eventually the emulator will be as fast as it was around 1000-2000 revisions ago and that the graphical issues that popped up in recent revisions will be fixed. Maybe a pipe dream, but it would also be nice to see LLE audio become nearly as fast as HLE audio.

Skid recently made a commit that sped up LLE audio pretty significantly, you should check it out.
(02-08-2012, 02:46 PM) NaturalViolence wrote: Skid recently made a commit that sped up LLE audio pretty significantly, you should check it out.

Unfortunately I can't at this time, I had to RMA my PSU and I haven't received the replacement yet. I heard that Skid's commit offered a nice speedup but I doubt it's close to the performance of HLE audio.

I pulled a newer version with Git, but while building, I suddenly get a lot of errors like this:
Code:
6>c:\git\dolphin-emu\source\core\common\src\cdutils.cpp(29): error C2664: 'GetDriveTypeW' : cannot convert parameter 1 from 'const char []' to 'LPCWSTR'
6>          Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast

What did I mess up?

EDIT: I traced it back to installing the 6.1 SDK for OpenMP support, but uninstalling it didn't fix it.
......that line has nothing to do with OpenMP.
(02-09-2012, 10:34 AM) NaturalViolence wrote: ......that line has nothing to do with OpenMP.
I know, but VC Express is giving these errors ever since I installed the 6.1 SDK, which I needed for the omp.h file, thus for building with OpenMP support.

I guess the SDK messed up the compiler settings, but I have no clue where to look to correct them or reset them to default.

Edit: x64 builds just fine, so I have definitely messed up the compiler configuration. I'll try reinstalling VS and the rest.

Edit 2: Well, it took me a while to figure out, but purging every trace of VS Express and the SDKs, then reinstalling VS 2010 Professional with only the DirectX SDK, solved my problem, so I'm back to building the latest version!

Quote:Would you mind testing some other games to see if this is only faster for this game or in general?

Sure. Any specific games in mind?
Quote:Sure. Any specific games in mind?

Whatever you have.
Alright, here's an update...

Ever since I started compiling with VS 2010 Professional instead of VC Express, I have been unable to recreate the speed-up I mentioned before, which I got with the thread-priority code I added.

At first I thought it might be because of different compiler versions, but a closer look at the code revealed that the Fifoplayerthread() code isn't even executed: cputhread() is executed instead (I double-checked with some added OSD messages). So why it caused the speed-up earlier is still unknown to me. However, I can still reach speeds of up to 250% (with the frame limiter disabled, and with heavy frame skipping while doing so) if I manually set Dolphin.exe's affinity to a single core from within Task Manager, as opposed to only ~80% when the affinity is set to both cores. Dual-core mode has to be enabled for this to work; locking threads still doesn't seem to have any effect on this particular behavior. This lasts for about 25 seconds, then speed drops to 40-50% for about 10 seconds, and then it goes back up again. (CPU/GPU threads getting out of sync? With all that frame skipping, I can't think of anything else, really.)

Looking further into the code, I found a function in Thread.h, namely "YieldCPU()", which uses "std::this_thread::yield()". I remember from using the yield command in Delphi that the yielding thread continued to run at full speed, except when another thread with at least the same priority AND on the same core needed additional CPU time. Then again, I could have just written my code wrong at the time, and I don't know if the same behavior occurs in C++. Out of pure curiosity, I replaced it with sleep(1), which suspends the thread for at least 1 ms. This resulted in much lower CPU usage when I started a game: the GPU thread went down from 100% core usage to about 30% in the areas of the game where I usually get 100% game speed (I'm still using "Harvest Moon: Animal Parade PAL"), but the speed didn't change significantly. However, I did lose the ability to get 100%+ speed in the slower areas of the game with the Task Manager "trick". Instead, far less frame skipping occurred and the speed dropped to 35% or so. Also, when I changed the SleepCurrentThread function to a while loop that yields for the same amount of time, my system locked up completely as soon as I started a game. (I didn't test that with single-core mode, though.)
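To make the YieldCPU()/sleep(1) difference concrete, here is a minimal standalone sketch in standard C++11. These are hypothetical helpers, not Dolphin's Thread.h code. A yield loop never blocks, so when no other thread wants that core it keeps spinning at full speed; a sleep-based loop gives the core up between polls at the cost of reacting at least 1 ms late. YieldFor() mimics the "while loop with yields" replacement for SleepCurrentThread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Spin-wait that calls std::this_thread::yield(), roughly what a YieldCPU()-style
// wait loop does. yield() is only a scheduling hint: if no other runnable thread
// wants this core, the loop keeps running and the core stays close to 100% busy.
void WaitWithYield(const std::atomic<bool>& ready)
{
    while (!ready.load(std::memory_order_acquire))
        std::this_thread::yield();
}

// Wait that sleeps between polls, like replacing the yield with sleep(1).
// CPU usage drops, but each sleep lasts at least 1 ms (possibly longer,
// depending on the OS timer resolution), so the waiting thread reacts later.
void WaitWithSleep(const std::atomic<bool>& ready)
{
    while (!ready.load(std::memory_order_acquire))
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
}

// "Sleep" implemented as a yield loop until a deadline: the thread never
// blocks, so it keeps a core busy for the whole duration.
void YieldFor(std::chrono::milliseconds duration)
{
    assert(duration.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + duration;
    while (std::chrono::steady_clock::now() < deadline)
        std::this_thread::yield();
}
```

WaitWithYield() notices the flag almost immediately but burns a core while waiting, while WaitWithSleep() uses far less CPU but adds latency on every poll, which may cost speed when two threads hand work back and forth many times per frame.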

I'm beginning to suspect that the GPU thread is supposed to "yield" to the CPU thread at some point but somehow fails to do so (because they are on different cores?), causing the CPU thread to slow down to keep in sync instead of the GPU thread skipping frames. At this point this is just a theory, though (and it could be intended behavior as well). I did try raising and lowering the Internal Resolution to see whether my GPU might be the bottleneck, but my GPU only starts becoming an issue above 3x Native: I get the same 80% game speed mentioned above at either 1x Native or 2.5x Native. This also makes me believe this whole speed issue originates somewhere in the GPU thread.
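The theory above is essentially a producer/consumer problem. As an illustration only (a generic bounded queue with hypothetical names, not Dolphin's actual FIFO or synchronization code), this sketch shows how a slow consumer throttles its producer: when the queue is full, the "CPU" side blocks in Push() until the "GPU" side pops something, rather than the consumer dropping work.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Generic bounded producer/consumer queue. Push() blocks while the queue is
// full, so a consumer that falls behind slows the producer down.
template <typename T>
class BoundedFifo
{
public:
    explicit BoundedFifo(std::size_t capacity) : m_capacity(capacity) { assert(capacity > 0); }

    void Push(T item)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_not_full.wait(lock, [this] { return m_items.size() < m_capacity; });
        m_items.push_back(std::move(item));
        m_not_empty.notify_one();
    }

    T Pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_not_empty.wait(lock, [this] { return !m_items.empty(); });
        T item = std::move(m_items.front());
        m_items.pop_front();
        m_not_full.notify_one();
        return item;
    }

private:
    std::size_t m_capacity;
    std::deque<T> m_items;
    std::mutex m_mutex;
    std::condition_variable m_not_full;
    std::condition_variable m_not_empty;
};

// Hypothetical demo: a "CPU" thread pushes `count` commands while the calling
// thread plays the "GPU" and pops them; returns the items in consumption order.
std::vector<int> RunProducerConsumer(int count, std::size_t capacity)
{
    BoundedFifo<int> fifo(capacity);
    std::thread cpu([&fifo, count] {
        for (int i = 0; i < count; ++i)
            fifo.Push(i);  // blocks whenever the FIFO is full
    });
    std::vector<int> consumed;
    for (int i = 0; i < count; ++i)
        consumed.push_back(fifo.Pop());
    cpu.join();
    return consumed;
}
```

In this design nothing is ever skipped, so the producer can only run as fast as the consumer; skipping frames would require the consumer to discard work explicitly, which this sketch deliberately does not do.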

EDIT:
Because the "Task manager trick" results in frame skipping, I am trying to figure out where and how the frame skipping is done when it is selected in the Emulation menu. Interestingly enough, that setting seems to have no effect when Frame limit is set to Audio.
My apologies if this has been brought up in this thread already. I couldn't find a post concerning this, though.

I read somewhere else here that there was decreased performance and a problem with games and >2 core optimisation, but I have an idea/question. This would only apply to OS X, but has anyone looked into Apple's Grand Central Dispatch? It's a technology/library designed to make multithreading in OS X efficient as hell. I'm not sure of the technical limitations of using it in an emulator, but in theory it would help a lot.
(04-22-2012, 07:12 PM) nbmatt wrote: My apologies if this has been brought up in this thread already. I couldn't find a post concerning this, though.

I read somewhere else here that there was decreased performance and a problem with games and >2 core optimisation, but I have an idea/question. This would only apply to OS X, but has anyone looked into Apple's Grand Central Dispatch? It's a technology/library designed to make multithreading in OS X efficient as hell. I'm not sure of the technical limitations of using it in an emulator, but in theory it would help a lot.

... it's just not as easy as you think...
and that library is nothing special
(04-22-2012, 07:51 PM) dannzen wrote:
... it's just not as easy as you think...
and that library is nothing special

I do understand the difficulty; I'm not entirely new to programming. I've just never written anything with as much technical red tape as an emulator, haha. The API has been pretty beneficial in most of the software I've used that adopted it, and I'm in the process of implementing it in a software synthesizer I've been working on; so far there have been fairly good improvements.