Dolphin, the GameCube and Wii emulator - Forums

Full Version: Multithreading
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
(01-30-2012, 11:14 AM)NaturalViolence Wrote: [ -> ]Can you prove it was the lock threads to cores option that caused it? It seems like you changed a lot of options.

I tested with Lock Threads to Cores disabled + DSP HLE. In the videos CPU usage is near 100%, and while playing it is 50%+. Either way, it freezes randomly in Metroid Other M using the latest build, Git376.

Salu2 - Darkness Knight
Quote:So no, I think that you'll find that there is no general rule of thumb for this.
I know this doesn't work for every situation; it depends on a lot of factors: how demanding the game is, whether the hardware can support any of the code natively, etc. I just wanted to give a general explanation of why some games run at full speed while others can't get over 50% (assuming accurate emulation settings).

Oh, and I picked this up at several emulation boards. Seems like a lot of people take this to be true (though it is indeed not accurate, like stated above)

Quote:Quote:
So for the Wii, which has a 729MHz CPU, you would need a CPU @ ~7.3GHz.

Clock rate != performance.
Correct. But you need the extra cycles to perform the emulation more fluidly. Also, I was going on from the basic factor 10 needed for emulation.

Quote:Quote:
"Lock threads to Cores", to increase stability and performance on 3+ core environments, in case the game (or Dolphin) can't handle being thrown around random cores or (hyper)threads.

....um.....no.....it has absolutely nothing to do with that. The developers wanted to get rid of that option because it was pointless until it was revealed that locking the thread affinity helped laptops with turbo boost reach higher frequencies and could improve performance. This option does not affect stability at all.
I've seen multiple threads here where enabling this option solved game crashes. But you're right: The performance gain is way bigger because the CPU doesn't have to compensate the different threads and processes with synchronization implements. Albeit Hardware integrated, it still slows things down, while locking a thread to a single core eliminates the need to synchronize within that single thread. (You still need to synchronize all threads with each other, but that takes a hell of a lot less)

Any way, before I accidentally turn this thread into a staring contest: I was trying to explain that having more cores won't help much. We need more cycles, thus higher clock-rates. I still believe that when we have 8GHz+ multi-cores available, most speed issues with demanding games will be solved without the need for speed-up-hacks. Especially with large area games, like Twilight Princess' Hyrule field.
(01-31-2012, 06:50 PM)arthur117 Wrote: [ -> ]
Quote:Quote:
"Lock threads to Cores", to increase stability and performance on 3+ core environments, in case the game (or Dolphin) can't handle being thrown around random cores or (hyper)threads.

....um.....no.....it has absolutely nothing to do with that. The developers wanted to get rid of that option because it was pointless until it was revealed that locking the thread affinity helped laptops with turbo boost reach higher frequencies and could improve performance. This option does not affect stability at all.
I've seen multiple threads here where enabling this option solved game crashes. But you're right: The performance gain is way bigger because the CPU doesn't have to compensate the different threads and processes with synchronization implements. Albeit Hardware integrated, it still slows things down, while locking a thread to a single core eliminates the need to synchronize within that single thread. (You still need to synchronize all threads with each other, but that takes a hell of a lot less)

I lol'ed, sorry... you are wrong...
And I don't want to waste my time on an explanation.
Quote:I know this doesn't work for every situation

It almost never works.

Quote:Oh, and I picked this up at several emulation boards. Seems like a lot of people take this to be true (though it is indeed not accurate, like stated above)

Well I think you'll find that users browsing emulation boards usually know surprisingly little about how emulation works. They are not experts on the subject. If you were to tell a software developer who has written an emulator before what you just told me they would laugh at you.

Quote:Correct. But you need the extra cycles to perform the emulation more fluidly. Also, I was going on from the basic factor 10 needed for emulation.

Once again clock rate != performance. 10 times the performance is not the same thing as 10 times the clock rate. I have seen single core cpus running at 4 times the clock rate of a competitor achieving about the same performance.

Quote:The performance gain is way bigger because the CPU doesn't have to compensate the different threads and processes with synchronization implements. Albeit Hardware integrated, it still slows things down, while locking a thread to a single core eliminates the need to synchronize within that single thread. (You still need to synchronize all threads with each other, but that takes a hell of a lot less)

WTF?

That makes no sense. You have no idea what you're talking about. The thread synchronization doesn't change when you turn lock threads to cores on/off. Look at the source code if you don't believe me.

All that option does is set the affinity of the threads to 1 and 3 instead of letting the OS decide. I don't know where on earth you got that nonsense from, so I have to assume you just made it up without looking at the source code to confirm it.
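
To make "setting the affinity" concrete, here is a minimal Python sketch. It is not Dolphin's code: it assumes Linux (where `os.sched_setaffinity` is available), and the helper name `pin_to_core` is made up for illustration. It only restricts where the scheduler may place the caller, which is the whole effect being described.

```python
import os

def pin_to_core(core):
    """Restrict the caller to a single core and return the resulting affinity set.

    Returns None on platforms that do not provide os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core})  # 0 means "the caller"
    return os.sched_getaffinity(0)

if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)        # default: every core the OS lets us use
    print("default affinity:", sorted(allowed))
    pinned = pin_to_core(min(allowed))       # lock to the lowest allowed core
    print("pinned affinity:", sorted(pinned))
    os.sched_setaffinity(0, allowed)         # hand placement back to the OS
```

Nothing in the sketch touches locks, events, or any other synchronization code. Only the set of cores the scheduler is allowed to choose from changes.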

Quote:Any way, before I accidentally turn this thread into a staring contest

As much as I appreciate that thought (I really do) the best way to not get into a staring contest is to not talk about things that you know nothing about as if you understand them. This annoys people and causes them to respond.

Quote:I was trying to explain that having more cores won't help much. We need more cycles, thus higher clock-rates. I still believe that when we have 8GHz+ multi-cores available, most speed issues with demanding games will be solved without the need for speed-up-hacks. Especially with large area games, like Twilight Princess' Hyrule field.

Once again, clock rate != performance.

You can get a Pentium D dual core processor running at 8GHz, and it will perform very, very poorly in Dolphin. High clock rate does not guarantee good performance. Many micro-architectures trade IPC for clock rate. What we need is higher IPC and more efficient code, or higher IPC + higher clock rate and more efficient code. Higher clock rate alone isn't going to do it. And then there are internal and external bottlenecks to consider, which dampen performance scaling.
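
A rough way to see the point: throughput is approximately IPC times clock rate, so a chip with four times the clock can end up level with a slower-clocked one. The sketch below uses invented IPC and clock figures for two hypothetical chips; they are not measurements of any real processor, and the model ignores the bottlenecks mentioned above.

```python
def instructions_per_second(ipc, clock_hz):
    """Rough throughput: instructions per cycle times cycles per second."""
    return ipc * clock_hz

# Hypothetical chips; the numbers are invented purely for illustration.
high_clock_low_ipc = instructions_per_second(ipc=0.5, clock_hz=8.0e9)
low_clock_high_ipc = instructions_per_second(ipc=2.0, clock_hz=2.0e9)

print("8 GHz chip, IPC 0.5:", high_clock_low_ipc)
print("2 GHz chip, IPC 2.0:", low_clock_high_ipc)
print("same throughput:", high_clock_low_ipc == low_clock_high_ipc)
```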

Quote:I tested with Lock Threads to Cores disabled + DSP HLE. In the videos CPU usage is near 100%, and while playing it is 50%+. Either way, it freezes randomly in Metroid Other M using the latest build, Git376.

But the post that you linked to implied that turning lock threads to cores OFF fixed the issue. Are you saying it's the other way around?

Have you tried toggling only that setting on/off to make sure that it was that setting and not just a coincidence?
I only ask because we get so many reports of users saying that option X fixed their crashes when it turned out to be a coincidence. And the post you linked to showed you changing a lot of settings, not just that one.

And you're saying that the game runs at 100% CPU load during cutscene videos and over 50% during actual gameplay, right? That makes sense. The OpenMP texture decoder is multithreaded and should be using all of your cores, but the bulk of the emulator's compute-intensive code runs on 2-3 threads.

I could understand lock threads to cores on being less stable in some situations since the OS can't shuffle the threads when something happens (just a theory at this point).
Announcer: "And there's the evil stare from the challenger in the red corner! How will his opponent handle this move?!"

But seriously:

Quote:That makes no sense. You have no idea what you're talking about. The thread synchronization doesn't change when you turn lock threads to cores on/off. Look at the source code if you don't believe me. All that option does is set the affinity of the threads to 1 and 3 instead of letting the OS decide
OS Default is setting affinity to all available cores (at least on the various versions of Windows I installed on many, many custom and stock rigs during the 15 years I've been building systems, scripting large scaled software operations and developing business class applications... Though I have to admit I don't like coding multi-threaded applications because I always seem to be screwing up the synchronization somewhere and I have absolutely no experience with coding emulators). Have fun watching the thread bounce around your cores when Windows gets to decide, or no affinity is set. I know the code doesn't change just because the affinity of the thread is set or not, but because the thread's child processes are spawned about evenly across all cores which the parent thread has affinity with (or whatever the OS's methods for load balancing are). The OS (or, on some MB's with multiple physical CPU's, the BIOS when that option is enabled) implements synchronization between said child processes to keep the parent thread from going haywire. Windows USUALLY does a good job with this, but start messing with thread priorities and watch it crash and burn because the synchronization gets drowned. I wasn't talking about syncing between threads, but between the child-processes of one single thread.

Quote:You can get a Pentium D dual core processor running at 8GHz
<getting marshmallows>

Quote:Clockrate != Performance <snip> High clock rate does not guarantee good performance. Many micro-architectures trade IPC for clock rate. What we need is higher IPC and more efficient code or higher IPC + higher clock rate and more efficient code. Higher clock rate alone isn't going to do it. And then there are internal and external bottlenecks to consider which dampen performance scaling.
Acknowledged and agreed. I assumed the same IPC (if you mean Instructions Per Cycle) combined with a higher clockspeed. My mistake.

Quote:If you were to tell a software developer who has written an emulator before what you just told me they would laugh at you.
Like I said, I myself have no experience with coding emulators. I only know what I read on this subject, which was bad information which I assumed to be accurate in this case.
A good rule of thumb is that if you personally haven't had experience with whatever it may be, you're probably better off not trying to talk about it like you have.
Quote:OS Default is setting affinity to all available cores (at least on the various versions of Windows I installed on many, many custom and stock rigs during the 15 years I've been building systems, scripting large scaled software operations and developing business class applications... Though I have to admit I don't like coding multi-threaded applications because I always seem to be screwing up the synchronization somewhere and I have absolutely no experience with coding emulators). Have fun watching the thread bounce around your cores when Windows gets to decide, or no affinity is set. I know the code doesn't change just because the affinity of the thread is set or not, but because the thread's child processes are spawned about evenly across all cores which the parent thread has affinity with (or whatever the OS's methods for load balancing are). The OS (or, on some MB's with multiple physical CPU's, the BIOS when that option is enabled) implements synchronization between said child processes to keep the parent thread from going haywire. Windows USUALLY does a good job with this, but start messing with thread priorities and watch it crash and burn because the synchronization gets drowned. I wasn't talking about syncing between threads, but between the child-processes of one single thread.

Let's take a look at your original statement, shall we?
Quote:The performance gain is way bigger because the CPU doesn't have to compensate the different threads and processes with synchronization implements. Albeit Hardware integrated, it still slows things down, while locking a thread to a single core eliminates the need to synchronize within that single thread. (You still need to synchronize all threads with each other, but that takes a hell of a lot less)

Now, to anyone who has used Dolphin with and without this option, this statement makes absolutely no sense. There is usually no difference in performance turning the option on/off if your processor doesn't use Turbo Boost/Turbo Core (a feature which increases the clock rate based partly on the number of active cores). If there is a difference it will be <5%. Some users claim slightly faster speed, others claim slightly slower, but the one thing they all have in common is that the difference is virtually none if their CPU does not support Turbo Boost/Turbo Core. And with the exception of one or two unconfirmed user reports, most users encounter no stability issues using lock threads to cores with any game (or vice versa).

Disclaimer: I've never done multithreaded programming before. I've looked at the part of Dolphin's source code that deals with this and have a pretty good understanding of it, but I've never done it myself. I do however know that a single thread of a program cannot use multiple cores at the same time; that is impossible. Processes contain threads, not the other way around. I think you need to get your terminology straight. A parent process can spawn one or more child processes, and each process can have one or more threads. But a thread cannot contain several processes running simultaneously on several cores that the OS needs to keep in sync, which your statement seems to be implying. A thread is the smallest unit of processing that can be scheduled by the OS; there is nothing for the OS to sync WITHIN the thread.
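
The containment relationship is easy to check for yourself. The following is a small Python sketch, unrelated to Dolphin's code, in which several threads each report the process they belong to and their own OS-level thread ID. The function name `run_threads` is just an illustrative choice.

```python
import os
import threading

def run_threads(count=3):
    """Start `count` threads; each records (process ID, native thread ID)."""
    results = [None] * count
    barrier = threading.Barrier(count)

    def worker(index):
        results[index] = (os.getpid(), threading.get_native_id())
        barrier.wait()  # keep every thread alive until all of them have reported

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run_threads()
pids = {pid for pid, _ in results}
tids = {tid for _, tid in results}
print("distinct process IDs:", len(pids))
print("distinct thread IDs:", len(tids))
```

All the threads share one process ID while each has its own thread ID, and it is those individual threads that the OS scheduler places on cores.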
Quote: I think you need to get your terminology straight
...Blush

I pretty much feel like an idiot/noob, because NOW I realize why my multi-threading coding never seems to work like I want it to...

I sincerely apologize for my ignorance.

EDIT:
Alright, out of sheer curiosity I've been doing some tests, using Harvest Moon: Animal Parade, walking in circles outside of the house. However, the results don't make any sense to me:

Frame limit is set to Auto
Using a PAL game with "Use EuRGB60 (PAL60)" enabled
Backends = DirectX9/XAudio2, DSP = HLE
Using x86 Dolphin 3.0-370-dirty, Lectrode's ICC Optimized build.
Rendering through TriDef for 3D output (interlaced)
Specs in my sig.

Enable Dual-core = ON
Dolphin.exe's affinity = both cores
Result: Fluid graphics; ~82% speed

Enable Dual-core = OFF
Dolphin.exe's affinity = both cores
Result: Fluid graphics; ~60% speed

Enable Dual-core = ON
Dolphin.exe's affinity = one core
Result: stuttering graphics (Looks like frame-skipping, 8-10 FPS); 100% speed

Enable Dual-core = OFF
Dolphin.exe's affinity = one core
Result: Fluid graphics; ~60% speed

Locking threads in any combination doesn't change anything.

Now why am I getting 100% speed only when I limit Dolphin to a single core with dual-core enabled? Shouldn't I get similar results with affinity set to both cores?
Quote:Enable Dual-core = ON
Dolphin.exe's affinity = one core
Result: stuttering graphics (Looks like frame-skipping, 8-10 FPS); 100% speed
Quote:Now why am I getting 100% speed only when I limit Dolphin to a single core with dual-core enabled? Shouldn't I get similar results with affinity set to both cores?

That doesn't make any sense at all. If you manually set the affinity to 1 core in Task Manager and turned on dual core, the OS should be switching threads on that core like crazy. Speed should be much lower. I don't really know what's going on, but that shouldn't be happening. Let me try it.

Edit: ahahahaha. Wow, thank you for bringing this to my attention. Indeed VPS (and therefore gamespeed) remain high while fps goes way down. The only way that could happen is if the video thread is being given significantly less priority or time on the cpu than the cpu thread. However I am not getting better gamespeed. My gamespeed is still lower with affinity set to one core, just not as much lower as you might expect. So I can't reproduce that, you shouldn't be getting higher gamespeed with affinity set to 1 core.

Dual core on + two core affinity:
Two main work threads. The CPU thread and video thread each get their own core.

Dual core off + two core affinity:
One main work thread being swapped between two cores.

Dual core on + one core affinity:
Two main work threads being swapped on one core. The CPU thread seems to be given priority over the video thread and ends up using almost all of the processing power anyway. That is why the CPU thread isn't running that much slower than it would with two core affinity: it's using almost all of one core's time either way. The video thread, on the other hand, is running way slower than it would with two core affinity, which means your fps is going to be very low.

Dual core off + one core affinity:
One main work thread running on one core.
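
The four cases above can be written down as a toy model. This is only a sketch of the reasoning, not a measurement and not how Dolphin or the OS scheduler actually works. The `cpu_share` value of 0.9 is a made-up figure standing in for "the CPU thread ends up with almost all of the core".

```python
def core_time(dual_core, cores, cpu_share=0.9):
    """Toy model: fraction of one core's time each kind of work gets.

    cpu_share is an invented figure for how much of a contended core
    the CPU thread wins over the video thread.
    """
    if not dual_core:
        # One work thread does both jobs and can only run on one core at a
        # time, so extra cores in the affinity mask add nothing.
        return {"cpu+video": 1.0}
    if cores >= 2:
        # Each work thread gets a core to itself.
        return {"cpu": 1.0, "video": 1.0}
    # Two work threads compete for a single core.
    return {"cpu": cpu_share, "video": 1.0 - cpu_share}

for dual_core in (True, False):
    for cores in (2, 1):
        label = f"dual core {'on' if dual_core else 'off'}, {cores}-core affinity"
        print(label, "->", core_time(dual_core, cores))
```

In this model the CPU thread loses only a little when both threads are squeezed onto one core, while the video thread loses most of its time, which matches high VPS alongside very low fps. It does not explain gamespeed going up.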
Quote:My gamespeed is still lower with affinity set to one core, just not as much lower as you might expect. So I can't reproduce that, you shouldn't be getting higher gamespeed with affinity set to 1 core.

I kid you not: Attached are windowed screenshots (and I double-checked). Notice the Speed and FPS in the status bar.

This is with affinity set to both cores:
[attachment=7338]

And here with affinity set to only core 0:
[attachment=7337]

EDIT:
Manual frame skipping doesn't change anything in terms of gamespeed, but the number of frames drawn to my screen decreases (obviously) and it makes my GPU really quiet (down to 5-10% use with frameskip set to 2, which is about the same "skipping" I see when I set affinity to one core only), while both my cores remain at 100% use. AFAIK my CPU only supports SpeedStep as a speed feature, but the multiplier shouldn't change when setting affinity. I just double-checked that with CPU-Z: it doesn't change. Only when the CPU is completely idle does it go down to x6 instead of x9.