How to build/optimize for Haswell (AVX2)?
06-18-2014, 01:47 PM
(This post was last modified: 06-18-2014, 01:52 PM by kinkinkijkin.)
depending on the maximum IPC, clock rate, and number of cores, it could be a lot. Like, my computer can get a whole 42000000000-ish things done in 3 seconds at its fastest, or even up to 368400000000 if you include the GPU.
that's a lot of things, don't you think?
in a perfect world we would all be piles of sand with no ability to form coherent bodies of body
06-18-2014, 03:33 PM
Depends on the task. While it can be significant for some tasks, it'll be a rounding error for others. In the case of Dolphin, it's not statistically relevant: the increase in benchmarked performance is less than 1%.
Btw, I see what you did there :p A quad-core CPU running @ 3.5 GHz gives you 42000000000 cycles in 3 seconds, though you're not taking into account superscalar operations. It could theoretically have an IPC greater than 1, in which case your estimate is smaller than the "absolute maximum".
06-18-2014, 03:39 PM
I'm using a dual-core processor with a maximum IPC of 2; I forget which instruction can be done twice per clock, though. I was never successful at getting this thing to unlock the extra cores, and I sure as hell wouldn't be able to hold it above stock on this mobo (unlocking usually pushes this one to 125 W consumption at stock, which is where it gets extremely dangerous for this mobo).
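For what it's worth, the figures in this exchange can be sanity-checked with a few lines of Python. This is only a rough sketch of the "cores × clock × IPC × time" upper bound being discussed; the function name is made up for illustration, and the 3.5 GHz clock for the dual-core case is an assumption carried over from the quad-core guess above, not something stated for this machine.

```python
def peak_instructions(cores: int, clock_hz: float, ipc: float, seconds: float) -> float:
    """Theoretical upper bound on instructions completed: cores * clock * IPC * time."""
    return cores * clock_hz * ipc * seconds

# Quad-core reading from the reply above: 4 cores, 3.5 GHz, IPC of 1, 3 seconds.
quad = peak_instructions(cores=4, clock_hz=3.5e9, ipc=1, seconds=3)

# Dual-core with a maximum IPC of 2, as described in this post.
# The 3.5 GHz clock is an assumption, not a stated spec.
dual = peak_instructions(cores=2, clock_hz=3.5e9, ipc=2, seconds=3)

print(f"quad-core, IPC 1: {quad:.0f}")
print(f"dual-core, IPC 2: {dual:.0f}")
```

Both readings land on the same 42000000000 figure, which is why the number alone cannot tell a quad-core with an IPC of 1 apart from a dual-core with an IPC of 2.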
06-18-2014, 04:49 PM
kinkinkijkin Wrote:forget which instruction can be done twice per clock
It's probably not that the instruction "normally" finishes in half a cycle, but that superscalar execution lets two of them complete in the same cycle. I'm not too knowledgeable on x86 or x64 architectures or assembly (more so ARM) but that's what I gather when we're talking about Intel or AMD. If you're interested, have a look at this extensive document about instruction latencies (measured in core cycles): http://www.agner.org/optimize/instruction_tables.pdf
06-30-2014, 07:51 AM
(06-17-2014, 05:59 AM)tecfreak Wrote:
(06-17-2014, 05:11 AM)shuffle2 Wrote: it would be interesting if you posted the same dolphin sources built with normal settings and then with march=haswell. I have a very hard time believing it really "runs smoother".
why do we have such a performance drop with HT on, btw? Is it the same for Linux and OSX systems or an isolated Windows issue?
Maybe Dolphin should only use one thread per core when HT is enabled?
06-30-2014, 09:40 AM
06-30-2014, 11:07 AM
Oehr Wrote:why do we have such a performance drop with HT on, btw?
Shared resources. For example, half the L1 Dcache is reserved for each thread per physical core.
Oehr Wrote:Maybe Dolphin should only use one thread per core when HT is enabled?
The OS controls thread delegation in this case, and most modern OSes already do this. The only way to completely remove any possibility of a performance hit from HT is to turn it off.
"Normally if given a choice between doing something and nothing, I’d choose to do nothing. But I would do something if it helps someone else do nothing. I’d work all night if it meant nothing got done."
-Ron Swanson
"I shall be a good politician, even if it kills me. Or if it kills anyone else for that matter."
-Mark Antony
07-01-2014, 08:38 AM
thanks naturalviolence and tecfreak!
I thought that Dolphin had additional issues with systems beyond 4 threads/cores. So comparing a 4-core and a 6- or 8-core CPU with identical single-core performance, does Dolphin actually get another boost, or is 4 cores its sweet spot?
07-01-2014, 09:28 AM
Dolphin can use:
1 thread for CPU emulation (and optionally DSP emulation)
1 thread for GPU emulation
optionally 1 thread for DSP emulation
That's basically one thread for each major chip in the Wii/GC. The DSP thread will basically never be doing more work than either of the other two, so it never fills up a third core. That can leave spare cores if you're on anything other than a dual core, which means that any other programs you're running, plus the OS, can use those cores and avoid hindering Dolphin. All this together means that Dolphin will run mostly the same on a dual and quad core chip with the same single-threaded IPS if nothing else is going on, and throwing more cores at it just gives extra room for other stuff to happen at the same time.
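The reasoning above can be illustrated with a toy model in Python. This is only a sketch under made-up assumptions, not how Dolphin schedules anything: the function name, the "work units" and the per-thread costs are all hypothetical, and the model ignores the OS and any other programs. It just encodes two ideas from the post: one thread cannot use more than one core at a time, and with fewer cores than threads the total work has to share them.

```python
def emulation_speed(cores: int, core_speed: float, thread_costs: list[float]) -> float:
    """Toy estimate of emulated frames per second.

    core_speed   -- work units one core finishes per second (hypothetical unit)
    thread_costs -- work units each emulation thread needs per frame
    """
    # A single thread cannot run on more than one core at a time,
    # so the heaviest thread caps the frame rate.
    heaviest_thread_bound = core_speed / max(thread_costs)
    # With fewer cores than threads, the total work has to share the cores.
    shared_cores_bound = cores * core_speed / sum(thread_costs)
    return min(heaviest_thread_bound, shared_cores_bound)

# Hypothetical per-frame costs for the CPU, GPU and DSP threads.
costs = [1.0, 0.75, 0.25]

for cores in (1, 2, 4, 8):
    print(cores, emulation_speed(cores, core_speed=60.0, thread_costs=costs))
```

With these invented costs the estimate stops improving at two cores: going from 2 to 4 or 8 cores changes nothing in the model, and the extra cores are simply left free for everything else running on the machine.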
OS: Windows 10 64 bit Professional
CPU: AMD Ryzen 5900X
RAM: 48GB
GPU: Radeon 7800 XT