Dolphin, the GameCube and Wii emulator - Forums

Full Version: Dolphin ICC Intel optimized builds (SSE3/4/AVX) (Latest:3.5-420 x64) [UNOFFICIAL]
DISCLAIMER: These are UNOFFICIAL Dolphin builds and come with no support from the Dolphin team. Do not report bugs to them when using these builds (if you find a bug, test an official build first). These builds are compiled from experimental development source code with my own optimizations, which may at times break things. Use at your own risk. Official Dolphin builds can be downloaded here.

I'm sharing these ICC Intel-only Windows builds in case anyone finds them useful. I've been rebuilding every few commits (using completely new source and re-applying all optimizations manually each time). The intent is to make Dolphin a little bit faster by using Intel's compiler with architecture-specific optimizations, but there may or may not be a speed difference. Feel free to test and/or offer feedback, thanks. Also, don't forget to add dsp_coef.bin and dsp_rom.bin to the User/GC/ folder if you use LLE audio.


Build Software Used:

Optimizations applied:
Download (full revision changelogs listed here):
  1. AVX+OpenMP builds require a 2nd gen Core CPU or later (Sandy Bridge or Ivy Bridge i3, i5, i7). These builds have been discontinued since they showed no speed improvement.
  2. SSSE3,SSE4.1,SSE4.2,AVX+OpenMP builds include codepaths for all CPUs from Core 2 Duo onward.
  3. AMD - possibly coming soon.
  • 420: Revision 29d43ef89727 has some game ini updates. I messed up my original attempted Zelda SS fix, so I fixed that (hopefully). Uploading my usual ICC builds, plus a standard vanilla MSVC build with the Zelda-SS-Fix which should run on both AMD and Intel.
Dolphin 3.5-420 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
Dolphin 3.5-420 [Zelda-SS-Fix] x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
Dolphin 3.5-420 [Zelda-SS-Fix] x64 MSVC Intel+AMD
Dolphin 3.5-419 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
Dolphin 3.5-420 [Zelda-SS-Fix] x64 MSVC SSE3,SSE4 Intel+AMD
  • 416: Uploading OpenMP and non-OpenMP builds. Please note this does NOT affect the OpenMP texture decoder, which is still there in all builds. My OpenMP builds have OpenMP enabled for all of Dolphin, wherever ICC feels it might benefit from parallelization. Please test the difference between OpenMP and non-OpenMP. Also uploading builds with a potential fix for the Zelda Skyward Sword crash in the silent realms (Issue 5682).
Dolphin 3.5-416 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
Dolphin 3.5-416 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX
Dolphin 3.5-416 [Zelda-SS-patch] x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP (download removed, patch was bad. fixed builds will be above.)
Dolphin 3.5-416 [Zelda-SS-patch] x64 ICC SSSE3,SSE4.1,SSE4.2,AVX (download removed, patch was bad. fixed builds will be above.)
Dolphin 3.5-413 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
  • 412: a few things changed
Dolphin 3.5-412 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
  • 402: few worthwhile changes... check commit logs for details. no AVX-only builds anymore, showed no speed difference. will clean up this post next time around probably, leaving stuff here for now. also, 428-real-wiimote-scanning has some critical fixes for windows.
Dolphin 3.5-402 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
Dolphin [real-wiimote-scanning] 3.5-428 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
  • 397: Only a couple of Linux/OSX fixes, not worth building another master. Updated the real-wiimote-scanning branch since it has a possible Windows fix. real-wiimote-scanning is still synced up to 393 master. I also tested an AVX-only build vs SSSE3,SSE4.1,SSE4.2,AVX vs vanilla; there was basically zero difference. The result is here. Feel free to show me your results if you think there is a bigger difference; for now it doesn't seem worth building AVX-only or SSE4-only builds.

Dolphin [real-wiimote-scanning] 3.5-424 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP

Dolphin [real-wiimote-scanning] 3.5-423 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
  • 395: Two minor changes, not really worth upgrading from 393. Testing new hand-coded SSE3/SSE4 optimizations in VideoCommon for potential speed improvement, but the automatic compiler optimizations should already be doing a better job. Can't hurt though.
Dolphin 3.5-395 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP
Dolphin 3.5-395 x64 ICC (Intel C++ Compiler XE 13.1) SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
Dolphin [real-wiimote-scanning] 3.5-420 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX + OpenMP (based on 393 master, includes some wiimote fixes and automatic wiimote pairing)
Dolphin [FIFO-BP] 3.5-339 x64 ICC SSSE3,SSE4.1,SSE4.2,AVX (this was requested. based on a much older master, not entirely sure what it's supposed to fix. I heard on IRC that it's a bit broken on dual core mode, but I haven't tested.)
Dolphin 3.5-393 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP
Dolphin 3.5-393 x64 ICC (Intel C++ Compiler XE 13.1) SSSE3,SSE4.1,SSE4.2,AVX + OpenMP
Dolphin 3.5-392 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP (mirror)
Dolphin 3.5-392 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP + O3 (mirror)
Dolphin 3.5-392 x64 ICC (Intel C++ Compiler XE 13.1) SSE4.2 + OpenMP (mirror)
Dolphin 3.5-392 x64 ICC (Intel C++ Compiler XE 13.1) SSE4.2 + OpenMP + O3 (mirror)
Dolphin 3.5-380 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP
Dolphin 3.5-380 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP + O3
Dolphin 3.5-380 x64 ICC (Intel C++ Compiler XE 13.1) SSE4.2 + OpenMP
Dolphin 3.5-380 x64 ICC (Intel C++ Compiler XE 13.1) SSE4.2 + OpenMP + O3
Dolphin 3.5-375 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP
Dolphin 3.5-375 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP + O3
Dolphin 3.5-375 x64 ICC (Intel C++ Compiler XE 13.1) SSE4.2 + OpenMP
Dolphin 3.5-375 x64 ICC (Intel C++ Compiler XE 13.1) SSE4.2 + OpenMP + O3
Dolphin 3.5-374 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP
Dolphin 3.5-374 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP + O3
  • 368: Alternate wiimote timing still gone
Dolphin 3.5-368 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP
Dolphin 3.5-368 x64 ICC (Intel C++ Compiler XE 13.1) AVX + OpenMP + O3
  • 367: Alternate wiimote timing was removed, may be buggy in some games requiring it.
Dolphin 3.5-367 x64 ICC AVX + OpenMP
Dolphin 3.5-367 x64 ICC AVX + OpenMP + O3

Dolphin 3.5-358 x64 ICC AVX + OpenMP
Dolphin 3.5-358 x64 ICC AVX + OpenMP + O3

Dolphin 3.5-356 x64 ICC AVX + OpenMP

Dolphin 3.5-350 x64 ICC AVX + OpenMP
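
For anyone wondering what the OpenMP part of these builds actually does, here is a rough sketch of the kind of loop that benefits from it. This is NOT Dolphin's texture decoder: the function name `DecodeI8ToRGBA` and the 8-bit intensity format are made up for illustration. The idea is only that rows are independent, so the outer loop can be split across threads.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical example (not Dolphin code): expand 8-bit intensity texels
// to 32-bit values with the intensity replicated into three channels and
// an opaque alpha byte on top.
std::vector<std::uint32_t> DecodeI8ToRGBA(const std::vector<std::uint8_t>& src,
                                          int width, int height)
{
    std::vector<std::uint32_t> dst(static_cast<std::size_t>(width) * height);

    // Each row writes to its own part of dst, so iterations do not conflict.
    // If OpenMP is not enabled at compile time (/Qopenmp on ICC for Windows,
    // -fopenmp on GCC/Clang), the pragma is ignored and the loop runs
    // serially with the same result.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            const std::uint32_t v = src[i];
            dst[i] = 0xFF000000u | (v << 16) | (v << 8) | v;
        }
    }
    return dst;
}
```

Whether this is faster depends on texture size and thread start-up cost, which is why comparing the OpenMP and non-OpenMP builds is worth doing.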
Do any of these builds work with AMD processors? I have a Bulldozer and can't get any of these to even start. Double click and nothing happens, not even an error.
Quote: I'm sharing these ICC Intel-only Windows builds

Please read the post before asking such questions.
Intel Processor-Specific Optimizations means they won't run on AMD. I could offer separate builds without these, but ICC-compiled binaries used to take the slowest codepath if they didn't detect GenuineIntel in the CPUID. Is this still the case on newer Intel C++ compilers? If the GenuineIntel patch is still required, feel free to provide it and I'll make AMD builds too.

Also keep in mind that in my own testing these builds (with Intel optimizations) have been within 3%-5% of the speed of the official builds, so AMD-specific builds, being less optimized, may show no difference at all. I've only done testing on my fast CPU, so maybe slower CPUs show a bigger difference, I don't know... but sometimes just a few percent can make the difference between something being playable or not.
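
For reference, the "GenuineIntel" string mentioned above is the CPU vendor ID: CPUID leaf 0 returns it as 12 ASCII bytes spread over EBX, EDX and ECX, in that order. A minimal sketch of how such a check can be put together follows. The helper names `VendorFromRegs` and `IsGenuineIntel` are made up for illustration, this is not ICC's actual dispatcher code, and the register values are passed in as plain integers so the snippet doesn't depend on any compiler intrinsic.

```cpp
#include <cstdint>
#include <string>

// Append the four bytes of a register, lowest byte first, as characters.
static void AppendReg(std::string& out, std::uint32_t reg)
{
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((reg >> (8 * i)) & 0xFF));
}

// Hypothetical helper: build the 12-character vendor string from the
// CPUID leaf 0 register values. The order is EBX, EDX, ECX.
std::string VendorFromRegs(std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx)
{
    std::string vendor;
    AppendReg(vendor, ebx);
    AppendReg(vendor, edx);
    AppendReg(vendor, ecx);
    return vendor;
}

// Hypothetical helper: the kind of vendor test discussed in this thread.
bool IsGenuineIntel(const std::string& vendor)
{
    return vendor == "GenuineIntel";
}
```

An Intel CPU reports EBX=0x756e6547, EDX=0x49656e69, ECX=0x6c65746e, which decodes to "GenuineIntel". An AMD CPU reports 0x68747541, 0x69746e65, 0x444d4163, which decodes to "AuthenticAMD", so a vendor test like this fails on AMD regardless of which instruction sets the chip supports.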
I really appreciate these builds. The Last Story runs at a solid frame rate in pretty much every situation now.
If you're not adding an /arch switch with /Qax then the baseline path will be SSE2, which non-Intel CPUs will take. Agner Fog's optimizing manual has a way to patch that out from the source code, but I don't have an AMD CPU to test with so I haven't bothered. This will patch it in the exe.
new builds up, 402-master and 428-real-wiimote-scanning

(02-15-2013, 05:17 PM)lamedude Wrote: If you're not adding an /arch switch with /Qax then the baseline path will be SSE2, which non-Intel CPUs will take. Agner Fog's optimizing manual has a way to patch that out from the source code, but I don't have an AMD CPU to test with so I haven't bothered. This will patch it in the exe.
Are you sure that's still the behavior on ICC version 11 and higher? Hmm, well, I guess it can't hurt. Thanks, I will maybe give it a try whenever the real-wiimote-scanning branch gets merged into master (maintaining two builds is already getting to be a PITA). I'll probably do /arch:SSE3 with /QaxAVX for the AMD build, so that it scales up from SSE3 to AVX (there's no /arch:SSSE3 unfortunately, otherwise I'd do that).
The AMD settlement just made Intel put up a notice that's on almost every ICC page; the GenuineIntel check is still there.
All the /Q[a]x options do have an /arch counterpart, but the ones not listed in the IDE would only be useful for Bobcat and VIA CPUs. All other pre-BD AMD CPUs only went up to SSE3.
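
To make the baseline vs. alternate codepath idea concrete, here is a sketch of a dispatcher that picks a path purely from the CPUID feature bits instead of the vendor string. It is an illustration only, not what ICC generates: `CodePath` and `SelectCodePath` are made-up names, and per the discussion above ICC's own dispatcher also looks at the vendor. The bit positions are the documented CPUID leaf 1 ECX flags (SSE3 = bit 0, SSSE3 = bit 9, SSE4.1 = bit 19, SSE4.2 = bit 20, OSXSAVE = bit 27, AVX = bit 28).

```cpp
#include <cstdint>

// Hypothetical set of codepaths, lowest to highest.
enum class CodePath { SSE2, SSE3, SSSE3, SSE41, SSE42, AVX };

// Pick the highest codepath the CPU reports, ignoring the vendor.
// leaf1_ecx:    ECX as returned by CPUID leaf 1.
// os_saves_ymm: true if the OS has enabled XMM and YMM state saving
//               (XCR0 bits 1 and 2, read with XGETBV); AVX must not be
//               used without it even if the CPU flag is set.
CodePath SelectCodePath(std::uint32_t leaf1_ecx, bool os_saves_ymm)
{
    const auto has = [leaf1_ecx](int bit) -> bool {
        return ((leaf1_ecx >> bit) & 1u) != 0;
    };

    if (has(28) && has(27) && os_saves_ymm) return CodePath::AVX;
    if (has(20)) return CodePath::SSE42;
    if (has(19)) return CodePath::SSE41;
    if (has(9))  return CodePath::SSSE3;
    if (has(0))  return CodePath::SSE3;
    return CodePath::SSE2; // every x64 CPU supports SSE2, so this is the baseline
}
```

With a feature-only check like this, a pre-Bulldozer AMD CPU that reports just SSE3 would land on the SSE3 path, and a Bulldozer would land on AVX, instead of both falling back to the baseline.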
lamedude already posted that patch >.>
A short test showed that your 3.5-413 ICC build runs faster than the unoptimized official 3.5-413 build on my mobile Core i7 (Sandy Bridge). Tested in Zelda SS.

On the Sandship, both builds start at 28 FPS. If you go outside, the standard build sometimes drops to 23 FPS, while the ICC build only drops to 26 FPS, which is great. Zelda SS has never run faster, but unfortunately the silent realm crashes in DX9 and DX11 are still unfixed, and OpenGL runs much slower.

EDIT: typo corrected