Hi degasus,
Sorry this took so long; busy times.
First, the measurements:
To minimize unwanted high-latency side effects, I created a ramdisk containing the ISO (RMGP01) and set Dolphin's CPU affinity to CPU 7. (My machine has 4 hyper-threaded cores, i.e. 8 logical CPUs numbered 0 to 7, so 7 refers to the last one, just to avoid confusion.) I then profiled only this one CPU with perf and let it take 500k samples (cycle count only) in the save file selection dialog, which appears to render deterministically with its background effects. I also set Dolphin's internal resolution to 2x to create some workload for the CPU.
Code:
$ taskset 0x80 ./ubuntu_build/usr/games/dolphin-emu-nogui tmp/RMGP01.wbfs
$ perf top -C 7 > results_ubuntu
$ taskset 0x80 ./git_build/dolphin/Build5/Binaries/dolphin-emu-nogui tmp/RMGP01.wbfs
$ perf top -C 7 > results_git
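For reference, the mask passed to taskset has one bit per logical CPU, so pinning to CPU 7 means setting bit 7, which gives 0x80. A small sketch to compute such a mask, assuming a POSIX shell with arithmetic expansion (the helper name `cpu_mask` is hypothetical, not part of taskset or Dolphin):

```shell
# Print the taskset affinity mask (hex) for a single logical CPU.
# cpu_mask is a hypothetical helper name used only for illustration.
cpu_mask() {
    # Bit N of the mask selects logical CPU N.
    printf '0x%x\n' "$((1 << $1))"
}

cpu_mask 7   # prints 0x80, the mask used with taskset above
```

Equivalently, `taskset -c 7 <command>` accepts the CPU number as a list instead of a mask.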
Results:
Code:
$ cat results_git
5,05% dolphin-emu-nogui [.] OGL::ProgramShaderCache::UploadConstants
3,49% [kernel] [k] __i915_wait_request
1,99% dolphin-emu-nogui [.] OpcodeDecoder_Run<false>
1,56% [kernel] [k] i915_parse_cmds
1,50% libpthread-2.19.so [.] pthread_mutex_lock
1,49% libc-2.19.so [.] memset
1,19% dolphin-emu-nogui [.] GetCRC32
1,13% dolphin-emu-nogui [.] VertexLoaderManager::RunVertices
1,11% dolphin-emu-nogui [.] RunGpuLoop
1,09% [kernel] [k] gen6_ring_get_seqno
1,02% libdrm_intel.so.1.0.0 [.] 0x00000000000060ef
1,02% dolphin-emu-nogui [.] TextureCache::Load
1,00% dolphin-emu-nogui [.] CommandProcessor::GatherPipeBursted
1,00% perf-3862.map [.] 0x000000004187608e
0,88% dolphin-emu-nogui [.] LoadBPReg
0,82% dolphin-emu-nogui [.] JitInterface::CompileExceptionCheck
0,71% dolphin-emu-nogui [.] GeneratePixelShader<ShaderUid<pixel_shader_uid_data> >
0,67% dolphin-emu-nogui [.] CommandProcessor::SetCPStatusFromGPU
0,66% libc-2.19.so [.] __memcpy_sse2_unaligned
0,65% dolphin-emu-nogui [.] VertexManager::CalculateZSlope
0,60% dolphin-emu-nogui [.] OGL::ProgramShaderCache::SetShader
0,59% perf-3862.map [.] 0x000000004263efe6
0,58% dolphin-emu-nogui [.] OGL::Renderer::UpdateEFBCache
0,55% dolphin-emu-nogui [.] ReadDataFromFifo
0,51% perf-3862.map [.] 0x0000000042a7a9e1
0,51% dolphin-emu-nogui [.] IndexGenerator::AddStrip<true>
0,44% perf-3862.map [.] 0x0000000042a7864d
0,43% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
0,41% libc-2.19.so [.] __memcmp_sse4_1
0,34% dolphin-emu-nogui [.] LoadXFReg
0,34% [kernel] [k] copy_user_enhanced_fast_string
0,34% dolphin-emu-nogui [.] GPFifo::Write64
0,32% libdrm_intel.so.1.0.0 [.] 0x000000000000853c
0,32% dolphin-emu-nogui [.] VertexManager::PrepareForAdditionalData
0,31% [kernel] [k] i915_gem_execbuffer_relocate_entry
0,31% dolphin-emu-nogui [.] GetVertexShaderUid
0,30% dolphin-emu-nogui [.] GPFifo::FastCheckGatherPipe
0,29% i965_dri.so [.] 0x000000000033bca8
0,27% perf-3862.map [.] 0x0000000042a786f1
0,26% dolphin-emu-nogui [.] ZeldaAudioRenderer::ApplyReverb
0,26% [kernel] [k] __list_del_entry
0,25% dolphin-emu-nogui [.] VertexShaderManager::TransformToClipSpace
0,24% libc-2.19.so [.] _int_malloc
0,23% perf-3862.map [.] 0x0000000042a7b37e
0,23% dolphin-emu-nogui [.] CommandProcessor::SetCPStatusFromCPU
0,23% libc-2.19.so [.] __memmove_ssse3_back
0,22% dolphin-emu-nogui [.] Renderer::EFBToScaledX
0,22% libdrm_intel.so.1.0.0 [.] 0x00000000000055ae
0,21% dolphin-emu-nogui [.] VertexManager::Flush
0,20% perf-3862.map [.] 0x000000004187617a
0,20% dolphin-emu-nogui [.] TextureCache::DoPartialTextureUpdates
0,20% perf-3862.map [.] 0x0000000042a7a9f3
Code:
$ cat results_ubuntu
2,13% [kernel] [k] __i915_wait_request
1,46% dolphin-emu-nogui [.] 0x000000000036c420
0,99% dolphin-emu-nogui [.] std::less<unsigned long>::operator()
0,83% libc-2.19.so [.] __memcpy_sse2_unaligned
0,76% dolphin-emu-nogui [.] 0x0000000000440c3b
0,71% [kernel] [k] i915_parse_cmds
0,68% [kernel] [k] gen6_ring_get_seqno
0,61% libc-2.19.so [.] memset
0,60% libpthread-2.19.so [.] pthread_mutex_lock
0,53% dolphin-emu-nogui [.] 0x0000000000440c14
0,51% perf-5085.map [.] 0x00000000428d808e
0,50% dolphin-emu-nogui [.] 0x0000000000440ca6
0,49% dolphin-emu-nogui [.] 0x0000000000440cbb
0,47% dolphin-emu-nogui [.] 0x0000000000440ccc
0,47% dolphin-emu-nogui [.] 0x0000000000440d6a
0,46% libdrm_intel.so.1.0.0 [.] 0x00000000000060ef
0,45% dolphin-emu-nogui [.] 0x000000000031172c
0,41% dolphin-emu-nogui [.] 0x0000000000314a1c
0,36% perf-5085.map [.] 0x0000000040a74606
0,35% dolphin-emu-nogui [.] std::swap<int>
0,33% dolphin-emu-nogui [.] 0x0000000000314a09
0,33% perf-5085.map [.] 0x0000000040eb57ed
0,31% dolphin-emu-nogui [.] std::__atomic_base<unsigned char*>::operator+=
0,29% dolphin-emu-nogui [.] 0x0000000000440c0f
0,29% dolphin-emu-nogui [.] 0x0000000000440c9b
0,29% perf-5085.map [.] 0x0000000040eb3459
0,28% dolphin-emu-nogui [.] std::_Hashtable<unsigned int, unsigned int, std::allocator<unsigned int>, std::__detail::_Identity, std::equal_to<unsigned int>, std::hash<unsigned
0,27% libc-2.19.so [.] __memcmp_sse4_1
0,27% dolphin-emu-nogui [.] std::__atomic_base<unsigned char*>::operator unsigned char*
0,26% dolphin-emu-nogui [.] 0x0000000000145cdb
0,26% dolphin-emu-nogui [.] 0x0000000000440cd6
0,26% dolphin-emu-nogui [.] 0x0000000000440da0
0,25% dolphin-emu-nogui [.] 0x000000000036c400
0,24% dolphin-emu-nogui [.] 0x0000000000440cbe
0,23% libc-2.19.so [.] __memmove_ssse3_back
0,23% dolphin-emu-nogui [.] 0x00000000003149fc
0,22% dolphin-emu-nogui [.] 0x00000000003b14dc
0,21% dolphin-emu-nogui [.] 0x000000000036c3a9
0,21% dolphin-emu-nogui [.] 0x0000000000440d74
0,19% dolphin-emu-nogui [.] 0x00000000003149dd
0,19% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
0,19% dolphin-emu-nogui [.] 0x0000000000440c0b
0,18% dolphin-emu-nogui [.] std::array<short, 88ul>::operator[]
0,18% dolphin-emu-nogui [.] 0x0000000000440d44
0,18% dolphin-emu-nogui [.] 0x0000000000440bf5
0,17% dolphin-emu-nogui [.] 0x0000000000440c02
0,17% dolphin-emu-nogui [.] 0x000000000011a143
0,17% dolphin-emu-nogui [.] 0x0000000000440c19
0,17% i965_dri.so [.] 0x0000000000184e56
0,17% dolphin-emu-nogui [.] 0x0000000000320ad2
0,17% dolphin-emu-nogui [.] 0x000000000015042a
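Since the Ubuntu profile is spread over many unresolved addresses, it may help to compare the two runs per shared object rather than per symbol. A minimal sketch, assuming the perf output was saved in the plain-text format shown above (percentage with a decimal comma, then the shared object name); `sum_by_dso` is a hypothetical helper name:

```shell
# Sum perf percentages per shared object (second column) and sort descending.
# Assumes lines like "5,05% dolphin-emu-nogui [.] Symbol" with a decimal comma.
# Reads the files given as arguments, or stdin if none are given.
sum_by_dso() {
    LC_ALL=C awk '{
        pct = $1
        sub(/%$/, "", pct)      # drop the trailing percent sign
        sub(/,/, ".", pct)      # decimal comma -> decimal point
        total[$2] += pct
    }
    END {
        for (dso in total) printf "%.2f %s\n", total[dso], dso
    }' "$@" | LC_ALL=C sort -rn
}
```

Run it as `sum_by_dso results_git` and `sum_by_dso results_ubuntu`. Note that the sums only cover the entries that were actually saved, not the whole profile.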
Observations:
- Emulation speed is at 100% for the git build and at roughly 60% for the Ubuntu build.
- I cannot really make sense of the addresses perf reports. Interpreting them as absolute addresses ends up in unallocated memory (checked both statically and within gdb), and interpreting them as offsets from the start of the .text section does not lead me to anything useful either (they are not aligned to function beginnings). Before I invest more time into this problem: do you have an idea how I could resolve the addresses in the Ubuntu Dolphin build more easily than by reverse engineering the binary? Do you maybe have an Ubuntu build with symbols?
- In one of my earlier measurements, I noticed that Dolphin spends a good amount of time waiting for fgets to return. Moving the image to a ramdisk gave an immediate 10% speed boost, so you might want to consider a new config option that makes Dolphin read the whole image into memory instead of reading it from disk.
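Without symbols, perf lists each hot instruction address on its own line, so samples from a single function can show up as many neighbouring entries (e.g. the 0x440bf5 to 0x440da0 range in the Ubuntu results). As a rough heuristic, not a substitute for real symbols, the unresolved addresses can be grouped into fixed-size windows to see which regions are hot. The 4 KiB window size and the helper name `bucket_addrs` are arbitrary assumptions:

```shell
# Group unresolved dolphin-emu-nogui addresses from a saved perf listing
# into 4 KiB windows and sum their percentages per window (heuristic only).
# Assumes lines like "0,76% dolphin-emu-nogui [.] 0x0000000000440c3b".
bucket_addrs() {
    LC_ALL=C awk '$2 == "dolphin-emu-nogui" && $4 ~ /^0x[0-9a-f]+$/ {
        pct = $1
        sub(/%$/, "", pct); sub(/,/, ".", pct)
        # Parse the hex address by hand (strtonum is gawk-only).
        addr = 0
        hex = substr($4, 3)
        for (i = 1; i <= length(hex); i++)
            addr = addr * 16 + index("0123456789abcdef", substr(hex, i, 1)) - 1
        win = int(addr / 4096) * 4096
        total[win] += pct
    }
    END {
        for (w in total) printf "0x%x %.2f\n", w, total[w]
    }' "$@" | LC_ALL=C sort -k2,2 -rn
}
```

For example, `bucket_addrs results_ubuntu` would lump the 0x440bf5 to 0x440da0 entries into a single 0x440000 window, which at least hints at where to start disassembling.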
I can also happily provide the two binaries and my .dolphin config if you need them.
Cheers,
MX