Dolphin, the GameCube and Wii emulator - Forums

Full Version: [Unofficial] Dolphin DX12 backend
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Hi all, I've been experimenting with adding a DirectX 12 backend to Dolphin, and finally have something to release! It can be noticeably faster (up to 50%) depending on the game, system, and settings. Binaries and source are below. It was a good way to get to know Dolphin's architecture better, and I hope it might be interesting for others to try out.

[Image: 1nzhcIV.png] [Image: UcFERyH.png]

Performance
Generally, graphics-intensive games get a nice win, while games bound by the emulated GameCube CPU (Zelda OOT from the 'bonus disk' is a good example) perform the same, since graphics wasn't on the critical path there. At higher resolutions, graphics becomes more important, so the relative improvement can increase there. In general, CPU usage is now much lower for the same workload relative to DX11/OpenGL.

Results below are from a few games at 2.5x native resolution on NVIDIA and AMD hardware (raw FPS data attached):

[Image: 5o2Hn11.png] [Image: LB4h61V.png]

Requirements
- Windows 10
- Latest graphics driver, and an AMD 7000-series, Intel HD 4400, or NVIDIA 600-series GPU or higher.
- VS 2015 Redist

Note: This doesn't specifically improve the shader compilation stutter that occurs the first time a shader is seen; it's only faster in the 'steady state'. That could definitely be improved with the extra CPU cycles now available.


This is obviously 'as is', but please reply with any bugs/issues you see. For more details, please see the GitHub readme. I hope to continue to improve the code, and pull requests would definitely be welcome.

Source: https://github.com/hdcmeta/dolphin
Download here: https://www.dropbox.com/s/gac7jufr9iob8tc/dolphin_dx12_v0.98.zip?dl=0

I've tried to make the code follow the contribution guidelines, and it should be a pretty conformant port of the DX11 backend, so I hope it can eventually end up in the main Dolphin branch. I'm open to any feedback on the initial code, and I'll try to submit a pull request in the next couple of weeks if the code looks ok.


Changelog

v0.98 (1/24/2016)
- Fix issue on certain systems where the frame rate was not properly uncapped (when vsync is disabled, and CPU set to > 100%)
- Integrate upstream changes

v0.97 (1/19/2016)
- Better tracking of CPU/GPU interactions, should resolve race-condition-induced corruption
- Small fix for a 'dirty' shutdown corrupting shader caches (which could cause a crash on the next start)
- Includes all current upstream changes
- Misc behind-the-scenes refactoring/cleanup/fixes

v0.96 (1/5/2016)
- Fix for very large texture uploads (e.g. 4096x4096 custom textures)
- Misc behind-the-scenes refactoring/cleanup

v0.95 (1/3/2016)
- Prevent backend from showing up on systems without D3D12 support.

v0.94 (1/2/2016)
- Fixed bug in EFB depth buffer readback, could cause misc corruption issues.

v0.93 (1/1/2016)
- Fixed error in texture readback, was causing some misc corruption issues.
- Further refactored shader cache, and fixed some issues that were causing it to not cache shaders (causing constant regeneration).

v0.92 (12/30/2015)
- Fixed an issue when a game sets a viewport with a depth range other than MIN_DEPTH/MAX_DEPTH. This caused incorrect results in MadWorld, possibly others.
- Lots of refactoring behind the scenes, based on pull request feedback. Nothing should have regressed (verified in local testing).

v0.91 (12/22/2015)
- Fixed full-screen operation when starting in full-screen mode (thanks rlaugh0095)

v0.90 (12/21/2015)
- Several rendering correctness bugs fixed. If you were seeing incorrect rendering before, there's a decent chance it has been fixed.
- Fixed full-screen operation
- Added a clamp for texture copies into too-small destinations. A 'real' fix needs to occur above the VideoBackend layer, and is in progress here: https://github.com/dolphin-emu/dolphin/pull/3355
- Moved to new versioning scheme.

12/21/2015
- Further fix to multisampling. Not claiming anything this time :-). Fixes a crash when the (multisampled) Color EFB is accessed (fixes crash in SMG).
- Fixes a possible corner-case crash when a game presents frames without first uploading any texture data.
- Fixed small bug that could cause unnecessary stalls/performance loss in certain cases.

12/20/2015
- Multi-sampling 'really' fixed. Resolves an issue in titles that sample from the depth buffer.
- Fixed texture upload race condition.

12/18/2015
- Multi-sampling fixed (though appears buggy on AMD hardware, YMMV)
- Per-pixel lighting fixed
- Fixed issue where the CPU could get too far ahead of the GPU, causing corruption.

12/17/2015
- Initial release
Great news and nice work! :)
Performance enhancement sounds sexy, especially for Mario Galaxy.

Well, if some day Dolphin officially has a DX12 backend, I guess I will consider upgrading to Windows 10.
If you're interested in this not being unofficial, maybe open a Pull Request tagged RFC or WIP. I don't have Windows 10 to test this, but the gains seem non-trivial and make sense based on your graphs. OoT has zero GFX overhead, Crazy Taxi has very little, Twilight Princess and Super Mario Galaxy are very heavy.
How useful are performance comparisons when your code is littered with TODOs that might make things slower once implemented? I would take your graphs more seriously if they were done on games that never hit one of your TODO paths that has an equivalent implementation in our other backends.
Also, looking at the code, my bet is that a lot of the benefits come from the queued command list implementation. I'm curious how D3D12 compares if you make the D3D code run in the same thread as the GPU emulation code like other backends do.

Not that it's a bad thing -- but if threading the GPU backend code has a big impact with D3D12, I wouldn't be surprised if it had the same impact on GL/D3D11.
> Open to any feedback on the initial code, and I'll try to submit a pull request in the next couple weeks if the code looks ok.
Please open the PR soon, so you'll get more early feedback :D

*very* *very* nice work :D

EDIT: Also, please visit us on #dolphin-dev @freenode on IRC. Most development talk is there, and currently we're only talking about d3d12 :P
Wow, amazing! I hope we can find a way to make it work on most games.
(12-18-2015, 07:35 PM)JMC47 Wrote: If you're interested in this not being unofficial, maybe open a Pull Request tagged RFC or WIP. I don't have Windows 10 to test this, but the gains seem non-trivial and make sense based on your graphs. OoT has zero GFX overhead, Crazy Taxi has very little, Twilight Princess and Super Mario Galaxy are very heavy.

Ok, I'll try to do that this weekend then.

(12-18-2015, 07:43 PM)delroth Wrote: How useful are performance comparisons when your code is littered with TODOs that might make things slower once implemented? I would take your graphs more seriously if they were done on games that never hit one of your TODO paths that has an equivalent implementation in our other backends.

Good question - honestly, most of the TODOs are to make things faster :-). The only TODO I can think of that will decrease performance is to implement performance queries, and the impact from that should be pretty small. Are there specific TODOs you can see being hit on the above titles? I can hopefully fix those up pretty quickly.

(12-18-2015, 07:48 PM)delroth Wrote: Also, looking at the code, my bet is that a lot of the benefits come from the queued command list implementation. I'm curious how D3D12 compares if you make the D3D code run in the same thread as the GPU emulation code like other backends do.

Not that it's a bad thing -- but if threading the GPU backend code has a big impact with D3D12, I wouldn't be surprised if it had the same impact on GL/D3D11.

Yeah, it was really important to have this threaded, but I found this is actually exactly what D3D11 and OpenGL already (automatically) do. For those APIs, the graphics driver creates its own background thread and does a similar sort of thing (the main thread queues the work, the background thread processes it). The difference is that in D3D12, the app has to do this itself.
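
As a rough illustration of that pattern (not the actual backend code), here is a minimal C++ sketch in which the main thread queues work and a background thread processes it. The names `Command`, `CommandQueueWorker`, `Submit` and `WaitIdle` are made up for this example, and plain callbacks stand in for what would really be recorded D3D12 command lists.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a recorded graphics command; not a Dolphin or D3D12 type.
using Command = std::function<void()>;

class CommandQueueWorker
{
public:
    CommandQueueWorker() : m_worker([this] { Run(); }) {}

    ~CommandQueueWorker()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_shutdown = true;
        }
        m_wake.notify_one();
        m_worker.join();  // the worker drains any remaining commands first
    }

    // Called on the main (emulation) thread: queue the work and return immediately.
    void Submit(Command cmd)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_pending.push(std::move(cmd));
        }
        m_wake.notify_one();
    }

    // Block until everything queued so far has executed (e.g. before a readback).
    void WaitIdle()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_idle.wait(lock, [this] { return m_pending.empty() && !m_busy; });
    }

private:
    // Background thread: pop commands in submission order and execute them.
    void Run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;)
        {
            m_wake.wait(lock, [this] { return m_shutdown || !m_pending.empty(); });
            if (m_pending.empty())
                return;  // shutdown requested and nothing left to do

            Command cmd = std::move(m_pending.front());
            m_pending.pop();
            m_busy = true;

            lock.unlock();
            cmd();  // run outside the lock so Submit() never waits on execution
            lock.lock();

            m_busy = false;
            if (m_pending.empty())
                m_idle.notify_all();
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::condition_variable m_idle;
    std::queue<Command> m_pending;
    bool m_busy = false;
    bool m_shutdown = false;
    std::thread m_worker;  // declared last so the other members exist before it starts
};
```

The main thread would call `Submit` for each piece of work and only call `WaitIdle` when it genuinely needs the results, which is what keeps the submission cost off the critical path.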

If you want to experiment with turning off the automatic threading on D3D11, you can pass in the "D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS" flag at device creation time.

(12-18-2015, 08:07 PM)degasus Wrote: > Open to any feedback on the initial code, and I'll try to submit a pull request in the next couple weeks if the code looks ok.
Please open the PR soon, so you'll get more early feedback :D

*very* *very* nice work :D

EDIT: Also, please visit us on #dolphin-dev @freenode on IRC. Most development talk is there, and currently we're only talking about d3d12 :P

Will do!
Nice work. Looking really good.
Excellent work! Open a PR please!

I'll be testing against my GTX 770 this weekend. :)