Dolphin, the GameCube and Wii emulator - Forums

Full Version: Technical background of graphics card requirements
Hi,

I was wondering what the reasons for the graphics card requirements of Dolphin are. It's easily understandable that the CPU requirements are a multiple of those on the original platform, because all the instructions need to be emulated. However, why are the graphics card requirements also much higher? Isn't all of the actual emulation performed on the CPU?
The corresponding thread seems to suggest that video memory bandwidth is the relevant specification. Why is that?

I'm just interested in this and it would be cool if someone could write a few words about it (which are a bit more in-depth than "it's an emulator and that's why it's slower").

Greetings
The GPU requirements actually aren't high. You could get away with most Sandy Bridge IGPs as long as you only use 1x internal resolution. If you want to use 3x or 4x IR, you'll need a pretty decent video card.

This should help in determining what you can do. http://forums.dolphin-emu.org/showthread.php?tid=20712
Hey,

This was less about "how can I tweak the settings to get better performance" and more about "which of the things Dolphin has to do are responsible for the graphics card requirements".
On my notebook, the graphics card is holding back the processor. I don't expect this can be changed in any way, but I'd just be interested in why that is (simply because I'd like to know). Other games (like, whatever, Xonotic) run at full resolution with much better performance, and I was wondering what the main uses for graphics processing power in Dolphin are (and, especially, in which areas it has to do much more work than a native application). The graphics card memory doesn't even seem to be close to full, for example.

Thanks and Greetings
Sven
(01-10-2012, 08:07 AM)Runo Wrote: http://forums.dolphin-emu.org/showthread.php?tid=18414
Try this one then.

Hey,

yeah, I found that thread. However, I compared my card (NVS 3100M) to some of those listed in the thread for 1x IR (like the GF 8500 GT), and it seems well above the requirements (at least judging by, whatever, 3DMark06 score or something else; I'd compare the exact specs, but they're very difficult to find and very difficult to compare). Nevertheless, it seems to be the bottleneck, as another computer with a better graphics card and a worse CPU performs better, and yet another with the same CPU and a better graphics card performs better too (same settings).

Still, that wasn't really my question. :) Those are all lists of benchmarks and of what should or could work -- which doesn't really matter. It doesn't work, and it probably can't be changed. I'd just be interested in what kinds of heavy tasks are performed on the graphics card compared with a native application.

Greetings

Edit: Reading Wikipedia, it seems the card's 64-bit memory bus limits its throughput to 12.8 GB/s. That's still more than what's listed in the requirements thread, though only slightly. That's the kind of thing I'd be interested in, for example: why is so much memory bandwidth needed here?
Quote:Hi,

I was wondering what the reasons for the graphics card requirements of Dolphin are. It's easily understandable that the CPU requirements are a multiple of those on the original platform, because all the instructions need to be emulated. However, why are the graphics card requirements also much higher?

You just said why: just like the CPU instructions, the GPU instructions also need to be emulated.

Quote: Isn't all of the actual emulation performed on the CPU?

That depends on what you mean by that.

I think first we need to establish some basics that you may or may not already know.

PC applications that use a GPU (video games, for example) are programs written in a typical programming language (C/C++/C#/Java/Python/whatever). Whenever the programmer needs to do something on the GPU instead of the CPU (draw a textured rectangle, for example), they write a shader for that task and have the main program call that shader whenever the task is needed. Shaders are just programs that run on the GPU and manipulate texture and vertex data. So I might write a shader that draws a red rectangle and call it DrawRedRect. Then every time my program needs to draw a red rectangle, I just call the shader DrawRedRect. The program runs on the CPU and simply calls a shader every time it needs the GPU to do something for it. The alternative would be to not use the GPU: instead of writing a shader for DrawRedRect, I could write a C++ function called DrawRedRect and call that instead. Since that function is not a shader, it runs on the CPU just like the rest of the program, which is called software rendering. Shaders are basically functions that run on the GPU; we use the term "shader" to differentiate them from regular functions that run on the CPU. The whole idea behind using shaders is that the GPU can do certain things much faster than the CPU can (drawing a rectangle, for example), which is why this is commonly referred to as "hardware acceleration" (as in, we are using a dedicated piece of hardware, the GPU, to accelerate the task by performing it faster than the CPU could).
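
Here's a rough sketch of the difference (just an illustration; DrawRedRect and the shader are made up, not any real API):

Code:
#include <cstdint>

// Hardware path: a tiny GLSL fragment shader. It runs on the GPU; the C++
// program only asks the GPU to run it for every pixel the rectangle covers.
const char* kRedFragmentShader =
    "#version 130\n"
    "out vec4 color;\n"
    "void main() { color = vec4(1.0, 0.0, 0.0, 1.0); }\n";

// Software path: the same job done entirely on the CPU in plain C++,
// writing every pixel in a loop -- no GPU involved at all.
void DrawRedRect(uint32_t* framebuffer, int fb_width,
                 int x0, int y0, int x1, int y1) {
  for (int y = y0; y < y1; ++y)
    for (int x = x0; x < x1; ++x)
      framebuffer[y * fb_width + x] = 0xFFFF0000;  // opaque red, ARGB
}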

When the program calls a shader (which is basically just a function written in a shader language), the API (Direct3D, OpenGL, etc.) compiles the shader into shader bytecode. The GPU driver then compiles that bytecode into GPU machine code (binary instructions), which is stored in a command buffer. The GPU reads the instructions from the command buffer and decodes them into microcode that configures the stream processors. The program itself runs on the CPU; it is impossible to write what we would typically think of as an application that runs entirely on the GPU. The program simply emits instructions (by calling shaders) to make the GPU do work whenever it needs the GPU to do something. The program running on the CPU can be thought of as a client and the GPU as a server. Just like with a real server, you need a client (a program running on the CPU) to tell the server what to do.
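
In OpenGL that chain looks roughly like this (error checking omitted; it assumes a GL context and function loader are already set up, and that the source strings are defined elsewhere):

Code:
GLuint vs = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vs, 1, &vertex_source, nullptr);  // hand the GLSL text to the API
glCompileShader(vs);                             // API/driver compile it for this GPU

GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(fs, 1, &fragment_source, nullptr);
glCompileShader(fs);

GLuint prog = glCreateProgram();
glAttachShader(prog, vs);
glAttachShader(prog, fs);
glLinkProgram(prog);     // the linked program is what the GPU will actually execute

glUseProgram(prog);      // from here on, draw calls reference this program...
glDrawArrays(GL_TRIANGLES, 0, 6);  // ...and get queued for the GPU to execute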

Dolphin is technically one program, but for the purposes of explaining this we can think of each thread as a separate program that does something different. The CPU emulator engine (interpreter, JIT recompiler, or JITIL recompiler) can be thought of as a program that emulates the Gekko CPU in the GC/Wii. This program does not use the GPU for anything. The video thread is like a program that emulates the Flipper GPU in the GC/Wii. This program uses a framework (OpenGL, Direct3D 9, or Direct3D 11) to access the GPU's resources (because that's the only way to access them). We also have a software renderer that does not use the GPU at all for Flipper emulation. GPUs are designed for graphics-related tasks, and since that's exactly what the Flipper does, it makes sense to use shaders running on our GPUs to speed up the emulation of a lot of stuff (as evidenced by how painfully slow the software renderer is). However, the video thread is like a program that emulates the Flipper GPU; as such it has lots of C++ code that runs on the CPU as well as lots of shaders for the more visually oriented stuff (for example, changing the state of a Flipper register is done with C++ code, but if the Flipper needs to filter a texture it makes sense to do that on the GPU). Remember that we are emulating the hardware, not the software. The video thread is software that mimics the behavior of the Flipper GPU; it doesn't just produce the same result as the game would on the real hardware, it does so by actually mimicking the hardware, which takes things a step further. This "program" can be thought of as a virtual Flipper, and as such it needs to be able to do EVERYTHING that the real Flipper can do, all the way down to registers and state changes. The simple fact that Dolphin has to do all of this at the software level makes it very inefficient compared to the real hardware. It's the exact same reason that Gekko (the GC/Wii CPU) emulation requires a much more powerful CPU.
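
To make the split concrete, here is a very rough sketch (nothing like Dolphin's actual code) of a CPU-emulation thread feeding graphics commands to a video thread through a FIFO; the helper functions are hypothetical placeholders:

Code:
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>

struct GpuCommand { uint8_t opcode; uint32_t payload; };  // stand-in for FIFO data

std::queue<GpuCommand> fifo;
std::mutex fifo_mutex;

// Hypothetical helpers, not real Dolphin functions:
GpuCommand EmulateSomeCpuInstructions();
bool IsRegisterWrite(const GpuCommand&);
void UpdateFlipperState(const GpuCommand&);  // plain C++ work on the host CPU
void IssueHostDrawCall(const GpuCommand&);   // ends up as OpenGL/D3D calls

void CpuThread() {                // "virtual Gekko": runs the game's code
  for (;;) {
    GpuCommand cmd = EmulateSomeCpuInstructions();
    std::lock_guard<std::mutex> lock(fifo_mutex);
    fifo.push(cmd);               // the game talks to the GPU through the FIFO
  }
}

void VideoThread() {              // "virtual Flipper": consumes the FIFO
  for (;;) {
    GpuCommand cmd;
    {
      std::lock_guard<std::mutex> lock(fifo_mutex);
      if (fifo.empty()) continue;
      cmd = fifo.front();
      fifo.pop();
    }
    if (IsRegisterWrite(cmd))
      UpdateFlipperState(cmd);    // emulated register/state change, on the CPU
    else
      IssueHostDrawCall(cmd);     // visual work handed to the host GPU
  }
}

int main() {  // won't link as-is: the helpers above are only declared
  std::thread cpu(CpuThread), video(VideoThread);
  cpu.join();
  video.join();
}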

But there is another reason besides the usual "because it has to emulate it" reason. Back in the "old days" (not that long ago, frankly), shaders were fixed-function; they were not programmable like they are today. That means the GPU was designed to do certain tasks a certain way through an API, and as someone interested in using the GPU for hardware acceleration, all you could do was call the shaders that had already been made for you and included with the API's libraries. You could not write your own shaders to do whatever you wanted; you had to use the ones that were already made for you, because shaders were not programmable. Our modern GPUs use programmable shaders, but the Flipper is totally fixed-function. We need to emulate those fixed functions and fixed shaders with programmable shaders that our developers write. So once again we are trying to do in software something the Flipper GPU was designed for at the hardware level, and software is always going to be less efficient than fixed hardware.
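
A toy example of the idea (this is nowhere near what Dolphin's real shader generators do, just the principle): take a tiny description of fixed-function state and emit a programmable shader that reproduces it.

Code:
#include <string>

// Toy stand-in for a bit of fixed-function state.
struct FixedFunctionStage {
  bool use_texture;      // sample a texture, or use a constant color?
  bool modulate_vertex;  // multiply by the vertex color?
};

// Emit GLSL that mimics that fixed behaviour on a programmable GPU.
std::string GenerateFragmentShader(const FixedFunctionStage& s) {
  std::string src =
      "#version 130\n"
      "uniform sampler2D tex;\n"
      "in vec2 uv;\n"
      "in vec4 vtx_color;\n"
      "out vec4 color;\n"
      "void main() {\n";
  src += s.use_texture ? "  vec4 c = texture(tex, uv);\n"
                       : "  vec4 c = vec4(1.0);\n";
  if (s.modulate_vertex)
    src += "  c *= vtx_color;\n";
  src += "  color = c;\n"
         "}\n";
  return src;
}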

These frameworks that we have to use for hardware acceleration come with a set of rules that we have to follow. Although the shaders are programmable through a shader language, there are many restrictions imposed by both the API and the hardware. Some of these restrictions prevent Dolphin from emulating certain features of the Flipper (such as bounding boxes, for example).

But seriously, all of that can be summed up with "because it's an emulator". And anyone who has ever worked on or looked at the code of an emulator will instantly know what you mean.

Quote:The corresponding thread seems to suggest that video memory bandwidth is the relevant specification. Why is that?

...because the GPU has its own dedicated memory, called video memory.

Pretend for a minute that you have a PC without a graphics card. Your computer performs four basic functions: input, output, processing, and storage. When your PC runs a program, it uses RAM to store data and your microprocessor to process it. Your graphics card can be thought of as a PC in that sense: the GPU is a microprocessor that processes data, and the video memory is the memory the GPU uses to store data. In both cases the microprocessor needs to constantly read and write data to and from memory, so in both cases the memory and the microprocessor matter. The GPU, though, typically needs more memory bandwidth than the CPU does, because graphics work tends to involve huge throughput and massive data structures (meshes, textures, shadow maps, framebuffers, etc.). The larger these structures are, the more memory bandwidth is needed.
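
A made-up back-of-the-envelope example of how framebuffer traffic alone adds up (the numbers are purely illustrative, not measurements of Dolphin or of any game):

Code:
#include <cstdio>

int main() {
  const double bytes_per_pixel   = 4.0;              // RGBA8
  const double pixels_per_frame  = 1920.0 * 1080.0;  // one 1080p framebuffer
  const double touches_per_frame = 4.0;              // overdraw, blending, copies...
  const double frames_per_second = 60.0;

  const double gb_per_second = bytes_per_pixel * pixels_per_frame *
                               touches_per_frame * frames_per_second / 1e9;
  std::printf("~%.1f GB/s just for framebuffer reads/writes\n", gb_per_second);
  // Texture fetches, vertex data, and render-target copies all come on top of this.
}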

Now, the Flipper uses MoSys 1T-SRAM with about 2.7 GB/s of bandwidth. Dolphin needs about 4.5 times as much video memory bandwidth (roughly 12 GB/s) to emulate the chip at its native resolution. This is because emulation is less efficient than the real hardware; the emulator has to move data around a lot more often than the real hardware does.

Quote:I'm just interested in this and it would be cool if someone could write a few words about it (which are a bit more in-depth than "it's an emulator and that's why it's slower").

But that is essentially why it's slower. "It's an emulator" sums up the reasons pretty well. If you understand what is involved in emulation, then you understand why the requirements are so high.

I'm not going to respond to the rest of your posts until later since I'm ready to go to sleep now. And I'm not going to bother to read through this post and do the quick proofreading pass I normally do. I'll get back to you in a day or two.
Hey there,

wow, that's a huge post! Thank you very much for your efforts, first of all.

(01-11-2012, 12:06 PM)NaturalViolence Wrote: PC applications that use a GPU (video games, for example) are programs written in a typical programming language (C/C++/C#/Java/Python/whatever). Whenever the programmer needs to do something on the GPU instead of the CPU (draw a textured rectangle, for example), they write a shader for that task and have the main program call that shader whenever the task is needed. Shaders are just programs that run on the GPU and manipulate texture and vertex data. So I might write a shader that draws a red rectangle and call it DrawRedRect. Then every time my program needs to draw a red rectangle, I just call the shader DrawRedRect. The program runs on the CPU and simply calls a shader every time it needs the GPU to do something for it. The alternative would be to not use the GPU: instead of writing a shader for DrawRedRect, I could write a C++ function called DrawRedRect and call that instead. Since that function is not a shader, it runs on the CPU just like the rest of the program, which is called software rendering. Shaders are basically functions that run on the GPU; we use the term "shader" to differentiate them from regular functions that run on the CPU. The whole idea behind using shaders is that the GPU can do certain things much faster than the CPU can (drawing a rectangle, for example), which is why this is commonly referred to as "hardware acceleration" (as in, we are using a dedicated piece of hardware, the GPU, to accelerate the task by performing it faster than the CPU could).

When the program calls a shader (which is basically just a function written in a shader language), the API (Direct3D, OpenGL, etc.) compiles the shader into shader bytecode. The GPU driver then compiles that bytecode into GPU machine code (binary instructions), which is stored in a command buffer. The GPU reads the instructions from the command buffer and decodes them into microcode that configures the stream processors. The program itself runs on the CPU; it is impossible to write what we would typically think of as an application that runs entirely on the GPU. The program simply emits instructions (by calling shaders) to make the GPU do work whenever it needs the GPU to do something. The program running on the CPU can be thought of as a client and the GPU as a server. Just like with a real server, you need a client (a program running on the CPU) to tell the server what to do.
Yep, known so far. Thanks for explaining nevertheless.

(01-11-2012, 12:06 PM)NaturalViolence Wrote: Dolphin is technically one program, but for the purposes of explaining this we can think of each thread as a separate program that does something different. The CPU emulator engine (interpreter, JIT recompiler, or JITIL recompiler) can be thought of as a program that emulates the Gekko CPU in the GC/Wii. This program does not use the GPU for anything. The video thread is like a program that emulates the Flipper GPU in the GC/Wii. This program uses a framework (OpenGL, Direct3D 9, or Direct3D 11) to access the GPU's resources (because that's the only way to access them).
So the two threads Dolphin uses are for emulating the CPU and the GPU? That's interesting to know.

(01-11-2012, 12:06 PM)NaturalViolence Wrote: We also have a software renderer that does not use the GPU at all for Flipper emulation. GPUs are designed for graphics-related tasks, and since that's exactly what the Flipper does, it makes sense to use shaders running on our GPUs to speed up the emulation of a lot of stuff (as evidenced by how painfully slow the software renderer is). However, the video thread is like a program that emulates the Flipper GPU; as such it has lots of C++ code that runs on the CPU as well as lots of shaders for the more visually oriented stuff (for example, changing the state of a Flipper register is done with C++ code, but if the Flipper needs to filter a texture it makes sense to do that on the GPU). Remember that we are emulating the hardware, not the software. The video thread is software that mimics the behavior of the Flipper GPU; it doesn't just produce the same result as the game would on the real hardware, it does so by actually mimicking the hardware, which takes things a step further. This "program" can be thought of as a virtual Flipper, and as such it needs to be able to do EVERYTHING that the real Flipper can do, all the way down to registers and state changes. The simple fact that Dolphin has to do all of this at the software level makes it very inefficient compared to the real hardware. It's the exact same reason that Gekko (the GC/Wii CPU) emulation requires a much more powerful CPU.
Ah, okay! So the graphics card emulation is really emulating the graphics card as a "server", and the CPU emulation sends requests to that "server"? That makes stuff quite a lot clearer. I thought Dolphin would be able to directly translate the game's requests to the graphics card into OpenGL calls (like Wine does with DirectX, for example; so if the game requests "draw a vertex" in an instruction, that is translated to the corresponding OpenGL call). But thinking about it, that seems rather short-sighted... :)

(01-11-2012, 12:06 PM)NaturalViolence Wrote: But there is another reason besides the usual "because it has to emulate it" reason. Back in the "old days" (not that long ago, frankly), shaders were fixed-function; they were not programmable like they are today. That means the GPU was designed to do certain tasks a certain way through an API, and as someone interested in using the GPU for hardware acceleration, all you could do was call the shaders that had already been made for you and included with the API's libraries. You could not write your own shaders to do whatever you wanted; you had to use the ones that were already made for you, because shaders were not programmable. Our modern GPUs use programmable shaders, but the Flipper is totally fixed-function. We need to emulate those fixed functions and fixed shaders with programmable shaders that our developers write. So once again we are trying to do in software something the Flipper GPU was designed for at the hardware level, and software is always going to be less efficient than fixed hardware.
That's also interesting to know, that the original hardware had a fixed-function pipeline. It's logical that this drains performance. Oh, and it explains why the shader generator class isn't a whole library by itself (I'd been wondering about that before).

(01-11-2012, 12:06 PM)NaturalViolence Wrote: But seriously, all of that can be summed up with "because it's an emulator". And anyone who has ever worked on or looked at the code of an emulator will instantly know what you mean.
Yeah, sure, it's all because it's an emulator (why else). It's just not a very specific statement; that's why I asked. :)

(01-11-2012, 12:06 PM)NaturalViolence Wrote: Now, the Flipper uses MoSys 1T-SRAM with about 2.7 GB/s of bandwidth. Dolphin needs about 4.5 times as much video memory bandwidth (roughly 12 GB/s) to emulate the chip at its native resolution. This is because emulation is less efficient than the real hardware; the emulator has to move data around a lot more often than the real hardware does.
That's interesting information, too. It's especially surprising to me that the original hardware had such a large memory bandwidth (the CPU is more 1998-ish, isn't it?).

Thank you very much for your explanations, again! :)

Greetings
Quote:That makes stuff quite a lot clearer. I thought Dolphin would be able to directly translate the game's requests to the graphics card into OpenGL calls (like Wine does with DirectX, for example; so if the game requests "draw a vertex" in an instruction, that is translated to the corresponding OpenGL call).

But that is basically what happens, at least with the simpler stuff. OpenGL treats the GPU like a server that processes its commands, and the drivers act as the middleman.

I was just using that as an alternative way of explaining how OpenGL/D3D work.

Basically, the Dolphin devs need to know:
1. How a PC works in depth
2. How the GC/Wii works in depth
3. How we can mimic the behavior of the GC/Wii hardware using PC hardware

I was attempting to explain the first point.

Quote:So the two threads Dolphin uses are for emulating the CPU and the GPU? That's interesting to know.

If dual core is turned on and LLE on thread is turned off (or you're using HLE audio), then yes. Those are the default settings. Turning dual core off will result in a single thread. Turning dual core on and using LLE audio with LLE on thread turned on will result in three threads: the CPU emulator engine, the video thread, and an LLE audio thread.
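
So, roughly (not Dolphin's actual logic, just the mapping described above):

Code:
// How the settings map to emulation threads, per the description above.
int CountEmulationThreads(bool dual_core, bool lle_audio, bool lle_on_thread) {
  if (!dual_core)
    return 1;                               // everything on one thread
  if (lle_audio && lle_on_thread)
    return 3;                               // CPU engine + video + LLE audio
  return 2;                                 // CPU engine + video thread
}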

Quote:That's interesting information, too. It's especially surprising to me that the original hardware had such a large memory bandwidth (the CPU is more 1998-ish, isn't it?).

Video game consoles in general tend to use highly embedded/integrated hardware with enormous interconnect and memory bandwidth. Also keep in mind that the 2.7 GB/s is shared by both the Gekko (CPU) and the Flipper (GPU), so it needed to be pretty fast. The CPU has to access memory through a 1.3 GB/s FSB (a 64-bit bus at 162 MHz works out to about 1.3 GB/s), so its memory performance is capped there. The Flipper uses most of the memory bandwidth.

Take a look at this: http://en.wikipedia.org/wiki/PowerPC_7xx
It performed somewhere in between a Pentium II and a Pentium III, from what I can tell.

Quote:PowerPC 745/755
Motorola revised the 740/750 design in 1998 and shrunk die size to 51 mm2 thanks to a newer aluminium based fabrication at 0.22 μm. The speeds increased to up to 600 MHz. The 755 was used in some iBook models. After this model, Motorola chose not to keep developing the 750 processors in favour of their PowerPC 7400 processor and other cores.

PowerPC 750CX
IBM continued to develop the PowerPC 750 line and introduced the PowerPC 750CX (code-named Sidewinder) in 2000. It has a 256 KiB on-die L2 cache; this increased performance while reducing power consumption and complexity. At 400 MHz, it drew under 4 W. The 750CX had 20 million transistors including its L2 cache. It had a die size of 43 mm2 through a 0.18 μm copper process. The 750CX was only used in one iMac and iBook revision.

PowerPC 750CXe
750CXe (codename Anaconda), introduced in 2001, was a minor revision of 750CX which increased its frequency up to 700 MHz and its memory bus to 133 MHz, from 100 MHz. The 750CXe also featured improved floating-point performance over the 750CX. Several iBook models and the last G3-based iMac used this processor.
A cost reduced version of 750CXe, called 750CXr, is available at lower frequencies.

Gekko
Gekko is the custom central processor for the Nintendo GameCube game console. It is based on a PowerPC 750CXe and adds about 50 new instructions as well as a modified FPU capable of some SIMD functionality. It has 256 KiB of on die L2 cache, operates at 485 MHz with a 162 MHz memory bus, is fabricated by IBM on a 180 nm process. The die is 43 mm2 large.

(01-12-2012, 08:50 AM)NaturalViolence Wrote:
Quote:That makes stuff quite a lot clearer. I thought Dolphin would be able to directly translate the game's requests to the graphics card into OpenGL calls (like Wine does with DirectX, for example; so if the game requests "draw a vertex" in an instruction, that is translated to the corresponding OpenGL call).

But that is basically what happens, at least with the simpler stuff. OpenGL treats the GPU like a server that processes its commands, and the drivers act as the middleman.
Okay, but there is a somewhat heavy emulation layer in between (it's even running in its own thread). With the Wine-like "emulation" I assumed there isn't really much to do except "renaming" the incoming function calls. That's the best-case scenario, obviously, when you already have an API function that does what the game requests and you just have to call it. But if I'm understanding what you wrote correctly, that isn't the case for quite a lot of things.

Greetings
Fwiw, I certainly wouldn't call the GPU emulation in Dolphin "Wine-like". In fact, right now I can't think of anything they might have in common... :P