x86 vs x64
09-11-2013, 05:03 AM
Just want to point out that you are quoting posts from 2008. What they thought then and what they know now are probably a lot different. Pcsx2 has come a long way since 08.
09-11-2013, 05:16 AM
(This post was last modified: 09-11-2013, 05:25 AM by omega_rugal.)
So:
- 64-bit registers
- more GPRs overall
- FastMem (10% - 15% more speed just with this; some explanation is welcome)

All x64-specific features that cannot be done with x86, am I right?

As for PCSX2's case, if the developers think they can't get more speed by switching to x64, we should trust them on that. Besides, PCSX2 has reached a stage in which most games run with few glitches and at playable speeds on current hardware, so why waste time and effort on an x64 port? They have a life, after all.

Now that I think of it, another question for the developers: at which stage of completion do you think Dolphin is now? Hardware-wise, is there something you haven't figured out yet?
09-11-2013, 05:26 AM
(09-11-2013, 05:03 AM)garrlker Wrote: Just want to point out that you are quoting posts from 2008. What they thought then and what they know now are probably a lot different. Pcsx2 has come a long way since 08.

64-bit OS is not new. It has been around since 1985. Linux has supported x86-64 since 2001. Also, they still haven't changed their opinion:

Quote: Let's say we won't go 64 bits without a really good reason.
http://forums.pcsx2.net/Thread-Is-PCSX2-...ing-64-bit

(09-11-2013, 05:16 AM)omega_rugal Wrote: So

The developer confirmed that fastmem improved the performance. But I'm not sure about the other points.
09-11-2013, 07:10 AM
For DSPLLE we greatly benefit from having 64 bit registers (to emulate the DSP 40-bit accumulators). I'm not sure about the use of MMX regs as GPR - that sounds like an interesting idea that would be hard to implement (compared to using the 8 additional GPRs from x86_64). Really disappointed that Intel does not allow the use of the 32 bit high part of GPRs as an independent reg, btw
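To illustrate the point about 40-bit accumulators fitting into a single 64-bit register, here is a minimal sketch in C. It is not Dolphin's DSP code: the names `sext40` and `acc40_add` are hypothetical, and wrap-around on overflow is an assumption made for the example (a real DSP may saturate instead). On a 32-bit host the same value would have to be split across two registers.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 40 bits of v into a full 64-bit value.
 * The final cast relies on two's complement conversion, which every
 * mainstream compiler provides. */
int64_t sext40(int64_t v)
{
    const uint64_t mask = (UINT64_C(1) << 40) - 1;
    const uint64_t sign = UINT64_C(1) << 39;
    uint64_t low = (uint64_t)v & mask;
    /* (low ^ sign) - sign copies bit 39 into bits 40..63. */
    return (int64_t)((low ^ sign) - sign);
}

/* Add to a 40-bit accumulator held in one 64-bit variable.
 * Assumption for this sketch: overflow wraps around at 40 bits. */
int64_t acc40_add(int64_t acc, int64_t value)
{
    assert(acc == sext40(acc)); /* accumulator must already be canonical */
    /* Add as unsigned to avoid signed-overflow undefined behaviour. */
    return sext40((int64_t)((uint64_t)acc + (uint64_t)value));
}
```

With these definitions, `acc40_add(0x7FFFFFFFFF, 1)` wraps to the most negative 40-bit value, `-0x8000000000`, and `acc40_add(-1, 1)` yields 0, all with plain 64-bit arithmetic.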
For Dolphin, our JIT blocks are so small that reg spilling is probably not that much of a problem.
09-11-2013, 10:16 AM
(09-11-2013, 04:51 AM)xemnas Wrote: Thanks for the information. But I'm not quite convinced why larger ISA register set, more GPRs, more XMM, etc. make the 64-bit build faster because I don't see why they make things faster. Larger doesn't always mean faster like if we increase the amount of RAM, it doesn't make the emulation faster. You may be right but I don't see the clear connection here. I don't understand how they impact the performance.

Think of GPRs as a level 0 cache. If it doesn't fit you have to spill to slower memory. 32 GPRs seems to be the sweet spot before you get diminishing returns (without tricks, is IA64 the only one to go higher?).
09-11-2013, 03:21 PM
(This post was last modified: 09-11-2013, 03:40 PM by NaturalViolence.)
xemnas Wrote: Thanks for the information. But I'm not quite convinced why larger ISA register set, more GPRs, more XMM, etc. make the 64-bit build faster because I don't see why they make things faster. Larger doesn't always mean faster like if we increase the amount of RAM, it doesn't make the emulation faster. You may be right but I don't see the clear connection here. I don't understand how they impact the performance.

What do you mean you're not convinced!?! Even before the developers began doing extensive x86-64 optimizations (including fastmem) compiling for x86-64 produced far better performance. What I listed are basically all of the major changes that x86-64 made that can affect the performance of "small memory" applications like dolphin. If it's not one or more of those then there aren't going to be any performance improvements because there is nothing else that x86-64 changes that a "typical application" would use. This can be easily verified with a 15 second search on stack overflow, wikipedia, or google. Which I'm surprised that some of you didn't do.

And why are you comparing registers to ram? They are two totally different things. You are correct that having more ram than the application uses doesn't boost performance. But then again we have a fuckton of ram in modern systems. Gigabytes of it. Far more than a typical application could possibly use. Register sets/files on the other hand are still only limited to dozens or hundreds of bytes. Easily used by just about any compiler and/or application with functions that have more than a few variables.

Having more ram can speed up applications because if applications run out of ram they must access the HDD to load/store data via a swap. Which is MUCH slower than accessing ram. Preventing that from happening can sometimes boost application performance by more than 10,000% (100x).
Likewise having a larger register set will speed up an application that uses the extra registers because if the program runs out of registers to use it will need to access ram in a situation where it might not normally need to. Accessing ram is MUCH slower than accessing a register. Again it can be more than 100 times slower. Reducing how often this happens even a tiny bit can boost performance considerably. It also helps boost ILP.

For the last two decades microprocessor design has been heavily focused around counteracting memory latency by reducing how often the cpu needs to access memory. This has been the primary means of increasing cpu IPC (instructions per clock) at the core level. Intel estimates that half of all increase to single threaded performance (per core performance) over the last decade is due to this. This has been done primarily through register renaming and caching. Register renaming also helps avoid serialization and boosts IPC further by improving ILP (instruction level parallelism).

Competing RISC cpus in the 90s trumped x86 cpus in performance and one of the main improvements they used to do this was larger register files. x86 was implemented back in 1978 when using more than 8 GPRs would have been impractical due to semiconductor manufacturing limitations (photolithography) at the time. As semiconductor manufacturing improved they were unable to make their register file larger because doing so would require changing the register addressing system (using larger register addresses in instructions). Which would break backwards compatibility. Since backwards compatibility had been the key to their product's success they held off despite the positive performance impact that it would have.

In 1995 they released the pentium pro. The first iteration of their new P6 microarchitecture. This architecture came up with a genius solution to the problem. Use a large set of registers internally but don't expose them to the programmer.
The programmer has access to a small number of "virtual" registers (at the ISA level) that don't actually exist inside the physical hardware. The hardware then maps these virtual registers onto a larger number of real physical registers and keeps track of the data in them. Boosting performance considerably without changing the register addresses and therefore without breaking backwards compatibility. As you can imagine the hardware needed to implement such a complicated optimization was pretty beefy and couldn't have been done with earlier semiconductor manufacturing technology. But this emphasizes the tremendous importance of a larger register file for boosting performance. Intel and AMD have been raising the size of their physical register files almost every generation ever since. The problem is even with continuing improvement in the tagging/indexing/reordering/retiring/etc. logic you can only do so much with a small ISA level register set being used. A larger register set at the ISA level allows better mapping to more registers at the physical level.

x86-64 is just a newer/better version of x86-32. The improvements made will usually have some positive impact on performance (some memory read/write operations can be slower in some scenarios but that's about it). This impact can range from small to large depending on the level of code optimization, the type of algorithm being used, and the compiler being used.

Now I'm too tired to continue this. I apologize for any likely grammar or spelling errors as I have no time to proofread this.

xemnas Wrote: http://forums.pcsx2.net/Thread-We-are-dr...5#pid12935

One of the devs in the threads linked to that thread (cottonvibes) said that it could be made faster than x86-32 but it would require quite a bit of time/work before it could be optimized to the point where it was faster.

xemnas Wrote: Please read the above quote.

All he said was that the number of MMX registers isn't higher. Which is true.
Intel/AMD did that deliberately though. SSE does everything MMX does much faster and with greater flexibility. By the time x86-64 was around they were trying to phase out MMX since SSE was already well established. I don't see why on earth PCSX2 would still be using MMX. Then again that post is from 2008.

Let me put this into perspective:
x86-64 SSE > x86-32 SSE > x86-64 MMX = x86-32 MMX
SSE is faster than MMX and x86-64 SSE is faster than x86-32 SSE. Both are undeniable facts. So I don't see why the lack of improvement to MMX matters at all. They should be using SSE instead of MMX and in an application like dolphin or PCSX2 that does a lot of matrix math 64 bit SSE will boost performance due to the larger set of registers among other improvements.

It seems like they might be planning to use MMX registers as GPRs (or maybe they already do?) as delroth said but this is stupid since they get mapped to the same physical registers internally anyways.

delroth Wrote: For Dolphin, our JIT blocks are so small that reg spilling is probably not that much of a problem

Interesting. Where is the pre-fastmem improvement (with HLE audio) coming from then? Using an old build with HLE can still show a 30%+ improvement in 64 bit builds over 32 bit builds. When I asked about this on IRC years back at least one dev told me it was probably because of the larger register set (this might have been skid).

lamedude Wrote: Think of GPRs as a level 0 cache. If it doesn't fit you have to spill to slower memory. 32GPRs seems to be the sweet spot before you get diminishing returns (without tricks is IA64 the only one to go higher?).

It's nothing like a cache in that it's managed by software rather than hardware and it doesn't do any caching. Most major non x86 architectures have 32 GPRs. Some specialized ones go higher like cell be.
However this is slightly misleading since physical register file size continues to go up while architecture register set size does not in order to maintain backwards compatibility. There haven't been any new successful ISAs since the early 90s. And back in the early 90s semiconductor manufacturing technology made using more than 32 GPRs not ideal. That's why we're still stuck with 32 ISA GPRs.

xemnas Wrote: 64-bit OS is not new.

Source? The first 64 bit microarchitecture I know of was the DEC alpha in 1992.

xemnas Wrote: Linux supports x86-64 since 2001.

x86-64 was released in 2003. It wasn't supported on Intel cpus until 2004 and low budget cpus until 2005. The first popular windows OS to support it was windows 7 in 2009 (both vista and XP 64 bit never gained wide adoption rates). Most systems today are 64 bit but in 2008 when that post was made that wasn't the case at all.

Edit: I should probably clarify that register renaming is used to boost ILP, not reduce memory access. Though it does have that added benefit.
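The register renaming idea described in this post (a small set of ISA-level registers mapped onto a larger physical register file) can be sketched as a toy rename table in C. This is only an illustration under stated assumptions: the sizes `NUM_ARCH_REGS` and `NUM_PHYS_REGS`, the `RenameTable` type, and the function names are all hypothetical, and physical registers are never freed here, which real hardware does at instruction retirement.

```c
#include <assert.h>

enum { NUM_ARCH_REGS = 8, NUM_PHYS_REGS = 32 }; /* illustrative sizes */

typedef struct {
    int map[NUM_ARCH_REGS]; /* ISA-level register -> physical register */
    int next_free;          /* next never-used physical register */
} RenameTable;

/* Start with each ISA register mapped to the physical register of the
 * same index; the remaining physical registers are free. */
void rename_init(RenameTable *t)
{
    for (int i = 0; i < NUM_ARCH_REGS; i++)
        t->map[i] = i;
    t->next_free = NUM_ARCH_REGS;
}

/* A read uses whichever physical register currently holds the value. */
int rename_read(const RenameTable *t, int arch)
{
    assert(arch >= 0 && arch < NUM_ARCH_REGS);
    return t->map[arch];
}

/* A write gets a fresh physical register, so it does not have to wait
 * for earlier instructions that still read the old value. */
int rename_write(RenameTable *t, int arch)
{
    assert(arch >= 0 && arch < NUM_ARCH_REGS);
    assert(t->next_free < NUM_PHYS_REGS); /* toy model: no recycling */
    t->map[arch] = t->next_free++;
    return t->map[arch];
}
```

Two back-to-back writes to the same ISA register land in different physical registers (8 and then 9 in this model), which is why the second write need not be serialized behind readers of the first.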
"Normally if given a choice between doing something and nothing, I’d choose to do nothing. But I would do something if it helps someone else do nothing. I’d work all night if it meant nothing got done."
-Ron Swanson "I shall be a good politician, even if it kills me. Or if it kills anyone else for that matter. " -Mark Antony 09-11-2013, 10:27 PM
"compiling for x86-64 produced far better performance" -> More GPRs but bigger pointers increasing cache spill. Pick your poison. Most apps don't care whether you have 64 bit GPRs or 32 bit GPRs.
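The "bigger pointers" cost can be made concrete with a small C sketch. The two structs below are hypothetical; they use fixed-width integers to stand in for 32-bit and 64-bit pointers so that both layouts can be compared in one build, and the 64-byte cache line is an assumption that holds for most current x86 CPUs.

```c
#include <stddef.h>
#include <stdint.h>

/* A list node as a 32-bit build would lay it out: 4-byte links. */
typedef struct {
    uint32_t next;
    uint32_t prev;
    int32_t  value;
} Node32;

/* The same node in a 64-bit build: 8-byte links, plus any padding the
 * ABI adds to keep the struct aligned. */
typedef struct {
    uint64_t next;
    uint64_t prev;
    int32_t  value;
} Node64;

/* Number of whole nodes that fit into one cache line. */
size_t nodes_per_line(size_t node_size, size_t line_size)
{
    return line_size / node_size;
}
```

`Node32` is 12 bytes, so five nodes fit into a 64-byte line. On a typical x86-64 ABI `Node64` is padded to 24 bytes, so only two fit: the same pointer-heavy data structure occupies roughly twice the cache.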
"since they get mapped to the same physical registers internally anyways" -> NOPE, MMX registers are mapped to the old x87 80 bit registers, which are unused if you use SSE2 for FP math like most people do. On x86_64 you have no x87, no access to MMX, no free GPR to be used there.

"an application like dolphin or PCSX2 that does a lot of matrix math" -> WTF? That's not where our bottlenecks are at all, and of course we use SSE2 for FP math - everyone serious does (even on x86, and it's actually the only way on x86_64).
09-11-2013, 11:39 PM
delroth Wrote: More GPRs but bigger pointers increasing cache spill. Pick your poison. Most apps don't care whether you have 64 bit GPRs or 32 bit GPRs.

Well something was producing significant gains even in the early days. What else could it possibly be?

delroth Wrote: NOPE, MMX registers are mapped to the old x87 80 bit registers, which are unused if you use SSE2 for FP math like most people do.

True but I thought that on modern x86 cpus the x87 stack at the ISA level gets dumped into the same massive register file as SSE and "emulated" there. I guess I could be wrong. And either way proper optimization would allow better use of the physical resources. I doubt the lower number of registers available for XMM register renaming would hurt SSE performance much. Point taken.

delroth Wrote: WTF? That's not where our bottlenecks are at all,

You've corrected me multiple times in the past when I claimed this exact statement. During one of these corrections you stated that dolphin does a lot of SSE math, I believe for primitive assembly? Or maybe LLE? And SSE optimizations have consistently produced performance increases so I would imagine that it is true to at least some degree.

delroth Wrote: On x86_64 you have no x87, no access to MMX, no free GPR to be used there.

delroth Wrote: and of course we use SSE2 for FP math - everyone serious does (even on x86, and it's actually the only way on x86_64).

Exactly. SSE is better for matrix int than MMX and better for matrix float than x87. It's stupid not to use it. So the lack of MMX and x87 shouldn't really matter. Even with the "free registers" x87/MMX is painfully slow compared to SSE on modern cpus (backed up by a lot of benchmarks). Modern microarchitectures focus so little on them that they are down to one pipe (no ILP at all) to reduce the amount of die space wasted on supporting them. I highly doubt using them would boost performance just because you have more registers available to you.
"Normally if given a choice between doing something and nothing, I’d choose to do nothing. But I would do something if it helps someone else do nothing. I’d work all night if it meant nothing got done."
-Ron Swanson "I shall be a good politician, even if it kills me. Or if it kills anyone else for that matter. " -Mark Antony

(09-11-2013, 03:21 PM)NaturalViolence Wrote: Even before the developers began doing extensive x86-64 optimizations (including fastmem) compiling for x86-64 produced far better performance.

This is not true, at least for other emulators. Just having a 64-bit build doesn't mean that it will be faster than the 32-bit build.

(09-11-2013, 03:21 PM)NaturalViolence Wrote: All he said was that the number of MMX registers isn't higher. Which is true. Intel/AMD did that deliberately though. SSE does everything MMX does much faster and with greater flexibility. By the time x86-64 was around they were trying to phase out MMX since SSE was already well established. I don't see why on earth PCSX2 would still be using MMX. Then again that post is from 2008.

You misunderstood this. PCSX2 uses SSE but he meant that MMX registers were available for use in 32-bit mode.

(09-11-2013, 03:21 PM)NaturalViolence Wrote: Source?

https://en.wikipedia.org/wiki/UNICOS

(09-11-2013, 03:21 PM)NaturalViolence Wrote: x86-64 was released in 2003. It wasn't supported on Intel cpus until 2004 and low budget cpus until 2005.

It doesn't matter now. Because in 2013, the PCSX2 team still doesn't see a good reason to have a 64-bit build. That was my main point. Also, it's a fact that Linux has supported x86-64 since 2001.

(09-11-2013, 03:21 PM)NaturalViolence Wrote: Exactly. SSE is better for matrix int than MMX and better for matrix float than x87. It's stupid not to use it. So the lack of MMX and x87 shouldn't really matter.

I believe you misunderstood this as well. We're talking about the number of registers here.
09-12-2013, 03:30 AM
xemnas Wrote: This is not true at least for other emulators. Just having 64-bit build doesn't mean that it will be faster than 32-bit build.

It usually does. While I am sure rare exceptions exist I have never seen an application that wasn't sped up by 64 bit compilation. Even if it was only 1 or 2%. Though I guess dynarecs might be an exception to the rule since they are so low level that proper optimization is probably required to gain a speedup, rather than simply providing additional benefit.

Now what the PCSX2 team is saying might be true. The amount of speed up might be low and the amount of labor required for proper porting too high for them to consider it worth it. But nowhere have they claimed that it would be slower or even the same speed.

xemnas Wrote: You misunderstood this. PCSX2 uses SSE but he meant that MMX registers were available for use in 32-bit mode.

Ok well then if it doesn't use MMX at all what is the point of bringing that up? It uses XMM and r registers. Both of which x86-64 has more of (16 vs. 8).

xemnas Wrote: https://en.wikipedia.org/wiki/UNICOS

Cray supercomputers, no wonder. You managed to find the one exception and I congratulate you on knowing about this. But I'm afraid I still don't see how it's relevant to your argument.

xemnas Wrote: It doesn't matter now. Because in 2013, PCSX2 team still doesn't see a good reason to have 64-bit build. That was my main point.

Every thread you've linked to so far pretty much gives the same answer. "It's not worth the effort". Which is different from "it would be slower". There is your answer.

xemnas Wrote: Also, it's fact that Linux has been supported x86-64 since 2001.

And I didn't dispute that fact. It's also a fact that x86-64 hardware didn't exist until 2003. Those two facts are not mutually exclusive.

xemnas Wrote: I believe you misunderstood this as well. We're talking about the number of registers here.

Ok.
Then what does the number of MMX registers have to do with an application that doesn't use MMX? Plus even if you count the MMX and x87 registers the total number of GPRs in x86-64 is still higher than x86-32. And they're much more useful registers.

Both the PS2 and GC/Wii have cpus with 32 integer registers and 32 floating point registers. We're emulating them with cpus that have fewer ISA registers so I would imagine that having more would help to some degree.

Going back to your original statements:

xemnas Wrote: No. That might only be one of the reasons. The more important reason they explained was that even if they made the 64-bit work it probably wouldn't be faster than 32-bit. The performance would be pretty much the same.

While this is technically valid to some degree if you count x87 and MMX you really shouldn't be using either in an application like this (and it looks like they don't) so it's kind of a moot point.

xemnas Wrote: Regarding the registers someone mentioned, PCSX2 team also explained that the number of usable registers weren't that much different. 32-bit can use MMX registers or something if I remember correctly.

This is not valid as the developers have not claimed this. Only users.
"Normally if given a choice between doing something and nothing, I’d choose to do nothing. But I would do something if it helps someone else do nothing. I’d work all night if it meant nothing got done."
-Ron Swanson "I shall be a good politician, even if it kills me. Or if it kills anyone else for that matter. " -Mark Antony