(09-12-2013, 03:30 AM)NaturalViolence Wrote: [ -> ]And I didn't dispute that fact. It's also a fact that x86-64 hardware didn't exist until 2003. Those two facts are not mutually exclusive.
It means that x86-64 has been no secret since 2001. We shouldn't discuss this any further. It's a minor point and off-topic.
(09-12-2013, 03:30 AM)NaturalViolence Wrote: [ -> ]Ok well then if it doesn't use MMX at all what is the point of bringing that up?
(09-12-2013, 03:30 AM)NaturalViolence Wrote: [ -> ]Ok. Then what does the number of MMX registers have to do with an application that doesn't use MMX?
He said that MMX registers could be used in 32-bit mode. Can you read the first page?
(09-12-2013, 03:30 AM)NaturalViolence Wrote: [ -> ]This is not valid as the developers have not claimed this. Only users.
He is a developer.
xemnas Wrote:He is a developer.
My bad. I didn't recognize the name.
xemnas Wrote:He said that MMX registers could be used in 32-bit mode. Can you read the first page?
What I don't understand is why he would bring that up. They shouldn't be (and from what I can tell aren't) using MMX or x87 anyways. They're painfully slow compared to SSE. So it's a completely useless point to make. It's essentially "but x86-64 doesn't have this thing that we wouldn't be using either way anyways". It's not a valid point that you can use to claim x86-64 won't provide a speedup. My best guess is he was just trying to point out to the user above him that the number of theoretical registers in the ISA isn't that different. Though that only applies if you ignore all of the additional registers added with other extensions like AVX over the past 5 years and ignore the fact that nobody uses MMX or x87 anyways.
I trust cotton's statement over his on this one as it makes more sense.
All I can do at this point is reiterate the statements that I've already made. You can either believe me or not. I believe Dolphin's performance data provides some evidence to back up my deductions. I can't comment too much on PCSX2 since I have no experience with it nor have I looked at the source code. But then again this thread is about Dolphin so I don't believe that's relevant anyways.
MMX is not "painfully slow" compared to SSE2 (SSE = enhanced x87, SSE2 = enhanced MMX) nor is it unusable on x86_64. It has smaller registers so can only process half the data in the same amount of time, however unlike SSE2+ it mostly has no alignment restrictions so in some cases it can still produce faster results.
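To make the "half the data in the same amount of time" point concrete, here is a toy model only, not real SIMD: a wrapping byte-wise add (what a PADDB-style instruction does) carried out in register-sized chunks, counting how many register-wide operations each width needs. The helper name `packed_add_u8` is made up for illustration, and the model says nothing about alignment, latency or how many ops can issue per cycle.

```python
def packed_add_u8(a: bytes, b: bytes, reg_bytes: int):
    """Wrapping byte-wise add in register-sized chunks.

    Returns (result, ops), where ops is the number of register-wide adds.
    """
    assert len(a) == len(b) and len(a) % reg_bytes == 0
    out = bytearray()
    ops = 0
    for i in range(0, len(a), reg_bytes):
        chunk_a = a[i:i + reg_bytes]
        chunk_b = b[i:i + reg_bytes]
        # Each lane wraps around at 8 bits, independently of its neighbours.
        out += bytes((x + y) & 0xFF for x, y in zip(chunk_a, chunk_b))
        ops += 1
    return bytes(out), ops


a = bytes(range(32))
b = bytes([250] * 32)

mmx_result, mmx_ops = packed_add_u8(a, b, 8)    # 64-bit registers
sse_result, sse_ops = packed_add_u8(a, b, 16)   # 128-bit registers

print(mmx_ops, sse_ops, mmx_result == sse_result)
```

Both widths produce the same bytes; the 64-bit version just needs twice as many register-wide operations for the same buffer.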
tueidj Wrote:It has smaller registers so can only process half the data in the same amount of time, however unlike SSE2+ it mostly has no alignment restrictions so in some cases it can still produce faster results.
And 1/3 or 1/4 as many pipes depending on the microarchitecture (3 or 4 SSE ops can run in parallel while only 1 or at the most 2 MMX ops can run in parallel). And a slow as hell stack. Not to mention we're emulating 128 bit vectors so it makes sense to use 128 bit vectors instead of 64 bit vectors. All in all SSE should be a lot faster.
Benchmarks show that SSE puts out a crapton more synthetic performance on modern chips. At least some of this should translate into applicable performance. If you're doing vector math you should be using SSE. Every organization from microsoft to Intel recommends it, and for good reason.
tueidj Wrote:nor is it unusable on x86_64.
It's not usable in vs though. Which everybody uses.
xemnas Wrote:No. They use MMX registers. You can read this post for more details: http://forums.pcsx2.net/Thread-Is-it-pos...#pid37818.
Why didn't you post that to begin with? It looks like all of your questions about PCSX2 have already been answered there. Although none of this necessarily applies to dolphin.
Let me reiterate my points one more time if I may:
-Clearly dolphin is substantially sped up by 64 bit optimizations. This is pretty easy to test.
-Delroth has listed two potential causes for this. Fastmem and LLE accumulators.
-Before fastmem was implemented dolphin still showed substantial improvement from x86-64 compilation. So clearly that's not the only reason.
-Using HLE instead of LLE in old builds still shows a substantial speedup. So clearly these two things together aren't the only reasons.
-That leaves the basic stuff. Larger set of GPRs and instruction pointer relative data access. Unless there were some other optimizations that were implemented before dolphin 2.0 that I don't know about.
Unless someone has a better explanation which I would very much like to hear I'm going to stick with what I know.
(09-12-2013, 09:11 AM)NaturalViolence Wrote: [ -> ]Why didn't you post that to begin with? It looks like all of your questions about PCSX2 have already been answered there. Although none of this necessarily applies to dolphin.
Because at first you asked me for a quote and I gave you that but you didn't understand. I don't know why because it's very clear to me. Also, I don't have questions about PCSX2. I posted the link for you to understand this correctly. Anyway, it's okay that you understand now.
Quote:Before fastmem was implemented
How did you test that? As far as I know, Dolphin got support for x64 between 2006 and the public release in 2008, and it already had fastmem in 2008 (look at the first revisions on the Dolphin git repository).
Quote:And 1/3 or 1/4 as many pipes depending on the microarchitecture (3 or 4 SSE ops can run in parallel while only 1 or at the most 2 MMX ops can run in parallel). And a slow as hell stack. Not to mention we're emulating 128 bit vectors so it makes sense to use 128 bit vectors instead of 64 bit vectors. All in all SSE should be a lot faster.
WTF are you even talking about? MMX has no "stack", and where are we emulating 128 bit vectors? GC has paired singles which are 64 bit vectors (overlapping the FPR), and that's about it.
Also, why are you making this an exclusive choice? You can use MMX and SSE, that's what PCSX2 does (they use MMX as an extension for GPRs, SSE2 for FP and probably SSE1/2/3/4/4.1 in a shitton of places to emulate weird PS2 chips).
Quote:It's not usable in vs though. Which everybody uses.
VS? What does VS have to do with anything here?
delroth Wrote:where are we emulating 128 bit vectors?
PCSX2 not dolphin. I'm pretty sure the VUs are 128 bit.
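As a quick sanity check on the two widths being argued about, here is a small sketch that only packs floats and counts bits. It assumes the usual layouts: a GC paired single is two 32-bit floats in one 64-bit FPR, and a PS2 VU register is four 32-bit floats. The variable names are made up for illustration.

```python
import struct

# Paired single: two 32-bit floats sharing one 64-bit register.
paired_single = struct.pack("<2f", 1.0, 2.0)

# VU register: four 32-bit floats (x, y, z, w).
vu_vector = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

XMM_BITS = 128  # width of one SSE register

for name, blob in (("paired single", paired_single), ("VU vector", vu_vector)):
    bits = len(blob) * 8
    print(f"{name}: {bits} bits, fills {bits / XMM_BITS:.0%} of one XMM register")
```

So a VU vector fills a 128-bit SSE register exactly, while a paired single only needs the low half of one.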
delroth Wrote:How did you test that? As far as I know, Dolphin got support for x64 between 2006 and the public release in 2008, and it already had fastmem in 2008 (look at the first revisions on the Dolphin git repository).
I thought it was implemented post 2.0. Still you said 10-15%. Some games hit 30-40% speedup with 2.0 and 3.0. Are you sure that's all from fastmem?
delroth Wrote:MMX has no "stack"
I'm thinking of x87. My bad.
delroth Wrote:Also, why are you making this an exclusive choice? You can use MMX and SSE, that's what PCSX2 does (they use MMX as an extension for GPRs, SSE2 for FP and probably SSE1/2/3/4/4.1 in a shitton of places to emulate weird PS2 chips).
Generally it's one or the other since SSE is almost always massively superior to MMX.
delroth Wrote:VS? What does VS have to do with anything here?
Would you seriously think about implementing features not supported in vs at all?
Quote:PCSX2 not dolphin. I'm pretty sure the VUs are 128 bit.
And I'm pretty sure Jake Stine already said in one of the posts that they use SSE up to SSE4.1 for VU emulation. What's your point?
Quote:Generally it's one or the other since SSE is almost always massively superior to MMX.
And here it's not the case.
I don't think this discussion is going anywhere if you're unable to listen to what people are telling you.
Quote:Would you seriously think about implementing features not supported in vs at all?
What are you talking about here? MMX, SSE, and just about every instruction set I can think of has intrinsics in Visual Studio, not to mention the possibility of going to assembly,
not to mention that the compiler has nothing to do with anything when you write a JIT.
Please come back to discuss this topic when you have a clue about software. Here it just looks like you're trying to confuse as many people as you can by spewing technobabble when you have no idea what it means.
delroth Wrote:And I'm pretty sure Jake Stine already said in one of the posts that they use SSE up to SSE4.1 for VU emulation. What's your point?
My point was just to demonstrate the superiority of SSE in this instance. They don't use MMX here because it wouldn't make sense to use MMX here.
delroth Wrote:What are you talking about here? MMX, SSE, and just about every instruction set I can think of has intrinsics in Visual Studio, not to mention the possibility of going to assembly,
But MMX has no intrinsics for x86-64 in Visual Studio according to MSDN. That was my point.
delroth Wrote:not to mention that the compiler has nothing to do with anything when you write a JIT.
delroth Wrote:And here it's not the case.
Points taken.
delroth Wrote:I don't think this discussion is going anywhere if you're unable to listen to what people are telling you.
delroth Wrote:Please come back to discuss this topic when you have a clue about software. Here it just looks like you're trying to confuse as many people as you can by spewing technobabble when you have no idea what it means.
*backs away slowly*
I may have gotten some things wrong but I don't think it's fair to say that I'm just spewing technobabble. I don't use PCSX2 or read their dev blogs so I can only go by what I read with other emulators. I guess I shouldn't have entered the discussion once PCSX2 was brought up. I'll stay out of that now but I'm still waiting for an answer to this question if you don't mind:
NaturalViolence Wrote:You said 10-15%. Some games hit 30-40% speedup with 2.0 and 3.0. Are you sure that's all from fastmem?
Keep in mind that's with HLE.
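A rough way to frame that question, as a sketch only: if the fastmem gain and whatever else x86-64 brings compose multiplicatively (an assumption, not something measured in this thread), then the 10-15% and 30-40% figures quoted above leave a factor that fastmem alone doesn't cover. The function name `residual_speedup` is made up for illustration.

```python
def residual_speedup(total_pct: float, known_pct: float) -> float:
    """Speedup factor left over after dividing out a known contribution.

    Assumes the contributions multiply: total = known * residual.
    """
    total = 1.0 + total_pct / 100.0
    known = 1.0 + known_pct / 100.0
    return total / known


# Figures quoted in the thread: 30-40% overall, 10-15% attributed to fastmem.
low = residual_speedup(30, 15)   # smallest total, largest fastmem share
high = residual_speedup(40, 10)  # largest total, smallest fastmem share

print(f"unexplained speedup: {(low - 1) * 100:.1f}% to {(high - 1) * 100:.1f}%")
```

Under that assumption the leftover is somewhere around 13% to 27%, which is the part the question is really about.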