No, I'm not sure - I haven't profiled. Have you?
x86 vs x64
09-12-2013, 11:39 AM
Nope. And I'm afraid I don't care enough to. Oh well. I guess it will remain a mystery. I still think I'm right. I mean, what else could it possibly be? Do Dolphin's libraries benefit at all from "instruction pointer relative data access" (referencing data relative to the RIP register)?
"Normally if given a choice between doing something and nothing, I’d choose to do nothing. But I would do something if it helps someone else do nothing. I’d work all night if it meant nothing got done."
-Ron Swanson
"I shall be a good politician, even if it kills me. Or if it kills anyone else for that matter." -Mark Antony
09-12-2013, 12:13 PM
Sorry to butt in on your heated debate, but this:
Quote: That's not where our bottlenecks are at all
What are the bottlenecks now, if I may ask?
09-12-2013, 12:35 PM
See https://code.google.com/p/dolphin-emu/wi...ationIdeas for some stuff that would help with performance (especially macroblocks - I've looked at the size distribution of blocks and something like 99% of blocks we compile are < 5 instructions).
09-12-2013, 04:20 PM
(09-12-2013, 09:11 AM)NaturalViolence Wrote: And 1/3 or 1/4 as many pipes depending on the microarchitecture (3 or 4 SSE ops can run in parallel while only 1 or at the most 2 MMX ops can run in parallel).

Source? I think you'll find they use the same execution units... You seem to be quoting throughput figures, which of course are going to be doubled for SSE vs MMX due to the register width, as I already explained.

(09-12-2013, 09:11 AM)NaturalViolence Wrote: And a slow as hell stack.

No.

Quote: Not to mention we're emulating 128 bit vectors so it makes sense to use 128 bit vectors instead of 64 bit vectors.

No.

Quote: All in all SSE should be a lot faster.

No. You didn't address the point about alignment restrictions.

(09-12-2013, 09:11 AM)NaturalViolence Wrote: Benchmarks show that SSE puts out a crapton more synthetic performance on modern chips. At least some of this should translate into applicable performance. If you're doing vector math you should be using SSE. Every organization from Microsoft to Intel recommends it, and for good reason.

Unless you're not actually using the latest and greatest software/hardware from them - WinXP doesn't support AVX, pre-Core 2 CPUs performed each SSE op by breaking it into two MMX-width ops, etc. Don't forget these companies are vendors; it's part of their job to push their current products, so of course they're going to recommend using the latest features. If MS were committed to drinking their own kool-aid, why did they decrease the usability of SSE(/XMM) registers when moving from x86 to x86_64 by making XMM6 and XMM7 non-volatile? I'll tell you why: they just make it up as they go along, as evidenced by the Win64 betas using a different ABI than the RTM build. They gave developers a preview so they could get their apps up to scratch by changing their inline MMX code to SSE, then pulled the rug out from under them. The ones who stuck with MMX were unscathed...

Quote: It's not usable in VS though. Which everybody uses.

Hahahahahahahahaha.
It's been well documented in the past how bad the code generation from VS intrinsics is (example). If you're writing assembly code you shouldn't be relying on the compiler at all - otherwise what's the point? You can write a function that seems perfect at a glance - ops interleaved to match instruction latencies, loads and stores in the perfect order to alleviate register spills - then the compiler can go and screw it all up. Assembly code belongs in external .asm files which are only assembled, not compiled - ensuring the code ends up doing exactly what you tell it to do rather than what the compiler thinks should be done.
09-12-2013, 04:45 PM
NaturalViolence said:
"so much with a small ISA level register set being used. A larger register set at the ISA level allows better mapping to more registers at the physical level."
So why (besides backwards compatibility) don't AMD and Intel get their heads out of the mud that was the 1970s and make much much faster processors?
09-12-2013, 11:20 PM
@tuedifj
Points noted. You're correct that I was looking at throughput figures. It appears MMX ops do indeed use the same pipes/EUs as SSE despite having separate registers.

funky1096 Wrote: So why (besides backwards compatibility) don't AMD and Intel get their heads out of the mud that was the 1970s and make much much faster processors?

Why isn't backwards compatibility a valid reason? That's the only reason they haven't done it, but it's a big enough reason to make sure they never do it. Nobody is going to buy a CPU that can't run x86 software. Imagine if all x86 software had to be rewritten, or at least recompiled, including Windows. They've tried to release a few non-x86 CPUs in the past to replace x86, but they all failed horribly in the market. Also keep in mind that they still have register renaming to help counteract it, so "much much faster" is probably an exaggeration. There would be a speedup, but nobody really knows how significant it would be.
(09-12-2013, 11:20 PM)NaturalViolence Wrote: Why isn't backwards compatibility a valid reason? That's the only reason they haven't done it but it's a big enough reason to make sure they never do it. Nobody is going to buy a cpu that can't run x86 software. Imagine if all x86 software had to be rewritten or at least recompiled, including windows. They've tried to release a few non-x86 cpus in the past to replace x86 but they all failed horribly in the market.

To the underlined I say: why don't they develop a new architecture just to see how much faster it would be? To the bold I say: depending on how much faster the new architecture is, there would be no reason to rewrite all those applications - just make an emulator for the old design (if it's even possible to emulate such a design, that is; I'm clueless about the processing power cross-architecture emulators need).

[EDIT] Simple googling brought this up: http://raspberrypi.stackexchange.com/que...k-3-server
So if a current ARM chip (or whatever the Raspberry Pi's processor family is called) can emulate x86 effectively even on a weak processor, what's stopping someone from writing the best emulator ever for a new architecture? Money, politics, fear, laziness? It would rock the technological world, depending on how such a processor is marketed. What were the failed architectures that you mentioned? Why did they fail, and did they even get a chance to emulate x86?
09-13-2013, 07:52 AM
It doesn't work that way. I don't even know where to begin; that idea doesn't make sense in any of the aspects you mentioned :|
09-13-2013, 08:07 AM
Itanium was supposed to replace x86, and was supposed to be able to emulate x86 too. The first generation's performance was abysmal, especially when emulating x86, so no-one bought it. Later generations now have earned their place (albeit among servers and supercomputers, not home PCs), but for anything an average user is likely to use, they're worse. Intel aren't likely to go down that road again.
OS: Windows 10 64 bit Professional
CPU: AMD Ryzen 5900X | RAM: 48GB | GPU: Radeon 7800 XT