Dolphin, the GameCube and Wii emulator - Forums

Full Version: Dolphin CPU hierarchy [UNOFFICIAL]
This is probably going to be the longest rebuttal post I've ever made since I'm responding to 3-4 different posts each of which are pretty long. I'm not even halfway done yet and it's already insanely long.....

I do have some corrections to AON3's post that I want to make too.
Finally....done....phew. I hope I don't have to do this again for quite some time.

Cruzar Wrote:If Intel is so much faster than AMD why is it in any Game test (real world NOT synthetic) if the FX chips (including the 4100) are ever behind by a comparable Intel cheap, even an i5 the FPS difference is so negligible as to be like wow.....5fps? Whoopie fucking do.

Like AON3 said, PC games are generally GPU bound. But I would like to expound on that a little more if I may.

The GPU side of a modern game engine is under fairly consistent high load. The CPU side, however, often behaves erratically. Most of the time it's doing very little work, but every once in a while an event handler is triggered that causes the computational requirements to explode. Maybe you blew up a building and it needed to run a physics simulation of the debris. Maybe a series of nearby explosions caused the audio renderer's load to suddenly increase. Whatever the cause, the result is a short but dramatic drop in framerate, which shows up as a sometimes perceivable microstutter. With a good CPU these stutters will be less severe (a smaller drop in framerate), less frequent, and shorter in duration. But since these stutters happen only occasionally and for a brief moment, they have very little effect on average framerate. A tremendous increase in CPU-side microstuttering will only drop the average framerate by a few fps. And that's what we see with bulldozer/piledriver.
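To put some (made-up) numbers on that, here's a minimal C++ sketch. The frame times are hypothetical, not from any benchmark; the point is just that a handful of 60 ms spikes barely moves the average framerate while being exactly the stutter you notice.

Code:
// Hypothetical frame times, not measured data: 97 frames at 10 ms plus
// 3 frames at 60 ms where an event handler suddenly loaded the CPU.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> frame_ms(97, 10.0);
    frame_ms.insert(frame_ms.end(), 3, 60.0);

    const double total_ms = std::accumulate(frame_ms.begin(), frame_ms.end(), 0.0);
    const double avg_fps  = 1000.0 * frame_ms.size() / total_ms;   // ~87 fps
    const double worst_ms = *std::max_element(frame_ms.begin(), frame_ms.end());

    // The average still looks healthy; the 60 ms outliers are what you feel.
    std::printf("average: %.1f fps, worst frame: %.0f ms\n", avg_fps, worst_ms);
    return 0;
}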

But how do we measure this? Simple: frame times. Recently it's become more popular to include these as a benchmark in addition to framerates, thanks to wider availability of the software and hardware tools required to measure them and greater public exposure to the problem (people have begun to realize, especially with SLI/Crossfire, that microstuttering can make a high framerate still appear "jittery"). When measuring a CPU's impact on a video game it is absolutely crucial to measure frame times. So how well does AMD currently stack up against the competition in this area? Well, see for yourself: http://techreport.com/review/23246/inside-the-second-gaming-performance-with-today-cpus

Not good.....


As far as framerates are concerned you'll usually only see a big difference if you have a 120Hz monitor. When trying to reach framerates above 75-90 fps sandy/ivy bridge cpus take a significant lead in framerate.

In the few cpu bound game engines that exist FX series cpus get utterly destroyed. Take a look at SC2 and FC2 benchmarks for example.

In addition to this both the dolphin and pcsx2 communities have run benchmarks on bulldozer and piledriver for their respective emulators and proven repeatedly that they are terrible for emulation as well. So don't use them for emulated games either.

There are so many game benchmarks for piledriver/bulldozer from so many different sources all showing a small to substantial loss in framerate compared to sandy/ivy bridge in almost every game that I for one cannot understand why someone would attempt to recommend one of these things as a gaming cpu.

Cruzar Wrote:And according to some websites, say http://www.phoronix.com/scan.php?page=ar...ver2&num=1 show the FX chips demolishing Intel? HMMMM?

I wouldn't say demolishing. It won in some of the tests and it lost in some of the tests. In terms of the number of tests won/lost they look about equal.

AON3 also mentioned that they were all heavily multi-threaded, which is not a good representation of the variety of real world applications a user is likely to run. This is true, but I would like to add to it. Nearly all of the benchmarks they used were workstation/server applications, scientific applications, or synthetic applications. In other words, exactly the type of applications that this particular microarchitecture does well with. And it still lost in half of them. A good review will contain a wide range of different types of workloads/software that your average user is expected to use, to see how well the architecture performs under the many different kinds of work a typical user will give it. But we don't see that here. We just see the best case scenario. Honestly, what are you more likely to do with your PC: run protein folding simulations, or run dolphin?

Cruzar Wrote:And AMD's architecture is not the same as Intel, so no you cannot just toss the same coding at the AMD parts, that's like trying to throw the same code for a GTX 680 at a fucking HD 7970, or Geforce FX 5200, they're not the same they require different coding to work properly also the 5200 was shit and still is shit to this day so I just tossed it in for lulz.

AON3 properly addressed this as well. Modern graphics engines use the same codepath for different GPUs because they all run the same type of code (HLSL for D3D and GLSL/Cg for OpenGL). So yes, we do "toss the same code" at them, because it's impossible to do otherwise with the abstraction present in modern operating systems. You would know this if you had done any graphics programming at all. But instead you're just making assumptions. So please knock it off. It's a waste of everyone's time to have to correct you.

Cruzar Wrote:And wow, doesn't mean they know diddly.

Please start quoting what you're responding to. I have no clue who "they" is or what "they" don't know.

Cruzar Wrote:Compiled with doesn't necessarily mean they left everything at defaults.

I assure you we've tried every GCC flag available.

Even if we hadn't a microarchitecture that requires a specially optimized binary just to perform on par with a competing microarchitecture running an unoptimized binary is inherently a bad microarchitecture. Case in point: The Netburst and Itanium microarchitectures.

Sandy/ivy bridge cpus are so far ahead in dolphin that no level of compiler optimization will allow an FX series cpu to perform competitively. I doubt even hand optimizing all of the code in assembly for piledriver would allow it to achieve the same performance that sandy/ivy bridge cpus are getting now.

Cruzar Wrote:If it were really optimized it would be faster, much faster.

How do you know this? What optimizations could be made to allow the FX series cpus to outperform sandy/ivy bridge cpus? Please bless us with your great knowledge of low level optimizations. Unless....you're making another assumption. But you wouldn't do that right? RIGHT!?!

Cruzar Wrote:Especially with better core/thread scheduling

That was fixed ages ago in the linux kernel and NT kernel. All modern piledriver benchmarks do proper thread scheduling.

Cruzar Wrote:rather than staying Dual/Triple

I have no idea what you're trying to say here. Dual/triple threading has nothing to do with the scheduling issues you were referring to earlier in your sentence.

If you're asking why dolphin is dual/triple threaded, it is because, like most applications, it is literally impossible to gain any speedup from multithreading it any further. Dolphin has only 3 chips to emulate, and the instructions for each chip must be emulated in sequential order or the result of the emulated function will be changed (thus breaking emulation). This makes parallel processing of the instructions incredibly stupid because of the resulting stall time from syncing/locking the threads. Despite this fact the developers tried it anyway and proved that it wouldn't work.
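To make the syncing/locking cost concrete, here's a toy sketch. It is plain standalone C++, not Dolphin code, and it assumes each emulated instruction is a tiny amount of work: splitting an in-order stream across two threads that must hand off after every instruction just replaces the work with lock/handoff overhead.

Code:
// Toy illustration, not Dolphin code: an in-order "instruction stream" run
// sequentially vs. split across two threads that must hand off after every
// instruction. The per-instruction work is tiny, so the handoff dominates.
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    constexpr int kInstructions = 200000;

    // Sequential: just execute the stream in order.
    long long seq_state = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kInstructions; ++i) seq_state += i;
    auto t1 = std::chrono::steady_clock::now();

    // "Parallel": two threads alternate, but each instruction depends on the
    // previous one, so the extra thread only adds lock/spin overhead.
    long long par_state = 0;
    int next = 0;
    std::mutex m;
    auto worker = [&](int parity) {
        for (;;) {
            std::lock_guard<std::mutex> lock(m);
            if (next >= kInstructions) return;
            if (next % 2 != parity) continue;  // not our turn; release and retry
            par_state += next;
            ++next;
        }
    };
    auto t2 = std::chrono::steady_clock::now();
    std::thread a(worker, 0), b(worker, 1);
    a.join();
    b.join();
    auto t3 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("sequential: %.3f ms  two threads with per-instruction sync: %.3f ms"
                "  (results equal: %d)\n",
                ms(t1 - t0).count(), ms(t3 - t2).count(),
                (int)(seq_state == par_state));
    return 0;
}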

Cruzar Wrote:just because oh em gee Intel wont benfit ur stoped

Once again we need some context or we can't figure out what it is you're trying to say. And talking like that is extremely unprofessional so please avoid doing it if you wish to be taken seriously.

Cruzar Wrote:That's quite and Interesting claim. (Sarcasm)

See above.

Cruzar Wrote:not just them others too though hard to find what with most sites running benches/tests pretty much owned by Intel

This is beyond ridiculous. Peer reviewed benchmarks are literally the most accurate form of performance data collection available on the planet. To claim that most hardware review sites are paid off by Intel is absurd. You have users getting the same results with the same software, and an enormous number of sites, far more than Intel could even find let alone pay off, with no guarantee that they would all accept the money. Most of these sites even go so far as to make their employees sign ethics clauses strictly forbidding it, just to keep people like you happy. Then there is the fact that if even one site blows the lid on what you're doing, it will utterly ruin your company's reputation. This means the only way you could be right is if Intel has personally tracked down and bribed every single website and user that has ever benchmarked their products and somehow ensured that none of them would ever admit to it. In other words, it's a worldwide conspiracy by tens of thousands of people (at least, probably more), including random users. I am absolutely stunned that someone would make such a claim.

Cruzar Wrote:did tests of their own and the FX only occasionally looses to Intel in real world scenarios an 8350 can sometimes even keep up with a 3820.

"Sometimes" is the key word there. And yes you're right they do keep up with the i7 3770K and 3820 in some applications. The problem is that the type of workload needed for them to perform this well is very rare in real world applications and even when these chips do perform competitively they require more power and generate more heat to do it. Which is why OEMs won't touch them with a 10 ft. pole.

Cruzar Wrote:Also the cinebench scandal was a scandal involving cinebench and the Intel compiler they (and other synthetic benchmarks writers) used,

That's not a scandal. They have openly admitted which compiler they use from day one. ICC is a compiler made by Intel and is therefore optimized for Intel CPUs; everyone knows that. It's not Intel's fault that AMD hasn't bothered to invest R&D money into making their own compiler.

Cruzar Wrote:of which when detecting an Intel chip did it the most optimized way possible, but if it saw an AMD chip, it did it the worst way possible making sure even though your processor clearly can do and is doing much better than the test so claims, that you think your processor is slow as shit and worthless and that you must OMG buy Intel, buy Intel, when patched to make it think you're always on Intel, AMD chips had a massive performance jump.

This is true to some extent. However I must point out two things that nullify this argument being used to defend the FX series cpus:
1. Only a small fraction of the software typically used for benchmarking these cpus is compiled with ICC. You can't simply ignore all of the other results, which are equally poor, just because some tests use ICC. The results with software built by other compilers are generally just as bad.
2. If your favorite software runs faster on Intel cpus and part of the reason is ICC you shouldn't stop using Intel cpus just because it's "not fair". The goal of a cpu is to run the software that you use faster and with a lower power consumption. You shouldn't really care how it's achieved. Is it fair? No. Is it a valid reason for not buying Intel cpus? No.

Also, the benefits of using Intel CPUIDs in ICC are grossly exaggerated by the AMD community. In nearly all tests done on the subject the performance difference has been shown to be small or nonexistent depending on the type of workload. In dolphin, for example, there is typically little to no difference comparing Intel vs. AMD on ICC and GCC. The only software that really gains any benefit at all is software that extensively uses newer SSE instructions (which, to be fair, cinebench probably does).
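For anyone wondering what "dispatching on CPUID" actually looks like, here's a minimal sketch for GCC/Clang on x86 using <cpuid.h>. This is not ICC's actual dispatcher, just an illustration of the idea: the controversy is about code that branches on the vendor string instead of simply asking whether the feature (SSE4.2, AVX, etc.) is present.

Code:
// Minimal sketch of CPUID-based dispatch (GCC/Clang on x86), not ICC code.
#include <cpuid.h>
#include <cstdio>
#include <cstring>

int main() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    char vendor[13] = {};
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        // Leaf 0 returns the vendor string in EBX, EDX, ECX
        // ("GenuineIntel", "AuthenticAMD", ...).
        std::memcpy(vendor + 0, &ebx, 4);
        std::memcpy(vendor + 4, &edx, 4);
        std::memcpy(vendor + 8, &ecx, 4);
    }

    // Vendor-blind dispatch: ask for the feature itself. SSE4.2 is reported
    // in ECX bit 20 of leaf 1 regardless of who made the chip.
    bool has_sse42 = false;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        has_sse42 = (ecx & bit_SSE4_2) != 0;

    std::printf("vendor: %s, SSE4.2: %s\n", vendor, has_sse42 ? "yes" : "no");
    return 0;
}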

You can't just dismiss bulldozer/piledriver's architectural weaknesses just because some software uses ICC. AMD's microarchitectures are still missing a lot of features that are very important for many workloads. They still don't have micro-op fusion, macro-op fusion, a microcode cache, a 6-wide execution pipeline, and the list goes on and on. No level of software optimization will make up for the lack of these features. Certain algorithms are just going to run poorly no matter what without them. A good cpu microarchitecture is one that can run a wide range of algorithms well, not just some. If piledriver were massively faster in optimal workloads then we could forgive all of the crappy results. But it's not. It ranges from "much slower" to "about the same or slightly faster" depending on the application. Combine this with higher power consumption and the lack of integrated graphics and it clearly becomes the lesser architecture.

To summarize:
-Stop blaming programmers for everything. Believe it or not, there is such a thing as a genuine hardware weakness. Sometimes hardware can do a specific task in a more optimal way. Claiming that any time something is slower than something else it must be the programmer's fault is insane. A K7 cpu, for example, is just flat out inferior to today's cpus even when running the same code at the same clock rate. Its pipeline design is simply less optimal. Blaming programmers for this would not be logical.
-Vishera (piledriver FX series) cpus' performance relative to ivy bridge i7s mostly ranges from "much worse" to "about the same" depending on the application. Overall their performance is poorer. These benchmarks have been done by both professional organizations and random users on the internet with comparable results, so there is no reason to doubt them. If you want the fastest overall cpu regardless of price, your only option is Intel at this point.
-FX series cpus generally do slightly worse in framerate tests and dramatically worse in frame time tests. In cpu bound games their framerates are much poorer. This makes them a bad choice for PC gaming.
-Both the dolphin and pcsx2 communities have conducted community benchmarks and found that FX series cpus, to no one's surprise, perform poorly in emulators.
-FX series cpus consume more power and produce more heat despite having suboptimal performance.
-While the FX series cpus can overclock higher, overclocking them even a tiny bit causes their power consumption (and therefore heat) to shoot up to crazy high levels. The same level of overclocking can be achieved on competing Intel cpus with dramatically lower power consumption and heat, making Intel the better choice for most overclockers who don't have access to phase change cooling or some other exotic form of cooling (not to mention the risk of condensation and rapid electrical/chemical damage to the hardware associated with using that kind of cooling solution).
-AMD has realized that their chips are inferior. That is why they have had to price them so much lower in order to sell them. Their idea is to provide a chip that can produce comparable performance to i5 and i7 cpus in some applications (primarily those that are heavily multithreaded) at a lower cost. In exchange, users have to deal with higher power consumption, greater cooling requirements, the lack of integrated graphics (which is important for bringing down total system cost for non-gamers), and poor performance in the many applications whose workloads are not optimal for this type of architecture. But you do get similar performance in some applications at a lower cost; if that weren't the case they wouldn't be able to sell them at all. If you happen to use those applications extensively, by all means buy one. But it sounds to me like the types of applications you plan on using are not well suited to this architecture. So you would be wise to avoid it.

AnyOldName3 Wrote:Picture a road with a narrow bit cars have to slow down to pass. This is the GPU. If you make another bit of the road wider, it means more cars can pass through that area, but it won't make a huge difference to how many cars get to the other end per unit time, because they all have to go through the narrow bit. In the real world, this model isn't perfect, but it's enough to give you an idea of how bottlenecking works. It also allows me to point out that the French for traffic jam is embouteillage, which literally translates as bottling or bottlenecking, which is one of the few things I can remember from being taught French.

This analogy is more confusing and inaccurate than it is helpful. This is why I generally avoid trying to compare electronics to other types of systems for the purposes of explanation.

AnyOldName3 Wrote:It is very, very difficult to make a program favour one microarchitecture over another if they are both supposedly the same speed.

I strongly disagree with this. Compiler and code optimizations can both be used to shift an algorithm from favoring one microarchitecture to another.
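To give one concrete, simplified example of what I mean: the same dot product written with 256-bit AVX intrinsics versus 128-bit SSE intrinsics (hypothetical functions, remainder handling omitted, and the AVX path needs -mavx to build). Sandy/ivy bridge have full-width 256-bit FP units, while, roughly speaking, a piledriver module's shared FPU executes a 256-bit op as two 128-bit halves, so which version the programmer or the compiler's auto-vectorizer emits already tilts the result toward one microarchitecture or the other.

Code:
// Illustrative only: two ways to write the same kernel. Which one you (or
// your compiler's auto-vectorizer) pick changes which microarchitecture the
// resulting binary favors. Remainder elements are ignored for brevity.
#include <immintrin.h>
#include <cstdio>
#include <vector>

float dot_avx256(const float* a, const float* b, int n) {
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_mul_ps(_mm256_loadu_ps(a + i),
                                               _mm256_loadu_ps(b + i)));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3] +
           lanes[4] + lanes[5] + lanes[6] + lanes[7];
}

float dot_sse128(const float* a, const float* b, int n) {
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i),
                                         _mm_loadu_ps(b + i)));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f);
    std::printf("avx: %.1f  sse: %.1f\n",
                dot_avx256(a.data(), b.data(), 1024),
                dot_sse128(a.data(), b.data(), 1024));
    return 0;
}
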
And the simple way of explaining this: whatever test you've seen on the internet, it wasn't the same as running this emulator. Period, end of story. If you take the latest AMD and Intel chips and clock them both at 4.0GHz, Intel wins every single time, and by a large margin.
You forgot to correct the part where AON3 called x86-64/AMD64/EM64T IA-64. Smile
Oh wow! I'm surprised that I didn't catch that.

Starscream Wrote:And the simple way of explaining this: whatever test you've seen on the internet, it wasn't the same as running this emulator. Period, end of story. If you take the latest AMD and Intel chips and clock them both at 4.0GHz, Intel wins every single time, and by a large margin.

If he had just asked that then sure I would have pointed him at the benchmark results and that would have been the end of it.

But he already knows that it runs dolphin poorly. But he doesn't know why. He is convinced that the only reason that dolphin runs poorly on FX series cpus is because dolphin is poorly optimized for them due to bad programming. This is the main myth that I was trying to dispel when I first responded.
(06-11-2013, 03:38 PM)lamedude Wrote: You forgot to correct the part where AON3 called x86-64/AMD64/EM64T IA-64. Smile
Talk to wikipedia about that, not me (although I did assume that after reading the IA-32 page, IA-64 was the same, but 64 bit, which isn't that illogical). The IA-64 32 page says it's a non-brand-specific term for 64 bit x86 CPUs. Also, many pages (for example hackintosh/mac clone pages) link to the IA-64 32 page when talking about 64 bit x86 processors.

However, just to point out that I do know something about Itanium (and its difference from x86), and how foolish I feel after double checking which page I'd misread/misinterpreted, I'm going to point out a tiny mistake in NV's post, which barely has any effect on anything.

Quote:[text here which makes the sentence make sense but is irrelevant to the point I'm making] microarchitecture. Case in point: The Netburst and Itanium microarchitectures.

Itanium isn't a microarchitecture, it's a whole ISA of its own. The competing microarchitectures argument doesn't entirely apply, due to the fact that it was designed for its own niche, and when it was terrible it was still partially experimental. You wouldn't compare PowerPC and Intel's x86 chips in this way (even if PPC were awful), due to the fact that one isn't a bad version of x86, but its own thing entirely.

I'd further make my point, but I'd then end up writing something wrong, even if I haven't already, and that would make more work for NV, which he doesn't need.
"Itanium" is a marketing name used to refer to the ISA (more formally called IA-64), the family of cpus based on IA-64, and the first microarchitecture used in the family (codenamed Merced). Is it dumb to use the word that way? Yes. But it is used that way. So I'm kind of sort of right on that front.

AnyOldName3 Wrote:The competing microarchitectures argument doesn't entirely apply, due to the fact that it was designed for its own niche, and when it was terrible it was still partially experimental.

It was designed to replace all x86 cpus and become Intel's new ISA. They openly admitted that this was their plan from day one. It failed so horribly that they were forced to retarget it as a niche product to keep it alive and try to recoup the investment losses. It wasn't "designed for its own niche", but it did end up in one thanks to its failure to be competitive.

As far as being experimental is concerned you're kind of right. However I would like to point out a few things. By the time it was finally "ready" it had been in development for nearly a decade, so they had certainly had plenty of time to tinker with it and decide how they were going to market it. They put so much financial backing behind it in R&D, marketing, and software development tools that I really don't think it was meant to "test the waters" so to speak. They really believed it was the future. It's the only reason they would make that kind of financial investment in it. Its failure nearly sunk the company.


AnyOldName3 Wrote:You wouldn't compare PowerPC and Intel's x86 chips in this way (even if PPC was awful), due to the fact that one isn't a bad version or x86, but its own thing entirely.

Then you're misunderstanding the point I was trying to make:
NaturalViolence Wrote:a microarchitecture that requires a specially optimized binary just to perform on par with a competing microarchitecture running an unoptimized binary is inherently a bad microarchitecture.

My point was that microarchitectures that require crazy software optimizations just to have competitive performance are inherently bad microarchitectures. This is true regardless of what ISA they use.

And for the record, they may have had a different ISA but they did compete with PPC and x86 cpus for market share in the server and mainframe markets. One of the reasons why they failed to produce comparable performance is that they required very specific workloads that were well suited to the architecture's strengths, plus crazy low level optimizations, to perform well. The architecture's design posed serious complications for compilers, which made extremely thorough hand optimized code a must unless the workload was optimal.

AnyOldName3 Wrote:I'd further make my point, but I'd then end up writing something wrong, even if I haven't already, and that would make more work for NV, which he doesn't need.

And I thank you for that. I still haven't had time to respond to this thread: http://forums.dolphin-emu.org/Thread-amd-vs-nvidia?page=5

And I'm wondering if I should even bother at this point. I'm leaning towards no.
Please do, because otherwise I have to pick up the slack.

Back to Itanium:

The impression I got months ago when I read the wiki page was that initially Itanium, and other EPIC architectures, were supposed to replace x86 etc. because they expected the maximum IPC to be 1 otherwise. In this potentially fictitious course of events, loads of money was sunk into EPIC, mostly by Intel, hence theirs started to work. At some point, a breakthrough was made in x86 which would allow different instructions to run on different parts of the same chip/hyperthreading was invented/they realised 2 cores could be put on one chip. This held off Itanium, but they carried on with it because they thought it would turn out better in the end. The Intel engineers expected the software team to be able to make the Itanium mode for ICC work, and work well, but in actual fact, it was a gargantuan task compared to what had been expected. As Itanium's progress slowed, x86's accelerated way past earlier predictions. Intel continued with Itanium, as theoretically it was still great, and the fact that an omniscient compiler didn't exist was just a software issue which could be worked on later. Either way, they weren't about to waste all the money they put in, so they continued. At some point, they decided to release Itanium chips, just because they thought they needed some money back, and as most issues were software, not hardware, they could be solved later, and the older chips would improve. By this point, x86 had gotten so much better that few people actually believed EPIC was the future. Eventually, Intel realised that without a god manually optimising everyone's code, Itanium would only be good in certain niches, so marketed Itanium as niche, and x86 as normal.

Now, how much of this happened?
AnyOldName3 Wrote:replace x86 etc because they expected the maximum IPC to be 1 otherwise.

This sentence makes no sense to me. The IPC of what was expected to be 1? And under what conditions?

If you're talking about x86 cpus, well then that's definitely wrong. They were already capable of exceeding 1 instruction per clock cycle back when they started working on Itanium (around 1995). P5 can do up to 2 instructions per cycle and P6 can do up to 3 instructions per cycle. And I'm willing to bet that even the early P6 designs averaged more than 1 instruction per clock cycle in most workloads.

AnyOldName3 Wrote:In this potentially fictitious course of events, loads of money was sunk into EPIC, mostly by Intel, hence theirs started to work.

Correct. It was a joint project by both Intel and HP, but most of the financial backing obviously came from Intel. There were rumors that disagreements and poor coordination between the two teams of engineers at the two companies had a major negative impact on the architecture's design and release date. But those rumors have never been confirmed, so while it sounds plausible we will never know for sure.

AnyOldName3 Wrote:At some point, a breakthrough was made in x86 which would allow different instructions to run on different parts of the same chip/hyperthreading was invented/they realised 2 cores could be put on one chip.

There are a couple problems with this sentence.
1. What you're describing at the beginning of the sentence is pipelining, which was invented in the late 70s and was in widespread use in microprocessors by the late 80s.
2. By the time dual core x86 cpus were around, Itanium had pretty much been dead for 2 years and was on life support. They realized their failure long before that.
3. Hyperthreading (SMT) is only significantly beneficial in certain types of pipelines. Netburst was ideal for it. But they avoided using it in the pentium M pipeline for a while because it would have done more harm than good. And of course its benefits still vary widely depending on the workload even when a microarchitecture does favor it.
4. Itanium eventually received HT.
5. Itanium also went multi-core shortly after x86.
6. They always realized that multi-core chips were possible. They just never made them because they figured that there was no market for them except in high end servers. Only when they ran into the power wall did they resort to multi-core designs, since there was no other way to continue maintaining rapid increases in performance at that point.

AnyOldName3 Wrote:This held off Itanium, but they carried on with it because they though it would turn out better in the end.

Which time period are you referring to here?

By the time SMT and multicore were around Itanium had already been out for a few years.

AnyOldName3 Wrote:The Intel engineers expected the software team to be able to make the Itanium mode for ICC work, and work well, but in actual fact, it was a gargantuan task compared to what had been expected.

Kind of true. I don't think they focused on ICC too much. They invested heavily in 3rd party software developers to make compilers and such for them.

AnyOldName3 Wrote:As Itanium's progress slowed, x86's accelerated way past earlier predictions.

Kind of true. Progress slowed on both fronts (below expected predictions). But progress slowed a lot more on Itanium's front.

AnyOldName3 Wrote:Intel continued with Itanium, as theoretically it was still great, and the fact that an omniscient compiler didn't exist was just a software issue which could be worked on later.

By the time they realized what you wrote in your previous sentence they probably knew it was time to give up. They turned it into a mainframe niche product pretty fast.

What you're saying in this sentence is true of the first two years.

AnyOldName3 Wrote:Either way, they weren't about to waste all the money they put in, so they continued.

And because HP paid them to keep it on life support. Unless you're talking about earlier....

I'm starting to get confused as there is no way to figure out which point in the lifecycle you're referring to here.

AnyOldName3 Wrote:At some point, they decided to release Itanium chips, just because they thought they needed some money back, and as most issues were software, not hardware, they could be solved later, and the older chips would improve.

Now I'm really confused about your chronology. Your earlier sentences were talking about things that happened after Itanium was released and now you're saying "At some point, they decided to release Itanium chips" as if everything leading up to this occurred earlier.

This is correct but completely out of place.

AnyOldName3 Wrote:By this point, x86 had gotten so much better that few people actually believed EPIC was the future.

I disagree. Their propaganda didn't begin to die down until a year or two after release. Many people were still convinced that it was the future for the first year or two. It was AMD64 (athlon 64 and opteron) that really nailed the coffin shut.

AnyOldName3 Wrote:Eventually, Intel realised that without a god manually optimising everyone's code, Itanium would only be good in certain niches, so marketed Itanium as niche, and x86 as normal.

Pretty much. Even with well optimized code it's still only good for certain workloads. Mainly scientific applications. These days they might not even be faster at those anymore.