Dolphin, the GameCube and Wii emulator - Forums

Full Version: Dolphin CPU hierarchy [UNOFFICIAL]
(01-08-2013, 11:02 AM)NaturalViolence Wrote:
Quote:So basically... you don't know any more about the performance than I do.
Well what more do you need to know? We know enough to rank the family in a categorized list such as this one. Just not enough to compare specific numbers on specific models (which is not what the OP is about anyways).

Ironically our attempts to convince forum users not to buy them prevent us from acquiring the data needed to prove to them why they shouldn't buy it. I guess we could try and trick a few of them into thinking they're good for benchmarking purposes :P

Actually, we do have the data. Sort of, in theory. At least, a decent idea. It's something I like to call PPC score. It works like this. As you probably know, when CPUs are benchmarked, especially in things like CPU Mark, they'll be given a very specific score. Well, we know one thing for absolute certain, even if we don't know exactly how many IPC each processor gets. We know the number of cores on a processor. So, basically, in theory, you could divide the CPU Mark Score by the number of cores, and get a rough PPC (Power Per Core) score. The higher, the better.

With that said, here would be why the Intel Processors would work so much better.

Intel Core i7-3720QM @ 2.60GHz CPU Mark: 8552

8552/4 (Number of Cores) = 2138 Power Per Core score.

Which is also why my AMD A6-4400M is so abysmal. It got a measly...

825.5. Uggh.
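
For reference, the division described above can be sketched in a few lines of Python. This only illustrates the arithmetic, not whether the number means anything for Dolphin. The function name `score_per_core` is made up for the example, and the A6-4400M total of 1651 is an assumption, back-calculated from the 825.5 figure on the basis of two cores rather than taken from a quoted CPU Mark result.

```python
def score_per_core(cpu_mark: float, cores: int) -> float:
    """Naive per-core figure: total benchmark score divided by core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_mark / cores


# Figure quoted in the post: i7-3720QM, CPU Mark 8552, 4 cores.
i7_3720qm = score_per_core(8552, 4)

# Assumption: 1651 is back-calculated as 825.5 x 2 cores, not a quoted score.
a6_4400m = score_per_core(1651, 2)

print(i7_3720qm, a6_4400m)
```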

Test Validity:

Intel i5 3210M comes out to 3820 Dolphin Score before Turbo, placing it in the very fast category with a PPC of 1910. The Pentium G870 gets a score of 1560, placing it in the Very Fast class, along with the i3-2100, which gets 1824. That implies the break for Extremely Fast is likely somewhere between 1824 and 1910. Of course, this depends on architecture latency times, microstuttering, GPU, etc. However, one of the processors that really helps my theory is the Intel Core i3-2370M. This low end i3 gets a PPC score of just under 1500 and, as predicted, performs in about the same range as the desktop Pentium G870 in Dolphin. Simple, right? It matches up with the list perfectly.

Sad Truth: The reason why AMD struggles so hard with Dolphin is because their PPC score is so low. One of the best for their PPC score is the FX-4300 Quad Core. At stock clocks, (3.8GHz...) the FX gets a score of 1262.5. In the numbers, it's obvious why AMD gets destroyed on Dolphin. Painfully obvious.

Luckily the FX series and K series of AMD have unlocked multipliers to overclock to your heart's content. Now if only I could figure out how to predict PassMark scores by knowing the amount of instructions per clock. Then I'd know how good my processor would perform before I buy it and try to overclock! Lol.

However, I wonder if this will start to fall apart between Ivy and Sandy Bridge. This really doesn't seem to incorporate new instruction sets well and just gives a general idea of what range you'd fall into.
I really didn't want to have to write a WoT today......sigh.

Quote:Actually, we do have the data.

No we don't. I have at least glanced at every single thread on this forum over the last 2 years (at least if not longer) except for the ones in the controller section. We have never had any verifiable or reliable data about the performance of these processors in dolphin. I can absolutely promise you that.

Quote:Sort of, in theory. At least, a decent idea. It's something I like to call PPC score. It works like this. As you probably know, when CPUs are benchmarked, especially in things like CPU Mark, they'll be given a very specific score.

See this part of my post:
Quote:No. We have no benchmarking data using dolphin. We can at best make safe assumptions using benchmarking data from other applications with similar code profiles.

The data that we are talking about is not the data that you're talking about. But I will respond to the rest of your post simply to allow further discussion because you might have something useful to point out that I have not considered.

Quote: Well, we know one thing for absolute certain, even if we don't know exactly how many IPC each processor gets.

But we do know exactly how much IPC specific programs get on specific cpus after we benchmark them. It's a really simple arithmetic equation.

Assuming IPC of total cpu:
(instructions processed/time) / clock signal frequency
or another way of writing that: IPS/Fclk (I have no way to type subscript without HTML....that I know of)

Often we'll just take the easier route and do time / clock rate which gives you a number that while technically isn't IPC can be used to compare the speed of processors per clock cycle so long as they use the same ISA. To get the number of instructions processed you need to run a profiler which few benchmarks do. Of course you have to be really careful about removing as many background processes as possible and even then your results won't be 100% accurate, closer to 99%.
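
As a minimal sketch of the equation above, assuming you already have an instruction count from a profiler, the calculation looks like this in Python. The function name and the sample numbers are hypothetical and are not measurements from any real chip.

```python
def instructions_per_clock(instructions: float, seconds: float, clock_hz: float) -> float:
    """IPC = (instructions processed / time) / clock signal frequency."""
    if seconds <= 0 or clock_hz <= 0:
        raise ValueError("seconds and clock_hz must be positive")
    ips = instructions / seconds  # instructions per second (IPS)
    return ips / clock_hz         # IPS / Fclk


# Hypothetical run: 6 billion instructions in 2 seconds on a 3 GHz clock.
example_ipc = instructions_per_clock(6.0e9, 2.0, 3.0e9)
print(example_ipc)
```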

If the application is single threaded the resulting IPC will be for both total and per core.

If the application is multi threaded you will get the total IPC but not per core (which is what we're interested in). When people say IPC it is implied (while not technically correct) that they are usually talking about per core IPC unless otherwise stated. It is also implied (once again not technically correct) that they are using the simpler time/clock rate equation, which is really just "performance per clock per core". To get the per core IPC with a multithreaded application you cannot just divide the result by the number of cores because that would assume perfect scaling, which is impossible. Instead you simply disable all of the cores except one before you run the application. Hyperthreading, NUMA, SMT, and CMT add even more confusion to all of this.
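
One way to see why dividing by the core count misleads is a rough sketch using Amdahl's law. Amdahl's law is brought in here purely as an illustration and is not something measured in this thread. The 90% parallel fraction and the single core score of 2000 are made-up numbers.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup over one core when only part of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def naive_per_core(single_core_score: float, cores: int, parallel_fraction: float) -> float:
    """What 'total score / cores' reports when scaling is imperfect."""
    total = single_core_score * amdahl_speedup(cores, parallel_fraction)
    return total / cores


# Hypothetical chip: each core really scores 2000, the benchmark is 90% parallel.
true_per_core = 2000.0
estimate = naive_per_core(true_per_core, cores=4, parallel_fraction=0.9)
print(round(estimate, 1))  # lower than the true per-core score
```

Only with a parallel fraction of exactly 1.0 (perfect scaling) does the division give back the real single core figure.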

Unfortunately we haven't been able to do any of this stuff with dolphin.

Quote:We know the number of cores on a processor. So, basically, in theory, you could divide the CPU Mark Score by the number of cores, and get a rough PPC (Power Per Core) score. The higher, the better.

There are two reasons why you can't do this:
1. Assumption of perfect scaling (see above)
2. Passmark is a completely different application!

Your IPC is going to be very different on a chip depending on what type of application you're running. For example vishera can actually achieve a higher total IPC in video encoders than ivy bridge while IPC in a compiler will be awful.

Also "power per core" is the amount of power (in watts) consumed per core. Although I have never seen that acronym used before. You might want to pick a different wording.

Quote:With that said, here would be why the Intel Processors would work so much better.

Intel Core i7-3720QM @ 2.60GHz CPU Mark: 8552

8552/4 (Number of Cores) = 2138 Power Per Core score.

Which is also why my AMD A6-4400M is so abysmal. It got a measly...

825.5. Uggh.

Test Validity:

Intel i5 3210M comes out to 3820 Dolphin Score before Turbo, placing it in the very fast category with a PPC of 1910. The Pentium G870 gets a score of 1560, placing it in the Very Fast class, along with the i3-2100, which gets 1824. That implies the break for Extremely Fast is likely somewhere between 1824 and 1910. Of course, this depends on architecture latency times, microstuttering, GPU, etc. However, one of the processors that really helps my theory is the Intel Core i3-2370M. This low end i3 gets a PPC score of just under 1500 and, as predicted, performs in about the same range as the desktop Pentium G870 in Dolphin. Simple, right? It matches up with the list perfectly.

A lucky correlation, nothing more. This is much more complicated than you are making it out to be.

Quote:Sad Truth: The reason why AMD struggles so hard with Dolphin is because their PPC score is so low. One of the best for their PPC score is the FX-4300 Quad Core. At stock clocks, (3.8GHz...) the FX gets a score of 1262.5. In the numbers, it's obvious why AMD gets destroyed on Dolphin. Painfully obvious.

"PPC" as you call it is a performance metric that you made up that has no relationship to the performance of the vast majority of applications and does little to show the strengths and weaknesses of a processor or processor architecture.

Quote:Luckily the FX series and K series of AMD have unlocked multipliers to overclock to your heart's content.

And power scaling that is far too poor to get a big overclock out of it without insane cooling.

Quote: Now if only I could figure out how to predict PassMark scores by knowing the amount of instructions per clock. Then I'd know how good my processor would perform before I buy it and try to overclock! Lol.

That makes no sense. You would need to have the passmark score to calculate the IPC of a chip in passmark.
Quote:Intel Core i3-2370M. This low end i3 gets a PPC score of just under 1500 and, as predicted, performs in about the same range as the desktop Pentium G870
Without Hyper Threading (HT), the G620 is faster than the i3. The G860 is much faster,
and HT doesn't benefit anything in Dolphin.
First of all, I apologize if there are formatting problems in my post. The quoting function on this forum is just a bit strange, I've noticed.


Quote:The data that we are talking about is not the data that you're talking about. But I will respond to the rest of your post simply to allow further discussion because you might have something useful to point out that I have not considered.

That is unlikely, because after reading your post, I've had several core misunderstandings about how processors work, among other very important things. Most importantly, that IPC is dynamic depending on task apparently, and not a static number. I'm no computer idiot, but I'm not that great either, and I need to stop making baseless assumptions, I apologize for that. Correlation does not equal causation, and in some cases, correlation could just be coincidence.

Quote:But we do know exactly how much IPC specific programs get on specific cpus after we benchmark them. It's a really simple arithmetic equation.

Assuming IPC of total cpu: (instructions processed/time) / clock signal frequency
or another way of writing that: IPS/Fclk (I have no way to type subscript without HTML....that I know of)

Often we'll just take the easier route and do time / clock rate which gives you a number that while technically isn't IPC can be used to compare the speed of processors per clock cycle so long as they use the same ISA. To get the number of instructions processed you need to run a profiler which few benchmarks do. Of course you have to be really careful about removing as many background processes as possible and even then your results won't be 100% accurate, closer to 99%.

If the application is single threaded the resulting IPC will be for both total and per core.

If the application is multi threaded you will get the total IPC but not per core (which is what we're interested in). When people say IPC it is implied (while not technically correct) that they are usually talking about per core IPC unless otherwise stated. It is also implied (once again not technically correct) that they are using the simpler time/clock rate equation, which is really just "performance per clock per core". To get the per core IPC with a multithreaded application you cannot just divide the result by the number of cores because that would assume perfect scaling, which is impossible. Instead you simply disable all of the cores except one before you run the application. Hyperthreading, NUMA, SMT, and CMT add even more confusion to all of this.

Unfortunately we haven't been able to do any of this stuff with dolphin.

You're completely right about all of this. The only thing that is iffy about what you've said is this: if you try to disable all the cores except one, which ones do you disable? Supposedly every processor is made differently, and if that's true, how exactly do you judge what core you disable? Even the cores sometimes have small differences between how strong each one is, you know? Wouldn't that be a problem?

Quote:There are two reasons why you can't do this:
1. Assumption of perfect scaling (see above)
2. Passmark is a completely different application!

Your IPC is going to be very different on a chip depending on what type of application you're running. For example vishera can actually achieve a higher total IPC in video encoders than ivy bridge while IPC in a compiler will be awful.

Also "power per core" is the amount of power (in watts) consumed per core. Although I have never seen that acronym used before. You might want to pick a different wording.

And this is where I also seem to have forgotten that the limiting reactant of the accuracy of making assumptions of your chip using a benchmark...is the benchmark itself and what exactly you're benchmarking. For future reference, however, which benchmark would you recommend as the closest to what you would have to go through on Dolphin? Perfect scaling being impossible leads to an interesting point as well. For instance, are the Intel Processors also known to scale better than AMD as part of the reason why they perform better? Just curious.

As for the acronym, it probably doesn't matter, I think your points have done more than enough to make the "acronym" invalid anyway, IMO.

Quote:A lucky correlation, nothing more. This is much more complicated than you are making it out to be.

"PPC" as you call it is a performance metric that you made up that has no relationship to the performance of the vast majority of applications and does little to show the strengths and weaknesses of a processor or processor architecture.

Agreed on all counts, and sorry for wasting your time regarding it.

Quote:Luckily the FX series and K series of AMD have unlocked multipliers to overclock to your heart's content.
Quote:And power scaling that is far too poor to get a big overclock out of it without insane cooling.

Really? I'd like to hear more about this. I was working on a budget desktop build using certain parts, and both the FX-4300 and A8-5600K were considered (fit in the price range) along with the Intel Pentium G850. Just how bad is the power scaling on the FX and the K?

And if you know, I was also wondering, just how much damage does the lack of L3 on the 5600K cause? Because I found a Mobo I like for the build with six or so SATA III ports, but one, I almost feel that's a waste because not only do many HDs seem to bottleneck with SATA III, but what could I really do with 6 SATA ports anyway?

Also, the 5600K build so far is a bit cheaper than the FX build, but is the difference between them significant enough?

Quote:That makes no sense. You would need to have the passmark score to calculate the IPC of a chip in passmark.

Misconception of mine thinking IPC was static. Again, my own fault. I apologize again for wasting your time. I'll actually go learn something about these things before I go ranting like I actually know something again. My fault.

admin8 Wrote:Without Hyper Threading (HT), the G620 is faster than the i3. The G860 is much faster,
and HT doesn't benefit anything in Dolphin.

You just saved me some serious money. I was seriously considering getting the i3-2100 instead of the G870 because of their benchmarking scores. However if what you're saying is correct, then the G870/G860 might be perfect for a budget build. Thanks. By the way, as shown in the rest of my post, my assumptions were very faulty, so it's not surprising someone found such a big error in my logic. Thanks though. Do you possibly have a list on this site that ranks processors individually? You know, like Processor A > Processor B instead of putting them in groups?

Wondering so I can know which processor to get for a Dolphin budget build on both AMD and Intel.
A CPU that performs extremely well in a single-threaded benchmark (Cinebench Single-Thread, for example) or in Starcraft 2 (which is a dual core application) will have good performance in Dolphin.

Quote:the G870/G860 might be perfect for a budget build
Those Sandy Bridge Pentiums are outdated. A Sandy Bridge Pentium is basically a Sandy Bridge i3 without Hyper Threading (2 cores 2 threads vs 2 cores 4 threads).
I could build something better:
_Pentium Ivy Bridge G2120 + B75 mobo
_i5 3450 + Asrock Z77 (with the "no k overclock" function; it must be an Asrock Z77 or else no overclocking for you) -> Overclock to 3.9GHz http://forums.guru3d.com/showthread.php?t=366476
_i5 3570k + any Z77 mobo -> Overclock to 4.2GHz

When Haswell CPUs are released (probably in June 2013), those builds will be outdated again.

There is no budget AMD CPU at the moment, since Dolphin is a dual core application and those fast dual core i3s (Sandy/Ivy) outperform even the highest-end, slow 8 core FX-8150 / FX-8350. Unfortunately, the Dolphin benchmark was discontinued. By the way, there is a CPU benchmark on the PCSX2 forum; you should take a look.
Quote:First of all, I apologize if there are formatting problems in my post. The quoting function on this forum is just a bit strange, I've noticed.

Use the source tab in the reply window, copy/paste the stuff you want to quote, and manually insert the quote tags around it (as I'm sure you have already figured out since the structure of your post indicates that you're already doing this).

Quote:That is unlikely, because after reading your post, I've had several core misunderstandings about how processors work, among other very important things. Most importantly, that IPC is dynamic depending on task apparently, and not a static number.

Well that's both right and wrong.

The hardware IPC is static, the software IPC is dynamic. The term was originally used to refer to the hardware's capabilities but later began to more commonly refer to software IPC since hardware IPC is basically a useless metric.

Let's use x86 Intel cpus as an example:

P1 (8086, 8088)
IPC: 0.5

P1.5 (186, 188)
IPC: 0.5

P2 (286)
IPC: 0.5

P3 (386DX, 386SX, 386SL, 376, 386EX, 386EXTB, 386EXTC, 386CXSA, 386CXSB, 386SXSA, 386SXTA)
IPC: 0.5

P4 (486DX, 487SX, 486SL, 486SL-NM, 486DX-S, 486SX, 486SX-S, 486DX2, 486DX2WB, 486DX2-S, 486SX2, 486DX4, 486DX4WB, 486GX, RapidCAD, 486ODP, 486ODPR)
IPC: 1

P5 (Pentium, Pentium MMX, Mobile Pentium)
IPC: 2

P6 (Pentium Pro, Pentium II, Pentium III, Celeron, Mobile Pentium II, Mobile Pentium III, Pentium III M)
IPC: 3

P6-8 (also called netburst, Pentium 4, Pentium 4 HT, Pentium 4E, Pentium D, Pentium EE, Celeron, Celeron D, Mobile Pentium 4, Pentium 4M)
IPC: 3

PM (Pentium M, Core Solo, Core Duo, Pentium Dual Core, Celeron M)
IPC: 3

Core (Core 2 Duo, Core 2 Quad, Core 2 Extreme, Pentium Dual Core, Celeron)
IPC: 4

Nehalem/Westmere (Core i3, Core i5, Core i7, Core i7 Extreme, Pentium Dual Core, Celeron)
IPC: 4

Sandy Bridge/Ivy Bridge (Core i3, Core i5, Core i7, Core i7 Extreme, Pentium, Celeron)
IPC: 4

Haswell/Broadwell (Based on the assumption that they will continue the current branding system which seems very likely at this point: Core i3, Core i5, Core i7, Core i7 Extreme, Pentium, Celeron)
IPC: 5

You'll notice that this does not line up with the average "performance per clock" of applications at all. The first three generations of x86 cpus were all capable of executing a maximum of 1 instruction every 2 clock cycles under perfect conditions, an IPC of 0.5. However there are no applications that come close to those "perfect conditions" so there are no applications that come close to an IPC of 0.5. In most cases it was closer to 10-30 cycles per instruction, or an IPC of 0.1-0.033. As such the IPC of your average application went way up with each new generation even though the chips were still capable of the same maximum IPC.
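
The conversion between cycles per instruction and IPC used above is just a reciprocal. Here is a quick sketch in Python, with a made-up function name, that reproduces the figures quoted in this post:

```python
def ipc_from_cpi(cycles_per_instruction: float) -> float:
    """IPC is the reciprocal of cycles per instruction (CPI)."""
    if cycles_per_instruction <= 0:
        raise ValueError("cycles_per_instruction must be positive")
    return 1.0 / cycles_per_instruction


# 2 cycles per instruction is the hardware best case mentioned above;
# 10 and 30 are the typical software figures mentioned above.
for cpi in (2, 10, 30):
    print(cpi, round(ipc_from_cpi(cpi), 3))
```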

I could continue for the remaining generations, but you get the idea.

Quote: I'm no computer idiot, but I'm not that great either, and I need to stop making baseless assumptions, I apologize for that. Correlation does not equal causation, and in some cases, correlation could just be coincidence.

As long as you approach this in a scientific manner (which you appear to be doing) I have no issues with what you do and do not already know.

Quote:You're completely right about all of this. The only thing that is iffy about what you've said is this: if you try to disable all the cores except one, which ones do you disable? Supposedly every processor is made differently, and if that's true, how exactly do you judge what core you disable? Even the cores sometimes have small differences between how strong each one is, you know? Wouldn't that be a problem?

Not really. All of the cores are SUPPOSED to be physically identical since they use the same microarchitecture. However since the manufacturing isn't perfect they end up having slightly different physical/electrical properties. One core may be able to reach 3.5GHz and remain stable while another on the same chip may only be stable up to 3.2GHz. If even one core is defective the entire chip gets thrown out. If the cores are stable at different frequencies they use the highest frequency that all of the cores remain stable at. This of course ignores thermal limits but it's the same idea with TDP.

As long as the cores have the same clock signal frequency and microarchitecture (which they always do) they will perform the same. And as such you can disable any cores you want and you will still get the same results.
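
The rule described above (use the highest frequency that every core is stable at) amounts to taking the minimum of the per-core limits. A tiny sketch follows; the 3.5 and 3.2 values come from the example above, while the other two limits and the function name are hypothetical.

```python
def rated_clock_ghz(stable_limits_ghz: list[float]) -> float:
    """Highest clock all cores can sustain, i.e. the lowest per-core limit."""
    if not stable_limits_ghz:
        raise ValueError("need at least one core")
    return min(stable_limits_ghz)


# 3.5 and 3.2 are from the example above; 3.4 and 3.3 are made up.
rated = rated_clock_ghz([3.5, 3.2, 3.4, 3.3])
print(rated)
```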

Quote:And this is where I also seem to have forgotten that the limiting reactant of the accuracy of making assumptions of your chip using a benchmark...is the benchmark itself and what exactly you're benchmarking. For future reference, however, which benchmark would you recommend as the closest to what you would have to go through on Dolphin?

I wouldn't. I would recommend you look at a wide range of different types of benchmarks to get an idea of the strengths and weaknesses of an architecture.

Dolphin like most applications is a balance of many different types of code. And as such no single synthetic test is going to give you an accurate comparison.

Quote: Perfect scaling being impossible leads to an interesting point as well. For instance, are the Intel Processors also known to scale better than AMD as part of the reason why they perform better? Just curious.

Not really. The issue of multithreaded performance scaling is mostly on the software side. Atomic locks for example.

Quote:Really? I'd like to hear more about this. I was working on a budget desktop build using certain parts, and both the FX-4300 and A8-5600K were considered (fit in the price range) along with the Intel Pentium G850. Just how bad is the power scaling on the FX and the K?

Pretty bad:
[Image: 51144.png]



Quote:And if you know, I was also wondering, just how much damage does the lack of L3 on the 5600K cause?

Not much. The hit rates are very high on Piledriver's L2 cache and the L3 cache on the FX-4300 is only 4MB anyways.

Quote: Because I found a Mobo I like for the build with six or so SATA III ports, but one, I almost feel that's a waste because not only do many HDs seem to bottleneck with SATA III, but what could I really do with 6 SATA ports anyway?

Not much. HDDs won't show much difference between SATA II and SATA III (or even SATA I and SATA II for that matter). The only difference is going to be your burst speed (read/write to cache). But since the cache is too small for most of what a typical HDD does you usually won't see any difference between the two.

SATA III support is important for high end SSDs however.
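
To put rough numbers on that, here is a sketch comparing usable SATA link bandwidth with drive throughput. The line rates (1.5, 3 and 6 Gbit/s) and the 8b/10b encoding overhead are the standard SATA figures. The drive speeds (120 MB/s for an HDD, 500 MB/s for an SSD) are assumptions picked for illustration, not measurements.

```python
# Standard SATA line rates in Gbit/s.
SATA_LINE_RATE_GBIT = {"SATA I": 1.5, "SATA II": 3.0, "SATA III": 6.0}


def usable_mb_per_s(line_rate_gbit: float) -> float:
    """8b/10b encoding puts 10 bits on the wire for every byte of payload."""
    return line_rate_gbit * 1000 / 10


def link_is_bottleneck(drive_mb_per_s: float, generation: str) -> bool:
    """True when the drive could push more data than the link can carry."""
    return drive_mb_per_s > usable_mb_per_s(SATA_LINE_RATE_GBIT[generation])


# Assumed sustained throughputs, for illustration only.
HDD_MB_PER_S = 120.0
SSD_MB_PER_S = 500.0

for gen in SATA_LINE_RATE_GBIT:
    print(gen, link_is_bottleneck(HDD_MB_PER_S, gen), link_is_bottleneck(SSD_MB_PER_S, gen))
```

With these assumed speeds the HDD never saturates the link, while the SSD is held back on everything except SATA III.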

Quote:Also, the 5600K build so far is a bit cheaper than the FX build, but is the difference between them significant enough?

What exactly are the prices on these builds? I can't judge whether it's significant enough if I don't know how significant the price difference is. I would expect the performance difference to be around 10-12%

I would recommend that you steer clear of AMD at this point at almost any price range.
(01-13-2013, 08:37 AM)NaturalViolence Wrote: I would recommend that you steer clear of AMD at this point at almost any price range.
For Dolphin that is.

In particular, compared to something like an Ivy Core i3, the 65W TDP APUs (such as the A10-5700) are much more balanced and competitive overall:
http://www.overclock.net/t/1320379/hexus-review-amd-a10-5700-65w-quad-trinity


Back onto talking about Vishera, since Piledriver is a refined Bulldozer architecture, would it be possible to bench an FX-8150 in Dolphin, then calculate the performance difference Vishera has over Zambezi via typical PC benchmarks and then extrapolate that difference into Dolphin?
Quote:For Dolphin that is.

In particular, compared to something like an Ivy Core i3, the 65W TDP APUs (such as the A10-5700) are much more balanced and competitive overall:
http://www.overclock.net/t/1320379/hexus...ad-trinity

For almost everything. There are a few integer based heavily multithreaded applications that do well on piledriver like video encoders and that's it. The majority of applications that your typical consumer uses run faster on Intel cpus.

The real upside is the IGP. But your average consumer has absolutely no use for a high end IGP because they don't play video games. And gamers have no use for them either because even the low end discrete GPUs are massively faster. Not to mention Intel cpus perform much better in games when paired with a discrete GPU (which most gamers have) than AMD cpus.

The only cpu benchmark you showed was a video encoder.

It's very interesting to note that AMD is now in a similar position as Intel was with the Pentium 4. Their chip performs well in the same types of applications and is also insanely power hungry at higher clock rates. Their resonant clock mesh technology allows them to maintain decent power efficiency at lower clock rates but it scales very poorly.

http://www.anandtech.com/bench/Product/675?vs=677
http://www.anandtech.com/show/6347/amd-a10-5800k-a8-5600k-review-trinity-on-the-desktop-part-2/8

Quote:Back onto talking about Vishera, since Piledriver is a refined Bulldozer architecture, would it be possible to bench an FX-8150 in Dolphin, then calculate the performance difference Vishera has over Zambezi via typical PC benchmarks and then extrapolate that difference into Dolphin?

You could get a rough estimate but no you could not calculate an exact result. The IPC improvement is subject to change based on which type of application you're using. We can assume that it's around 8-12% better based on similar applications.
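
As a sketch of what that rough estimate would look like, the snippet below scales a measured baseline by the 8-12% range mentioned above. The 50 FPS baseline for the FX-8150 is a made-up placeholder (nobody in this thread has posted a real Dolphin result for it), and the function name is hypothetical.

```python
def extrapolate_range(baseline: float, low_gain: float = 0.08, high_gain: float = 0.12) -> tuple[float, float]:
    """Scale a measured baseline by an assumed per-clock improvement range."""
    return baseline * (1.0 + low_gain), baseline * (1.0 + high_gain)


# Placeholder: pretend an FX-8150 measured 50 FPS in some Dolphin scene.
low, high = extrapolate_range(50.0)
print(f"estimated range: {low:.1f} to {high:.1f} FPS")
```

The result is a range rather than a single number, since the improvement depends on the type of application.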
I don't really feel like going into a whole CPU architecture discussion, but I'd just like to say that the Pentium 4 comparison is only half correct. Northwood, viewed as the only "good" Pentium 4, had a 20 stage pipeline - the exact same number as Bulldozer and Piledriver. It was Prescott that was the really power-hungry one and had a pipeline whose stage count supposedly varied anywhere from 30 to 100.

Also, I think you'd be surprised at how many people play PC games without being a "hardcore gamer". You don't need to be playing Battlefield 3 or anything - do you realize how many people play Minecraft and World of Warcraft? And Intel iGPUs are particularly bad at OpenGL performance, so Minecraft pretty much requires AMD or Nvidia graphics.

EDIT: And I'm finding out that apparently Sandy/Ivy Bridge's iGP + Windows 8 = problems with OpenGL support. Yet again, AMD or Nvidia is pretty much required.
[Image: 50163.png]
Source: http://www.anandtech.com/show/6332/amd-trinity-a10-5800k-a8-5600k-review-part-1/6
Nvidia GT 640 & 440 DDR3: 28.8GB/s memory bandwidth ...
AMD 7660D: 29.9GB/s
It depends on the Nvidia driver for each game. Some games are poorly optimized -> bad result.
I've seen an Nvidia GTX 680 that only gets 10FPS in Minesweeper ...
Overall, the Nvidia GPU should be faster, not slower (as you can see in other game benchmarks).
The GDDR5 version (51.2GB/s) is much faster than the AMD 7660D.