Dolphin CPU hierarchy [UNOFFICIAL]
02-02-2014, 09:45 AM
Why is AMD FX under "very fast" when we all know bulldozer and piledriver CPUs are crap compared to Intel?
02-02-2014, 10:06 AM
(This post was last modified: 02-02-2014, 10:06 AM by Nintendo Maniac 64.)
(02-02-2014, 09:45 AM)drhycodan Wrote: Why is AMD FX under "very fast" when we all know bulldozer and piledriver CPUs are crap compared to Intel?
Because the Intel CPUs listed under "very fast" are not unlocked. Also, Steamroller would probably be quite the beast if it had a CPU model with an L3 cache, especially with the latency bugfixes that Excavator will have.
Dolphin 5.0 CPU benchmark
CPU: Xeon E3-1246 v3 (4c/8t Haswell/Intel 4th gen), core & cache @ 3.9GHz via multicore enhancement
GPU: Intel integrated HD Graphics P4600
RAM: 4x8GB Corsair Vengeance @ DDR3-1600
OS: Linux Mint 20.3 Xfce + [VM] Win7 SP1 x64
02-02-2014, 11:09 AM
But that's just wrong and misleading comparing a 5ghz OCed FX to an i5 stock. This is why people constantly complain about Dolphin's performance and claiming to have a "high end" CPU. They need to be aware that NO FX processor is equal to even an i3 stock.
02-02-2014, 11:13 AM
(This post was last modified: 02-02-2014, 11:13 AM by Nintendo Maniac 64.)
(02-02-2014, 11:09 AM)drhycodan Wrote: They need to be aware that NO FX processor is equal to even an i3 stock.
All FX CPUs are unlocked. No i3s are unlocked. 'nuff said
02-02-2014, 12:33 PM
02-06-2014, 10:54 AM
Ugh. Let's get this started I guess.
cdoublejj Wrote: Why would say a 2.83ghz Q9550 @ 3.83 ghz be in in the Fast range while a core i3 is in the Very fast range? let alone a stock clocked Q9550?
1. Because the list does not account for clock rate or other differences between specific models, let alone overclocking.
2. Because depending on which generation you're talking about, an i3 could be up to 80% faster clock for clock. A 3.4GHz stock haswell i3, for example, would completely demolish an overclocked Q9550.
cdoublejj Wrote: wait it can't use more than 1 core O_0?
Nobody said you couldn't.
cdoublejj Wrote: doesn't the wii have more than core? wth?
No it does not. The Wii is based on a modified version of the GC cpu. It has a bigger cache, a faster bus, a higher clock rate, lower power consumption, and it's physically smaller due to a die shrink from a newer manufacturing process. But other than that it's pretty much the same cpu and has the same microarchitecture. The GC cpu was in turn a modified version of the PowerPC 740/750 from 1997. Multicore cpus did not exist back then.
cdoublejj Wrote: also in that case it isn't just speed it's architecture and IPC, aka how much work it can get don per clock.
If "speed" is equivalent to "performance" then we are measuring instructions per unit time, not IPC. Seconds would be the standard unit of time to use, so:
IPS = IPC * CR
IPS = instructions per second (total amount of work done per second of time)
IPC = instructions per clock (amount of work done per clock cycle)
CR = clock rate (number of clock cycles per second)
cdoublejj Wrote: also keep in mind Delphine aside an OCed Q9550 special those paired with DDR3 are on pare with with lower end of the spectrum i5s.
No it's not. Not even close. At least not with the current generation. And the use of DDR3 ram in core 2 systems has little to no impact on performance, because of the slow FSB and because the memory controller sits on the motherboard chipset rather than on the die.
This is easily verifiable through user testing, since some motherboards supported both DDR2 and DDR3 ram.
cdoublejj Wrote: It can actually support a GTX780 with very little bottle neck.
I don't doubt this since most PC games require very little cpu performance. That doesn't mean the same will happen with dolphin or more cpu heavy PC games though.
cdoublejj Wrote: i'd should try and put the screws to it since i have it paired with a GTX480. any game suggestions?
Planetside 2 and BF3 are good, popular, cpu heavy PC games.
cdoublejj Wrote: i'd also like to try give my 4.0ghz a 1100T a go but, i'm limited with my HD4850s till i can upgrade my GPU.
Not in dolphin you're not.
cdoublejj Wrote: Then i would think Dolphin performance would very much benefit from a multi core CPUs and not just be entirely on single core strength.
It does. But single core cpus haven't been around for many years, with the exception of a few single core celerons that retailers don't ever carry. This is why we look at per core performance when evaluating how a cpu will perform with dolphin. Since pretty much all cpus have enough cores for dolphin, the only thing that affects their performance is how those cores perform individually. 2, 4, 6, or 8 cores makes no difference, with a few minor exceptions for 4 core cpus.
cdoublejj Wrote: I think a better way to say it is that Dolphin is sooo demanding it needs all the power it can get thusly newer gen CPUs even if they have less cores benefit better performance wise.
What does that even mean!
kinkinkijkin Wrote: "Single-Core Performance" does not refer to how the CPU performs when it is only using one core.
Yes it does.
kinkinkijkin Wrote: It refers to how well it performs each thread.
That would be single thread performance. Not single core performance.
kinkinkijkin Wrote: Because of some random people who hardly know what they're talking about confusing that, I, myself, use "Per-Core Performance", shortened to PCP.
Which is the same thing as single core performance, except that nobody will know what you're talking about since you just made up that acronym, which is not used in the industry. I don't see how that could possibly be less confusing.
kinkinkijkin Wrote: Dolphin cannot use high parallel performance, because there's just not enough things to split off without per-game hacks,
Wait. What things could we "split off" with per-game hacks? Dolphin doesn't use more than 2-3 cores because the Wii has only 3 primary microprocessors to emulate. And emulation of a processor is inherently a serial task.
kinkinkijkin Wrote: and, even then, the speed boost will be negligible on anything but the most perfectest parallelism ever put in a processor.
How would a "most perfectest parallelism processor" (whatever that means) boost performance in a situation where parallelism doesn't boost performance? That makes no sense to me.
DJBarry004 Wrote: All that those guys said (KHg8m3r, garrkler, kinkinkijkin...), without mentioning the -hell- it could cause Dolphin if it used all the CPU cores (yes, I´m referring to the temps! They´ll blow up any PC if that happens!).
DJBarry004 Wrote: I was referring to the heat it was going to be caused by Dolphin if it used the 4/6/8 CPU cores, maxing them out...
Yes, imagine the hell caused by making your processor do what it was designed to do... All cpus are designed not to overheat at full load so long as the guidelines for case design and TIM application are followed correctly. If they couldn't handle working at full load they wouldn't be able to sell them. People aren't going to buy a system that stops working when you push it too hard (laptops excluded, because half of them don't follow the cpu manufacturer's requirements).
kinkinkijkin Wrote: Only on coolness-challenged processors,
kinkinkijkin Wrote: like any of intel's newer processors, or any of AMD's 8-core FX-series/6-core Phenom IIs.
Look. It's really simple.
Every cpu has a maximum Tcase and Tjunction temperature at which damage is caused, either in the short term or the long term. To prevent the temperature from getting that high and damaging the hardware, the cpu/chipset is programmed to constantly read the temperature from digital sensors located on the die and throttle when it gets too high, or in a worst case scenario shut the system down completely.
To keep it from ever getting to that point, Intel/AMD set specific testing standards at their facilities and design the specs of their cpu models around a specific TDP. The TDP is the amount of heat, measured in watts, that the cooling system must be able to dissipate to keep the chip at operating (safe) temperatures. For desktops they provide stock coolers that can handle this TDP as long as they are properly mounted and the airflow isn't blocked by anything. For laptops the OEMs are responsible for designing and testing a cooling system that meets the required TDP, although Intel/AMD do provide reference designs that they can choose to follow.
As you overclock by raising clock rates and voltages, the heat output (the effective TDP) goes up with it. The rate at which it increases depends on the architecture. The latest AMD and Intel cpus show a very steep, faster-than-linear increase when overclocking (due to high power consumption for AMD and TIM acting as a bottleneck for Intel). However there is no reason to believe that these cpus would be any more difficult to cool at stock settings than less "coolness-challenged" processors.
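The throttle-then-shutdown behaviour described above can be pictured as a very simple decision rule. This is only a rough Python sketch: the two thresholds, the function name `thermal_action`, and the three-state model are made up for illustration. Real cpus do this in hardware/firmware with model-specific limits.

```python
# Illustrative only: these thresholds and names are hypothetical and are not
# taken from any Intel/AMD datasheet.
THROTTLE_C = 95.0   # assumed die temperature at which throttling starts
SHUTDOWN_C = 105.0  # assumed die temperature at which the system powers off


def thermal_action(die_temp_c: float) -> str:
    """Return what the protection logic would do at a given die temperature."""
    if die_temp_c >= SHUTDOWN_C:
        return "shutdown"   # worst case: cut power to avoid damage
    if die_temp_c >= THROTTLE_C:
        return "throttle"   # lower clocks/voltage until the chip cools down
    return "run"            # within limits: full speed


if __name__ == "__main__":
    for temp in (60.0, 96.5, 110.0):
        print(f"{temp:5.1f} C -> {thermal_action(temp)}")
```

A cooler that can dissipate the rated TDP keeps the chip in the "run" branch at stock settings, which is the whole point of the spec.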
At stock settings TDP values for recent desktop cpus are as follows:
Hexa core gulftown (nehalem): 130 watts
Quad core bloomfield (nehalem): 130 watts
Quad core lynnfield (nehalem): 95 watts
Quad core sandy bridge: 95 watts
Quad core haswell: 84 watts
S edition lynnfield (nehalem): 82 watts
Quad core ivy bridge: 77 watts
Dual core clarkdale (nehalem): 73 watts
S edition quad core core i5/i7: 65 watts
Dual core sandy bridge: 65 watts
Dual core ivy bridge: 55 watts
Dual core haswell: 53/54 watts
T edition dual core core i3/pentium: 35/45 watts
Bulldozer/piledriver: 95/125/220 watts (220 is for the FX 9000 series)
A series APUs (as well as athlon II and athlon X2/X4 based on them): 45/65/100 watts (most are 65/100)
Phenom II Quad core: 65 (e series)/95/125/140 watts (most are 95/125)
Phenom II Tri core: 65 (e series)/95 watts
Phenom II Dual core: 65/80 watts (most are 80)
Athlon II Quad core: 45 (e series)/95 watts
Athlon II Tri core: 45 (e series)/95 watts
Athlon II Dual core: 25 (u series)/45 (e series)/65 watts
Sempron Single core: 20 (u series)/45 watts
As you can see the stock TDP of current Intel and AMD cpus is actually lower than their predecessors. Not higher. People only bitch about them because they overclock them to insane clock rates.
If you're curious about the whole issue with Intel switching from solder to TIM in ivy bridge/haswell, the armchair analysts of the internet have all jumped to the same conclusion they always jump to when they don't want to invest the time to research the technology that they're complaining about: "They're cheaping out to try and screw consumers!" In fact the cost difference between using solder and TIM in manufacturing is negligible. How people never seem to consider the idea that the engineers working on this stuff know a lot more about it than they do, and usually have good reasons for the design decisions they make, is beyond me.
The real reason they did it is because the die size of these new cpus is too small to support traditional indium solder.
"With the reduction in die size the thermal-cycling-driven fatigue cracking between the indium and the indium-gold intermetallic becomes uncontrollable leading to voids forming in the solder."
Source: http://iweb.tms.org/PbF/JOM-0606-67.pdf
The reason people see a big difference in temperature when they delid their cpus is because Intel's manufacturing process applies too much TIM. It's not the quality of the TIM that's hurting temperatures, it's the quantity. Much thinner layers can be applied by hand, but doing it by hand is too slow for modern mass manufacturing.
The only alternative is to make the chip bigger, which increases cost and complexity. To increase the size you either need to lengthen the interconnects, use bigger transistors, or add more transistors (or some combination of those). If you increase the transistor count you end up with a big power hungry heat machine like bulldozer that gets the same high TDP but with a higher power consumption to accompany it (thereby making the design even worse). If you lengthen the interconnects, electrical latency increases and you're forced to lower the clock rate to account for the higher delays, making the cpu slower. Using bigger transistors adds both latency and power consumption.
What conclusion should we draw from this? Intel's engineers knew what they were doing, believe it or not, and made the right call. Every alternative is objectively much worse than using TIM under the IHS. But don't tell that to the hordes of people on the internet desperately scrambling to make any company they buy expensive products from into some satanic horde trying to screw them over. There are still people who genuinely believe this is part of some conspiracy by Intel to stop overclocking. And that annoys me to no end.
As for bulldozer/piledriver, we have an architecture that's designed at every level to reduce latency at the expense of power leakage. It's designed this way at the architecture level, the logical gate level, and the physical gate level. So no wonder it scales so poorly with power consumption. However both of these chips do what they were designed to do well at stock settings and don't have any heat issues.
These chips aren't really designed for overclocking. No chip ever has been. Intel and AMD target a specific TDP that they know OEMs can work with (keep it cool) and optimize their chips around that. They don't even consider people running them at higher than recommended settings when they're designing the architecture. No sane person would. When 99% of your userbase doesn't OC you want to make sure the chip works well for them, period. Why design a chip that's slower and less efficient at stock just so some users can OC a bit higher?
And for reference, most stock coolers are designed to handle a 50-100 watt TDP for smaller coolers and a 100-150 watt TDP for larger coolers. Most aftermarket air coolers and AIO water cooling systems generally cap out at 150-200 watts. Higher end aftermarket air coolers and AIO water cooling systems can hit 200-250. Serious water cooling can push into the 250-350 range. Phase change can go higher than that. Dry ice higher than that. LN2 higher than that. LHe higher than that.
kinkinkijkin Wrote: EDIT: Wait, newer intels are the best CPUs for dolphin, aren't they? Darn.
Yes.
cdoublejj Wrote: people think of mhz and ghz as speed but, it's also sort of like storage. How many instructions and commands can fit in CPU or rather ALUs (aka calculator) in the CPU core at once before it's cycled. (aka when all the switches flip from 1 to 0 or vice versa)
What!?!? Clock rate has literally nothing to do with anything you just said. I have no idea how you pieced all of that nonsense together.
How did he go this long without anybody else bothering to correct him?
All you really need to know about clock rates is that the different parts of a cpu are often divided into pipeline "stages" that are linked to each other in a chain-like fashion. Separating these stages is a system of gates and registers. When these gates are opened, microcode and data can flow from one pipeline stage to the next. The clock is a signal that controls the opening and closing of these gates so that they all run in sync with one another, and the clock rate is how many times per second that signal cycles. At a basic level that's all you need to know. It has nothing to do with storage or anything else you mentioned, and it does not cause all of the transistors in the cpu to switch states.
cdoublejj Wrote: this might be why a brand new dual core i3 can perform faster than an old/older qaud core.
Or maybe the effectiveness of a system of billions of interconnected semiconductor gates can't be reduced down to a single number? There are more variables that affect cpu performance and efficiency than you can possibly imagine.
kinkinkijkin Wrote: IPC is a theoretical measurement, SCP/PCP is an arbitrary, relative, and realistic measurement, and also not a measurement.
Since when is IPC theoretical? Last time I checked it's an easily measurable real quantity.
drhycodan Wrote: Why is AMD FX under "very fast" when we all know bulldozer and piledriver CPUs are crap compared to Intel?
Because they're faster than the category 3 cpus and slower than the category 1 cpus. It's that simple.
Nintendo Maniac 64 Wrote: especially with the latency bugfixes that Excavator will have
Do you know something I don't? As far as I know every latency issue that has been found in piledriver/bulldozer is inherent to the design. Longer pipelines, bigger caches, modular resource sharing, etc.
drhycodan Wrote: But that's just wrong and misleading comparing a 5ghz OCed FX to an i5 stock. This is why people constantly complain about Dolphin's performance and claiming to have a "high end" CPU. They need to be aware that NO FX processor is equal to even an i3 stock.
Nobody is making that comparison. I compare stock against stock. On average i3 cpus (not just haswell i3s) are slightly faster than FX series cpus, but not enough to put them into a different category. Compare an FX-8350 against a core i3 2100 for example. They're about equal.
Eh. This is good enough for now. I could have responded to 30 threads in the time it took me to write this, so I'm not going to bother proofreading at this point.
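To put numbers on the IPS = IPC * CR relation from earlier in this post, here is a small Python sketch. The clock rates echo the 3.83GHz Q9550 and 3.4GHz i3 mentioned above, and the 1.8 IPC figure mirrors the "up to 80% faster clock for clock" case. The baseline IPC of 1.0 is arbitrary and none of these are measurements of real cpus.

```python
def ips(ipc: float, clock_hz: float) -> float:
    """Instructions per second = instructions per clock * clocks per second."""
    return ipc * clock_hz


# Hypothetical figures, chosen only to show the shape of the argument.
old_quad_oc = ips(ipc=1.0, clock_hz=3.83e9)  # older core, overclocked
new_dual = ips(ipc=1.8, clock_hz=3.4e9)      # newer core, stock clock

if __name__ == "__main__":
    print(f"older core @ 3.83 GHz: {old_quad_oc / 1e9:.2f} billion instr/s")
    print(f"newer core @ 3.40 GHz: {new_dual / 1e9:.2f} billion instr/s")
    print("newer core is faster" if new_dual > old_quad_oc else "older core is faster")
```

The lower clocked chip comes out ahead because the IPC gap outweighs the clock gap, which is why neither number means much on its own.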
"Normally if given a choice between doing something and nothing, I’d choose to do nothing. But I would do something if it helps someone else do nothing. I’d work all night if it meant nothing got done."
-Ron Swanson
"I shall be a good politician, even if it kills me. Or if it kills anyone else for that matter."
-Mark Antony
02-06-2014, 11:09 AM
(This post was last modified: 02-06-2014, 11:10 AM by Nintendo Maniac 64.)
(02-06-2014, 10:54 AM)NaturalViolence Wrote: Nintendo Maniac 64 Wrote: especially with the latency bugfixes that Excavator will have
You mean you didn't know? Well then I guess I do know something you don't, unless you prefer to not believe anything until you see it in action. To quote:
http://www.anandtech.com/show/6201/amd-d...itecture/2 Wrote: According to AMD, they’ve isolated the reason for the unusually high L3 latency in the Bulldozer architecture, however fixing it isn’t a top priority. Given that most consumers (read: notebooks) will only see L3-less processors (e.g. Llano, Trinity), and many server workloads are less sensitive to latency, AMD’s stance makes sense.
02-06-2014, 11:40 AM
You're correct that I didn't know that. Still, we have little reason to believe that will significantly impact performance, since AMD's architecture has massive L2 caches with near perfect hit rates. Like they said, the L3 is almost never accessed in the vast majority of workloads. And I'm willing to bet that dolphin would be included in that.
Anandtech's own analysis confirms that there are a myriad of other, much larger contributing factors to bulldozer/piledriver's poor IPC. Not just cache latency, much less L3 cache latency (their L2 and L1 latencies are also abysmal compared to Intel).
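The L2 hit rate point can be made concrete with the standard average memory access time calculation (hit latency plus miss rate times the cost of going one level down). A Python sketch follows. Every latency and hit rate in it is a made-up placeholder, not a measured bulldozer/piledriver figure, and whether dolphin's real hit rates look anything like this is an open question.

```python
def amat(l1_lat, l1_hit, l2_lat, l2_hit, l3_lat, l3_hit, mem_lat):
    """Average memory access time in cycles for a three-level cache hierarchy."""
    l3_cost = l3_lat + (1.0 - l3_hit) * mem_lat   # cost of reaching L3
    l2_cost = l2_lat + (1.0 - l2_hit) * l3_cost   # cost of reaching L2
    return l1_lat + (1.0 - l1_hit) * l2_cost      # cost seen by the core


# Hypothetical numbers: same hierarchy, only the L3 latency differs.
common = dict(l1_lat=4, l1_hit=0.95, l2_lat=20, l2_hit=0.99,
              l3_hit=0.80, mem_lat=200)
slow_l3 = amat(l3_lat=70, **common)
fast_l3 = amat(l3_lat=40, **common)

if __name__ == "__main__":
    print(f"slow L3: {slow_l3:.3f} cycles per access")
    print(f"fast L3: {fast_l3:.3f} cycles per access")
    print(f"gain from fixing L3 latency: {slow_l3 - fast_l3:.3f} cycles")
```

With a 99% L2 hit rate assumed, a large cut in L3 latency barely moves the average. Lower the assumed L2 hit rate and the L3 starts to matter a lot more.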
Add to that the fact that for the last 5+ years almost everything that AMD has said about the performance of future architectures has turned out to be a load of horseshit. So we have some reason to doubt them until we see it in action. And this time it looks like they're not even trying to promise significant gains.
02-06-2014, 11:43 AM
(This post was last modified: 02-06-2014, 12:11 PM by Nintendo Maniac 64.)
All I know is, Dolphin seems to like the L3 quite a bit if you reference the new Dolphin CPU benchmark. In particular, Vishera is noticeably faster than Richland and Trinity per-GHz.
EDIT: Direct link for your convenience to the benchmark results:
https://docs.google.com/spreadsheet/ccc?...Fa1E#gid=0
EDIT 2: Also possibly relevant is that in the same benchmark, Kaveri (AKA L3-less Steamroller) is actually a bit faster per-GHz than Vishera. Here are the links to two Kaveri results that weren't good enough for delroth (link) to add to the chart because the first was OC'd and the second was stock but had turbo disabled:
https://forums.dolphin-emu.org/Thread-ne...#pid308229
https://forums.dolphin-emu.org/Thread-ne...#pid308509
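For anyone wanting to redo the per-GHz comparison on their own results, one rough way is to multiply each time by the clock it was run at, which estimates how long the chip would take at 1 GHz if performance scaled linearly with clock (it doesn't exactly). A Python sketch, assuming the benchmark reports a completion time in seconds where lower is better. The two entries below are placeholders with invented names and numbers, not rows from the spreadsheet.

```python
def ghz_seconds(time_s: float, clock_ghz: float) -> float:
    """Clock-normalized result: estimated seconds at 1 GHz (lower is better)."""
    return time_s * clock_ghz


# Placeholder data: (benchmark time in seconds, clock in GHz).
results = {
    "cpu_a": (600.0, 4.0),
    "cpu_b": (700.0, 3.7),
}

normalized = {name: ghz_seconds(t, ghz) for name, (t, ghz) in results.items()}
best = min(normalized, key=normalized.get)

if __name__ == "__main__":
    for name, score in sorted(normalized.items(), key=lambda kv: kv[1]):
        print(f"{name}: {score:.0f} GHz-seconds")
    print(f"fastest per-GHz: {best}")
```

This only makes sense for results run at a known, fixed clock, which is presumably why OC'd or turbo-disabled runs have to be treated separately.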