Oh christ another one of these. I pray I never have to explain this stuff again because I've done it more times than I can count now.
Please read the following post (repeatedly until you completely understand it):
Glossary of terms in the OP of this thread, specifically the section on parallelism:
http://forums.dolphin-emu.org/Thread-cpu-microarchitecture-hierarchy
KillAWatt1705 Wrote:I'm going to blabber a bit here seeing as I'm on my lunch-break and it never hurts to have an interesting discussion.
Firstly, you're partially right. Hyper-threading is entirely virtual, meaning that for each physical core that is present, the OS treats it as two virtual/logical cores. This shows up as 8 threads in an OS.
This is a bit misleading. AMD told the press this because it sounds better than a more accurate description. Both HT and CMT require similar amounts of additional hardware at the core level to implement in order to allow the core to manage two threads. Both have redundant hardware that is disabled if the core is only running one thread. Both therefore provide two logical cores from one physical set of hardware. The only difference is the use of dedicated vs. shared control schemes. AMD apparently deems this enough to consider a module two cores (since that helps market the product) when it really should be considered one core. Since the term core has never been formally defined and has evolved into a marketing term (introduced for comparison purposes with multicore processors) AMD is free to spin this however they want.
For the record a dedicated control scheme is slightly better in the event that two threads are running on the same core and a shared control scheme is MUCH better if one thread is running on a core.
KillAWatt1705 Wrote:Hyper-threading's main advantage is that it decreases the number of operations in a single pipeline, and allows the OS to schedule executions in this manner as well. If you think about the main principle of parallelization, which is breaking down tasks and executing them individually and simultaneously, this is almost always faster than cramming it down a single queue at a ridiculously faster clock. This is what hyper-threading does, and does well.
Hyperthreading provides two main advantages to heavily multithreaded programs:
1. Higher utilization of resources due to reduced data dependencies
2. In the event of a stall the other thread can continue execution
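To make point 2 concrete, here is a toy sketch in Python. It is not a model of any real core: it assumes a made-up core that issues one instruction per cycle, and each thread is just a list of stall lengths (the number of cycles the thread has to wait after each instruction). The function names and the workload are hypothetical.

```python
def run_back_to_back(threads):
    """Cycles needed when the core runs one thread to completion, then the next.

    Each thread is a list of stall lengths: every entry costs 1 issue cycle
    plus that many stall cycles during which the core sits idle.
    """
    return sum(1 + stall for thread in threads for stall in thread)


def run_interleaved(threads):
    """Cycles needed when the core may issue from any thread that is ready.

    Toy assumption: at most one instruction issues per cycle, and the
    lowest-numbered ready thread always wins.
    """
    pc = [0] * len(threads)        # next instruction index for each thread
    ready_at = [0] * len(threads)  # first cycle each thread may issue again
    cycle = 0
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        for i, t in enumerate(threads):
            if pc[i] < len(t) and ready_at[i] <= cycle:
                # Issue takes this cycle, then the thread stalls for t[pc[i]] cycles.
                ready_at[i] = cycle + 1 + t[pc[i]]
                pc[i] += 1
                break
        cycle += 1
    return max(ready_at, default=0)


# Two hypothetical threads: instruction, instruction with a 3-cycle stall, instruction.
workload = [[0, 3, 0], [0, 3, 0]]
back_to_back_cycles = run_back_to_back(workload)
interleaved_cycles = run_interleaved(workload)
```

With this workload the interleaved run finishes in fewer cycles, because one thread issues while the other is stalled. With a single thread the two functions give the same answer, since there is nothing to fill the stall with.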
KillAWatt1705 Wrote:However, because the 4 of the 8 threads in hyper threading are entirely virtual,
None of them are virtual. All of them are logical. And they're not threads, they're cores. You have 4 physical cores and 8 logical cores.
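If the arithmetic helps, here is a minimal sketch in Python. The helper name is made up for illustration. The only real API used is os.cpu_count(), which reports the logical core count the OS sees, not the physical one.

```python
import os


def logical_core_count(physical_cores, threads_per_core):
    """Logical cores the OS sees (hypothetical helper, illustration only)."""
    return physical_cores * threads_per_core


# 4 physical cores, each managing two threads: 8 logical cores.
hyperthreaded_quad = logical_core_count(4, 2)

# What the OS reports for the machine running this script (logical cores).
this_machine = os.cpu_count()
```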
KillAWatt1705 Wrote:this means at a low-level a single physical core is still executing two threads at a rate of whatever that physical core is operating at. In addition, however many floating point, integer units and instruction sets that physical core has is shared between the 2 threads per core.
A core can have multiple instruction sets? What?
KillAWatt1705 Wrote:AMD's Piledriver architecture is very different, the CPU has 8 physical cores, and the OS recognizes that as 8 physical threads.
Physical threads? What?
You have 8 physical cores and 8 logical cores.
KillAWatt1705 Wrote:On top of this, for each core you have a dedicated number of instructions sets (FMA, XOP, F16C instructions, etc)
What? That makes no sense. All cores support the same instruction set extensions.
KillAWatt1705 Wrote:and the floating point/integer units for that thread are computed at an unified speed of the physical core itself, unlike a hyper-threaded counterpart.
What? That makes no sense. Please rephrase.
KillAWatt1705 Wrote:In Intel's core all threads must compete for available execution resources.
All threads that are running on the same core, yes.
KillAWatt1705 Wrote:Where Piledriver lacks is single-threaded performance. Hence why for gaming (where most games will only ever utilize 2-4 threads) Intel's i series generally does better.
The story behind why Piledriver sucks for running video game engines is a bit deeper than "because it's not heavily multithreaded". But I don't really have time to discuss that in detail right now.
KillAWatt1705 Wrote:But for any application that uses 4 or more threads, and video encoding/editing is a prime example of this, the Piledriver architecture really does shine.
Being heavily multithreaded is not enough to make Piledriver shine. The application also needs a number of other traits that video encoders happen to have, such as infrequent and short data dependency chains, integer-heavy arithmetic, and fairly predictable memory access patterns.
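As a rough illustration of what "short data dependency chains" means, here is a small Python sketch. The operation names and the two example graphs are invented. The function just measures the longest chain of operations that each have to wait on the previous one.

```python
from functools import lru_cache


def longest_chain(deps):
    """Length of the longest dependency chain.

    deps maps an operation name to the list of operations it must wait for.
    Assumes the graph has no cycles.
    """
    @lru_cache(maxsize=None)
    def depth(op):
        # An op with no dependencies has depth 1.
        return 1 + max((depth(d) for d in deps.get(op, ())), default=0)

    return max((depth(op) for op in deps), default=0)


# Four independent operations: nothing waits on anything else.
independent = {"a": [], "b": [], "c": [], "d": []}

# Four operations where each one needs the result of the previous one.
chained = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}

independent_depth = longest_chain(independent)
chained_depth = longest_chain(chained)
```

The shorter the longest chain is relative to the total number of operations, the more work can be in flight at the same time. The chained example cannot be sped up by adding execution resources, since every operation waits on the one before it.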
KillAWatt1705 Wrote:If you were to throw a heavily multi-threaded application at the FX-8350, the FX-8350 will almost always come out on top,
Not necessarily true. It depends on the type of application.
KillAWatt1705 Wrote:the AMD does well in multi-GPU setups, which exponentially relies on paralleling CPU performance,
Oh brother. No it doesn't.
KillAWatt1705 Wrote:which divides better over 8 physical threads than 4 virtual and 4 physical ones: see here (you can find more of these on the overclock.net, where people are able to run 16 instances of minecraft without a hitch, etc).
Please for the love of god look up the terms physical thread, virtual thread, logical thread, logical core, virtual core, and physical core. You clearly do not understand the difference between them.
KillAWatt1705 Wrote:I probably sound pro-AMD saying all this but sadly most people tend to denote a CPU's worth by its performance in gaming, which for obvious reasons I can see why. But I thought I'd explain my reasons for choosing the CPU for my upcoming research (that and the entire paper relies on architectural CPUs that have octo-cores and their computational efficiency).
You don't sound pro-AMD. You sound like you have no idea what you're talking about.
KillAWatt1705 Wrote:As a side note, FX-8350 CPU's are underclocked to 4.0GHz. That means that it's very easy to overclock, if you look over forums, just by raising the voltage slightly and the FSB clock you can easily reach a stable speed of 5GHz (and this is on air). The FX series are well known for their overclocking prowess.
They are terrible at overclocking. Their power scaling is one of the worst of any architecture in the last 5 years. They are one of the few architectures that require a voltage bump for even a tiny OC, which is the mark of a chip that has been pushed to its thermal limits out of desperation.
Sandy Bridge can hit 5GHz with half the power consumption and a much smaller bump in voltage.
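For a rough idea of why the size of the voltage bump matters so much, here is a back-of-the-envelope Python sketch using the usual approximation that dynamic power scales with frequency times voltage squared. The voltage increases are made-up illustrative numbers, not measurements of either chip, and the sketch ignores leakage entirely.

```python
def relative_dynamic_power(freq_ratio, voltage_ratio):
    """Approximate dynamic power relative to stock, using P ~ C * V^2 * f."""
    return freq_ratio * voltage_ratio ** 2


# Going from 4.0GHz to 5.0GHz. Both voltage bumps below are hypothetical.
small_bump = relative_dynamic_power(5.0 / 4.0, 1.05)  # 5% more voltage
large_bump = relative_dynamic_power(5.0 / 4.0, 1.20)  # 20% more voltage
```

Because the voltage term is squared, the chip that needs the bigger bump pays far more in power (and heat) for the same clock increase.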
It is a horribly inefficient architecture. And this is after 7 years of development and more than 3 deadline extensions. Over the last 10 years almost every outstanding electronic engineer at AMD has either left the company or been promoted to management positions where they don't belong. And it shows.
I do not have the willpower right now to spend multiple hours (yes it would take that long) writing a wall of text detailing why you're wrong, what the correct definitions are for many of those words that you don't seem to understand, and detailing the differences between these two architectures as well as what software characteristics are affected by them and how. Instead I will ask that in the meantime you search for these things on the forum and read through the post that I linked at the beginning of this one.
Now I guess I'll begin reading some of the other posts in this thread since responding to yours at least piqued my interest a little.