Dolphin, the GameCube and Wii emulator - Forums

Full Version: CPU Microarchitecture Hierarchy
Pages: 1 2
Temporary. Will be cleaned up later.

Glossary of Terms:

Basic definitions
Processor: A component in a computing system that processes (manipulates) data. It may have multiple dies and/or multiple cores and/or multiple packaged ICs. In the case of microprocessors, each packaged IC is considered a processor.
CPU (central processing unit): The part of a computing system that executes the instructions of a computer program. It is the main processor in a computer and it is a general purpose processor (not specialized towards any specific type of task) instead of a specialized processor. A computer may have multiple cpus.
Die: A piece of silicon with circuits etched into it. Also called a chip or IC.
Package: One or more dies are usually sealed inside a protective shell called a package. The package typically has pins coming out of it to provide electrical connections to the die (or dies) inside. The package can then be soldered onto a circuit board or mounted into a socket. One package fits into one socket.
Substrate: The surface that the die is mounted on. It is the bottom part of the IC package.
IHS (integrated heat spreader): A metal cover (usually nickel-plated copper) that protects the die and spreads its heat. It comes in contact with the bottom of a cooler. It is the top part of the package on modern microprocessors.
Socket: A mechanical component that provides electrical connections between packaged ICs and PCBs (printed circuit boards).
PCB (printed circuit board): More commonly just called a circuit board. Your graphics card and motherboard are both examples of this. Motherboards, daughterboards, and expansion cards are all types of PCBs.
IC (integrated circuit): Circuits that have been integrated into a piece of silicon using photolithography (a key step of chip fabrication). This term is commonly used to refer to the resulting chip but can also be used to refer to the circuits themselves. The proper term for the resulting chip is a "packaged IC".
Chip: The more common term for an IC die.
Module: A term used by AMD to refer to each pair of cores in their latest microarchitectures. Each pair of cores shares certain resources, so the pair is collectively called a "module". A 4 module cpu, for example, has 8 cores.

To demonstrate this more clearly here are some images.

This is a core 2 quad processor with the cover (IHS, heatspreader) on. This entire thing is a cpu package:
[Image: core-2-cpu,E-B-180659-13.jpg]


This is the same processor package but with the IHS removed. On the left you can see the two dies (also called ICs and chips) on the substrate. You also have a shot of the underside of the IC package on the right:
[Image: intel_q6600.jpg]


And here is an image from intel further demonstrating the terms:
[Image: core.jpg]


Remember that each chip can have multiple cores. Also remember that the term processor is now commonly used to refer to a processor package. For example, a core 2 quad processor is one package with two dies. Each die has two cores, for 4 cores total in the package.
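
The core 2 quad example can also be written down as a tiny data model. This is only an illustrative sketch in Python: the class names (Package, Die) and field names are made up for this post and don't come from any real tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Die:
    """One piece of silicon; it may hold several cores."""
    cores: int


@dataclass
class Package:
    """What is sold as a "processor": one or more dies in one shell."""
    name: str
    dies: List[Die] = field(default_factory=list)

    def total_cores(self) -> int:
        # Cores in the package = sum of the cores on every die inside it.
        return sum(die.cores for die in self.dies)


# Core 2 Quad as described above: one package, two dies, two cores per die.
core2quad = Package(name="Core 2 Quad", dies=[Die(cores=2), Die(cores=2)])

print("dies in package:", len(core2quad.dies))
print("cores in package:", core2quad.total_cores())
```

A single-die quad core would just be Package(name=..., dies=[Die(cores=4)]): the same core count, but a different package layout.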


Integration
Microprocessor: All of the components of a processor integrated onto a single chip. A "cpu on a chip".
Microcontroller: A cpu, memory, and I/O all on a single chip.
APU (accelerated processing unit): A marketing term used by AMD to refer to a GPU and CPU on a single chip. Intel does this too but doesn't use the term APU, they just call them CPUs. It is a halfway point between a traditional microprocessor and an SoC.
GPU (graphics processing unit): A microprocessor designed for graphics processing.
IGP (integrated graphics processor): A GPU with no dedicated video memory. It is usually integrated into the chipset, northbridge, cpu package, or cpu die.
GMA (graphics media accelerator): A marketing term Intel used to refer to its IGPs for a long time, they no longer use this term.
SoC (system on a chip): An entire computing system (or almost an entire computing system) on a single chip. Common in small devices.
SIP (system in a package): Multiple dies that comprise the main components of a computer packaged together. The dies are often stacked on top of each other vertically (which is called chip stacking).
MCM (multichip module): Multiple dies connected on a shared substrate inside a single package. They are mounted horizontally as opposed to chip stacking where they are mounted vertically.

Classification
ISA (instruction set architecture): A set of instructions that a processor can use. It provides the interface between programs (software) and processors (hardware). It can be thought of as the machine language of the hardware and is therefore commonly called "machine language". x86 is an example of this. A specific ISA may have many different microarchitectures.
Microarchitecture: A specific implementation of the ISA. This type of revision is a major change or complete overhaul of the processor from previous versions. Sometimes called a "core design". In modern multicore chips it refers to an overhaul of the core logic, so different core counts, cache sizes, bus speeds, etc. don't count as microarchitecture changes. A specific microarchitecture may have many variants.
Family: Basically a brand name that processors are sold under. Core i7 or pentium 4 would be examples.
Variant: Also called a microarchitecture variant. It is a version of the microarchitecture that is fabricated (manufactured): a specific chip/die design. Differences between variants exist before binning. Different variants are manufactured with different die designs, and they may have slight or moderate differences in core logic. Variants that belong to the same microarchitecture may have different cache sizes, core counts, steppings, etc. A variant does not have specific specs such as clock rate; those have not been decided yet at this stage of manufacturing. A single variant may produce many different models with different specs. A single variant may also have many different steppings.
Model: A specific processor with exact specs. They are produced by "binning" a variant. Binning involves disabling features of the processor and/or parts of the die and adjusting specs such as voltages and clock rates. The idea of binning is to produce many different processors with different specs from the same chip design. Models that belong to the same microarchitecture variant may have different cache sizes, core counts, steppings, etc.
Stepping: A specific version number for a processor/variant. Usually these are extremely minor revisions to the design of a processor. When the stepping is changed the specs do not change. A stepping may be used to fix a bug, optimize circuits or transistors (to enable higher clock rates), or reduce power consumption. As new steppings are released new models may be released to take advantage of the improvements that were made.
Die shrink: A version of a microarchitecture made with a new manufacturing process that enables it to be built with smaller transistors/circuits and therefore shrink the die to a smaller size. The microarchitecture name may remain the same or be changed depending on the naming conventions of the company.
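
To make the nesting of these terms easier to see, here is a rough sketch in Python. The class names and the example entries ("example-uarch", "variant-A", "model-1" and so on) are hypothetical placeholders, not real product data. The only point is the shape: one microarchitecture has many variants, and binning one variant yields many models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Model:
    """A specific processor with exact specs, produced by binning a variant."""
    name: str
    clock_ghz: float
    enabled_cores: int
    stepping: str


@dataclass
class Variant:
    """A specific die design; it has no clock rate of its own."""
    name: str
    cores_on_die: int
    models: List[Model] = field(default_factory=list)

    def bin_model(self, name: str, clock_ghz: float, enabled_cores: int,
                  stepping: str) -> Model:
        # Binning can disable cores but can never add any to the die.
        if enabled_cores > self.cores_on_die:
            raise ValueError("a model cannot enable more cores than the die has")
        model = Model(name, clock_ghz, enabled_cores, stepping)
        self.models.append(model)
        return model


@dataclass
class Microarchitecture:
    """A specific implementation of an ISA; it may have many variants."""
    name: str
    isa: str
    variants: List[Variant] = field(default_factory=list)


# Hypothetical placeholder data, not real products.
uarch = Microarchitecture(name="example-uarch", isa="x86")
quad = Variant(name="variant-A", cores_on_die=4)
uarch.variants.append(quad)

quad.bin_model("model-1", clock_ghz=3.0, enabled_cores=4, stepping="A0")
quad.bin_model("model-2", clock_ghz=2.5, enabled_cores=2, stepping="A0")

for m in quad.models:
    print(m.name, m.clock_ghz, m.enabled_cores, m.stepping)
```

Note that clock rate lives on the Model, not on the Variant, which matches the definitions above: the specs are only fixed once a die has been binned.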

Parallelism
CMP (chip multiprocessor): Packing multiple cpus/microprocessors/cores onto a single chip/die.
Multi-core processor: An IC package with multiple cpus. It may contain one die or more than one die.
Cores: A single cpu/microprocessor on a multi-core processor.
HT (hyperthreading): Intel's marketing name for SMT. Also called HTT (Hyper-Threading Technology).
CMT (clustered multithreading): AMD's term for their unique implementation of SMT where some components of the core are shared and some are dedicated. AMD doesn't admit that this is a form of SMT though, because their marketing department wants to differentiate it from SMT and market it as a separate feature that no one else has.
SMT (simultaneous multithreading): The ability of a cpu or core to process multiple threads simultaneously. Intel calls it HT. With SMT multiple threads (usually 2) can share access to a core's resources.
SMP (symmetric multiprocessing): A system where multiple CPUs/microprocessors can operate in parallel. It can refer to either a multicore processor or multiple packaged ICs operating in parallel (such as xeons/opterons in different sockets). SMT refers to multiple threads being simultaneously processed by the same cpu, while SMP refers to multiple threads being simultaneously processed by different cpus. It is possible (and quite common) for a system to implement both.
Multicore cpu: A term used by people who don't realize that a core is a cpu and that therefore having a multicore cpu is impossible and makes no sense. Unfortunately this has become the most commonly used term to refer to CMP packages so we often have to use this horrible term to communicate with people.
Physical cores: Physical processors (hardware) on a die that process instructions independently and simultaneously. These are cpus.
Hardware threads: A marketing term used by Intel (and now AMD as well) to show the number of threads that particular processor package can process simultaneously. The number of hardware threads represents the number of logical threads that can be run simultaneously at any one time.
Logical cores: A layer of abstraction created by the OS that acts as an interface between hardware threads and software threads.
Virtual cores: "Fake" cores hidden behind a layer of abstraction in a VM (virtual machine).
Program: A passive collection of instructions that fulfills some function. A program may have several instances running at once called processes.
Process: A single instance of a program. A process is the actual execution of a program. It may have one or more threads.
Multi-threading: Writing (or rewriting) a program so that it uses multiple software threads.
Logical threads: Software threads. The threads that software processes like dolphin.exe spawn. They are streams of instructions that can be assigned an affinity.
Thread affinity: The logical core that the thread is assigned to.
Thread: In a hardware context it refers to a hardware thread, in a software context it refers to a logical/software thread.
ILP (instruction level parallelism): The ability of a processor to simultaneously process multiple instructions from a single thread.
DLP (data level parallelism): The ability of a processor to process many data points with a single instruction. It is also called SIMD (single instruction multiple data), packed data, or vector processing. Multiple data points are packed together and act as a single input or output for the instruction. The instruction can process at least two data points from each input simultaneously.
TLP (thread level parallelism): The ability of a processor to process instructions from multiple threads in parallel. SMT, CMT, and SMP are all implementations of this.
MLP (memory level parallelism): The ability for a processor to have multiple outstanding (waiting to be completed) memory operations.
BLP (bit level parallelism): The ability for an instruction to process multiple bits simultaneously from a data point. All processors have this to some degree. A processor can be 4 bit, 8 bit, 16 bit, 32 bit, 64 bit, etc.
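
Here is a short Python sketch of how a few of these terms look from the software side: logical cores, thread affinity, and one process spawning several software threads. The helper names are made up for this example, and `os.sched_getaffinity` only exists on some platforms (Linux, for example), so the code checks for it before calling it.

```python
import os
import threading


def logical_core_count() -> int:
    """Number of logical cores the OS reports (hardware threads it exposes)."""
    # os.cpu_count() can return None if the count cannot be determined.
    return os.cpu_count() or 1


def allowed_logical_cores() -> set:
    """Logical cores this process may run on (its affinity mask)."""
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))  # 0 means "the current process"
    # Fallback for platforms without this call: assume no restriction.
    return set(range(logical_core_count()))


def run_software_threads(n: int) -> list:
    """Spawn n software (logical) threads inside this one process."""
    results = [None] * n

    def work(i: int) -> None:
        results[i] = i * i  # stand-in for real work

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print("logical cores:", logical_core_count())
print("allowed logical cores:", sorted(allowed_logical_cores()))
print("results from 4 software threads:", run_software_threads(4))
```

A process can spawn more software threads than there are hardware threads; the OS just schedules them onto whatever logical cores are allowed. In standard CPython builds the threads above take turns because of the global interpreter lock, so this shows the terminology, not a speedup.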

When you buy a cpu you are really buying a cpu package. What people commonly call a cpu or microprocessor is actually a cpu package. I rarely ever see the word being used correctly. How did this happen? Well, it's fairly simple actually. For a long time an IC package pretty much always had only one die, and that die contained a single microprocessor/cpu. So for a long time 1 package = 1 die = 1 cpu/microprocessor. After a while, consumers who didn't bother to learn the formal definitions of the terms started using them interchangeably. When cpu designers started cramming multiple cpus onto dies and multiple dies onto packages, they came up with the term "core" to refer to each cpu in the system, another BS marketing term that unfortunately caught on extremely well. Consumers continued calling the packages cpus as they always had and simply used the word cores to refer to the real cpus instead.


Intel Microarchitectures
[Image: 1000px-IntelProcessorRoadmap-3.svg.png]

Sorted by microarchitecture:
-Ivy Bridge
-Sandy Bridge
-Nehalem / Westmere
-Core / K10
-Piledriver
-Bulldozer
-K8
-Pentium M (P6 variant)
-Bobcat
-Netburst / Bonnell

Sorted by specific variant (microprocessor):

AMD Microarchitectures
-Todo

Platforms:
-HPC (can be subdivided into nearly a dozen categories)
-Enterprise (can be subdivided into nearly a dozen categories)
-Server (can be subdivided into nearly a dozen categories)
-Desktop
-Home Game Console (irrelevant, none can run dolphin)
-Laptop/Notebook
-Ultrabook
-Nettop
-Netbook
-Tablet (few x86 versions exist)
-MID/UMPC (few x86 versions exist)
-Portable Game Console (irrelevant, none can run dolphin)
-Smartphone (irrelevant, none can run dolphin)
-Embedded (irrelevant, none can run dolphin)

Ivy Bridge
Ivy bridge variants (server):
Ivy bridge variants (desktop):
Ivy bridge variants (mobile):



Sorted by specific variant (microprocessor):

References below, ignore everything below this line for now.
-----------------------------------------------------------------------------------------------------------------------------------
Performance at stock clocks by microarchitecture codename (consolidated version, only include common cpus):
Ivy Bridge
Sandy Bridge-E
Sandy Bridge Quad Core
Sandy Bridge Dual Core
Bloomfield/Gulftown
Lynnfield
Clarkdale
Llano (A8, A6)
Phenom II
Core 2 Quad
Core 2 Duo
Athlon II (1MB per core)
Pentium Dual Core Wolfdale
Athlon II (512KB per core)
Celeron Wolfdale
Phenom/Athlon X2 7xxx
Sempron (K10)

Still not sure where to put these:
FX 6/8 Core (bulldozer)
FX Quad Core (bulldozer)

IPC (performance per clock):
Ivy Bridge Quad Core 8MB
Ivy Bridge Quad Core 6MB
Sandy Bridge-E 15MB
Sandy Bridge-E 12MB
Sandy Bridge-E 10MB
Sandy Bridge Quad Core 8MB
Sandy Bridge Quad Core 6MB
Sandy Bridge Dual Core 3MB
Sandy Bridge Dual Core 2MB
Gulftown/Bloomfield
Lynnfield
Clarkdale 4MB
Clarkdale 3MB
Clarkdale 2MB
Core 2 Extreme Quad Core 12MB
Core 2 Quad 12MB
Core 2 Extreme Quad Core 8MB
Core 2 Quad 8MB
Core 2 Quad 6MB
Core 2 Quad 4MB
Core 2 Quad 3MB
Core 2 Duo 6MB
Core 2 Extreme Dual Core 4MB
Core 2 Duo 4MB
Core 2 Duo 3MB
Core 2 Duo 2MB
Pentium Dual Core Wolfdale 2MB
Celeron Wolfdale 1MB
Pentium Dual Core Allendale 1MB
Celeron Allendale 512KB

AMD List (not complete yet):
A8 Quad Core 4MB L2 (llano)
A6 Quad Core 4MB L2 (llano)
A6 Triple Core 3MB L2 (llano)
Phenom II X6 6MB (Thuban)
Phenom II X4 6MB (Zosma)
Phenom II X4 6MB (Deneb)
Phenom II X3 6MB (Heka)
Phenom II X2 6MB (Callisto)
Phenom II X4 4MB (Deneb)
Phenom II X2 No L3 2MB L2 (Regor)
Athlon II X2 No L3 2MB L2 (Regor)
A4 Dual Core 1MB L2 (llano)
Phenom II X4 No L3 2MB L2 (Propus)
Athlon II X4 No L3 2MB L2 (Zosma)
Athlon II X4 No L3 2MB L2 (Propus)
Athlon II X3 No L3 1.5MB L2 (Rana)
Athlon II X2 No L3 1MB L2 (Regor)
Sempron No L3 1MB L2 (Regor)
Phenom X4 2MB (Agena)
Phenom X3 2MB (Toliman)
Athlon X2 2MB 7xxx (K10, Kuma)
Athlon II No L3 1MB L2 (Sargas)
Sempron No L3 1MB L2 (Sargas)
Sempron No L3 512KB L2 (Sargas)

Temporary mixed list (needs sorting):
Ivy Bridge Quad Core 8MB
Ivy Bridge Quad Core 6MB
Sandy Bridge-E 15MB
Sandy Bridge-E 12MB
Sandy Bridge-E 10MB
Sandy Bridge Quad Core 8MB
Sandy Bridge Quad Core 6MB
Sandy Bridge Dual Core 3MB
Sandy Bridge Dual Core 2MB
Gulftown/Bloomfield
Lynnfield
Clarkdale 4MB
Clarkdale 3MB
Clarkdale 2MB
Core 2 Extreme Quad Core 12MB
Core 2 Quad 12MB
Core 2 Extreme Quad Core 8MB
Core 2 Quad 8MB
Core 2 Quad 6MB
A8 Quad Core 4MB L2 (llano)
A6 Quad Core 4MB L2 (llano)
A6 Triple Core 3MB L2 (llano)
Core 2 Quad 4MB
Core 2 Quad 3MB
Core 2 Duo 6MB
Core 2 Extreme Dual Core 4MB
Core 2 Duo 4MB
Core 2 Duo 3MB
Core 2 Duo 2MB
Phenom II X6 6MB (Thuban)
Phenom II X4 6MB (Zosma)
Phenom II X4 6MB (Deneb)
Phenom II X3 6MB (Heka)
Phenom II X2 6MB (Callisto)
Phenom II X4 4MB (Deneb)
Pentium Dual Core Wolfdale 2MB
Phenom II X2 No L3 2MB L2 (Regor)
Athlon II X2 No L3 2MB L2 (Regor)
Celeron Wolfdale 1MB
A4 Dual Core 1MB L2 (llano)
Pentium Dual Core Allendale 1MB
Phenom II X4 No L3 2MB L2 (Propus)
Athlon II X4 No L3 2MB L2 (Zosma)
Athlon II X4 No L3 2MB L2 (Propus)
Athlon II X3 No L3 1.5MB L2 (Rana)
Athlon II X2 No L3 1MB L2 (Regor)
Sempron No L3 1MB L2 (Regor)
Phenom X4 2MB (Agena)
Phenom X3 2MB (Toliman)
Athlon X2 2MB 7xxx (K10, Kuma)
Celeron Allendale 512KB
Athlon II No L3 1MB L2 (Sargas)
Sempron No L3 1MB L2 (Sargas)
Sempron No L3 512KB L2 (Sargas)

AM2 vs. AM2+ vs. AM3 vs. AM3+ also affects performance.

The MB number refers to the LLC (last level cache) size. The Intel entries only cover the last 6 years of Intel's desktop market.

Total performance at stock clocks:

Edit: To be continued. Don't worry SS, I'll move this when it's done I swear. I'll also write a consolidated version.
To-do: Add fx series cpus. Sort AMD and intel cpus together. Don't bother listing athlon X2 or pentium D since they're way below the requirements of most games running on dolphin.
I'm debating whether or not to continue working on this. So I decided to let the public decide and make a poll.

The purpose of this thread/post would be to provide a more detailed hierarchy of cpus based on their performance with dolphin. Sorting them by microarchitecture and microprocessor codenames instead of by product brand names.
This will come in handy, it's worth having here.
Yes, please, do this :)
Yeah it looks like it will be very useful when completed
Please finish this. It's needed.
Your Dolphin hierarchy thread is too general, as I've said before, we need to compensate that and should be able to refer those who're capable of understanding to this thread.

If you need help, feel free to ask.
And we really should have an Ivy Bridge rate column so that we know whether to buy a 3GHz locked Ivy processor or an unlocked AMD one which would OC to 4GHz but no further.
(08-11-2012, 01:17 AM)AnyOldName3 Wrote: And we really should have an Ivy Bridge rate column so that we know whether to buy a 3GHz locked Ivy processor or an unlocked AMD one which would OC to 4GHz but no further.

I am pretty sure a locked Ivy is better than any AMD at 4 GHz.
In fact, I think even a locked Sandy Bridge such as the i5 2400 is better than AMD at 4 GHz.
AMD can reach i5 2400 performance at around 4.2 GHz.
So to reach a locked Ivy it would require at least 4.6-4.7 GHz.
Since overclocking that high is not easy and requires additional money spent on high quality coolers, not to mention that you might get unlucky and get a chip which just doesn't overclock well, it's clear which choice is better when comparing the i5 2400 or i5 3450 against AMD.
That was a bit of an exaggeration on my part, but it would be nice to have something saying how high you'd need to OC something to be like 3GHz of Ivy. We'd be able to direct a lot of questions here if it did.
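
Something like that could be estimated with a simple rule of thumb: performance is roughly IPC times clock rate, so the clock a slower-per-clock chip needs is the reference clock multiplied by the IPC ratio. A rough Python sketch follows. The function name and the 1.4 ratio are made-up placeholders for illustration, not measured Dolphin results, and real scaling is rarely this linear.

```python
def equivalent_clock_ghz(reference_clock_ghz: float, ipc_ratio: float) -> float:
    """Clock needed to match a reference chip, assuming perf = IPC * clock.

    ipc_ratio is (reference chip IPC) / (your chip IPC). It is an input you
    would have to measure yourself; nothing here looks it up.
    """
    if reference_clock_ghz <= 0 or ipc_ratio <= 0:
        raise ValueError("clock and IPC ratio must be positive")
    return reference_clock_ghz * ipc_ratio


# Hypothetical example: the reference chip does 1.4x the work per clock.
needed = equivalent_clock_ghz(3.0, 1.4)
print(f"{needed:.1f} GHz needed to match a 3.0 GHz reference chip")
```

If the result is higher than a chip can realistically be overclocked to, the locked chip wins, which is the same argument made in the post above.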
1- It would be interesting to make a separate thread about overclocking performances. A good add-on to this excellent thread
2- Excellent thread but I've just mentioned it ;)
3- Who dared to say "No" in the poll :P?