Dolphin, the GameCube and Wii emulator - Forums

Full Version: Bulldozer vs. Sandybridge
Now that I've spent two days poring over technical details, reviews, and some user analysis of bulldozer, I think I've identified all of the major problems with the chips (at least the ones that can be directly observed; obviously some problems may exist at the low level that only the engineers who worked on the chip know about).

1. Cache Thrashing (cores competing for access to cache resources)
Bulldozer has an L2 cache which is shared by a pair of cores, while sandy bridge has a dedicated L2 cache for each core. The two cores can sometimes compete for access to the same data in the cache. Some users' tests have already shown that disabling every other core on a bulldozer chip can sometimes actually IMPROVE performance for software that is highly multithreaded.

2. Pipeline Flushes (poor branch prediction + long pipeline gives a big penalty)
Bulldozer has a much longer pipeline than deneb/thuban (phenom II) or sandy bridge. AMD promised better branch prediction in bulldozer than in its predecessor by decoupling the branch prediction from the pipeline, but reviews have shown that the branch prediction in bulldozer is actually worse than in deneb/thuban. Since bulldozer has a longer pipeline, these branch mispredictions carry a much higher performance penalty than in other chips.

3. Only 4 floating point/SIMD units
This isn't so much of a problem as it is a lack of an advantage. Despite having 8 "cores" bulldozer has the same number of FPUs as sandy bridge. This means it has no potential performance gains for x87/SSE applications.

4. Extremely high cache latency/lower cache bandwidth
Cache performance in bulldozer is VERY poor. Bandwidth is lower than the competition and deneb/thuban. Latencies are slightly higher than deneb/thuban and MUCH higher than sandy bridge. L3 cache latency in particular is abysmal. Cache latencies were raised in order to improve yields.

5. Decode/dispatch not able to keep the execution pipelines utilized
It appears that bulldozer is not able to keep the functional units in the execution pipelines occupied. Functional units are being underutilized, and poor decode/dispatch may be playing a role in this.

6. Reduced functional units per core
Not really much of a problem since bulldozer isn't even doing a good job at keeping the functional units it does have occupied with work. However, I should point out that in deneb/thuban each core has three ALUs, in sandy bridge each core has three ALUs, and in bulldozer each core only has two ALUs, fewer than its predecessor.

7. Clock rates are lower than expected (4.5GHz with turbo core up to 5.0GHz was expected by AMD)
Clock rates were lowered to well below expectations in order to boost yields.

8. Applications and OS not properly designed for the new threading system
Performance for bulldozer on windows 8 appears to be significantly better than on windows 7 due to optimizations for bulldozer's unique threading system. The problem is that right now multithreaded applications running on bulldozer are distributing their threads in order of available cores, which means that two threads will end up running on the same module rather than on two separate modules. Windows 8 fixes this by treating it like a CPU with HT, so threads will be distributed to different modules instead of just different cores until all of the modules are occupied. Luckily dolphin is already set up to do this, but in the meantime this is a serious bummer for AMD until windows 8 becomes common (which won't happen for a very long time).
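To make the scheduling difference in point 8 concrete, here is a rough Python sketch of the two placement orders. It is only an illustration: the function names are made up, and the core numbering (cores 2m and 2m+1 sharing module m) is an assumption for the example, not how any particular OS actually enumerates bulldozer cores.

```python
def placement_order(num_modules, cores_per_module=2, spread_across_modules=True):
    """Return the order in which cores receive new threads.

    Assumed numbering (hypothetical): cores m*cores_per_module ... share module m.
    """
    cores = [(m, c) for m in range(num_modules) for c in range(cores_per_module)]
    if spread_across_modules:
        # Use the first core of every module before any module's second core.
        cores.sort(key=lambda mc: (mc[1], mc[0]))
    return [m * cores_per_module + c for m, c in cores]


def modules_in_use(order, num_threads, cores_per_module=2):
    """Count the distinct modules occupied by the first num_threads threads."""
    return len({core // cores_per_module for core in order[:num_threads]})


# "In order of available cores": threads 0 and 1 land on the same module.
naive = placement_order(4, spread_across_modules=False)

# HT-style placement: one thread per module until every module is busy.
spread = placement_order(4, spread_across_modules=True)

for threads in (2, 4):
    print(threads, "threads:",
          modules_in_use(naive, threads), "modules (naive) vs",
          modules_in_use(spread, threads), "modules (spread)")
```

With two threads, the naive order puts both on one module (sharing its front end, FPU, and L2), while the spread order gives each thread a module to itself.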

Clock rates were lowered and cache latency was raised in response to very poor yields. Poor yields were a combined result of the ridiculous die size/transistor count and the poor quality 32nm fabrication at GlobalFoundries.

Piledriver (2012) is expected to have higher clock rates and better cache latency as yields improve. New instructions will be added, a 5th module will be added (10 cores), power consumption will be reduced, and undisclosed IPC improvements will be made (10-15% according to AMD). We can only hope that this is true, otherwise AMD is in deep shit.

If it weren't for the additional cores and memory/HT bandwidth, bulldozer would actually be worse than its predecessor (deneb/thuban) in every way. The biggest problems seem to be that the chip is too big, the pipeline is too long, the branch prediction is poor, and the yields are poor (resulting in lower clock rates and higher cache latency as well as higher production costs). Performance for dolphin on bulldozer will no doubt be significantly worse than on deneb/thuban (phenom II).
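As a back-of-the-envelope illustration of why a longer pipeline makes poor branch prediction hurt more, the average cost can be modeled as branch fraction × misprediction rate × flush penalty. The numbers below are purely hypothetical placeholders, not measurements of bulldozer or any other chip, and the function name is made up for this sketch.

```python
def cycles_lost_per_instruction(branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles lost to pipeline flushes per instruction (simple model)."""
    return branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Hypothetical flush penalties for a shorter and a longer pipeline.
short_pipeline = cycles_lost_per_instruction(BRANCH_FRACTION, MISPREDICT_RATE, 12)
long_pipeline = cycles_lost_per_instruction(BRANCH_FRACTION, MISPREDICT_RATE, 20)

print("shorter pipeline:", short_pipeline, "cycles lost per instruction")
print("longer pipeline: ", long_pipeline, "cycles lost per instruction")
```

In this model the loss scales linearly with the flush penalty, so a longer pipeline pays more per misprediction even at the same prediction accuracy, and a worse predictor multiplies that further.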
(10-14-2011, 12:39 PM)NaturalViolence Wrote: [ -> ]Piledriver (2012) is expected to have higher clock rates and better cache latency as yields improve. New instructions will be added, a 5th module will be added (10 cores), power consumption will be reduced, and undisclosed IPC improvements will be made (10-15% according to AMD). We can only hope that this is true, otherwise AMD is in deep shit.

Weren't they saying wonders about bulldozer like... a month ago? meh.
Quote:Weren't they saying wonders about bulldozer like... a month ago? meh.

This is why skepticism is important.
Runo Wrote:Is it enough to make this difference?
Some people mourned the loss of cheap and highly overclockable CPUs when SNB showed up. For gaming, where the GPU matters more, it allows for a good budget rig. $85 AXPs kinda got AMD through those dark times, but selling 315mm² dies for <$200 can't be very profitable. Hopefully the server sales are good enough to offset it.
Quote:1. Cache Thrashing (cores competing for access to cache resources)
The shared L1 instruction $ probably hurts more than the L2$. Core2 had a shared L2 and didn't have any problems.
TSMC is skipping to 28nm, which probably makes the not-so-easy task of switching fabs even harder. That's going to be shared with GPUs, so supply is going to be even worse.
Analysts: AMD is becoming irrelevant in microprocessor market:

http://translate.google.se/translate?sl=sv&tl=en&js=n&prev=_t&hl=sv&ie=UTF-8&layout=2&eotf=1&u=http%3A%2F%2Fwww.sweclockers.com%2Fnyhet%2F14601-analytiker-amd-pa-vag-att-bli-irrelevant-pa-processormarknaden

I don't really care for AMD nor their CPUs, so for me they're already irrelevant.
You start to wonder if AMD is somehow doomed or what.
It's not looking good at all, at least that's what I think.
(10-18-2011, 12:48 AM)Maverick Hunter X Wrote: [ -> ]You start to wonder if AMD is somehow doomed or what.
It's not looking good at all, at least that's what I think.

I hope AMD doesn't die, not because I like them (or hate them, it's just that Intel is simply better), but because if they die Intel can jack up their prices because there is no competition.