Dolphin, the GameCube and Wii emulator - Forums

Full Version: im trying to show how under rated the fx 8150 is
About to put up a 15 minute video of me changing all sorts of settings.
http://www.youtube.com/watch?v=hQZZ54taJSc&feature=youtu.be
http://www.youtube.com/watch?v=ofWvbmMhHhY&feature=youtu.be
I still think my settings could be better
(06-15-2012, 10:51 PM)d3monr3no Wrote: http://www.youtube.com/watch?v=5Ja4pF9t9Pk&feature=youtu.be

Not to be mean, but my processor is clock-locked at 2.8 GHz and I still get better performance than you. (Identical to rpglord's performance with the LLE vbeam hack, really.)
Roughly 70-80%, otherwise.
My average is 58 FPS without LLE on, and I just checked with Fraps to make sure. And Fraps slows it down a little, too.
Do it with SMG2 and post a video.
That game can't be hard to run on LLE, no way.
The sound on the next one echoes because of my mic.
I have also shown that I am running stock. I don't really know how to overclock safely, nor do I want to!

Yes, I am using Camtasia to record. I used the Fraps benchmark feature, though. I'll email you the files and you can read them yourself.
I only use Fraps to record fullscreen.
http://www.youtube.com/watch?v=Z58sR4-p-fc&feature=youtu.be
Remember how I supposedly can't run it at all? Yeah, right. I just did, with a video to prove it.

OK, maybe I take it back: LLE does sound a little better, but in SMG2 I go back and forth between 73% and 100%, usually 80-93% on average.
d3monr3no Wrote: Yes, I am using Camtasia to record. I used the Fraps benchmark feature, though. I'll email you the files and you can read them yourself.

Why on earth would you do that?

Squall Leanhart Wrote: LLE is the purest example of what raw processing performance provides.

The FX-8150 just really sucks at raw processing.

What exactly is "raw processing"?

Do you mean compute-intensive processing, as in lots of brute arithmetic and very few loads/stores?
(06-16-2012, 01:46 AM)NaturalViolence Wrote:
What exactly is "raw processing"?

Do you mean compute-intensive processing, as in lots of brute arithmetic and very few loads/stores?

Raw performance generally means throwing unoptimised code (typically without instruction-set optimisations) at a CPU and having it beat the shit out of a crappy Bulldozer ;D
stmok;14447096 Wrote: 1st generation FX (Bulldozer) performance isn't due to a single specific thing... And it turns out that it's not all because of a long pipeline. It's due to a small number of things that just happen to really matter. They accumulate into the resulting uninspired performance against Intel's line.

The problems (looking at Anandtech's article), are:

L1 cache associativity is too low for two threads!

=> Intel's Sandy Bridge...
* 8-way 32KB of L1 (data) cache.
* 8-way 32KB of L1 (instruction) cache.

=> AMD's Bulldozer...
* 4-way 16KB of L1 (data) cache per integer core.
* 2-way 64KB of L1 (instruction) cache per module.
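
To make the associativity figures above concrete, here is a minimal sketch that derives the number of sets for each L1 cache listed. It assumes 64-byte cache lines, which the post does not state, and the helper name `cache_sets` is made up for illustration.

```python
def cache_sets(size_kb: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets = capacity / (ways * line size).

    line_bytes=64 is an assumption, not a figure from the post.
    """
    return (size_kb * 1024) // (ways * line_bytes)


# (size in KB, associativity) taken from the lists above.
l1_caches = {
    "Sandy Bridge L1D": (32, 8),
    "Sandy Bridge L1I": (32, 8),
    "Bulldozer L1D (per integer core)": (16, 4),
    "Bulldozer L1I (per module)": (64, 2),
}

for name, (size_kb, ways) in l1_caches.items():
    sets = cache_sets(size_kb, ways)
    print(f"{name}: {sets} sets, {ways} lines per set")
```

The point of the comparison is the "lines per set" column: a 2-way cache can hold only two lines that map to the same set, so two threads whose hot code lands in the same sets will keep evicting each other.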


L2 cache latency is too high for desktop workloads

=> The obvious reason is that the Bulldozer architecture itself was primarily aimed at the server market, where cache latency matters less because of the heavily threaded nature of server scenarios/workloads.
=> Desktop workloads are more often lightly threaded. Low latency matters more under this scenario.


High branch mis-prediction penalty

=> Intel alleviates this problem with their micro-op cache in Sandy Bridge.
=> Bulldozer does not have a micro-op cache.
=> See block diagrams here: http://www.realworldtech.com/page.cfm?ArticleID=RWT091810191937&p=4
...Result? Sandy Bridge pays a lower penalty per mis-prediction and thus offers better performance.


Yield issues

=> At the time, GlobalFoundries was transitioning to a 32nm process.
=> AMD wasn't able to clock Bulldozer as high as planned.


If AMD can address these issues properly with each successive generation, one will see things improve for their side in due time... However, don't expect the upcoming Piledriver-based CPUs and APUs to fix these major issues. (They'll alleviate some.) The main reason is that they're getting their engineers to focus on the Steamroller architecture (2013) and newer.

...It's pretty obvious when you see that they aren't going to really put a serious effort in for 2012, i.e. a minor update to buy their engineering folks time, but nothing major. Re-use existing socket formats and chipset infrastructure as much as possible. Minimise engineering resources on short-term lines, focus the bulk of engineering work on 2013 lines. (Their marketing will switch to overdrive to compensate for the lack of new gear in 2012.)

Fundamentally, the plan is this: bear the short-term pain in order to reap rewards in the long term.

mAJORD;14447764 Wrote: Actually, no. Based on mispredict penalty, the pipeline(s) aren't much longer than Intel's Nehalem / SB.

The P4, on the other hand, had a pipeline nearly 3 times deeper than that of its predecessor (P6) and AMD's K7 at the time. This was even longer for Prescott.

The motivations are entirely different, and as mentioned above there are other compromises to boost clockspeed that affect IPC. On that matter, when you talk of IPC, it's important to differentiate between when the chip is fully utilised and when it is not.

If you look at total throughput (4 modules vs. 4 cores), it's not far behind SB at all in IPC. Where it falls over is in power consumption, especially at high clocks, thanks mostly to a shit process.

The other problem here is that Bulldozer should have an advantage when fully utilised, at least for integer workloads; unfortunately, it hasn't quite been realised.

The higher clockspeed headroom is only required for lightly threaded loads, where it does suffer in IPC. This is why it has such high boost speeds. These should have been even higher, though. Remember the talk of a 1GHz boost? It doesn't reach these frequencies due to thermal limitations more than anything. So this hasn't been realised either.

So in the end it's a combination of minor failures and decisions that add up to the perf deficiency we see in the final product.

1. Architecture missing design goals
2. Architecture tuned for niche server/highly threaded workloads
3. Manufacturing process missing frequency goals

There were a surprising number of people defending GF's 32nm when it launched last yr, but IMO it's been a total disaster, and the evidence has been there ever since Llano launched (but was ignored, with a whole bunch of excuses from people): a proven, known, existing architecture failing to perform as well as its 45nm predecessor.

The fact is that next to the outgoing 45nm node it has:

* Zero power consumption advantages at higher clockspeeds (>3GHz)
* Lower clockspeed headroom on a given uarch

and this is still the case a yr later

itsmydamnation;14448290 Wrote: It is nowhere near the length of the high-clocked Prescotts, even if you factor in its trace cache, let alone a complete miss.

Having long pipes doesn't hurt throughput (look at POWER7, z196, etc.).

Lots of people have done lots of work trying to figure out why Bulldozer is weaker than it should be. No one has come up with anything that's massively wrong, but there are lots of little things.

no op/trace cache means bad branches hurt
the 2-way associative L1I cache can't feed two threads
the L2 is slowish but doesn't seem to have any major impact
2 ALUs do hurt peak IPC/latency
main memory latency is oddly high, so if the predictors/prefetchers get it wrong there's almost a 20ns penalty vs Intel
store bandwidth on the FPU doesn't allow for 1 core to have data returned from all FPU resources in a cycle
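
For a sense of scale on the memory latency item, here is a rough sketch converting the quoted 20ns gap into core clock cycles. The 3.6 GHz clock is an assumption chosen for illustration (it is not given in the post), and `ns_to_cycles` is a made-up helper name.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to core clock cycles.

    1 GHz is one cycle per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


# 20 ns is the figure quoted above; 3.6 GHz is an assumed core clock.
extra_cycles = ns_to_cycles(20, 3.6)
print(f"about {extra_cycles:.0f} extra cycles per miss to main memory")
```

At that assumed clock, the extra 20ns works out to roughly 72 cycles in which the core may have nothing to do, which is why a bad prefetch is so costly.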

Remember that Bulldozer was going to have higher IPC than Stars. The modeling done to determine that is quite accurate, so it's more likely implementation issues.


The fact AMD are moving to bulk should tell you they're not happy with SOI, yet Bobcat on TSMC bulk came 6 months ahead of time and for its niche is an absolute gun (but getting very long in the tooth now).

Bulldozer looks pretty weak. My Llano laptop will outperform it clock for clock. :)
(06-16-2012, 11:07 PM)Starscream Wrote: Bulldozer looks pretty weak. My Llano laptop will outperform it clock for clock. :)

I think I read that Llano is 10% faster than Phenom XD