Dolphin, the GameCube and Wii emulator - Forums

Full Version: Testers wanted - gx-optimization branch
I wasn't actually arguing that again, I was just explaining how we got to be talking about faking. :)
Starscream Wrote:I don't think he was actually saying someone was faking. I was saying that an actual game-shot was a better method of testing and then someone mentioned the word "faked" (you). It was all hypothetical talk after that. From what I read on IRC at the time and from my own testing, I'm pretty sure everyone agreed that any speedup with this was either imaginary or the fault of bad testing. That's probably why I was having a hard time believing his speedup claims.

Give this man a cookie.

delroth Wrote:It is not, the statistical significance of a single datapoint is near zero. Testing that way actually encourages bias by making people try to choose the datapoint that matches the best with what they think (which differs from faking results: the datapoint is still real but it could be over the 80-90th percentile for example).

I'm surprised by this speedup too (and still have a hard time believing it) but his measurements are most likely correct and definitely more credible than a screenshot for anyone who knows about stats. I'm just not sure if he's measuring the right thing (settings issue, warm cache, everything I said in my previous post basically).

Perfectly valid points. However, we still don't know anything about his testing methods other than that he used Fraps. We are meant to assume that he somehow got his character to move through the exact same area in the exact same way. If he didn't then his results are automatically invalid. I can't even think of how he would do that without creating a complex custom script to produce the same input. At least with screenshots you can easily tell if you're in the same spot and facing the same direction.

Also, how do we know that Fraps' FPS measurements are as accurate as Dolphin's FPS measurement?

And there is something else I forgot to consider. FPS doesn't always line up with game speed perfectly in Dolphin.

As it stands there are just so many problems with this test and the information presented (or lack thereof). I for one would like a screenshot comparison to at least provide a second source of data to increase the reliability of the test(s). They're not that hard to do and don't take very long.
Quote: We are meant to assume that he somehow got his character to move through the exact same area in the exact same way. If he didn't then his results are automatically invalid. I can't even think of how he would do that without creating a complex custom script to produce the same input. At least with screenshots you can easily tell if you're in the same spot and facing the same direction.
Dolphin has such a feature built in.

Though he did not use it. Unless he just ran through the title screen with no input, I would indeed consider his tests invalid for that reason.

Quote:Also, how do we know that Fraps' FPS measurements are as accurate as Dolphin's FPS measurement?
I'm still a fan of ignoring the FPS counter, and just dividing the total frames played by the amount of time it ran for.
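A minimal sketch of that calculation, assuming you have a total frame count and a wall-clock duration for the run. The function name and the numbers are made up for illustration and are not taken from any test in this thread:

```python
def average_fps(total_frames: int, elapsed_seconds: float) -> float:
    """Average frame rate over a whole run: frames rendered / time elapsed."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_frames / elapsed_seconds


# Hypothetical numbers: 3000 frames counted over a 60-second capture.
print(average_fps(3000, 60.0))
```

This sidesteps any smoothing or sampling an on-screen counter might apply, since it only depends on two totals.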
You guys are ridiculous. Let me explain it to you:

First of all, I used Fraps to do the benchmark because it is a well-known program that 99% of the time just works and produces accurate results. Fraps, as you may or may not know, has a benchmarking feature that records the FPS over a user-defined period of time and also outputs the total number of frames displayed in that timespan. For my tests I used 60 seconds, because that is the maximum for free users. To get more accurate results I benchmarked each of the two stages I tested (the graphics-heavy one [Delfino Plaza] and the slim one [Final Destination]) TWO times (actually four times, but I lost my results because I am stupid^^) in each tested Dolphin version (a recent master build at that time vs. the gx-optimization build). As you can see from the results, the FPS of the gx-opt build were significantly higher than the FPS in master (at least I'm pretty sure they are; can anyone statistically analyze them?), both the average and the max. As a consequence, the number of frames displayed in the 60 seconds was also higher. I can also say that it felt a lot faster, in the Delfino Plaza stage at least.
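For anyone who wants to check significance on results like these, one possible approach is Welch's unequal-variance t-test on the per-run averages. The sketch below uses only the Python standard library; the sample values are hypothetical placeholders, NOT the measurements from this thread, and `welch_t` is just an illustrative helper name:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical average-FPS values per 60-second run (placeholders only).
master = [41.0, 43.0, 42.0, 40.0]
gx_opt = [47.0, 49.0, 46.0, 48.0]

t, df = welch_t(gx_opt, master)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The resulting t and df can be looked up in a t-distribution table to get a p-value. With only a handful of runs per build the test has little power, so more runs would make the conclusion firmer.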

Also saying that my results are automatically invalid because it isn't possible, or at least feasible, to make the game behave in the exact same manner each time I benchmark is just ridiculous (@NaturalViolence, and there I thought you were one of the smart guys). How the hell do you think biologists work? No fucking biological system behaves the same all the time, still they get valid results from their studies and can build on that knowledge. I'm not even going to further explain myself regarding this point, though I will add that I admit that I could have run more benchmarks to be perfectly sure, but as I said, I already ran four of them per stage, per build, which makes 16 benchmarks, all showing significantly better results for the gx-optimization build.

One more thing, yes, I could've just benchmarked the title screen, or just let the characters stand around and benchmark that, but I thought it would be more useful to have some actual results from typical gameplay.

Have a nice day,
dEnigma

P.S. Of course I used the exact same settings, before someone brings that up. Also I didn't check if I repeated myself, so maybe I already said some of this in an earlier post, sorry for that.
If you run many benchmarks it can help negate the significance of variations within those benchmarks (in this case the fact that you're not completing it the exact same way). However, you did not originally state that you took 16 benchmarks and there is nothing in your results to suggest that you did. So what did you expect from us? Now your test is a lot more credible.
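The effect of repeating runs can be illustrated with the standard error of the mean, which shrinks roughly with the square root of the number of runs. This is a sketch with hypothetical FPS values, not data from this thread:

```python
from math import sqrt
from statistics import stdev


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(samples) / sqrt(len(samples))


# Hypothetical per-run FPS averages for one build.
runs = [40.0, 44.0, 42.0, 46.0]

print(standard_error(runs))      # 4 runs
print(standard_error(runs * 4))  # 16 runs with the same spread: smaller error
```

Run-to-run differences in how the stage was played show up as spread in the samples; more runs narrow the uncertainty on the average, but they do not remove a systematic difference between how the two builds were tested.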
Quote:Also saying that my results are automatically invalid because it isn't possible, or at least feasible, to make the game behave in the exact same manner each time I benchmark is just ridiculous
The feature is built right into Dolphin. It's pretty straightforward.
I will test this on my Linux x64 host sometime this weekend. Thanks for throwing this out there!
This branch was merged to master one month ago.
(07-13-2012, 03:04 AM)NaturalViolence Wrote: If you run many benchmarks it can help negate the significance of variations within those benchmarks (in this case the fact that you're not completing it the exact same way). However, you did not originally state that you took 16 benchmarks and there is nothing in your results to suggest that you did. So what did you expect from us? Now your test is a lot more credible.

Well, yeah, now that I look at my previous posts it looks like I wasn't very clear about that, though I posted two benchmarks per build and said that those were not all of them. I was just upset about people demanding screenshots when I gave them tabulated benchmark results. And yeah, I forgot to post the results of the "Final Destination" benchmarks as well, I don't know why. Anyway, the changes have already been merged into master and we all can be happy now ;)

(07-14-2012, 04:07 AM)gjfklhg Wrote:
Quote:Also saying that my results are automatically invalid because it isn't possible, or at least feasible, to make the game behave in the exact same manner each time I benchmark is just ridiculous
The feature is built right into Dolphin. It's pretty straightforward.

Well, it wasn't me who said it was hard to do, though I never tried it^^

(07-15-2012, 03:03 AM)dEnigma Wrote:
(07-14-2012, 04:07 AM)gjfklhg Wrote:
Quote:Also saying that my results are automatically invalid because it isn't possible, or at least feasible, to make the game behave in the exact same manner each time I benchmark is just ridiculous
The feature is built right into Dolphin. It's pretty straightforward.
Well, it wasn't me who said it was hard to do, though I never tried it^^
Err, you said
dEnigma Wrote:because it isn't possible, or at least feasible
All it takes is two clicks before you start the game up each time (actually, that's less work than playing through yourself for each test), and to run through a few minutes of the game, and you'll have perfectly apples to apples comparisons. That is neither impossible, nor infeasible.