I did a little testing with Zelda: The Wind Waker (I didn't have time to test more games). Here are my results:
Comparison of the average FPS between a GLSL build and a Master build, with both my Intel HD 3000 and my Nvidia GT 630m:
At first I wanted to record a dtm movie and use it as a benchmark, but Dolphin turned out not to be deterministic enough to do that reliably, so I just tried to make the exact same movements in every run. Maybe later I will do a benchmark without any user input, e.g. the main menu. For this benchmark I used the Dragon Roost Dungeon because it is hard on my GPU.
As you can see in the chart, the master build is always faster than the GLSL build on the corresponding GPU. The Intel HD 3000 is especially slow in the GLSL build, but who cares about Intel GPUs. The difference with the Nvidia GPU isn't that big, and maybe some more testing will reveal it to be nonexistent. I'll do that later.
I also discovered that the GLSL build fixes Issue 3313, but only for the Intel HD 3000. Here are some screenshots (the water shouldn't have pink highlights):
1. GLSL with Intel HD 3000
2. GLSL with Nvidia GT 630m
3. Master with Intel HD 3000 (you can see that GLSL also fixed some Intel HD specific issues)
4. Master with Nvidia GT 630m
Specs:
OS: Windows 7 64-bit (oh yeah, I used the 64-bit builds)
GPU: Nvidia GT 630m and Intel HD 3000
Driver Nvidia: 314.07
Driver Intel: 15.28.12.64.2932
edit: I just saw that the GLSL build produced hundreds of 'p_*.txt' and 'ps_*.txt' files. They all seem to start with "No errors.". I'm guessing that is not the intended behaviour, or is it?
edit2: This only happens when I use the Intel HD GPU. Here is an example:
No errors.
#version 140
#extension GL_ARB_texture_rectangle : enable
// ubo disabled
#define ATTRIN in
#define ATTROUT out
#define VARYIN in
#define VARYOUT out
#define float2 vec2
#define float3 vec3
#define float4 vec4
#define frac(x) fract(x)
#define saturate(x) clamp(x, 0.0f, 1.0f)
#define lerp(x, y, z) mix(x, y, z)
//Pixel Shader for TEV stages
//1 TEV stages, 0 texgens, XXX IND stages
float fmod( float x, float y )
{
float z = fract( abs( x / y) ) * abs( y );
return (x < 0) ? -z : z;
}
uniform sampler2D samp0;
uniform sampler2D samp1;
uniform sampler2D samp2;
uniform sampler2D samp3;
uniform sampler2D samp4;
uniform sampler2D samp5;
uniform sampler2D samp6;
uniform sampler2D samp7;
layout(std140) uniform PSBlock {
float4 color[4] ;
float4 k[4] ;
float4 alphaRef[1] ;
float4 texdim[8] ;
float4 czbias[2] ;
float4 cindscale[2] ;
float4 cindmtx[6] ;
float4 cfog[3] ;
float4 cPLights[40] ;
float4 cPmtrl[4] ;
};
out float4 ocol0;
float depth;
float4 rawpos = gl_FragCoord;
VARYIN float4 colors_02;
VARYIN float4 colors_12;
float4 colors_0 = colors_02;
float4 colors_1 = colors_12;
VARYIN float4 clipPos_2;
float4 clipPos = clipPos_2;
void main()
{
float4 c0 = color[1], c1 = color[2], c2 = color[3], prev = float4(0.0f, 0.0f, 0.0f, 0.0f), textemp = float4(0.0f, 0.0f, 0.0f, 0.0f), rastemp = float4(0.0f, 0.0f, 0.0f, 0.0f), konsttemp = float4(0.0f, 0.0f, 0.0f, 0.0f);
float3 comp16 = float3(1.0f, 255.0f, 0.0f), comp24 = float3(1.0f, 255.0f, 255.0f*255.0f);
float alphabump=0.0f;
float3 tevcoord=float3(0.0f, 0.0f, 0.0f);
float2 wrappedcoord=float2(0.0f,0.0f), tempcoord=float2(0.0f,0.0f);
float4 cc0=float4(0.0f,0.0f,0.0f,0.0f), cc1=float4(0.0f,0.0f,0.0f,0.0f);
float4 cc2=float4(0.0f,0.0f,0.0f,0.0f), cprev=float4(0.0f,0.0f,0.0f,0.0f);
float4 crastemp=float4(0.0f,0.0f,0.0f,0.0f),ckonsttemp=float4(0.0f,0.0f,0.0f,0.0f);
clipPos = float4(rawpos.x, rawpos.y, clipPos.z, clipPos.w);
float3 uv0 = float3(0.0f, 0.0f, 0.0f);
// TEV stage 0
rastemp = colors_0.rgba;
crastemp = frac(rastemp * (255.0f/256.0f)) * (256.0f/255.0f);
textemp = float4(1.0f, 1.0f, 1.0f, 1.0f);
// color combine
prev.rgb = saturate(float3(1.0f, 1.0f, 1.0f)+float3(0.0f, 0.0f, 0.0f));
// alpha combine
prev.a = saturate(rastemp.a+float4(0.0f, 0.0f, 0.0f, 0.0f).a);
// TEV done
float zCoord = czbias[1].x + (clipPos.z / clipPos.w) * czbias[1].y;
depth = zCoord;
ocol0 = prev;
gl_FragDepth = depth;
}
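For anyone who wants to check whether all of those dumped files really just start with "No errors." (and not an actual compiler message), here is a rough Python sketch. The function name classify_shader_dumps and the idea of passing the dump folder as an argument are my own assumptions; I don't know where a given build writes these files, so point it at whatever folder they show up in.

```python
from pathlib import Path


def classify_shader_dumps(dump_dir):
    """Split p_*.txt / ps_*.txt dumps by whether they start with 'No errors.'.

    Hypothetical helper: dump_dir is whatever folder the files were written to.
    Returns two lists of file names: (clean, suspicious).
    """
    clean, suspicious = [], []
    root = Path(dump_dir)
    # The two patterns cannot overlap: "ps_..." does not match "p_*".
    for pattern in ("p_*.txt", "ps_*.txt"):
        for path in sorted(root.glob(pattern)):
            with path.open(encoding="utf-8", errors="replace") as f:
                first_line = f.readline().strip()
            if first_line == "No errors.":
                clean.append(path.name)
            else:
                suspicious.append(path.name)
    return clean, suspicious


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    clean, suspicious = classify_shader_dumps(folder)
    print(f"{len(clean)} dumps start with 'No errors.', {len(suspicious)} do not")
    for name in suspicious:
        print("  check:", name)
```

Anything listed under "check" would be a dump whose first line is something other than "No errors.", which is where a real shader compile problem would show up.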
edit3: Totally forgot to test the hacked buffer option. Here are my results:
I created a dtm file of the Zelda WW intro and used that as my benchmark. Sadly, "Pause at end of movie" doesn't seem to work for me, so it still isn't quite perfect, and I suspect that the Nvidia GPU might be limited by my CPU in this test, but you can't have everything. Amazingly, the Intel HD 3000 is faster with GLSL + the hacked buffer option than with master and still doesn't show the weird issues that it has in master builds. Very nice work. The Nvidia benchmarks were repeated once, since a single run was over way too fast to be representative.
edit4: I used the built-in "Log FPS to file" option for every benchmark.
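In case someone wants to repeat this, the averaging step can be sketched in Python roughly like this. It assumes the FPS log is a plain text file with one FPS value per line, which I have not verified for every build, and the helper names (parse_fps_log, average_fps, relative_difference) are made up for this sketch. Several runs of the same configuration are pooled before averaging, which helps when a single run is too short to be representative.

```python
from statistics import mean


def parse_fps_log(text):
    """Return the FPS samples in a log, ASSUMING one number per line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            samples.append(float(line))
        except ValueError:
            # Skip anything that is not a plain number (headers, stray text).
            continue
    return samples


def average_fps(*runs):
    """Average FPS over all samples from one or more runs, pooled together."""
    pooled = [sample for run in runs for sample in run]
    if not pooled:
        raise ValueError("no FPS samples found")
    return mean(pooled)


def relative_difference(baseline, other):
    """How much faster (+) or slower (-) `other` is than `baseline`, in percent."""
    return (other - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Made-up sample values, only to show the calls; these are NOT my measurements.
    master = average_fps(parse_fps_log("30\n28\n29\n"))
    glsl = average_fps(parse_fps_log("27\n26\n\n28\n"))
    diff = relative_difference(master, glsl)
    print(f"master {master:.1f} FPS, GLSL {glsl:.1f} FPS, difference {diff:+.1f}%")
```

Pooling all samples weights longer runs more heavily than shorter ones; averaging the per-run averages instead would weight every run equally, so pick whichever fits the comparison.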