Code bug report (based on 3.0-735)

08-05-2012, 01:27 AM #1

I have been optimizing the source code as an experiment for the past 2 years, and found some bugs in the code. Could a Dolphin developer please check them?

================================
[1] Clear bugs
================================
(1) Line 442 in OpcodeDecoding.cpp:
*DataReadU32xFuncs = *DataReadU32xFuncs_SSSE3;
Replace with:
for (int i = 0; i < 16; ++i) DataReadU32xFuncs[i] = DataReadU32xFuncs_SSSE3[i];
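To illustrate why the original line is a bug: dereferencing an array name yields only its first element, so `*dst = *src` copies one entry rather than the whole table. A minimal sketch follows; `ReadFunc`, `ReadGeneric`, `ReadSSSE3` and `EntriesReplaced` are hypothetical stand-ins, not the real Dolphin types.

```cpp
#include <cstddef>

// Hypothetical stand-in for the table entry type; the real Dolphin
// reader functions have a different signature.
using ReadFunc = int (*)();

int ReadGeneric() { return 0; }
int ReadSSSE3() { return 1; }

constexpr std::size_t kTableSize = 16;

// Returns how many of the 16 entries end up pointing at ReadSSSE3.
int EntriesReplaced(bool use_loop)
{
    ReadFunc active[kTableSize];
    ReadFunc ssse3[kTableSize];
    for (std::size_t i = 0; i < kTableSize; ++i) {
        active[i] = ReadGeneric;
        ssse3[i] = ReadSSSE3;
    }

    if (use_loop) {
        // Proposed fix: copy every entry.
        for (std::size_t i = 0; i < kTableSize; ++i)
            active[i] = ssse3[i];
    } else {
        // Original form: the arrays decay to pointers, so this is
        // exactly active[0] = ssse3[0].
        *active = *ssse3;
    }

    int replaced = 0;
    for (std::size_t i = 0; i < kTableSize; ++i)
        if (active[i] == ReadSSSE3)
            ++replaced;
    return replaced;
}
```

With the dereference form only one entry is replaced; with the loop all 16 are.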

--------------------------------
(2) Line 443 in JitAsmCommon.cpp:
MOVD_xmm(XMM0, M(&psTemp[0]));
Replace with:
MOVD_xmm(XMM0, R(EAX));

--------------------------------
(3) Line 124 in DSPLLE.cpp:
Common::SetCurrentThreadAffinity(1 << core_id);
Replace with:
Common::SetCurrentThreadAffinity(1 << (core_id - 1));
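A small sketch of the off-by-one, assuming `core_id` is a 1-based core number (the post implies this but does not state it) and that the affinity argument is a bitmask where bit 0 means the first core. Both function names are hypothetical.

```cpp
#include <cstdint>

// Mask as built by the original line: bit `core_id` is set.
std::uint32_t AffinityMaskOriginal(unsigned core_id)
{
    return 1u << core_id;
}

// Mask as built by the proposed line: bit `core_id - 1` is set.
// Assumes core_id >= 1; core_id == 0 would shift by a huge value.
std::uint32_t AffinityMaskProposed(unsigned core_id)
{
    return 1u << (core_id - 1);
}
```

Under that assumption, core 1 maps to mask 0x1 with the proposed form, while the original form gives 0x2 and so selects the second core.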

--------------------------------
(4) Line 152 in VertexLoader_Color.cpp
const u8 *iAddress = cached_arraybases[ARRAY_COLOR+colIndex] + (Index * arraystrides[ARRAY_COLOR]+colIndex);
Replace with:
const u8 *iAddress = cached_arraybases[ARRAY_COLOR+colIndex] + (Index * arraystrides[ARRAY_COLOR+colIndex]);
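The difference between the two lines is only where `+colIndex` sits: in the original it is added to the byte offset, in the replacement it picks the stride of the second color array. A sketch with made-up numbers; the value of `ARRAY_COLOR` and the stride table are assumptions for illustration only.

```cpp
// Hypothetical values; the real constants live in the Dolphin headers.
constexpr int ARRAY_COLOR = 2;
constexpr int kExampleStrides[] = {12, 12, 4, 3}; // color 0: 4 bytes, color 1: 3 bytes

// Original expression: always uses the stride of color 0, then adds colIndex bytes.
int OffsetOriginal(int index, int colIndex)
{
    return index * kExampleStrides[ARRAY_COLOR] + colIndex;
}

// Proposed expression: uses the stride that belongs to the selected color array.
int OffsetProposed(int index, int colIndex)
{
    return index * kExampleStrides[ARRAY_COLOR + colIndex];
}
```

For `colIndex == 0` both forms agree. For `colIndex == 1` and index 5 the original gives byte offset 21, the proposed form gives 15.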

================================
[2] Probable bugs (careless mistakes?)
================================
(1) Line 287 in Jit_Util.cpp:
if (false && cpu_info.bSSSE3) {
This line must be moved to line 272.

--------------------------------
(2) Line 142 in Jit_LoadStore.cpp:
gpr.Flush(FLUSH_ALL);
This line can be deleted.

================================
[3] Possible bug fixes (expected to speed things up)
================================
(1) Jit64::ps_sel() function in Jit_Paired.cpp: (I confirmed the fix with 'sengoku musou 3'; it needs more checking with other games.)
Replace with:
void Jit64::ps_sel(UGeckoInstruction inst)
{
    INSTRUCTION_START
    JITDISABLE(Paired)

    if (inst.Rc) {
        Default(inst); return;
    }
    int d = inst.FD;
    int a = inst.FA;
    int b = inst.FB;
    int c = inst.FC;
    fpr.Lock(a, b, c, d);
    fpr.BindToRegister(d, d==b || d==c || d==a, true);
    MOVAPD(XMM1, fpr.R(a));
    XORPD(XMM0, R(XMM0));
    CMPPD(XMM1, R(XMM0), 1); //less-than = 111111
    MOVAPD(XMM0, R(XMM1));
    ANDPD(XMM1, fpr.R(b));
    ANDNPD(XMM0, fpr.R(c));
    ORPD(XMM0, R(XMM1));
    MOVAPD(fpr.RX(d), R(XMM0));
    fpr.UnlockAll();
}
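The instruction sequence above is a branch-free select built from a compare mask. A scalar sketch of what one lane computes as written, i.e. `(a < 0.0) ? b : c`; the helper names are hypothetical and this models only the posted sequence, not the full PowerPC semantics.

```cpp
#include <cstdint>
#include <cstring>

// Bit-cast helpers that avoid undefined behavior.
std::uint64_t ToBits(double v)
{
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    return bits;
}

double FromBits(std::uint64_t bits)
{
    double v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// One lane of the posted sequence.
double SelectLane(double a, double b, double c)
{
    // CMPPD with predicate 1 (less-than): all ones if a < 0.0, else all zeros.
    const std::uint64_t mask = (a < 0.0) ? ~std::uint64_t{0} : std::uint64_t{0};
    const std::uint64_t from_b = mask & ToBits(b);   // ANDPD
    const std::uint64_t from_c = ~mask & ToBits(c);  // ANDNPD
    return FromBits(from_b | from_c);                // ORPD
}
```

Negative `a` selects `b`; zero and positive `a` select `c`.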

--------------------------------
(2) Jit64::psq_l() function in Jit_LoadStorePaired.cpp: (Needs checking just in case. JitIL already implements this.)
Insert at line 174:
if (inst.W) OR(32, R(EDX), Imm8(8));

Comment out lines 151-156.

================================
[4] Code I do not understand
================================
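(1) 'min_filter' is defined as a 3-bit field in BPMemory.h, but '.min_filter == 8' and '.min_filter != 8' appear in some VertexManager.cpp and Render.cpp files. A 3-bit field only holds 0 to 7, so the first comparison is never true and the second is always true.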
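A short sketch of why the comparison cannot work, assuming the field is an unsigned 3-bit bit-field. `TexModeSketch` and both helpers are hypothetical and only mimic the shape of the real declaration.

```cpp
// Hypothetical stand-in for the register layout in BPMemory.h.
struct TexModeSketch
{
    unsigned min_filter : 3; // can hold 0..7 only
};

// Stores a value in the 3-bit field and reads it back.
// Out-of-range values are reduced modulo 8 for an unsigned bit-field.
unsigned StoredValue(unsigned v)
{
    TexModeSketch t{};
    t.min_filter = v;
    return t.min_filter;
}

// Tries every 4-bit input and reports whether the field ever reads back as 8.
bool AnyValueReadsBackAsEight()
{
    for (unsigned v = 0; v < 16; ++v)
        if (StoredValue(v) == 8)
            return true;
    return false;
}
```

Writing 8 into the field reads back as 0, so a test against 8 never matches.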

================================
[5] Simple optimization hints for a small speedup (Free gift. Some picks from my optimized code.)
================================
(1) Line 134 in x64Emitter.h:
bool IsImm() const {return scale == SCALE_IMM8 || scale == SCALE_IMM16 || scale == SCALE_IMM32 || scale == SCALE_IMM64;}
Replace with:
bool IsImm() const {return (scale & 0xfc) == 0xf0; }
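The masked form only works if the four immediate scales are the consecutive values 0xF0 through 0xF3. That is an assumption here (the mask implies it, but it should be checked against x64Emitter.h). Under that assumption, a sketch that compares both forms over every 8-bit value:

```cpp
// Assumed values; verify against the real enum in x64Emitter.h.
enum : unsigned
{
    SCALE_IMM8  = 0xF0,
    SCALE_IMM16 = 0xF1,
    SCALE_IMM32 = 0xF2,
    SCALE_IMM64 = 0xF3,
};

// Original form: four comparisons.
bool IsImmOriginal(unsigned scale)
{
    return scale == SCALE_IMM8 || scale == SCALE_IMM16 ||
           scale == SCALE_IMM32 || scale == SCALE_IMM64;
}

// Proposed form: clear the low two bits and compare once.
bool IsImmMasked(unsigned scale)
{
    return (scale & 0xfc) == 0xf0;
}

// True if both forms agree for every possible 8-bit scale value.
bool FormsAgreeForAllBytes()
{
    for (unsigned s = 0; s < 256; ++s)
        if (IsImmOriginal(s) != IsImmMasked(s))
            return false;
    return true;
}
```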

--------------------------------
(2) Replace the LoadBPReg() function in BPMemory.cpp with:
void LoadBPReg(u32 value0)
{
    //handle the mask register
    int opcode = value0 >> 24;
    int oldval = ((u32*)&bpmem)[opcode];
    int newval = oldval ^ ((value0 ^ oldval) & bpmem.bpMask); // (oldval & ~bpmem.bpMask) | (value0 & bpmem.bpMask);

    if (opcode != 0xFE) {
        //reset the mask register
        bpmem.bpMask = 0xFFFFFF;
        int changes = (oldval ^ newval) & 0xFFFFFF;
        BPCmd bp = {opcode, changes, newval};
        BPWritten(bp);
    } else {
        bpmem.bpMask = newval;
    }
}
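The `newval` line relies on a standard bit identity: `old ^ ((val ^ old) & mask)` gives the same result as `(old & ~mask) | (val & mask)` with one operation fewer. A minimal sketch of both forms; the function names are hypothetical.

```cpp
#include <cstdint>

// Classic masked merge: keep old bits where mask is 0, take new bits where mask is 1.
std::uint32_t MergeClassic(std::uint32_t old_val, std::uint32_t new_val, std::uint32_t mask)
{
    return (old_val & ~mask) | (new_val & mask);
}

// XOR form used in the post: flip exactly the masked bits that differ.
std::uint32_t MergeXor(std::uint32_t old_val, std::uint32_t new_val, std::uint32_t mask)
{
    return old_val ^ ((new_val ^ old_val) & mask);
}
```

For example, merging 0xFEABCDEF into 0x00123456 under mask 0x00FF00FF gives 0x00AB34EF with either form.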

--------------------------------
(3) Replace the Matrix44::Multiply() function in MathUtil.cpp with:
inline void MatrixMul4(const float *a, const float *b, float *result)
{
    // load the four rows of b once
    const __m128 b0 = _mm_load_ps(b + 0);
    const __m128 b1 = _mm_load_ps(b + 4);
    const __m128 b2 = _mm_load_ps(b + 8);
    const __m128 b3 = _mm_load_ps(b + 12);

    for (int i = 0; i < 4; ++i) {
        __m128 a_ = _mm_load_ps(a + i*4);

        // broadcast each element of row i of a
        __m128 a0 = _mm_shuffle_ps(a_, a_, _MM_SHUFFLE(0, 0, 0, 0));
        __m128 a1 = _mm_shuffle_ps(a_, a_, _MM_SHUFFLE(1, 1, 1, 1));
        __m128 a2 = _mm_shuffle_ps(a_, a_, _MM_SHUFFLE(2, 2, 2, 2));
        __m128 a3 = _mm_shuffle_ps(a_, a_, _MM_SHUFFLE(3, 3, 3, 3));

        a0 = _mm_mul_ps(a0, b0);
        a1 = _mm_mul_ps(a1, b1);
        a2 = _mm_mul_ps(a2, b2);
        a3 = _mm_mul_ps(a3, b3);

        _mm_store_ps(result + i*4, _mm_add_ps(_mm_add_ps(a0, a1), _mm_add_ps(a2, a3)));
    }
}
void Matrix44::Multiply(const Matrix44 &a, const Matrix44 &b, Matrix44 &result)
{
    MatrixMul4(a.data, b.data, result.data);
}

Also, replace line 198 in MathUtil.h with:
float GC_ALIGNED16(data[16]);
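`_mm_load_ps` and `_mm_store_ps` require 16-byte aligned addresses, which is why the `data` array has to be declared aligned. As a portable reference for checking the SSE version, here is a scalar sketch of the same row-by-row scheme (row i of the result is the sum of a[i][k] times row k of b). It assumes row-major 4x4 matrices; the names `MatrixMul4Scalar` and `ExampleElement` are hypothetical.

```cpp
// Scalar model of the posted loop. Unlike the SSE version it reads b
// while writing result, so result must not alias a or b here.
void MatrixMul4Scalar(const float* a, const float* b, float* result)
{
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            // Same pairing of additions as the SSE code: (p0 + p1) + (p2 + p3).
            const float p0 = a[i * 4 + 0] * b[0 * 4 + j];
            const float p1 = a[i * 4 + 1] * b[1 * 4 + j];
            const float p2 = a[i * 4 + 2] * b[2 * 4 + j];
            const float p3 = a[i * 4 + 3] * b[3 * 4 + j];
            result[i * 4 + j] = (p0 + p1) + (p2 + p3);
        }
    }
}

// Multiplies diag(2, 3, 4, 5) by the matrix holding 1..16 in row-major
// order and returns one element of the product.
float ExampleElement(int row, int col)
{
    const float a[16] = {2, 0, 0, 0,
                         0, 3, 0, 0,
                         0, 0, 4, 0,
                         0, 0, 0, 5};
    float b[16];
    for (int n = 0; n < 16; ++n)
        b[n] = static_cast<float>(n + 1);

    float result[16];
    MatrixMul4Scalar(a, b, result);
    return result[row * 4 + col];
}
```

Each row of the product is the matching row of b scaled by the diagonal entry, which makes the result easy to check by hand.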

================================
This report is based on 'dolphin 3.0-735'.
If I find another bug, I will report again.
Would it have been better to post this report to the Google Code 'dolphin-emu issues' tracker?
If you have questions or impressions, please reply in 'simple English'.

Sorry for my bad English; I am from Japan.

**delroth** · 08-05-2012, 01:55 AM #2

Thanks a lot for your work! I will take some time today to confirm the bug fixes you have found. I will also add the performance improvement changes to a new Dolphin branch so people can test it and report issues you may not have found.

**delroth** · 08-05-2012, 04:36 AM #3

Most of these issues seem valid (but they are not happening with common Dolphin builds which are compiled without SSSE3 support). I'll commit the fixes and credit you for the work, thank you!

**neobrain** · 08-05-2012, 06:04 AM #4

The min_filter stuff possibly gave me a good hint at that annoying mipmapping issue in kirby air ride and mario golf. Thanks a bunch :)

**ExtremeDude2** · 08-05-2012, 12:35 PM #5

(08-05-2012, 06:04 AM)neobrain Wrote: The min_filter stuff possibly gave me a good hint at that annoying mipmapping issue in kirby air ride and mario golf. Thanks a bunch

yesssssssssssssss

**delroth** · 08-05-2012, 01:37 PM #6

Build with all of these changes: http://dl.dolphin-emu.org/builds/dolphin...745-x64.7z

Almost everything has been added into master (for bug fixes) or the new misc-opts branch (for optimization changes). Again, thanks a lot for your work :)

***MayImilae*** · 08-05-2012, 01:52 PM #7

Can you create a list of things to test? I'd be happy to test it but I don't know what to try.

**delroth** · 08-05-2012, 02:04 PM #8

Almost every game could be impacted by the changes in this branch. Test as many games as possible and check if new bugs occur.

**Axxer** · 08-05-2012, 02:48 PM #9

I may be able to help you with some testing -- I have ~46 games I can test (though one of them I know to be a bad iso). I'll see what I can do and post back here.

**fagoatse** · 08-06-2012, 08:04 AM #10

Played 3 hours of mario kart wii and no issues whatsoever. Can't measure performance because I run all my games at full speed. It seems VTune support is coming, so perhaps that will be used to measure any performance gains/losses in the future.