I am going in circles here, and it's frustrating.
The issue is the following:
After adding a few traces, the issue most of the time looks like this:
49:22:350 e:\dolphin-master-org\dolphin\source\core\core\powerpc\jitarm64\jit.cpp:562 D[JIT]: JIT64 PC: 8031cb64 SRR0: 803082a4 SRR1: 00003032 FPSCR: 00000004 MSR: 00003032 LR: 8031cb64 r00: 00000000 r01: 80414450 r02: 80407520 r03: 804145a0 r04: 00000001 r05: 00000000 r06: 804144c0 r07: 80414500 r08: 80414543 r09: 0000c200 r10: 0000c208 r11: 80414688 r12: 802b69c0 r13: 804058e0 r14: 00000000 r15: 00000000 r16: 00000000 r17: 00000000 r18: 00000000 r19: 01000000 r20: 80414540 r21: 80414600 r22: 80414640 r23: cc000000 r24: 804145c0 r25: 00000000 r26: 00000000 r27: 00a00000 r28: 80414580 r29: 80414a04 r30: 00000003 r31: 00000001
49:22:360 e:\dolphin-master-org\dolphin\source\core\core\memtools.cpp:39 N[JIT]: EXCEPTION: CODE:c0000005 ACCESSTYPE: 0 BADADDRESS:00000002CC005020
49:22:361 e:\dolphin-master-org\dolphin\source\core\core\memtools.cpp:39 N[JIT]: EXCEPTION: CODE:c0000005 ACCESSTYPE: 1 BADADDRESS:00000002CC005020
49:22:371 e:\dolphin-master-org\dolphin\source\core\core\memtools.cpp:39 N[JIT]: EXCEPTION: CODE:c0000005 ACCESSTYPE: 0 BADADDRESS:00000002CC005022
49:22:372 e:\dolphin-master-org\dolphin\source\core\core\memtools.cpp:39 N[JIT]: EXCEPTION: CODE:c0000005 ACCESSTYPE: 1 BADADDRESS:00000002CC005022
49:22:381 e:\dolphin-master-org\dolphin\source\core\core\memtools.cpp:39 N[JIT]: EXCEPTION: CODE:c0000005 ACCESSTYPE: 8 BADADDRESS:FFFFFE9261370000
The last line tells us that we are at an invalid PC location (ACCESSTYPE: 8). Up until then everything looks OK. The problem is that it's not deterministic. Looking at the log file, one run gets through only 800 lines, while the next might get through 1500, so it's extremely hard to catch with the debugger.
I did catch the issue with the debugger once. The cause was a wrong trampoline for the fault handler.
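To compare runs, a small script along these lines could pull out the fault that comes just before the first execute fault. It is only a sketch: the regular expression is modelled on the memtools.cpp lines above, the function names are made up for illustration, and the read/write/execute mapping assumes the logged ACCESSTYPE is the first ExceptionInformation value of a Windows access violation (0 = read, 1 = write, 8 = execute).

```python
import re

# Field layout copied from the memtools.cpp log lines; helper names are hypothetical.
FAULT_RE = re.compile(r"ACCESSTYPE:\s*(\d+)\s+BADADDRESS:([0-9A-Fa-f]+)")

# Assumed meaning of ACCESSTYPE (Windows access-violation information).
ACCESS_KINDS = {0: "read", 1: "write", 8: "execute"}

def parse_faults(lines):
    """Return (kind, address) for every EXCEPTION line, in log order."""
    faults = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            kind = ACCESS_KINDS.get(int(m.group(1)), "unknown")
            faults.append((kind, int(m.group(2), 16)))
    return faults

def last_before_execute_fault(faults):
    """Return the fault just before the first execute fault, or None."""
    for i, (kind, _) in enumerate(faults):
        if kind == "execute":
            return faults[i - 1] if i > 0 else None
    return None

# Three lines shortened from the log above (timestamps and paths dropped).
sample = [
    "N[JIT]: EXCEPTION: CODE:c0000005 ACCESSTYPE: 0 BADADDRESS:00000002CC005022",
    "N[JIT]: EXCEPTION: CODE:c0000005 ACCESSTYPE: 1 BADADDRESS:00000002CC005022",
    "N[JIT]: EXCEPTION: CODE:c0000005 ACCESSTYPE: 8 BADADDRESS:FFFFFE9261370000",
]
kind, addr = last_before_execute_fault(parse_faults(sample))
print(kind, hex(addr))  # write 0x2cc005022
```

Running this over several log files would show whether the last handled fault before the crash really is a write every time, and whether it always hits the same address range.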
00000217`8e0349f0 a9bf47fe stp lr,xip1,[sp,#-0x10]!
00000217`8e0349f4 2a1903e0 mov w0,w25
00000217`8e0349f8 d28cd91e mov lr,#0x66C8
00000217`8e0349fc f2a1e55e movk lr,#0xF2A lsl #0x10
00000217`8e034a00 f2cffefe movk lr,#0x7FF7 lsl #0x20
00000217`8e034a04 d63f03c0 blr lr
00000217`8e034a08 a8c17bf1 ldp xip1,lr,[sp],#0x10 <<<<<< different pair order than store
00000217`8e034a0c d65f03c0 ret lr
The issue was that the registers were pushed onto the stack in a different order than they were restored, which led to PC = 0 after the return.
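As a cross-check of the disassembly, the two pair instructions can be decoded straight from their encodings and replayed on a toy stack frame. This is a minimal sketch: it only reads the Rt, Rt2, Rn and load/store fields of the A64 load/store-pair encoding, it ignores the call in between, and the register values are placeholders rather than values from the trace.

```python
def decode_pair(word):
    """Decode an A64 64-bit STP/LDP word into (mnemonic, Rt, Rt2, Rn)."""
    is_load = (word >> 22) & 1   # L bit: 1 = LDP, 0 = STP
    rt = word & 0x1F             # first register of the pair
    rn = (word >> 5) & 0x1F      # base register (31 = sp in these forms)
    rt2 = (word >> 10) & 0x1F    # second register of the pair
    return ("ldp" if is_load else "stp", rt, rt2, rn)

store = decode_pair(0xA9BF47FE)  # stp lr,xip1,[sp,#-0x10]!
load = decode_pair(0xA8C17BF1)   # ldp xip1,lr,[sp],#0x10

def run_trampoline(regs, store, load):
    """Apply the store, then the load, to a copy of regs via a two-slot frame."""
    regs = dict(regs)
    _, s_rt, s_rt2, _ = store
    _, l_rt, l_rt2, _ = load
    frame = [regs[s_rt], regs[s_rt2]]  # stp writes Rt at [sp], Rt2 at [sp+8]
    regs[l_rt], regs[l_rt2] = frame    # ldp reads the slots back in the same order
    return regs

before = {17: 0x1111, 30: 0x2222}      # placeholder values for x17 (xip1) and x30 (lr)
after = run_trampoline(before, store, load)
print(store, load)
print(hex(after[30]))  # 0x1111
```

The decode gives Rt = 30, Rt2 = 17 for the store and Rt = 17, Rt2 = 30 for the load, so after the restore lr holds whatever xip1 held on entry, and the final ret jumps there.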
However, I reviewed the trampoline generators and have no explanation for how this could have happened. Everything looks OK.
Both ABI_PushRegisters(gprs_to_push) and ABI_PopRegisters(gprs_to_push) should, from what I understand, push and pop registers in the very same order. The order is defined by an iterator from common:BitSet<u32>. I do not see how the iterator would iterate over the bitfield in two different orders.
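The expectation about the iterator can be written down as a small model. This is a sketch under assumptions, not the emitter's actual code: iter_bits is a hypothetical stand-in for the BitSet iterator and is assumed to visit set bits from lowest to highest, and pairs is a made-up helper that groups registers two at a time the way a pair store or load would consume them.

```python
def iter_bits(mask):
    """Yield set bit positions from lowest to highest (assumed iteration order)."""
    while mask:
        low = mask & -mask           # isolate the lowest set bit
        yield low.bit_length() - 1   # its bit index
        mask ^= low                  # clear it and continue

def pairs(mask):
    """Group the register numbers two at a time, in iteration order."""
    regs = list(iter_bits(mask))
    return [tuple(regs[i:i + 2]) for i in range(0, len(regs), 2)]

gprs_to_push = (1 << 17) | (1 << 30)  # x17 (xip1) and x30 (lr), as in the dump
push_pairs = pairs(gprs_to_push)
pop_pairs = pairs(gprs_to_push)
print(push_pairs, pop_pairs)  # [(17, 30)] [(17, 30)]
```

In this model both walks give (17, 30), which matches the ldp xip1,lr in the dump, while the stp lr,xip1 is the one that deviates. If the lowest-first assumption holds, that would point away from the iterator and towards how the two values reach the emitter call. One thing that may be worth checking there: C++ does not fix the order in which function arguments are evaluated, so two iterator increments inside a single argument list can come out in either order depending on the compiler. Whether the generator is written that way is not something the trace shows.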
In summary: first, something goes wrong non-deterministically (it fails at different points). Second, according to the logs, the PC is set to an invalid location right after an exception is handled that sets up a trampoline for device access. Third, the last valid exception is almost always a write operation. Fourth, in one case I could trace the issue to a wrong trampoline (see above), but I cannot explain how this trampoline could have been generated incorrectly.
I would appreciate any ideas on how to go on from here.
