Understanding Dolphin: Wii Memory Architecture
01-22-2015, 01:08 PM
#1
Stevie-O Offline
Junior Member
**
Posts: 6
Threads: 2
Joined: Jan 2015
I'm trying to understand the Wii memory architecture so I can better grasp how Dolphin works, but I've hit a stumbling block.

I'm using the following as references:
* User manual for PPC 740/750 (https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF7785256996006C28E2)
* Programming Environments Manual for 32-Bit Implementations of the PowerPC™ Architecture (http://www.freescale.com/files/product/doc/MPCFPE32B.pdf)
(As best as I can tell, these datasheets correspond to the CPU codenamed "Broadway" that the Wii uses.)

As I understand it, when address translation is enabled, there are three types of addresses that are in use:
* 32-bit CPU Virtual addresses (those used by all memory accesses generated within the CPU proper, including load/store, instruction fetch, etc.)
* 52-bit(!) MMU Virtual addresses (those used by the MMU to do page-table lookups for virtual->physical mapping)
* 32-bit Real/physical addresses (those that show up on the memory bus connecting the MMU to RAM and memory-mapped devices and whatnot)

Furthermore, my understanding is this:
* Point 1: The upper 4 bits of CPU virtual addresses specify one of sixteen segment registers.  The low-order 24 bits of that segment register are combined with the low-order 28 bits of the CPU virtual address to get the MMU virtual address
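
A minimal sketch of Point 1 in C, to make the bit arithmetic concrete. The function name ea_to_va and the segment-register array are hypothetical; the sketch ignores the T bit of the segment register and the page-table lookup that would follow on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the 52-bit MMU virtual address from a 32-bit CPU virtual address.
   sr points to the sixteen segment register values (hypothetical input). */
uint64_t ea_to_va(const uint32_t *sr, uint32_t ea)
{
    assert(sr != NULL);
    uint32_t index  = ea >> 28;                 /* upper 4 bits select SR0..SR15 */
    uint64_t vsid   = sr[index] & 0x00FFFFFFu;  /* low-order 24 bits of that SR */
    uint32_t offset = ea & 0x0FFFFFFFu;         /* low-order 28 bits of the address */
    return (vsid << 28) | offset;               /* 24 + 28 = 52 bits */
}
```

Two addresses that differ only in their upper 4 bits produce the same result whenever the two selected segment registers hold the same low 24 bits.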


I jailbroke a Wii and wrote a little Homebrew program to dump out the sixteen segment registers.  The results were not what I expected: I see 0x80000000 for all SRs!

That would indicate that all segments are direct-store segments, except that my docs say that the 750 doesn't support direct-store segments.

I then modified my program to print the MSR. The output is: 0x0000b032, which according to my docs is:
EE=1 PR=0 FP=1 ME=1 IR=1 DR=1 RI=1
In particular, DR=1 and IR=1 means that address translation is, in fact, enabled for both loads/stores and for instruction fetches.
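
The decoding above can be checked with a short sketch. The MSR_* names and the helper functions are hypothetical; the mask values follow the MSR layout for 32-bit PowerPC implementations, and only the bits mentioned in this post are listed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Masks for the MSR bits discussed in the post (names are illustrative). */
enum {
    MSR_EE = 0x8000, /* external interrupts enabled */
    MSR_PR = 0x4000, /* problem (user) state */
    MSR_FP = 0x2000, /* floating point available */
    MSR_ME = 0x1000, /* machine check enabled */
    MSR_IR = 0x0020, /* instruction address translation */
    MSR_DR = 0x0010, /* data address translation */
    MSR_RI = 0x0002  /* recoverable exception */
};

/* Return 1 if the single bit selected by mask is set in msr, else 0. */
int msr_bit(uint32_t msr, uint32_t mask)
{
    assert(mask != 0 && (mask & (mask - 1)) == 0); /* exactly one bit */
    return (msr & mask) != 0;
}

/* Print the same summary line that appears in the post. */
void print_msr(uint32_t msr)
{
    printf("EE=%d PR=%d FP=%d ME=%d IR=%d DR=%d RI=%d\n",
           msr_bit(msr, MSR_EE), msr_bit(msr, MSR_PR), msr_bit(msr, MSR_FP),
           msr_bit(msr, MSR_ME), msr_bit(msr, MSR_IR), msr_bit(msr, MSR_DR),
           msr_bit(msr, MSR_RI));
}
```

Calling print_msr(0x0000b032) reproduces the line quoted above.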

I then tried this:
* Create a global variable and set it to 42.
* Take the address of that variable (0x80034c60)
* Set the upper 4 bits to 0111 (0x70034c60)
* Read the value at that address.

Source: http://www.klozoff.org/wii/hwinfo/
(Main file: http://www.klozoff.org/wii/hwinfo/source/hwinfo.c)

If SR7 and SR8 contain the same data, then 0x70034c60 and 0x80034c60 should generate the same virtual addresses, and thus generate the same physical addresses. Therefore, they should reference exactly the same memory, and the answer should be 42.

What I actually see happen is:
* On Dolphin (4.0-5099), the value printed for *(int*)0x70034c60 is 0, instead of 42.
* On a real Wii, the program crashes with a DSI exception at the point where it attempts to dereference the mangled pointer.

Obviously, I am missing something. But what?
01-22-2015, 03:26 PM
#2
tueidj Offline
Senior Member
****
Posts: 552
Threads: 0
Joined: Apr 2013
You're missing Block Address Translation, which is regularly used instead of the Segment Descriptors (which are only used for some gamecube games due to ARAM not being directly addressable).
The reason Dolphin doesn't throw a DSI is because it a) doesn't emulate the BATs and SRs completely and b) has a "fake" memory block to cover the range normally used by games for virtual memory and you just happened to pick an address that hit it.
01-23-2015, 11:24 AM
#3
Stevie-O Offline
Junior Member
**
Posts: 6
Threads: 2
Joined: Jan 2015
(01-22-2015, 03:26 PM)tueidj Wrote: You're missing Block Address Translation, which is regularly used instead of the Segment Descriptors (which are only used for some gamecube games due to ARAM not being directly addressable).
The reason Dolphin doesn't throw a DSI is because it a) doesn't emulate the BATs and SRs completely and b) has a "fake" memory block to cover the range normally used by games for virtual memory and you just happened to pick an address that hit it.

Gah! I had seen a comment in the Dolphin source about even MMU-heavy games not messing with the BATs. (Upon a second look, it does actually specify *custom* BATs.) Between that, and the very small number of BATs (4) compared with segments (16), I had assumed that the BATs weren't being used at all.

My BATs looked like this:
IBAT0: 80001fff 00000002
IBAT1: 00000000 00000000
IBAT2: 00000000 00000000
IBAT3: 00000000 00000000

DBAT0: 80001fff 00000002
DBAT1: c0001fff 0000002a
DBAT2: 00000000 00000000
DBAT3: 00000000 00000000


This suggested that 0x8xxxxxxx and 0xCxxxxxxx access the same physical addresses, but the versions that start with 0xC bypass the cache.
I was able to confirm the 'same physical address' bit: When I rewrote my code so it changes the upper 4 bits from 0x8 to 0xC, and read from the resulting pointer, the program says:

&global_variable = 0x80035160
*0xc0035160 is 42
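
As a rough sketch of how the BAT pairs above decode, the following C follows the upper/lower BAT register layout in the 32-bit PowerPC programming manual. The type and function names are hypothetical, and the sketch ignores the distinction between the Vs and Vp valid bits and the PP protection check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t upper, lower; } bat_t;  /* illustrative type */

/* The BL field (11 bits) marks which of address bits 17..27 belong to the offset. */
static uint32_t bat_len_mask(bat_t b) { return ((b.upper >> 2) & 0x7FFu) << 17; }

/* Block size in bytes: 128KB when BL is 0, up to 256MB when BL is 0x7FF. */
uint32_t bat_block_size(bat_t b) { return bat_len_mask(b) + 0x20000u; }

/* WIMG bits from the lower register: W=8, I=4 (cache-inhibited), M=2, G=1 (guarded). */
unsigned bat_wimg(bat_t b) { return (b.lower >> 3) & 0xFu; }

/* Translate ea through one BAT pair; returns false if the BAT does not match. */
bool bat_translate(bat_t b, uint32_t ea, uint32_t *pa)
{
    assert(pa != NULL);
    if ((b.upper & 0x3u) == 0)                    /* neither Vs nor Vp set */
        return false;
    uint32_t tag = 0xFFFE0000u & ~bat_len_mask(b); /* bits compared against BEPI */
    if ((ea & tag) != (b.upper & tag))
        return false;
    *pa = (b.lower & tag) | (ea & ~tag);          /* BRPN bits plus block offset */
    return true;
}
```

With DBAT0 = {0x80001fff, 0x00000002} and DBAT1 = {0xc0001fff, 0x0000002a}, both 0x80035160 and 0xc0035160 translate to physical 0x00035160. bat_wimg gives 0 for DBAT0 and 5 (I and G set) for DBAT1, which matches the cached versus cache-inhibited reading of the dump.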


Now I have a bunch more questions -- ones I don't think I can answer myself simply by experimenting:

(1) What is the point of being able to access main RAM in both a cached and non-cached manner? My spec clearly states that doing so without carefully clearing the cache is "a programming error".
(2) What is ARAM?
(3) This covers addresses beginning with 0x8 and 0xC, but ReadFromHardware in MemmapFunctions.cpp has a *lot* more going on. How do the addresses that begin with 0x0, 0x9, 0x1, 0xD, 0xE, 0x7 and 0x4 work?

Additionally, I would greatly appreciate if someone could explain what the heck "BitSet32(0xCFC)[segment]" does. I've read the source for BitSet several times and can't follow it.
01-23-2015, 11:55 AM (This post was last modified: 01-23-2015, 11:55 AM by tueidj.)
#4
tueidj Offline
Senior Member
****
Posts: 552
Threads: 0
Joined: Apr 2013
There are plenty of situations where you might want uncached access. Maybe you want to store data out or bring data in without polluting the cache, or maybe check to see if a particular address has been updated by other hardware. It's not a programming error at all.
The uncached range doesn't just cover the physical memory, it also includes all the MMIO ranges which must be uncached for obvious reasons.

ARAM is a 16MB region of memory on the gamecube which is only accessible to the CPU by doing DMA transfers to/from main memory. It can also be accessed by the DSP so the primary purpose is to hold audio samples, but it can be used for anything.

Dolphin doesn't believe in cached memory or real mode so the following ranges are basically equivalent:
0x0xxxxxxx : 0x8xxxxxxx : 0xCxxxxxxx = MEM1 (and MMIO stuff/direct EFB access for large 0xCxxxxxxx addresses)
0x1xxxxxxx : 0x9xxxxxxx : 0xDxxxxxxx = MEM2 (wii only)
0xE000xxxx = a special feature of Gekko/Broadway allows half of the L1 data cache to be used as directly addressable memory, the GC/Wii SDK maps it to this range.
0x7xxxxxxx / 0x4xxxxxxx = miscellaneous virtual memory ranges used by some games (paged memory backed by ARAM), Dolphin simulates real memory here so it doesn't have to bother dealing with page faults.
01-23-2015, 01:27 PM (This post was last modified: 01-23-2015, 01:28 PM by Fiora.)
#5
Fiora Offline
x86 JIT Princess
**********
Developers (Some Administrators and Super Moderators)
Posts: 237
Threads: 0
Joined: Aug 2014
Note that magumagu has been working on a branch to largely fix Dolphin's memory handling, i.e. to handle the distinction between real and virtual mode, to handle host vs emulated memory accesses correctly, handle null pointers correctly, handle arbitrary BAT setups... the whole shebang. This will be required for games like The Clone Wars, Cars 2, etc.
01-23-2015, 02:05 PM
#6
Stevie-O Offline
Junior Member
**
Posts: 6
Threads: 2
Joined: Jan 2015
(01-23-2015, 11:55 AM)tueidj Wrote: There are plenty of situations where you might want uncached access. Maybe you want to store data out or bring data in without polluting the cache, or maybe check to see if a particular address has been updated by other hardware. It's not a programming error at all.
I did qualify my statement using 'without carefully clearing the cache'.  Note that the words 'programming error' come straight out of the PPC manual -- they appear repeatedly in cache-related sections. In particular, it is clearly stated in several places that when accessing a particular physical address via *both* cached and non-cached accesses, great care must be taken to ensure any cached version is flushed before performing a non-cached access.
This is less straightforward than it immediately appears to be. Consider this code:
Code:
int x = *(volatile int*)0x80000000;
int y = *(volatile int*)0xc000001c; // programming error!
The 750 has 32-byte cache lines, so the first fetch will cache physical addresses 0x00-0x1F. This will then conflict with the non-cached read from 0x1C.
Even worse, prefetching means that this sort of thing can happen even if the first line was never actually executed (e.g. skipped over by a conditional branch).
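
To illustrate the aliasing, here is a small sketch that checks whether a cached and an uncached access touch the same 32-byte line. It assumes the BAT setup dumped earlier in the thread, where 0x8xxxxxxx (cached) and 0xCxxxxxxx (uncached) both map to physical address 0x0xxxxxxx; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u  /* 750 cache line size in bytes */

/* Physical address under the assumed 256MB BAT mappings: drop the top 4 bits. */
static uint32_t to_physical(uint32_t ea)
{
    uint32_t top = ea >> 28;
    assert(top == 0x8 || top == 0xC);  /* only the two mapped ranges are modelled */
    return ea & 0x0FFFFFFFu;
}

/* True if both addresses fall in the same physical cache line. */
bool same_cache_line(uint32_t ea_a, uint32_t ea_b)
{
    return to_physical(ea_a) / CACHE_LINE_SIZE ==
           to_physical(ea_b) / CACHE_LINE_SIZE;
}

/* True if one access is cached, the other is not, and they share a line. */
bool mixed_access_hazard(uint32_t ea_a, uint32_t ea_b)
{
    bool cached_a = (ea_a >> 28) == 0x8;
    bool cached_b = (ea_b >> 28) == 0x8;
    return cached_a != cached_b && same_cache_line(ea_a, ea_b);
}
```

For the pair in the snippet above, mixed_access_hazard(0x80000000, 0xc000001c) is true, while moving the second address to 0xc0000020 puts it in the next line and the check returns false.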

(01-23-2015, 11:55 AM)tueidj Wrote: The uncached range doesn't just cover the physical memory, it also includes all the MMIO ranges which must be uncached for obvious reasons.

Okay, now I'm starting to get it.
Both 0x8 and 0xC cover a 256MB range of addresses, covering various pieces of hardware sitting on the memory bus:
* Some actual RAM chips (nobody makes 24MB chips; there must be a 16MB and an 8MB, or more likely three 8MB chips)
 Presumably, actual RAM is always accessed via 0x8 (no point in *not* doing that)
* Some MMIO registers, which would presumably always be accessed via 0xC

However, instead of carefully separating them out (having 0x8 only map to RAM chips and 0xC only map to MMIO devices), the Wii makes everything available via either and it's up to the programmer to access everything correctly.


(01-23-2015, 11:55 AM)tueidj Wrote: ARAM is a 16MB region of memory on the gamecube which is only accessible to the CPU by doing DMA transfers to/from main memory. It can also be accessed by the DSP so the primary purpose is to hold audio samples, but it can be used for anything.
If it can't be accessed directly by the CPU, then how can segment descriptors help a game access ARAM?


(01-23-2015, 11:55 AM)tueidj Wrote: Dolphin doesn't believe in cached memory or real mode so the following ranges are basically equivalent:
0x0xxxxxxx : 0x8xxxxxxx : 0xCxxxxxxx  = MEM1 (and MMIO stuff/direct EFB access for large 0xCxxxxxxx addresses)
0x1xxxxxxx : 0x9xxxxxxx : 0xDxxxxxxx  = MEM2 (wii only)
0xE000xxxx  = a special feature of Gekko/Broadway allows half of the L1 data cache to be used as directly addressable memory, the GC/Wii SDK maps it to this range.
0x7xxxxxxx / 0x4xxxxxxx  = miscellaneous virtual memory ranges used by some games (paged memory backed by ARAM), Dolphin simulates real memory here so it doesn't have to bother dealing with page faults.

I'm less concerned with how Dolphin views it than how it works on a real Wii.  (Actually, I'm concerned with both, but I already have the source code to Dolphin; most of that is already quite clear from how ReadFromHardware is written.)
Knowing how Dolphin does it isn't enough, because (as has been established) Dolphin takes shortcuts.    That isn't bad; until we all have terahertz CPUs, some shortcuts will probably be required in order to run various games.  But it does mean I can't necessarily look at Dolphin and automatically know how a real Wii would behave in certain circumstances.  What if there's a different shortcut that runs faster, or disrupts fewer games? What if it's not a shortcut, but a plain old bug?
Consider this line from ReadFromHardware:

Code:
if ((em_address & 0xC8000000) == 0xC8000000)
That looks buggy to me.  I would expect it to be

Code:
if ((em_address & 0xF8000000) == 0xC8000000)
But I can't know for sure -- maybe a real Wii treats 0xC8xxxxxx, 0xD8xxxxxx, 0xE8xxxxxx, and 0xF8xxxxxx the same. I'll need to find a stable MMIO register that normally returns an unusual bit pattern in order to test it on a real Wii.
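
The difference between the two masks can be shown by brute force. This is only a sketch comparing the two expressions; the function names are hypothetical and nothing here is taken from Dolphin beyond the two conditions quoted above. Both masks only involve bits 27 and up, so testing one address per 128MB window is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The condition as it appears in ReadFromHardware. */
bool check_as_written(uint32_t a) { return (a & 0xC8000000u) == 0xC8000000u; }

/* The condition as proposed in this post. */
bool check_proposed(uint32_t a)   { return (a & 0xF8000000u) == 0xC8000000u; }

/* Count how many of the 32 128MB-aligned windows a check accepts. */
unsigned count_windows(bool (*check)(uint32_t))
{
    assert(check != NULL);
    unsigned n = 0;
    for (uint32_t top = 0; top < 32; top++)
        if (check(top << 27))
            n++;
    return n;
}
```

count_windows(check_as_written) is 4 (the windows starting at 0xC8000000, 0xD8000000, 0xE8000000, and 0xF8000000), while count_windows(check_proposed) is 1.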

The L1 cache thing is interesting (when I saw Memmap.c declare a region called "L1 cache" I was quite confused).  Do you know if that portion of the L1 cache is *only* directly-addressable (i.e. not usable as actual cache memory)?  Since L1 cache is normally much faster than external RAM, that would make for a small high-speed memory area.  Useful for code that needs to execute quickly, but can't fit its working set into the 28 or so GPRs available.

So as I *presently* understand it, the physical memory map -- the set of devices found on the Wii's memory bus -- is:

0x00000000-0x017FFFFF  MEM1 (so, should be accessed via 0x80000000)
0x08000000-0x0BFFFFFF   EFB [ReadFromHardware/WriteToHardware] so should be accessed via 0xC8000000
0x0C000000-0x0C000FFF   GPU Commands [InitMMIO]
0x0C001000-0x0C001FFF   Pixel Engine [InitMMIO]
0x0C002000-0x0C002FFF   Video Interface [InitMMIO]
0x0C003000-0x0C003FFF   Processor Interface [InitMMIO]
0x0C004000-0x0C004FFF   Memory Interface [InitMMIO]
0x0C005000-0x0C005FFF   DSP [InitMMIO]
0x0C006000-0x0C0063FF   DVD [InitMMIO]
0x0C006400-0x0C0067FF   Serial [InitMMIO]
0x0C006800-0x0C006BFF   "Expansion" [InitMMIO]
0x0C006C00-0x0C006FFF   Audio [InitMMIO]
0x0C008000-0x0C008FFF   Some FIFOs [WriteToHardware]
??? No idea where MEM2 lives

When my little Wii program is launched by the Homebrew channel (or by Dolphin), based on the contents of the BATs and the segment registers:
- The above devices can variously be accessed via 0x80000000-0x8FFFFFFF and 0xC0000000-0xCFFFFFFF, due to IBAT0, DBAT0, and DBAT1.
- The segment registers are all invalid (they have T=1, which means 'direct-store', which isn't supported on the 750).
Therefore, on a real Wii, addresses that do not begin with 0x8 or 0xC should not exist.
On Dolphin, they *do* exist. So how do they come into being on a Wii?
01-23-2015, 03:04 PM (This post was last modified: 01-23-2015, 03:19 PM by magumagu.)
#7
magumagu Offline
Developer
**********
Developers (Some Administrators and Super Moderators)
Posts: 42
Threads: 1
Joined: May 2014
You might have a slightly easier time looking at https://github.com/magumagu/dolphin/blob/dynamic-bat/Source/Core/Core/PowerPC/MMU.cpp#L138 (from PR1882). This is much closer to how a Wii actually works.

MEM2 is 64MB starting at 0x10000000 on Wii. Wii games usually map this using BATs to 0x90000000 and 0xD0000000.
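
For illustration, here is a sketch of how BAT register values for such a mapping could be encoded, following the upper/lower BAT layout in the 32-bit PowerPC programming manual. The helper make_bat and its types are hypothetical, and the exact block size that real games program is an assumption (a game could just as well use a larger block than 64MB).

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t upper, lower; } bat_pair;  /* illustrative type */

enum { WIMG_CACHED = 0x0, WIMG_UNCACHED_GUARDED = 0x5 };  /* I=4, G=1 */

/* Encode one BAT pair mapping [ea_base, ea_base+size) to pa_base.
   size must be a power of two from 128KB to 256MB; bases must be size-aligned. */
bat_pair make_bat(uint32_t ea_base, uint32_t pa_base, uint32_t size, unsigned wimg)
{
    assert(size >= 0x20000u && size <= 0x10000000u && (size & (size - 1)) == 0);
    assert((ea_base & (size - 1)) == 0 && (pa_base & (size - 1)) == 0);
    uint32_t bl = (size >> 17) - 1;                    /* 128KB units minus one */
    bat_pair b;
    b.upper = ea_base | (bl << 2) | 0x3u;              /* Vs=1, Vp=1 */
    b.lower = pa_base | ((uint32_t)wimg << 3) | 0x2u;  /* PP=10: read/write */
    return b;
}
```

As a sanity check against the dump earlier in the thread, make_bat(0x80000000, 0, 0x10000000, WIMG_CACHED) yields 80001fff 00000002 and make_bat(0xC0000000, 0, 0x10000000, WIMG_UNCACHED_GUARDED) yields c0001fff 0000002a. Under the 64MB assumption, the MEM2 pair would come out as 900007ff 10000002 (cached) and d00007ff 1000002a (uncached).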

Whenever an exception is triggered on PowerPC, translation gets turned off, so games end up accessing 0x00000000 etc.

0x40000000, 0x70000000, and 0xE0000000 don't actually correspond to anything on hardware. Some Gamecube games set up the segment registers/page tables to map 0x40000000 or 0x70000000; Dolphin has a hack (which is turned on by disabling the "MMU" setting) that backs these with actual memory because it didn't have a decent MMU implementation for a long time.

Games use the locked L1 extension to allocate cache lines out of the L1 cache; as far as we know, all games use 0xE0000000 for this, but that isn't required by the hardware. (Games that do this modify the BAT to map 0xE0000000 to 0xE0000000.) You can probably find the actual Broadway manual on the web, which has a complete description of how this works.

And yes, the "(em_address & 0xC8000000) == 0xC8000000" thing is a bug.
01-23-2015, 03:15 PM
#8
magumagu Offline
Developer
**********
Developers (Some Administrators and Super Moderators)
Posts: 42
Threads: 1
Joined: May 2014
Oh, and one more thing: the Wii has some extra BAT registers which you might have missed: https://github.com/dolphin-emu/dolphin/blob/96a2b74c02e3ee12b23b9a2fcfcf51c32d9524ed/Source/Core/Core/PowerPC/Gekko.h#L780 .
01-23-2015, 07:57 PM
#9
tueidj Offline
Senior Member
****
Posts: 552
Threads: 0
Joined: Apr 2013
(01-23-2015, 02:05 PM)Stevie-O Wrote: I did qualify my statement using 'without carefully clearing the cache'.  Note that the words 'programming error' come straight out of the PPC manual -- they appear repeatedly in cache-related sections. In particular, it is clearly stated in several places that when accessing a particular physical address via *both* cached and non-cached accesses, great care must be taken to ensure any cached version is flushed before performing a non-cached access.
This is less straightforward than it immediately appears to be. Consider this code:
Code:
int x = *(volatile int*)0x80000000;
int y = *(volatile int*)0xc000001c; // programming error!
The 750 has 32-byte cache lines, so the first fetch will cache physical addresses 0x00-0x1F. This will then conflict with the non-cached read from 0x1C.
Even worse, prefetching means that this sort of thing can happen even if the first line was never actually executed (e.g. skipped over by a conditional branch).
But it's absolutely not a programming error for two distinct addresses to return different data. The CPU isn't going to crash and burn and nothing catastrophic will happen; the only error would be if the programmer didn't understand what they were doing.
Consider if you want to read a single word to see if some other hardware has completed an operation; accessing it via a cached pointer means the entire cacheline gets fetched (potentially evicting other useful data) and then it needs to be invalidated every time you recheck. So instead of repeatedly transferring 8 bytes over the bus it repeatedly fetches the entire 32-byte cacheline.

Quote:If it can't be accessed directly by the CPU, then how can segment descriptors help a game access ARAM?
By using paging. A small area of MEM1 is used to temporarily hold a few pages of ARAM, when a DSI/page fault occurs DMA transfers are used to swap pages in and out.
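
The paging idea can be sketched as a toy simulation in plain C. Everything here is an assumption made for illustration: the sizes, the round-robin replacement, and the use of memcpy in place of a real DMA transfer. It is not how any particular game or SDK implements it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u        /* PowerPC page size */
#define ARAM_PAGES  16u          /* toy size; real ARAM is 16MB */
#define FRAME_COUNT 2u           /* small window in main memory */
#define NO_PAGE     0xFFFFFFFFu

static uint8_t  aram[ARAM_PAGES * PAGE_SIZE];     /* backing store, reached only by "DMA" */
static uint8_t  frames[FRAME_COUNT * PAGE_SIZE];  /* resident pages in main memory */
static uint32_t frame_page[FRAME_COUNT] = { NO_PAGE, NO_PAGE };
static uint32_t next_victim;                      /* round-robin replacement */
static unsigned fault_count;

/* Stand-in for a DMA transfer of one page between ARAM and main memory. */
static void dma(uint8_t *dst, const uint8_t *src) { memcpy(dst, src, PAGE_SIZE); }

/* Return a pointer to the byte at virtual offset vaddr, paging it in if needed. */
uint8_t *vm_access(uint32_t vaddr)
{
    uint32_t page = vaddr / PAGE_SIZE;
    assert(page < ARAM_PAGES);
    for (uint32_t f = 0; f < FRAME_COUNT; f++)
        if (frame_page[f] == page)                /* already resident: no fault */
            return &frames[f * PAGE_SIZE + vaddr % PAGE_SIZE];

    /* "Page fault": write the victim back to ARAM, then bring the new page in. */
    fault_count++;
    uint32_t f = next_victim;
    next_victim = (next_victim + 1) % FRAME_COUNT;
    if (frame_page[f] != NO_PAGE)
        dma(&aram[frame_page[f] * PAGE_SIZE], &frames[f * PAGE_SIZE]);
    dma(&frames[f * PAGE_SIZE], &aram[page * PAGE_SIZE]);
    frame_page[f] = page;
    return &frames[f * PAGE_SIZE + vaddr % PAGE_SIZE];
}

unsigned vm_fault_count(void) { return fault_count; }
```

Touching three different pages with only two frames forces the first page out to ARAM; touching it again faults it back in with its contents intact.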

Quote:I'm less concerned with how Dolphin views it than how it works on a real Wii.  (Actually, I'm concerned with both, but I already have the source code to Dolphin; most of that is already quite clear from how ReadFromHardware is written.)
Knowing how Dolphin does it isn't enough, because (as has been established) Dolphin takes shortcuts.    That isn't bad; until we all have terahertz CPUs, some shortcuts will probably be required in order to run various games.  But it does mean I can't necessarily look at Dolphin and automatically know how a real Wii would behave in certain circumstances.  What if there's a different shortcut that runs faster, or disrupts fewer games? What if it's not a shortcut, but a plain old bug?
Dolphin has lots of issues because it's only designed to run games, which don't really do very interesting stuff with memory configurations. As an example, Star Wars: Clone Wars does use a custom setup and currently doesn't run.

Quote:The L1 cache thing is interesting (when I saw Memmap.c declare a region called "L1 cache" I was quite confused).  Do you know if that portion of the L1 cache is *only* directly-addressable (i.e. not usable as actual cache memory)?  Since L1 cache is normally much faster than external RAM, that would make for a small high-speed memory area.  Useful for code that needs to execute quickly, but can't fit its working set into the 28 or so GPRs available.
By default the L1 data cache is 32KB. Activating the "locked cache" halves it and addresses can be manually assigned to the "locked" region. These lines are never evicted (they should not map to physical memory), that's where the "locked" naming comes from.
In addition to being directly addressable, data can also be transferred in/out of each locked cacheline using a DMA engine on the CPU. It's a bit awkward to use though: since it's part of the CPU, it has no interrupt.

Quote:So as I *presently* understand it, the physical memory map -- the set of devices found on the Wii's memory bus -- is:

0x00000000-0x017FFFFF  MEM1 (so, should be accessed via 0x80000000)
0x08000000-0x0BFFFFFF   EFB [ReadFromHardware/WriteToHardware] so should be accessed via 0xC8000000
EFB is pretty small, I don't remember if it's 4MB or 8MB total but I do remember the first half is RGBA data for each pixel while the second half accesses the Z buffer.
Quote:0x0C000000-0x0C000FFF   GPU Commands [InitMMIO]
0x0C001000-0x0C001FFF   Pixel Engine [InitMMIO]
0x0C002000-0x0C002FFF   Video Interface [InitMMIO]
0x0C003000-0x0C003FFF   Processor Interface [InitMMIO]
0x0C004000-0x0C004FFF   Memory Interface [InitMMIO]
0x0C005000-0x0C005FFF   DSP [InitMMIO]
The actual ranges are again much smaller.
Quote:0x0C006000-0x0C0063FF   DVD [InitMMIO]
0x0C006400-0x0C0067FF   Serial [InitMMIO]
0x0C006800-0x0C006BFF   "Expansion" [InitMMIO]
0x0C006C00-0x0C006FFF   Audio [InitMMIO]
These were all shifted to 0x0D006xxx for the Wii. The DVD interface has extra weird stuff for security reasons (it's only accessible as 0x0D8060xx from the PowerPC, after some special bits are twiddled).
Quote:0x0C008000-0x0C008FFF   Some FIFOs [WriteToHardware]
There's only one FIFO "in" endpoint at 0x0C008000. The memory it writes to is controlled by the processor interface. Dolphin is pretty hacky about this because it thinks paired-single writes are two separate accesses, half the data goes to 0x0C008004 so it just makes the FIFO eat anything stored around the "real" FIFO address.

For the wii there are a lot more MMIO registers in the 0x0D00xxxx/0x0D80xxxx range, only a few are meant to be accessible from the PowerPC while the rest are handled by Starlet.
01-23-2015, 09:08 PM (This post was last modified: 01-23-2015, 09:09 PM by Fiora.)
#10
Fiora Offline
x86 JIT Princess
**********
Developers (Some Administrators and Super Moderators)
Posts: 237
Threads: 0
Joined: Aug 2014
(01-23-2015, 07:57 PM)tueidj Wrote: There's only one FIFO "in" endpoint at 0x0C008000. The memory it writes to is controlled by the processor interface. Dolphin is pretty hacky about this because it thinks paired-single writes are two separate accesses, half the data goes to 0x0C008004 so it just makes the FIFO eat anything stored around the "real" FIFO address.
I think this is only true of the interpreter; the JIT just makes it a single write. I didn't actually know that a single write was the correct behavior though; I suppose the interpreter should be fixed.