Alright, fine. Yesterday I thought a bit about EFB copies, how our two (or three, with that hybrid stuff) approaches to this differ, and why EFB to texture fails. I think I've understood most of it now and I'm kind of proud of that, even though it's somewhat nerdy oO
Anyway, with respect: As far as I can see, Squall is just throwing some random expressions around without actually having a clue what he's talking about. Maybe he knows his way around the Wii internals, but I guess he doesn't really know the Dolphin source as well as I do. (FWIW, you're free to correct me on this statement)
So here's a little lesson for you guys, haven't read any explanation like this from anyone else on the forums, yet: (fwiw, I only know the D3D side, but most stuff should apply to OGL as well, apart from the API-specific terminology maybe)
1. What is the Embedded Frame Buffer (EFB)? - It's a memory area of 2 MB inside the Flipper (GPU) which kind of acts as a (color and depth) render target with variable size. It's the only area the GPU can render (draw stuff) to. There are two other important things which can be done with the EFB: It can be accessed by the CPU (pixel-wise, as far as I could see; color and depth values are separate), though we only support reads right now. Also, the EFB can be copied to the Wii's RAM and reused as a texture later on (the Wii's textures are stored in RAM).
2. So, what are we doing when we copy the EFB to RAM? - This is the "proper" way of doing it (talking about the "old" EFB to RAM btw). Its implementation is pretty complex, but it boils down to reading the EFB (which is emulated with a D3D texture), converting it to a native texture format and storing it in the Wii's memory area. This also explains why it's so ****** slow. The actual D3D texture is only created once that RAM data gets reused as a texture (using our funky Safe Texture Cache).
3. And.. what are we doing when we copy the EFB to a texture? - Since most times the EFB data is just being used as a texture anyway, we just directly copy it to a D3D texture which will get reused later on (again, using funky texture cache algorithms). Advantage: It's much faster, since we only need to draw a textured quad. The downside is that we can't track whether the game changed the texture in the meantime.
4. Where exactly is the problem with EFB to texture? - Alright, if a Wii game requests an EFB copy to texture, it expects the EFB data to be stored in RAM afterwards. The game can read some bytes, write some bytes, etc... but we don't actually copy the EFB to the RAM, so the game just reads and writes whatever texture was stored in that RAM area before. This will cause bugs, since the resulting texture won't be the same.
5. In the case that the Wii does try to write something to that RAM area, will we use the resulting (wrong) texture or the old (unmodified) EFB data? - nice question, I think (!) it's the old EFB data without STC and the wrong texture when STC is enabled.
6. Could you explain why in NSMBW the coins aren't spinning with EFB copy to texture? - fwiw, I didn't debug this issue and can only guess. I think the game renders the coins to the EFB, copies the rendered coins to RAM and then does some funky postprocessing.
Preparation: NSMBW stores the coin texture somewhere in the RAM.
First frame: NSMBW renders a coin with the coin texture to the EFB. It then tries to copy the rendered data back to the coin texture, which will create a D3D texture with the rendered coin, but won't update the Wii's RAM contents.
Second frame: NSMBW renders a coin to the EFB (again). Then it tries to copy the EFB data back to the texture again (the RAM still won't get updated). When the coin is displayed, STC will completely ignore the stuff drawn to the EFB and just use the old RAM data, which never gets updated. And so it goes on, which would be why you only see one frame of the coin.
7. About Framebuffer Objects (FBOs): They won't fix anything. IIRC we are (or at least were) using FBOs in OpenGL anyway, and there's no equivalent of FBOs in Direct3D. The only advantage of them is that AccessEFB is much faster, but that won't help us with EFB copies either. ANY texture which is not stored in the (emulated) Wii's RAM will cause problems, simple as that.
8. The only thing you can do is copy the EFB both to a texture and to RAM, that's the way that this hybrid approach is working IIRC.
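If you'd rather see points 2 to 8 as code, here's a tiny toy model in Python. To be clear, this is NOT Dolphin code: `ToyEmulator`, `copy_efb`, `sample_texture`, `run_coin` and the `COIN` address are all made up for illustration, the coin "image" is just a frame counter, and the whole thing only mirrors my guess from point 6 (texture lookups go through RAM, like with STC).

```python
from dataclasses import dataclass, field

COIN = 0x1000  # hypothetical RAM address of the coin texture


@dataclass
class ToyEmulator:
    """Toy model of the three EFB copy modes; not real Dolphin code."""
    mode: str                                          # "ram", "texture" or "hybrid"
    ram: dict = field(default_factory=dict)            # emulated Wii RAM: address -> data
    host_textures: dict = field(default_factory=dict)  # host-side (D3D-like) copies
    efb: int = 0                                       # current EFB contents

    def render(self, image):
        # The EFB is the only place the GPU can draw to.
        self.efb = image

    def copy_efb(self, address):
        # "texture" only makes a host-side copy, "ram" only writes to the
        # emulated RAM, "hybrid" does both.
        if self.mode in ("texture", "hybrid"):
            self.host_textures[address] = self.efb
        if self.mode in ("ram", "hybrid"):
            self.ram[address] = self.efb

    def sample_texture(self, address):
        # Assumption from point 6: the texture lookup uses whatever is in
        # RAM, so a host-only copy is never seen here.
        return self.ram[address]


def run_coin(mode, frames=4):
    """Each frame: draw the next coin frame, copy it back over the texture."""
    emu = ToyEmulator(mode)
    emu.ram[COIN] = 0  # preparation: the game stores the coin texture in RAM
    shown = []
    for _ in range(frames):
        emu.render(emu.sample_texture(COIN) + 1)  # draw the rotated coin
        emu.copy_efb(COIN)                        # game expects RAM to be updated
        shown.append(emu.sample_texture(COIN))    # what ends up on screen
    return shown


if __name__ == "__main__":
    for mode in ("ram", "texture", "hybrid"):
        print(mode, run_coin(mode))
```

In this model, "ram" and "hybrid" keep advancing the coin, while "texture" keeps showing the same frame because the RAM copy is never touched. The model says nothing about speed, which is the whole reason EFB to texture exists in the first place.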
So much for that, did I miss anything?
EDIT, tl;dr: direct VRAM access for the CPU wouldn't help us at all since the actual problem is a) that we have no way to tell whether a texture changed until it gets set (because textures are stored in RAM and could be modified any time) b) that the only way to let the Wii game read/write textures copied from the EFB is to actually enable EFB copies to RAM.
tl;dr the tl;dr: Just forget about it, there's no way to improve the situation.
EDIT2: Fine, apparently there's some fancy technique(s) which I don't know of, but which could improve the situation. I'll talk to mudlord about this.