The EFB cache in the OGL backend is useful because it avoids issuing one GL call per queried pixel, which is unlikely to be efficient compared to reading back a full rectangle. But it still has to flush the FIFO and, worse, it flushes the driver command buffer (which is usually a frame behind on rendering).
Instead, if the request results are pushed into a buffer on the GPU side (for example with a small compute shader writing to an append buffer) and we only map that buffer on the next frame, we also avoid forcing the driver to flush, since we have most likely already passed the fence protecting it. Of course, that is something you would more commonly do directly in the graphics engine of a modern game, but depending on why a game reads back a rendered texture, the hack can be virtually free compared to not doing EFB access at all, and it may or may not break the game feature that relies on it.
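To make the one-frame delay concrete, here is a rough sketch of the bookkeeping only. It is a CPU-side model, not Dolphin code: `DeferredPeekQueue`, `PeekRequest`, `PeekResult` and the `sample` callback are all names I made up, and no GL call is made. In a real backend the `sample` step would be the small GPU pass writing into a buffer, and the `ready` flag would be a fence checked before mapping.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical types for illustration; none of these exist in the code base.
struct PeekRequest { int x; int y; };
struct PeekResult { int x; int y; uint32_t value; };

// Double-buffered queue: requests made in frame N are answered in frame N+1.
class DeferredPeekQueue {
public:
  // Record a request during the current frame; nothing is read back yet.
  void Request(int x, int y) { m_slots[m_write].requests.push_back({x, y}); }

  // Call once at the end of a frame. 'sample' stands in for the GPU pass
  // that would write each requested pixel into a buffer.
  template <typename SampleFn>
  void EndFrame(SampleFn sample) {
    Slot& s = m_slots[m_write];
    s.results.clear();
    for (const PeekRequest& r : s.requests)
      s.results.push_back({r.x, r.y, sample(r.x, r.y)});
    s.requests.clear();
    s.ready = true;  // stands in for "the fence for this slot has signaled"
    m_write ^= 1;    // the next frame records into the other slot
  }

  // Returns the results of the previous frame, or nothing if there are none.
  // In a real backend this is where the buffer would be mapped.
  std::vector<PeekResult> TakePreviousFrameResults() {
    Slot& s = m_slots[m_write ^ 1];
    assert(s.requests.empty());  // the read slot never holds pending requests
    if (!s.ready)
      return {};
    s.ready = false;
    return std::move(s.results);
  }

private:
  struct Slot {
    std::vector<PeekRequest> requests;
    std::vector<PeekResult> results;
    bool ready = false;
  };
  Slot m_slots[2];
  int m_write = 0;
};
```

The point of the two slots is that the game only ever sees values that are one frame old, which is exactly the trade-off: no stall, but stale data, so it depends on what the game does with the pixel.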
Of course, this is only my first look at the code base, so I am still missing the big picture. Anyway, I gained 4 ms on the press start screen of "OnePiece:UA" just with some basic cleanup in Renderer::ApplyState.
