It's not really an issue. Shonumi just noticed that some parts of SMG2 have a bigger gap between HLE and LLE than others, and he pinpointed it specifically to be graphics related. I said sort of, that it probably had more to do with the GPU thread, but you have stated that it probably actually has more to do with transfers between RAM and VRAM than anything else.
Indeed, there is no "issue" here. It's just that the only time I ever noticed the HLE/LLE gap virtually disappear was in specific spots in SMG2 (the gap got smaller, not bigger, Axxer :p). It appears to be graphics related, specifically to how much is drawn in the background. I don't think it's something on the CPU side, since HLE and LLE both take a hit to similar levels, and whenever I switch the view to first person and look at something else (a section of the screen with nothing but clouds, for example), my FPS jumps back up. This is relevant because, as I said, this is the only time I've gotten the HLE/LLE gap down to a difference of about 1 or 2 FPS. If we can determine where this bottleneck is happening, it'll be useful for identifying where it could happen in other games, so those spots can be ruled out when testing HLE vs LLE. It's mostly for the latter reason that I want to find out if temporary bottlenecks like this are possible and reproducible.
Don't expect me to know much when it comes to GPUs though. In fact, I very much rely on you to tell me when I'm off when it comes to GPUs. Having said that don't always feel the need to correct people. It's great that you do it, but don't make it a mission. You'll end up, like this guy. Though you're reasonable enough to know when to stop :p
Anyway, these are the spots where I got the slowdowns. These are just pics showing where the slowdowns happen; I'll post the speeds in the morning. These are all in Yoshi Star Galaxy. The first is by the Comet Medal, just before you take the path to get to the next mini-planet with the checkpoint. The second one takes place right after you leave said mini-planet with the checkpoint. You'll start climbing to the right, just before a Piranha Plant. I'm just not able to get full speed on some of these parts, regardless of revision, graphical settings, or even using HLE.
Quote:I said sort of, that it probably had more to do with the GPU thread, but you have stated that it probably actually has more to do with transfers between RAM and VRAM than anything else.
If it's related to main memory <-> video memory transfers then it's related to the GPU thread.
For the record I don't like to call it the GPU thread since that seems to make people think that it runs entirely on the GPU. I just call it the video thread. I shouldn't call it that either but I have no idea what the actual name of the thread is (since nobody cares).
Keep in mind that HLE always runs on the video thread.
LLE can run on the video thread or in a separate thread.
@Shonumi
I'm now thoroughly confused. Does the performance difference between HLE and LLE go up or down in areas with high visible polycounts?
rpglord Wrote:I can confirm Shonumi's results that something is wrong with LLE in the latest revision.
HLE performs the same in 3.0-776 as in 3.0; however, LLE takes a pretty significant performance hit going from 3.0 to 3.0-776.
I have already posted my results in SMG2 in Yoshi Star with 3.0-776, where with LLE I got 86 fps.
In exactly the same spot, with exactly the same settings (I even disabled fast mipmaps so everything is the same), I get 97 fps with 3.0.
That's 11 fps more!
But there is only 1 fps more when using HLE....
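For scale, the two LLE figures quoted above (97 fps on 3.0, 86 fps on 3.0-776) can be turned into a relative drop and a per-frame cost. This is only a rough sketch of the arithmetic; the variable and function names are made up for illustration, and the HLE numbers are left out because the quote gives only their difference, not their absolute values.

```python
# Convert the quoted LLE frame rates into a relative drop and extra frame time.
# Only the two FPS values come from the quoted post; all names are illustrative.

def frame_time_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

lle_fps_3_0 = 97.0      # LLE on 3.0, as quoted
lle_fps_3_0_776 = 86.0  # LLE on 3.0-776, as quoted

drop_fps = lle_fps_3_0 - lle_fps_3_0_776
drop_pct = 100.0 * drop_fps / lle_fps_3_0
extra_ms = frame_time_ms(lle_fps_3_0_776) - frame_time_ms(lle_fps_3_0)

print(f"LLE drop: {drop_fps:.0f} fps ({drop_pct:.1f}%), "
      f"{extra_ms:.2f} ms more per frame")
```

Looking at milliseconds per frame rather than raw FPS makes it easier to compare this regression with results taken at a different baseline frame rate.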
Very interesting.......looks like it might be a threading issue. The affinity for the audio thread with LLE on thread was changed recently.
Does lock threads to cores on/off change the situation?
Quote:Having said that don't always feel the need to correct people. It's great that you do it, but don't make it a mission. You'll end up, like this guy.
But I am that guy.
Quote:Though you're reasonable enough to know when to stop
.....yeah......sure.......*darts eyes*
Quote:If it's related to main memory <-> video memory transfers then it's related to the GPU thread.
The actual thread though is just waiting, right? It isn't running much at that time because it is stuck waiting for the data transfer.
Quote:I'm now thoroughly confused. Does the performance difference between HLE and LLE go up or down in areas with high visible polycounts?
Performance difference goes down. My mistake; I misspoke.
Yes, the performance difference between HLE and LLE goes down (dramatically). In the first test scenario (first screenshot) HLE and LLE were both at 56 FPS. I know I said I'd post them this morning, but I've got work early (and I won't be getting off any earlier, just working longer :p). So, tonight I'll get 4 pics up of the latest rev running HLE and LLE in those spots.
Quote:The actual thread though is just waiting, right? It isn't running much at that time because it is stuck waiting for the data transfer.
Yup.
Quote:Performance difference goes down. My mistake; I misspoke.
So then:
High poly count: low framerate regardless of HLE or LLE
Low poly count: High framerate with HLE, much lower with LLE
Correct?
Yes. Essentially, the bottleneck with HLE is always the video/HLE thread, while with LLE (on thread) it trades off between the LLE thread and the video thread depending on poly count.
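The claim above can be sketched as a toy model. It assumes that with HLE the audio work is simply added to the video thread's frame time, while with LLE on its own thread the frame is limited by whichever of the two threads is slower. It deliberately models only the two threads named in the post, and every millisecond cost below is a made-up placeholder, not a measurement.

```python
# Toy bottleneck model for HLE vs LLE (on thread).
# ASSUMPTIONS: all per-frame costs are hypothetical, and only the video thread
# and the LLE thread are modelled; nothing here is measured from Dolphin.

def fps_hle(video_ms: float, hle_ms: float) -> float:
    """HLE shares the video thread, so its cost adds to the video work."""
    return 1000.0 / (video_ms + hle_ms)

def fps_lle_on_thread(video_ms: float, lle_ms: float) -> float:
    """LLE on its own thread: the slower of the two threads sets the pace."""
    return 1000.0 / max(video_ms, lle_ms)

HLE_MS = 0.5   # hypothetical per-frame cost of HLE audio
LLE_MS = 11.0  # hypothetical per-frame cost of LLE audio

scenes = {"high poly": 17.5, "low poly": 8.0}  # hypothetical video costs (ms)

gaps = {}
for name, video_ms in scenes.items():
    hle = fps_hle(video_ms, HLE_MS)
    lle = fps_lle_on_thread(video_ms, LLE_MS)
    gaps[name] = hle - lle
    print(f"{name}: HLE {hle:.1f} fps, LLE {lle:.1f} fps, gap {gaps[name]:.1f}")
```

With these placeholder costs the gap nearly vanishes in the high poly scene, where the video thread dominates either way, and opens up in the low poly scene, where the LLE thread becomes the limit.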
Just checked out the places where Shonumi got slowdowns.
Naturally I'm able to stay well above 60 FPS, but here are some screenshots showing the difference between HLE and LLE.
I apologize for the shifted camera angle in the first screenshot. It's pretty hard to move the camera at that spot.
These were made with 3.0-793.
Spot 1 HLE ─ 80FPS
Spot 1 LLE ─ 79FPS
Spot 2 HLE ─ 96FPS
Spot 2 LLE ─ 90FPS
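The HLE/LLE gap at each spot can be worked out directly from the four figures above. A small sketch of that arithmetic follows; the dictionary layout and function name are just illustrative choices.

```python
# HLE/LLE gap per spot, using the four FPS figures listed above (3.0-793).
# The data structure and names are illustrative only.

results = {
    "Spot 1": {"HLE": 80, "LLE": 79},
    "Spot 2": {"HLE": 96, "LLE": 90},
}

def hle_lle_gap(fps: dict) -> tuple:
    """Return (gap in fps, gap as a percentage of the HLE frame rate)."""
    gap = fps["HLE"] - fps["LLE"]
    return gap, 100.0 * gap / fps["HLE"]

gaps = {spot: hle_lle_gap(fps) for spot, fps in results.items()}

for spot, (gap, pct) in gaps.items():
    print(f"{spot}: gap {gap} fps ({pct:.2f}% of HLE)")
```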
(10-25-2012, 05:41 AM)NaturalViolence Wrote: Very interesting.......looks like it might be a threading issue. The affinity for the audio thread with LLE on thread was changed recently.
Does lock threads to cores on/off change the situation?
It doesn't do anything for me; I got the same fps.
@Garteal can you test at the first checkpoint in Yoshi Star, where the difference is supposed to be the highest?
Quote:Yes. Essentially, the bottleneck with HLE is always the video/HLE thread, while with LLE (on thread) it trades off between the LLE thread and the video thread depending on poly count.
Aren't you forgetting something? Some other thread perhaps that has been observed to be the bottleneck most of the time in most games?