Is it possible to submit ucode to Denver?

Nintonito · 02-10-2015, 06:46 AM #1

A thought crossed my mind while I was thinking about potential Denver optimizations: is it possible to have a JIT compile and submit ucode instead of ARM code to Denver? That alone would HUGELY speed up Denver's processing, even beyond what it already achieves. Does Android even allow that? I would think Chrome would benefit as well, because it also seems to hit code optimizer bottlenecks on Denver.

**Sonicadvance1** · 02-10-2015, 06:48 AM #2

It's not possible. The VLIW architecture won't ever be public in any case.

Nintonito · (This post was last modified: 02-10-2015, 07:26 AM by Nintonito.)

(02-10-2015, 06:48 AM)Sonicadvance1 Wrote: It's not possible. The VLIW architecture won't ever be public in any case.

Huh, I wasn't aware Denver was using unique microcode. I assumed Nvidia was using some common ucode set, and that with mild abstraction it could work. My mistake. So Nvidia's architecture is entirely proprietary, then. Another question, then: is it feasible to reliably optimize a program to run better under Nvidia's code optimizer scheme (since that is essential to actually getting reasonable performance from the architecture)?

**Sonicadvance1** · 02-10-2015, 07:43 AM #4

It's more about optimizing for the 2-wide in-order execution until the DCO can recompile it to native VLIW.
That is a bit hard to do in our recompiler without using an IR. When testing on the Nexus 9, you can visually see the performance increase as it recompiles more code to VLIW.
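
To make "optimizing for 2-wide in-order" concrete, here is a minimal toy model, not Denver's real pipeline. It assumes every instruction has 1-cycle latency and that two adjacent instructions issue together only when the second neither reads nor overwrites the first's destination. The `Insn` type, the register names, and `cycles_dual_issue` are all hypothetical, made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Insn(NamedTuple):
    dst: str                # destination register (hypothetical names)
    srcs: Tuple[str, ...]   # source registers


def cycles_dual_issue(prog: List[Insn]) -> int:
    """Count issue cycles on a toy 2-wide in-order core.

    Assumptions (not taken from any Denver documentation): 1-cycle
    latency for everything, and a pair issues together only if the
    second instruction does not read or overwrite the first's result.
    """
    cycles = 0
    i = 0
    while i < len(prog):
        cycles += 1
        if i + 1 < len(prog):
            first, second = prog[i], prog[i + 1]
            independent = (first.dst not in second.srcs
                           and first.dst != second.dst)
            if independent:
                i += 2      # both instructions issue this cycle
                continue
        i += 1              # only one instruction issues this cycle
    return cycles


# Two independent dependency chains, emitted one after the other.
chained = [
    Insn("a1", ("a0",)), Insn("a2", ("a1",)), Insn("a3", ("a2",)),
    Insn("b1", ("b0",)), Insn("b2", ("b1",)), Insn("b3", ("b2",)),
]
# Same work, with the two chains interleaved so neighbours are independent.
interleaved = [chained[0], chained[3], chained[1],
               chained[4], chained[2], chained[5]]

print(cycles_dual_issue(chained), cycles_dual_issue(interleaved))
```

In this model the interleaved order needs fewer issue cycles for the same work. A JIT that emits code block by block without an IR has little chance to reorder instructions like this, which is roughly the difficulty described above.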

Nintonito · 02-10-2015, 08:02 AM #5

(02-10-2015, 07:43 AM)Sonicadvance1 Wrote: It's more about optimizing for the 2-wide in-order execution until the DCO can recompile it to native VLIW.
That is a bit hard to do in our recompiler without using an IR. When testing on the Nexus 9, you can visually see the performance increase as it recompiles more code to VLIW.

Yeah, see, that works in the end, because at least the DCO actually delivers the gains within a reasonable time frame. Web browsers seem to have a unique ability to break the DCO (Chrome makes the CPU work up a sweat because it uses a weird Java/native combo that breaks the DCO). It still sucks that you basically have to build up performance as the code runs. So many weird, unexplained decisions with Denver.

tueidj · 02-10-2015, 01:10 PM #6

It would be easier if the Nexus 9 actually worked with Nvidia's profiler, like they claim it does... unfortunately Google forgot to include the necessary Tegra kernel patches.

**Sonicadvance1** · 02-10-2015, 02:55 PM #7

That would be amazing too. I can't profile the AArch64 code on the Nexus 9 because of it.
Neither the CPU profiler nor the GPU profiler works, due to Google's failure.

Nintonito · 02-10-2015, 02:59 PM #8

F**k Google. Seems like they want everybody to do things their way (and to not let anyone have an edge over Qualcomm, who seem to have bought them out at this point).

tueidj · 02-10-2015, 04:35 PM #9

Nvidia have said that the patches will be in the next update (presumably 5.1), so here's hoping. It would indeed be a great help to get SOME idea of how the arm64 instructions perform, given that nobody wants to publish any latency/timing documentation (especially for the unique tbl/tbx ops, which are very handy but use up to 5 input registers/4 output registers).
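
For anyone who hasn't used them, a rough scalar model of what the tbl/tbx lookups compute, following the Arm documentation: each takes one to four 16-byte table registers plus a 16-byte index vector and produces one 16-byte result. An out-of-range index gives 0 for tbl and leaves the destination byte unchanged for tbx. The Python function names and the sample data are made up for illustration, and this says nothing about latency, which is the undocumented part.

```python
from typing import Sequence


def _flatten(tables: Sequence[bytes]) -> bytes:
    # The table is 1 to 4 registers of 16 bytes each, treated as one array.
    if not 1 <= len(tables) <= 4 or any(len(t) != 16 for t in tables):
        raise ValueError("expected 1 to 4 tables of 16 bytes each")
    return b"".join(tables)


def tbl(tables: Sequence[bytes], indices: bytes) -> bytes:
    """Model of TBL: an out-of-range index yields 0."""
    flat = _flatten(tables)
    return bytes(flat[i] if i < len(flat) else 0 for i in indices)


def tbx(dest: bytes, tables: Sequence[bytes], indices: bytes) -> bytes:
    """Model of TBX: an out-of-range index keeps the old destination byte."""
    flat = _flatten(tables)
    return bytes(flat[i] if i < len(flat) else d
                 for d, i in zip(dest, indices))


# Illustrative data only.
table0 = bytes(range(100, 116))
idx = bytes([0, 15, 16, 255] + [1] * 12)
print(list(tbl([table0], idx)))
print(list(tbx(bytes([9] * 16), [table0], idx)))
```

Because tbx also reads its destination, the register pressure of the widest forms is easy to see from the function signatures alone.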

Nintonito · 02-11-2015, 12:03 AM #10

(02-10-2015, 04:35 PM)tueidj Wrote: Nvidia have said that the patches will be in the next update (presumably 5.1), so here's hoping. It would indeed be a great help to get SOME idea of how the arm64 instructions perform, given that nobody wants to publish any latency/timing documentation (especially for the unique tbl/tbx ops, which are very handy but use up to 5 input registers/4 output registers).

I'm surprised Nvidia is dealing with this BS at all. Google clearly intended a botched product release, and gave the Tegra chip a minimal feature set that only embarrasses the GPU. All of this was clearly intended to force the K1 to operate within the same limits as the Snapdragon 805/810 so that Google could continue sucking up to Qualcomm, their lord and savior.