I feel this will be killed by the host processor's cache - 'sharing' data between cores requires the cache line to be evicted from the L1 of CORE1 and then read into the L1 of CORE2. This is relatively expensive - often on the order of hundreds of cycles (depending on the topology, implementation details, and where in the cache hierarchy the contended data currently sits).
Doing this at every memory read or write will likely make it slower than just running the cache logic on the same CPU as the JIT itself - which is how the current cache code is implemented.
