So, if we leave out the `L2_TRANS_L2_WB` from the `L3` performance group, we can include both `IDI_MISC_WB*` events and still have one counter register left. In this counter we could measure the L3 hits to characterize the load path. Unfortunately, the `MEM_LOAD_L3_*` are likely to be listed in the specification updates. This is not the case for [Intel Skylake SP](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf) but for [Intel Skylake Desktop (SKL128)](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf).
Memory data volume [GBytes] = 1.0E-09*(SUM(CAS_COUNT_RD)+SUM(CAS_COUNT_WR))*64.0
-
Profiling group to measure L2 - L3 - memory traffic. This group differs from previous CPU generations due to the
L3 victim cache. Since data can be loaded from L3 or memory, the memory controllers need to be measured as well.
```
(##) Commonly, all data should be loaded from memory directly to L2 except the LLC prefetcher is active (like in this case). One might assume that all cache lines evicted to L3 for re-use are also loaded again from L3 but that would mean that the heuristics are always the optimal decision.