... | ... | @@ -93,8 +93,25 @@ In a meeting with Intel, we got a list of events: |
|
|
|
|
|
All events marked with (**) are not published and consequently not usable by LIKWID. We tried the other events but for some it was clear that it wouldn't work. E.g. the MEM_INST_RETIRED.ALL_* events count the number of loads resp. stores that are issued, executed and retired (completed) by the core, hence some units away from L2, L3 and memory. Moreover, there are cases where an instruction triggers data movement in the background (e.g. read-for-ownership for stores where the destination cache line is not present in the L1 cache).
|
|
|
|
|
|
### `load` benchmark
|
|
|
<img src="https://raw.githubusercontent.com/wiki/RRZE-HPC/likwid/images/skx_caches/SKX_L3TRI_load.png" alt="Data volume per loop iteration of L3 and memory controller for the `load` benchmark on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz">
|
|
|
|
|
|
|
|
|
### `store` benchmark
|
|
|
<img src="https://raw.githubusercontent.com/wiki/RRZE-HPC/likwid/images/skx_caches/SKX_L3TRI_store.png" alt="Data volume per loop iteration of L3 and memory controller for the `store` benchmark on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz">
|
|
|
|
|
|
|
|
|
### `copy` benchmark
|
|
|
<img src="https://raw.githubusercontent.com/wiki/RRZE-HPC/likwid/images/skx_caches/SKX_L3TRI_copy.png" alt="Data volume per loop iteration of L3 and memory controller for the `copy` benchmark on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz">
|
|
|
|
|
|
|
|
|
### `stream` benchmark
|
|
|
<img src="https://raw.githubusercontent.com/wiki/RRZE-HPC/likwid/images/skx_caches/SKX_L3TRI_stream.png" alt="Data volume per loop iteration of L3 and memory controller for the `stream` benchmark on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz">
|
|
|
|
|
|
|
|
|
### `triad` benchmark
|
|
|
<img src="https://raw.githubusercontent.com/wiki/RRZE-HPC/likwid/images/skx_caches/SKX_L3TRI_triad.png" alt="Data volume per loop iteration of L3 and memory controller for the `triad` benchmark on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz">
|
|
|
|
|
|
## Implications on the use of the L3 performance group for Intel Skylake
|
|
|
The L3 performance group for Intel Skylake still uses the two events mentioned above. So, keep in mind that L2_LINES_IN_ALL contains loads from L3 and memory and L2_TRANS_L2_WB contains writebacks to L3 (and memory).
|
|
|
|
... | ... | |