# Memory Measurements on Intel Broadwell EP/EX

This blog post describes cases where memory measurements on Intel Broadwell EP/EX systems are
too high.

# Problem

If you measure memory traffic on Intel Broadwell EP/EX with the `MEM*` groups, some systems
return implausibly high numbers. I have received reports about this from several computing centers.

```
+------------------------------------------+---------+------------+-----------------+
| Event                                    | Counter |     Core 0 |         Core 10 |
+------------------------------------------+---------+------------+-----------------+
| INSTR_RETIRED_ANY                        | FIXC0   | 2093481000 |      1528120000 |
| CPU_CLK_UNHALTED_CORE                    | FIXC1   | 5590396000 |      5125980000 |
| CPU_CLK_UNHALTED_REF                     | FIXC2   | 3975666000 |      3638550000 |
| PWR_PKG_ENERGY                           | PWR0    |    53.7656 |         55.2363 |
| PWR_DRAM_ENERGY                          | PWR3    |    11.8070 |         12.5511 |
| FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE | PMC0    |          0 |               0 |
| FP_ARITH_INST_RETIRED_SCALAR_DOUBLE      | PMC1    |     379993 |          379990 |
| FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE | PMC2    |  902215000 |       902215000 |
| CAS_COUNT_RD                             | MBOX0C0 |   56361240 |        56586960 |
| CAS_COUNT_WR                             | MBOX0C1 |   28293350 |        28353280 |
| CAS_COUNT_RD                             | MBOX1C0 |   56474850 |        56681730 |
| CAS_COUNT_WR                             | MBOX1C1 |   28406420 |        28456820 |
| CAS_COUNT_RD                             | MBOX2C0 |   56366030 |        56582170 |
| CAS_COUNT_WR                             | MBOX2C1 |   28295140 |        28349200 |
| CAS_COUNT_RD                             | MBOX3C0 |   56362180 |        56567480 |
| CAS_COUNT_WR                             | MBOX3C1 |   28293360 |        28344380 |
| CAS_COUNT_RD                             | MBOX4C0 |   56595560 | 141008100000000 |
| CAS_COUNT_WR                             | MBOX4C1 |   28384190 | 141008100000000 |
| CAS_COUNT_RD                             | MBOX5C0 |   56596110 | 141008100000000 |
| CAS_COUNT_WR                             | MBOX5C1 |   28384440 | 141008100000000 |
| CAS_COUNT_RD                             | MBOX6C0 |          0 |               0 |
| CAS_COUNT_WR                             | MBOX6C1 |          0 |               0 |
| CAS_COUNT_RD                             | MBOX7C0 |          0 |               0 |
| CAS_COUNT_WR                             | MBOX7C1 |          0 |               0 |
+------------------------------------------+---------+------------+-----------------+
```

MBOX4 and MBOX5 should not be active. For Core 10, both report 141008100000000 for reads and
writes alike.

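Values like these can be spotted automatically by comparing each memory controller counter with
the others. The following is a minimal Python sketch, not part of LIKWID: the function name
`flag_suspicious_channels`, the dictionary layout, and the factor of 1000 are assumptions chosen
for illustration. The counts are the Core 10 `CAS_COUNT_RD` values from the table above.

```python
from statistics import median


def flag_suspicious_channels(counts, factor=1000.0):
    """Return counters whose value exceeds `factor` times the median nonzero count.

    `counts` maps a counter name (e.g. "MBOX4C0") to its raw CAS count.
    The threshold `factor` is an arbitrary choice for this sketch.
    """
    nonzero = [v for v in counts.values() if v > 0]
    if not nonzero:
        return []
    reference = median(nonzero)
    return sorted(name for name, v in counts.items() if v > factor * reference)


# CAS_COUNT_RD values for Core 10, copied from the table above
core10_reads = {
    "MBOX0C0": 56586960,
    "MBOX1C0": 56681730,
    "MBOX2C0": 56582170,
    "MBOX3C0": 56567480,
    "MBOX4C0": 141008100000000,
    "MBOX5C0": 141008100000000,
    "MBOX6C0": 0,
    "MBOX7C0": 0,
}

print(flag_suspicious_channels(core10_reads))  # ['MBOX4C0', 'MBOX5C0']
```

A check like this only catches values that are far out of range. In the Core 0 column, MBOX4 and
MBOX5 report counts in the same range as the other channels, so they would not be flagged.
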
# What is the problem

In most cases, the problem comes from memory controller counter registers that are only partly
deactivated. Intel Broadwell EP systems have 4 memory channels active in most cases. LIKWID does
not know how many channels (PCI devices) are active, so it tests all of them and marks a channel
as available if all tests pass. Besides checking the availability of the PCI devices, it also
tries to read from and write to the counter registers. On these unreliable systems, the checks
pass for 6 channels. Often, the additional memory channel devices return zero, so they make no
difference. On the affected systems they return bogus values instead, which inflate the results
of the `MEM*` groups.

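To get a feeling for the size of the error, the data volume can be recomputed by hand. This is a
rough Python sketch, assuming that each CAS command transfers one 64-byte cache line and that the
data volume is the sum over all channel counters times 64 bytes. The helper `data_volume_gbytes`
is hypothetical; the counts are the Core 10 `CAS_COUNT_RD` values from the table above.

```python
CACHE_LINE_BYTES = 64  # assumption: one CAS command moves one 64-byte cache line


def data_volume_gbytes(cas_counts):
    """Convert a list of raw CAS counts into a data volume in GBytes."""
    return 1.0e-9 * sum(cas_counts) * CACHE_LINE_BYTES


# Core 10 read counts of the four channels that are really active (MBOX0-3)
valid_reads = [56586960, 56681730, 56582170, 56567480]
# Core 10 read counts reported by MBOX4 and MBOX5
bogus_reads = [141008100000000, 141008100000000]

clean = data_volume_gbytes(valid_reads)
inflated = data_volume_gbytes(valid_reads + bogus_reads)

print(f"4 channels: {clean:.2f} GBytes")
print(f"6 channels: {inflated:.3e} GBytes")
```

With the four valid channels the read volume is about 14.5 GBytes. Adding the two bogus counters
pushes it to the order of 10^7 GBytes, which is clearly not a real measurement.
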
# How to fix it

Since there is not much LIKWID can do beyond the accessibility checks, the only fix is to create
updated copies of the `MEM*` groups for the affected systems and remove the faulty memory channels:

```
# Copy the stock group into the user group directory under a new name
mkdir -p ~/.likwid/groups/broadwellEP
cp <LIKWID_BASE>/share/likwid/perfgroups/broadwellEP/MEM.txt ~/.likwid/groups/broadwellEP/MEM_BDX.txt
# Edit the copy:
#   - remove unneeded registers
#   - update metric formulas
${EDITOR:-vi} ~/.likwid/groups/broadwellEP/MEM_BDX.txt
# Measure with the new group
likwid-perfctr -g MEM_BDX ...
```
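
The two editing steps can also be scripted. The sketch below is one possible approach in Python,
not a LIKWID tool. `strip_channels` is a hypothetical helper. It assumes that the group file lists
one `<counter> <event>` pair per line and that the metric formulas add the counters with `+`. The
sample text embedded in the code is illustrative, not the verbatim stock `MEM.txt`.

```python
import re


def strip_channels(group_text, channels=(4, 5, 6, 7)):
    """Drop the MBOX<n>C<m> counters of the given channels from a group definition.

    Assumes (not verified against every LIKWID version) that event lines start
    with the counter name and that formulas combine the counters with '+'.
    """
    ids = "|".join(str(c) for c in channels)
    counter = rf"MBOX(?:{ids})C\d+"
    kept = []
    for line in group_text.splitlines():
        # Event set entry for an unwanted counter: drop the whole line
        if re.match(rf"{counter}\s", line):
            continue
        # Metric formula: remove the term together with its '+' operator
        line = re.sub(rf"\+{counter}\b", "", line)
        line = re.sub(rf"\b{counter}\+", "", line)
        kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative excerpt in the assumed group file layout
sample = """EVENTSET
MBOX3C0 CAS_COUNT_RD
MBOX3C1 CAS_COUNT_WR
MBOX4C0 CAS_COUNT_RD
MBOX4C1 CAS_COUNT_WR

METRICS
Memory read data volume [GBytes] 1.0E-09*(MBOX3C0+MBOX4C0)*64.0
"""

print(strip_channels(sample), end="")
```

By default the sketch also drops MBOX6 and MBOX7, which report zero in the table above; pass
`channels=(4, 5)` to remove only the faulty ones. The result should still be reviewed by hand
before it is saved as `MEM_BDX.txt`, since a formula that consists only of removed counters is
left untouched.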