[BUG] likwid-perfctr -c resetting cpuset of hybrid MPI+OpenMP binaries
Created by: jklinkenberg
Describe the bug
likwid-perfctr -c <cores> should only monitor the specified cores and should not do any pinning the way likwid-perfctr -C <cores> does. Nevertheless, it changes the cpuset of the binaries it launches.
A reproducer is attached: likwid-perfctr_cpuset_reproducer.zip. I ran a simple hybrid MPI+OpenMP application that only prints the cpuset each process is restricted to. The Slurm script starts 2 ranks on a 2-socket compute node, so each rank should be restricted to one socket.
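The actual reproducer source is in the zip file. Purely as an illustration of the kind of check it performs, here is a minimal Python sketch that prints the affinity of the calling process in the same list format as taskset. It assumes Linux, and the helper name format_cpu_list is hypothetical, not taken from the reproducer.

```python
import os


def format_cpu_list(cpus):
    """Collapse CPU ids into a taskset-style list such as '0-3,5'."""
    ids = sorted(cpus)
    if not ids:
        return ""
    ranges = []
    start = prev = ids[0]
    for cpu in ids[1:]:
        if cpu == prev + 1:
            # Still inside a contiguous run of CPU ids.
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    # os.sched_getaffinity is only available on Linux; pid 0 means "this process".
    allowed = os.sched_getaffinity(0)
    print(f"pid {os.getpid()}'s current affinity list: {format_cpu_list(allowed)}")
```

Run under the same launcher as the real application, a rank restricted to the first socket would be expected to report 0-23 here.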
If the script is executed without likwid-perfctr -c, I see the expected behavior with the following output:
# regular taskset call before application
pid 86544's current affinity list: 0-23
pid 86545's current affinity list: 24-47
Command executed for rank 0: ./check_cpu_set.exe
Command executed for rank 1: ./check_cpu_set.exe
# taskset that is returned from hybrid application
Rank 00 of 02 1625173008.734828 taskset pid 86698's current affinity list: 0-23
Rank 01 of 02 1625173008.734825 taskset pid 86699's current affinity list: 24-47
If the script is executed with likwid-perfctr -c, the monitored cores seem to be correct in the resulting hwc_* files. The cpuset inside the hybrid binary, however, is reset to all cores of the node (0-47) instead of one socket per rank, as seen in the following output:
# regular taskset call before application
pid 88729's current affinity list: 0-23
pid 88730's current affinity list: 24-47
Command executed for rank 0: likwid-perfctr -o hwc_R0.csv -f -c N:0-3 -g L2CACHE ./check_cpu_set.exe
Command executed for rank 1: likwid-perfctr -o hwc_R1.csv -f -c N:0-3 -g L2CACHE ./check_cpu_set.exe
# taskset that is returned from hybrid application
Rank 00 of 02 1625173296.002843 taskset pid 88993's current affinity list: 0-47
Rank 01 of 02 1625173296.002842 taskset pid 88992's current affinity list: 0-47
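The problem in the output above can also be checked mechanically: the affinity seen inside the application should be a subset of the affinity the rank had before launch. The following is a small sketch of that comparison using the lists from this report. The helper name parse_cpu_list is hypothetical, and the input is assumed to be in taskset's list format.

```python
def parse_cpu_list(text):
    """Expand a taskset-style list such as '0-3,5' into a set of CPU ids."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Affinity lists copied from the output in this report (rank 0).
before_launch = parse_cpu_list("0-23")
inside_app_without_likwid = parse_cpu_list("0-23")
inside_app_with_likwid = parse_cpu_list("0-47")

# Expected: the application never sees more CPUs than the rank was given.
print("without likwid-perfctr -c:", inside_app_without_likwid <= before_launch)
print("with likwid-perfctr -c:   ", inside_app_with_likwid <= before_launch)
```

With the values from this report, the first check holds and the second does not, which is the widening described in this issue.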
To Reproduce
- likwid-perfctr -- Version 5.1.1 (commit: 233ab943543480cd46058b34616c174198ba0459)
- Operating system: CentOS Linux release 7.9.2009
- Libraries: MPI + OpenMP
- Intel Compiler 19.0.1.144
- Intel MPI 2018.4.274
- Marker API not used in the reproducer, but the same happens when using the marker API
To Reproduce with a LIKWID command
- Please supply the output of the command with -V 3 added to the command: ==> Debug output is also attached in the zip file