NvMon / GPU Marker API: Can't `init_topology_gpu` after CUDA usage?
Created by: carstenbauer
Issue (MWE):

```julia
julia> using LIKWID

julia> using CUDA

julia> LIKWID.init_topology_gpu()
true
```

vs.

```julia
julia> using LIKWID

julia> using CUDA

julia> x = CUDA.rand(Float32, 100);

julia> LIKWID.init_topology_gpu()
false
```
Consequences:

- `@nvmon` / `nvmon` fails with

  ```julia
  julia> metrics, events = @nvmon "FLOPS_SP" saxpy!(z, a, x, y);
  ERROR: Couldn't init gpu topology.
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:33
    [2] init(gpus::Vector{Int32})
      @ LIKWID.NvMon /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:16
    [3] init
      @ /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:28 [inlined]
    [4] nvmon(f::var"#3#4", group_or_groups::String; gpuids::Int64, print::Bool)
      @ LIKWID.NvMon /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:427
    [5] nvmon(f::Function, group_or_groups::String)
      @ LIKWID.NvMon /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:426
    [6] top-level scope
      @ /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:477
    [7] top-level scope
      @ /scratch/pc2-mitarbeiter/bauerc/.julia/packages/CUDA/tTK8Y/src/initialization.jl:52
  ```

  (It works if we call `LIKWID.init_topology_gpu()` right after `using LIKWID`.)
- The GPU Marker API doesn't work:

  ```
  ➜ bauerc@dgx-01 LIKWID.jl git:(cb/perfmonrev) likwid-perfctr -G 0 -W FLOPS_SP -m julia --project=. perfctr_gpu.jl
  --------------------------------------------------------------------------------
  CPU name:       AMD EPYC 7742 64-Core Processor
  CPU type:       AMD K17 (Zen2) architecture
  CPU clock:      2.25 GHz
  --------------------------------------------------------------------------------
  Error init GPU Marker API.
  --------------------------------------------------------------------------------
  GPU Marker API result file does not exist. This may happen if the application has not called LIKWID_GPUMARKER_CLOSE.
  ```
  where the input file `perfctr_gpu.jl` is:

  ```julia
  # perfctr_gpu.jl
  using LIKWID
  using LinearAlgebra
  using CUDA
  # LIKWID.init_topology_gpu() # example works if one uncomments this line
  @assert CUDA.functional()

  const N = 10_000
  const a = 3.141f0 # Float32
  # Note: CUDA defaults to Float32
  const x = CUDA.rand(N)
  const y = CUDA.rand(N)
  const z = CUDA.zeros(N)

  saxpy!(z, a, x, y) = z .= a .* x .+ y
  saxpy!(z, a, x, y) # warmup

  GPUMarker.init()
  GPUMarker.startregion("saxpy")
  saxpy!(z, a, x, y)
  GPUMarker.stopregion("saxpy")
  GPUMarker.close()
  ```
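For reference, here is a sketch of the workaround implied by the commented-out line above, assuming the only requirement is that `LIKWID.init_topology_gpu()` runs before the first CUDA runtime call (untested beyond the observation in this report; behavior on other setups may differ):

```julia
# Hypothetical fixed variant of perfctr_gpu.jl: initialize LIKWID's GPU
# topology before any CUDA API call touches the driver/context.
using LIKWID
using LinearAlgebra
using CUDA

LIKWID.init_topology_gpu() # must come before CUDA.functional() / allocations

@assert CUDA.functional()

const N = 10_000
const a = 3.141f0      # Float32 scalar
const x = CUDA.rand(N) # CUDA.rand defaults to Float32
const y = CUDA.rand(N)
const z = CUDA.zeros(N)

saxpy!(z, a, x, y) = z .= a .* x .+ y
saxpy!(z, a, x, y) # warmup/compile outside the measured region

GPUMarker.init()
GPUMarker.startregion("saxpy")
saxpy!(z, a, x, y)
GPUMarker.stopregion("saxpy")
GPUMarker.close()
```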