NvMon / GPU Marker API: Can't `init_topology_gpu` after CUDA usage?

Created by: carstenbauer

Issue (MWE): calling `LIKWID.init_topology_gpu()` before any CUDA operation succeeds:

julia> using LIKWID

julia> using CUDA

julia> LIKWID.init_topology_gpu()
true

whereas the same call fails once CUDA has been used (here, for an allocation):

julia> using LIKWID

julia> using CUDA

julia> x = CUDA.rand(Float32, 100);

julia> LIKWID.init_topology_gpu()
false

Consequences:

  • `@nvmon` / `nvmon` fails with:
julia> metrics, events = @nvmon "FLOPS_SP" saxpy!(z, a, x, y);
ERROR: Couldn't init gpu topology.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] init(gpus::Vector{Int32})
   @ LIKWID.NvMon /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:16
 [3] init
   @ /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:28 [inlined]
 [4] nvmon(f::var"#3#4", group_or_groups::String; gpuids::Int64, print::Bool)
   @ LIKWID.NvMon /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:427
 [5] nvmon(f::Function, group_or_groups::String)
   @ LIKWID.NvMon /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:426
 [6] top-level scope
   @ /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:477
 [7] top-level scope
   @ /scratch/pc2-mitarbeiter/bauerc/.julia/packages/CUDA/tTK8Y/src/initialization.jl:52

(It works if we call `LIKWID.init_topology_gpu()` right after `using LIKWID`, before any CUDA operation.)

  • GPU Marker API doesn't work:
➜  bauerc@dgx-01 LIKWID.jl git:(cb/perfmonrev)  likwid-perfctr -G 0 -W FLOPS_SP -m julia --project=. perfctr_gpu.jl
--------------------------------------------------------------------------------
CPU name:       AMD EPYC 7742 64-Core Processor                
CPU type:       AMD K17 (Zen2) architecture
CPU clock:      2.25 GHz
--------------------------------------------------------------------------------
Error init GPU Marker API.
--------------------------------------------------------------------------------
GPU Marker API result file does not exist. This may happen if the application has not called LIKWID_GPUMARKER_CLOSE.

where the input file `perfctr_gpu.jl` is:

# perfctr_gpu.jl
using LIKWID
using LinearAlgebra
using CUDA
# LIKWID.init_topology_gpu() # example works if one uncomments this line

@assert CUDA.functional()

const N = 10_000
const a = 3.141f0 # Float32
# Note: CUDA defaults to Float32
const x = CUDA.rand(N)
const y = CUDA.rand(N)
const z = CUDA.zeros(N)

saxpy!(z,a,x,y) = z .= a .* x .+ y
saxpy!(z,a,x,y) # warmup

GPUMarker.init()
GPUMarker.startregion("saxpy")
saxpy!(z,a,x,y)
GPUMarker.stopregion("saxpy")
GPUMarker.close()
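
For reference, the workaround mentioned above (initializing the GPU topology before CUDA is first used) can be sketched as follows. It only reorders calls that already appear in this issue. Whether this ordering is sufficient in every setup is an assumption, and it does not address the underlying problem.

```julia
using LIKWID
using CUDA

# Workaround (assumption: ordering alone is enough): initialize the GPU
# topology before the first CUDA operation. In the MWE above, this call
# returns `true` at this point and `false` after a CUDA allocation.
LIKWID.init_topology_gpu() || error("Couldn't init gpu topology.")

const N = 10_000
const a = 3.141f0 # Float32
const x = CUDA.rand(N)
const y = CUDA.rand(N)
const z = CUDA.zeros(N)

saxpy!(z, a, x, y) = z .= a .* x .+ y
saxpy!(z, a, x, y) # warmup

# Same measurement as in the failing example above.
metrics, events = @nvmon "FLOPS_SP" saxpy!(z, a, x, y);
```

Running this requires an NVIDIA GPU and a LIKWID build with GPU support, as in the examples above.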