This is a sort of intermission.
Getting perf to work up to a point
Apparently the OpenSBI-mediated access to the performance counters is not mapped in a way that makes the usual cycles and instructions events work in perf record. I got this board mainly to help with dav1d development efforts, so not having perf support would make it harder to reason about performance.
The best workaround, after a discussion in the forums, is to build perf's pmu-events so that they include the custom ones, and then rely on the overly precise CPU-specific events instead:
$ perf list | grep cycle
bus-cycles [Hardware event]
cpu-cycles OR cycles [Hardware event]
ref-cycles [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
m_mode_cycle
[M-mode cycles]
rtu_flush_cycle
s_mode_cycle
[S-mode cycles]
stalled_cycle_backend
[Stalled cycles backend]
stalled_cycle_frontend
[Stalled cycles frontend]
u_mode_cycle
[U-mode cycles]
vidu_total_cycle
vidu_vec0_cycle
vidu_vec1_cycle
...
$ perf list | grep inst
branch-instructions OR branches [Hardware event]
instructions [Hardware event]
br_inst
[Branch instructions]
cond_br_inst
[Conditional branch instructions]
indirect_br_inst
[Indirect branch instructions]
taken_cond_br_inst
[Taken conditional branch instructions]
uncond_br_inst
[Unconditional branch instructions]
instruction:
alu_inst
[ALU (integer) instructions]
amo_inst
[AMO instructions]
atomic_inst
[Atomic instructions]
bus_fence_inst
[Bus FENCE instructions]
csr_inst
[CSR instructions]
div_inst
[Division instructions]
ecall_inst
[ECALL instructions]
failed_sc_inst
[Failed SC instructions]
fence_inst
[FENCE instructions]
fp_div_inst
[Floating-point division instructions]
fp_inst
[Floating-point instructions]
fp_load_inst
[Floating-point load instructions]
fp_store_inst
[Floating-point store instructions]
load_inst
[Load instructions]
lr_inst
[LR instructions]
mult_inst
[Multiplication instructions]
sc_inst
[SC instructions]
store_inst
[Store instructions]
unaligned_load_inst
[Unaligned load instructions]
unaligned_store_inst
[Unaligned store instructions]
vector_div_inst
[Vector division instructions]
vector_inst
[Vector instructions]
vector_load_inst
[Vector load instructions]
vector_store_inst
[Vector store instructions]
id_inst_pipedown
[ID instruction pipedowns]
id_one_inst_pipedown
[ID one instruction pipedowns]
issued_inst
[Issued instructions]
rf_inst_pipedown
[RF instruction pipedowns]
rf_one_inst_pipedown
[RF one instruction pipedowns]
Building perf
Perf's way of dealing with CPU-specific events is through some machinery called jevents. It lives in tools/perf/pmu-events and you can trigger it manually with:
./jevents.py riscv arch pmu-events.c
This produces C code from a bunch of JSON files and a CSV map file.
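To get a feel for what jevents consumes, the inputs can be inspected before running it. This is a minimal sketch, assuming the RISC-V event descriptions sit in a directory such as arch/riscv under tools/perf/pmu-events, with a mapfile.csv at the top of that directory and the JSON files below it. The helper name is mine, not part of perf.

```shell
# Hypothetical helper: summarize the jevents inputs for one architecture.
# Assumes a layout of <dir>/mapfile.csv plus JSON files in subdirectories.
jevents_inputs() {
    dir=$1
    if [ ! -f "$dir/mapfile.csv" ]; then
        echo "no mapfile.csv in $dir" >&2
        return 1
    fi
    # Count the JSON event descriptions at any depth below $dir.
    json_count=$(find "$dir" -type f -name '*.json' | wc -l | tr -d ' ')
    echo "mapfile.csv plus $json_count JSON file(s)"
}
```

Run from tools/perf/pmu-events it would be called as jevents_inputs arch/riscv. If it reports zero JSON files, the custom events are probably not in the tree you are building from.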
The first time I tried to build the sources I cut corners by setting most of the NO_{} make variables, and left NO_JEVENTS=1 in place. Luckily I fixed it after noticing that my output differed from the one in the forum.
## I assume you have here the custom linux sources
cd /usr/src/pi-linux/tools/perf
## Being lazy, I disabled almost everything instead of installing dependencies; the first time I disabled too much.
make -j 8 V=1 VF=1 \
  HOSTCC=riscv64-unknown-linux-gnu-gcc HOSTLD=riscv64-unknown-linux-gnu-ld \
  CC=riscv64-unknown-linux-gnu-gcc CXX=riscv64-unknown-linux-gnu-g++ \
  AR=riscv64-unknown-linux-gnu-ar LD=riscv64-unknown-linux-gnu-ld \
  NM=riscv64-unknown-linux-gnu-nm PKG_CONFIG=riscv64-unknown-linux-gnu-pkg-config \
  prefix=/usr bindir_relative=bin tipdir=share/doc/perf-6.8 \
  'EXTRA_CFLAGS=-O2 -pipe' 'EXTRA_LDFLAGS=-Wl,-O1 -Wl,--as-needed' \
  ARCH=riscv BUILD_BPF_SKEL= BUILD_NONDISTRO=1 JDIR= CORESIGHT= GTK2= feature-gtk2-infobar= \
  NO_AUXTRACE= NO_BACKTRACE= NO_DEMANGLE= NO_JEVENTS=0 NO_JVMTI=1 NO_LIBAUDIT=1 \
  NO_LIBBABELTRACE=1 NO_LIBBIONIC=1 NO_LIBBPF=1 NO_LIBCAP=1 NO_LIBCRYPTO= \
  NO_LIBDW_DWARF_UNWIND= NO_LIBELF= NO_LIBNUMA=1 NO_LIBPERL=1 NO_LIBPFM4=1 \
  NO_LIBPYTHON=1 NO_LIBTRACEEVENT= NO_LIBUNWIND=1 NO_LIBZSTD=1 NO_SDT=1 NO_SLANG=1 \
  NO_LZMA=1 NO_ZLIB= TCMALLOC= WERROR=0 \
  LIBDIR=/usr/libexec/perf-core libdir=/usr/lib64 plugindir=/usr/lib64/perf/plugins \
  -f Makefile.perf install
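Since my first failed build came down to a single variable, a small guard can catch it before a long build. This is a sketch of my own, not something perf provides: it scans the make arguments and fails if NO_JEVENTS=1 is among them.

```shell
# Hypothetical guard: refuse a make command line that disables jevents.
# Returns 0 when NO_JEVENTS=1 is absent, 1 when it is present.
jevents_enabled() {
    for arg in "$@"; do
        case $arg in
            NO_JEVENTS=1)
                echo "NO_JEVENTS=1 would drop the CPU-specific events" >&2
                return 1
                ;;
        esac
    done
    return 0
}
```

For example, jevents_enabled ARCH=riscv NO_JEVENTS=0 NO_LIBBPF=1 succeeds, while the same call with NO_JEVENTS=1 prints the warning and returns a non-zero status.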
Now I have a perf where cycles and instructions still do not work with perf record. I wonder if there is a way at the OpenSBI or kernel level to aggregate events to make it work properly, but I have never had to look into perf internals, so I will probably poke at it much later if nobody addresses it otherwise. Anyway,
perf record --group -e u_mode_cycle,m_mode_cycle,s_mode_cycle
produces something close enough for cycles; well, u_mode_cycle alone is enough.
For instructions the situation is a bit more annoying.
perf record --group -e alu_inst,amo_inst,atomic_inst,fp_div_inst,fp_inst,fp_load_inst,fp_store_inst,load_inst,lr_inst,mult_inst,sc_inst,store_inst,unaligned_load_inst,unaligned_store_inst
is close to counting all the scalar instructions, but trying to add vector_div_inst,vector_inst,vector_load_inst,vector_store_inst
somehow makes perf record silently stop collecting samples. Adding just 3 more events works though, so I guess I can be happy with u_mode_cycle,alu_inst,atomic_inst,fp_inst,vector_inst
at least.
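The event lists above get long, so it can help to assemble them from one event per argument. This is a minimal sketch with helper names of my own choosing; it only prints the perf record command line rather than running it, and the workload to profile is left for you to append.

```shell
# Hypothetical helper: join event names with commas for perf's -e option.
# The body runs in a subshell, so the IFS change does not leak to the caller.
join_events() (
    IFS=,
    printf '%s\n' "$*"   # "$*" joins the arguments with the first char of IFS
)

# Print (not run) a grouped perf record invocation for the given events.
record_cmd() {
    printf 'perf record --group -e %s\n' "$(join_events "$@")"
}
```

Calling record_cmd u_mode_cycle alu_inst atomic_inst fp_inst vector_inst prints the reduced command from above, which makes it easier to add or drop one event at a time while looking for the point where samples stop being collected.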