Blue Gene/Q Hardware Counters
Each core of the A2 processor on a Blue Gene/Q compute node has 24 hardware counters. These counters can be used to evaluate application performance during execution, and they can be read with BGPM or other performance evaluation tools.
The following is an excerpt from Bob Walkup's Application Performance Presentation at SC12.
24 counters per A2 core can be used, so just 6 counters per hardware thread are available when counting on all four hardware threads. Each counter is 64 bits wide.
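The counter budget described above can be sketched as a small piece of arithmetic. This is only an illustration: the helper names `counters_per_thread` and `events_fit` are hypothetical and not part of BGPM, and the even split for thread counts other than four is an assumption (the source only states 24 counters per core and 6 per thread when all four threads count).

```c
#include <assert.h>

/* From the text: 24 counters per A2 core, four hardware threads per core. */
#define A2_COUNTERS_PER_CORE 24
#define A2_THREADS_PER_CORE   4

/* Hypothetical helper: counters available to each hardware thread,
 * ASSUMING the 24 counters are split evenly among the counting threads.
 * Only the four-thread case (6 counters each) is stated in the source. */
static int counters_per_thread(int counting_threads)
{
    assert(counting_threads >= 1 && counting_threads <= A2_THREADS_PER_CORE);
    return A2_COUNTERS_PER_CORE / counting_threads;
}

/* Hypothetical helper: returns 1 if a list of n_events A2 events fits
 * within the per-thread budget, 0 otherwise. */
static int events_fit(int n_events, int counting_threads)
{
    return n_events <= counters_per_thread(counting_threads);
}
```

Under these assumptions, `counters_per_thread(4)` gives 6, so an event list of five A2 events fits when all four hardware threads are counting, while a list of seven does not.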
Good default choice of A2 counters:
PEVT_LSU_COMMIT_CACHEABLE_LDS  // load instructions
PEVT_L1P_BAS_MISS              // load missed L1P buffer
PEVT_INST_XU_ALL               // XU instructions int/ld/st/br
PEVT_INST_QFPU_ALL             // AXU = FPU instructions
PEVT_INST_QFPU_FPGRP1          // weighted floating point ops
Use along with L2 counters:
PEVT_L2_HITS        // L2 hits
PEVT_L2_MISSES      // L2 misses
PEVT_L2_FETCH_LINE  // 128 byte lines loaded from memory
PEVT_L2_STORE_LINE  // 128 byte lines stored to memory
The A2 counters are hardware thread specific. The L2 counters are shared across the node. These counters give instruction throughput, instruction mix, information about load misses at all levels of cache/memory, and the load/store traffic to memory. Other counters are needed to get more details.
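As a rough illustration of how such counts might be turned into the quantities mentioned above, here is a minimal sketch in C. It does not call BGPM; the `counter_sample` struct and the helper functions are hypothetical names introduced for this example, and the values are assumed to have been read from the counters by some other means. The only hardware fact used is the 128-byte line size given in the event descriptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Line size taken from the PEVT_L2_FETCH_LINE / PEVT_L2_STORE_LINE comments. */
#define L2_LINE_BYTES 128u

/* Hypothetical container for counter values already read elsewhere. */
typedef struct {
    uint64_t inst_xu_all;    /* PEVT_INST_XU_ALL   */
    uint64_t inst_qfpu_all;  /* PEVT_INST_QFPU_ALL */
    uint64_t l2_hits;        /* PEVT_L2_HITS       */
    uint64_t l2_misses;      /* PEVT_L2_MISSES     */
    uint64_t l2_fetch_line;  /* PEVT_L2_FETCH_LINE */
    uint64_t l2_store_line;  /* PEVT_L2_STORE_LINE */
} counter_sample;

/* Fraction of L2 accesses that hit; 0.0 if nothing was counted. */
static double l2_hit_ratio(const counter_sample *s)
{
    assert(s != NULL);
    uint64_t total = s->l2_hits + s->l2_misses;
    return total ? (double)s->l2_hits / (double)total : 0.0;
}

/* Instruction mix: share of FPU (AXU) instructions among XU + AXU. */
static double fpu_fraction(const counter_sample *s)
{
    assert(s != NULL);
    uint64_t total = s->inst_xu_all + s->inst_qfpu_all;
    return total ? (double)s->inst_qfpu_all / (double)total : 0.0;
}

/* Bytes loaded from memory: one 128-byte line per fetch event. */
static uint64_t bytes_loaded(const counter_sample *s)
{
    assert(s != NULL);
    return s->l2_fetch_line * L2_LINE_BYTES;
}

/* Bytes stored to memory: one 128-byte line per store event. */
static uint64_t bytes_stored(const counter_sample *s)
{
    assert(s != NULL);
    return s->l2_store_line * L2_LINE_BYTES;
}
```

For example, a sample with 3 L2 hits and 1 L2 miss gives a hit ratio of 0.75, and 10 fetched lines correspond to 1280 bytes loaded from memory. Because the L2 counters are shared across the node while the A2 counters are per hardware thread, the A2 values would need to be summed over threads before being combined with L2 values in any node-level metric.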
The bgpm header is here
For C/C++ the include line is:
The bgpm library is here
For C/C++ the link line is:
mpicxx <objects> -L/bgsys/drivers/ppcfloor/bgpm/lib/ -lbgpm
- IBM Blue Gene/Q Application Development Redbook
- BGPM Event list