Blue Gene/Q Hardware Counters

From CCI User Wiki

The A2 processor in each Blue Gene/Q compute node provides 24 hardware counters per core. These counters can be used to evaluate application performance during execution, and they can be read with BGPM or other performance evaluation tools.


The following is an excerpt from Bob Walkup's Application Performance Presentation at SC12.

Each A2 core provides 24 counters, so only 6 counters are available per hardware thread when counting on all four hardware threads. Each counter is 64 bits wide.

A good default choice of A2 counters:

 PEVT_LSU_COMMIT_CACHEABLE_LDS   //load instructions
 PEVT_L1P_BAS_MISS               //load missed L1P buffer
 PEVT_INST_XU_ALL                //XU instructions int/ld/st/br
 PEVT_INST_QFPU_ALL              //AXU = FPU instructions
 PEVT_INST_QFPU_FPGRP1           //weighted floating point ops

Use along with L2 counters:

 PEVT_L2_HITS        //L2 hits
 PEVT_L2_MISSES      //L2 misses
 PEVT_L2_FETCH_LINE  //128 byte lines loaded from memory
 PEVT_L2_STORE_LINE  //128 byte lines stored to memory

The A2 counters are specific to a hardware thread, while the L2 counters are shared across the node. Together, these counters give instruction throughput, instruction mix, information about load misses at all levels of cache/memory, and the load/store traffic to memory. More detailed analysis requires other counters.
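
As an illustration of how the raw counts listed above might be turned into metrics, here is a minimal sketch in C. The function names are hypothetical and are not part of BGPM. The sketch assumes that L2 hits plus L2 misses covers all L2 accesses, and that XU plus QFPU instructions approximates the total instruction count; the 128-byte line size comes from the counter descriptions above.

```c
#include <stdint.h>

#define L2_LINE_BYTES 128u  /* line size stated in the L2 counter descriptions */

/* Fraction of L2 accesses that hit (PEVT_L2_HITS, PEVT_L2_MISSES).
 * Assumption: hits + misses accounts for every L2 access.
 * Returns 0.0 when nothing was counted. */
double l2_hit_ratio(uint64_t hits, uint64_t misses)
{
    uint64_t total = hits + misses;
    return total ? (double)hits / (double)total : 0.0;
}

/* Bytes moved between L2 and memory
 * (PEVT_L2_FETCH_LINE, PEVT_L2_STORE_LINE): lines * 128 bytes. */
uint64_t memory_traffic_bytes(uint64_t fetch_lines, uint64_t store_lines)
{
    return (fetch_lines + store_lines) * L2_LINE_BYTES;
}

/* Share of FPU instructions in the instruction mix
 * (PEVT_INST_XU_ALL, PEVT_INST_QFPU_ALL).
 * Assumption: XU + QFPU approximates all instructions. */
double fpu_instruction_fraction(uint64_t xu_all, uint64_t qfpu_all)
{
    uint64_t total = xu_all + qfpu_all;
    return total ? (double)qfpu_all / (double)total : 0.0;
}
```

Because the L2 counters are shared across the node, the memory traffic figure describes the whole node rather than a single thread. Dividing it by the elapsed time gives an estimate of memory bandwidth.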


The bgpm header is here


For C/C++ the include line is:

 #include "bgpm/include/bgpm.h"

The bgpm library is here


For C/C++ the link line is:

 mpicxx <objects> -L/bgsys/drivers/ppcfloor/bgpm/lib/ -lbgpm 
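
The following is a minimal sketch, not an official example, of how the default A2 counters might be read through BGPM around a region of code. The helper name count_default_a2_events is hypothetical. The sketch assumes that the BG/Q compilers predefine the __bgq__ macro, that software-distributed mode (BGPM_MODE_SWDISTRIB) is appropriate, and that all five events can be placed in a single event set; check these against the BGPM documentation on your system. The L2 counters are left out and may need a separate event set. On any other platform the function compiles to a stub that returns -1.

```c
#include <stdint.h>
#include <stdio.h>

#ifdef __bgq__                      /* assumption: predefined by the BG/Q compilers */
#include "bgpm/include/bgpm.h"
#endif

/* Hypothetical helper: counts the default A2 events on the calling
 * hardware thread while work() runs, then prints each count.
 * Returns 0 on success, -1 if BGPM is unavailable or a call fails. */
int count_default_a2_events(void (*work)(void))
{
#ifdef __bgq__
    unsigned events[] = {
        PEVT_LSU_COMMIT_CACHEABLE_LDS,  /* load instructions */
        PEVT_L1P_BAS_MISS,              /* load missed L1P buffer */
        PEVT_INST_XU_ALL,               /* XU instructions int/ld/st/br */
        PEVT_INST_QFPU_ALL,             /* AXU = FPU instructions */
        PEVT_INST_QFPU_FPGRP1           /* weighted floating point ops */
    };
    unsigned n = sizeof(events) / sizeof(events[0]);
    unsigned i;
    int set;

    if (Bgpm_Init(BGPM_MODE_SWDISTRIB) < 0) return -1;
    set = Bgpm_CreateEventSet();
    if (set < 0) return -1;
    if (Bgpm_AddEventList(set, events, n) < 0) return -1;
    if (Bgpm_Apply(set) < 0) return -1;   /* program the hardware */

    if (Bgpm_Start(set) < 0) return -1;
    work();                               /* region being measured */
    if (Bgpm_Stop(set) < 0) return -1;

    for (i = 0; i < n; i++) {
        uint64_t value = 0;
        if (Bgpm_ReadEvent(set, i, &value) < 0) return -1;
        printf("%-32s %llu\n", Bgpm_GetEventLabel(set, i),
               (unsigned long long)value);
    }
    Bgpm_Disable();
    return 0;
#else
    (void)work;                           /* BGPM is not available here */
    return -1;
#endif
}
```

A caller would pass a function containing the code to be measured, for example count_default_a2_events(my_kernel), where my_kernel is a placeholder for the application's own routine. Five events fit within the 6 counters available per hardware thread when all four threads are counting.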

Useful Links