IBM Power 9

From CCI User Wiki

This page is a guide for IBM Power 9 (POWER9) systems. It is divided into two sections:

  1. A guide for users at CCI wishing to run on these systems
  2. A guide for system administrators at other sites looking to deploy and run these systems

For CCI users

This system is experimental!

This system is experimental and may be unavailable at any time.

Users may connect to bgrs01 to build software and submit jobs via Slurm.

System information

Two nodes, each with:

  • Two IBM Power 9 processors clocked at 3 GHz. Each processor has 20 cores with 4 hardware threads per core (160 logical processors per node).
  • Three NVIDIA Tesla V100 GPUs
  • 512 GB RAM

Nodes are connected with FDR InfiniBand and connect to the unified file system.

Building software

Currently the following are available:

  • gcc 4.8.5 (system default)
  • xlC and xlf (xl or xl_r module, prefer xl_r)
  • MPICH 3.2.1 (mpich module, built with XL compiler)
  • CUDA 9.1 (cuda module, beta)

Note: When mixing CUDA and MPI, make sure an xl module is loaded and call nvcc with -ccbin $(CXX); otherwise linking will fail.
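The following is a minimal sketch of such a build. It assumes the xl_r module exports CXX and the cuda module exports CUDA_HOME, and this page does not confirm either. The function name build_cuda_mpi and the file names kernel.cu, main.cpp and app are hypothetical. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# build_cuda_mpi is a hypothetical helper; kernel.cu, main.cpp and app are
# placeholder names. Assumes the xl_r module exports CXX and the cuda module
# exports CUDA_HOME. Set DRY_RUN=1 to print the commands instead of running them.
build_cuda_mpi() {
    run=${DRY_RUN:+echo}
    $run module load xl_r mpich cuda || return 1
    # Use the XL C++ compiler as nvcc's host compiler; per the note on this
    # page, linking fails otherwise.
    $run nvcc -ccbin "$CXX" -c kernel.cu -o kernel.o || return 1
    # Compile host code with MPICH's C++ wrapper (MPICH here is built with XL).
    $run mpicxx -c main.cpp -o main.o || return 1
    # Link through the MPI wrapper and add the CUDA runtime library.
    $run mpicxx main.o kernel.o -o app -L"$CUDA_HOME/lib64" -lcudart
}
```

If the cuda module does not set CUDA_HOME, replace the -L path with the actual CUDA 9.1 library directory.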

Submitting jobs

Jobs are submitted via Slurm to one of the following partitions: debug, test.

The debug partition is limited to single-node jobs running for up to 30 minutes and using at most 128 GB of memory.

The test partition makes both nodes and all their resources available for up to 6 hours. This partition is typically unavailable during business hours so that the nodes remain free for compiling and debugging. Batch jobs may still be submitted to it and will run when it becomes available.

Note: When submitting MPI jobs via Slurm, users must specify --mpi=pmi2.

Note: When submitting CUDA jobs via Slurm, users must specify --gres=gpu:N, where N is the number of GPUs desired per node (at most 3).
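The partition limits and required options above can be combined into one submission helper. This is only a sketch. submit_job is a hypothetical name. It passes the command through sbatch --wrap instead of a batch script, and it does not check the 128 GB memory limit of the debug partition. The limits it enforces are taken from this page and may change: 1 node and 30 minutes for debug, 2 nodes and 6 hours for test, and at most 3 GPUs per node. Set DRY_RUN=1 to print the sbatch command instead of submitting.

```shell
# Hypothetical helper: check a request against the partition limits listed
# on this page, then submit an MPI + CUDA job with sbatch --wrap.
# Usage: submit_job PARTITION NODES GPUS_PER_NODE MINUTES COMMAND [ARGS...]
# Set DRY_RUN=1 to print the sbatch command instead of running it.
submit_job() {
    partition=$1 nodes=$2 gpus=$3 minutes=$4
    shift 4
    case $partition in
        debug) max_nodes=1 max_minutes=30 ;;
        test)  max_nodes=2 max_minutes=360 ;;
        *) echo "unknown partition: $partition" >&2; return 1 ;;
    esac
    if [ "$nodes" -gt "$max_nodes" ] || [ "$minutes" -gt "$max_minutes" ] ||
       [ "$gpus" -gt 3 ]; then
        echo "request exceeds limits for $partition" >&2
        return 1
    fi
    # --gres requests GPUs per node; srun needs --mpi=pmi2 for MPI jobs.
    # A plain number passed to --time is interpreted by Slurm as minutes.
    ${DRY_RUN:+echo} sbatch --partition="$partition" --nodes="$nodes" \
        --gres="gpu:$gpus" --time="$minutes" \
        --wrap="srun --mpi=pmi2 $*"
}
```

For example, submit_job debug 1 3 30 ./my_app (my_app is a placeholder) would request one node with three GPUs for 30 minutes.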

Profiling

One profiling method is to read the time base register with the mftb and mftbu instructions. FFTW's cycle-counter header (cycle.h) contains an example.

The time base frequency of the Power 9 processor is 512000000 Hz (512 MHz).
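The time base ticks at a fixed frequency rather than at the 3 GHz core clock. An interval measured as the difference of two mftb readings, written here as ΔTB, therefore converts to seconds as shown below. The 1,024,000-tick figure is only an illustration. On Linux, the same frequency is normally reported in the timebase field of /proc/cpuinfo.

```latex
\begin{aligned}
\Delta t &= \frac{\Delta \mathrm{TB}}{f_{\mathrm{TB}}}, \qquad f_{\mathrm{TB}} = 512\,000\,000~\mathrm{Hz} \\
1~\text{tick} &= \frac{1}{512 \times 10^{6}}~\mathrm{s} = 1.953125~\mathrm{ns} \\
\Delta \mathrm{TB} = 1\,024\,000 \;&\Rightarrow\; \Delta t = \frac{1\,024\,000}{512\,000\,000}~\mathrm{s} = 2~\mathrm{ms}
\end{aligned}
```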

Documentation

IBM Redbook: Section 6.1.1 of Implementing an IBM High-Performance Computing Solution on IBM Power System S822LC

IBM XL

See also

For system administrators

This section records the guides, quirks, and additional information that system administrators need to install and run an IBM Power 9 system.

Setting up the BMC

Use the IBM-provided guide to set up the BMC network address. The process uses ipmitool on the node, even though the system does not actually use IPMI for management functions. The system gives no indication that the network configuration has changed until the final ipmitool raw command is executed (i.e., ipmitool lan print 1 returns the original information until the raw command is sent).
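ipmitool lan print 1 keeps reporting the old settings until the final raw command is sent, so it can help to check the reported address explicitly afterwards. The sketch below assumes the usual "IP Address : <addr>" line format of ipmitool lan print. bmc_lan_ip and bmc_ip_is are hypothetical helper names.

```shell
# Hypothetical helper: read `ipmitool lan print` output on stdin and print
# the value of the "IP Address" field (not "IP Address Source").
bmc_lan_ip() {
    awk -F' *: *' '$1 == "IP Address" { print $2; exit }'
}

# Hypothetical helper: succeed only if LAN channel 1 reports the expected
# address. Run it after the final ipmitool raw command; before that, the
# original configuration is still reported.
bmc_ip_is() {
    [ "$(ipmitool lan print 1 | bmc_lan_ip)" = "$1" ]
}
```

For example, bmc_ip_is 10.0.0.42 && echo "BMC address applied", where 10.0.0.42 stands in for the address you configured.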

Red Hat Enterprise Linux (RHEL) 7 for Power 9 (ppc64le)

Installing

Pre-requisites
  1. Ensure your Red Hat login has an activation code specifically for Power 9. Previous Red Hat licenses for Power systems do not grant access to the downloads needed for Power 9.
  2. Obtain the DVD ISO for the product variant "Red Hat Enterprise Linux for Power 9" (ppc64le): "Red Hat Enterprise Linux Alternate Architectures 7.4 Binary DVD". Note: The "Red Hat Enterprise Linux for Power, little endian" product variant is not the correct variant for a Power 9 system.
Option 1: Installing via USB

The IBM guide for installing Linux on Power 9 via USB is fairly complete. You can prepare the USB device as follows (this overwrites all data on the USB drive):

dd if=rhel-alt-server-7.4-ppc64le-dvd.iso of=<usb device> bs=4M conv=fsync

Note: You must use the "Red Hat Enterprise Linux for Power 9"/"Alternate Architectures" variant of RHEL 7 for this method or the system will freeze when loading the installer.
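dd overwrites whatever device it is given without warning, so a small guard can help. write_iso_to_usb is a hypothetical helper, not an existing tool. It refuses to write unless the target is a block device that lsblk reports as USB-attached. It then runs the dd command from this page with bs=4M and conv=fsync, so the data is flushed before dd exits. Set DRY_RUN=1 to print the dd command instead of running it.

```shell
# Hypothetical helper (not an existing tool): write the RHEL-ALT ISO to a
# USB stick only if the target is a block device that lsblk reports as
# USB-attached. Set DRY_RUN=1 to print the dd command instead.
write_iso_to_usb() {
    iso=$1 dev=$2
    if [ ! -f "$iso" ]; then
        echo "ISO not found: $iso" >&2
        return 1
    fi
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    # -d: the device itself, no children; -n: no header; -o TRAN: transport.
    if [ "$(lsblk -dno TRAN "$dev")" != usb ]; then
        echo "refusing to write: $dev is not a USB disk" >&2
        return 1
    fi
    # bs=4M speeds up the copy; conv=fsync flushes data before dd exits.
    ${DRY_RUN:+echo} dd if="$iso" of="$dev" bs=4M conv=fsync
}
```

Usage: write_iso_to_usb rhel-alt-server-7.4-ppc64le-dvd.iso /dev/sdX, with /dev/sdX replaced by the USB drive.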

Option 2: Installing via xCAT

This assumes you already have a working xCAT install. It may be helpful to review the xCAT OpenPower documentation. Power 9 uses OpenBMC and the tables/keys are slightly different than those for IPMI-based systems.

Review the CORAL information in the xCAT documentation for known issues, additional steps, and quirks!

Version information

  • Initial RHEL 7.4 kernel: 4.11.0-44.el7a.ppc64le
  • Latest kernel known to boot: 4.11.0-44.2.1.el7a.ppc64le
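To avoid rebooting into an untested kernel, an administrator could check the default boot entry against the versions recorded here. kernel_known_good is a hypothetical helper that knows only the two releases listed on this page. Treating unlisted kernels as suspect is an assumption, not something this page states.

```shell
# Hypothetical check: succeed only for the kernel releases this page records
# for RHEL 7.4 (ppc64le) on Power 9. Extend the list as new kernels are tested.
kernel_known_good() {
    case $1 in
        4.11.0-44.el7a.ppc64le|4.11.0-44.2.1.el7a.ppc64le) return 0 ;;
        *) return 1 ;;
    esac
}
```

On RHEL 7 the default boot entry can be checked with kernel_known_good "$(grubby --default-kernel | sed 's|^/boot/vmlinuz-||')" || echo "default kernel is not listed here".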

NVIDIA CUDA

Current version

  • Operating system: Linux POWER9 RHEL 7
  • Driver version: 387.36 (beta)
  • Driver release date: 2017-12-21
  • CUDA Toolkit: 9.1

IBM Spectrum Scale (GPFS)

There is no known GA build for Power 9. The GPL layer build of the current release, 4.2.3 (4.2.3.6), fails with:

/usr/lpp/mmfs/src/gpl-linux/tracelin.c: In function my_send_sig_info:
/usr/lpp/mmfs/src/gpl-linux/tracelin.c:571:6: error: implicit declaration of function send_sig_info [-Werror=implicit-function-declaration]
send_sig_info(mySig, sigData, tsP);
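An implicit-declaration error usually means the headers the GPL layer includes no longer declare the function. One plausible cause, offered as an assumption rather than a confirmed diagnosis, is the 4.11 kernel's split of <linux/sched.h>, which moved signal-related declarations such as send_sig_info into <linux/sched/signal.h>. find_kernel_decl is a hypothetical helper that lists the headers under a kernel-devel include tree mentioning a given function, which helps confirm where it is declared for the installed kernel.

```shell
# Hypothetical helper: list header files under a kernel include tree that
# mention FUNCTION followed by "(", e.g. its prototype. Defaults to the
# kernel-devel headers for the running kernel.
find_kernel_decl() {
    func=$1
    dir=${2:-/usr/src/kernels/$(uname -r)/include}
    # Requires GNU grep for -r and --include.
    grep -rlE "(^|[^_[:alnum:]])${func}[[:space:]]*\(" --include='*.h' "$dir"
}
```

Usage: find_kernel_decl send_sig_info.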

Installing

Review the installation guide, particularly the items specific to Power 9.

Firmware updates

Firmware updates can be performed remotely with openbmctool per IBM's instructions:

openbmctool -U <username> -P <password> -H <BMC IP address or BMC host name> firmware flash <bmc or pnor> -f xxx.tar

where <bmc or pnor> is the type of image to flash: bmc for the BMC firmware or pnor for the host firmware stored in PNOR flash.
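The command above can be wrapped so that the image type and file are checked before anything is flashed. This is a sketch, not IBM tooling. flash_image and the BMC_HOST, BMC_USER and BMC_PASS variables are hypothetical, and the openbmctool arguments are exactly those shown on this page. Set DRY_RUN=1 to print the command instead of running it.

```shell
# Hypothetical wrapper around the openbmctool command on this page.
# Usage: flash_image bmc|pnor IMAGE.tar  (needs BMC_HOST, BMC_USER, BMC_PASS)
# Set DRY_RUN=1 to print the command instead of running it.
flash_image() {
    image_type=$1 file=$2
    case $image_type in
        bmc|pnor) ;;
        *) echo "image type must be bmc or pnor, got: $image_type" >&2; return 1 ;;
    esac
    if [ ! -f "$file" ]; then
        echo "image not found: $file" >&2
        return 1
    fi
    if [ -z "${BMC_HOST:-}" ] || [ -z "${BMC_USER:-}" ] || [ -z "${BMC_PASS:-}" ]; then
        echo "set BMC_HOST, BMC_USER and BMC_PASS" >&2
        return 1
    fi
    ${DRY_RUN:+echo} openbmctool -U "$BMC_USER" -P "$BMC_PASS" \
        -H "$BMC_HOST" firmware flash "$image_type" -f "$file"
}
```

As with the original command, the password is passed on the command line and may be visible to other users in the process list.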

Version information

  • BMC firmware: OP9_v1.19_1.111

See also