DRP Cluster


Specifications

The cluster consists of 64 nodes connected via 56Gb FDR Infiniband. Each node has two eight-core 2.6 GHz Intel Xeon E5-2650 processors and 128GB of system memory.

Accessing the System

Note: Not all projects have access to the cluster. Job submissions to Slurm may be rejected even if access to the front-end node is authorized.

Running on the cluster first requires connecting to one of its front-end nodes, drpfen01 or drpfen02. These machines are accessible from the landing pads.

HyperThreading

By default, Slurm will assign 32 processes to each node. This is twice the number of physical cores (two eight-core processors) because hyperthreading is enabled. Some applications may benefit from hyperthreading; others will not. Initial testing with Fluent indicates that running one process per physical core yields the best performance.

Passing the '--bind-to-core' option to OpenMPI's mpirun pins each process to a core. Combined with the Slurm options '-N' (number of nodes) and '-n' (number of processes), this places a single process on each physical core. For example, passing '-N 2 -n 32' to Slurm and '--bind-to-core' to mpirun results in 32 processes running on the 32 physical cores of two nodes.
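
A minimal sketch along these lines, assuming an MPI binary ./a.out built against the openmpi module and an mpirun that detects the Slurm allocation (otherwise see the hostfile-based script under "Using mpirun" below):

 #!/bin/bash -x
 #SBATCH -N 2   # two nodes
 #SBATCH -n 32  # 32 tasks, i.e. one per physical core
 module load openmpi
 # Bind each MPI rank to its own physical core
 # (newer OpenMPI releases spell this option '--bind-to core').
 mpirun --bind-to-core ./a.out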

Alternatively, passing '-c 2' to srun assigns two logical CPUs (the two hardware threads of one physical core) to each process, which prevents more than 16 processes from running on a node.
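
For example (a sketch; ./a.out is a placeholder MPI binary and an mpi module must already be loaded):

 # One full node: 16 tasks, each given both hardware threads of one physical core.
 srun -N 1 -n 16 -c 2 ./a.out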

Building Executables

MVAPICH2 and OpenMPI compiler wrappers are available via the 'mpi' modules. Please refer to Modules for use of modules and their interactions with Slurm.
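
As a quick sketch (hello.c is a hypothetical source file), loading an MPI module puts its compiler wrappers on the PATH:

 module load openmpi          # or: module load mvapich2
 mpicc -O2 -o hello hello.c   # the wrapper adds the MPI include and link flags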

Software/Libraries

Compilers+MPI

Supported GCC versions per openmpi module:

openmpi \ GCC   4.7.4[1]  4.8.5[1]  4.9.4[1]  5.4.0     6.2.0     6.3.0     6.4.0     7.1.0     7.2.0
1.8.8[1]        Yes[2]    Yes[2]    Yes[2]    Yes[2]    Yes[2]
1.10.6[1]       Yes[2]    Yes[2]    Yes[2]    Yes[2]    Yes[2]    Yes[2]
2.0.2           Yes[2]    Yes[2]    Yes[2]    Yes       Yes       Yes
2.0.3
2.1.0           Yes[2]    Yes[2]
2.1.1           Yes       Yes       Yes
3.0.0

Submitting and Managing Jobs

Partitions

Name     Time Limit (hr)   Max Nodes
debug    1                 2
drp      6                 unlimited
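
For example (the job script name myjob.sh is a placeholder), a short test fits in the debug partition, while longer production runs use drp:

 sbatch -p debug -N 2 -t 1:00:00 myjob.sh   # debug: at most 2 nodes, 1 hour limit
 sbatch -p drp -N 8 -t 6:00:00 myjob.sh     # drp: 6 hour limit, no node limit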

Example job submission scripts

Please see Slurm for more info.

MVAPICH2

Example job script slurmDrpMvapich2.sh:

 #!/bin/bash -x 
 module load mvapich2
 srun ./a.out

Note: users with applications that need MPI_THREAD_MULTIPLE support must set the environment variable MV2_ENABLE_AFFINITY to 0 before running:

 export MV2_ENABLE_AFFINITY=0
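
Put together, a sketch of such a job script (assuming ./a.out requires MPI_THREAD_MULTIPLE):

 #!/bin/bash -x
 module load mvapich2
 # Disable MVAPICH2 process affinity, as required for MPI_THREAD_MULTIPLE.
 export MV2_ENABLE_AFFINITY=0
 srun ./a.out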

OpenMPI

Note: pmi2 is not a typo. It refers to version 2 of PMI, the Process Management Interface.

Using sbatch/srun

(All OpenMPI modules now specify PMI2 support in the environment. Providing the --mpi=pmi2 option is no longer necessary.)

Example job batch script slurmDrpOpenMpi.sh:

 #!/bin/bash -x
 module load openmpi
 srun ./a.out
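
The script can then be submitted with the usual Slurm options, for example (the node and task counts are illustrative):

 sbatch -p drp -N 2 -n 32 slurmDrpOpenMpi.sh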

Using mpirun

The srun/pmi2 method should be preferred. Please contact support if you find using srun does not work for your application.

Example job script slurmDrpOpenMpi.sh:

 #!/bin/bash -x
 module load openmpi
 # Build a hostfile from the nodes Slurm allocated to this job.
 srun hostname -s > /tmp/hosts.$SLURM_JOB_ID
 # If Slurm did not export a process count, derive it from the node and
 # per-node task counts (defaulting to one task per node).
 if [ "x$SLURM_NPROCS" = "x" ]
 then
   if [ "x$SLURM_NTASKS_PER_NODE" = "x" ]
   then
     SLURM_NTASKS_PER_NODE=1
   fi
   SLURM_NPROCS=`expr $SLURM_JOB_NUM_NODES \* $SLURM_NTASKS_PER_NODE`
 fi
 # Launch with mpirun instead of srun, then clean up the temporary hostfile.
 mpirun -hostfile /tmp/hosts.$SLURM_JOB_ID -np $SLURM_NPROCS ./a.out
 rm /tmp/hosts.$SLURM_JOB_ID

Matlab

Multi-node Matlab scripts will require a unique configuration; please contact support for more information. Single-node (multi-threaded) scripts may use a job script like the following.

 #!/bin/bash
 module load matlab
 # Run the script example.m without a display, splash screen, desktop, or JVM.
 srun matlab -nodisplay -nosplash -nodesktop -nojvm -r example
 # or, alternatively, feed the script to Matlab on standard input:
 # matlab -nodisplay -nosplash -nodesktop -nojvm < example.m
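
Such a script (here called slurmMatlab.sh as a placeholder) could then be submitted to a single node, for example:

 sbatch -p drp -N 1 slurmMatlab.sh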

Notes

  1. This version is no longer maintained upstream.
  2. This compiler+MPI combination is not recommended for new work.