From CCI User Wiki

Slurm is a resource manager originally developed at Lawrence Livermore National Laboratory and now maintained primarily by Moe Jette and Danny Auble of SchedMD.

Quick Reference


  • squeue lists your jobs in the queue
  • sinfo lists the state of all machines in the cluster
  • sbatch submits batch jobs (use srun for an interactive job on the blades or Blue Gene)
  • sprio lists the relative priorities of pending jobs in the queue and how they are calculated
  • sacct displays accounting and submission data for jobs
  • scancel is used to cancel jobs
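Typical invocations of these commands look like the following (the job ID shown is illustrative):

```shell
squeue -u "$USER"   # list your jobs in the queue
sinfo               # show the state of machines and partitions
scancel 12345       # cancel job 12345 (illustrative job ID)
```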


See the individual system pages (List of Available Systems).


Please see the FAQ in the official Slurm documentation.

Resource specification

Options of interest (see the manual page for sbatch for a complete list):

 -n, --ntasks=ntasks         number of tasks to run
 -N, --nodes=N               number of nodes on which to run (N = min[-max])
 -c, --cpus-per-task=ncpus   number of cpus required per task
     --ntasks-per-node=n     number of tasks to invoke on each node
 -i, --input=in              file for batch script's standard input
 -o, --output=out            file for batch script's standard output
 -e, --error=err             file for batch script's standard error
 -p, --partition=partition   partition requested
 -t, --time=minutes          time limit
 -D, --chdir=path            set working directory for batch script (formerly --workdir)
     --mail-type=type        notify on state change: BEGIN, END, FAIL or ALL
     --mail-user=user        who to send email notification for job state changes

Note that any of the above can be specified in a batch file by preceding the option with #SBATCH. All options defined this way must appear first in the batch file, as an uninterrupted block with nothing separating them. For example, the following will send the job's output to a file called joboutput.<the job's ID>:

#SBATCH -o joboutput.%j

Example job submission scripts

See also: Modules for any additional options/requirements of specific MPI implementations. Typically, it is necessary to load the same modules at runtime (before calling srun) that were used when building a binary.

Simple (non-MPI)

A simple (non-MPI) job can be started by just calling srun:

#!/bin/bash -x
srun ./a.out

For example, the above job could be submitted to run 16 tasks on 1 node, in the partition "cluster", with the current working directory set to /foo/bar, email notification of the job's state turned on, a time limit of four hours (240 minutes), and STDOUT redirected to /foo/bar/baz.out as follows (where <script> is the batch script):

sbatch -p cluster -N 1 -n 16 --mail-type=ALL -t 240 -D /foo/bar -o /foo/bar/baz.out ./<script>
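The same submission can instead be expressed with #SBATCH directives inside the script itself, using the same partition, paths, and limits as the command-line example above:

```shell
#!/bin/bash -x
#SBATCH -p cluster
#SBATCH -N 1
#SBATCH -n 16
#SBATCH --mail-type=ALL
#SBATCH -t 240
#SBATCH -D /foo/bar
#SBATCH -o /foo/bar/baz.out

srun ./a.out
```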

Note: In a simple, non-MPI case, running multiple tasks will create multiple instances of the same binary.


Interactive jobs are supported. See the srun command manual page for details. Remember to always specify a partition (-p). Here is a usage example launching xterm on the compute node allocated to an interactive session:

salloc -p cluster xterm -e 'ssh -X `srun hostname`'

Or by an alternative method:

salloc -p opterons
ssh -X `srun -s hostname`
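Depending on the site configuration, an interactive shell on a compute node can often also be obtained directly through srun's pseudo-terminal option (the partition name and time limit here are illustrative):

```shell
srun -p cluster -t 60 --pty bash -i
```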


Example job script (MPICH)

 #!/bin/bash -x 
 module load <compilerModuleName> mpich
 srun ./a.out


Example job script (MVAPICH2)

 #!/bin/bash -x 
 module load mvapich2
 srun ./a.out

Note: users whose applications need MPI_THREAD_MULTIPLE support must set the environment variable MV2_ENABLE_AFFINITY to 0 before running.



Example job batch script (Open MPI)

#!/bin/bash -x
module load openmpi
srun ./a.out

IBM Spectrum MPI or Mellanox HPC-X

These implementations do not have direct Slurm support, so it is necessary to use mpirun. You must have passwordless SSH keys set up for mpirun to work. If mpirun reports "ORTE was unable to reliably start one or more daemons.", you need to set up SSH keys.

Example job batch script

#!/bin/bash -x

# Derive SLURM_NPROCS and SLURM_NTASKS_PER_NODE when Slurm has not set them
if [ "x$SLURM_NPROCS" = "x" ]; then
  if [ "x$SLURM_NTASKS_PER_NODE" = "x" ]; then
    SLURM_NTASKS_PER_NODE=1
  fi
  SLURM_NPROCS=$(( SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE ))
else
  if [ "x$SLURM_NTASKS_PER_NODE" = "x" ]; then
    SLURM_NTASKS_PER_NODE=$(( SLURM_NPROCS / SLURM_JOB_NUM_NODES ))
  fi
fi

srun hostname -s | sort -u > /tmp/hosts.$SLURM_JOB_ID
awk "{ print \$0 \"-ib slots=$SLURM_NTASKS_PER_NODE\"; }" /tmp/hosts.$SLURM_JOB_ID >/tmp/tmp.$SLURM_JOB_ID
mv /tmp/tmp.$SLURM_JOB_ID /tmp/hosts.$SLURM_JOB_ID

module load spectrum-mpi
mpirun --bind-to core --report-bindings -hostfile /tmp/hosts.$SLURM_JOB_ID -np $SLURM_NPROCS /path/to/your/executable

rm /tmp/hosts.$SLURM_JOB_ID
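The hostfile rewrite in the script above can be checked outside a job; the node names below are made-up stand-ins for the output of srun hostname -s:

```shell
# Fake node list standing in for `srun hostname -s | sort -u`
printf 'node01\nnode02\n' > /tmp/hosts.demo
SLURM_NTASKS_PER_NODE=4   # stand-in value; set by Slurm inside a real job
awk "{ print \$0 \"-ib slots=$SLURM_NTASKS_PER_NODE\"; }" /tmp/hosts.demo
# node01-ib slots=4
# node02-ib slots=4
rm /tmp/hosts.demo
```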

To enable GPU-Direct ('CUDA aware MPI') pass the -gpu flag to mpirun.

Job arrays/Many small jobs

For many small jobs running simultaneously or in quick succession, it is often better to submit one large job rather than many small jobs. On some systems, doing otherwise leads to resource fragmentation and poor scheduler performance. Example:

#!/bin/bash -x
#SBATCH --job-name=TESTING
#SBATCH -t 04:00:00
#SBATCH -D /gpfs/u/<home or barn or scratch>/<project>/<user>
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email>

srun -N8 -o testing.log ./my-executable <options> &
srun -N8 -o testing2.log ./my-executable <options> &
srun -N8 -o testing3.log ./my-executable <options> &
srun -N8 -o testing4.log ./my-executable <options> &
wait

The differences are the addition of an ampersand (&) at the end of each srun command and the wait command at the end. This runs all 4 job steps in parallel within the allocation and waits until all 4 are complete. For this example, the batch script should be submitted as sbatch -N32 <script> to ensure enough nodes are allocated for all the steps that will run in parallel. This can also be done with individual tasks to fill a minimal number of nodes (replace -N with -n in the sbatch/srun calls).
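The ampersand/wait pattern can be illustrated with plain shell, where sleep stands in for srun:

```shell
# Four "jobs" run concurrently in the background; `wait` blocks
# until all of them finish (sleep stands in for srun here).
start=$SECONDS
sleep 1 &
sleep 1 &
sleep 1 &
sleep 1 &
wait
# Total elapsed time is about 1 second, not 4, because the jobs overlapped.
echo "elapsed: $(( SECONDS - start ))s"
```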

On a cluster that uses consumable resources, such as the ERP cluster, it is important to also specify a subset of the resources for each srun command. Otherwise, the first srun command will use all resources assigned to the job and the next srun command will wait until they are released, printing the warning "Job step creation temporarily disabled".

By default -c, --cpus-per-task=1, so the option can be left out if tasks only require one core. However, more complex sets of job steps in which some processes require fewer or more CPUs will need the option supplied to each srun command.

Example, submitted with sbatch -n4 --mem=16G --cpus-per-task=1:

#!/bin/bash -x
#SBATCH --job-name=TESTING
#SBATCH -t 04:00:00
#SBATCH -D /gpfs/u/<home or barn or scratch>/<project>/<user>
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email>

srun -n1 --mem=4G -o testing.log ./my-executable <options> &
srun -n1 --mem=4G -o testing2.log ./my-executable <options> &
srun -n1 --mem=4G -o testing3.log ./my-executable <options> &
srun -n1 --mem=4G -o testing4.log ./my-executable <options> &
wait


Matlab

Multi-node Matlab scripts will require unique configuration; please contact support for more information. A single-node (multi-threaded) run may use a script like the following:

module load matlab
srun matlab -nodisplay -nosplash -nodesktop -nojvm -r example
# or
matlab -nodisplay -nosplash -nodesktop -nojvm < example.m

CCI customizations


The slurm-account-usage tool queries the Slurm database to report project usage for a given system. Running the tool without any arguments will output the number of allocations granted (via sbatch, salloc, or an interactive srun) and the total number of core-hours used by the invoking user's project (i.e. all allocations and cpu-hours by all members of the project). It can be supplied with optional start and end dates to narrow the result.

Note: srun commands run from within an existing allocation are not counted toward the number of records the tool reports.

For customers/partners with many projects, a user can be designated to view information from other projects under the umbrella organization. For more information please contact support.


Log in to the front-end node of the system you wish to retrieve usage for (bgrs01, drpfen01, amos, etc.) and run the following command:

 slurm-account-usage [START_DATE [END_DATE]]

The optional START_DATE and END_DATE define the inclusive period to retrieve usage from. The date for each field should be specified as YYYY-MM-DD.
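For example, to report usage for the first half of 2020 (the dates are illustrative):

```shell
slurm-account-usage 2020-01-01 2020-06-30
```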