Slurm

Slurm is a resource manager originally developed at Lawrence Livermore National Laboratory and now maintained primarily by Moe Jette and Danny Auble of SchedMD.

Quick Reference

Commands

  • squeue lists your jobs in the queue
  • sinfo lists the state of all machines in the cluster
  • sbatch submits batch jobs (use srun for an interactive job on the blades or Blue Gene)
  • sprio lists the relative priorities of pending jobs in the queue and how they are calculated
  • sacct displays accounting and submission data for jobs
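
For example, to check on your own jobs or on a specific job (the job ID and partition name below are placeholders):

 squeue -u $USER                  # list only your jobs
 sinfo -p cluster                 # node states in one partition
 sprio -j <jobid>                 # priority factors for a pending job
 sacct -j <jobid> --format=JobID,JobName,Elapsed,State   # accounting data for a job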

Queues

See the individual system pages (List of Available Systems).

FAQ

Please see the FAQ in the official Slurm documentation: http://www.schedmd.com/slurmdocs/faq.html

Resource specification

Options of interest (see the manual page for sbatch for a complete list):

 -n, --ntasks=ntasks         number of tasks to run
 -N, --nodes=N               number of nodes on which to run (N = min[-max])
 -c, --cpus-per-task=ncpus   number of cpus required per task
     --ntasks-per-node=n     number of tasks to invoke on each node
 -i, --input=in              file for batch script's standard input
 -o, --output=out            file for batch script's standard output
 -e, --error=err             file for batch script's standard error
 -p, --partition=partition   partition requested
 -t, --time=minutes          time limit
 -D, --chdir=path            set the working directory (for srun)
 -D, --workdir=directory     set the working directory of the batch script (for sbatch)
     --mail-type=type        notify on state change: BEGIN, END, FAIL or ALL
     --mail-user=user        user to receive email notification of job state changes

Note that any of the above options can be specified in a batch file by preceding the option with #SBATCH. All options defined this way must appear at the top of the batch file (immediately after the shebang line), before any other commands, with nothing separating them. For example, the following will send the job's output to a file called joboutput.<the job's ID>:

#SBATCH -o joboutput.%J
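
The same mechanism can carry a full set of options. As a sketch, the command-line submission shown under "Simple (non-MPI)" below could equivalently be written as directives at the top of the script, which is then submitted with sbatch and no extra arguments:

#!/bin/bash -x
#SBATCH -p cluster
#SBATCH -N 1
#SBATCH -n 16
#SBATCH -t 240
#SBATCH -D /foo/bar
#SBATCH -o /foo/bar/baz.out
#SBATCH --mail-type=ALL
#SBATCH --mail-user=example@rpi.edu
srun ./a.out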

Example job submission scripts

See also: Modules for any additional options/requirements of specific MPI implementations. Typically, it is necessary to load the same modules at runtime (before calling srun) that were used when building a binary.

Simple (non-MPI)

A simple (non-MPI) job can be started by just calling srun:

#!/bin/bash -x
srun ./a.out

For example, the above job could be submitted to run 16 tasks on 1 node, in the partition "cluster", with the current working directory set to /foo/bar, email notification of the job's state turned on, a time limit of four hours (240 minutes), and STDOUT redirected to /foo/bar/baz.out as follows (where script.sh is the script):

sbatch -p cluster -N 1 -n 16 --mail-type=ALL --mail-user=example@rpi.edu -t 240 -D /foo/bar -o /foo/bar/baz.out ./script.sh

Note: In a simple, non-MPI case, running multiple tasks will create multiple instances of the same binary.

Interactive

Interactive jobs are supported. See the srun command manual page for details. Remember to always specify a partition (-p). Here is a usage example launching xterm on the compute node allocated to an interactive session:

salloc -p cluster xterm -e 'ssh -X `srun hostname`'

Or by an alternative method:

salloc -p opterons
ssh -X `srun -s hostname`
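
Depending on the system configuration, an interactive shell can also be requested directly with srun's --pty option (a sketch; the partition name and time limit are placeholders):

 srun -p cluster -N 1 -t 60 --pty bash -i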

MVAPICH2

Example job script slurmMvapich2.sh:

 #!/bin/bash -x 
 module load mvapich2
 srun ./a.out

Note that users with applications needing MPI_THREAD_MULTIPLE support must set the environment variable MV2_ENABLE_AFFINITY to 0 before running:

 export MV2_ENABLE_AFFINITY=0
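
Placed in a job script, the export simply precedes the srun call. A minimal sketch combining it with the module load above:

 #!/bin/bash -x
 module load mvapich2
 # only needed for applications using MPI_THREAD_MULTIPLE
 export MV2_ENABLE_AFFINITY=0
 srun ./a.out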

OpenMPI

Example job batch script slurmOpenMpi.sh:

#!/bin/bash -x
module load openmpi
srun ./a.out

Job arrays/Many small jobs

When running many small jobs simultaneously, it is better to submit one large job that runs them as job steps rather than many separate jobs. Submitting them individually leads to resource fragmentation and poor scheduler performance. Example:

#!/bin/sh
#SBATCH --job-name=TESTING
#SBATCH -t 04:00:00
#SBATCH -D /gpfs/u/<home or barn or scratch>/<project>/<user>
#
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email>

srun -N8 -o testing.log ./my-executable <options> &
srun -N8 -o testing2.log ./my-executable <options> &
srun -N8 -o testing3.log ./my-executable <options> &
srun -N8 -o testing4.log ./my-executable <options> &
wait

The differences are the addition of an ampersand (&) at the end of each srun command and the wait command at the end. This runs all four job steps in parallel within the allocation and waits until all four are complete. For this example, the batch script should be submitted as sbatch -N32 <script> to ensure enough nodes are allocated for all the steps that run in parallel. This can also be done with individual tasks to fill a minimal number of nodes (replace -N with -n in the sbatch/srun calls), as sketched below.
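
As a sketch of the task-based variant (the task counts are illustrative), each srun requests tasks rather than nodes, and the batch job is submitted as sbatch -n 16 <script>:

srun -n 4 -o testing.log ./my-executable <options> &
srun -n 4 -o testing2.log ./my-executable <options> &
srun -n 4 -o testing3.log ./my-executable <options> &
srun -n 4 -o testing4.log ./my-executable <options> &
wait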

Matlab

Multi-node Matlab jobs require additional configuration; please contact support for more information. A single-node (multi-threaded) job may use a script like the following:

#!/bin/bash
module load matlab
srun matlab -nodisplay -nosplash -nodesktop -nojvm -r example
# or
matlab -nodisplay -nosplash -nodesktop -nojvm < example.m
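
Such a script might then be submitted as follows (a sketch; the partition name, core count, time limit, and script name matlabJob.sh are placeholders):

 sbatch -p cluster -N 1 -n 1 -c 16 -t 240 -o matlab.out ./matlabJob.sh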

CCI customizations

slurm-account-usage

The slurm-account-usage tool queries the Slurm database to report project usage for a given system. Run without any arguments, it outputs the number of allocations granted (via sbatch, salloc, or an interactive srun) and the total number of core-hours used by the invoking user's project (i.e., all allocations and core-hours by all members of the project). Optional start and end dates can be supplied to narrow the result.

Note that srun commands run from within an existing allocation are not counted in the number of records the tool reports.

For customers/partners with many projects, a user can be designated to view information from other projects under the umbrella organization. For more information please contact support.

Usage

Log in to the front-end node of the system you wish to retrieve usage for (bgrs01, drpfen01, amos, etc.) and run the following command:

 slurm-account-usage [START_DATE [END_DATE]]

The optional START_DATE and END_DATE define the inclusive period to retrieve usage from. The date for each field should be specified as YYYY-MM-DD.
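
For example, to report usage for the first half of 2015 (dates are illustrative):

 slurm-account-usage 2015-01-01 2015-06-30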