Slurm is a resource manager originally developed at Lawrence Livermore National Laboratory and now maintained primarily by Moe Jette and Danny Auble of SchedMD.
- squeue lists your jobs in the queue
- sinfo lists the state of all machines in the cluster
- sbatch submits batch jobs (use srun for an interactive job on the blades or Blue Gene)
- sprio lists the relative priorities of pending jobs in the queue and how they are calculated
- sacct displays accounting and submission data for jobs
See the individual system pages (List of Available Systems).
Please see the FAQ in the official Slurm documentation: http://www.schedmd.com/slurmdocs/faq.html
Options of interest (see the manual page for sbatch for a complete list):
-n, --ntasks=ntasks           number of tasks to run
-N, --nodes=N                 number of nodes on which to run (N = min[-max])
-c, --cpus-per-task=ncpus     number of cpus required per task
    --ntasks-per-node=n       number of tasks to invoke on each node
-i, --input=in                file for batch script's standard input
-o, --output=out              file for batch script's standard output
-e, --error=err               file for batch script's standard error
-p, --partition=partition     partition requested
-t, --time=minutes            time limit
-D, --chdir=path              change remote current working directory
-D, --workdir=directory       set working directory for batch script
    --mail-type=type          notify on state change: BEGIN, END, FAIL or ALL
    --mail-user=user          who to send email notification for job state changes
Note that any of the above options can be specified in a batch script by preceding the option with #SBATCH. For example, the following sends the job's output to a file called joboutput.<the job's ID>:
#SBATCH -o joboutput.%J
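As a local illustration of that substitution (the sed call below is only a sketch of what Slurm does internally, and the job ID 12345 is hypothetical):

```shell
# Slurm expands %J in the -o/-e filename pattern to the job's ID.
# Simulate that expansion locally with a made-up job ID of 12345:
pattern="joboutput.%J"
job_id=12345
outfile=$(printf '%s\n' "$pattern" | sed "s/%J/$job_id/")
echo "$outfile"    # joboutput.12345
```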
- Batch job file preparation (for parallel jobs): an Open MPI job can be launched as follows:
#!/bin/bash -x
srun hostname -s > /tmp/hosts.$SLURM_JOB_ID
if [ "x$SLURM_NPROCS" = "x" ]
then
    if [ "x$SLURM_NTASKS_PER_NODE" = "x" ]
    then
        SLURM_NTASKS_PER_NODE=1
    fi
    SLURM_NPROCS=`expr $SLURM_JOB_NUM_NODES \* $SLURM_NTASKS_PER_NODE`
fi
mpirun -hostfile /tmp/hosts.$SLURM_JOB_ID -np $SLURM_NPROCS ./a.out
rm /tmp/hosts.$SLURM_JOB_ID
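The task-count fallback in that script can be exercised outside of Slurm by setting the environment variables by hand. The values below are hypothetical; under Slurm they are exported into the job environment automatically:

```shell
# Simulate a 4-node allocation with 8 tasks per node; Slurm would
# normally set these variables itself.
unset SLURM_NPROCS
SLURM_JOB_NUM_NODES=4
SLURM_NTASKS_PER_NODE=8
if [ "x$SLURM_NPROCS" = "x" ]
then
    if [ "x$SLURM_NTASKS_PER_NODE" = "x" ]
    then
        SLURM_NTASKS_PER_NODE=1
    fi
    SLURM_NPROCS=`expr $SLURM_JOB_NUM_NODES \* $SLURM_NTASKS_PER_NODE`
fi
echo "$SLURM_NPROCS"    # 32
```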
See Modules for the run command with other MPI implementations.
For example, the above job could be submitted to run on 16 processors in the opterons partition, restricted to machines with the oss feature label, with the working directory set to /foo/bar, email notification of the job's state changes enabled, a time limit of four hours (240 minutes), and STDOUT redirected to /foo/bar/baz.out as follows (where script.sh is the script):
sbatch -p opterons -C oss -n 16 --mail-type=ALL --mail-user=firstname.lastname@example.org -t 240 -D /foo/bar -o /foo/bar/baz.out ./script.sh
- Interactive jobs are now supported. See the srun command manual page for details. Remember to always specify a partition (-p). Here is a usage example launching xterm on the compute node allocated to an interactive session:
salloc -p opterons xterm -e 'ssh -X `srun hostname`'
An alternative method:
salloc -p opterons ssh -X `srun -s hostname`
The slurm-account-usage tool queries the Slurm database to report project usage for a given system. Run without arguments, it outputs the number of allocations granted (via sbatch, salloc, or an interactive srun) and the total number of core-hours used by the invoking user's project (i.e. all allocations and core-hours by all members of the project). Optional start and end dates can be supplied to narrow the result.
srun commands run from within an allocation are not counted towards the number of records the tool reports.
For customers/partners with many projects, a user can be designated to view information from other projects under the umbrella organization. For more information please contact support.
Log in to the front-end node of the system you wish to retrieve usage for (rsa, drpfen01, amos, etc.) and run the following command:
slurm-account-usage [START_DATE [END_DATE]]
The optional START_DATE and END_DATE define the inclusive period to retrieve usage from. The date for each field should be specified as