Every job on CSI HPCC is submitted through SLURM. This page collects annotated templates for the common job shapes. Copy one, edit the #SBATCH directives, add your module loads, and submit with sbatch.
Three rules that apply to every job:
  1. Start from /scratch/<username>, never from /global/u/<username> (your home).
  2. Use SLURM syntax. Older PBS Pro scripts must be converted; a directive mapping follows this list.
  3. Never run jobs on the login (head) node. Any job found running there will be killed and the account may be suspended.
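
If you are converting an old PBS Pro script, the common directives map almost one-to-one. A minimal sketch of the usual equivalences (check man sbatch for anything not listed here):

PBS Pro                        SLURM equivalent
#PBS -N my_job                 #SBATCH --job-name=my_job
#PBS -l nodes=2:ppn=8          #SBATCH --nodes=2 --ntasks-per-node=8
#PBS -l walltime=01:00:00      #SBATCH --time=01:00:00
#PBS -o out.log                #SBATCH --output=out.log
#PBS -e err.log                #SBATCH --error=err.log
cd $PBS_O_WORKDIR              cd $SLURM_SUBMIT_DIR
qsub script.sh                 sbatch script.sh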

Anatomy of a SLURM script

#!/bin/bash
#SBATCH --job-name=my_job          # a short name that shows up in squeue
#SBATCH --nodes=1                  # how many nodes
#SBATCH --ntasks=1                 # how many MPI tasks total
#SBATCH --cpus-per-task=1          # CPU cores per task (>1 for threaded work)
#SBATCH --mem-per-cpu=4G           # RAM per core
#SBATCH --time=01:00:00            # wall-clock limit (HH:MM:SS)
#SBATCH --output=slurm-%j.out      # stdout file (%j = job ID)
#SBATCH --error=slurm-%j.err       # stderr file
#SBATCH --qos=<qos_name>           # your project's QOS
#SBATCH --partition=<part_name>    # your project's partition

module purge
module load <modules_you_need>

cd $SLURM_SUBMIT_DIR
srun ./your_program
Real jobs on HPCC typically need --qos and --partition values matching your project (for example --qos=qoschem --partition=partchem, which the serial template below uses as placeholders). If you don’t know which values to use, ask your PI or the HPC Helpline. The other templates omit these two directives so you can paste in your own values once.
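
Both values can also be passed on the sbatch command line, which overrides any #SBATCH directive in the script; the QOS and partition names below are placeholders:

sbatch --qos=<qos_name> --partition=<part_name> serial.sh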

Partitions and QOS

Most production jobs must name a partition (--partition) and the QOS value assigned to your project (--qos). The current HPCC Wiki lists these operational partitions:
Partition    Max cores/job  Max jobs/user  Max cores/group  Wall-clock limit  Tier      GPU types listed by HPCC
partnsf      128            50             256              240 h             Advanced  K20m, V100/16, A100/40
partchem     128            50             256              No limit          Condo     A100/80, A30
partcfd      96             50             96               No limit          Condo     A40
partsym      96             50             96               No limit          Condo     A30
partasrc     48             16             16               No limit          Condo     A30
partmatlabD  128            50             256              240 h             Advanced  V100/16, A100/40
partmatlabN  384            50             384              240 h             Advanced  None
partphys     96             50             96               No limit          Condo     L40
partdev is dedicated to development. The HPCC Wiki describes it as open to all HPCC users: a 16-core node with 64 GB of memory and 2 K20m GPUs, with a four-hour wall-clock limit.
Run sinfo -s to see which partitions are currently up, and sacctmgr show assoc user=$USER format=Account,Partition,QOS to confirm which partition and QOS values your account is allowed to use.
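
For a closer look at one partition, sinfo's format specifiers are handy (the partition name is a placeholder):

sinfo -p <part_name> -o "%P %a %l %D %G"   # partition, availability, time limit, node count, GRES (GPUs)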

Submitting, watching, and cancelling

sbatch script.sh                    # submit (prints a job ID)
squeue -u $USER                     # your jobs in the queue
squeue -j <jobid>                   # one specific job
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS
scancel <jobid>                     # cancel a job
scontrol show job <jobid>           # everything SLURM knows about it
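
When scripting submissions, sbatch --parsable prints just the job ID, which makes dependency chains easy; next.sh here is a hypothetical follow-up script:

jobid=$(sbatch --parsable script.sh)
sbatch --dependency=afterok:$jobid next.sh   # starts only if the first job exits successfully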

Serial job (one core)

The simplest case: one process, one core.
serial.sh
#!/bin/bash
#SBATCH --job-name=serial_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=8G
#SBATCH --time=01:00:00
#SBATCH --qos=qoschem
#SBATCH --partition=partchem

module purge
module load <your_modules>

cd $SLURM_SUBMIT_DIR
srun ./my_serial_program

Multi-threaded (OpenMP)

One task, multiple cores on the same node. Set OMP_NUM_THREADS so your program actually uses the cores SLURM allocated.
openmp.sh
#!/bin/bash
#SBATCH --job-name=omp_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G
#SBATCH --time=01:00:00

module purge
module load <your_modules>       # must include an OpenMP-capable compiler runtime

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

cd $SLURM_SUBMIT_DIR
srun ./my_openmp_program         # built with -fopenmp (or compiler equivalent)
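
To confirm the allocation before a long run, a one-line check using standard SLURM and OpenMP environment variables can be added to the script:

srun bash -c 'echo "node=$(hostname) cpus-per-task=$SLURM_CPUS_PER_TASK OMP_NUM_THREADS=$OMP_NUM_THREADS"'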

MPI (multiple nodes)

Distributed-memory parallelism across nodes.
mpi.sh
#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=16G
#SBATCH --time=04:00:00

module purge
module load <compiler_module>
module load <mpi_module>

cd $SLURM_SUBMIT_DIR
srun ./my_mpi_program            # 64 ranks total: 32 × 2 nodes
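
A quick placement check before the real run can save wasted hours; with the allocation above it should print two nodes with 32 ranks each:

srun hostname | sort | uniq -c   # one line per node, prefixed with its rank count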

Hybrid MPI + OpenMP

MPI between nodes, OpenMP threads within each rank.
hybrid.sh
#!/bin/bash
#SBATCH --job-name=hybrid_job
#SBATCH --nodes=2
#SBATCH --ntasks=24
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=16G
#SBATCH --time=04:00:00

module purge
module load <compiler_module>
module load <mpi_module>

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK      # threads per MPI rank
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK   # srun no longer inherits --cpus-per-task on Slurm 22.05+

cd $SLURM_SUBMIT_DIR
srun ./my_hybrid_program         # 24 ranks × 2 OMP threads each
The prototype above allocates 12 ranks per node × 2 nodes = 24 MPI ranks, each spawning 2 OpenMP threads. Adjust --qos, --partition, and --mem-per-cpu for your project before submitting.
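
Hybrid jobs often benefit from explicit thread pinning. These are standard OpenMP 4.0 environment variables, not HPCC-specific settings; set them next to OMP_NUM_THREADS if run times vary:

export OMP_PLACES=cores       # pin each thread to a physical core
export OMP_PROC_BIND=close    # keep a rank's threads adjacent to each other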

GPU job

Request GPUs with --gres=gpu:<count>. On Arrow, the HPCC Wiki lists GPU nodes with between 2 and 8 GPUs each.
gpu.sh
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=16G
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00

module purge
module load <cuda_or_framework_module>

cd $SLURM_SUBMIT_DIR
srun ./my_gpu_program
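
To verify the job actually sees its GPU, nvidia-smi (installed with the NVIDIA driver) can be run inside the allocation; SLURM exposes the assigned devices via CUDA_VISIBLE_DEVICES:

srun nvidia-smi -L                                  # list GPUs visible to the job
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"   # indices SLURM assigned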

GPU with a specific type

Several partitions host different NVIDIA GPU types. Use sinfo to inspect the GRES and feature strings the scheduler currently advertises, then constrain your job only when the workload requires a specific GPU.
sinfo -o "%P %G %f"                # partition, GRES (GPU type and count), node features

#SBATCH --gres=gpu:a100:1          # type-qualified request; the type name must match sinfo's GRES column

Job array (parameter sweep)

Run many copies of the same job, each with a different $SLURM_ARRAY_TASK_ID.
array.sh
#!/bin/bash
#SBATCH --job-name=array_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=16G
#SBATCH --time=01:00:00
#SBATCH --array=0-5
#SBATCH --output=slurm-%A_%a.out       # %A = array job ID, %a = task index
#SBATCH --error=slurm-%A_%a.err

module purge
module load <your_modules>

cd $SLURM_SUBMIT_DIR
echo "Array task ID: $SLURM_ARRAY_TASK_ID"
srun ./my_program --case "$SLURM_ARRAY_TASK_ID"
This submits 6 jobs (indices 0–5) sharing a single array job ID.
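A common pattern maps each task ID to one line of a parameter file; cases.txt below is a hypothetical file with one case per line. Appending a throttle such as --array=0-5%2 also caps how many tasks run concurrently.

# replace the last line of array.sh with:
CASE=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" cases.txt)   # task 0 reads line 1
srun ./my_program --case "$CASE"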

Interactive debugging

For quick, interactive access to a compute node (short sessions only; don’t hold nodes idle):
srun --pty --nodes=1 --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4G --time=00:30:00 bash
Load modules and run commands as if you were on a compute node. Exit the shell to release the allocation.
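
salloc is the alternative when you want a persistent allocation to run several srun commands against; it takes the same resource flags:

salloc --nodes=1 --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4G --time=00:30:00
srun ./my_program    # runs on the allocated node
exit                 # ends the salloc shell and releases the allocation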

Troubleshooting cheatsheet

Symptom, followed by the first thing to check:

Job sits PENDING indefinitely
  Run squeue -j <jobid> -o "%i %T %r"; the reason column explains why (priority, resources, QOS limit, etc.).
Job fails immediately with “invalid partition / QOS”
  Your --qos or --partition values are wrong for your project.
Job runs but crashes with no output
  You launched from /global/u. Move to /scratch/$USER and resubmit.
srun: error: Unable to create TCP connection
  Usually a transient node issue; resubmit, or check with the helpline if it repeats.
GPU allocated but program can’t see it
  Add nvidia-smi to your script to confirm, and make sure you loaded the matching CUDA runtime module.
Still stuck? Open a ticket with the job ID, the command you ran, and the contents of the .out and .err files. The FAQ on the HPCC Wiki covers more edge cases.
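
One way to gather what the ticket needs in a single pass (12345 is a placeholder job ID):

sacct -j 12345 --format=JobID,State,ExitCode,Elapsed,MaxRSS
tail -n 50 slurm-12345.out slurm-12345.err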