Day 2: Tue, 11 June
Research Computing at Brown
Today's tutorials will occur along three tracks running concurrently. Each tab below corresponds to one of these tracks. The tutorials associated with each track are listed on the relevant tab.
Slurm for Beginners | 9:30 - 11:00 EDT
A primer on submitting jobs to the job scheduler on Oscar. Some basic familiarity with Unix/Linux systems is assumed. Topics covered include: an overview of the use of Slurm for resource allocation, submitting jobs to Slurm, and using Bash scripts to configure and submit jobs to Slurm.
Advanced Slurm | 11:00 - 12:30 EDT
This workshop is for people who are already familiar with Slurm, but would like to use Slurm's more powerful features. Topics covered include: dependencies for conditional execution of jobs, job arrays for parameter sweeps, dealing with hundreds or thousands of small tasks, how to limit the number of jobs running at once, and how to cancel multiple jobs.
Checkpointing and DMTCP | 1:30 - 3:00 EDT
This workshop will introduce users to checkpointing in HPC workloads. Checkpointing allows users to periodically save the state of a distributed or serial computation to disk, so that a job can be restarted from a checkpoint file after a node or job failure. The workshop will include a hands-on demonstration of using DMTCP to checkpoint batch jobs, job arrays, multithreaded programs, and MPI applications.
GPU Computing on Oscar | 3:00 - 4:30 EDT *Hybrid*
An introduction to GPUs, the architectures available on Oscar, and submitting, running, and evaluating jobs. Topics also include using NVIDIA NGC containers on Oscar for deep learning, installing or building Python packages, and using Docker to make changes. Prerequisites: command line, basic Bash, and some Python experience.