Wednesday, 7 June

Research Computing at Brown

Today's tutorials will occur along three tracks running concurrently. Each tab below corresponds to one of these tracks. The tutorials associated with each track are listed on the relevant tab.

Slurm for Beginners | 9:30 - 11:00 EDT

A primer on submitting jobs to the job scheduler on Oscar. Some basic familiarity with Unix/Linux systems is assumed. Topics covered include: an overview of the use of Slurm for resource allocation, submitting jobs to Slurm, and using Bash scripts to configure and submit jobs to Slurm.

slides | repo | video

Advanced Slurm | 11:00 - 12:30 EDT

This workshop is for people who are already familiar with Slurm, but would like to use Slurm's more powerful features. Topics covered include: dependencies for conditional execution of jobs, job arrays for parameter sweeps, dealing with hundreds or thousands of small tasks, how to limit the number of jobs running at once, and how to cancel multiple jobs.
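As a rough illustration of the job-array parameter sweep mentioned above, the sketch below maps each array task to one parameter combination. It is not taken from the workshop materials: the parameter grid and the `params_for_task` helper are hypothetical, and the only Slurm-specific piece is the `SLURM_ARRAY_TASK_ID` environment variable that Slurm sets for each array task.

```python
import itertools
import os

# Hypothetical parameter grid, for illustration only.
LEARNING_RATES = [0.01, 0.1, 1.0]
BATCH_SIZES = [32, 64]


def params_for_task(task_id):
    """Map a zero-based array task index to one (learning_rate, batch_size) pair."""
    grid = list(itertools.product(LEARNING_RATES, BATCH_SIZES))
    if not 0 <= task_id < len(grid):
        raise IndexError(f"task id {task_id} is outside 0..{len(grid) - 1}")
    return grid[task_id]


if __name__ == "__main__":
    # Slurm sets SLURM_ARRAY_TASK_ID for every task in a job array.
    # Default to 0 so the script also runs outside the scheduler.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    learning_rate, batch_size = params_for_task(task_id)
    print(f"task {task_id}: learning_rate={learning_rate} batch_size={batch_size}")
```

A script like this would typically be launched from a batch script submitted with `sbatch --array=0-5`, one task per grid entry. Writing the range as `--array=0-5%2` asks Slurm to run at most two of those tasks at a time.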

slides | repo | video

Checkpointing and DMTCP | 1:30 - 3:00 EDT

This workshop will introduce users to checkpointing in HPC workloads. Checkpointing allows users to periodically save the state of a distributed or serial computation to disk, so that a job can be restarted from a checkpoint file after a node or job failure. The workshop will include a hands-on demonstration of using DMTCP to checkpoint batch jobs, job arrays, multithreaded programs, and MPI applications.
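To make the save-and-restart idea concrete, here is a minimal sketch of application-level checkpointing. It is not DMTCP, which checkpoints a running process without changes to the program, and it is not drawn from the workshop materials. The function names, the JSON state layout, and the toy summation workload are all assumptions chosen for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(state, checkpoint_path):
    """Write the state to a temporary file, then atomically replace the checkpoint."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # os.replace is atomic, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, checkpoint_path)


def run_with_checkpoints(total_steps, checkpoint_path, every=100):
    """Sum the integers 0..total_steps-1, saving progress every `every` steps."""
    state = {"step": 0, "total": 0}
    # Resume from an earlier run if a checkpoint file already exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    while state["step"] < total_steps:
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, checkpoint_path)

    # Save the final state so a later call can pick up from here.
    save_checkpoint(state, checkpoint_path)
    return state["total"]
```

If the job is killed partway through, calling `run_with_checkpoints` again with the same `checkpoint_path` continues from the last saved step instead of starting over. Writing to a temporary file first means an interrupted save cannot corrupt the existing checkpoint.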

slides | repo | video

Computing with GPUs on Oscar | 3:00 - 4:30 EDT

A general introduction to the GPU architectures available on Oscar, using NGC container images to leverage RT cores on higher-end GPUs, and optimizing GPU jobs for better filesystem I/O.

slides | video
