Wednesday, 7 June
Research Computing at Brown
Last updated
Today's tutorials run in three concurrent tracks. Each tab below corresponds to one track and lists the tutorials in that track.
A primer on submitting jobs to Slurm, the job scheduler on Oscar. Basic familiarity with Unix/Linux systems is assumed. Topics include an overview of how Slurm allocates resources, submitting jobs to Slurm, and using Bash scripts to configure and submit those jobs.
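As a rough illustration of the primer's last topic, the Python sketch below renders a Bash batch script whose `#SBATCH` lines configure the job, then hands it to `sbatch`. The job name, time limit, task count, and memory values are placeholders, not Oscar defaults, and the helper names `render_batch_script` and `submit` are hypothetical. `--parsable` is a standard `sbatch` option that prints only the job ID (plus a cluster name after a semicolon, if one applies).

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def render_batch_script(job_name: str, command: str, *, time: str = "00:10:00",
                        ntasks: int = 1, mem: str = "1G") -> str:
    """Return the text of a Bash batch script with #SBATCH directives.

    All resource values are placeholders; real limits depend on the cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time}",               # wall-clock limit, HH:MM:SS
        f"#SBATCH --ntasks={ntasks}",           # number of tasks (processes)
        f"#SBATCH --mem={mem}",                 # memory per node
        f"#SBATCH --output={job_name}-%j.out",  # Slurm expands %j to the job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


def submit(script_path: Path) -> Optional[str]:
    """Submit a script with sbatch and return its job ID, or None if sbatch is absent."""
    if shutil.which("sbatch") is None:
        return None  # not running on a machine with Slurm installed
    result = subprocess.run(
        ["sbatch", "--parsable", str(script_path)],
        capture_output=True, text=True, check=True,
    )
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID
    return result.stdout.strip().split(";")[0]


if __name__ == "__main__":
    # Only print the script here; submitting is left to the caller.
    print(render_batch_script("hello", 'echo "Running on $(hostname)"'))
```

On a login node, writing the returned text to a file such as `hello.sh` and calling `submit(Path("hello.sh"))` is roughly equivalent to running `sbatch --parsable hello.sh` by hand.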
This workshop is for people who are already familiar with Slurm and would like to use its more powerful features. Topics include job dependencies for conditional execution, job arrays for parameter sweeps, handling hundreds or thousands of small tasks, limiting the number of jobs running at once, and cancelling multiple jobs.
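Several of these features can be sketched in Python as follows, assuming a hypothetical two-parameter grid. Inside an array task, Slurm sets `SLURM_ARRAY_TASK_ID`, which the script maps either to one grid point or to one chunk of many small work items. The `%K` suffix in `--array` caps how many array tasks run at once, and `--dependency=afterok:<jobid>` holds the array until an earlier job finishes successfully. The grid values, function names, and the job ID `12345` are illustrative only.

```python
import itertools
import os
from typing import List, Optional, Tuple

# Hypothetical parameter grid for a sweep; substitute real values.
LEARNING_RATES = [0.1, 0.01, 0.001]
BATCH_SIZES = [32, 64]
GRID: List[Tuple[float, int]] = list(itertools.product(LEARNING_RATES, BATCH_SIZES))


def params_for_task(task_id: int) -> Tuple[float, int]:
    """Map one array index to one point in the parameter grid."""
    return GRID[task_id]


def chunk_for_task(task_id: int, n_items: int, n_tasks: int) -> range:
    """Split n_items small work items into n_tasks contiguous, near-equal chunks.

    Bundling many small items into each array task keeps the job count manageable.
    """
    base, extra = divmod(n_items, n_tasks)
    start = task_id * base + min(task_id, extra)
    stop = start + base + (1 if task_id < extra else 0)
    return range(start, stop)


def array_submit_command(script: str, n_tasks: int, max_running: int,
                         after_ok: Optional[str] = None) -> List[str]:
    """Build an sbatch command line for a throttled job array.

    --array=0-(n-1)%K runs at most K array tasks at a time;
    --dependency=afterok:ID starts the array only after job ID succeeds.
    """
    cmd = ["sbatch", f"--array=0-{n_tasks - 1}%{max_running}"]
    if after_ok is not None:
        cmd.append(f"--dependency=afterok:{after_ok}")
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Slurm sets SLURM_ARRAY_TASK_ID in each array task; default to 0 for local runs.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    lr, bs = params_for_task(task_id)
    print(f"task {task_id}: lr={lr}, batch_size={bs}")
    print(list(chunk_for_task(task_id, n_items=10, n_tasks=3)))
    print(" ".join(array_submit_command("sweep.sh", len(GRID), 2, after_ok="12345")))
```

For cancelling in bulk, `scancel <jobid>` cancels an entire array, `scancel <jobid>_<index>` cancels a single array task, and `scancel -u $USER` cancels all of your jobs.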
This workshop introduces checkpointing in HPC workloads. Checkpointing periodically saves the state of a serial or distributed computation to disk, so that a job can be restarted from a checkpoint file after a node or job failure. The workshop includes a hands-on demonstration of using DMTCP to checkpoint batch jobs, job arrays, multithreaded programs, and MPI applications.
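DMTCP checkpoints a running process transparently, without changes to the program. For contrast, the sketch below shows application-level checkpointing, a different technique in which the program itself saves and reloads its state; it is meant only to illustrate the save-periodically, restart-from-file idea. The state layout, the function names, and the simulated failure are hypothetical.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(state: dict, path: Path) -> None:
    """Write state atomically: dump to a temp file, then rename it over the old checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # the rename replaces the old file in one step


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh starting state if no checkpoint exists."""
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}


def run(path: Path, n_steps: int, every: int = 10, fail_at: Optional[int] = None) -> dict:
    """Run n_steps of toy work, checkpointing every `every` steps.

    fail_at simulates a node or job failure by raising at that step.
    """
    state = load_checkpoint(path)
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError(f"simulated failure at step {state['step']}")
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = Path(d) / "demo.ckpt"
        try:
            run(ckpt, 50, fail_at=25)
        except RuntimeError as err:
            print(err)
        print(load_checkpoint(ckpt))  # state from the last checkpoint before the failure
        print(run(ckpt, 50))          # resumes from that checkpoint and finishes
```

Writing to a temporary file and renaming it with `os.replace` means that a failure during a save leaves the previous checkpoint intact rather than a half-written file.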
A general introduction to the GPU architectures available on Oscar, using NGC container images to take advantage of RT cores on higher-end GPUs, and optimizing GPU jobs for better filesystem I/O.