This page describes how to install popular frameworks such as TensorFlow, PyTorch, and JAX on your Oscar account.
Preface: Oscar is a heterogeneous cluster, meaning its nodes have GPUs of different architectures (Pascal, Volta, Turing, and Ampere). We recommend building the environment for the first time on Ampere GPUs with the latest CUDA 11 modules so that it is backward compatible with the older GPU architectures.
In this example, we will install PyTorch (refer to the sub-pages for TensorFlow and JAX).
Step 1: Request an interactive session on a GPU node with Ampere architecture GPUs
interact -q gpu -g 1 -f ampere -m 20g -n 4
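Here, -f selects the node feature (ampere), -q the partition, -g the number of GPUs, -m the memory, and -n the number of cores. The environment only needs to be built on Ampere once.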
Step 2: Once your session has started on a compute node, run nvidia-smi to verify the GPU and then load the appropriate modules
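As a minimal sketch of the verification in Step 2, the snippet below prints the GPU model, total memory, and driver version in one line per GPU. The function name check_gpu is hypothetical and used only for illustration; the fallback messages are there so the snippet also runs cleanly on a node without a GPU. On an Ampere node, the reported name should be an Ampere-generation card.

```shell
#!/usr/bin/env bash
# check_gpu is a hypothetical helper name, not part of Oscar's tooling.
check_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Query only the fields needed to confirm the GPU type.
        nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader \
            || echo "nvidia-smi is installed but could not query a GPU"
    else
        echo "nvidia-smi not found; this is probably not a GPU node"
    fi
}

check_gpu
```

Running plain nvidia-smi shows the same information in its full table form; the query form is just easier to read at a glance or to use in a script.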
Step 3: Create and activate the virtual environment. First unload the pre-loaded modules, then load the cudnn and cuda dependencies
module purge
unset LD_LIBRARY_PATH
module load cudnn cuda