Ampere Architecture GPUs

The new Ampere architecture GPUs on Oscar (A6000s and RTX 3090s)

The new Ampere architecture GPUs do not support older CUDA modules. Users must recompile their applications with the CUDA/11 or newer modules. Here are detailed instructions for setting up major frameworks such as PyTorch and TensorFlow.

PyTorch

Users can install PyTorch with pip in a Python virtual environment or use the pre-built Singularity containers provided by NVIDIA NGC.

To install via virtual environment:

# Make sure none of the LMOD modules are loaded
module purge 
module list

# create and activate the environment
python -m venv pytorch.venv
source pytorch.venv/bin/activate
pip install torch torchvision torchaudio

# test if it can detect GPUs 
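
The listing above ends at the detection step without showing a command. The following is a minimal sketch of one way to run that check, assuming the virtual environment is still active and the shell is on a GPU node (on a node without GPUs, PyTorch is expected to report that CUDA is unavailable). The file name check_gpu.py is a hypothetical choice for illustration.

```shell
# Write a small check script; check_gpu.py is a hypothetical file name.
cat > check_gpu.py <<'EOF'
import torch

print("torch version:", torch.__version__)
print("built against CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

# List each visible GPU with its compute capability.
# The RTX 3090 and A6000 report (8, 6).
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
EOF

# Run it with the venv's interpreter; print a hint instead of aborting on failure.
python check_gpu.py || echo "check failed: is the venv active and torch installed?"
```

If CUDA is reported as unavailable on a GPU node, the "built against CUDA" line is worth checking first: a CPU-only build of PyTorch prints None there.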

To use NGC containers via Singularity:

  • Pull the image from NGC

singularity build pytorch:21.06-py3 docker://nvcr.io/nvidia/pytorch:21.06-py3
  • Export the bind paths to mount the Oscar file system

export SINGULARITY_BINDPATH="/gpfs/home/$USER,/gpfs/scratch/$USER,/gpfs/data/"
  • To use the image interactively

singularity shell --nv pytorch:21.06-py3
  • To submit batch jobs

#!/bin/bash

# Request a GPU partition node and access to 1 GPU
#SBATCH -p 3090-gcondo,gpu --gres=gpu:1

# Ensures all allocated cores are on the same node
#SBATCH -N 1

# Request 2 CPU cores
#SBATCH -n 2
# Request 40 GB of memory and a 10-hour time limit
#SBATCH --mem=40g
#SBATCH --time=10:00:00

# Write job output to <jobid>.out
#SBATCH -o %j.out

export SINGULARITY_BINDPATH="/gpfs/home/$USER,/gpfs/scratch/$USER,/gpfs/data/"
singularity --version

# Use environment from the singularity image
singularity exec --nv pytorch:21.06-py3 python pytorch-cifar100/train.py -net vgg16 -gpu
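
Once the batch script above is saved to a file, it can be submitted with sbatch. The following is a minimal sketch, assuming the hypothetical file name pytorch_job.sh. The check for sbatch is only there so the snippet fails gracefully when run somewhere other than the cluster.

```shell
# pytorch_job.sh is a hypothetical name for the batch script shown above.
JOB_SCRIPT="pytorch_job.sh"

if ! command -v sbatch >/dev/null 2>&1; then
    SUBMIT_MSG="sbatch not found; run this from a node where Slurm is available"
elif JOB_ID=$(sbatch --parsable "$JOB_SCRIPT"); then
    # --parsable prints the job ID, optionally followed by ";cluster"
    JOB_ID=${JOB_ID%%;*}
    # Show the job's current state in the queue
    squeue -j "$JOB_ID"
    # The "-o %j.out" directive sends job output to <jobid>.out
    SUBMIT_MSG="Job $JOB_ID submitted; output goes to ${JOB_ID}.out"
else
    SUBMIT_MSG="sbatch could not submit $JOB_SCRIPT"
fi

echo "$SUBMIT_MSG"
```

The output file appears in the directory from which the job was submitted, so running sbatch from the directory that holds the image and the training code keeps everything in one place.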
