Oscar has two DGX H100 nodes. The H100 is based on the Nvidia Hopper architecture, which accelerates the training of AI models. The two DGX nodes provide better performance when multiple GPUs are used, in particular with Nvidia software like NGC containers.
Multi-Instance GPU (MIG) is not enabled on the DGX H100 nodes.
Each DGX H100 node has 112 Intel CPU cores with 2 TB of memory, and 8 Nvidia H100 GPUs. Each H100 GPU has 80 GB of memory.
The two DGX H100 nodes are in the `gpu-he` partition. To access H100 GPUs, users need to submit jobs to the `gpu-he` partition and request the `h100` feature, i.e.
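A minimal job-script sketch, assuming the feature is requested with SLURM's `--constraint` option (the GPU count and walltime below are placeholders):

```bash
#!/bin/bash
#SBATCH -p gpu-he             # High End GPU partition
#SBATCH --constraint=h100     # request the h100 feature
#SBATCH --gres=gpu:1          # number of H100 GPUs (example value)
#SBATCH -t 01:00:00           # walltime (example value)

nvidia-smi
```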
NGC containers provide the best performance on the DGX H100 nodes. Running a TensorFlow container is an example of running an NGC container.
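As a sketch, pulling and running a TensorFlow NGC container with Singularity might look like the following; the image tag is only an example:

```bash
# pull a TensorFlow image from NGC (example tag)
singularity pull tensorflow.sif docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3

# run a quick GPU check inside the container
singularity exec --nv tensorflow.sif python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```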
The two nodes have Intel CPUs, so Oscar modules can still be loaded and run on the two DGX nodes.
Oscar has two Grace Hopper GH200 GPU nodes. Each node combines the Nvidia Grace Arm CPU with the Hopper GPU architecture.
Each GH200 node has 72 Arm cores with 550 GB of memory. Multi-Instance GPU (MIG) is enabled on only one GH200 node, which has 4 MIGs. The other GH200 node does not have MIG enabled and has only one GPU. Both CPU and GPU threads on GH200 nodes can concurrently and transparently access both CPU and GPU memory.
The two GH200 nodes are in the `gracehopper` partition.
A gk-condo user can submit jobs to the GH200 nodes with their `gk-gh200-gcondo` account, i.e.,
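A minimal sketch of the relevant batch directives (the GPU count is a placeholder):

```bash
#SBATCH -p gracehopper        # Grace Hopper partition
#SBATCH -A gk-gh200-gcondo    # gk-condo account
#SBATCH --gres=gpu:1          # number of GPUs (example value)
```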
Users who are not gk-condo users need a High End GPU priority account to access the `gracehopper` partition and GH200 nodes. Those users need to submit jobs to the nodes with the `ccv-gh200-gcondo` account, i.e.
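Again as a sketch (the GPU count is a placeholder):

```bash
#SBATCH -p gracehopper         # Grace Hopper partition
#SBATCH -A ccv-gh200-gcondo    # High End GPU priority account
#SBATCH --gres=gpu:1           # number of GPUs (example value)
```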
To request a MIG, the feature `mig` needs to be specified, i.e.
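Assuming the feature is requested with SLURM's `--constraint` option, the directives might look like:

```bash
#SBATCH -p gracehopper
#SBATCH -A ccv-gh200-gcondo
#SBATCH --constraint=mig       # request a MIG instance
#SBATCH --gres=gpu:1
```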
NGC containers provide the best performance on the GH200 nodes. Running a TensorFlow container is an example of running an NGC container.
An NGC container must be built on a GH200 node for the container to run on the GH200 nodes.
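For example, the image can be pulled directly on a GH200 node so that the Arm (aarch64) build of the image is selected; the tag below is only an example:

```bash
# run this on a GH200 node so the arm64 image layers are pulled
singularity pull tensorflow-gh200.sif docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3
```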
The two nodes have Arm CPUs, so Oscar modules do not run on the two GH200 nodes. Please contact support@ccv.brown.edu about installing and running modules on GH200 nodes.
The new Ampere architecture GPUs on Oscar (A6000s and RTX 3090s)
The new Ampere architecture GPUs do not support older CUDA modules. Users must re-compile their applications with CUDA 11 or newer modules. Here are detailed instructions to compile major frameworks such as PyTorch and TensorFlow.
Users can install PyTorch in a pip virtual environment or use pre-built Singularity containers provided by Nvidia NGC.
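A minimal sketch of the virtual-environment route (the Python module name and package versions are examples):

```bash
# load a Python module (exact name/version may differ)
module load python

# create and activate a virtual environment
python -m venv ~/pytorch.venv
source ~/pytorch.venv/bin/activate

# install a CUDA-enabled PyTorch build
pip install torch torchvision
```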
Pull the image from NGC
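For example, pulling a PyTorch image with Singularity (the tag is an example):

```bash
# pull a PyTorch image from NGC (example tag)
singularity pull pytorch.sif docker://nvcr.io/nvidia/pytorch:24.03-py3
```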
Export PATHs to mount the Oscar file system
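A sketch using the `SINGULARITY_BINDPATH` environment variable; the file-system paths below are examples and should match Oscar's actual mount points:

```bash
# make Oscar file systems visible inside the container (paths are examples)
export SINGULARITY_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data"
```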
To use the image interactively
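For instance, starting an interactive shell inside the image pulled above:

```bash
# start an interactive shell in the container with GPU support
singularity shell --nv pytorch.sif
```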
To submit batch jobs
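A minimal batch-script sketch (partition, walltime, and script name are placeholders):

```bash
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH -t 01:00:00

# run the training script inside the container (script name is an example)
singularity exec --nv pytorch.sif python train.py
```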
To view the various GPUs available on Oscar, use the command `nodes gpu`.
To start an interactive session on a GPU node, use the `interact` command and specify the `gpu` partition. You also need to specify the requested number of GPUs using the `-g` option:
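For example, assuming `interact` accepts `-q` for the partition (the walltime is a placeholder):

```bash
interact -q gpu -g 1 -t 01:00:00
```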
To start an interactive session on a particular GPU type (QuadroRTX, 1080ti, p100, etc.), use the feature `-f` option:
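A sketch; the feature string should match one of the GPU types listed by `nodes gpu`:

```bash
interact -q gpu -g 1 -f quadrortx
```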
For production runs, please submit a batch job to the `gpu` partition, e.g. for using 1 GPU:
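For example, on the command line (the script name is a placeholder):

```bash
sbatch -p gpu --gres=gpu:1 my_job.sh
```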
These options can also be specified inside the batch script:
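For instance:

```bash
#SBATCH -p gpu
#SBATCH --gres=gpu:1
```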
You can view the status of the `gpu` partition with:
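On Oscar this is commonly done with the `allq` utility (assumed available); the standard SLURM command also works:

```bash
allq gpu          # Oscar queue summary for the gpu partition
sinfo -p gpu      # node availability in the gpu partition
```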
Sample batch script for CUDA program:
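A minimal sketch (module name, walltime, and executable name are placeholders):

```bash
#!/bin/bash
#SBATCH -p gpu                # gpu partition
#SBATCH --gres=gpu:1          # one GPU
#SBATCH -t 00:30:00           # walltime (example value)
#SBATCH -J cuda-example       # job name (example)

# load a CUDA module (exact version may differ)
module load cuda

# run the compiled CUDA executable (name is an example)
./my_cuda_program
```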
While you can program GPUs directly with CUDA, a language and runtime library from NVIDIA, this can be daunting for programmers who do not have experience with C or with the details of computer architecture.
You may find the easiest way to tap the computation power of GPUs is to link your existing CPU program against numerical libraries that target the GPU:
CUBLAS is a drop-in replacement for BLAS libraries that runs BLAS routines on the GPU instead of the CPU.
CULA is a similar library for LAPACK routines.
MAGMA combines custom GPU kernels, CUBLAS, and a CPU BLAS library to use both the GPU and CPU simultaneously; it is available in the `magma` module on Oscar.
Matlab has a GPUArray feature, available through the Parallel Computing Toolbox, for creating arrays on the GPU and operating on them with many built-in Matlab functions. The toolbox is licensed by CIS and is available to any Matlab session running on Oscar or workstations on the Brown campus network.
OpenACC is a portable, directive-based parallel programming construct. You can parallelize loops and code segments simply by inserting directives - which are ignored as comments if OpenACC is not enabled while compiling. It works on CPUs as well as GPUs. We have the PGI compiler suite installed on Oscar which has support for compiling OpenACC directives. To get you started with OpenACC:
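As a sketch, compiling a C program that contains OpenACC directives with the PGI compiler (the module and file names are examples):

```bash
# load the PGI compiler module (exact module name may differ)
module load pgi

# compile with OpenACC enabled and report what was offloaded to the GPU
pgcc -acc -Minfo=accel saxpy.c -o saxpy
```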
NVLink enables GPUs to pool memory over high-speed links (25 GB/s). This will increase the performance of your application code.
Nodes `gpu[1210,1211,1212]` have 4 fully connected NVLink (SXM2) V100 GPUs.
To submit an interactive job to NVLink-enabled GPU nodes:
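A sketch, assuming the V100 GPUs are requested via a `v100` feature:

```bash
interact -q gpu -g 1 -f v100
```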
To submit batch jobs, add the following line to your batch script:
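Assuming the same `v100` feature name:

```bash
#SBATCH --constraint=v100
```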