H100 NVL Tensor Core GPUs
Oscar has two DGX H100 nodes. The H100 is based on the Nvidia Hopper architecture, which accelerates the training of AI models. The two DGX nodes provide better performance when multiple GPUs are used, in particular with Nvidia software like NGC containers.
Multi-Instance GPU (MIG) is not enabled on the DGX H100 nodes.
Hardware Specifications
Each DGX H100 node has 112 Intel CPU cores, 2 TB of memory, and 8 Nvidia H100 GPUs. Each H100 GPU has 80 GB of memory.
Access
The two DGX H100 nodes are in the gpu-he partition. To access H100 GPUs, users need to submit jobs to the gpu-he partition and request the h100 feature.
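A minimal Slurm batch script sketch for requesting an H100 GPU. The partition name and the h100 feature come from this page; the GPU count, walltime, and the use of `--gres` are illustrative assumptions and should be adjusted to your job.

```shell
#!/bin/bash
#SBATCH -p gpu-he            # partition containing the DGX H100 nodes
#SBATCH --constraint=h100    # request the h100 feature
#SBATCH --gres=gpu:1         # one GPU (each node has up to 8; amount is illustrative)
#SBATCH -t 01:00:00          # walltime (illustrative)

# Confirm an H100 was allocated to the job.
nvidia-smi
```

The same constraint can be used with interactive jobs by passing the equivalent flags to your interactive submission command.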
Running NGC Containers
NGC containers provide the best performance on the DGX H100 nodes. Running TensorFlow containers is one example of running NGC containers.
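As a sketch of the workflow, the commands below pull a TensorFlow NGC container and run it with GPU support using Apptainer. The container tag is an example only (check the NGC catalog for current releases), and the assumption that Apptainer is the container runtime on Oscar should be verified against the cluster's container documentation.

```shell
# Pull a TensorFlow image from the NGC registry (tag is illustrative).
apptainer pull tensorflow.sif docker://nvcr.io/nvidia/tensorflow:24.01-tf2-py3

# Run a command inside the container; --nv exposes the host GPUs.
apptainer exec --nv tensorflow.sif \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```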
Running Oscar Modules
The two nodes have Intel CPUs, so existing Oscar modules can still be loaded and run on the two DGX nodes.
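For example, a module can be loaded inside a job on a DGX node just as on any other Oscar node. The module name and version below are illustrative; use `module avail` to see what is installed.

```shell
# List available modules, then load one (name/version are illustrative).
module avail python
module load python/3.11.0
python --version
```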