H100 NVL Tensor Core GPUs
Oscar has two DGX H100 nodes. The H100 is based on the Nvidia Hopper architecture, which accelerates the training of AI models. The two DGX nodes provide better performance when multiple GPUs are used, in particular with Nvidia software like NGC containers.
Multi-Instance GPU (MIG) is not enabled on the DGX H100 nodes.
Hardware Specifications
Each DGX H100 node has 112 Intel CPU cores, 2 TB of memory, and 8 Nvidia H100 GPUs. Each H100 GPU has 80 GB of memory.
Access
The two DGX H100 nodes are in the gpu-he partition. To access H100 GPUs, users need to submit jobs to the gpu-he partition and request the h100 feature.
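A minimal Slurm batch script sketch for requesting an H100 GPU. The partition name and the h100 feature come from this page; the GPU count, walltime, and the use of `--gres` are illustrative assumptions and should be adjusted to your job.

```shell
#!/bin/bash
#SBATCH -p gpu-he            # partition containing the DGX H100 nodes
#SBATCH --constraint=h100    # request the h100 feature
#SBATCH --gres=gpu:1         # one GPU (each node has up to 8; amount is illustrative)
#SBATCH -t 01:00:00          # walltime (illustrative)

# Confirm an H100 was allocated to the job.
nvidia-smi
```

The same constraint can be used with interactive jobs by passing the equivalent flags to your interactive submission command.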
Running NGC Containers
NGC containers provide the best performance on the DGX H100 nodes. Running TensorFlow containers is one example of running NGC containers.
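As a sketch of the workflow, the commands below pull a TensorFlow NGC container and run it with GPU support using Apptainer. The container tag is an example only (check the NGC catalog for current releases), and the assumption that Apptainer is the container runtime on Oscar should be verified against the cluster's container documentation.

```shell
# Pull a TensorFlow image from the NGC registry (tag is illustrative).
apptainer pull tensorflow.sif docker://nvcr.io/nvidia/tensorflow:24.01-tf2-py3

# Run a command inside the container; --nv exposes the host GPUs.
apptainer exec --nv tensorflow.sif \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```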
Running Oscar Modules
The two nodes have Intel CPUs, so existing Oscar modules can still be loaded and run on the two DGX nodes.
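For example, a module can be loaded inside a job on a DGX node just as on any other Oscar node. The module name and version below are illustrative; use `module avail` to see what is installed.

```shell
# List available modules, then load one (name/version are illustrative).
module avail python
module load python/3.11.0
python --version
```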