The Oscar GPUs are in a separate partition to the regular compute nodes. The partition is called gpu. To see how many jobs are running and pending in the gpu partition, use
To start an session on a GPU node, use the interact command and specify the gpu partition. You also need to specify the requested number of GPUs using the -g option:
interact -q gpu -g 1
Here is an example batch script for a cuda job that uses 1 gpu and 1 cpu for 5 minutes
# Request a GPU partition node and access to 1 GPU
#SBATCH -p gpu --gres=gpu:1
# Request 1 CPU core
#SBATCH -n 1
#SBATCH -t 00:05:00
# Load a CUDA module
module load cuda
# Run program
To submit this script:
DGX GPU Nodes in the GPU-HE Partition
All the nodes in the gpu-he partition have V100 GPUs. However, two of them are DGX nodes (gpu1404/1405) which have 8 GPUs. When a gpu-he job requests for more than 4 GPUs, the job will automatically be allocated to the DGX nodes.
The other non-DGX nodes actually have a better NVLink interconnect topology as all of them have direct links to the other. So the non-DGX nodes are better for a gpu-he job if the job does not require more than 4 GPUs.