Accessing Oscar Filesystem

By default, a Singularity image only has access to a limited set of paths once created. Without any special configuration, your $HOME (~/) and /tmp/ (among a few other system-specific locations) are accessible from within a container. However, your data/ and scratch/ directories are not automatically bound and thus will not be accessible. The easiest way to gain access to these directories is to use the bind functionality to mount these volumes into the container at runtime.

Binding Using Command Line Arguments

Binding is achieved using the --bind or -B argument followed by <hostPath>:<containerPath>. When only a source path is given, it is bound to the same location inside the container. For example:

$ singularity shell -B /oscar/home/$USER,/oscar/scratch/$USER,/oscar/data <yourContainer.simg>

This will bind /oscar/home/$USER, /oscar/scratch/$USER and /oscar/data from Oscar's GPFS to the same paths within the container. Doing this will allow any existing links you have to your data and scratch directories to function properly.

Binding Using Environment Variables

An alternative approach is to use the SINGULARITY_BINDPATH environment variable, which holds a comma-separated list of additional bind paths that will be included in any singularity command you execute, including run and shell. Using the environment variable instead of the command line argument, the example above becomes:

export SINGULARITY_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data"
singularity run <yourContainer.simg>

You can add various additional options to configure the read/write permissions for these mounted volumes. For more information regarding file or path binds, please see the Mounting and binding section of the official Singularity documentation.
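For example, appending :ro to a bind specification mounts that path read-only inside the container (a minimal sketch; the container-side path shown is just an illustration):

$ singularity shell -B /oscar/data:/oscar/data:ro <yourContainer.simg>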


Example Container (TensorFlow)

There are multiple ways to install and run TensorFlow. Our recommended approach is via NGC containers, which are available from the NGC Registry. In this example we will pull the TensorFlow NGC container.

  1. Build the container:

apptainer build tensorflow-24.03-tf2-py3.simg docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3

This process will take some time, and once it completes, you should see a .simg file.

Working with Apptainer images requires a significant amount of storage space. By default, Apptainer will use ~/.apptainer as a cache directory, which may exceed your home quota. You can set temporary directories as follows:

export APPTAINER_CACHEDIR=/tmp
export APPTAINER_TMPDIR=/tmp
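If /tmp is too small for a large image, you can instead point these variables at your scratch directory (the subdirectory names below are illustrative, not required):

export APPTAINER_CACHEDIR=/oscar/scratch/$USER/apptainer_cache
export APPTAINER_TMPDIR=/oscar/scratch/$USER/apptainer_tmp
mkdir -p $APPTAINER_CACHEDIR $APPTAINER_TMPDIR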

  2. Once the container is ready, request an interactive session with a GPU:

interact -q gpu -g 1 -f ampere -m 20g -n 4

  3. To run a container with GPU support:

export APPTAINER_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data"
# Run a container with GPU support
apptainer run --nv tensorflow-24.03-tf2-py3.simg

The --nv flag is important, as it enables the NVIDIA sub-system inside the container.

  4. Or, if you're executing a specific command inside the container:

# Execute a command inside the container with GPU support
$ apptainer exec --nv tensorflow-24.03-tf2-py3.simg nvidia-smi

  5. Make sure your TensorFlow image is able to detect GPUs:

$ python
>>> import tensorflow as tf
>>> tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
True
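Note that tf.test.is_gpu_available is deprecated in recent TensorFlow 2 releases; an equivalent check is tf.config.list_physical_devices (the exact device name in the output below is illustrative):

>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]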

  6. If you need to install additional custom packages, note that the containers themselves are not writable. However, you can use pip's --user flag to install packages into ~/.local. For example:

Apptainer> pip install <package-name> --user
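For context, a typical flow might look like the following (the package name here is only an example); because your home directory is bound into the container, anything installed under ~/.local persists across container runs:

$ apptainer shell --nv tensorflow-24.03-tf2-py3.simg
Apptainer> pip install --user scipy
Apptainer> python -c "import scipy; print(scipy.__version__)"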

Slurm Script:

Here's how you can run your container inside a SLURM batch job using the srun command. Below is a basic example; submit it with sbatch:

#!/bin/bash
#SBATCH --nodes=1               # node count
#SBATCH -p gpu --gres=gpu:1     # number of gpus per node
#SBATCH --ntasks-per-node=1     # number of tasks per node
#SBATCH --cpus-per-task=1       # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G        # memory per cpu-core (4 GB per cpu-core is the default)
#SBATCH -t 01:00:00             # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin       # send email when job begins
#SBATCH --mail-type=end         # send email when job ends
#SBATCH --mail-user=<USERID>@brown.edu

module purge
unset LD_LIBRARY_PATH
srun apptainer exec --nv tensorflow-24.03-tf2-py3.simg python examples/tensorflow_examples/models/dcgan/dcgan.py
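Assuming the script above is saved as, say, tensorflow_job.sh (the filename is arbitrary), you can submit it and monitor the job with:

$ sbatch tensorflow_job.sh
$ squeue -u $USER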