Installing TensorFlow

Setting up a GPU-accelerated environment can be challenging due to driver dependencies, version conflicts, and other complexities. Apptainer simplifies this process by encapsulating all of these details inside the container image.

Apptainer Using NGC Containers (Our #1 Recommendation)

There are multiple ways to install and run TensorFlow. Our recommended approach is to use NGC containers, which are available from the NGC Registry. In this example we will pull the TensorFlow NGC container.

  1. Build the container:

apptainer build tensorflow-24.03-tf2-py3.simg docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3

This will take some time, and once it completes you should see a .simg file.

For your convenience, pre-built container images are available in the following directory:

/oscar/runtime/software/external/ngc-containers/tensorflow.d/x86_64/

You can choose either to build your own or use one of the pre-downloaded images.

Working with Apptainer images requires a lot of storage space. By default, Apptainer uses ~/.apptainer as its cache directory, which can push you over your Home quota. To avoid this, point the cache and temporary directories at /tmp:

export APPTAINER_CACHEDIR=/tmp
export APPTAINER_TMPDIR=/tmp
  2. Once the container is ready, request an interactive session with a GPU:

interact -q gpu -g 1 -f ampere -m 20g -n 4
  3. Run the container with GPU support:

export APPTAINER_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data"
# Run a container with GPU support
apptainer run --nv tensorflow-24.03-tf2-py3.simg

The --nv flag is important, as it enables the NVIDIA sub-system inside the container.

  4. Or, if you're executing a specific command inside the container:

# Execute a command inside the container with GPU support
$ apptainer exec --nv tensorflow-24.03-tf2-py3.simg nvidia-smi
  5. Make sure your TensorFlow image is able to detect GPUs:

$ python
>>> import tensorflow as tf
>>> tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
True
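
Note that tf.test.is_gpu_available is deprecated in TensorFlow 2.x in favor of tf.config.list_physical_devices. The following sketch (a hypothetical check_gpu.py, assuming the TF 2.x container above) performs the same check and runs a small computation on the GPU:

# check_gpu.py -- hypothetical verification script; run it inside the container:
#   apptainer exec --nv tensorflow-24.03-tf2-py3.simg python check_gpu.py
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list usually means the --nv flag
# was omitted or no GPU was allocated to the job.
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

# Run a small matrix multiplication to confirm kernels execute on the GPU.
if gpus:
    with tf.device("/GPU:0"):
        c = tf.matmul(tf.random.normal((1024, 1024)), tf.random.normal((1024, 1024)))
    print("Result computed on:", c.device)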
  6. If you need to install additional packages: the container itself is not writable, but you can use pip's --user flag to install packages into ~/.local. For example:

Apptainer> pip install <package-name> --user
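
To confirm where pip's --user flag installs packages, and that this location is on the container's Python path, you can run a quick check like the following sketch (it assumes your home directory is bind-mounted as shown earlier):

# Run inside the container's Python interpreter.
import site
import sys

# `pip install --user` writes here (typically under ~/.local).
print("User site-packages:", site.getusersitepackages())
print("Included in sys.path:", site.getusersitepackages() in sys.path)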

Slurm Script:

You can also run your container from a Slurm batch script, using the srun command to launch it. Here is a basic example:

#!/bin/bash
#SBATCH --nodes=1               # node count
#SBATCH -p gpu --gres=gpu:1     # gpu partition, number of gpus per node
#SBATCH --ntasks-per-node=1     # total number of tasks across all nodes
#SBATCH --cpus-per-task=1       # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=40G               # total memory (4 GB per cpu-core is default)
#SBATCH -t 01:00:00             # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin       # send email when job begins
#SBATCH --mail-type=end         # send email when job ends
#SBATCH --mail-user=<USERID>@brown.edu

module purge
unset LD_LIBRARY_PATH
export APPTAINER_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data"
srun apptainer exec --nv tensorflow-24.03-tf2-py3.simg python examples/tensorflow_examples/models/dcgan/dcgan.py
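
The dcgan.py path above is just an example script. If you prefer a self-contained test, a minimal script such as the hypothetical tf_smoke_test.py below (a sketch assuming TensorFlow 2.x and the Keras API bundled with the container) can be passed to srun instead; because APPTAINER_BINDPATH includes your home and scratch directories, the script can live anywhere under those paths.

# tf_smoke_test.py -- hypothetical smoke test: trains a tiny model on random data
# to confirm a full training step runs on the allocated GPU.
import numpy as np
import tensorflow as tf

print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

# Small random regression dataset so the job finishes in seconds.
x = np.random.rand(1024, 32).astype("float32")
y = np.random.rand(1024, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2, batch_size=128, verbose=2)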
