To request a priority account or a condo, use the account form on the CCV homepage. For more information on resources available to priority accounts and costs, visit the CCV Rates page.
What username and password should I be using?
If you are at Brown and have requested a regular CCV account, you log into Oscar with your Brown credentials, i.e., the same username and password that you use to log into any Brown service, such as Canvas.
If you are an external user, you will have to get a sponsored ID at Brown through the department with which you are associated before requesting an account on Oscar. Once you have the sponsored ID at Brown, you can request an account on Oscar and use your Brown username and password to log in.
Changing Passwords
Oscar users should use their Brown passwords to log into Oscar. Users should change their Brown passwords at .
Exploratory Account
Exploratory accounts are available to all members of the Brown community for free.
See the CCV Rates page for a detailed description of the resources.
Jobs are submitted to the batch partition. See the System Hardware page for available hardware.
Priority Accounts
The following accounts are billed quarterly and offer more computational resources than the exploratory accounts. See the CCV Rates page for pricing and a detailed description of the resources.
HPC Priority
Intended for users running CPU-intensive jobs. These offer more CPU and memory resources than an exploratory account.
Two types of accounts:
HPC Priority
Standard GPU Priority
Intended for users running GPU intensive jobs. These accounts offer fewer CPU and memory resources but more GPU resources than an exploratory account.
Two types of accounts:
Standard GPU Priority
High End GPU Priority
Intended for GPU jobs requiring high-end GPUs. These offer the same number of CPUs as Standard GPU Priority accounts.
High-end GPUs such as the A40, V100, and A6000 are available.
See the CCV Rates page for pricing and a detailed description of the resources.
Large Memory Priority
Intended for jobs requiring large amounts of memory.
These accounts offer 2TB of memory and twice the wall-time of exploratory accounts.
See the CCV Rates page for pricing and a detailed description of the resources.
Condo
PIs who purchase hardware (compute nodes) for the CCV machine get a Condo account. Condo account users have the highest priority on the number of cores equivalent to the hardware they purchased. Condo accounts last for five years and give their owners access to 25% more CPU cores than they purchase for the first three years of their lifespan. GPU resources do not decrease over the lifetime of the condo.
Investigators may also purchase condos to grant access to computing resources for others working with them. After a condo is purchased, users can request to join the condo group through the "Request Access to Existing Condo" option on the account form on the CCV homepage.
Oscar - Brown University's Cluster
Oscar - Ocean State Center for Advanced Resources - is Brown University's high-performance computing cluster for both research and classes. Oscar is maintained and supported by the Center for Computation and Visualization (CCV).
Please contact [email protected] if there are any questions on Oscar.
Accounts
If you do not have an Oscar account, you can request one by clicking the following link
HPC Priority+ (Twice the resources of HPC Priority)
See the CCV Rates page for pricing and detailed description of the resources.
Jobs are submitted to the batch partition. See the System Hardware page for available hardware
Standard GPU Priority+ (Twice the resources of Standard GPU Priority)
See the CCV Rates page for pricing and detailed description of the resources.
Jobs are submitted to the gpu partition. See the System Hardware page for available GPU hardware
Jobs are submitted to the gpu-he partition. See the System Hardware page for available GPU hardware
Jobs are submitted to the bigmem partition. See the System Hardware page for available hardware
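The partition is selected in the batch script with the `-p` directive; a minimal sketch follows (resource values are illustrative, and `my_program` is a placeholder executable):

```shell
#!/bin/bash
# Target the gpu partition; use batch, gpu-he, or bigmem as appropriate
#SBATCH -p gpu
#SBATCH --gres=gpu:1      # number of GPUs (gpu/gpu-he partitions only)
#SBATCH --time=1:00:00
#SBATCH --mem=8G

./my_program
```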
Individuals external to Brown can get access to Oscar by having a sponsored Brown account. Please work with your department to get sponsored Brown accounts for any external collaborators.
Authorized users must comply with the following Brown University policies:
Users can run compute-intensive and/or long-running programs on Oscar to take advantage of its high-performance computing resources, as highlighted below:
Users should not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your codes, manage files, and launch jobs on the compute nodes.
To ensure fair access to Oscar, there are limits on the maximum number of pending and running jobs a user account may have:
CCV staff install software upon request and help users with software installation.
Storage
Oscar has 3.27PB of storage on IBM's General Parallel File System (GPFS), which provides high-performance access to storage. Users have Home, Scratch, and Data directories with quotas on Oscar. Please refer to Oscar's filesystem page for details.
Access and User Accounts - User accounts are controlled via central authentication and directories on HPC are only deleted on the request of the user, PI, or departmental chair.
Files not accessed for 30 days will be deleted from your scratch directory. Use ~/data for files you wish to keep long term.
non-disruptive work, including software changes, maintenance, and testing
may occur at any time
no notification provided
Monthly Scheduled Maintenance:
no downtime expected, but there may be limited degradation of performance
first Tuesday of the month, 8:00 am - 12:00 noon
Unscheduled Maintenance:
maximum 1 day downtime
occurs very rarely and includes any unplanned emergency issues that arise
Major Upgrade Maintenance:
service may be brought down for 3-5 days
occurs annually
Unplanned Outage
During Business Hours:
Send email to [email protected]. A ticket will get created and CCV staff will attempt to address the issue as soon as possible.
During Non-Business Hours:
Send email to [email protected].
Call CIS Operations Center at (401) 863-7562. A ticket will get created and CCV staff will be contacted to address the issue.
User and Research Support
CCV staff provide support for researchers seeking help with statistical modeling, machine learning, data mining, data visualization, computational biology, high-performance computing, and software engineering.
CCV staff provide tutorials on using Oscar for classes, groups, and individuals. Please check CCV Events for upcoming trainings and office hours.
CCV provides short videos (coming soon) for users to learn as well.
If you publish research that benefited from the use of CCV services or resources, we would greatly appreciate an acknowledgment that states:
Best Practices for I/O
Efficient I/O is essential for good performance in data-intensive applications. Often, the file system is a substantial bottleneck on HPC systems, because CPU and memory technology has improved much more drastically in the last few decades than I/O technology.
Parallel I/O libraries such as MPI-IO, HDF5 and netCDF can help parallelize, aggregate and efficiently manage I/O operations. HDF5 and netCDF also have the benefit of using self-describing binary file formats that support complex data models and provide system portability. However, some simple guidelines can be used for almost any type of I/O on Oscar:
Try to aggregate small chunks of data into larger reads and writes.
For the GPFS file systems, reads and writes in multiples of 512KB provide the highest bandwidth.
Avoid using ASCII representations of your data. They will usually require much more space to store, and require conversion to/from binary when reading/writing.
Avoid creating directory hierarchies with thousands or millions of files in a directory. This causes a significant overhead in managing file metadata.
While it may seem convenient to use a directory hierarchy for managing large sets of very small files, this causes severe performance problems due to the large amount of file metadata. A better approach might be to implement the data hierarchy inside a single HDF5 file using HDF5's grouping and dataset mechanisms. This single data file would exhibit better I/O performance and would also be more portable than the directory approach.
Software on Oscar
Many scientific and HPC software packages are already installed on Oscar, and additional packages can be requested by submitting a ticket to [email protected]. If you want a particular version of the software, do mention it in the email along with a link to the web page from where it can be downloaded. You can also install your own software on Oscar.
CCV cannot, however, supply funding for the purchase of commercial software. This is normally attributed as a direct cost of research, and should be purchased with research funding. CCV can help in identifying other potential users of the software to potentially share the cost of purchase and maintenance. Several commercial software products that are licensed campus-wide at Brown are available on Oscar.
For software that requires a Graphical User Interface (GUI) we recommend using CCV's VNC Client rather than X-Forwarding.
SSH Agent Forwarding
How to forward local ssh keys to Oscar
SSH provides a method of sharing the ssh keys on your local machine with Oscar. This feature is called Agent Forwarding and can be useful, for instance, when working with version control or other services that authenticate via ssh keys. Below are instructions on how to configure your SSH connection to forward ssh-agent for different operating systems.
This research [Part of this research] was conducted using [computational/visualization]
resources and services at the Center for Computation and Visualization, Brown University.
Matlab Batch Jobs
Matlab can be used within a batch script. Here is an example batch script for running a serial Matlab program on an Oscar compute node:
#!/bin/bash
# Request an hour of runtime:
#SBATCH --time=1:00:00
# Default resources are 1 core with 2.8GB of memory.
# Use more memory (4GB):
#SBATCH --mem=4G
# Specify a job name:
#SBATCH -J MyMatlabJob
# Specify an output file
#SBATCH -o MyMatlabJob-%j.out
#SBATCH -e MyMatlabJob-%j.out
# Run a matlab function called 'foo.m' in the same directory as this batch script.
matlab -r "run foo.m; exit"
This is also available in your home directory as the file:
~/batch_scripts/matlab-serial.sh
Note the exit command at the end; it is very important to include it either there or in the Matlab function/script itself. If you don't make Matlab exit the interpreter, it will keep waiting for the next command until SLURM cancels the job when the requested walltime runs out. For example, if you requested 4 hours of walltime and your program completes in 1 hour, the SLURM job will still run for the full 4 hours, idling cores, wasting resources, and blocking your other jobs.
If the name of your batch script file is matlab-serial.sh, the batch job can be submitted using the following command:
sbatch matlab-serial.sh
no notification provided
Prior notification provided (depending on the issue, 1 day to 4 weeks advance notice provided)
SSH Agent Forwarding on a Windows system using PuTTY, with an example application to git.
Agent Forwarding with PuTTY
1. After adding your private key to Pageant, open PuTTY and navigate to the Auth menu.
2. Check the 'Allow agent forwarding' checkbox, and return to the Session menu.
3. Enter the Host Name you usually use to connect to Oscar, and click 'Open'.
4. Enter your password. If you have ssh keys set up on your local computer to connect to GitHub, you can confirm your ssh-agent was properly forwarded by checking GitHub. If the ssh command fails, your agent has not been properly forwarded.
Open OnDemand
Open OnDemand (OOD) is a web portal to the Oscar computing cluster. An Oscar account is required to access Open OnDemand. Visit this link in a web browser and sign in with your Brown username and password to access this portal.
OOD provides several resources for interacting with Oscar.
Use the file browser in the portal to view, copy, download, or delete files on Oscar.
Launch interactive apps, like Matlab and Jupyter Notebook, inside your web browser.
Use a terminal in your browser without needing a separate terminal emulator. This is especially handy for Windows users, since you do not need to install a separate program.
Features:
No installation needed. Just use your favorite browser!
No need to enter your password again. Connect in seconds!
No need to use two-factor authentication multiple times. Just do it once, when you log into OOD.
Getting Help
Here are some ways to get help with using Oscar.
Filing a Support Ticket
Filing a good support ticket makes it much easier for CCV staff to address your request.
State the problem/request in the subject of the email
Describe which software and version you are using
Error message (if there was one)
The job number
How you were running, e.g. batch, interactively, vnc
Give as small an example as possible that reproduces the problem
Q&A Forum
Ask questions and search for previous problems at our Q&A forum.
Office Hours
CCV holds weekly office hours. These are drop-in sessions where we'll have one or more CCV staff members available to answer questions and help with any problems you have. Please visit CCV Events for upcoming office hours and events.
Arrange a Meeting
You can arrange to meet with a CCV staff member in person to go over difficult problems, or to discuss how best to use Oscar. Email [email protected] to arrange a consultation.
Interactive Apps on OOD
You can launch several different apps on the Open OnDemand (OOD) interface. All of these apps start a Slurm batch job on the Oscar cluster with the requested amount of resources. These jobs can access the filesystem on Oscar, and all output files are written to Oscar's filesystem.
Launching an App on OOD
Open Open OnDemand in any browser of your choice.
If prompted, enter your Brown username and password.
Click on the "Interactive Apps" tab at the top of the screen to see the list of available apps. This will open the form to enter the details of the job.
Follow the instructions on the form to complete it. Some fields can be left blank, and OOD will choose the default option for you.
Click Launch to submit an OOD job. This will open a new tab in the browser. It may take a few minutes for this job to start.
Click "Launch <APP>" again if prompted in the next tab.
SLURM limits on resources such as CPUs, memory, GPUs, or time for each partition still apply to OOD jobs. Please keep these in mind when choosing options on the OOD form.
When submitting a batch job from a terminal in the Desktop app or the Advanced Desktop app, users need to
run "unset SLURM_MEM_PER_NODE" before submitting a job if the job needs to specify --mem-per-cpu.
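For example, in a terminal inside the Desktop app (job.sh is a placeholder for your batch script, and the memory value is illustrative):

```shell
# Clear the per-node memory limit inherited from the Desktop app's own job
unset SLURM_MEM_PER_NODE
# Now --mem-per-cpu can be specified
sbatch --mem-per-cpu=4G job.sh
```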
Managing Modules
Modules
Command
Description
module list
Using RStudio
RStudio is an IDE for R that can be run on Oscar.
Launching RStudio
Open the Open On Demand Dashboard by following this link. Select RStudio (under "Default GUI's"). Fill in the form to allocate the required resources, and optionally select your R modules. Finally, click the "Launch Session" button.
Known Issues
Plotting figures may not work within RStudio. If this is the case, save the plots to a file, and view them through the Open On Demand Desktop App. If plots are required for your task, launch RStudio through the Desktop App.
To learn about using the Open On Demand Desktop App, look .
Miniconda
The Miniconda modules include only conda, mamba, python, and a few other packages. Users can use either mamba (preferred) or conda to install packages in their own conda environments.
Mamba is a drop-in replacement of conda, and is faster than conda.
Only activating and deactivating a conda environment still requires conda
For all other commands, conda can be replaced with mamba.
Setting Job Submission Settings
We have provided templates for you to use for job submission settings. These templates are in /gpfs/runtime/opt/forge/19.1.2/templates
Click Run and debug a program to open the following menu
Click Configure next to Submit to Queue and enter /gpfs/runtime/opt/forge/19.1.2/templates/slurm-ccv.qtf as the Submission template file
slurm-ccv.qtf lets you specify the total number of tasks. The number of tasks may not be equal for each node. This option will give the shortest time in the queue, but may not give you consistent run times.
slurm-ccv-mpi.qtf is for MPI jobs where you want to specify number of nodes and tasks per node
slurm-ccv-threaded.qtf is for threaded (single node) jobs
Mac/Linux/Windows(PowerShell)
Step 1 : Check for existing SSH key pair
Before generating a new SSH key pair, first check if you have an SSH key on your local machine.
If there are existing keys, please move to Step 3
Using Python or Conda environments in the Jupyter App
We recommend that all users install Python packages within an environment. This can be a Conda environment or a Python virtual environment. More information can be found on the Installing Python Packages page. Follow these steps to use such environments in the Jupyter app.
Python Environments:
Managing Jobs
Listing running and queued jobs
The squeue command will list all jobs scheduled in the cluster. We have also written wrappers for squeue on Oscar that you may find more convenient:
Installing JAX
This page describes how to install JAX with Python virtual environments
In this example, we will install Jax.
Step 1: Request an interactive session on a GPU node with Ampere architecture GPUs
Here, -f = feature. We only need to build on Ampere once.
Step 2: Once your session has started on a compute node, run nvidia-smi to verify the GPU and then load the appropriate modules
Step 3: Create and activate the virtual environment
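The three steps above can be sketched as follows. The interact flag values and module versions are illustrative assumptions — check `module avail` for what is currently installed:

```shell
# Step 1: interactive session on an Ampere GPU node (values illustrative)
interact -q gpu -g 1 -f ampere -m 20g -t 2:00:00

# Step 2: verify the GPU, then load modules (versions may differ)
nvidia-smi
module load python/3.9.0 cuda/11.7.1

# Step 3: create and activate the virtual environment
python -m venv ~/jax.venv
source ~/jax.venv/bin/activate
pip install --upgrade pip
```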
Setup virtual environment and debugger
If you have an existing virtual environment, proceed to step 2. Otherwise, to create a new virtual environment:
2. Search for Python.VenvPath as shown in the picture below:
3. VSCode expects you to have a virtual environment for each of your different Python projects, and it expects you to put them all in the same directory. Pointing to the parent directory lets it scan and find all the virtual environments, and then you can easily toggle between them in the interface.
Compiling CUDA
Compiling with CUDA
To compile a CUDA program on Oscar, first load the CUDA module with:
The CUDA compiler is called nvcc, and for compiling a simple CUDA program it uses syntax similar to gcc:
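For example (myprogram.cu is a placeholder source file):

```shell
module load cuda
nvcc -o myprogram myprogram.cu
```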
Python on Oscar
Several versions of Python are available on Oscar as modules. For Python 2, we recommend using the python/2.7.16 module. For Python 3, we recommend using the python/3.7.4 module. These modules include the pip and virtualenv commands, but do not include other common Python packages (e.g., SciPy, NumPy). This affords individual users complete control over the packages they are using, thereby avoiding issues that can arise when code written in Python requires specific versions of Python packages.
To use the recommended Python modules, use the following commands to load the relevant module:
module load python/2.7.16
module load python/3.7.4
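A typical workflow is to load a module and then create a project-specific virtual environment; in this sketch the environment path and package names are illustrative:

```shell
module load python/3.7.4
# Create and activate a project-specific environment
python -m venv ~/envs/myproject
source ~/envs/myproject/bin/activate
# Install whatever packages your code needs
pip install numpy scipy
```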
Restoring Deleted Files
Nightly snapshots of the file system are available for the last 30 days.
CCV does not guarantee that each of the last 30 days will be available in snapshots because occasionally the snapshot process does not complete within 24 hours.
Installing Frameworks (PyTorch, TensorFlow, Jax)
This page describes installing popular frameworks like TensorFlow, PyTorch & JAX, etc. on your Oscar account.
Preface: Oscar is a heterogeneous cluster, meaning it has nodes with different GPU architectures (Pascal, Volta, Turing, and Ampere). We recommend building the environment the first time on Ampere GPUs with the latest CUDA 11 modules so it is backward compatible with older architecture GPUs.
In this example, we will install PyTorch (refer to sub-pages for TensorFlow and Jax).
Step 1: Request an interactive session on a GPU node with Ampere architecture GPUs
Oscar's Filesystem
CCV uses IBM's General Parallel File System (GPFS). Users have a home, data, and scratch space.
Home (~)
20GB of space
Optimized for many small files
Installing TensorFlow
In this example, we will install TensorFlow.
Step 1: Request an interactive session on a GPU node with Ampere architecture GPUs
Here, -f = feature. We only need to build on Ampere once.
Step 2: Once your session has started on a compute node, run nvidia-smi to verify the GPU and then load the appropriate modules
Step 3: Create and activate the virtual environment
Dependent Jobs
Here is an example script for running dependent jobs on Oscar.
There are 3 batch jobs. Each job has its own batch script: job1.sh, job2.sh, job3.sh. The script script.sh submits the three jobs.
line 4: job1 is submitted.
line 7: job2 depends on job1 finishing successfully.
line 10: job3 depends on job2 finishing successfully.
Gaussian
Gaussian is a general-purpose computational chemistry package. Oscar provides the Gaussian 09 package.
Setting Up Gaussian
In order to use Gaussian on Oscar, you must be a part of the g09 group. To check your groups, run the groups command in the terminal.
You must first choose a Gaussian module to load. To see available Gaussian modules, run module avail gauss
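A sketch of the setup follows; the exact module name and version are assumptions — use whatever the `module avail gauss` output lists:

```shell
# Confirm you are in the g09 group
groups
# List available Gaussian modules, then load one (version illustrative)
module avail gauss
module load gaussian/09-D.01
```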
Transferring Files between Oscar and Campus File Storage (Replicated and Non-Replicated)
You may use either Globus (recommended) or smbclient to transfer data between Oscar and Campus File Storage.
Globus
Follow .
Web-based Terminal App
Open OnDemand offers a browser-based terminal app to access Oscar. Windows users who do not want to install an SSH client like PuTTY will find this app very useful.
Accessing the terminal
Windows(PuTTY)
Key Generation & Setup
Open PuTTYgen (this comes as part of the package), change the 'Number of bits in a generated key:' to 4096 (recommended), then click 'Generate'
Parallel Matlab
You can explore GPU computing through Matlab if you think your program can benefit from massively parallel computations:
Finally, parallel computing features like parfor and spmd can be used by launching a pool of workers on a node.
Improving Performance and Memory Management
Matlab programs often suffer from poor performance and running out of memory. Among other things, you can refer to the following web pages for best practices for writing efficient code:
The first step to speeding up Matlab applications is identifying the part which takes up most of the run time. Matlab's "Profiling" tool can be very helpful in doing that:
SSH Key Login (Passwordless SSH)
How to set up SSH key authentication.
When connecting from a campus network to sshcampus.ccv.brown.edu, you can set up SSH keys as a form of authentication instead of having to enter your password interactively. Follow the instructions below that correspond to your operating system/connection method.
Users can install any Python package they require by following the instructions given on the Installing Python Packages page.
Intel provides optimized packages for numerical and scientific work that you can install through pip or anaconda.
Python 2 will enter End-of-Life (EOL) status and will receive no further official support as of January 2020. As a consequence, you may see the following message when using pip with Python 2.
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Lists all modules that are currently loaded in your software environment.
module avail
Lists all available modules on the system. Note that a module can have multiple versions.
module help <name>
Prints additional information about the given software.
module load <name>
Adds a module to your current environment. If you load using just the name of a module, you will get the default version. To load a specific version, load the module using its full name with the version: "module load gcc/6.2"
Load the relevant python module and create and/or activate the environment. See this page for more information about creating virtual environments.
Run pip install notebook to install Jupyter notebook, if not already installed.
Run pip install ipykernel to install ipykernel in this environment.
Run python -m ipykernel install --user --name=<myenv> where <myenv> is the name of the environment.
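The one-time setup above can be sketched as follows (myenv is a placeholder environment name, and the module version is illustrative):

```shell
module load python/3.7.4            # the module used to create the environment
source ~/myenv/bin/activate
pip install notebook ipykernel
python -m ipykernel install --user --name=myenv
```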
Launching Jupyter Notebook
Open the "Basic Jupyter Notebook for Python Environments" app on the Open OnDemand interface
Under "Python Module on Oscar", choose the python module you loaded when the environment was created.
Under "Python Virtual Environment", add the name of the Virtual Environment you created. Note: If your virtual environment is not at the top level of your home directory, you should input the absolute path to the environment directory.
Under "Modules", enter the name of the python module used to create the environment. Add any additional modules you may need, separated by spaces.
Choose the other options as required.
Click "Launch" to start the job
Click "Connect to Jupyter" on the next screen.
To start a new notebook, click "New" -> <myenv> where <myenv> is the environment.
For starting a pre-existing notebook, open the notebook. In the Jupyter interface, click "Kernel" -> "Change Kernel" -> <myenv> where myenv is the name of the environment.
Conda Environments
One Time Setup:
Open a terminal on Oscar.
Activate the conda environment.
Run pip install notebook to install Jupyter notebook, if not already installed.
Run pip install ipykernel to install ipykernel in this environment.
Run python -m ipykernel install --user --name=<myenv> where <myenv> is the name of the environment.
Launching Jupyter Notebook
Open the "Basic Jupyter Notebook with Anaconda" app on the Open OnDemand interface
Under "Oscar Anaconda module", choose "anaconda/2020.02"
Enter the name of the conda environment in "Conda Env"
Choose the other options as required.
Click "Launch" to start the job
Click "Connect to Jupyter" on the next screen.
To start a new notebook, click "New" -> <myenv> where <myenv> is the environment.
For starting a pre-existing notebook, open the notebook. In the Jupyter interface, click "Kernel" -> "Change Kernel" -> <myenv> where myenv is the name of the environment.
Oscar is a shared machine used by hundreds of users at once. User requests are called jobs. A job is the combination of the resource requested and the program you want to run on the compute nodes of the Oscar cluster. On Oscar, Slurm is used to schedule and manage jobs.
Jobs can be run on Oscar in two different ways:
Interactive jobs allow the user to interact with programs (e.g., by entering input manually, using a GUI) while they are running. However, if your connection to the system is interrupted, the job will abort. Small jobs with short run times and jobs that require the use of a GUI are best suited for running interactively.
Batch jobs allow you to submit a script that tells the cluster how to run your program. Your program can run for long periods of time in the background, so you don't need to be connected to Oscar. The output of your program is continuously written to an output file that you can view both during and after your program runs.
Jobs are scheduled to run on the cluster according to your account priority and the resources you request (i.e., cores, memory, and runtime). In general, the fewer resources you request, the less time your job will spend waiting in the queue.
Please do not run CPU-intense or long-running programs directly on the login nodes! The login nodes are shared by many users, and you will interrupt other users' work.
To use the above script to submit the 3 jobs, run the script as follows:
./script.sh
For details on the types of dependencies you can use in slurm see the sbatch manual page.
#!/bin/bash
# first job - no dependencies
jobID_1=$(sbatch job1.sh | cut -f 4 -d' ')
# second job - depends on job1
jobID_2=$(sbatch --dependency=afterok:$jobID_1 job2.sh | cut -f 4 -d' ')
# third job - depends on job2
sbatch --dependency=afterany:$jobID_2 job3.sh
Step 2 : Generate a new SSH Keypair
ssh-keygen -t rsa
ssh-keygen.exe
Press Enter to accept the default file location and file name.
ssh-keygen will ask you to type a secure passphrase. This is optional; if you don't want to use a passphrase, just press Enter.
Verify the SSH keys were generated correctly: you should see two files, id_rsa and id_rsa.pub, under the ~/.ssh directory.
DO NOT upload or send the private key.
Step 3 : Copy the public key to Oscar
You will now need to copy your public key to Oscar. There are two ways to accomplish this.
With ssh-copy-id
If your OS comes with the ssh-copy-id utility, then you'll be able to copy your public key into Oscar as follows:
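For example:

```shell
ssh-copy-id <username>@ssh.ccv.brown.edu
```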
You will be prompted for your password. The public key will be appended to the authorized_keys file on Oscar.
If you used a custom name for your key instead of the default id_rsa, then you'll need to pass the name of your key to ssh-copy-id, i.e.,
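For example (my_key is a placeholder key name):

```shell
ssh-copy-id -i ~/.ssh/my_key.pub <username>@ssh.ccv.brown.edu
```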
Without ssh-copy-id
If your system does not come with the ssh-copy-id utility installed, then you'll need to copy your public key by hand.
Get the contents of the id_rsa.pub file. One option is to use cat in your terminal:
cat ~/.ssh/id_rsa.pub
Copy the contents of this file to your clipboard, as we need to upload it to Oscar.
Login into Oscar via regular ssh ssh <username>@ssh.ccv.brown.edu. Once you are on the login node, open the authorized_keys file with your text editor of choice e.g.,
vim ~/.ssh/authorized_keys
or
nano ~/.ssh/authorized_keys
Add your public keys to end of this file. Save and exit.
Step 4 : Login to Oscar using your SSH keys
If everything went well, you will be logged in immediately without being prompted for a password.
Viewing estimated time until completion for pending jobs
This command will list all of your pending jobs and the estimated time until completion.
Canceling jobs
View details about completed jobs
The sacct command will list all of your running, queued and completed jobs since midnight of the previous day. To pick an earlier start date, specify it with the -S option:
To find out more information about a specific job, such as its exit status or the amount of runtime or memory it used, specify the -l ("long" format) and -j options with the job ID:
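For example (the start date and job ID are placeholders):

```shell
# All of your jobs since a given start date
sacct -S 2024-01-01
# Long-format details for a specific job
sacct -l -j 1234567
```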
The myjobinfo command uses the sacct command to display "Elapsed Time", "Requested Memory" and "Maximum Memory used on any one Node" for your jobs. This can be used to optimize the requested time and memory to have the job started as early as possible. Make sure you request a conservative amount based on how much was used.
ReqMem shows the requested memory: a c at the end of the number represents memory per CPU; an n represents memory per node. MaxRSS is the maximum memory used on any one node. Note that memory specified to sbatch using --mem is per node.
Step 4: Install the required packages
Step 5: Test that PyTorch is able to detect GPUs
If the above function returns gpu, then it's working correctly. You are all set, now you can install other necessary packages.
The Oscar GPU nodes feature NVIDIA M2050 cards with the Fermi architecture, which supports CUDA's "compute capability" 2.0. To fully utilize the hardware optimizations available in this architecture, add the -arch=sm_20 flag to your compile line:
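For example (myprogram.cu is a placeholder source file):

```shell
nvcc -arch=sm_20 -o myprogram myprogram.cu
```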
This means that the resulting executable will not be backwards-compatible with earlier GPU architectures, but this should not be a problem since CCV nodes only use the M2050.
Memory caching
The Fermi architecture has two levels of memory cache similar to the L1 and L2 caches of a CPU. The 768KB L2 cache is shared by all multiprocessors, while the L1 cache by default uses only 16KB of the available 64KB shared memory on each multiprocessor.
You can increase the amount of L1 cache to 48KB at compile time by adding the flags -Xptxas -dlcm=ca to your compile line:
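For example (myprogram.cu is a placeholder source file):

```shell
nvcc -arch=sm_20 -Xptxas -dlcm=ca -o myprogram myprogram.cu
```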
If your kernel primarily accesses global memory and uses less than 16KB of shared memory, you may see a benefit by increasing the L1 cache size.
If your kernel has a simple memory access pattern, you may have better results by explicitly caching global memory into shared memory from within your kernel. You can turn off the L1 cache using the flags -Xptxas -dlcm=cg.
Nightly snapshots of the file system are available for the last 5-7 days and can be found in the following directories.
Home directory snapshot
Data directory snapshot
Scratch directory snapshot
To restore a file, copy the file from the snapshot to your directory.
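For example, restoring is just an ordinary copy out of the snapshot tree (the date, username, and filename below are illustrative):

```
cp /gpfs/.snapshots/2020-07-15/home/ghopper/thesis.tex ~/thesis.tex
```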
Do not use the links in your home directory snapshot to retrieve snapshots of your data and scratch directories. The links will always point to the current versions of those files. An easy way to check what a link points to is to use ls -l
e.g.:
Restore a file from a snapshot taken between 8 and 30 days ago
If you need one or more files from a nightly snapshot taken between 8 and 30 days ago, please contact [email protected] for help.
512GB of space - contact [email protected] for a temporary increase
Optimized for reading/writing large files
NO BACKUPS
Purging: files not accessed for 30 days may be deleted
Quota is per individual user
Grace period of 21 days
Files not accessed for 30 days will be deleted from your scratch directory. This is because scratch is high-performance space: the fuller scratch gets, the worse the read/write performance. Use ~/data for files you need to keep long term.
The scratch purge works on individual files, based on 'atime', the time the file was last read. You can use 'find' to identify files at risk of being purged, e.g. to find files in the current directory that have not been accessed in the last 25 days:
find . -atime +25
A good practice is to configure your application to read any initial input data from ~/data and write all output into ~/scratch. Then, when the application has finished, move or copy data you would like to save from ~/scratch to ~/data.
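A sketch of that pattern (directory and file names are illustrative):

```
mkdir -p ~/scratch/myrun
# ... run your application, e.g.:
#   myapp --input ~/data/input.dat --output ~/scratch/myrun/
# afterwards, keep only what you need:
cp -r ~/scratch/myrun ~/data/
```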
Note: class or temporary accounts may not have a ~/data directory!
To see how much space you have on Oscar you can use the command myquota. Below is an example output
You can go over your quota up to the hard limit for a grace period. This grace period is to give you time to manage your files. When the grace period expires you will be unable to write any files until you are back under quota.
There is a quota for space used and for number of files. If you hit the hard limit on either of these you will be unable to write any more files until you are back under quota.
Step 4: Install the required packages
Step 5: Test that PyTorch is able to detect GPUs
If the above function returns True, then you are all set.
You can load a Gaussian module using the command
module load <module-name>
Available Versions
Gaussian 9 (g09)
Running Gaussian
Gaussian can be run either interactively or within a batch script using one of two command styles:
g09 job-name
g09 <input-file >output-file
In the first form, the program reads input from job-name.gjf and writes its output to job-name.log. When no job-name has been specified, the program reads from standard input and writes to standard output.
Given a valid .gjf file (we'll call it test-file.gjf), we can use the following simple batch script to run Gaussian:
g09-test.sh
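The script itself is not reproduced here; the following is a minimal sketch consistent with the output files this page mentions (the time limit is an assumption):

```
#!/bin/bash
#SBATCH -J g09-test
#SBATCH -t 1:00:00
#SBATCH -e g09-test.err
#SBATCH -o g09-test.out

g09 <test-file.gjf >test-file.out
```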
Then queue the script using sbatch g09-test.sh
Once the job has been completed, you should have a g09-test.out, a g09-test.err, and a test-file.out.
smbclient
You can transfer files between Campus File Storage and Oscar using smbclient.
Transfer Instructions
1) Log into Oscar:
2) Start a screen session. This will allow you to reattach to your terminal window if you disconnect.
3) To use Oscar's high-speed connection to Campus File Storage - Replicated:
Replace SHARE_NAME, DIRECTORY_NAME, and BROWN_ID. DIRECTORY_NAME is an optional parameter. The password required is your Brown password.
4) Upload/download your data using the FTP "put"/"get" commands. Replace DIRECTORY_NAME with the folder you'd like to upload.
5) You can detach from the screen session with a "CTRL+A D" keypress. To reattach to your session:
smbclient basics
put is upload to Campus File Storage
Usage: put <local_file> [remote file name]
Copy <local_file> from Oscar to Campus File Storage. The remote file name is optional (use if you want to rename the file)
get is download to Oscar
Usage: get <remote_file> [local file name]
Copy <remote_file> from Campus File Storage to Oscar. The local file name is optional (use if you want to rename the file)
Moving more than one file:
To move more than one file at once use mput or mget. By default:
recurse is OFF. smbclient will not recurse into any subdirectories when copying files
prompt is ON. smbclient will ask for confirmation for each file in the subdirectories
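Putting these together, a session for transferring a whole directory might look like the following sketch (names are placeholders):

```
smb: \> recurse               # toggle recursion ON
smb: \> prompt                # toggle per-file prompting OFF
smb: \> mput DIRECTORY_NAME   # upload the directory and its contents
```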
myq List only your own jobs.
allq List all jobs, organized by partition, with a summary of the nodes in use in each partition.
allq <partition> List all jobs in a single partition.
myjobinfo Get the time and memory used for your jobs.
squeue -u <your-username> -t PENDING --start
scancel <jobid>
sacct -S 2012-01-01
sacct -lj <jobid>
myjobinfo
Info about jobs for user 'mdave' submitted since 2017-05-19T00:00:00
Use option '-S' for a different date or option '-j' for a specific Job ID.
JobID JobName Submit State Elapsed ReqMem MaxRSS
1861 ior 2017-05-19T08:31:01 COMPLETED 00:00:09 2800Mc 1744K
1862 ior 2017-05-19T08:31:11 COMPLETED 00:00:54 2800Mc 22908K
1911 ior 2017-05-19T15:02:01 COMPLETED 00:00:06 2800Mc 1748K
1912 ior 2017-05-19T15:02:07 COMPLETED 00:00:21 2800Mc 1744K
ls -l /gpfs/.snapshots/2020-07-15/home/ghopper/data
lrwxrwxrwx 1 ghopper navy 22 Mar 1 2016 /gpfs/.snapshots/2020-07-15/home/ghopper/data -> /gpfs/data/navy
In the top menu, click Clusters -> >_OSCAR Shell Access
A new tab will open and the web-based terminal app will be launched in it. The shell will be launched on one of the login nodes.
The shell DOES NOT start on a compute node. Please do not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes.
No installation needed. Just use your favorite browser!
No need to enter your password again. SSH into Oscar in seconds!
No need to use two factor authentication again. Just do it once, when you log into OOD.
Use it with, or without, VPN. Your workflow remains the same.
2. Move your cursor around randomly in order to "salt" your key, while the key is being generated. Once the key is generated, you should see something like this:
3. Replace the text in the 'Key comment:' field with something recognizable and enter a passphrase in the two fields below.
4. Copy the text in the 'Public key for pasting...' field (the text continues past what is displayed) and paste it wherever the public key is needed. If you are using GitHub, you can now create a new SSH key in your Personal Settings and paste this text into the 'Key' field.
5. Click on 'Save private key' and select a logical/recognizable name and directory for the file. Your private key is saved in the selected file.
6. Open Pageant (also part of the PuTTY package). If a message saying "Pageant is already running" is displayed, open your system tray and double click on the Pageant icon.
To open your system tray, click on the up arrow (looks like: ^ ) icon at the bottom right of your screen (assuming your taskbar is at the bottom of your screen).
7. Click on 'Add Key' and select the file you saved when generating your key earlier (Step 5). If it is requested, enter the passphrase you created at Step 3 to complete the process.
In order to not have to add the key to Pageant after every time your machine reboots, you can add the key file(s) to your Windows startup folder (the directory for the current user is C:\Users\[User Name]\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup). You may still be prompted to enter the passphrase after a reboot, but you will not have to find and add the key to Pageant every time.
Hi JaneDoe! You've successfully authenticated, but GitHub does not provide shell access.
Connection to github.com closed.
Version Control
Git Overview
Version control refers to the robust management of changes made to source code (or any large body of information) by multiple collaborators. Git is by far the most popular version control system.
Git enables effective collaboration among developers. In a team setting, multiple developers often work on the same project simultaneously. With Git, each developer can work on their own local copy of the project, making changes and experimenting freely without affecting the main codebase. Git allows developers to merge their changes seamlessly, ensuring that modifications made by different individuals can be consolidated efficiently. It provides mechanisms to track who made specific changes, making it easier to understand the evolution of the project and identify potential issues.
Git Workflow
Nearly all Git operations are performed in your local computing environment, with the exception of a few used purely to synchronize with a remote. Some of the most common Git operations are depicted below. In summary, a typical flow consists of making changes to your files, staging them via git add, marking a save point via git commit, then finally syncing to your remote (e.g., GitHub) via git push. If you push changes to your remote from multiple places, you can bring in the latest changes using git pull, which is the equivalent of doing git fetch followed by a git merge operation.
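The cycle above can be sketched in a throwaway repository (push and pull need a configured remote, so they are shown as comments; names and the file are placeholders):

```shell
mkdir demo-repo && cd demo-repo
git init -q
git config user.name  "Your Name"          # placeholder identity
git config user.email "[email protected]"
echo "print('hello')" > analysis.py
git add analysis.py                        # stage the change
git commit -q -m "Add analysis script"     # record a local snapshot
git log --oneline                          # the new commit appears here
# git push origin main                     # sync to the remote
# git pull                                 # fetch + merge remote changes
```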
Cheatsheet
Below are some of the most commonly used Git commands. You can also get much more information by running git --help. And if you'd like to learn more there is an
Command
Summary
Git Configuration
While using Git on Oscar, make sure you have configured your correct name and email address, to avoid confusion while working with remote repositories (e.g., GitHub, GitLab, BitBucket).
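You can set these once per system (the name and email shown are placeholders; use your own):

```shell
git config --global user.name  "Jane Doe"
git config --global user.email "[email protected]"
git config --global user.name     # verify: prints Jane Doe
```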
Getting Out of Trouble
Git can sometimes be a bit tricky, and we all eventually find ourselves wanting to undo something or fix a mistake we made with Git. (pardon the profanity) has a bunch of really excellent solutions to common problems we sometimes run into with Git.
Job Arrays
A job array is a collection of jobs that all run the same program, but on different values of a parameter. It is very useful for running parameter sweeps, since you don't have to write a separate batch script for each parameter setting.
To use a job array, add the option:
#SBATCH --array=<range>
in your batch script. The range can be a comma-separated list of integers and/or ranges of integers separated by a dash. For example:
1-20
1-10,12,14,16-20
A job will be submitted for each value in the range. The values in the range will be substituted for the variable $SLURM_ARRAY_TASK_ID in the remainder of the script. Here is an example of a script for running a serial Matlab script on 16 different parameters by submitting 16 different jobs as an array:
#!/bin/bash
#SBATCH -J MATLAB
#SBATCH -t 1:00:00
#SBATCH --array=1-16
# Use '%A' for array-job ID, '%J' for job ID and '%a' for task ID
#SBATCH -e arrayjob-%a.err
#SBATCH -o arrayjob-%a.out
echo "Starting job $SLURM_ARRAY_TASK_ID on $HOSTNAME"
matlab -r "MyMatlabFunction($SLURM_ARRAY_TASK_ID); quit;"
You can then submit the multiple jobs using a single sbatch command:
The $SLURM_ARRAY_TASK_ID can be manipulated as needed. For example, you can generate a fixed-length number from it. The following example generates a three-digit number from $SLURM_ARRAY_TASK_ID.
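In bash this can be done with printf (the variable assignment below just stands in for what Slurm sets inside a real array job):

```shell
SLURM_ARRAY_TASK_ID=7            # set by Slurm inside a real array job
padded=$(printf "%03d" "$SLURM_ARRAY_TASK_ID")
echo "$padded"                   # prints 007
```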
For more info:
Installing R Packages
Installing R packages
Users should install R packages for themselves locally. This documentation shows you how to install R packages locally (without root access) on Oscar.
If the package you want to install has operating-system-level dependencies (i.e. the package depends on core libraries), then we can install it as a module.
Installing an R package
First load the R version that you want to use the package with:
Start an R session
Note that some packages require code to be compiled, so it is best to do R package installs on the login node.
To install the package 'wordcloud':
You will see a warning:
Answer y . If you have not installed any R packages before you will see the following message:
Answer y . The package will then be installed. If the install is successful you will see a message like:
If the installation was not successful you will see a message like:
There is normally information in the message that gives the reason why the install failed. Look for the word ERROR in the message.
Possible reasons for an installation failing include:
Other software is needed to build the R package, e.g. the R package rgdal needs gdal so you have to do module load gdal
A directory needs deleting from a previous failed installation.
Removing an R package
Start an R session:
To remove the 'wordcloud' package:
Ampere Architecture GPUs
The new Ampere architecture GPUs on Oscar (A6000's and RTX 3090's)
The new Ampere architecture GPUs do not support older CUDA modules. Users must re-compile their applications with the newer CUDA 11 or later modules. Here are detailed instructions to compile major frameworks such as PyTorch and TensorFlow.
PyTorch
Users can install PyTorch in a pip virtual environment or use pre-built Singularity containers provided by Nvidia NGC.
To install via virtual environment:
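A sketch of a typical virtual-environment install (the environment name and bare pip install torch command are assumptions; check the current CUDA-enabled PyTorch install instructions for the exact package spec):

```
python -m venv pytorch.venv          # create the environment
source pytorch.venv/bin/activate     # activate it
pip install --upgrade pip
pip install torch                    # install PyTorch
```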
To use NGC containers via Singularity :
Pull the image from NGC
Export PATHs to mount the Oscar file system
To use the image interactively
To submit batch jobs
Screen
screen is a "terminal multiplexer": it enables a number of terminals (or windows) to be accessed and controlled from a single terminal. screen is a great way to save an interactive session between connections to Oscar. You can reconnect to the session from anywhere!
Screen commands
Common commands are:
start a new screen session with session name: screen -S <name>
list running sessions/screens: screen -ls
attach to session by name: screen -r <name>
detach: Ctrl+a d
detach and logout (quick exit): Ctrl+a d d
kill a screen session: screen -XS session_name quit
Reconnecting to your screen session
There are several login nodes on Oscar, and the node from which you launched screen matters! You can only reconnect from the login node on which you launched screen.
In order to reconnect to a running screen session, you need to be connected to the same login node from which you launched the session. To locate and identify your screen sessions correctly, we recommend the following:
Create a directory to store the information about your screen sessions. You only need to do this once.
Put the following line into your ~/.bashrc. This tells the screen program to save the information about your screen sessions in the directory created in the previous step, which allows you to query your screen sessions across different login nodes. To make this change effective in your current sessions, run 'source ~/.bashrc' in each of them; you do not need to do this in new sessions.
Name your new screen session after the login node. For instance, start your screen with a command similar to
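The steps above can be sketched as follows; the directory name and the use of the SCREENDIR variable are assumptions, so adjust them to your own setup:

```shell
mkdir -p ~/.screen_sessions                                   # step 1
echo 'export SCREENDIR=$HOME/.screen_sessions' >> ~/.bashrc   # step 2
# Step 3: include the login node's name in the session name, e.g.:
#   screen -S "$(hostname -s)-myproject"
```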
Matlab GUI
On Oscar, the command matlab is actually a wrapper that sets up MATLAB to run as a single-threaded, command-line program, which is the optimal way to pack multiple Matlab scripts onto the Oscar compute nodes.
To run the actual multi-threaded version with JVM and Display enabled, use:
Similarly, to run this without the display enabled:
Do not run Matlab on the Oscar login nodes. Request a compute node either with an interactive session, a batch script, or using the VNC.
VNC
The VNC client provided by CCV is the best way to launch GUI applications on Oscar, including Matlab. From the terminal emulator in VNC, use the matlab-threaded command to launch the Matlab GUI. For example,
Here is a snapshot of what it looks like:
X11 Forwarding
You can also run the MATLAB GUI in an X-forwarded interactive session. This requires installing an X server on your workstation/PC and logging in to Oscar with X11 forwarding enabled. Use the interact command to get interactive access to a compute node. Again, for launching the GUI, you need to use the matlab-threaded command, which enables the display and JVM. You may, however, experience a lag in response from the Matlab GUI in an X-forwarded session. Note that if Matlab does not find the X window system available, it will launch in command-line mode (next section).
CIFS
A workaround in some situations may be to use CIFS to mount the Oscar filesystem on your PC and use the Matlab installation on your own computer. For example, if you have simulation results residing on Oscar, this might be a quick way to do post-processing on the data instead of having to move the data to your computer or use the Matlab GUI on Oscar. Note that users can connect to CIFS only from Brown computers or on Brown WiFi.
Mac/Linux
Agent Forwarding in Mac and Linux Systems
Start the SSH-Agent
First, start your ssh-agent with the command below.
You should see an output similar to this:
Add Key(s)
Next, add your ssh private keys to the running agent (using the ssh-add command on line 1). This step may be repeated for every key pair you use to connect to different git servers. For most, this file is called id_rsa and will live in ~/.ssh/id_rsa. If you set a password for your ssh keys, the agent will prompt you to enter them.
Confirm the ssh keys have been loaded into the agent with ssh-add -L:
Connect to Oscar
Now ssh into Oscar with the -A option as shown on the first line below (replace username with your Oscar username). -A will forward your ssh-agent to Oscar, enabling you to use the ssh keys on your laptop while logged into Oscar.
If you have ssh keys setup on your local computer to connect to GitHub, you can confirm your ssh-agent was properly forwarded by checking GitHub . If the ssh command fails, your agent has not been properly forwarded.
Always connecting with Agent Forwarding
To make these changes permanent, you can add the ForwardAgent yes option to your ssh configuration file. To learn more about configuring your ssh connections, visit
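A sketch of such a configuration entry (the host alias and hostname are assumptions; use the host you normally connect to):

```
# ~/.ssh/config
Host oscar
    HostName ssh.ccv.brown.edu
    User username
    ForwardAgent yes
```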
Mixing MPI and CUDA
Combining CUDA and MPI
Mixing MPI (C) and CUDA (C++) code requires some care during linking because of differences between the C and C++ calling conventions and runtimes. One option is to compile and link all source files with a C++ compiler, which will enforce additional restrictions on C code. Alternatively, if you wish to compile your MPI/C code with a C compiler and call CUDA kernels from within an MPI task, you can wrap the appropriate CUDA-compiled functions with the extern keyword, as in the following example.
These two source files can be compiled and linked with both a C and C++ compiler into a single executable on Oscar using:
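A sketch of the build, assuming the usual two-step pattern in which nvcc compiles the CUDA source and mpicc compiles the C code and performs the linking (exact flags on Oscar may differ):

```
nvcc -c multiply.cu
mpicc main.c multiply.o -lcudart
```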
Arm Forge
Arm Forge is available on Oscar. There are two products: DDT (debugger) and MAP (profiler).
We recommend you use the Arm Forge remote client to launch your debugging jobs on Oscar. The first time you set up Arm Forge you will need to configure the client with the following steps:
Download the Arm Forge remote client on your machine.
matlab-threaded
matlab-threaded -nodisplay
$ eval $(ssh-agent)
Agent pid 48792
Get remote repo's commits and download (try and resolve conflicts)
The CUDA/C++ compiler nvcc is used only to compile the CUDA source file, and the MPI C compiler mpicc is used to compile the C code and to perform the linking.

/* multiply.cu */

#include <cuda.h>

__global__ void __multiply__ (const float *a, float *b)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    b[i] *= a[i];
}

extern "C" void launch_multiply(const float *a, float *b)
{
    /* ... load CPU data into GPU buffers a_gpu and b_gpu */

    __multiply__ <<< ...block configuration... >>> (a_gpu, b_gpu);

    safecall(cudaThreadSynchronize());
    safecall(cudaGetLastError());

    /* ... transfer data from GPU to CPU */
}
Note the use of extern "C" around the function launch_multiply, which instructs the C++ compiler (nvcc in this case) to make that function callable from the C runtime. The following C code shows how the function could be called from an MPI task.
/* main.c */

#include <mpi.h>

void launch_multiply(const float *a, float *b);

int main (int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

    /* ... prepare arrays a and b */

    launch_multiply (a, b);

    MPI_Finalize();
    return 1;
}
Warning in install.packages("wordcloud", repos = "http://cran.r-project.org") :
'lib = "/gpfs/runtime/opt/R/3.4.2/lib64/R/library"' is not writable
Would you like to use a personal library instead? (y/n)
Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/3.4
to install packages into? (y/n)
** R
** data
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (wordcloud)
Warning message:
In install.packages("wordcloud", repos = "http://cran.r-project.org") :
installation of package ‘wordcloud’ had non-zero exit status
Compile your code with -g so you can see the source code in your debugging session
Arm DDT
Arm DDT is a powerful graphical debugger suitable for many different development environments, including:
Single process and multithreaded software.
OpenMP.
Parallel (MPI) software.
Arm MAP
Arm MAP is a parallel profiler that shows you which lines of code took the most time to run, and why. Arm MAP does not require any complicated configuration, and you do not need to have experience with profiling tools to use it.
Arm MAP supports:
MPI, OpenMP and single-threaded programs.
Small data files. All data is aggregated on the cluster and only a few megabytes written to disk, regardless of the size or duration of the run.
Sophisticated source code view, enabling you to analyze performance across individual functions.
Both interactive and batch modes for gathering profile data.
A rich set of metrics, that show memory usage, floating-point calculations and MPI usage across processes, including:
Percentage of vectorized instructions, including AVX extensions, used in each part of the code.
Time spent in memory operations, and how it varies over time and processes, to verify if there are any cache bottlenecks.
Oscar has compute nodes in the partitions listed below.
batch - The batch partition is for programs/jobs which need neither GPUs nor large memory.
bigmem - The bigmem partition is for programs/jobs which require large memory.
debug - The debug partition is for users to debug programs/jobs.
Below are node details including cores and memory for all partitions.
Hardware details
Hardware details for all partitions. The Features column shows the features available for the --constraint option for SLURM. This includes the available CPU types as well as GPUs.
Partition
CPUs/ Node
Nodes
Total CPUs
GPUs/ Node
Total GPUs
Memory (GB)
Features
Remote IDE (VSCode)
Access Oscar's file-system remotely from VSCode.
VSCode one-time setup
To use VSCode you must be on a Brown compliant network or connected to the VPN. Please install the Brown VPN client before proceeding.
Install the Remote-SSH extension for VSCode:
2. Open VSCode settings and uncheck symlink:
Code > Preferences > Settings
File > Preferences > Settings
Search for symlink and make sure the symlink searching is unchecked
3. Make sure you have set up passwordless SSH authentication to Oscar. If you haven't, please refer to this .
If you have Windows Subsystem for Linux (WSL) installed in your computer, you need to follow the instructions for Windows (PowerShell).
4. Edit the config file:
The config file is located at:
~/.ssh/config
Edit the config file on your local machine, add the following lines. Replace <username> with your Oscar username.
5. In VSCode, select Remote-SSH: Connect to Host… and after the list populates select ccv-vscode-node
6. Install and set up of VSCode
After a moment, VS Code will connect to the SSH server and set itself up. You might see a Firewall prompt; please click Allow.
Configure VSCode
Important: Please run the following to add a settings.json file to your config. This is because the file watcher and file searcher (rg) index all the files you have access to in your workspace. If you have a large dataset (e.g. machine learning data), this can consume a lot of resources on the vscode node.
Connect to VSCode first.
You can either create a symlink via the ln command below,
or manually create /users/$USER/.vscode-server/data/Machine/settings.json file with following contents
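The exact contents to use are not reproduced on this page. As a hedged sketch, settings of this kind typically exclude large directories from the watcher and search indexer; the keys below are standard VS Code settings, but the specific values to use on Oscar are an assumption:

```
{
    "files.watcherExclude": { "**/data/**": true, "**/scratch/**": true },
    "search.exclude": { "**/data/**": true, "**/scratch/**": true }
}
```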
Reconnect to VSCode
Click the green icon "Open a Remote Window" in the bottom left corner of VSCode Window. Then click "Connect to Host" in the drop down list.
2. Select the ccv-vscode-node option to connect to Oscar.
Common Linux Commands
Common Linux Commands
Command
Related Word/Phrase
Description
Using File Explorer on OOD
The filesystem on Oscar can be accessed through the file explorer on this web portal. The file explorer allows you to:
List files
Create a directory
Rename files
Copy/Move files
To access the file explorer, click "Files" -> "Home Directory" at the top of the screen.
Check the documentation below for some of these services:
Changing directories on File explorer
To access a directory, click "Change directory" and enter the path name
Do not use "~" in your directory path name. The path should start with "/users" or "/gpfs/"
To access your home directory, click the "Home Directory" link on the left. The path name at the top of the page should change to "/users/<username>"
To access your scratch directory, click the "scratch" directory in your home directory OR click "Change directory" and enter "/users/<username>/scratch"
To access your data directory, click the "data" directory in your home directory OR click "Change directory" and enter "/users/<username>/data"
Edit plain-text files
Navigate to the directory that contains the plain-text file.
Click the icon with the three dots -> Edit
The file will open in a text editor in a new tab
Download files or directories
Navigate to the directory that contains the file or directory.
Click the icon with the three dots -> Download
To download multiples files:
Click the check-box to the left of the file name.
Scroll to the top of the page and click "Download"
Directories are downloaded as zipped files on your computer.
Upload files or directories
Navigate to the directory where you need to upload the files.
Click the "Upload" button.
Follow the instructions on the screen. You can click the "Browse" buttons or drag and drop files.
Launch a terminal
Navigate to the directory where you would like to open the terminal.
Click "Open in Terminal" at the top of the page.
A web-based terminal will open in a new tab of your browser. You will be logged into one of the login nodes.
Slurm Partitions
Partition Overview
Oscar has the following slurm partitions. The number and size of jobs allowed on Oscar vary with both partition and type of user account. You can email [email protected] if you need advice on which partitions to use.
To list partitions on Oscar available to your account, run the following command:
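A sketch of such a command, assuming it uses sinfo with the -O output option (the field name shown is an assumption):

```
sinfo -O partition
```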
To view all partitions (including ones you don't have access to), replace the -O in the command above with -aO.
batch is the default partition.
Partition Details
Below is a brief summary of the partitions. For details of the nodes in each partition, please see .
batch
General purpose computing
Priority is determined by account type (from highest to lowest: condo, priority, exploratory)
Condo limits apply to the group (i.e., they reflect the sum of all users on the condo). Condo users can check the limits on their condo with the command condos.
There is no limit on the time for condo jobs, but users should be aware that planned maintenance on the machine may occur (one month’s notice is given prior to any planned maintenance).
debug
Short wait time, short run time access for debugging
All users have the same limits and priority on the debug partition
vnc
These nodes are for running VNC sessions/jobs
Account type may affect Priority
gpu
For GPU-based jobs
GPU Priority users get higher priority and more resources than free users on the GPU partition
Condo users submit to the gpu partition with normal or priority access (if they have a priority account in addition to their condo)
gpu-he
For GPU-based jobs
Uses Tesla V100 GPUs
Restricted to High End GPU Priority users
gpu-debug
Short wait time, short run time gpu access for debugging
All users have the same limits and priority on the gpu-debug partition
bigmem
For jobs requiring large amounts of memory
Priority users get higher priority and more resources than free users on the bigmem partition
Condo users submit to the bigmem partition with normal or priority access (if they have a priority account in addition to their condo)
Using CCMake
Guide to build and compile software using CCMake.
Open-source software refers to any program whose source code is available for use or modification as users or other developers see fit. This is usually developed as a public collaboration and made freely available.
CMake and CCMake
Due to the complexity of some software, we often have to link to third party or external libraries. When working with software that has complicated building and linking steps, it is often impractical to use GCC (or your favorite compiler) directly. GNU Make is a build system that can simplify things somewhat, but "makefiles" can become unwieldy in their own way. Thankfully for us, there is a tool that simplifies this process.
CMake is a build system generator that one can use to facilitate the software build process. CMake allows one to specify—at a higher level than GNU Make—the instructions for compiling and linking our software. Additionally, CMake comes packaged with CCMake, which is an easy-to-use interactive tool that will let us provide build instructions to the compiler and the linker for projects written in C, Fortran, or C++. For more information about CMake and CCMake, please click .
Make sure the source code has a CMakeLists.txt file in the root folder
Getting the source code from a Git Repository
Much of the time, source code is available on platforms such as GitHub, GitLab or BitBucket. Cloning (or downloading) the project from any of those is the same process. First, you need to get the URL from the repository. It usually looks like this:
GitHub repository
Bitbucket repository
Where username indicates the GitHub (or BitBucket, etc) account of the owner of the project, and project_name indicates, well, try to guess.
GitHub and BitBucket have a button at the top right side of the repository web page labeled "clone". Copy that URL
Clone The Repository
Create a new folder on a path with the necessary read/write permissions
Go inside that folder
Clone the repository:
URL is the repository's link mentioned above.
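For example, with a hypothetical GitHub project (replace the URL with the one you copied):

```shell
# clone the repository into the current folder
git clone https://github.com/username/project_name.git
# move into the project folder
cd project_name
```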
Getting the source code from a .tar or .zip file
If you downloaded the project from a different source as a .tar or .zip file, just extract the source code into a folder with the necessary read/write permissions.
Build the Project
Create a new folder and name it build
Go inside that folder
Execute CCMake pointing to the root folder which has a CMakeLists.txt file
In this example, let's assume the build folder is at the same level as the CMakeLists.txt file.
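The steps above can be sketched as follows, starting from the folder that contains CMakeLists.txt:

```shell
# create and enter the build folder
mkdir build
cd build
# run CCMake, pointing at the parent folder containing CMakeLists.txt
ccmake ..
```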
The CCMake text interface will pop up with all the necessary attributes to build the software.
Set up the paths to the required libraries and press "c" to configure the project. Errors may come up about CMake being unable to find specific libraries. This can happen because the library does not exist on the system or because you have not loaded the right module. Please contact CCV staff for help fixing these errors.
Make sure the attribute CMAKE_INSTALL_PREFIX points to a path with the necessary read/write permissions. By default it is set to /usr/bin/, which most users cannot write to.
Once the configuration process has ended successfully, press "g" to generate the project. Generating the project does not compile or execute the program; please continue reading.
Compile the Project
Compile the project using the command make
You might want to increase the number of parallel compile jobs to speed up the compilation process; for example, add the parameter "-j 8" to run 8 jobs in parallel.
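For example, to compile with 8 parallel jobs and then install into the path set in CMAKE_INSTALL_PREFIX:

```shell
# compile with 8 parallel jobs
make -j 8
# install into CMAKE_INSTALL_PREFIX
make install
```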
Once it is done, your project will be installed in the path set in the CMAKE_INSTALL_PREFIX attribute, as explained above.
If you have any questions or need help please email [email protected].
Desktop App (VNC)
The Desktop app on Open OnDemand is a replacement for the older VNC Java client. This app allows you to launch a Desktop GUI on Oscar.
Advanced users looking for more resources can try the .
SSH Configuration File
How to save ssh configurations to a configuration file
When regularly connecting to multiple remote systems over SSH, you'll find that remembering all the hosts and various command-line options becomes tedious. OpenSSH allows setting up a configuration file to store different SSH options for each remote machine you connect to.
SSH Config File Location
The OpenSSH client-side configuration file (in this case, on your personal computer) is named config, and it is stored in the hidden .ssh
GPUs on Oscar
To view the various GPUs available on Oscar, use the command
nodes gpu
SMB (Local Mount)
CCV users can access their home, data, and scratch directories as a local mount on their own Windows, Mac, or Linux system using the Common Internet File System (CIFS) protocol (also called Samba). This allows you to use applications on your machine to open files stored on Oscar. It is also a convenient way to move files between Oscar and your own machine, as you can drag and drop files.
To use SMB you will need to be connected to the VPN. Please install the VPN client before proceeding.
Installing Python Packages
For Python 2, we recommend using the python/2.7.16 module. For Python 3, we recommend using the python/3.7.4 module.
Both these modules include the pip and virtualenv commands, but do not include other common Python packages (e.g., SciPy, NumPy). This affords individual users complete control over the packages they are using.
There are several ways for users to install python packages on Oscar
Using Modules
CCV uses a modules package for managing the software environment on Oscar. The advantage of the modules approach is that it allows multiple versions of the same software to be installed at the same time. With the modules approach, you can "load" and "unload" modules to dynamically control your environment.
Check out our !
MPI4PY
This page documents how to use the MPI for Python package within a Conda environment.
Using MPI4PY in a Python Script
The installation of mpi4py will be discussed in the following sections. This section provides an example of how mpi4py would be used in a python script after such an installation.
Users can check the nodes in a partition using the nodes command. As of May 2023, the Oscar cluster has the following nodes in the bigmem partition.
NODES CORES CPU/NODE MEM Features PARTITION
2 64 32-cores 2095GB 32core,intel,scalable,cascade,edr bigmem
2 64 32-cores 753GB 32core,intel,scalable,cascade,edr bigmem
All Oscar users have access to this partition, and can submit jobs to it. To submit batch jobs to large memory nodes, include the following in your batch script:
#SBATCH -p bigmem
To run an interactive job on large memory node, launch the interact command with the following flag:
$ interact -q bigmem
directory under your user’s home directory (i.e.,
~/.ssh
)
The ~/.ssh directory is automatically created the first time you use the ssh command. If the directory doesn't exist on your system, create it using the command below:
By default, the SSH configuration file may not exist, so you may need to create it using the touch command:
This file must be readable and writable only by the user and not accessible by others:
SSH Config File Structure Basics
The SSH Config File takes the following structure:
The contents of the SSH config file are organized into sections. Each section starts with the Host directive and contains specific SSH options used when establishing a connection with the remote SSH server.
Oscar Hosts
Here we provide a list of Oscar hosts and typical SSH configuration options. You have two options:
Copy the list of hosts below directly into your SSH Config File (i.e., ~/.ssh/config)
Keep this content in a separate file for Oscar hosts, say ~/.ssh/config.oscar, and include that file in your main configuration file. In this case, the first line of ~/.ssh/config will be:
Include "~/.ssh/config.oscar"
Don't forget to replace <username> with your username. Also, the configuration assumes your identity key is ~/.ssh/id_rsa; if you named it anything else, please update the value. If you need to generate a key, go here.
Connecting to your preconfigured host
You may now connect using the shortcut notation provided by your configuration file. That is, all you need to type is:
According to the configuration above, this is equivalent to
We recommend using a virtual environment for your workflow if you prefer pip. If you are a conda user, we recommend managing your workflow with conda environments. You can load an anaconda module and then use conda.
In this document, we use angle brackets <> to denote command line options that you should replace with an appropriate value.
Note: for python3 packages, replace python with python3 and pip with pip3.
Intel provides optimized packages for numerical and scientific work that you can install through pip or anaconda.
Using virtualenv
Virtual environments are a cleaner way to install python packages for a specific workflow. This webpage gives a good explanation of the use cases. In the example below, a virtual environment called 'my_cool_science' is set up in your home directory:
line 1: load the version of python you want to use
line 2: change directory to home
line 3: create the virtual environment
line 4: activate the virtual environment
line 5: install any packages you need for the virtual environment
line 6: deactivate the environment
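The six steps above can be sketched as follows ('my_cool_science' and the package name are placeholders; use the python version you need):

```shell
module load python/3.7.4              # line 1: load the version of python you want
cd ~                                  # line 2: change directory to home
virtualenv my_cool_science            # line 3: create the virtual environment
source ~/my_cool_science/bin/activate # line 4: activate the virtual environment
pip install <package>                 # line 5: install any packages you need
deactivate                            # line 6: deactivate the environment
```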
When you want to use the environment, e.g. in a batch script or an interactive session
source ~/my_cool_science/bin/activate
When your work is finished, deactivate the environment with
deactivate
Install into your home directory
The --user flag will instruct pip to install to your home directory
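For example (the package name is a placeholder):

```shell
# install a package into your home directory instead of the system location
pip install --user <package>
```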
This will install the package under the following path in user's HOME directory:
If you omit the --user flag you will see
This is because users do not have access to the default locations where software is installed.
Python packages can often have conflicting dependencies. For workflows that require a lot of python packages, we recommend using virtual environments.
Install at custom location
Users have a limit of 20GB for their home directories on Oscar. Hence, users might want to use their data directory instead for installing software. Another motivation to do that is to have shared access to the software among the whole research group.
This path to install location will have to be added to the PYTHONPATH environment variable so that python can find the python modules to be used. This is not necessary for software installed using the --user option.
This can be added at the end of your .bashrc file in your home directory. This will update the PYTHONPATH environment variable each time during startup. Alternatively, you can update PYTHONPATH in your batch script as required. This can be cleaner as compared to the former method. If you have a lot of python installs at different locations, adding everything to PYTHONPATH can create conflicts and other issues.
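For example, assuming packages were installed with pip's --target option into a hypothetical path under your data directory:

```shell
# install to a custom location (the path is a placeholder)
pip install --target=/gpfs/data/<pi_group>/<user>/pylibs <package>

# make python aware of that location
# (add this line to ~/.bashrc or to your batch script)
export PYTHONPATH=/gpfs/data/<pi_group>/<user>/pylibs:$PYTHONPATH
```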
A caveat of using this method is that pip will install the packages (along with its requirements) even if the package required is already installed under the global install or the default local install location. Hence, this is more of a brute force method and not the most efficient one.
For example, if your package depends on numpy or scipy, you might want to use the numpy and scipy under our global install as those have been compiled with MKL support. Using the --target option will reinstall numpy with default optimizations and without MKL support at the specified location.
Installing from source
Sometimes, python software is not packaged by the developers to be installed by pip. Or, you may want to use the development version which has not been packaged. In this case, the python package can be installed by downloading the source code itself. Most python packages can be installed by running the setup.py script that should be included in the downloaded files.
You will need to provide a "prefix path" for the install location
This will create the sub-directories bin, lib, etc. at the location provided above and install the packages there. The environment will have to be set up accordingly to use the package:
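A sketch of both steps, with <install-path> as a placeholder for your chosen prefix:

```shell
# install from source with a prefix path
python setup.py install --prefix=<install-path>

# set up the environment to use the package
export PATH=<install-path>/bin:$PATH
export PYTHONPATH=<install-path>/lib/python<version>/site-packages:$PYTHONPATH
```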
To use MPI in a python script through mpi4py, you must first import it using the following code:
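The standard import is:

```python
from mpi4py import MPI
```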
Example Script
Here is an example python script mpi4pytest.py that uses MPI:
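A minimal sketch of such a script (our own illustration, not necessarily the exact contents of mpi4pytest.py):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD    # default communicator containing all processes
rank = comm.Get_rank()   # this process's id within the communicator
size = comm.Get_size()   # total number of processes

print("Hello from rank %d of %d" % (rank, size))
```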
The file mpi4pytest.py can be found at /gpfs/runtime/softwareexamples/mpi4py/
Once you have activated your conda environment, run the following commands to install mpi4py:
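A plausible sketch of those commands (the MPI module name and python version are assumptions; adjust them to your setup):

```shell
# load an MPI implementation first so mpi4py builds against it
module load mpi
# install mpi4py into the active conda environment
python3.9 -m pip install mpi4py
```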
You may change the python version in the pip command.
To check that the installation process was a success you can run
If no errors result from running the command, the installation has worked correctly.
Here is an example batch job script mpi4pytest_conda.sh that uses mpi4pytest.py and the conda environment setup:
The example script above runs the python script on two nodes by using the #SBATCH -N 2 command. For more information on #SBATCH options, see our documentation.
Python Virtual Environment
Start by creating and activating a python virtual environment:
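A sketch, using a hypothetical environment name and python version:

```shell
module load python/3.7.4          # choose your python version
virtualenv ~/mpi4py_env           # hypothetical environment name
source ~/mpi4py_env/bin/activate  # activate the environment
```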
Once you have activated your virtual environment, run the following command to install mpi4py:
Here is an example batch job script mpi4pytest_virtualenv.sh and the python virtual environment setup:
mkdir -p ~/.ssh && chmod 700 ~/.ssh
touch ~/.ssh/config
chmod 600 ~/.ssh/config
Host hostname1
SSH_OPTION value
SSH_OPTION value
Host hostname2
SSH_OPTION value
Host *
SSH_OPTION value
# Oscar Hosts. Any hosts with the -campus suffix can be accessed
# only within the Brown network i.e. campus or vpn
# Hosts without the -campus suffix can be accessed from outside Brown
# but will require 2FA
# Hosts to connect to login nodes
Host oscar
HostName ssh.ccv.brown.edu
User <username>
IdentityFile ~/.ssh/id_rsa
ForwardAgent yes
ForwardX11 yes
TCPKeepAlive yes
ServerAliveCountMax 20
ServerAliveInterval 15
Host oscar-campus
HostName sshcampus.ccv.brown.edu
User <username>
IdentityFile ~/.ssh/id_rsa
ForwardAgent yes
ForwardX11 yes
TCPKeepAlive yes
ServerAliveCountMax 20
ServerAliveInterval 15
# These are jumphosts for vscode, we don't use them directly.
# For connecting your IDE see the next section
Host desktop-oscar-campus
HostName desktop.ccv.brown.edu
IdentityFile ~/.ssh/id_rsa
User <username>
ForwardAgent yes
Host desktop-oscar
HostName ssh8.ccv.brown.edu
IdentityFile ~/.ssh/id_rsa
User <username>
# When connecting from VSCODE use the following hosts
Host vscode-oscar-campus
HostName oscar2
User <username>
ProxyCommand ssh -q -W %h:%p desktop-oscar-campus
Host vscode-oscar
HostName oscar2
User <username>
ProxyCommand ssh -q -W %h:%p desktop-oscar
Do not load any anaconda module in your .modules or .bashrc file. These modules prevent Desktop sessions from starting correctly. You may load them inside the Desktop session.
Launching Desktop App (VNC)
0. Launch Open OnDemand
Click here to launch Open OnDemand (OOD) and log in with your Brown credentials.
1. Select the Desktop option in Interactive Apps dropdown list:
2. Choose the resource option:
3. Wait and Launch!
You may change the Image Quality if your internet connection is poor. Image quality can be changed in the middle of a session.
Reconnecting to session
A session may get disconnected if it is not active for a while:
If the session disconnects as shown above, please don't click the "Connect" button on the screen. You may go to Open OnDemand page and click “My Interactive Sessions” to find the session again:
Please don’t launch a new session if you have an existing session. You cannot launch two desktop sessions at the same time.
Sometimes, the “My interactive Sessions” button is shortened to look like:
Copying and pasting text
If you are using Google Chrome, switch on the "Clipboard" permission and you can directly copy and paste text into the OOD Desktop from any other program.
Click the Lock icon to the left of the URL
Switch on the "Clipboard" permission
Click the side panel button on the extreme left hand side of the screen.
To copy text into the Desktop session, paste the data into the Clipboard. It will be available to paste inside the Desktop session.
To copy text from the Desktop session, open the Clipboard. The copied text will be displayed inside it. You can select and copy the text inside the Clipboard and paste it to an external program.
Desktop (Advanced)
If you need more or different resources than those available from the default Desktop session, you should use the Advanced Desktop app. Resources requested here count against the resources allowed for your Oscar account.
1. Select the Desktop (Advanced) app under Interactive Apps.
2. Choose required resources
Fill out the form with your required resources.
Account: Enter your condo account name. If you are not a member of a condo, leave this blank
Desktop Environment: Choose XFCE. KDE works for CPU jobs, but may not be able to use GPU acceleration correctly.
Number of hours: Choose appropriately. Your Desktop session will end abruptly after this time has elapsed. Requesting a very long session will result in a lower job priority.
Partition: Equivalent to #SBATCH -p option. The desktop session will run on this partition.
Num Cores: Equivalent to the #SBATCH -n option.
Num GPUs: Equivalent to the #SBATCH --gres=gpu: option. This field is ignored if the partition does not have any GPU nodes, e.g. batch
Memory (GB): Equivalent to the #SBATCH --mem= option.
Reservation: Equivalent to the #SBATCH --reservation= option. Leave blank if you are not using a reservation.
3. Wait and Launch!
Wait and launch this session like the regular Desktop session.
To start an interactive session on a GPU node, use the interact command and specify the gpu partition. You also need to specify the requested number of GPUs using the -g option:
To start an interactive session on a particular GPU type (QuadroRTX, 1080ti, p100, etc.), use the feature -f option:
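For example (the feature name passed to -f is an assumption; use one of the GPU types listed above, spelled as Oscar expects):

```shell
# one GPU on the gpu partition for an interactive session
interact -q gpu -g 1 -t 1:00:00
# request a specific GPU type via the feature flag
interact -q gpu -g 1 -f quadrortx -t 1:00:00
```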
GPU Batch Job
For production runs, please submit a batch job to the gpu partition. E.g. for using 1 GPU:
This can also be mentioned inside the batch script:
You can view the status of the gpu partition with:
allq gpu
Sample batch script for CUDA program:
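A minimal sketch (the CUDA module version and executable name are placeholders):

```shell
#!/bin/bash
#SBATCH -p gpu --gres=gpu:1   # gpu partition, 1 GPU
#SBATCH -n 1                  # 1 CPU core
#SBATCH -t 00:30:00

# load the CUDA toolkit (version is an assumption)
module load cuda

# run the CUDA program
./my_cuda_program
```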
Getting started with GPUs
While you can program GPUs directly with CUDA, a language and runtime library from NVIDIA, this can be daunting for programmers who do not have experience with C or with the details of computer architecture.
You may find the easiest way to tap the computation power of GPUs is to link your existing CPU program against numerical libraries that target the GPU:
CUBLAS is a drop-in replacement for BLAS libraries that runs BLAS routines on the GPU instead of the CPU.
CUFFT, CUSPARSE, and CURAND provide FFT, sparse matrix, and random number generation routines that run on the GPU.
MAGMA combines custom GPU kernels, CUBLAS, and a CPU BLAS library to use both the GPU and CPU simultaneously; it is available in the 'magma' module on Oscar.
Matlab has a feature, available through the Parallel Computing Toolkit, for creating arrays on the GPU and operating on them with many built-in Matlab functions. The PCT toolkit is licensed by CIS and is available to any Matlab session running on Oscar or workstations on the Brown campus network.
PyCUDA is an interface to CUDA from Python; it is available in the cuda module on Oscar.
OpenACC
OpenACC is a portable, directive-based parallel programming construct. You can parallelize loops and code segments simply by inserting directives - which are ignored as comments if OpenACC is not enabled while compiling. It works on CPUs as well as GPUs. We have the PGI compiler suite installed on Oscar which has support for compiling OpenACC directives. To get you started with OpenACC:
To submit an interactive job to NVLink-enabled GPU nodes:
To submit batch job(s), add the following line to your batch script.
A user's Windows machine must have Crowdstrike Home installed to use SMB.
Users should ensure that the date and time are set correctly on their machine. Now you are ready to mount your CCV directories locally. Instructions for each of the various operating systems are given below.
macOS
In the Finder, press "Command + K" or select "Connect to Server..."
from the "Go" menu.
For "Server Address", enter smb://smb.ccv.brown.edu/<volume>/<user>
and click "Connect".
To access your Home directory, enter smb://smb.ccv.brown.edu/home/<user>
To access your Scratch space, enter smb://smb.ccv.brown.edu/scratch/<user>
To access your Data directory, enter smb://smb.ccv.brown.edu/data/<pi_group>/<user>
To check your PI group, run the groups command.
Enter your AD username and password.
You may choose to add your login credentials to your keychain so you will not need to enter this again.
Optional. If you would like to automatically connect to the share at startup:
Open "System Preferences" (leave the Finder window open).
Go to "Accounts" > "(your account name)".
Select "Login Items".
Drag your data share from the "Finder" window to the "Login Items" window.
Linux
Install the cifs-utils package:
Make a directory to mount the share into:
Create a credentials file and add your AD account information:
Allow only root access to the credentials files:
Add an entry to the fstab:
The fstab entry should be the following:
Replace <localUser> with the login used on your Linux workstation, and replace <user> and <pi_group> with your Oscar username and PI group, respectively.
Mount the share:
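A sketch of the full sequence on a Debian/Ubuntu-style system (the package manager command, mount point, credentials location, domain, and fstab options are all assumptions; adapt them to your distribution and share):

```shell
# install the cifs-utils package
sudo apt-get install cifs-utils

# make a directory to mount the share into
sudo mkdir -p /mnt/oscar_data

# create a credentials file with your AD account information
# (username=, password=, and domain= lines; domain is an assumption)
sudo nano /root/.smbcredentials

# allow only root access to the credentials file
sudo chmod 600 /root/.smbcredentials

# add the fstab entry (one line), then mount the share
echo '//smb.ccv.brown.edu/data/<pi_group>/<user> /mnt/oscar_data cifs credentials=/root/.smbcredentials,uid=<localUser> 0 0' | sudo tee -a /etc/fstab
sudo mount /mnt/oscar_data
```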
Windows
Right-click "Computer" and select "Map Network Drive"
Select an unassigned drive letter
To mount specific volumes:
For Home directory, enter \\smb.ccv.brown.edu\home\<user>
For Scratch space, enter \\smb.ccv.brown.edu\scratch\<user>
For Data directory, enter \\smb.ccv.brown.edu\data\<pi_group>\<user>
To check your <pi_group>, run the groups command.
Check "Connect using different credentials"
Click "Finish"
Enter your AD user name. If your computer is not in Active Directory (AD), you should enter your username in the format ad\username
Enter your AD password and click "OK"
You can now access your home directory through Windows Explorer with the assigned drive letter. Your data and scratch directories are available as the subdirectories (~/data and ~/scratch) of your home directory.
module list
Lists all modules that are currently loaded in your software environment.
module avail
Lists all available modules on the system. Note that a module can have multiple versions.
module help <name>
Prints additional information about the given software.
Finding modules
The module avail command allows searching modules based on partial names. For example:
module avail bo
will list all available modules whose names start with "bo".
Output:
This feature can be used for finding what versions of a module are available.
Auto-completion using tab key
The module load command supports auto-completion of the module name using the "tab" key. For example, typing module load bo on the shell prompt and hitting the "tab" key a couple of times will show results similar to those shown above. Similarly, the module unload command auto-completes using the names of modules which are loaded.
Modules loaded at startup
You can customize the default environment that is loaded when you login by putting the appropriate module commands in the .modules file in your home directory. For instance, if you edited your .modules file to contain
module load python/3.5.2
then python/3.5.2 will be loaded every time you log in.
If you have not yet created a .modules file in your home directory, you will have to create one and then add the commands you wish to have as defaults to that file.
What modules actually do...
Loading a module sets the relevant environment variables like PATH, LD_LIBRARY_PATH and CPATH. For example, PATH contains all the directory paths (colon separated) where executable programs are searched for. By setting PATH through a module, you can execute a program from anywhere in the file-system; otherwise, you would have to type the full path to the executable program file, which is very inconvenient. Similarly, LD_LIBRARY_PATH has all the directory paths where the run-time linker searches for libraries while running a program, and so on. To see the values in an environment variable, use the echo command. For instance, to see what's in PATH:
echo $PATH
MPI is a standard that dictates the semantics and features of "message passing". There are different implementations of MPI. Those installed on Oscar are
MVAPICH2
OpenMPI
We recommend using MVAPICH2 as it is integrated with the SLURM scheduler and optimized for the Infiniband network.
MPI modules on Oscar
The MPI module is called "mpi". The different implementations (mvapich2, openmpi, different base compilers) are in the form of versions of the module "mpi". This is to make sure that no two implementations can be loaded simultaneously, which is a common source of errors and confusion.
You can just use "module load mpi" to load the default version which is mpi/openmpi_4.0.7_gcc_10.2_slurm22. This is the recommended version.
The module naming format is
srun instead of mpirun
Use srun --mpi=pmix or srun --mpi=pmi2 to run MPI programs. All MPI implementations listed with the suffix _slurm22 are built with SLURM support. Hence, the programs need to be run using SLURM's srun command, unless you are using the legacy versions mentioned above.
The --mpi=pmix flag is also required to match the configuration with which MPI is installed on Oscar.
Running MPI programs - Interactive
To run an MPI program interactively, first create an allocation from the login nodes using the salloc command:
For example, to request 4 cores to run 4 tasks (MPI processes):
Once the allocation is fulfilled, you can run MPI programs with the srun command:
When you are finished running MPI commands, you can release the allocation by exiting the shell:
Also, if you only need to run a single MPI program, you can skip the salloc command and specify the resources in a single srun command:
This will create the allocation, run the MPI program, and release the allocation.
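The whole interactive flow can be sketched as follows (the program name is a placeholder):

```shell
# request an allocation of 4 tasks (MPI processes)
salloc -n 4

# once the allocation is granted, run the MPI program
srun --mpi=pmix ./my_mpi_program

# release the allocation by exiting the shell
exit

# or do it all in one step: allocate, run, and release
srun --mpi=pmix -n 4 ./my_mpi_program
```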
Note: It is not possible to run MPI programs on compute nodes by using the interact command.
salloc documentation:
srun documentation:
Running MPI programs - Batch Jobs
Here is a sample batch script to run an MPI program:
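A minimal sketch of such a script (the program name is a placeholder):

```shell
#!/bin/bash
#SBATCH -n 4          # number of MPI tasks
#SBATCH -t 1:00:00    # walltime

module load mpi
srun --mpi=pmix ./my_mpi_program
```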
Hybrid MPI+OpenMP
If your program has multi-threading capability using OpenMP, you can have several cores attached with a single MPI task using the --cpus-per-task or -c option with sbatch or salloc. The environment variable OMP_NUM_THREADS governs the number of threads that will be used.
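A hybrid batch script matching the description that follows might look like this (the program name is a placeholder):

```shell
#!/bin/bash
#SBATCH -N 2                  # 2 nodes
#SBATCH --ntasks-per-node=2   # 2 MPI tasks per node (4 tasks total)
#SBATCH -c 4                  # 4 CPUs per task (16 cores for the job)

module load mpi
export OMP_NUM_THREADS=4      # threads per MPI task
srun --mpi=pmix ./my_hybrid_program
```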
The above batch script will launch 4 MPI tasks - 2 on each node - and allocate 4 CPUs for each task (total 16 cores for the job). Setting OMP_NUM_THREADS governs the number of threads to be used, although this can also be set in the program.
Performance Scaling
The maximum theoretical speedup that can be achieved by a parallel program is governed by the proportion of the sequential part of the program (Amdahl's law). Moreover, as the number of MPI processes increases, the communication overhead increases, i.e., the amount of time spent sending and receiving messages among the processes grows. Beyond a certain number of processes, this increase starts dominating the decrease in computational run time, and the overall program slows down instead of speeding up as processes are added.
Hence, MPI programs (or any parallel programs) do not run faster as the number of processes is increased beyond a certain point.
If you intend to carry out a lot of runs for a program, the correct approach would be to find out the optimum number of processes which will result in the least run time or a reasonably less run time. Start with a small number of processes like 2 or 4 and first verify the correctness of the results by comparing them with the sequential runs. Then increase the number of processes gradually to find the optimum number beyond which the run time flattens out or starts increasing.
Maximum Number of Nodes for MPI Programs
An MPI program is allowed to run on at most 32 nodes. When a user requests more than 32 nodes for an MPI program/job, the user will receive the following error:
Batch job submission failed: Requested node configuration is not available
X-Forwarding
Instructions to forward X11 applications from Oscar to local computer
If you have an installation of X11 on your local system, you can access Oscar with X forwarding enabled, so that the windows, menus, cursor, etc. of any X applications running on Oscar are all forwarded to your local X11 server. Here are some resources for setting up X11:
One limitation of X forwarding is its sensitivity to your network connection's latency. We advise against using X forwarding from a connection outside of the Brown campus network, since you will likely experience lag between your actions and their response in the GUI.
Mac/Linux
Once your X11 server is running locally, open a terminal and use
ssh -X <username>@ssh.ccv.brown.edu
to establish the X forwarding connection. Then, you can launch GUI applications from Oscar and they will be displayed locally on your X11 server.
Windows (PuTTY)
For Windows users using PuTTY, enable X forwarding under Connections->SSH->X11:
Submitting GPU Jobs
The Oscar GPUs are in a separate partition from the regular compute nodes. The partition is called gpu. To see how many jobs are running and pending in the gpu partition, use
allq gpu
Interactive use
To start an interactive session on a GPU node, use the interact command and specify the gpu partition. You also need to specify the requested number of GPUs using the -g option:
Batch jobs
Here is an example batch script for a cuda job that uses 1 gpu and 1 cpu for 5 minutes
To submit this script:
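Both the script and its submission can be sketched as follows (save the script as, say, gpu_job.sh; the module and executable names are placeholders):

```shell
#!/bin/bash
#SBATCH -p gpu --gres=gpu:1   # gpu partition, 1 GPU
#SBATCH -n 1                  # 1 CPU core
#SBATCH -t 00:05:00           # 5 minutes

module load cuda
./my_cuda_program
```

Then submit it with sbatch gpu_job.sh.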
DGX GPU Nodes in the GPU-HE Partition
All the nodes in the gpu-he partition have V100 GPUs. However, two of them are DGX nodes (gpu1404/1405), which have 8 GPUs each. When a gpu-he job requests more than 4 GPUs, the job will automatically be allocated to the DGX nodes.
The non-DGX nodes actually have a better NVLink interconnect topology, as all of their GPUs have direct links to each other. So the non-DGX nodes are better for a gpu-he job that does not require more than 4 GPUs.
Python in batch jobs
By default, print in Python is buffered. When running Python in a batch job in SLURM you may see output less often than you would when running interactively. This is because the output is being buffered - the print statements are collected until there is a large amount to print, then the messages are all printed at once. For debugging or checking that a Python script is producing the correct output, you may want to switch off buffering.
Switch off buffering
For a single python script you can use the -u option, e.g.
python -u my_script.py
The -u stands for "unbuffered". You can use the environment variable PYTHONUNBUFFERED to set unbuffered I/O for your whole batch script.
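In a batch script, that looks like (the script name is a placeholder):

```shell
# disable output buffering for all python commands in this script
export PYTHONUNBUFFERED=TRUE
python my_script.py
```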
There is some performance penalty for having unbuffered print statements, so you may want to reduce the number of print statements, or run buffered for production runs.
VASP
The Vienna Ab initio Simulation Package (VASP) is a package for performing advanced quantum-mechanical computations. This page will explain how VASP can be accessed and used on Oscar.
Setting up VASP
In order to use VASP, you must be a part of the vasp group on Oscar. To check your groups, run the groups command in the terminal.
First, you must choose which VASP module to load. You can see the available modules using module avail vasp. You can load your preferred VASP module using module load <module-name>.
Available Versions
VASP 5.4.1
VASP 5.4.4
VASP 6.1.1
Running VASP
Within a batch job, you should specify the number of MPI tasks as
If you would like 40 cores for your calculation, you would include the following in your batch script:
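For example, for a 40-core run (the executable name vasp_std is an assumption; use the VASP binary appropriate to your calculation):

```shell
#SBATCH -n 40                 # number of MPI tasks

srun --mpi=pmix vasp_std      # run VASP across the allocated tasks
```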
If you're not sure how many cores you should include in your calculation, refer to
Configuring Remote Launch
Configuring Remote Launch from the client
You will need to configure remote launch for Oscar
Add /gpfs/runtime/opt/forge/19.1.2 as the Remote Installation Directory
Test Remote Launch. You should enter the password used for Oscar. If successful you should see the message Remote Launch test completed successfully
If you have a mismatch between your client version and the version of Forge on Oscar, you will see an error message. To fix this, make sure you are using compatible client and remote versions.
Once you are connected, you will see a licence checked out and "Connected to [email protected]" on the client.
Interactive Jobs
To start an interactive session for running serial or threaded programs on an Oscar compute node, simply run the command interact from the login node:
interact
By default, this will create an interactive session that reserves 1 core and 4GB of memory for a period of 30 minutes. You can change the resources reserved for the session from these default limits by modifying the interact command:
usage: interact [-n cores] [-t walltime] [-m memory] [-q queue]
[-o outfile] [-X] [-f featurelist] [-h hostname] [-g ngpus]
Starts an interactive job by wrapping the SLURM 'salloc' and 'srun' commands.
options:
-n cores (default: 1)
-t walltime as hh:mm:ss (default: 30:00)
-m memory as #[k|m|g] (default: 4g)
-q queue (default: 'batch')
-o outfile save a copy of the sessions output to outfile (default: off)
-X enable X forwarding (default: no)
-f featurelist CCV-defined node features (e.g., 'e5-2600'),
combined with '&' and '|' (default: none)
-h hostname only run on the specific node 'hostname'
(default: none, use any available node)
-a account use the SLURM accounting account name 'account'
-g ngpus number of GPUs
For example, the command
interact -n 20 -t 1:00:00 -m 10g
requests an interactive session with 20 cores and 10 GB of memory (per node) for a period of 1 hour.
Keeping Interactive Jobs Alive:
If you lose connectivity to your login node, you lose access to your interactive job. To mitigate this issue you can use screen to keep your connection alive. For more information on using screen on the login nodes, see the
IDL
Interactive Data Language (IDL) is a programming language used for data analysis and is popular in several scientific fields. This page explains how to use the IDL module on Oscar to run IDL programs.
Setting Up IDL
First load the IDL module that you want to use with module load idl/version_number:
From Non-compliant Networks (2-FA)
Accessing VSCode from Non-Brown compliant networks
This guide is only for users connecting from Non-Brown Compliant Networks. 2-FA is mandatory.
Install the Remote-SSH extension for VSCode
Intro to Parallel Programming
This page serves as a guide for application developers getting started with parallel programming, or users wanting to know more about the working of parallel programs/software they are using.
Although there are several ways to classify parallel programming models, a basic classification is:
Distributed Memory Programming
Condo/Priority Jobs
Note: we do not provide users condo access by default, even if their group/PI has a condo on the system. You will have to explicitly request condo access, and we will ask for approval from the PI.
To use your condo account to submit jobs, include the following line in your batch script:
You can also provide this option on the command line while submitting the job using sbatch:
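As a sketch of both forms (the condo name mygroup-condo is a placeholder; condo accounts are typically named <groupname>-condo, and the condos command lists them):

```shell
# in your batch script (placeholder condo name):
#SBATCH --account=mygroup-condo

# or equivalently, on the command line:
sbatch --account=mygroup-condo my_batch_script.sh
```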
# Jump box with public IP address
Host jump-box
HostName poodcit4.services.brown.edu
User <username>
# Target machine with private IP address
Host ccv-vscode-node
HostName node1103
User <username>
ProxyCommand ssh -q -W %h:%p jump-box
Adds a module to your current environment. If you load using just the name of a module, you will get the default version. To load a specific version, load the module using its full name with the version: "module load gcc/6.2"
This model is useful when all threads/processes have access to a common memory space. The most basic form of shared memory parallelism is Multithreading. According to Wikipedia, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler (Operating System).
Note that most compilers have inherent support for multithreading up to some level. Multithreading comes into play when the compiler converts your code to a set of instructions such that they are divided into several independent instruction sequences (threads) which can be executed in parallel by the Operating System. Apart from multithreading, there are other features like "vectorized instructions" which the compiler uses to optimize the use of compute resources. In some programming languages, the way of writing the sequential code can significantly affect the level of optimization the compiler can induce. However, this is not the focus here.
Multithreading can also be induced at code level by the application developer and this is what we are interested in. If programmed correctly, it can also be the most "efficient" way of parallel programming as it is managed at the Operating System level and ensures optimum use of "available" resources. Here too, there are different parallel programming constructs which support multithreading.
Pthreads
POSIX threads is a standardized C language threads programming interface. It is a widely accepted standard because of being lightweight, highly efficient and portable. The routine to create Pthreads in a C program is called pthread_create and an "entry point" function is defined which is to be executed by the threads created. There are mechanisms to synchronize the threads, create "locks and mutexes", etc. Help pages:
OpenMP is a popular directive based construct for shared memory programming. Like POSIX threads, OpenMP is also just a "standard" interface which can be implemented in different ways by different vendors.
Compiler directives appear as comments in your source code and are ignored by compilers unless you tell them otherwise - usually by specifying the appropriate compiler flag (https://computing.llnl.gov/tutorials/openMP). This makes the code more portable and easier to parallelize: you can parallelize loop iterations and code segments by inserting these directives. OpenMP also makes it simpler to tune the application at run time using environment variables. For example, you can set the number of threads to be used by setting the environment variable OMP_NUM_THREADS before running the program. Help pages:
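As a command-line sketch of both points (the program name is hypothetical): the directives are only honored when the OpenMP flag is passed, and the thread count is tuned through the environment:

```shell
# gcc ignores the #pragma omp directives unless -fopenmp is given
gcc -fopenmp -o my_omp_program my_omp_program.c

# tune the number of threads at run time - no recompilation needed
export OMP_NUM_THREADS=4
./my_omp_program
```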
This policy establishes requirements for the use and management of Brown University's Center for Computation and Visualization resources.
Policy Purpose
This policy establishes requirements for the use and management of Brown University’s Center for Computation and Visualization resources to ensure their integrity, confidentiality and availability in support of appropriate education, research, outreach, and administrative objectives.
Account Usage
HPC users are not permitted to:
Share their accounts or passwords with others or enable unauthorized users to access Center for Computation and Visualization resources.
Use Center for Computation and Visualization resources for personal economic gain.
Engage in unauthorized activity (e.g., cryptocurrency mining) that intentionally impacts the integrity of resources.
Storage
Each user (premium or exploratory) gets a 20GB home directory, 512GB of short-term scratch space, and a 256GB data directory (shared among the members of the group).
Files in the scratch directory that have not been accessed for the last 30 days are automatically purged. CCV stores snapshots for only 7 days; after that, files are permanently deleted.
The PI has ultimate access to the data directory - if a student leaves Brown, the files in the data directory are owned by the PI.
Software and Data
All software and data stored or used on Center hosted systems must be appropriately and legally acquired and must be used in compliance with applicable licensing terms. Unauthorized misuse or copying of copyrighted materials is prohibited.
Data Retention
CCV reserves the right to remove any data at any time and/or transfer data to other individuals (such as Principal Investigators working on the same or a similar project) after a user account is deleted or the user is no longer affiliated with Brown University.
Accounts Validity
Accounts are valid for two years or for as long as your Brown AD credentials remain valid, whichever ends first.
You can use the command module load idl to simply load the default version. This is demonstrated in the following command followed by system dialogue.
As indicated by the system dialogue, you will need to enter the following command to set up the environment for IDL:
IDL Command Line
Once you've set up IDL in the way outlined above, you can open the IDL command line by simply using the command idl:
Note: To exit this environment, simply use the command exit
As is stated in the IDL Documentation, IDL in command-line mode "uses a text-only interface and sends output to your terminal screen or shell window." Thus, this is a mode in which you can enter commands and see their results in real time, but it is not where one should write full IDL programs.
IDL Programs
To write an IDL program, you can use any of the text editors on Oscar (such as vim, emacs, and nano), or you can create the program in a file on your own computer and then copy that file to Oscar when you are finished. Here is an example (hello world) IDL program, idl_hello_world.pro:
This file and the batch file below can be found at /gpfs/runtime/software_examples/idl/8.5.1 if you wish to copy them and test the process yourself.
Once you have the .pro file on Oscar, you can then run this file using a batch script. Here is a bare-bones version of a batch script (called idl_hello_world.sh) that will run the script idl_hello_world.pro (note that the .pro is omitted in the script).
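A minimal sketch of such a batch script (the resource requests and the idl invocation are assumptions; compare with the copy in /gpfs/runtime/software_examples/idl/8.5.1 for the exact form):

```shell
#!/bin/bash
# modest resources for a hello-world run
#SBATCH -n 1
#SBATCH -t 00:05:00

# load IDL and run the procedure; note the .pro extension is omitted
module load idl/8.5.1
idl -e idl_hello_world
```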
We can then run the batch file by using the sbatch command:
Similarly, you can change the account while asking for interactive access too:
Condo account names are typically <groupname>-condo, and you can view a full list with the condos command on Oscar.
To see the running and pending jobs in a condo:
condo <condo-name>
Premium Account (priority) jobs
If you have a premium account, that should be your default QOS for submitting jobs. You can check if you have a premium account with the command groups. If you have a priority account, you will see priority in the output from groups.
You can check the qos for a running job by running the command myq. The QOS column should show "pri-<username>"
If you are interested in seeing all your accounts and associations, you can use the following command:
#!/bin/bash
# Request an hour of runtime:
#SBATCH --time=1:00:00
# Use 2 nodes with 8 tasks each, for 16 MPI tasks:
#SBATCH --nodes=2
#SBATCH --tasks-per-node=8
# Specify a job name:
#SBATCH -J MyMPIJob
# Specify an output file
#SBATCH -o MyMPIJob-%j.out
#SBATCH -e MyMPIJob-%j.err
# Load required modules
module load mpi
srun --mpi=pmix MyMPIProgram
#!/bin/bash
# Use 2 nodes with 2 tasks each (4 MPI tasks)
# And allocate 4 CPUs to each task for multi-threading
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --cpus-per-task=4
# Load required modules
module load mpi/openmpi_4.0.7_gcc_10.2_slurm22
export OMP_NUM_THREADS=4
srun --mpi=pmix ./MyMPIProgram
$ ssh -X <user>@ssh.ccv.brown.edu
#!/bin/bash
# Request a GPU partition node and access to 1 GPU
#SBATCH -p gpu --gres=gpu:1
# Request 1 CPU core
#SBATCH -n 1
#SBATCH -t 00:05:00
# Load a CUDA module
module load cuda
# Run program
./my_cuda_program
$ module load idl
module: loading 'idl/8.5.1'
module: idl: License owned by Jonathan Pober. Set up the environment for IDL by running: "shopt -s expand_aliases; source $IDL/envi53/bin/envi_setup.bash".
$ idl
IDL Version 8.5.1 (linux x86_64 m64). (c) 2015, Exelis Visual Information Solutions, Inc., a subsidiary of Harris Corporation.
Installation number: 5501393-2.
Licensed for use by: Brown University
IDL>
Search for symlink and make sure symlink searching is unchecked
3. Under VSCode settings, search for remote ssh timeout and manually enter a timeout value, e.g., 50s. This should give you enough time to complete 2-Factor Authentication.
4. Edit the ~/.ssh/config file on your local machine, add the following lines. Replace <username> with your Oscar username.
6. In VSCode, select Remote-SSH: Connect to Host… and after the list populates select ccv-vscode-node
Most inquiries can be directed to CCV’s support address, [email protected], which will create a support ticket with one of our staff.
What are the fees for CCV services?
All CCV services are billed quarterly, and rates can be found (requires Brown authentication to view). Questions about rates should be directed to [email protected].
How do I acknowledge CCV in a research publication?
We greatly appreciate acknowledgements in research publications that benefited from the use of CCV services or resources.
Oscar
What is Oscar?
Oscar is our primary research computing cluster with several hundred multi-core nodes sharing a high-performance interconnect and file system. Applications can be run interactively or scheduled as batch jobs.
How do I request an account on Oscar?
To request an account, please fill out the account form on the CCV homepage. All accounts are subject to our usage policy.
How do I run a job on Oscar?
Sample batch scripts are available in your home directory at ~/batch_scripts and can be run with the sbatch <jobscript> command. For more information, visit our manual page on .
Can I use Oscar for teaching?
See our page on
How do I find out when the system is down?
We post updates to our user mailing list, [email protected], to which you are automatically subscribed when setting up an account with CCV. If you need to be added to the mailing list, please submit a support ticket to [email protected]. We also have an announcement mailing list for office hours, workshops and other events relevant to CCV users, [email protected].
How do I run a job array on Oscar?
A job array is a special type of job submission that allows you to submit many related batch jobs with a single command. This makes it easy to do parameter sweeps or other schemes where the submitted jobs are all the same except for a single parameter such as a filename or input variable. Job arrays require special syntax in your job script. Sample batch scripts for job arrays are available in your home directory at ~/batch_scripts and can be run with the sbatch <jobscript> command. For more information, visit our manual page on .
How do I run a MPI job on Oscar?
MPI is a type of programming interface. Programs written with MPI can run on and communicate across multiple nodes. You can run MPI-capable programs by calling srun --mpi=pmix <program> in your batch script. For more detailed info, visit our manual page on .
I have some MPI-enabled source code. How can I compile it on Oscar?
Load an MPI module with module load mpi. For a list of available MPI modules, run module avail mpi.
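For example, using the MPI wrapper compiler (the source file name is a placeholder):

```shell
module load mpi
# mpicc wraps the C compiler with the MPI include and link flags
mpicc -o my_mpi_app my_mpi_app.c
```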
What applications are available on Oscar?
Many scientific and HPC software packages are already installed on Oscar, including python, perl, R, Matlab, Mathematica, and Maple. Use the module avail command on Oscar to view the whole list or search for packages. See our manual page on to understand how software modules work. Additional packages can be requested by submitting a support ticket to [email protected].
What compilers are available on Oscar?
By default, the gcc compiler is available when you login to Oscar, providing the GNU compiler suite of gcc (C), g++ (C++), and gfortran. We also provide compilers from Intel (intel module) and the Portland Group (pgi module). For more information, visit our manual page on .
How do I get information about finished jobs?
The sacct command will list all of your completed jobs since midnight of the previous day (as well as running and queued jobs). You can pick an earlier start date with the -S option, e.g. sacct -S 2012-01-01.
How much storage am I using?
The myquota command on Oscar will print a summary of your usage on the home, data, and scratch file systems. For more information, see our manual page on .
My job keeps terminating unexpectedly with a "Killed" message, or without any errors. What happened?
These are symptoms of not requesting enough memory for your job. The default memory allocation is about 3 GB. If your job is resource-intensive, you may need to specifically allocate more. See the for instructions on requesting memory and other resources.
How do I request a certain amount of memory per CPU?
Specify the SLURM option --mem-per-cpu= in your script.
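For example, to request 4GB of memory for each CPU core allocated to the job:

```shell
#SBATCH --mem-per-cpu=4G
```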
How do I link against a BLAS and LAPACK library?
We recommend linking against the Intel Math Kernels Library (MKL) which provides both BLAS and LAPACK. The easiest way to do this on Oscar is to include the special environment variable $MKL at the end of your link line, e.g. gcc -o blas-app blas-app.c $MKL. For more complicated build systems, you may want to consult the .
RUNNING JOBS
How is a job identified?
By a unique JobID, e.g. 13180139
Which of my jobs are running/pending?
Use the command myq
How do I check the progress of my running job?
You can look at the output file. The default output file is slurm-%j.out, where %j is the JobID. If you specified an output file using #SBATCH -o output_filename and/or an error file using #SBATCH -e error_filename, you can check these files for any output from your job. You can view the contents of a text file using the program less, e.g.
less slurm-13180139.out
Use the spacebar to move down the file, b to move back up the file, and q to quit.
My job is not running how I intended it to. How do I cancel the job?
scancel <JobID> where <JobID> is the job allocation number, e.g. 13180139
How do I save a copy of an interactive session?
You can use interact -o outfile to save a copy of the session's output to "outfile"
I've submitted a bunch of jobs. How do I tell which one is which?
myq will list the running and pending jobs with their JobID and the name of the job. The name of the job is set in the batch script with #SBATCH -J jobname. For jobs that are in the queue (running or pending), you can use the command scontrol show job <JobID>, where <JobID> is the job allocation number, e.g. 13180139, to get more detail about what was submitted.
How do I ask for a haswell node?
Use the --constraint (or -C) option:
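For example (haswell being one of the feature names reported by the nodes command):

```shell
# in a batch script
#SBATCH -C haswell

# or for an interactive session, using interact's feature-list option
interact -f haswell
```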
You can use the --constraint option to restrict your allocation according to other features too. The nodes command provides a list of "features" for each type of node.
Why won't my job start?
When your job is pending (PD) in the queue, SLURM will display a reason why your job is pending. The table below shows some common reasons for which jobs are kept pending.
Reason
Meaning
Why is my job taking so long to start? Just waiting in (Priority) or (Resources)
Overall system busy: when tens of thousands of jobs are submitted in total by all users, the time it takes SLURM to process them into the system may increase from nearly instant to a half-hour or more.
Specific resource busy: if you request very specific resources (e.g., a specific processor) you then have to wait for that specific resource to become available while other similar resources may be going unused.
Specified resource not available: if you request something that is not, or may never be, available, your job will simply wait in the queue. E.g., a job requesting 64 GB of RAM on a 64 GB node will never run, because the system needs at least 1 GB for itself; reduce your request to less than 64 GB.
TRANSFERRING FILES
How do I transfer big files to/from Oscar?
Please use the server transfer.ccv.brown.edu
1. Transfer a local file to Oscar:
2. Transfer remote file on Oscar to the local system:
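Both directions can be sketched with scp (username and filenames are placeholders):

```shell
# 1. local file -> Oscar
scp myfile.txt <username>@transfer.ccv.brown.edu:~/

# 2. Oscar -> local system (run from your local machine)
scp <username>@transfer.ccv.brown.edu:~/myfile.txt .
```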
Alternatively, Oscar has an endpoint for "Globusonline" () that you can use to more effectively transfer files. See our manual page on how to use to transfer files.
Cloud HPC Options
The use of cloud resources for HPC varies according to your demands and circumstances. Cloud options are changing rapidly both in service providers and various services being offered. For those who have short-term needs that don't demand the highest of computational performance, a cloud option might be appropriate. For others, a local option customized to individual needs may be better. The cost of cloud services also varies quite a bit and includes not only compute time but data transfer charges. Other issues involved licensing, file synchronization, etc.
We are actively investigating a number of options to connect Brown users seamlessly to suitable cloud options. We are collecting such information for publishing on the CIS website as part of research services available. At this point, the best course of action is to request an individual consultation to help address your specific needs. Please send email to [email protected].
Arbiter2
Arbiter2 is a cgroups-based mechanism that is designed to prevent the misuse of login nodes and VSCode node, which are scarce, shared resources. It is installed on shared nodes listed below:
login005
login006
node1103 (VSCode)
Status and Limits
Arbiter2 applies different limits to a user's processes depending on the user's status: normal, penalty1, and penalty2.
Arbiter2 limits apply only to the shared nodes, not compute nodes.
Normal Status and Limits
Upon first log in, the user is in the normal status. These normal limits apply to all the user's processes on the node:
CPU: 1/3 of the total CPU time. For example, a user's processes can use up to 1/3 of the total CPU time of the 24 cores on a login node.
Memory: 40GB
Penalty1 Status and Limits
When a user's processes consume CPU time more than the default CPU time limit for a period of time, the user's status is changed to the penalty1 status. These penalty1 limits are applied:
CPU: 80% of the normal limit.
Memory: 0.8 * 40GB = 32GB (80% of the normal limit)
While a user is in penalty1 status, their processes are throttled if they consume more CPU time than penalty1 limit. However, if a user's processes exceed penalty1 memory limit, the processes (PIDs) will be terminated by cgroups.
The user's status returns to the normal status after a user's processes consume CPU time less than the penalty1 limit for 30 minutes.
Penalty restrictions are enforced independently for each shared node, and the penalty status does not carry over between these nodes.
Penalty2 Status and Limits
When a user's processes consume more CPU time than the penalty1 limit for a period of time, the user is put in the penalty2 status, and the penalty2 limits apply to the user's processes.
CPU: 50% of the normal limit
Memory: 20GB (50% of the normal limit)
In penalty2 status, the user's processes will be throttled if they consume more CPU time than penalty2 limit. However, if a user's processes exceed penalty2 memory limit, the processes (PIDs) will be terminated by cgroups.
The user's status returns to the normal status after a user's processes consume CPU time less than the penalty2 limit for one hour.
Penalty3 Status and Limits
When a user's processes consume more CPU time than the penalty2 limit for a period of time, the user is put in the penalty3 status. These penalty3 limits apply to the user's processes.
CPU: 30% of the normal limit
Memory: 12GB (30% of the normal limit)
In penalty3 status, the user's processes will be throttled if they consume more CPU time than the penalty3 limit. If a user's processes exceed the penalty3 memory limit, the processes (PIDs) will be terminated by cgroups.
The user's status returns to the normal status after a user's processes consume CPU time less than the penalty3 limit for two hours.
Email Notification
A user receives an email notification upon each violation. Below is an example email:
Violation of usage policy
A violation of the usage policy by ccvdemo (CCV Demo,,,,ccvdemo) on login006 was automatically detected starting at 08:53 on 04/25.
This may indicate that you are running computationally-intensive work on the interactive/login node (when it should be run on compute nodes instead). Please utilize the 'interact' command to initiate a SLURM session on a compute node and run your workloads there.
You now have the status penalty1 because your usage has exceeded the thresholds for appropriate usage on the node. Your CPU usage is now limited to 80% of your original limit (8.0 cores) for the next 30 minutes. In addition, your memory limit is 80% of your original limit (40.0 GB) for the same period of time.
These limits will apply on login006.
High-impact processes
Usage values are recent averages. Instantaneous usage metrics may differ. The processes listed are probable suspects, but there may be some variation in the processes responsible for your impact on the node. Memory usage is expressed in GB and CPU usage is relative to one core (and may exceed 100% as a result).
Process
Average core usage (%)
Average memory usage (GB)
Recent system usage
*This process is generally permitted on interactive nodes and is only counted against you when considering memory usage (regardless of the process, too much memory usage is still considered bad; it cannot be throttled like CPU). The process is included in this report to show usage holistically.
**This accounts for the difference between the overall usage and the collected PID usage (which can be less accurate). This may be large if there are a lot of short-lived processes (such as compilers or quick commands) that account for a significant fraction of the total usage. These processes are whitelisted as defined above.
Required User Actions
When a user receives an alert email that the user is put in a penalty status, the user should
kill the processes that use too much resources on the shared node listed in the alert email, and/or reduce the resources used by the processes
submit an interactive or batch job to run computationally intensive programs, including but not limited to Python, R and Matlab
consider attending CCV office hours or workshops to learn more about correctly using Oscar.
CCV reserves the right to suspend a user's access to Oscar, if the user repeatedly violates the limits, and the user is not able to work with CCV to find a solution.
Exempt Processes
Essential Linux utilities, such as rsync, cp, scp, SLURM commands, creating Singularity images, and code compilation, are exempt. To obtain a comprehensive list, please get in touch with us.
The CPU resources used by exempt programs do not count against the CPU limits. However, the memory resources used by exempt programs still count against the memory limits.
Transferring Files to and from Oscar
There are several ways to move files between your machine and Oscar. Which method you choose will depend on how much data you need to move and your personal preference for each method.
1. SMB
You can drag and drop files from your machine to the Oscar filesystem via SMB. This is an easy method for a small number of files. Please refer to this for mounting the filesystem via SMB.
2. Command line
Mac and Linux
SCP
You can use scp to transfer files. For example to copy a file from your computer to Oscar:
To copy a file from Oscar to your computer:
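A sketch of both directions, run from your own machine (username and filenames are placeholders):

```shell
# your computer -> Oscar
scp myfile.txt <username>@transfer.ccv.brown.edu:~/

# Oscar -> your computer
scp <username>@transfer.ccv.brown.edu:~/results.txt .
```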
RSYNC
You can use rsync to sync files across your local computer to Oscar:
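For example (directory names are placeholders; -a preserves permissions and times, -v is verbose, -z compresses data in transit):

```shell
rsync -avz my_project/ <username>@transfer.ccv.brown.edu:~/my_project/
```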
Windows: On Windows, if you have PuTTY installed, you can use its pscp function from the terminal.
3. GUI programs for transferring files using the sftp protocol and transfer.ccv.brown.edu hostname
DUO is required if you are not connected to approved networks, e.g., home network
There is no interactive terminal message but your Phone will get a prompt automatically
DUO is NOT required if you are connected to approved Brown networks
In general, you can specify the following for your GUI programs:
Protocol: SFTP
Host: transfer.ccv.brown.edu
User: your Oscar username
3.1 WinSCP for Windows
3.1.1 Limit Concurrent Transfer and Change Reconnect Options
Click the Options menu and then Preferences in WinSCP. In the popup window, click Transfer and then Background (Figure 1) to:
change Maximal number of transfers at the same time to 1
uncheck Use multiple connections for single transfer
click Endurance (Figure 2) to:
set Automatically reconnect session to 5 seconds
uncheck Automatically reconnect session, if it stalls
set Keep reconnection for to 10 seconds
3.1.2 Add a New Site
3.2 FileZilla
3.2.1. Disable Timeout
Click the Edit menu and then select the Settings submenu, and then change the Timeout in seconds to 0 to disable, as shown in Figure 2
3.2.2 Add a New Site
Open the Site Manager as shown in Figure 5.
Click the 'New Site' button to add a new site, as shown in Figure 4:
Limit the number of simultaneous connections to 1, as shown in Figure 5.
Click the 'Connect' button to connect to Oscar and transfer files.
3.3 Cyberduck
You may see a popup window about an 'Unknown Fingerprint'. You just need to check the 'Always' option and click 'Allow'. This window should not pop up again unless the transfer server changes.
4. Globus online
is a secure, reliable research data management service. You can move data directly to Oscar from another Globus endpoint. Oscar has one Globus endpoint:
If you want to use Globus Online to move data to/from your own machine, you can install Globus Connect Personal. For more instructions on how to use Globus, see the in the Globus documentation.
5. LFTP
is a sophisticated file transfer program supporting a number of network protocols (ftp, http, sftp, fish, torrent). It has bookmarks, a built-in mirror command, can transfer several files in parallel and was designed with reliability in mind. You can use the LFTP module from Oscar to transfer data from any (S)FTP server you have access to directly to Oscar. Below are the main LFTP commands to get you started:
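A short sketch of a session (the server address is a placeholder):

```shell
# open a connection to a server you have access to
lftp sftp://myuser@ftp.example.com

# then, at the lftp prompt:
ls                          # list remote files
get remote_file.dat         # download a single file
put local_file.dat          # upload a single file
mirror remote_dir local_dir # recursively download a directory
bye                         # close the session
```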
Anaconda
Anaconda provides Python, R and other packages for scientific computing including data sciences, machine learning, etc.
Anaconda Modules
There are several anaconda modules available on Oscar. To list all anaconda modules, run module avail anaconda. The anaconda/2022.05 module is recommended.
Do not activate a conda environment before submitting a batch job if the batch job itself activates a conda environment. Otherwise, the batch job will not be able to activate its conda environment and will fail.
anaconda/2022.05
This is the newest anaconda module on Oscar. The first time you load the anaconda/2022.05 module, you need to initialize the environment by running the following command:
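The module's load message states the exact command; it is typically of the form below (an assumption - defer to the message printed when you load the module):

```shell
# one-time conda initialization for your shell (assumed form)
conda init bash
```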
Do not load the module in your .modules or .bashrc file. Otherwise, your VNC session cannot start.
anaconda/3-5.2.0
If you load the module in your .modules or .bashrc file, you may need to have the following lines in your .bashrc as well to be able to start a VNC session:
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
Conda
Anaconda uses conda to install packages and manage their dependencies. You can use conda to manage . To access conda, you need to load an Anaconda module. For example,
or
Conda Environment
A user may install all needed software packages for a project in a conda environment. A conda environment can be
shared among all users if the environment is installed in a shared directory
private to one user if the environment is installed in a user's private directory
The command 'conda info' shows important configurations for conda environment.
Below are some important configurations:
envs directories: a list of directories where a conda environment is installed by default. In the output of 'conda info' above, the first default directory to install a conda environment is $HOME/anaconda.
package cache: a list of directories where downloaded packages are stored.
Create a New Conda Environment
To create a new conda environment in a default directory, run the following command:
To create a new conda environment in a different directory, run the following command:
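For example (the environment name and path are placeholders):

```shell
# create an environment named my_env in the default envs directory
conda create -n my_env

# create an environment under an explicit path instead
conda create -p /users/<username>/my_envs/my_env
```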
Activate a Conda Environment
After creating a conda environment, users can activate a conda environment to install or access packages in the environment. The command is slightly different for different anaconda modules.
For the anaconda/2022.05 module, users can activate an environment with the following command:
For the anaconda/3-5.2.0 module, users can activate an environment with the following command:
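A sketch of both forms (the environment name is a placeholder):

```shell
# anaconda/2022.05
conda activate conda_environment_name

# anaconda/3-5.2.0 (older conda releases use `source activate`)
source activate conda_environment_name
```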
The commands above will only work if:
A conda environment with the specified name (conda_environment_name in the example) exists
If you need to activate a conda environment in a bash script, you need to source the conda.sh as shown in the following example bash script:
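A minimal sketch, assuming the module's install prefix (verify the conda.sh path for your module):

```shell
#!/bin/bash
# make `conda activate` available in a non-interactive shell
source /gpfs/runtime/opt/anaconda/2022.05/etc/profile.d/conda.sh
conda activate my_env
# packages installed in my_env are now on PATH
python my_script.py
```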
If you are using anaconda/3-5.2.0, replace the instances of 2022.05 with 3-5.2.0.
After installing packages in an active environment (instructions below), you do not need to load or install those packages in the bash script; any packages installed in the conda environment (before the script even starts) will be available through the environment after it is activated (line 4 in the code above).
To deactivate a conda environment, simply use the following command:
Install Packages in an Active Conda Environment
To install a package, we need to first activate a conda environment, and then run
The "=version" part is optional. By default, conda installs a package from the anaconda channel. To install a package from a different channel, run conda install with the -c option. For example, to install a package from the conda-forge channel, run
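For example (the package name and version are illustrative):

```shell
# install a specific version from the default (anaconda) channel
conda install numpy=1.23

# install from the conda-forge channel instead
conda install -c conda-forge numpy
```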
Delete a Conda Environment
To delete a conda environment, run
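For example, to delete an environment named my_env (a placeholder) and everything installed in it:

```shell
conda env remove -n my_env
```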
Remove Caches
Conda may download lots of additional packages when installing a package. A user may use up all their quota due to these downloaded packages. To remove the downloaded packages, run
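For example, to delete downloaded package tarballs, index caches, and unused packages:

```shell
conda clean --all
```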
To run a batch job on Oscar, you first have to write a script that describes what resources you need and how your program will run. Some example batch scripts are available in your home directory on Oscar, in the directory ~/batch_scripts.
A batch script starts by specifying the bash shell as its interpreter with the line:
#!/bin/bash
By default, a batch job will reserve 1 core and 2.8GB of memory per core for your job. You can customize the amount of resources allocated for your job by explicitly requesting them in your batch script with a series of lines starting with #SBATCH, e.g.,
#SBATCH -n 4
#SBATCH --mem=16G
#SBATCH -t 1:00:00
The above lines request 4 cores (-n), 16GB of memory per node (--mem), and one hour of runtime (-t). After you have described the resources you want allocated for the job, you then give the commands that you want to be executed.
All of the #SBATCH instructions in your batch script must appear before the commands you want to run.
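Putting these pieces together, a minimal batch script might look like the sketch below (the echo line stands in for your actual program):

```shell
#!/bin/bash

# Request 4 cores, 16GB of memory per node, and 1 hour of runtime
#SBATCH -n 4
#SBATCH --mem=16G
#SBATCH -t 1:00:00

# Commands to run come after all #SBATCH directives
msg="Starting job on $(hostname)"
echo "$msg"
```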
Once you have your batch script, you can submit a batch job to the queue using the sbatch command:
Submitting jobs from the command line
As an alternative to requesting resources within your batch script, it is possible to define the resources requested as command-line options to sbatch. For example, the command below requests 4 cores (-n), 16GB of memory per node (--mem), and one hour of runtime (-t) to run the job defined in the batch script.
Note that command-line options passed to sbatch will override the resources specified in the script, so this is a handy way to reuse an existing batch script when you just want to change a few of the resource values.
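As a sketch of what that looks like (my_script.sh is a placeholder for an existing batch script; the command is built as a string here so the example is self-contained outside Oscar):

```shell
# Resources given on the command line override any #SBATCH lines
# inside my_script.sh (a placeholder name)
submit="sbatch -n 4 --mem=16G -t 1:00:00 my_script.sh"
echo "$submit"    # on an Oscar login node, you would run this command directly
```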
Output from batch jobs
The sbatch command will return a number, which is your Job ID. You can view the output of your job in the file slurm-<jobid>.out in the directory where you invoked the sbatch command. For instance, you can view the last 10 lines of output with:
Alternatively, you can specify the files where the standard output and standard error should be written, using the -o and -e flags. You can use %j within the output/error filenames to include the job ID. If you would like your output file to be named MyOutput-<job-id>, you can add the following line to your batch job:
sbatch command options
A full description of all of the options for sbatch can be found or by using the following command on Oscar:
The table below summarizes some of the more useful options for sbatch.
Passing environment variables to a batch job
When a user logs into Oscar, environment variables such as HOME are pre-set; these are the user's login environment variables. A user may modify an existing environment variable or add a new one, so when the user submits a Slurm batch job, the user's current environment variables may differ from the login environment. By default, it is the user's current environment variables, not the login environment variables, that are accessible to the user's batch jobs on Oscar.
To modify or add an environment variable for your batch jobs, either:
run the following command in your shell
or have the following line in your batch script
After the step above to modify or add an environment variable, your batch job can access the environment variable my_variable whose value is my_value.
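A minimal sketch of this mechanism (my_variable and my_value as in the text; my_job.sh is a placeholder, and the sbatch lines are shown as comments because they only run on Oscar):

```shell
# Make my_variable visible to programs started from this shell
export my_variable=my_value

# A batch job submitted now inherits the current environment by default:
#   sbatch my_job.sh                                 # my_job.sh can read $my_variable
# Or pass the variable explicitly:
#   sbatch --export=my_variable=my_value my_job.sh
echo "my_variable is set to: $my_variable"
```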
To export more than one environment variable, just list all the name=value pairs separated by commas:
Here is an example in which a script loops over an input file and submits a job for each directory listed in the file, passing the directory to the batch job for processing.
The input file test.txt has multiple lines where each line is a directory:
The loop.sh script reads each line (directory) from the input file and passes the directory as an environment variable to a batch job:
The test.job is a job script, which runs the test.sh to process the directory passed as an environment variable:
The test.sh is a bash script which simply echoes the directory:
If you run ./loop.sh, then three jobs are submitted. Each job generates an output like the following:
Using variables to set slurm job name, output filename, and error filename
Variables can be passed at the sbatch command line to set the job name, output and error file names, as shown in the following example:
Jupyter Notebooks on Oscar
Installing Jupyter Notebook
The anaconda/3-5.2.0 module provides jupyter-notebook. Users can also use pip or anaconda to install jupyter notebook.
Running Jupyter Notebook on Oscar
There are a few ways to use Jupyter Notebook on Oscar. You can run Jupyter Notebook
in a VNC session
using a batch job
in an interactive session
With the batch job or interactive session method, you use a browser on your machine to connect to your Jupyter Notebook server on Oscar.
Start by going to the directory you want to access when using Jupyter Notebook, and then start Jupyter Notebook. The directory where a Jupyter Notebook is started is the working directory for the Notebook.
Do not run Jupyter Notebook on login nodes.
In a VNC Session
Start a , and open a terminal in the VNC session. To start a Jupyter Notebook, enter
This will start the Jupyter Notebook server and open up a browser with the notebook.
If you installed Jupyter Notebook with pip, you may need to give the full path:
~/.local/bin/jupyter-notebook
Using a Batch Job
Submit a batch script that starts the notebook server.
Set up an ssh tunnel to the server.
Open a browser to view the notebook.
1. Submit batch script
Here is an example batch script to start a Jupyter notebook server on an Oscar compute node
If you installed Jupyter notebook with pip you may need to give the full path:
This script can be found in ~/batch_scripts. Copy this example and submit this script with
sbatch jupyter.sh
Once your batch job is running there will be a file named jupyter-log-{jobid}.txt containing the information you need to connect to your Jupyter notebook server on Oscar. To check if your job is running, use myq.
The output from myq will look something like this:
2. Set up an ssh tunnel to the notebook server
In this example the jobID is 7239096. To view the notebook server information, use cat. For this example:
cat jupyter-log-7239096.txt
Open a terminal on your machine and copy and paste the ssh -N -L ........ line into the terminal.
If you are using Windows, follow the documentation to complete this step.
Enter your Oscar password. Note it will appear that nothing has happened.
3. Open a browser to view the notebook
Open a browser on your local machine to the address given in cat jupyter-log-{jobid}.txt.
The notebook will ask for a token. Copy the token from jupyter-log-{jobid}.txt. Then your notebook will start.
Remember to scancel {jobid} when you are done with your notebook session.
In an Interactive Session
Start Jupyter Notebook in an interactive job.
Set up an ssh tunnel to the server.
Open a browser to view the notebook.
1. Start a Jupyter Notebook in an interactive job
Start an and then in your interactive session enter the following:
An output similar to the one below indicates that Jupyter Notebook has started:
Open a terminal on your machine and enter the following line (replace $ipnip and $ipnport with the values from the two echo commands in the previous step).
If you are using Windows, follow the documentation to complete this step.
Enter your Oscar password. Note it will appear that nothing has happened.
3. Open a browser to view the notebook
Open a browser on your local machine to the address:
Again, you need to replace $ipnport with the value from the first echo command in Step 1. The notebook will ask for a token. You can copy the token from the output from Step 2.
4. Press Ctrl+C twice to kill your Jupyter Notebook server
Once you finish and no longer need the Jupyter Notebook server, you can kill the server by pressing Ctrl+C twice in your interactive session.
Tunneling into Jupyter with Windows
This page is for users trying to open Jupyter Notebooks/Labs through Oscar with Windows.
Software that makes it easy
If you are using Windows, you can use any of the following options to open a terminal on your machine (ranked in order of least difficult to set up and use):
(we recommend Ubuntu as your Linux distribution)
After opening a terminal using any of these programs, simply enter the ssh command provided by the jupyter-log-{jobid}.txt file. Then continue with the steps given by the documentation that led you to this page.
If you have PuTTY and would prefer to not download any additional software, there are steps (explained below) that you can take to use PuTTY to tunnel into a Jupyter Notebook/Lab.
Using PuTTY
These instructions will use ssh -N -L 9283:172.20.209.14:9283 <username>@ssh.ccv.brown.edu as an example command that could be found in the jupyter-log-{jobid}.txt file.
Open PuTTY and enter your host name (<username>@ssh.ccv.brown.edu in the example) in the textbox.
Next, navigate to the 'Tunnels' Menu (click the '+' next to SSH in order to have it displayed).
Enter the source port (9283 in the example) and destination (172.20.209.14:9283 in the example). Click 'Add'. The source port and destination should show up as a pair in the box above. Then click 'Open'. A new window should open requesting your password.
After entering your password, you should be able to access the notebook/lab in a browser using localhost:ipnport (see the documentation that led you here for details).
Inspecting Disk Usage (Ncdu)
To determine the sizes of files and discover the largest files in a directory, one can use the Ncdu module.
To get started with Ncdu, load the module using the following command:
module load ncdu/1.14
Once the module has been loaded, it can be used to easily show the size of all files within a directory:
ncdu my_directory
To view options you can use with the ncdu command, simply use the command ncdu --help
The line above uses Ncdu to rank all of the files within the my_directory directory. Your window should change to show a loading screen (if the directory doesn't have a lot in it, you may not even see this screen):
Once Ncdu has finished loading, you will see a result like this:
The files will be ordered with the largest file at the top and the smallest file at the bottom. The bottom left corner shows the Total disk usage (which in this case is 25.5 KiB). To quit out of this display, simply press q on your keyboard.
If there is a subdirectory within the directory you're inspecting, the files and directories within that subdirectory can be viewed by selecting the directory with the gray bar (using up and down arrow keys as needed) and then using the right arrow key.
Common Acronyms and Terms
Quickstart Guide
This guide assumes you have an Oscar account. To request an account see .
If you're confused about any acronyms or terms throughout the guide, check out our page to see definitions of commonly used terms
DMTCP
This page is under construction!
(DMTCP) checkpoints a running program on Linux with no modifications to the program or the OS, and allows the program to be restarted from a checkpoint.
Jupyter Labs on Oscar
Installing Jupyter Lab
The anaconda/3-5.2.0 module provides jupyter-lab. Users can also use pip or anaconda to .
Associations & Quality of Service (QOS)
Associations
Oscar uses associations to control job submissions from users. An association refers to a combination of four factors: Cluster, Account, User, and Partition. For a user to submit jobs to a partition, an association for the user and partition is required in Oscar.
To view a table of association data for a specific user (thegrouch in the example), enter the following command in Oscar:
Intro to CUDA
Introduction to CUDA
is an extension of the C language, as well as a runtime library, to facilitate general-purpose programming of NVIDIA GPUs. If you already program in C, you will probably find the syntax of CUDA programs familiar. If you are more comfortable with C++, you may consider instead using the higher-level library, which resembles the Standard Template Library and is included with CUDA.
In either case, you will probably find that because of the differences between GPU and CPU architectures, there are several new concepts you will encounter that do not arise when programming serial or threaded programs for CPUs. These are mainly to do with how CUDA uses threads and how memory is arranged on the GPU, both described in more detail below.
# Jump box with public IP address
Host jump-box
HostName ssh8.ccv.brown.edu
User <username>
# Target machine with private IP address
Host ccv-vscode-node
HostName node1103
User <username>
ProxyCommand ssh -q -W %h:%p jump-box
CUDA uses a data-parallel programming model, which allows you to program at the level of what operations an individual thread performs on the data that it owns. This model works best for problems that can be expressed as a few operations that all threads apply in parallel to an array of data. CUDA allows you to define a thread-level function, then execute this function by mapping threads to the elements of your data array.
A thread-level function in CUDA is called a kernel. To launch a kernel on the GPU, you must specify a grid, and a decomposition of the grid into smaller thread blocks. A thread block usually has around 32 to 512 threads, and the grid may have many thread blocks totalling thousands of threads. The GPU uses this high thread count to help it hide the latency of memory references, which can take 100s of clock cycles.
Conceptually, it can be useful to map the grid onto the data you are processing in some meaningful way. For instance, if you have a 2D image, you can create a 2D grid where each thread in the grid corresponds to a pixel in the image. For example, you may have a 512x512 pixel image, on which you impose a grid of 512x512 threads that are subdivided into thread blocks with 8x8 threads each, for a total of 64x64 thread blocks. If your data does not allow for a clean mapping like this, you can always use a flat 1D array for the grid.
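The decomposition arithmetic above can be checked directly; this sketch just verifies the 512x512 example:

```shell
# A 512x512 pixel image covered by 8x8 thread blocks
image_dim=512
block_dim=8
blocks_per_dim=$((image_dim / block_dim))       # 64 thread blocks in each dimension
threads_per_block=$((block_dim * block_dim))    # 64 threads per block
echo "grid: ${blocks_per_dim}x${blocks_per_dim} blocks of ${threads_per_block} threads each"
```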
The CUDA runtime dynamically schedules the thread blocks to run on the multiprocessors of the GPU. The M2050 GPUs available on Oscar each have 14 multiprocessors. By adjusting the size of the thread block, you can control how much work is done concurrently on each multiprocessor.
Memory on the GPU
The GPU has a separate memory subsystem from the CPU. The M2050 GPUs have GDDR5 memory, which is a higher bandwidth memory than the DDR2 or DDR3 memory used by the CPU. The M2050 can deliver a peak memory bandwidth of almost 150 GB/sec, while a multi-core Nehalem CPU is limited to more like 25 GB/sec.
The trade-off is that there is usually less memory available on a GPU. For instance, on the Oscar GPU nodes, each M2050 has only 3 GB of memory shared by 14 multiprocessors (219 MB per multiprocessor), while the dual quad-core Nehalem CPUs have 24 GB shared by 8 cores (3 GB per core).
Another bottleneck is transferring data between the GPU and CPU, which happens over the PCI Express bus. For a CUDA program that must process a large dataset residing in CPU memory, it may take longer to transfer that data to the GPU than to perform the actual computation. The GPU offers the largest benefit over the CPU for programs where the input data is small, or there is a large amount of computation relative to the size of the input data.
CUDA kernels can access memory from three different locations with very different latencies: global GDDR5 memory (100s of cycles), shared memory (1-2 cycles), and constant memory (1 cycle). Global memory is available to all threads across all thread blocks, and can be transferred to and from CPU memory. Shared memory can only be shared by threads within a thread block and is only accessible on the GPU. Constant memory is accessible to all threads and the CPU, but is limited in size (64KB).
There are not enough free resources to fulfill your request
(JobHeldUser)
You have put a hold on the job. The job will not run until you lift the hold.
(ReqNodeNotAvail)
The resources you have requested are not available. Note this normally means you have requested something impossible, e.g. 100 cores on 1 node, or a 24 core sandy bridge node. Double check your batch script for any errors. Your job will never run if you are requesting something that does not exist on Oscar.
(PartitionNodeLimit)
You have asked for more nodes than exist in the partition. For example if you make a typo and have specified -N (nodes) but meant -n (tasks) and have asked for more than 64 nodes. Your job will never run. Double check your batch script.
(None)
You may see this for a short time when you first submit a job
(QOSGrpCpuLimit)
All your condo cores are currently in use
(QOSGrpMemLimit)
The total memory of your running jobs and this pending job is more than the limit for your account.
Add a feature constraint (a tag that describes a type of node).
Note: you can view the available features on Oscar with the nodes command or sinfo -o "%20N %10c %10m %25f %10G "
You can also select multiple feature constraints using '|', e.g. #SBATCH -C quadrortx|intel
option
purpose
-J
Specify the job name that will be displayed when listing the job
-n
Number of tasks (= number of cores, if the "--cpus-per-task" or "-c" option is not specified)
-c
Number of CPUs or cores per task (on the same node)
--mail-type=
Specify the events that you should be notified of by email: BEGIN, END, FAIL, REQUEUE, and ALL
--mail-user=
Email address to which notifications should be sent
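For example, combining -n and -c (the values below are illustrative), a job with 2 tasks of 4 cores each reserves 8 cores in total:

```shell
#!/bin/bash
#SBATCH -J example-job   # job name shown when listing jobs
#SBATCH -n 2             # 2 tasks
#SBATCH -c 4             # 4 cores per task

# Total cores reserved = tasks x cores per task
total_cores=$(( 2 * 4 ))
echo "reserving $total_cores cores in total"
```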
To access dmtcp, load a dmtcp module. For example:
module load dmtcp/2.6.0
Example Programs
The following example programs can be copied from /gpfs/runtime/software_examples/dmtcp/
dmtcp_serial
dmtcp_serial.c
dmtcp_serial_job.sh
Basic Usage
Launch a Program
The dmtcp_launch command launches a program and automatically checkpoints it. To specify the interval (in seconds) between checkpoints, add the "-i num_seconds" option to the dmtcp_launch command.
Example: the following command launches the program dmtcp_serial and checkpoints every 8 seconds.
As shown in the example above, a checkpoint file (ckpt_dmtcp_serial_24f183c2194a7dc4-40000-42af86bb59385.dmtcp) is created, which can be used to restart the program.
Restart from a checkpoint
The dmtcp_restart command restarts a program from a checkpoint, and also automatically checkpoints the program. To specify the interval (in seconds) between checkpoints, add the "-i num_seconds" option to the dmtcp_restart command.
Example: the following command restarts the dmtcp_serial program from a checkpoint, and checkpoints every 12 seconds
Batch Jobs
It is desirable for a single job script to either:
launch the program if there is no checkpoint, or
automatically restart the program from a checkpoint if one or more checkpoints exist
The job script dmtcp_serial_job.sh below is an example which shows how to achieve the goal:
If there is no checkpoint in the current directory, launch the program dmtcp_serial
If one or more checkpoints exist in the current directory, restart the program dmtcp_serial from the latest checkpoint
First Submission - Launch a Program
Submit dmtcp_serial_job.sh and wait for the job to run until it times out. The beginning and end of the job output file are shown below:
Later Submissions - Restart from a Checkpoint
Submit dmtcp_serial_job.sh again and wait for the job to run until it times out. The beginning of the job output file, shown below, demonstrates that the job restarts from the checkpoint of the previous job.
Job Array
The following example script
creates a subdirectory for each task of the job array, and saves each task's checkpoint in the task's own subdirectory, when the job script is submitted for the first time
restarts each task from the checkpoint in its subdirectory when the job script is submitted for the second time or later
This script can be found in ~/batch_scripts. Copy this example and submit this script with
sbatch jupyter.sh
Once your batch job is running there will be a file named jupyter-log-{jobid}.txt containing the information you need to connect to your Jupyter Lab server on Oscar. To check if your job is running, use myq.
The output from myq will look something like this:
2. Set up an ssh tunnel to the notebook server
In this example the jobID is 7239096. To view the lab server information, use cat. For this example:
cat jupyter-log-7239096.txt
Open a terminal on your machine and copy and paste the ssh -N -L ........ line into the terminal.
[I 13:12:03.410 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 13:12:03.411 LabApp]
2. Set up an ssh tunnel to the server
Open a terminal on your machine and enter the following line (replace $ipnip and $ipnport with the values from the two echo commands in the previous step).
Enter your Oscar password. Note it will appear that nothing has happened.
3. Open a browser to view the notebook
Open a browser on your local machine to the address:
Again, you need to replace $ipnport with the value from the first echo command in Step 1. The notebook will ask for a token. You can copy the token from the output from Step 2.
4. Press Ctrl+C twice to kill your Jupyter Lab server
Once you finish and no longer need the Jupyter Lab server, you can kill the server by pressing Ctrl+C twice in your interactive session.
If thegrouch has an exploratory account, you should see an output similar to this:
Note that the first four columns correspond to the four factors that form an association. Each row of the table corresponds to a unique association (i.e., a unique combination of Cluster, Account, User, and Partition values). Each association is assigned a Quality of Service (see QOS section below for more details).
Some associations have a value for GrpTRESRunMins. This value indicates a limit on the total number of Trackable RESource (TRES) minutes that can be used by jobs running with this association at any given time. The cpu=110000 for the association with the batch partition indicates that all of the jobs running with this association can have at most an accumulated 110,000 core-minute cost. If this limit is reached, new jobs will be delayed until other jobs have completed and freed up resources.
Example of GrpTRESRunMins Limit
Here is an example file that incurs a significant core-minute cost:
If this file is named too_many_cpu_minutes.sh, a user with thegrouch's QOS might experience something like this:
The REASON field will be (None) at first, but after a minute or so, it should resemble the output above (after another myq command).
Note that the REASON the job is pending and not yet running is AssocGrpCPURunMinutesLimit. This is because the program requests 30 cores for 90 hours, which is more than the oscar/default/thegrouch/batch association allows (30 cores * 90 hours * 60 minutes/hour = 162,000 core-minutes > 110,000 core-minutes). In fact, this job could be pending indefinitely, so it would be a good idea for thegrouch to run scancel 12345678 and make a less demanding job request (or use an association that allows for that amount of resources).
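The arithmetic above can be checked directly:

```shell
# Core-minutes requested by too_many_cpu_minutes.sh: 30 cores for 90 hours
cores=30
hours=90
requested=$((cores * hours * 60))   # 162000 core-minutes
limit=110000                        # cpu=110000 for the batch association
echo "requested $requested core-minutes against a limit of $limit"
```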
Quality of Service
An association's QOS is used for job scheduling when a user submits a job. Every QOS is linked to a set of job limits that reflect the limits of the cluster/account/user/partition of the association(s) that have that QOS. A QOS can also carry GrpTRESRunMins limits for its corresponding associations. For example, HPC Priority accounts have a job limit of 1,198,080 core-minutes per job, which is attached to those accounts' QOS. Whenever a job request is made (necessarily through a specific association), the job will only be queued if it meets the requirements of the association's QOS. In some cases, a QOS may be defined with limits that differ from those of its corresponding association; in such cases, the limits of the QOS override the limits of the association.
sftp <username>@transfer.ccv.brown.edu
put /path/local_file
sftp <username>@transfer.ccv.brown.edu
get -r filename.txt
$ conda init bash
no change /gpfs/runtime/opt/anaconda/2022.05/condabin/conda
no change /gpfs/runtime/opt/anaconda/2022.05/bin/conda
no change /gpfs/runtime/opt/anaconda/2022.05/bin/conda-env
no change /gpfs/runtime/opt/anaconda/2022.05/bin/activate
no change /gpfs/runtime/opt/anaconda/2022.05/bin/deactivate
no change /gpfs/runtime/opt/anaconda/2022.05/etc/profile.d/conda.sh
no change /gpfs/runtime/opt/anaconda/2022.05/etc/fish/conf.d/conda.fish
no change /gpfs/runtime/opt/anaconda/2022.05/shell/condabin/Conda.psm1
no change /gpfs/runtime/opt/anaconda/2022.05/shell/condabin/conda-hook.ps1
no change /gpfs/runtime/opt/anaconda/2022.05/lib/python3.7/site-packages/xontrib/conda.xsh
no change /gpfs/runtime/opt/anaconda/2022.05/etc/profile.d/conda.csh
modified /users/yliu385/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
#!/bin/bash
#SBATCH --nodes 1
#SBATCH -c 6
#SBATCH --time 04:00:00
#SBATCH --mem-per-cpu 3G
#SBATCH --job-name tunnel
#SBATCH --output jupyter-log-%J.txt
## get tunneling info
XDG_RUNTIME_DIR=""
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)
## print tunneling instructions to jupyter-log-{jobid}.txt
echo -e "
Copy/Paste this in your local terminal to ssh tunnel with remote
-----------------------------------------------------------------
ssh -N -L $ipnport:$ipnip:$ipnport <username>@ssh.ccv.brown.edu
-----------------------------------------------------------------
Then open a browser on your local machine to the following address
------------------------------------------------------------------
localhost:$ipnport (prefix w/ https:// if using password)
------------------------------------------------------------------
"
## start an ipcluster instance and launch jupyter server
module load anaconda/3-5.2.0
jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
Jobs for user mhamilton
Running:
ID NAME PART. QOS CPU WALLTIME REMAIN NODES
7239096 tunnel batch pri-mhamilt 6 4:00:00 3:57:33 node1036
Pending:
(none)
#!/bin/bash
#SBATCH -n 1
#SBATCH --array=1-4
#SBATCH -t 5:00
#SBATCH -J dmtcp_job_array
checkpoint_interval=8
port=$((SLURM_JOB_ID % 20000 + 40000))
task_dir=jobtask_$SLURM_ARRAY_TASK_ID
if [ ! -d $task_dir ]; then
mkdir $task_dir
cd $task_dir
dmtcp_launch -p $port -i $checkpoint_interval ../dmtcp_serial
else
cd $task_dir
checkpoint_file=`ls ckpt_*.dmtcp -t|head -n 1`
if [ -z $checkpoint_file ]; then
dmtcp_launch -p $port -i $checkpoint_interval ../dmtcp_serial
else
dmtcp_restart -p $port -i $checkpoint_interval $checkpoint_file
fi
fi
jupyter-lab
#!/bin/bash
#SBATCH --nodes 1
#SBATCH -c 6
#SBATCH --time 04:00:00
#SBATCH --mem-per-cpu 3G
#SBATCH --job-name tunnel
#SBATCH --output jupyter-log-%J.txt
## get tunneling info
XDG_RUNTIME_DIR=""
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)
## print tunneling instructions to jupyter-log-{jobid}.txt
echo -e "
Copy/Paste this in your local terminal to ssh tunnel with remote
-----------------------------------------------------------------
ssh -N -L $ipnport:$ipnip:$ipnport <username>@ssh.ccv.brown.edu
-----------------------------------------------------------------
Then open a browser on your local machine to the following address
------------------------------------------------------------------
localhost:$ipnport (prefix w/ https:// if using password)
------------------------------------------------------------------
"
## start an ipcluster instance and launch jupyter server
module load anaconda/3-5.2.0
jupyter-lab --no-browser --port=$ipnport --ip=$ipnip
Jobs for user mhamilton
Running:
ID NAME PART. QOS CPU WALLTIME REMAIN NODES
7239096 tunnel batch pri-mhamilt 6 4:00:00 3:57:33 node1036
Pending:
(none)
localhost:$ipnport (prefix w/ http:// if using password)
(sacctmgr list assoc | head -2; sacctmgr list assoc | grep thegrouch) | cat
Cluster Account User Partition Share GrpJobs GrpTRES GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode MaxSubmit MaxWall MaxTRESMins QOS Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
oscar default thegrouch gpu-debug 1 gpu-debug gpu-debug
oscar default thegrouch bigmem 1 norm-bigmem norm-big+
oscar default thegrouch smp 1 norm-smp norm-smp
oscar default thegrouch gpu 1 norm-gpu norm-gpu cpu=34560,gr+
oscar default thegrouch batch 1 normal normal cpu=110000
oscar default thegrouch vnc 1 vnc vnc
oscar default thegrouch debug 1 debug debug
#!/bin/bash
#SBATCH -n 30
#SBATCH --mem=32G
#SBATCH -t 90:00:00
echo "Is this too much to ask? (Hint: What is the GrpTRESRunMins limit for batch?)"
$ sbatch too_many_cpu_minutes.sh
Submitted batch job 12345678
$ myq
Jobs for user thegrouch
Running:
(none)
Pending:
ID NAME PART. QOS CPU WALLTIME EST.START REASON
12345678 too_many_cpu_minutes.sh batch normal 30 3-18:00:00 N/A (AssocGrpCPURunMinutesLimit)
A personal Windows computer must have CrowdStrike installed in order to be on approved Brown networks.
Within Oscar, an association refers to a combination of four factors: Cluster, Account, User, and Partition. Associations are used to control job submissions for users.
Batch Jobs
Put simply, batch jobs are scheduled programs that are assigned to run on a computer without further user interaction.
CCV
Brown University's Center for Computation and Visualization. Provides software, expertise, and other services for Brown's research community. See for more information.
CESM
Stands for Community Earth System Model. "CESM is a fully-coupled, community, global climate model that provides state-of-the-art computer simulations of the Earth's past, present, and future climate states." ()
CIFS
Stands for Common Internet File System. CIFS is a network protocol used for providing shared access to files and devices on the same network. Users who are unable to use SMB for file-sharing should try CIFS as an alternative.
Condo
PIs can purchase condos, which provide a significant amount of computing resources that can be shared with others.
CUDA
" is an extension of the C language, as well as a runtime library, to facilitate general-purpose programming of NVIDIA GPUs." ()
HPC
Stands for High Performance Computing. HPC is the ability to process data and perform highly complex calculations at an accelerated rate. Oscar is the service that CCV offers to the Brown community for their High Performance Computing needs.
Job Array
A job array is a collection of jobs that all run the same program but on different values of a parameter.
Jupyter Notebook
"The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text."
Interactive Jobs
Jobs that allow the user to interact in real time with applications within Oscar, often from the command line. This differs from batch jobs in that each command to be run must be entered one at a time.
Modules
Modules are software components that can easily be loaded or unloaded into Oscar. For instance, a user can load the Python 3 module using a module load command.
MPI
Stands for Message Passing Interface. MPI is a system that aims to be the standard for portable and efficient message passing. Message passing is a technique often used in object-oriented programming and parallel programming.
Partition
Partitions are essentially groupings of nodes that allocate resources for specific types of tasks. On Oscar, partitions are based on job submissions through the Slurm workload manager.
PI
Stands for Principal Investigator. Mainly used to refer to the individual responsible for conducting and administrating a research grant. Within Oscar, PIs have their own data directories that can be shared to students. PIs may also purchase condos.
PuTTY
An SSH client for Windows and Unix that emulates a terminal
Python
An object-oriented, high-level, and popular programming language
Slurm
A workload manager used within Oscar to schedule jobs
SSH
Stands for Secure Shell Protocol. Used to communicate securely between computers and often used within a command-line interface (CLI) for connections to remote servers
SMB
The Server Message Block (SMB) protocol is a network protocol that allows users to communicate with remote computers for file-sharing and other uses. It is one of the versions of the Common Internet File System (CIFS). Within Oscar, SMB is mainly used for file transfer.
VNC
Stands for Virtual Network Computing. VNC is a desktop sharing system that allows you to remotely control another desktop. On Oscar, it is used to allow a desktop interface for applications.
Quality of Service (QOS)
The job limits that are linked to a given association. For instance, Priority Accounts will generally have a higher quality of service than Exploratory Accounts.
[]
Term
Description
Anaconda / Conda
A distribution of Python and R used for scientific computing that is meant to simplify package management and deployment. Conda is used for installing packages and managing their dependencies.
[]
OSCAR
Oscar is the shared compute cluster operated by CCV.
Oscar runs the Linux RedHat7 operating system. General Linux documentation is available from The Linux Documentation Project. We recommend you read up on basic Linux commands before using Oscar. Some of the most common commands you'll be using in Oscar can also be found on our Quick Reference page.
Oscar has two login nodes and several hundred compute nodes. When users log in through Secure Shell (SSH), they are first put on one of the login nodes which are shared among several users at a time. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes from your own computer. Running computationally intensive or memory intensive programs on the login node slows down the system for all users. Any processes taking up too much CPU or memory on a login node will be killed. Please do not run Matlab on the login nodes.
What username and password should I be using?
If you are at Brown and have requested a regular CCV account, your Oscar login will be authenticated using your Brown credentials, i.e. the same username and password that you use to log into any Brown service such as "canvas". We have seen login problems with the Brown credentials for some users so accounts moved to the RedHat7 system after September 1st 2018 can also log into RedHat7 with their CCV password.
If you are an external user, you will have to get a sponsored ID at Brown through the department with which you are associated before requesting an account on Oscar. Once you have the sponsored ID at Brown, you can request an account on Oscar and use your Brown username and password to log in.
Connecting to Oscar for the first time
To log in to Oscar you need Secure Shell (SSH) on your computer. Mac and Linux machines normally have SSH available. To log in to Oscar, open a terminal and type the login command.
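The login command takes your Brown username (replace <username> with yours):

```shell
ssh <username>@ssh.ccv.brown.edu
```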
Windows users need to install an SSH client. We recommend PuTTY, a free SSH client for Windows. Once you've installed PuTTY, open the client, use <username>@ssh.ccv.brown.edu for the Host Name, and click Open. The configuration should look similar to the screenshot below.
The first time you connect to Oscar you will see a message like:
You can type yes. You will then be prompted for your password. Note that nothing will show up on the screen when you type in your password; just type it in and press enter. You will now be in your home directory on Oscar. In your terminal you will see a prompt like this:
Congratulations, you are now on one of the Oscar login nodes.
Note: Please do not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes.
File system
Users on Oscar have three places to store files:
home
scratch
data
Note that class accounts may not have a data directory. Users who are members of more than one research group may have access to multiple data directories.
From the home directory, you can use the command ls to see your scratch directory and your data directory (if you have one) and use cd to navigate into them if needed.
To see how much space you have, use the command myquota. Below is an example output:
Files not accessed for 30 days may be deleted from your scratch directory. This is because scratch is high performance space. The fuller scratch is, the worse the read/write performance. Use ~/data for files you need to keep long term.
A good practice is to configure your application to read any initial input data from ~/data and write all output into ~/scratch. Then, when the application has finished, move or copy data you would like to save from ~/scratch to ~/data. For more information on which directories are backed up and best practices for reading/writing files, see Oscar's Filesystem and Best Practices. You can go over your quota up to the hard limit for a grace period. This grace period is to give you time to manage your files. When the grace period expires you will be unable to write any files until you are back under quota.
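A sketch of that practice (the program name my_sim and file names are hypothetical):

```shell
# Read initial input from ~/data, write all output into ~/scratch
./my_sim --input ~/data/input.dat --output ~/scratch/results/

# When the run has finished, keep what you need long term
cp -r ~/scratch/results ~/data/
```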
CCV uses the PyModules package for managing the software environment on OSCAR. To see the software available on Oscar, use the command module avail. You can load any one of these software modules using module load <module>. The command module list shows what modules you have loaded. Below is an example of checking which versions of the module 'workshop' are available and loading a given version.
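A session along those lines might look like this (the version number shown is illustrative, not necessarily what is installed):

```shell
module avail workshop      # list available versions of 'workshop'
module load workshop/2.0   # load a specific version (hypothetical version)
module list                # confirm which modules are loaded
```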
For a list of all PyModule commands, see Software Modules. If you have a request for software to be installed on Oscar, email [email protected].
Using a Desktop on Oscar
You can connect remotely to a graphical desktop environment on Oscar using CCV's OpenOnDemand. The OOD Desktop integrates with the scheduling system on Oscar to create dedicated, persistent VNC sessions that are tied to a single user.
Using VNC, you can run graphical user interface (GUI) applications like Matlab, Mathematica, etc. while having access to Oscar's compute power and file system.
Choose a session that suits your needs
Running Jobs
You are on Oscar's login nodes when you log in through SSH. You should not (and would not want to) run your programs on these nodes as these are shared by all active users to perform tasks like managing files and compiling programs.
With so many active users, a shared cluster has to use a "job scheduler" to assign compute resources to users for running programs. When you submit a job (a set of commands) to the scheduler along with the resources you need, it puts your job in a queue. The job is run when the required resources (cores, memory, etc.) become available. Note that since Oscar is a shared resource, you must be prepared to wait for your job to start running, and it can't be expected to start running straight away.
Oscar uses the SLURM job scheduler. Batch jobs are the preferred mode of running programs, where all commands are mentioned in a "batch script" along with the required resources (number of cores, wall-time, etc.). However, there is also a way to run programs interactively.
For information on how to submit jobs on Oscar, see Running Jobs.
There is also extensive documentation on the web on using SLURM (quick start guide).
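A minimal batch script, as a sketch (the resource values and program name are placeholders):

```shell
#!/bin/bash
#SBATCH -n 1              # number of cores
#SBATCH -t 00:30:00       # wall-time limit
#SBATCH --mem=4G          # memory

# your program goes here (placeholder name)
./my_program
```

Submit it with sbatch myjob.sh and check its status with the myq command.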
An example loading screen for Ncdu (the full directory for the "Current item" has been obscured)
A list of files and their sizes as displayed in the output of a call to Ncdu
ParaView Remote Rendering
Running Paraview Remote Rendering in Oscar
This service is new and is in Beta
The Center for Computation and Visualization (CCV) offers the academic community a way to visualize large datasets using Oscar and its powerful GPUs as a rendering server. The current GPU hardware and available memory on Oscar surpass common desktop models, offering a modern and robust solution for displaying large datasets in parallel jobs using the widely used open-source software ParaView. It is a simple two-step process: start the server, then connect the client.
Who benefits from this service?
The target audience for this service is members of the academic community who interact with and analyze large 3D datasets, i.e., point clouds, volumetric data, tiff-stacks, and mesh data. This includes groups working with microscopy data, MRI images, structural analysis, fluid dynamics, climate sciences, astrophysics, and more. In fact, ParaView can handle over 100 different file formats. The remote rendering service is targeted at scenarios where the personal/lab computer setup may not have the resources to handle the size of the underlying datasets. Common obstacles are older GPU technology or low RAM availability, which may cause performance issues.
Workflow Overview
Above is a graphical representation of how the parallel render server works using Oscar. The user logs in to Oscar either via SSH or a VNC session. From the terminal, the user loads the ParaView module and executes the convenience script called run-remote-server to start the ParaView server session and allocate the memory and walltime limit. Once the server starts, the user receives an email with the information needed to access the server. Lastly, the user connects the ParaView client (i.e., the desktop application) to the server running on Oscar. The client displays images that are processed by the server (on Oscar), which reconstructs the information computed by the nodes.
0. Prerequisites
Paraview Desktop:
You can either download the ParaView desktop app to your personal computer or access the desktop application already installed in Oscar's VNC. Installing it on your local computer may give you better interactivity.
Download Paraview Desktop to your desktop computer (Recommended)
Go to the official ParaView download page. Select your operating system (Linux, Windows, or Mac), get the ParaView-5.11.0--Python3.9 version that suits your local machine, and run the installer wizard.
Not all versions will work. You must select 5.11.0
Using Paraview UI installed in Oscar
Connect to Oscar via VNC
Open a terminal: Applications -> Utilities -> Terminal (this might differ depending on the operating system UI)
Run the commands
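The commands are as follows (using the ParaView module name given in the Summary section of this page); load the module, then launch the UI:

```shell
module load paraview/5.11.0_openmpi_4.0.7_intel_2020.2_slurm22
paraview
```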
If this is your first time opening ParaView, it will take a few minutes.
1. Start the Server
You need to allocate the resources via SLURM, indicating the amount of memory you want to reserve, as well as a few optional parameters to configure your session. We have created a convenience script called run-remote-server to do this.
In order for run-remote-server to be found, we need to load the ParaView module that supports this service (this appends the correct path to our PATH environment variable).
The flag -u indicates where the confirmation email will be sent. Technically it could be any email address, but the remote render session can only be used by existing Oscar users.
The only mandatory parameter is -u <user-email>.
Memory Request
The number of CPU cores and GPUs are determined by the memory request.
By default, the run-remote-server script's minimum memory request is 45 GB (1 CPU core / 1 GPU) and the maximum is 180 GB (4 CPU cores / 4 GPUs). You can add more resources to your session using the -m flag. Every multiple of 45 GB adds a CPU core and a GPU, e.g.:
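For instance, a 90 GB request would be submitted as run-remote-server -u <your-email> -m 90G (the email address is a placeholder). The mapping from memory to cores/GPUs can be sketched as:

```shell
# 45 GB per CPU core + GPU, per the defaults described above
mem_gb=90
cores=$(( mem_gb / 45 ))   # number of CPU cores (and GPUs) allocated
echo "$cores"
```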
The following is the description of the command and the available configuration settings.
usage: run-remote-server [-n cores] [-t walltime] [-m memory] [-q queue] [-o outfile] [-g ngpus] [-u user brown email]
Allocates resources, starts up the render server, and sends an email to the user requesting the service
options:
-t walltime as hh:mm:ss (default: 1:30:00)
-m memory as #[k|m|g] (default: 45G)
-o outfile save a copy of the session's output to outfile (default: off)
-q slurm partition (gpu (default)| gpu-he)
-u brown email of the user requesting the service
After executing the command, the system will allocate resources, and it will send a confirmation email indicating that the service is ready; the email includes additional instruction on how to connect to the server using Paraview UI.
NOTE: You might not receive the email instantly. Sometimes it may take a while before there are sufficient resources (i.e., GPUs) available. You will get the notification as soon as they are available for your use. You can also use the myq command to see the status of your job
2. Wait for the confirmation Email
The email sent by the system has important information such as:
How to create an SSH tunnel
The IP address and port where the service is deployed
How to connect to the server from multiple systems
Please read it and get familiar with how the process works.
Please, read the email carefully. It contains information about your connection such as server IP address and port. You need them in order to proceed with the next steps.
3. Connect to the Server
There are two options to connect to the remote server:
Your personal computer
VNC
3.1 Setting up SSH Tunneling
This step is only needed if you are using your personal computer and not VNC
Open a terminal and execute the command:
<SERVER_IP>: the IP of the compute node on Oscar. Replace with the value sent in the confirmation email.
<port-number>: the port exposed to access the rendering server. Replace with the value sent in the confirmation email.
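The exact command is provided in the confirmation email; a typical form of such a tunneling command (assumed here, with both placeholders substituted from the email) looks like:

```shell
ssh -L <port-number>:<SERVER_IP>:<port-number> <username>@ssh.ccv.brown.edu
```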
The IP address and the port might vary from user to user. Check the confirmation email for the correct details for your connection.
NOTE: After entering your credentials, you will notice that the terminal command line appears to hang. That is normal; it indicates you are connected and the SSH tunneling is set up.
3.2 Connect the client/desktop application to the remote server
This step will reset the scene, so before doing it make sure to save all your data.
Open the ParaView desktop application (see the Prerequisites section above)
In the ParaView UI, go to the menu bar: File -> Connect...
Add Server:
After a few seconds, you will be connected to the HPC automatically.
In subsequent connections you can reuse the server entry, but you will need to update the host and port values each time you launch a new server session
Verifying the connection is set up correctly.
In Paraview UI go to the menu bar View and select Memory Inspector. You will notice a list of servers indicating the number of processes running on them
Summary
Open a terminal and connect to Oscar (follow the SSH instructions above)
Load the Paraview module module load paraview/5.11.0_openmpi_4.0.7_intel_2020.2_slurm22
If you find any issues following this guide or require additional help, do not hesitate to contact CCV at [email protected]
SSH (Terminal)
To log in to Oscar you need Secure Shell (SSH) on your computer.
You need to log in using your Brown password. Old Oscar passwords can no longer be used for SSH.
There are two options for signing into Oscar: with or without VPN.
If you are connected to the Brown VPN, you have the option of connecting to Oscar without having to enter your password.
Summary of SSH Hosts
ssh.ccv.brown.edu You can connect from anywhere. You will need Two Factor Authentication
sshcampus.ccv.brown.edu You can connect when within Brown WiFi, the campus network, or the VPN. You will need Two Factor Authentication unless you set up passwordless authentication.
macOS and Linux
To log in to Oscar, open a terminal and run one of the commands below.
If you are not connected to the Brown VPN, use the following command:
If you are connected to the Brown VPN, use the following command:
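Based on the hosts listed above, the two commands look like this (replace <username> with your Brown username):

```shell
# From anywhere (Two Factor Authentication required):
ssh -X <username>@ssh.ccv.brown.edu

# From Brown WiFi, the campus network, or the VPN:
ssh -X <username>@sshcampus.ccv.brown.edu
```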
The -X flag allows Oscar to display windows on your machine. This allows you to open and use GUI-based applications, such as the text editor gedit.
Watch our tutorial videos for more help.
Windows
Windows users need to install an SSH client. We recommend , a free SSH client for Windows.
If you are not connected to the Brown VPN, use [email protected] as the Host Name and click Open.
If you are connected to the Brown VPN, use [email protected] as the Host Name and click Open.
Confused? Watch our tutorial videos.
Connecting to Oscar for the First Time
The first time you connect to Oscar you will see a message about the authenticity of the host:
You can type yes and press return. On subsequent logins you should not see this message.
You will then be prompted for your password.
Nothing will show up on the screen as you type in your password. Just type it in and press enter.
You will now be in your home directory on Oscar. In your terminal you will see a prompt like this:
Congratulations, you are now on one of the Oscar login nodes! The login nodes are for administrative tasks such as editing files and compiling code. To use Oscar for computation you will need to use the compute nodes. To get to the compute nodes from the login nodes you can either start an interactive session on a compute node, or submit a batch job.
Please do not run CPU-intense or long-running programs directly on the login nodes! The login nodes are shared by many users, and you will interrupt other users' work.
module load lftp # To load the LFTP module from Oscar
lftp -u login,passwd MyAwesomeUrl # To connect to your (S)FTP server
ls # To list files on the (S)FTP server
!ls # To list files in your directory on Oscar
get MyAwesomeFile # To download a single file
mirror # To download everything as is from the server
mirror --directory=/name_of_directory/ # To download a specific directory
ssh <username>@ssh.ccv.brown.edu
The authenticity of host 'ssh.ccv.brown.edu (138.16.172.8)' can't be established.
RSA key fingerprint is SHA256:Nt***************vL3cH7A.
Are you sure you want to continue connecting (yes/no)?
This page contains Linux commands commonly used on Oscar, basic module commands, and definitions for common terms used within this documentation.
Common Linux Commands
Command
Related Word/Phrase
Modules
Common Acronyms and Terms
module load <name>
Adds a module to your current environment. If you load using just the name of a module, you will get the default version. To load a specific version, load the module using its full name with the version: "module load gcc/6.2"
module unload <name>
Removes a module from your current environment.
CCV
Brown University's Center for Computation and Visualization. Provides software, expertise, and other services for Brown's research community. See the CCV website for more information.
CESM
Stands for Community Earth System Model. "CESM is a fully-coupled, community, global climate model that provides state-of-the-art computer simulations of the Earth's past, present, and future climate states."
CIFS
Stands for Common Internet File System. CIFS is a network protocol used for providing shared access to files and devices on the same network. Users who are unable to use SMB for file-sharing should try CIFS as an alternative.
Condo
PIs can purchase condos that have a significant amount of computing resources which can be shared with others.
CUDA
"CUDA is an extension of the C language, as well as a runtime library, to facilitate general-purpose programming of NVIDIA GPUs."
HPC
Stands for High Performance Computing. HPC is the ability to process data and perform highly complex calculations at an accelerated rate. Oscar is the service that CCV offers to the Brown community for their High Performance Computing needs.
Job Array
A job array is a collection of jobs that all run the same program but on different values of a parameter.
Jupyter Notebook
"The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text."
Interactive Jobs
Jobs that allow the user to interact in real time with applications within Oscar, often from the command line. This differs from batch jobs in that each command to be run must be put in one at a time.
Modules
Modules are software components that can easily be loaded or unloaded into Oscar. For instance, a user can load the Python 3 module using a module load command.
MPI
Stands for Message Passing Interface. MPI is a system that aims to be the standard for portable and efficient message passing. Message passing is a technique often used in object-oriented programming and parallel programming.
Partition
Partitions are essentially groupings of nodes that allocate resources for specific types of tasks. On Oscar, partitions are based on job submissions through the Slurm workload manager.
PI
Stands for Principal Investigator. Mainly used to refer to the individual responsible for conducting and administrating a research grant. Within Oscar, PIs have their own data directories that can be shared to students. PIs may also purchase condos.
PuTTY
A client for SSH for Windows and Unix that emulates a terminal
Python
An object-oriented, high-level, and popular programming language
Slurm
A workload manager used within Oscar to schedule jobs
SSH
Stands for Secure Shell Protocol. Used to communicate securely between computers and often used within a command-line interface (CLI) for connections to remote servers
SMB
The Server Message Block (SMB) protocol is a network protocol that allows users to communicate with remote computers for file-sharing and other uses. It is one of the versions of the Common Internet File System (CIFS). Within Oscar, SMB is mainly used for file transfer.
VNC
Stands for Virtual Network Computing. VNC is a desktop sharing system that allows you to remotely control another desktop. On Oscar, it is used to allow a desktop interface for applications.
Quality of Service (QOS)
The job limits that are linked to a given association. For instance, Priority Accounts will generally have a higher quality of service than Exploratory Accounts.
cd
Change Directory
Moves the user into the specified directory
cd .. to move one directory up
cd by itself to move to home directory
cd - to move to previous directory
cd <directory-path> to move to a directory (can be an absolute path or relative path)
cp <old_file_path> <new_directory_path>
Copy
Copies the file into the specified directory
clear
Clear
Clears the terminal
cat <filename>
Concatenate
Lists the contents of a file
ls
List
Lists contents within the current directory
grep <string_to_match> <filename>
Globally Search for a Regular Expression and Print Matching Lines
Searches for the string / regular expression within the specified file and prints the line(s) with the result
pwd
Present Working Directory
Displays the path of the current directory that you are in
man <command>
Manual
Displays the manual instruction for the given command
mv <file_name> <new_directory>
Move
Moves a file into a new directory
mv <old_file_name> <new_file_name> to rename a file
mkdir <directory_name>
Make Directory
Creates a new directory
rm <file_name>
Remove
Deletes the specified file; use rm -r <directory_name> to delete a directory and the contents within it
rmdir <directory_name>
Remove Directory
Removes the specified directory (must be empty)
touch <file_name>
Touch
Creates a new, empty file
Command
Description
module list
Lists all modules that are currently loaded in your software environment.
module avail
Lists all available modules on the system. Note that a module can have multiple versions.
module help <name>
Prints additional information about the given software.
Term
Description
Anaconda / Conda
A distribution of Python and R used for scientific computing that is meant to simplify package management and deployment. Conda is used for installing packages and managing their dependencies.
Association
Within Oscar, an association refers to a combination of four factors: Cluster, Account, User, and Partition. Associations are used to control job submissions for users.
In January 2023, Oscar will be migrating to use Slurm version 22.05.7.
Slurm version 22.05.7
improves security and speed,
supports both PMI2 and PMIx,
provides REST APIs, and
allows users to prioritize their jobs via scontrol top <job_id>
While most applications will be unaffected by these changes, applications built to make use of MPI may need to be rebuilt to work properly. To help facilitate this, we are providing users who use MPI-based applications (either through Oscar's module system or built by users) with advanced access to a test cluster running the new version of Slurm. Instructions for accessing the test cluster, building MPI-based applications, and submitting MPI jobs using the new Slurm, are provided below.
Please note - some existing modules of MPI-based applications will be deprecated and removed from the system as part of this upgrade. A list of modules that will no longer be available to users following the upgrade is given at the bottom of the page.
Instructions for Testing Applications with Slurm 22.05.7
Request access to the Slurm 22.05.7 test cluster (email [email protected])
Connect to Oscar via either SSH or Open OnDemand (instructions below)
Build your application using the new MPI applications listed below
Users must contact [email protected] to obtain access to the test cluster in order to submit jobs using Slurm 22.05.7.
Connecting via SSH
Connect to Oscar using the ssh command in a terminal window
From Oscar's command line, connect to the test cluster using the command ssh node1947
From the node1947 command line, submit your jobs (either interactive or batch) as follows:
For interactive CPU-only jobs: interact -q image-test
For interactive GPU jobs: interact -q gpu
For batch jobs, include #SBATCH -p image-test (CPU-only jobs) or #SBATCH -p gpu (GPU jobs) within your batch script and then submit using the sbatch command, as usual
Connecting via Open OnDemand
Open a web browser and connect to poodcit2.services.brown.edu
Login with your Oscar username and password
Start a session using the Advanced Desktop App
Only the Advanced Desktop App will connect to the test cluster
The Advanced Desktop App must connect to the gpu partition: select the gpu partition and click the launch button
MPI Applications
Migrated or New Modules
If the "Current Module Version" for an application is blank, a new module version has been built for that application.
Application
Current Module Version
Migrated or New Module Version
To build custom applications:
We recommend using the following MPI modules to build your custom applications:
A new module might be available for a deprecated application module. Please search the table above to check if a new module is available for an application.
Application
Deprecated Module