
Oscar


Quick Reference

This page contains Linux commands commonly used on Oscar, basic module commands, and definitions for common terms used within this documentation.

These pages list some common commands and terms you will come across while using Oscar.

Common Acronyms and Terms

Managing Modules

Common Linux Commands

Managing Modules

module list

Lists all modules that are currently loaded in your software environment.

module avail

Lists all available modules on the system. Note that a module can have multiple versions. Use module avail <name> to list available modules whose names start with <name>.

module help <name>

Prints additional information about the given software.

module load <name>

Adds a module to your current environment. If you load using just the name of a module, you will get the default version. To load a specific version, load the module using its full name with the version: "module load gcc/10.2"

module unload <name>

Removes a module from your current environment.
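For example, a typical module workflow might look like the following sketch (gcc/10.2 is the version used in the example above; the versions actually available on Oscar may differ):

module avail gcc          # list available gcc versions
module load gcc/10.2      # load a specific version
module list               # confirm it is loaded
module unload gcc/10.2    # remove it when finished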

Quickstart

How to connect to Oscar and submit your first batch job

Connect to Oscar

Alternatively, you can connect to Oscar via SSH (terminal):

ssh <username>@ssh.ccv.brown.edu

Submit a Job

You can submit a job using sbatch:

sbatch batch_scripts/hello.sh

You can confirm that your job ran successfully by running:

cat hello-*.out
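For reference, a minimal batch script along the lines of the hello.sh example might look like the sketch below (the exact contents of the sample script in ~/batch_scripts may differ):

#!/bin/bash
#SBATCH -J hello              # job name
#SBATCH -n 1                  # a single task
#SBATCH -t 00:05:00           # five-minute time limit
#SBATCH -o hello-%j.out       # output file; %j expands to the job ID

# Print a greeting from the compute node this job landed on
echo "Hello from $(hostname)"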

Transfer Files

Get Help

If you encounter problems while using Oscar, check out the Getting Help documentation, or read through the Overview page.

Short "How to" Videos

Overview

Overview of the Oscar Supercomputer

Accounts

If you do not have an Oscar account, you can request one by clicking the following link:

Anyone with a Brown account can get a free Exploratory account on Oscar, or pay for priority accounts.

Authorized users must comply with the following Brown University policies:

Hardware

Users can run their compute-intensive and/or long-running jobs/programs on Oscar to take advantage of its high-performance computing resources, as highlighted below:

  • 2 Login nodes

  • 8 PB of storage

  • Red Hat Enterprise Linux 9.2 (Linux)

  • Mellanox InfiniBand network

  • Slurm Workload manager

Scheduler

Users should not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your codes, manage files, and launch jobs on the compute nodes.

To allow users to share access to Oscar fairly, there are limits on the maximum number of pending and running jobs a user account may submit:

  • 1200 for a priority account

  • 1000 for an exploratory account

Software

  • Operating systems of all Oscar nodes: Red Hat 9.2

  • CCV staff install software upon user request or help users with software installation

Storage

Access and User Accounts - User accounts are controlled via central authentication and directories on Oscar are only deleted on the request of the user, PI, or departmental chair.

Files not accessed for 30 days will be deleted from your ~/scratch directory. Use ~/data for files you wish to keep long term.

Connecting to Oscar

Oscar users can connect to Oscar via SSH, Open OnDemand, or the VS Code Remote IDE.

Maintenance Schedule

  • Non-disruptive Maintenance:

    • non-disruptive work, including software changes, maintenance, and testing

    • may occur at any time

    • no notification provided

  • Monthly Scheduled Maintenance:

    • no downtime expected, but there may be limited degradation of performance

    • first Tuesday of the month, 8:00 am - 12:00 noon

    • no notification provided

  • Unscheduled Maintenance:

    • maximum 1 day downtime

    • occurs very rarely and includes any unplanned emergency issues that arise

    • Prior notification provided (depending on the issue, 1 day to 4 weeks advance notice provided)

  • Major Upgrade Maintenance:

    • service may be brought down for 3-5 days

    • occurs annually

    • 4-week prior notification provided

Unplanned Outage

  • During Business Hours:

  • During Non-Business Hours:

    • Call CIS Operations Center at (401) 863-7562. A ticket will get created and CCV staff will be contacted to address the issue.

User and Research Support

CCV staff provide support for researchers seeking help with statistical modeling, machine learning, data mining, data visualization, computational biology, high-performance computing, and software engineering.

CCV also provides short videos (coming soon) for users to learn from.

Account Information

What username and password should I be using?

  • If you are at Brown and have requested a regular CCV account, your Oscar login is authenticated using your Brown credentials, i.e. the same username and password that you use to log in to any Brown service, such as Canvas.

Changing Passwords

Exploratory Account

  • Exploratory accounts are available to all members of the Brown community for free.

Priority Accounts

HPC Priority

  • Intended for users running CPU-intensive jobs. These offer more CPU and memory resources than an exploratory account

  • Two types of accounts:

    • HPC Priority

    • HPC Priority+ (Twice the resources of HPC Priority)

Standard GPU Priority

  • Intended for users running GPU intensive jobs. These accounts offer fewer CPU and memory resources but more GPU resources than an exploratory account.

  • Two types of accounts:

    • Standard GPU Priority

    • Standard GPU Priority+ (Twice the resources of Standard GPU Priority)

High End GPU Priority

  • Intended for GPU jobs requiring high-end GPUs. These accounts offer the same number of CPUs as Standard GPU Priority accounts.

  • High-end GPUs such as the A40, V100, and A6000 are available

Large Memory Priority

  • Intended for jobs requiring large amounts of memory.

  • These accounts offer 2TB of memory and twice the wall-time of exploratory accounts.

Condo

PIs who purchase hardware (compute nodes) for the CCV machine get a Condo account. Condo account users have the highest priority on the number of cores equivalent to the hardware they purchased. Condo accounts last for five years and give their owners access to 25% more CPU cores than they purchase for the first three years of their lifespan. GPU resources do not decrease over the lifetime of the condo.

Common Acronyms and Terms

Anaconda / Conda

Association

Batch Jobs

CCV

CESM

Condo

CUDA

Desktop App

HPC

Stands for High Performance Computing. HPC is the ability to process data and perform highly complex calculations at an accelerated rate. Oscar is the service that CCV offers to the Brown community for their High Performance Computing needs.

Job Array

Jupyter Notebook

Interactive Jobs

Modules

MPI

Open OnDemand (OOD)

OOD app

Partition

PI

PuTTY

Python

Quality of Service (QOS)

Slurm

SSH

SMB

Common Linux Commands

cd

Moves the user into the specified directory. Change Directory.

cd .. to move one directory up

cd by itself to move to home directory

cd - to move to previous directory

cd <directory-path> to move to a directory (can be an absolute path or relative path)

cp <old_filepath> <new directory path>

Copies the file into the specified directory

clear

Clears the terminal

cat <filename>

Lists the contents of a file. Concatenate files.

ls

List contents within the current directory

grep <string_to_match> <filename>

Searches for the string / regular expression within the specified file and prints the line(s) with the result

pwd

Displays the path of the current directory that you are in. Present Working Directory

man <command>

Displays the help manual instruction for the given command

mv <file_name> <new_directory>

Moves a file into a new directory.

mv <old_file_name> <new_file_name> to rename a file

mkdir <directory_name>

Creates a new directory

rm <file_name>

Deletes a file

rm -r <directory_name>

Deletes directories and the contents within them. -r stands for recursive

rmdir <directory_name>

Removes the specified directory (must be empty)

touch

Creates a blank new file
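As an illustration, a short session combining several of these commands might look like this (file and directory names are hypothetical):

mkdir results                # create a new directory
cd results                   # move into it
touch notes.txt              # create an empty file
cp ~/data/input.csv .        # copy a file into the current directory
grep "error" notes.txt       # search notes.txt for the string "error"
cd ..                        # move back up one directory
rm -r results                # delete the directory and its contents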

This guide assumes you have an Oscar account. To request an account, see the New User Account Form.

The simplest way to connect to Oscar is via Open OnDemand (OOD). To connect to OOD, go to https://ood.ccv.brown.edu and log in using your Brown credentials. For more details, see the Open OnDemand documentation.

Windows users need an SSH client such as PuTTY installed. SSH is available by default on Linux and macOS. See the SSH pages of this documentation for more details.

For more detailed information on submitting jobs, see the Submitting Jobs section of the documentation.

To get specific files on to / off of Oscar, read through the Transferring Files to and from Oscar page of the documentation.

Oscar is Brown University's high performance computing cluster for both research and classes. Oscar is maintained and supported by the Center for Computation and Visualization (CCV).

Please contact support@ccv.brown.edu if there are any questions on Oscar.

More details can be found at the CCV Rates page.

Individuals external to Brown can get access to Oscar by having a sponsored Brown account. Please work with your department to get sponsored Brown accounts for any external collaborators.


Please refer to the details at.

Hundreds of users can share computing resources in Oscar. Slurm is used in Oscar to manage user jobs and computing resources such as cores and GPUs.

More than 500

Oscar has 8 PB of all-flash storage from VAST, which provides high-performance access to storage. Users have ~/home, ~/scratch, and ~/data directories as their storage with quota in Oscar. Please refer to the details at .

Users can transfer files from and to the Oscar filesystem. In particular, users can transfer files between the Oscar filesystem and Campus File Storage.

Send email to support@ccv.brown.edu. A ticket will get created and CCV staff will attempt to address the issue as soon as possible.

Send email to support@ccv.brown.edu.

CCV staff provide tutorials on using Oscar for classes, groups, and individuals. Please check CCV Events for upcoming trainings and office hours.

To request a priority account or a condo, use the account form on the CCV homepage. For more information on resources available to priority accounts and costs, visit the CCV Rates page.

If you are an external user, you will have to get a sponsored ID at Brown through the department with which you are associated before requesting an account on Oscar. Once you have the sponsored ID at Brown, you can request an account on Oscar and use your Brown username and password to log in.

Oscar users should use their Brown passwords to log into Oscar. Users should change their Brown passwords at myaccount.brown.edu.

See the CCV Rates page for a detailed description of the resources.

Jobs are submitted to the batch partition. See the System Hardware page for available hardware.

The following accounts are billed quarterly and offer more computational resources than the exploratory accounts. See the CCV Rates page for pricing and a detailed description of the resources.

See the CCV Rates page for pricing and a detailed description of the resources.

Jobs are submitted to the batch partition. See the System Hardware page for available hardware.

See the CCV Rates page for pricing and a detailed description of the resources.

Jobs are submitted to the gpu partition. See the System Hardware page for available GPU hardware.

See the CCV Rates page for pricing and a detailed description of the resources.

Jobs are submitted to the gpu-he partition. See the System Hardware page for available GPU hardware.

See the CCV Rates page for pricing and a detailed description of the resources.

Jobs are submitted to the bigmem partition. See the System Hardware page for available hardware.

Investigators may also purchase condos to grant access to computing resources for others working with them. After a condo is purchased, they can have users request to join the condo group through the "Request Access to Existing Condo" option on the account form on the CCV homepage.

A distribution of Python and R used for scientific computing that is meant to simplify package management and deployment. Conda is used for installing packages and managing their dependencies. []

Within Oscar, an association refers to a combination of four factors: Cluster, Account, User, and Partition. Associations are used to control job submissions for users. []

Put simply, batch jobs are scheduled programs that are assigned to run on a computer without further user interaction. []

Brown University's Center for Computation and Visualization. Provides software, expertise, and other services for Brown's research community. See our website for more information.

Stands for Community Earth System Model. "CESM is a fully-coupled, community, global climate model that provides state-of-the-art computer simulations of the Earth's past, present, and future climate states." () []

PIs can purchase condos that have a significant amount of computing resources which can be shared with others. []

" is an extension of the C language, as well as a runtime library, to facilitate general-purpose programming of NVIDIA GPUs." () []

This app on Open OnDemand allows users to launch a Desktop GUI on Oscar. This app is based on VNC which is a desktop sharing system that allows you to remotely control another desktop.[]

A job array is a collection of jobs that all run the same program but on different values of a parameter. []

"The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text." []

Jobs that allow the user to interact in real time with applications within Oscar, often from the command line. This differs from batch jobs in that each command to be run must be put in one at a time. []

Modules are software components that can easily be loaded or unloaded into Oscar. For instance, a user can load the Python 3 module using a module load command. []

Stands for Message Passing Interface. MPI is a system that aims to be the standard for portable and efficient message passing. Message passing is a technique often used in object-oriented programming and parallel programming. []

Open OnDemand (OOD) is a web portal to the Oscar computing cluster. It can be used to launch a Desktop session on Oscar []

OOD app is a web application that runs on the Open OnDemand web portal. It allows users to launch interactive applications like Jupyter Notebook, RStudio, Matlab or Desktop. []

Partitions are essentially groupings of nodes that allocate resources for specific types of tasks. On Oscar, partitions are based on job submissions through the Slurm workload manager. []

Stands for Principal Investigator. Mainly used to refer to the individual responsible for conducting and administering a research grant. Within Oscar, PIs have their own data directories that can be shared with students. PIs may also purchase condos. []

A client for SSH for Windows and Unix that emulates a terminal []

An object-oriented, high-level, and popular programming language []

The job limits that are linked to a given association. For instance, Priority Accounts will generally have a higher quality of service than Exploratory Accounts. []

A workload manager used within Oscar to schedule jobs []

Stands for Secure Shell Protocol. Used to communicate securely between computers and often used within a command-line interface (CLI) for connections to remote servers []

The Server Message Block (SMB) protocol is a network protocol that allows users to communicate with remote computers for file-sharing and other uses. It is one of the versions of the Common Internet File System (CIFS). Within Oscar, SMB is mainly used for file transfer. []


create an account
https://ood.ccv.brown.edu
here
PuTTY
here
Submitting Jobs
Transferring Files to and from Oscar
Putty Installation
SSH to Oscar from Linux
SSH to Oscar from Mac
SSH to Oscar from Windows
Batch Job Submission on Oscar
Linux Basics for Oscar
How to Use Modules in Oscar
Remote Rendering with ParaView in Oscar
Center for Computation and Visualization (CCV)
support@ccv.brown.edu
CCV Rates page
get sponsored Brown accounts
Acceptable Use Policy
Computing Passwords Policy
Computing Policies
Oscar hardware
Slurm is used in Oscar to manage user jobs and computing resources
software modules
Oscar's filesystem
transfer files
Oscar filesystem
transfer files between Oscar filesystem and Campus File Storage.
SSH
Open OnDemand
VS Code Remote IDE
support@ccv.brown.edu
support@ccv.brown.edu
CCV Events

System Hardware

Oscar Specifications

  • Compute Nodes: 388

  • Total CPU Cores: 20176

  • GPU Nodes: 82

  • Total GPUs: 667

  • Large Memory Nodes: 6

Compute Nodes

Oscar has compute nodes in the partitions listed below.

  • batch - The batch partition is for programs/jobs which need neither GPUs nor large memory.

  • bigmem - The bigmem partition is for programs/jobs which require large memory.

  • debug - The debug partition is for users to debug programs/jobs.

  • gpu - The gpu partition is for programs/jobs which require GPUs.

  • gpu-debug - The gpu-debug partition is for users to debug gpu programs/jobs.

  • gpu-he - The gpu-he partition is for programs/jobs which need to access high-end GPUs.

  • vnc - The vnc partition is for users to run programs/jobs in a graphical desktop environment.

Below are node details including cores and memory for all partitions.

Partition    Total Nodes    Total Cores    Cores Per Node    Total GPUs    Memory Per Node (GB)
batch        288            12800          24-192            n/a           190-1540
bigmem       6              512            32-192            n/a           770-1540
gpu          64             5000           24-128            519           190-1028
gpu-he       12             552            24-64             84            190-1028
debug        2              96             48                n/a           382
gpu-debug    1              48             48                8             1028
vnc          303            13696          24-192            40            102-1540
viz          1              48             48                8             1028

Hardware details

Hardware details for all partitions. The Features column shows the features available for the --constraint option for SLURM. This includes the available CPU types as well as GPUs.

Partition    Nodes   CPUs/Node   Total CPUs   GPUs/Node   Total GPUs   Memory (GB)   Features
batch        100     32          3200         n/a         n/a          190           32core, intel, scalable, cascade, edr
batch        122     48          5856         n/a         n/a          382           48core, intel, cascade, edr
batch        40      32          1280         n/a         n/a          382           32core, intel, scalable, cascade, edr, cifs
batch        10      192         1920         n/a         n/a          1540          192core, amd, genoa, edr
batch        4       64          256          n/a         n/a          514           64core, intel, icelake, edr
batch        2       24          48           n/a         n/a          770           24core, intel, e5-2670, e5-2600, scalable, skylake, fdr
batch        10      24          240          n/a         n/a          382           24core, intel, e5-2670, e5-2600, scalable, skylake, fdr
bigmem       4       32          128          n/a         n/a          770           32core, intel, scalable, cascade, edr
bigmem       2       192         384          n/a         n/a          1540          192core, amd, genoa, edr
gpu          2       32          64           5           10           382           intel, gpu, titanrtx, turing, skylake, 6142
gpu          1       24          24           5           5            190           intel, gpu, titanrtx, turing, skylake, 6142
gpu          1       48          48           10          10           382           intel, gpu, quadrortx, turing, cascade
gpu          10      32          320          10          100          382           intel, gpu, quadrortx, turing, cascade
gpu          13      64          832          8           104          1028          amd, gpu, geforce3090, ampere
gpu          4       48          192          8           32           1028          amd, gpu, geforce3090, ampere
gpu          7       128         896          8           56           1028          amd, cifs, gpu, a5500, ampere
gpu          10      64          640          8           80           1028          amd, cifs, gpu, a5000, ampere
gpu          10      128         1280         8           80           1028          amd, cifs, gpu, a5000, ampere
gpu          1       64          64           2           2            1028          amd, gpu, a5000, ampere
gpu          2       128         256          8           16           1028          amd, gpu, a5500, cifs, ampere
gpu          3       128         384          8           24           1028          amd, gpu, cifs, a5000, ampere
gpu-he       3       48          144          8           24           1028          amd, gpu, a40, ampere
gpu-he       3       24          72           4           12           190           intel, gpu, 4gpu, v100, volta, skylake, 6126
gpu-he       4       64          256          8           32           1028          amd, gpu, a6000, ampere
gpu-he       2       40          80           8           16           512           intel, cifs, gpu, v100, volta, haswell
debug        2       48          96           n/a         n/a          382           48core, intel, cascade, edr
gpu-debug    1       48          48           8           8            1028          amd, gpu, geforce3090, ampere
vnc          100     32          3200         n/a         n/a          190           32core, intel, scalable, cascade, edr
vnc          134     48          6432         n/a         n/a          382           48core, intel, cascade, edr
vnc          1       64          64           8           8            1028          amd, cifs, gpu, a5000, ampere
vnc          2       128         256          16          32           102           amd, gpu, a2, ampere
vnc          40      32          1280         n/a         n/a          382           32core, intel, scalable, cascade, edr, cifs
vnc          10      192         1920         n/a         n/a          1540          192core, amd, genoa, edr
vnc          4       64          256          n/a         n/a          514           64core, intel, icelake, edr
vnc          2       24          48           n/a         n/a          770           24core, intel, e5-2670, e5-2600, scalable, skylake, fdr
vnc          10      24          240          n/a         n/a          382           24core, intel, e5-2670, e5-2600, scalable, skylake, fdr
viz          1       48          48           8           8            1028          amd, gpu, geforce3090, ampere

GPU Features and GPU Memory

GPU Features    GPU Memory
a6000           48 GB
a40             45 GB
v100            32 GB
a5000           24 GB
quadrortx       24 GB
titanrtx        24 GB
geforce3090     24 GB
p100            12 GB
titanv          12 GB
1000ti          11 GB

CCV New User Account

Citing CCV

If you publish research that benefited from the use of CCV services or resources, we would greatly appreciate an acknowledgment that states:

This research [Part of this research] was conducted using [computational/visualization]
resources and services at the Center for Computation and Visualization, Brown University.

CCV Account Information

Account Usage

Oscar users are not permitted to:

  • Share their accounts or passwords with others or enable unauthorized users to access Center for Computation and Visualization resources

  • Use Center for Computation and Visualization resources for personal economic gain

  • Engage in unauthorized activity (e.g., cryptocurrency mining) that intentionally impacts the integrity of resources

Storage

Each user (premium or exploratory) gets a 20GB Home directory, 512GB of short-term Scratch space, and a 256GB Data directory (shared amongst the members of the group)

  • Files in the Scratch directory not accessed for the last 30 days are automatically purged. CCV only stores snapshots for 7 days; after that, files are permanently deleted.

  • The PI has ultimate access to the Data directory; if a student leaves Brown, the files in the Data directory will be owned by the PI.

Software and Data

All software and data stored or used on Center hosted systems must be appropriately and legally acquired and must be used in compliance with applicable licensing terms. Unauthorized misuse or copying of copyrighted materials is prohibited.

Data Retention

CCV reserves the right to remove any data at any time and/or transfer data to other individuals (such as Principal Investigators working on the same or a similar project) after a user account is deleted or is no longer affiliated with Brown University.

Accounts Validity

  • Once created, Oscar accounts are valid for the duration of one's Brown AD credentials

account form
CCV Rates page
request an account
myaccount.brown.edu
CCV Rates page
System Hardware
CCV Rates page
CCV Rates page
System Hardware
CCV Rates page
System Hardware
CCV Rates page
System Hardware
CCV Rates page
System Hardware
account form
Related Page - Anaconda
Related Page - Associations & Quality of Service
Related Page - Batch Jobs
our website
Source
Related Page - Using a CESM module
Related Page - Account Types
CUDA
Source
Related Page - Intro to CUDA
Related Page- Desktop App (VNC)
Related Page - Job Arrays
Related Page - Jupyter Notebooks on Oscar
Related Page - Interactive Jobs
Related Page - Using Modules
Message Passing Interface
Message passing
Related Page - MPI Jobs
Related Page - Open OnDemand
Related Page - Interactive Apps on OOD
Related Page - Slurm Partitions
Related Page - Account Types
Related Page - SSH (Terminal)
Related Page - Python on Oscar
Related Page - Associations & Quality of Service (QOS)
Related Page - Slurm Partitions
Related Page - SSH (Terminal)
Related Page - SMB (Local Mount)
regular expression

Student Accounts

CCV provides access to HPC resources for classes, workshops, demonstrations, and other instructional uses. In general, the system is available for most types of instructional use at Brown where HPC resources are required, and we will do what we can to provide the resources necessary to help teach your class. We do ask that you follow these guidelines to help us better support your class.

Account Requests and Software Needs

Requests for class accounts should be made in writing to support@ccv.brown.edu two weeks prior to the beginning of class, and should be made in bulk. Please provide the Brown username (required), name, and Brown email address for the students, TAs, and instructor, as well as the course number and the semester. Requests for specific software should also be made two weeks before the start of the semester; the software should be properly licensed, tested, and verified to work by an instructor or TA.

Usage Expectations and System Utilization

Unless prior arrangements are made, student class accounts will have the same priority and access as free accounts on the CCV system. Access can be provided to specialized hardware or higher core counts if needed, provided it does not impact research use of the CCV systems. Be aware that usage of the CCV system is unpredictable, and high utilization of the system could impact a student's ability to finish assignments in a specific time period. We also encourage instructors to give an overview of the system and discuss computing policies before students use the system. CCV can provide resources (slides, documentation, and in-class workshops) to help prepare students to use an HPC system. CCV staff are always available to meet directly with instructors and TAs to help prepare for classes and to help set up specific software or environments for the class.

Support

It is expected that any class being taught using CCV resources will have its own TA. The TA should be the first line of support for any problems or questions the students may have regarding the use of the CCV system. CCV staff may not know specifics about how to use or run the programs the class is using, and can’t provide direct support to students for that software.

Class Guest Accounts

CCV will provide limited-duration guest accounts that are custom-tailored for class use of the system. These accounts will have a username of “ccvws###”, and each account is associated with an individual student, instructor, or TA. Guest accounts are temporary, are only active for the duration of the class, and are deactivated at the conclusion of the semester/workshop. Account data is kept intact on our system for one semester after the conclusion of the class and is then permanently deleted from the CCV system.

To request student accounts for a course, please contact us by emailing support@ccv.brown.edu.

Getting Help

Here are some ways to get help with using Oscar.

Filing a Support Ticket

Filing a good support ticket makes it much easier for CCV staff to deal with your request

When you email support@ccv.brown.edu aim to include the following:

  • State the problem/request in the subject of the email

  • Describe which software and which version you are using

  • Error message (if there was one)

  • The job number

  • How you were running, e.g. batch, interactively, vnc

  • Give as small an example as possible that reproduces the problem

Q&A Forum

Slack

Office Hours

Arrange a Meeting

You can arrange to meet with a CCV staff member in person to go over difficult problems, or to discuss how best to use Oscar. Email support@ccv.brown.edu to arrange a consultation.

FAQ

General

How do I request help?

Most inquiries can be directed to CCV’s support address, support@ccv.brown.edu, which will create a support ticket with one of our staff.

What are the fees for CCV services?

How do I acknowledge CCV in a research publication?

We greatly appreciate acknowledgements in research publications that benefited from the use of CCV services or resources.

Oscar

What is Oscar?

Oscar is our primary research computing cluster with several hundred multi-core nodes sharing a high-performance interconnect and file system. Applications can be run interactively or scheduled as batch jobs.

How do I request an account on Oscar?

How do I run a job on Oscar?

Can I use Oscar for teaching?

How do I find out when the system is down?

We post updates to our user mailing list, ccv@listserv.brown.edu, which you are automatically subscribed to when setting up an account with CCV. If you need to be added to the mailing list, please submit a support ticket to support@ccv.brown.edu. We also have an announcement mailing list for office hours, workshops, and other events relevant to CCV users, ccv-announce@listserv.brown.edu.

How do I run a job array on Oscar?

How do I run a MPI job on Oscar?

I have some MPI-enabled source code. How can I compile it on Oscar?

Load an MPI module with module load mpi. For a list of available MPI modules, run module avail mpi.
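For example, assuming the loaded MPI module provides the usual compiler wrappers (the wrapper name, such as mpicc for C or mpif90 for Fortran, depends on the MPI implementation), compiling might look like:

module load mpi
mpicc -o my_mpi_app my_mpi_app.c    # hypothetical source and binary names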

What applications are available on Oscar?

What compilers are available on Oscar?

How do I get information about finished jobs?

The sacct command will list all of your completed jobs since midnight of the previous day (as well as running and queued jobs). You can pick an earlier start date with the -S option, e.g. sacct -S 2012-01-01.

How much storage am I using?

My job keeps terminating unexpectedly with a "Killed" message, or without any errors. What happened?

How do I request a certain amount of memory per CPU?

Specify the SLURM option --mem-per-cpu= in your script.
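For example, to request 4 GB of memory per CPU (the value is illustrative; adjust it to your job's needs):

#SBATCH --mem-per-cpu=4G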

How do I link against a BLAS and LAPACK library?

I am getting a "WARNING: Remote HOST IDENTIFICATION HAS CHANGED?

We have recently updated the login and VSCode node hardware to improve performance, security, and reliability. As a result of this migration, the SSH host keys for our servers have been updated. To fix this:

  • On MacOS:

sed -i '' -e '/^oscar/d' -e '/^vscode/d' ~/.ssh/known_hosts
  • On Linux:

sed -i -e '/^oscar/d' -e '/^vscode/d' ~/.ssh/known_hosts
  • On Windows: from VSCode's internal terminal Window:

vi ~/.ssh/known_hosts 

and delete the lines starting with oscar and vscode. Hopefully, this will make things easier.

  • OpenOnDemand (OOD) Shell Access: either get a Desktop session or log in via a regular terminal to 'ssh.ccv.brown.edu' and run

sed -i -e '/^oscar/d' -e '/^vscode/d' ~/.ssh/known_hosts

Then login again via OOD > Clusters

RUNNING JOBS

How is a job identified?

By a unique JobID, e.g. 1318013

Which of my jobs are running/pending?

Use the command myq

How do I check the progress of my running job?

You can look at the output file. The default output file is slurm-%j.out, where %j is the JobID. If you specified an output file using #SBATCH -o output_filename and/or an error file using #SBATCH -e error_filename, you can check these files for any output from your job. You can view the contents of a text file using the program less, e.g.

less output_filename

Use the spacebar to move down the file, b to move back up the file, and q to quit.

My job is not running how I intended it to. How do I cancel the job?

scancel <JobID> where <JobID> is the job allocation number, e.g. 13180139

How do I save a copy of an interactive session?

You can use interact -o outfile to save a copy of the session's output to "outfile"

I've submitted a bunch of jobs. How do I tell which one is which?

myq will list the running and pending jobs with their JobID and the name of the job. The name of the job is set in the batch script with #SBATCH -J jobname. For jobs that are in the queue (running or pending), you can use the command scontrol show job <JobID>, where <JobID> is the job allocation number (e.g. 13180139), to get more detail about what was submitted.

How do I ask for a haswell node?

Use the --constraint (or -C) option:

#SBATCH --constraint=haswell

You can use the --constraint option to restrict your allocation according to other features too. The nodes command provides a list of "features" for each type of node.

Why won't my job start?

When your job is pending (PD) in the queue, SLURM will display a reason why your job is pending. The table below shows some common reasons for which jobs are kept pending.

Reason
Meaning

(None)

You may see this for a short time when you first submit a job

(QOSGrpCpuLimit)

All your condo cores are currently in use

(QOSGrpMemLimit)

The total memory of your running jobs and this pending job is more than the limit for your account.

(Priority)

Jobs with higher priority are using the resources

(Resources)

There are not enough free resources to fulfill your request

(JobHeldUser)

You have put a hold on the job. The job will not run until you lift the hold.

(ReqNodeNotAvail)

The resources you have requested are not available. Note this normally means you have requested something impossible, e.g. 100 cores on 1 node, or a 24 core sandy bridge node. Double check your batch script for any errors. Your job will never run if you are requesting something that does not exist on Oscar.

(PartitionNodeLimit)

You have asked for more nodes than exist in the partition. For example if you make a typo and have specified -N (nodes) but meant -n (tasks) and have asked for more than 64 nodes. Your job will never run. Double check your batch script.

Why is my job taking so long to start? Just waiting in (Priority) or (Resources)

  1. Overall system busy: when tens of thousands of jobs are submitted in total by all users, the time it takes SLURM to process these into the system may increase from the normal (almost instant) to a half-hour or more.

  2. Specific resource busy: if you request very specific resources (e.g., a specific processor) you then have to wait for that specific resource to become available while other similar resources may be going unused.

  3. Specified resource not available: if you request something that is not or may never be available, your job will simply wait in the queue. E.g., requesting 64 GB of RAM on a 64 GB node will never run because the system needs at least 1 GB for itself, so you should reduce your request to less than 64 GB.

TRANSFERRING FILES

How do I transfer big files to/from Oscar?

Please use the server transfer.ccv.brown.edu

  1. Transfer local file to Oscar:

sftp <username>@transfer.ccv.brown.edu 
put /path/local_file

2. Transfer remote file on Oscar to the local system:

sftp <username>@transfer.ccv.brown.edu 
get -r filename.txt 

Cloud HPC Options

The use of cloud resources for HPC varies according to your demands and circumstances. Cloud options are changing rapidly, both in service providers and in the services being offered. For those who have short-term needs that don't demand the highest computational performance, a cloud option might be appropriate. For others, a local option customized to individual needs may be better. The cost of cloud services also varies quite a bit and includes not only compute time but also data transfer charges. Other issues involve licensing, file synchronization, etc.

We are actively investigating a number of options to connect Brown users seamlessly to suitable cloud options. We are collecting such information for publishing on the CIS website as part of the research services available. At this point, the best course of action is to request an individual consultation to help address your specific needs. Please send email to support@ccv.brown.edu.

SSH Key Login (Passwordless SSH)

How to set up SSH key authentication.

When connecting from a campus network to sshcampus.ccv.brown.edu, you can set up SSH keys as a form of authentication instead of having to enter your password interactively. Follow the instructions below that correspond to your operating system/connection method.

Mac/Linux/Windows(PowerShell)

Step 1 : Check for existing SSH key pair

Before generating a new SSH key pair, first check whether you already have an SSH key on your local machine.

If there are existing keys, please move to Step 3

Step 2 : Generate a new SSH Keypair

Press Enter to accept the default file location and file name.

ssh-keygen will ask you to type a secure passphrase. This is optional; if you don't want to use a passphrase, just press Enter.

Verify the SSH keys were generated correctly: you should see two files, id_rsa and id_rsa.pub, under the ~/.ssh directory.

DO NOT upload or send the private key.

Step 3 : Copy the public key to Oscar

You will now need to copy your public key to Oscar. There are two ways to accomplish this.

With ssh-copy-id

If your OS comes with the ssh-copy-id utility, then you'll be able to copy your public key into Oscar as follows:

You will be prompted for a Password. The public key will be appended to the authorized_keys file on Oscar.

If you used a custom name for your key instead of the default id_rsa, then you'll need to pass the name of your key to ssh-copy-id, i.e.,

Without ssh-copy-id

If your system does not come with the ssh-copy-id utility installed, then you'll need to copy your public key by hand.

  1. Get the contents of the id_rsa.pub file. One option is to use cat in your terminal: cat id_rsa.pub.

  2. Copy the contents of this file to your clipboard, as we need to upload it to Oscar.

  3. Log into Oscar via regular ssh: ssh <username>@ssh.ccv.brown.edu. Once you are on the login node, open the authorized_keys file with your text editor of choice, e.g., vim ~/.ssh/authorized_keys or nano ~/.ssh/authorized_keys. Add your public key to the end of this file. Save and exit.

Step 4 : Login to Oscar using your SSH keys

If everything went well, you will be logged in immediately without being prompted for a password.

Windows(PuTTY)

Key Generation & Setup

2. Move your cursor around randomly in order to "salt" your key, while the key is being generated. Once the key is generated, you should see something like this:

3. Replace the text in the 'Key comment:' field with something recognizable and enter a passphrase in the two fields below.

4. Copy the text in the 'Public key for pasting...' field (the text continues past what is displayed) and paste it wherever the public key is needed. If you are using GitHub, you can now create a new SSH key in your Personal Settings and paste this text into the 'Key' field.

5. Click on 'Save private key' and select a logical/recognizable name and directory for the file. Your private key is saved in the selected file.

6. Open Pageant (also part of the PuTTY package). If a message saying "Pageant is already running" is displayed, open your system tray and double click on the Pageant icon.

To open your system tray, click on the up arrow (looks like: ^ ) icon at the bottom right of your screen (assuming your taskbar is at the bottom of your screen).

7. Click on 'Add Key' and select the file you saved when generating your key earlier (Step 5). If it is requested, enter the passphrase you created at Step 3 to complete the process.

In order to not have to add the key to Pageant after every time your machine reboots, you can add the key file(s) to your Windows startup folder (the directory for the current user is C:\Users\[User Name]\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup). You may still be prompted to enter the passphrase after a reboot, but you will not have to find and add the key to Pageant every time.

Offboarding

Account and Access

It is best that your affiliate account keeps the same username as your previous Brown account. Otherwise, please contact support@ccv.brown.edu to migrate your Oscar account to your affiliate account.

If you are not able to connect to Oscar with your affiliate account, please contact support@ccv.brown.edu for help.

Data

Data Retention

Your data (directories and files) will stay on Oscar for one year after your Brown account is deactivated. After that, your data will be archived.

Data Deletion

You may delete your data when you leave Brown University. Or you may request that CCV delete your data on Oscar, especially if you have lots of data.

A PI owns the PI's data directories and can delete all files there.

Retrieve Data

Billing

If you are a PI and want to keep your priority accounts and/or data directories after leaving Brown University, please contact support@ccv.brown.edu to update your billing information.

Windows (PuTTY)

SSH Agent Forwarding on a Windows system using PuTTY, with an example application to git.

Agent Forwarding with PuTTY

  1. Once you have added your private key to Pageant, open PuTTY and navigate to the Auth menu.

2. Check the 'Allow agent forwarding' checkbox, and return to the Session menu.

3. Enter the Host Name you usually use to connect to Oscar, and click 'Open'.

4. Enter your password. If you have SSH keys set up on your local computer to connect to GitHub, you can confirm your ssh-agent was properly forwarded by checking GitHub. If the ssh command fails, your agent has not been properly forwarded.

Mac/Linux

Agent Forwarding in Mac and Linux Systems

Start the SSH-Agent

First, start your ssh-agent with the command below.

You should see an output similar to this:

Add Key(s)

Next, add your ssh private keys to the running agent (using the ssh-add command on line 1). This step may be repeated for every key pair you use to connect to different git servers. For most, this file is called id_rsa and will live in ~/.ssh/id_rsa. If you set a password for your ssh keys, the agent will prompt you to enter them.

Confirm the ssh keys have been loaded into the agent with ssh-add -L:

Connect to Oscar

Now ssh into Oscar with the -A option as shown on the first line below (replace username with your Oscar username). -A will forward your ssh-agent to Oscar, enabling you to use the ssh keys on your laptop while logged into Oscar.

If you have SSH keys set up on your local computer to connect to GitHub, you can confirm your ssh-agent was properly forwarded by checking GitHub. If the ssh command fails, your agent has not been properly forwarded.

Always connecting with Agent Forwarding

To make these changes permanent, you can add the ForwardAgent yes option to your SSH configuration file. To learn more about configuring your SSH connections, visit the SSH Configuration File page.

Arbiter2

  • login009

  • login010

  • vscode1

Status and Limits

Arbiter2 applies different limits to a user's processes depending on the user's status: normal, penalty1, penalty2, and penalty3.

Arbiter2 limits apply only to the shared nodes, not compute nodes.

Normal Status and Limits

Upon first log in, the user is in the normal status. These normal limits apply to all the user's processes on the node:

  • CPU limit: 1/3 of the total CPU time. For example, a user's processes can use up to 1/3 of the total CPU time of the 24 cores on a login node.

  • Memory limit: 40GB

Penalty1 Status and Limits

When a user's processes consume CPU time more than the default CPU time limit for a period of time, the user's status is changed to the penalty1 status. These penalty1 limits are applied:

  • CPU limit: 80% of the normal limit.

  • Memory limit: 0.8 * 40GB = 32GB (80% of the normal limit)

While a user is in penalty1 status, their processes are throttled if they consume more CPU time than penalty1 limit. However, if a user's processes exceed penalty1 memory limit, the processes (PIDs) will be terminated by cgroups.

The user's status returns to the normal status after a user's processes consume CPU time less than the penalty1 limit for 30 minutes.

Penalty restrictions are enforced independently for each shared node, and the penalty status does not carry over between these nodes.

Penalty2 Status and Limits

When a user's processes consume more CPU time than the penalty1 limit for a period of time, the user is put in the penalty2 status, and the penalty2 limits apply to the user's processes.

  • CPU limit: 50% of the normal limit

  • Memory limit: 20GB (50% of the normal limit)

In penalty2 status, the user's processes will be throttled if they consume more CPU time than penalty2 limit. However, if a user's processes exceed penalty2 memory limit, the processes (PIDs) will be terminated by cgroups.

The user's status returns to the normal status after a user's processes consume CPU time less than the penalty2 limit for one hour.

Penalty3 Status and Limits

When a user's processes consume more CPU time than the penalty2 limit for a period of time, the user is put in the penalty3 status. These penalty3 limits apply to the user's processes.

  • CPU limit: 30% of the normal limit

  • Memory limit: 12GB (30% of the normal limit)

In penalty3 status, the user's processes will be throttled if they consume more CPU time than penalty3 limit. If a user's processes exceed penalty3 memory limit, the processes (PIDs) will be terminated by cgroups

The user's status returns to the normal status after a user's processes consume CPU time less than the penalty3 limit for two hours.

Email Notification

A user receives an email notification upon each violation. Below is an example email:

Violation of usage policy

This may indicate that you are running computationally-intensive work on the interactive/login node (when it should be run on compute nodes instead). Please utilize the 'interact' command to initiate a SLURM session on a compute node and run your workloads there.

You now have the status penalty1 because your usage has exceeded the thresholds for appropriate usage on the node. Your CPU usage is now limited to 80% of your original limit (8.0 cores) for the next 30 minutes. In addition, your memory limit is 80% of your original limit (40.0 GB) for the same period of time.

These limits will apply on login006.

High-impact processes

Usage values are recent averages. Instantaneous usage metrics may differ. The processes listed are probable suspects, but there may be some variation in the processes responsible for your impact on the node. Memory usage is expressed in GB and CPU usage is relative to one core (and may exceed 100% as a result).

Recent system usage

*This process is generally permitted on interactive nodes and is only counted against you when considering memory usage (regardless of the process, too much memory usage is still considered bad; it cannot be throttled like CPU). The process is included in this report to show usage holistically.

**This accounts for the difference between the overall usage and the collected PID usage (which can be less accurate). This may be large if there are a lot of short-lived processes (such as compilers or quick commands) that account for a significant fraction of the total usage. These processes are whitelisted as defined above.

Required User Actions

When a user receives an alert email that the user is put in a penalty status, the user should

  • kill the processes that are using too many resources on the shared node listed in the alert email, and/or reduce the resources used by those processes (see the sketch below)
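A minimal sketch of inspecting and stopping your own processes on a login node (the PID shown is a placeholder):

ps -u $USER -o pid,pcpu,pmem,comm --sort=-pcpu   # list your processes, busiest first
kill <PID>                                       # stop a process by its PID
kill -9 <PID>                                    # force-stop it if it does not exit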

CCV reserves the right to suspend a user's access to Oscar, if the user repeatedly violates the limits, and the user is not able to work with CCV to find a solution.

Exempt Processes

Essential Linux utilities, such as rsync, cp, scp, SLURM commands, creating Singularity images, and code compilation, are exempt. To obtain a comprehensive list, please get in touch with us

The CPU resources used by exempt programs are not counted against the CPU limits. However, the memory resources used by exempt programs are still counted against the memory limits.

SSH Agent Forwarding

How to forward local ssh keys to Oscar

SSH provides a method of sharing the SSH keys on your local machine with Oscar. This feature is called agent forwarding and can be useful, for instance, when working with version control or other services that authenticate via SSH keys. Below are instructions on how to configure your SSH connection to forward the ssh-agent on different operating systems.

SSH Configuration File

How to save ssh configurations to a configuration file

When regularly connecting to multiple remote systems over SSH, you’ll find that remembering all the hosts and various command-line options becomes tedious. OpenSSH allows setting up a configuration file to store different SSH options for each remote machine you connect to.

SSH Config File Location

The OpenSSH client-side configuration file (in this case, on your personal computer) is named config, and it is stored in the hidden .ssh directory under your user’s home directory (i.e., ~/.ssh).

This file must be readable and writable only by the user and not accessible by others:

SSH Config File Structure Basics

The SSH Config File takes the following structure:

The contents of the SSH config file are organized into sections. Each section starts with the Host directive and contains specific SSH options used when establishing a connection with the remote SSH server.

Oscar Hosts

Here we provide a list of Oscar hosts and typical SSH configuration options. You have two options:

  1. Copy the list of hosts below directly into your SSH Config File (i.e., ~/.ssh/config)

  2. Keep this content in a separate file for Oscar hosts, let's say ~/.ssh/config.oscar, and include that file in your main configuration file. In this case, the first line of ~/.ssh/config will be Include "~/.ssh/config.oscar"

Connecting to your preconfigured host

You may now connect using the shortcut notation provided by your configuration file. That is, all you need to type is:

According to the configuration above, this is equivalent to

Much shorter. Enjoy!

Ask questions and search for previous problems at our OSCAR Question and Answer Forum.

Join our CCV-Share Slack workspace to discuss your questions with CCV Staff in the #oscar channel.

CCV holds weekly office hours. These are drop in sessions where we'll have one or more CCV staff members available to answer questions and help with any problems you have. Please visit for upcoming office hours and events.

All CCV services are billed quarterly, and rates can be found (requires a Brown authentication to view). Questions about rates should be directed to support@ccv.brown.edu.

To request an account, please fill out a New User Account Form. All accounts are subject to our General Terms and Conditions.

Sample batch scripts are available in your home directory at ~/batch_scripts and can be run with the sbatch <jobscript> command. For more information, visit our manual page on .

See our page on

A job array is a special type of job submission that allows you to submit many related batch jobs with a single command. This makes it easy to do parameter sweeps or other schemes where the submitted jobs are all the same except for a single parameter such as a filename or input variable. Job arrays require special syntax in your job script. Sample batch scripts for job arrays are available in your home directory at ~/batch_scripts and can be run with the sbatch <jobscript> command. For more information, visit our manual page on .
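A minimal job-array sketch is shown below (the array range, file names, and resource requests are illustrative; the sample scripts in ~/batch_scripts show the exact form used on Oscar):

#!/bin/bash
#SBATCH -J array-example
#SBATCH -n 1
#SBATCH -t 00:10:00
#SBATCH --array=1-10                 # run 10 tasks of this job
#SBATCH -o array-%A_%a.out           # %A = array job ID, %a = task index

# Each task sees its own index in SLURM_ARRAY_TASK_ID,
# which can be used to select an input file or parameter.
echo "Processing input_${SLURM_ARRAY_TASK_ID}.txt"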

MPI is a type of programming interface. Programs written with MPI can run on and communicate across multiple nodes. You can run MPI-capable programs by calling srun --mpi=pmix <program> in your batch script. For more detailed info, visit our manual page on .
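A minimal MPI batch-script sketch using the srun --mpi=pmix launcher mentioned above (the program name and task count are illustrative):

#!/bin/bash
#SBATCH -J mpi-example
#SBATCH -n 4                         # total number of MPI tasks
#SBATCH -t 00:10:00

module load mpi                      # load an MPI implementation
srun --mpi=pmix ./my_mpi_program     # launch one MPI rank per task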

Many scientific and HPC software packages are already installed on Oscar, including python, perl, R, Matlab, Mathematica, and Maple. Use the module avail command on Oscar to view the whole list or search for packages. See our manual page on to understand how software modules work. Additional packages can be requested by submitting a support ticket to support@ccv.brown.edu.

By default, the gcc compiler is available when you login to Oscar, providing the GNU compiler suite of gcc (C), g++ (C++), and gfortran. We also provide compilers from Intel (intel module) and the Portland Group (pgi module). For more information, visit our manual page on .

The checkquota command on Oscar will print a summary of the usage of your directories. For more information, see our manual page on .

These are symptoms of not requesting enough memory for your job. The default memory allocation is about 3 GB. If your job is resource-intensive, you may need to specifically allocate more. See the for instructions on requesting memory and other resources.

We recommend linking against the Intel Math Kernels Library (MKL) which provides both BLAS and LAPACK. The easiest way to do this on Oscar is to include the special environment variable $MKL at the end of your link line, e.g. gcc -o blas-app blas-app.c $MKL. For more complicated build systems, you may want to consult the .

Alternatively, Oscar has an endpoint for Globus Online (https://www.globus.org) that you can use to transfer files more effectively. See our manual page on how to use Globus Online to transfer files.

Open PuTTYgen (this comes as part of the PuTTY package), change the 'Number of bits in a generated key:' to 4096 (recommended), then click 'Generate'.

Oscar users will keep their access to Oscar as long as their Brown accounts are still active. To be able to access an Oscar account after a user's Brown account is deactivated, the user needs to get an affiliate account through the department with which the user is associated.

You can download data from Oscar following the instructions . Globus is recommended for large data transfer.

Arbiter2 is a cgroups-based mechanism that is designed to prevent the misuse of login nodes and the VSCode node, which are scarce, shared resources. It is installed on the shared nodes listed below:

A violation of the usage policy by ccvdemo (CCV Demo,,,,ccvdemo) on login006 was automatically detected starting at 08:53 on 04/25.

Process                 Average core usage (%)   Average memory usage (GB)
SeekDeep (21)           800.09                   0.24
mamba-package (1)       90.58                    0.01
other processes** (1)   3.48                     0.00
mamba (1)               1.90                     0.30
python3.10 (1)          0.56                     0.02
sshd* (2-4)             0.01                     0.01
bash (1-4)              0.00                     0.01
python (1)              0.00                     0.01

submit an interactive job, a batch job, or an interactive Open OnDemand app to run computationally intensive programs, including but not limited to Python, R, and Matlab

consider attending CCV workshops or tutorials to learn more about correctly using Oscar.

When you use the ssh command for the first time, the ~/.ssh directory is automatically created. If the directory doesn’t exist on your system, create it using the command below:

By default, the SSH configuration file may not exist, so you may need to create it using the touch command:

Don't forget to replace <username> with your username. Also, the configuration assumes your identity key is ~/.ssh/id_rsa - if you named it anything else, please update the value. If you need to generate a key, see the SSH Key Login (Passwordless SSH) section above.

OSCAR Question and Answer Forum
Sign-up here
this page
here
New User Account Form.
General Terms and Conditions
Batch Jobs
Academic Classes
Running Jobs
MPI programs
Software
Software
File Systems
MKL Link Line Advisor
https://www.globus.org
Globus Online
Mac/Linux/Windows(PowerShell)
Windows(PuTTY)
ls ~/.ssh/id_*.pub
ssh-keygen -t rsa
ssh-keygen.exe
ssh-copy-id <username>@ssh.ccv.brown.edu
ssh-copy-id -i ~/.ssh/<keyname> <username>@ssh.ccv.brown.edu
ssh <username>@sshcampus.ccv.brown.edu
ssh -T git@github.com
Hi JaneDoe! You've successfully authenticated, but GitHub does not provide shell access.
Connection to github.com closed.
$ eval $(ssh-agent)
Agent pid 48792
$ ssh-add ~/.ssh/id_rsa
Enter passphrase for ~/.ssh/id_rsa:
Identity added: ~/.ssh/id_rsa 
$ ssh-add -L
ssh-rsa AAAAB3NzaC1y...CQ0jPj2VG3Mjx2NR user@computer
$ ssh -A username@ssh.ccv.brown.edu
$ ssh git@github.com

Hi JaneDoe! You've successfully authenticated, but GitHub does not provide shell access.
Connection to github.com closed.


mkdir -p ~/.ssh && chmod 700 ~/.ssh
touch ~/.ssh/config
chmod 600 ~/.ssh/config
Host hostname1
    SSH_OPTION value
    SSH_OPTION value

Host hostname2
    SSH_OPTION value

Host *
    SSH_OPTION value
# Oscar Hosts. Any hosts with the -campus suffix can be accessed
# only whithin Brown network i.e. campus or vpn
# Hosts without -campus sufix can be accessed from outside Brown
# but will requiere 2FA

# Hosts to connect to login nodes
Host oscar
    HostName ssh.ccv.brown.edu
    User <username>
    IdentityFile ~/.ssh/id_rsa
    ForwardAgent yes
    ForwardX11 yes
    TCPKeepAlive yes
    ServerAliveCountMax 20
    ServerAliveInterval 15
Host oscar-campus
    HostName sshcampus.ccv.brown.edu
    User <username>
    IdentityFile ~/.ssh/id_rsa
    ForwardAgent yes
    ForwardX11 yes
    TCPKeepAlive yes
    ServerAliveCountMax 20
    ServerAliveInterval 15
    
# When connecting from VSCODE use the following hosts
Host vscode-oscar-campus
    HostName oscar2
    User <username>
    ProxyCommand ssh -q -W %h:%p desktop-oscar-campus
Host vscode-oscar
    HostName oscar2
    User <username>
    ProxyCommand ssh -q -W %h:%p desktop-oscar
ssh oscar-campus
ssh -X -A -o TCPKeepAlive=yes -o ServerAliveCountMax=20 -o ServerAliveInterval=15 user@sshcampus.ccv.brown.edu

Oscar's Filesystem

CCV uses an all-flash parallel filesystem (VAST Data). Users have a home, data, and scratch space.

home ~

  • 100GB of space

  • Optimized for many small files

  • 30 days snapshots

  • The quota is per individual user

  • A grace period of 14 days

data ~/data

  • Each PI gets 256GB for free

  • Optimized for reading large files

  • 30 days snapshots

  • The quota is by group

  • A grace period of 14 days

scratch ~/scratch

  • 512G soft quota; 12T hard quota

  • Optimized for reading/writing large files

  • 30 days snapshots

  • Purging: Files not accessed for 30 days may be deleted

  • The quota is per individual user

  • A grace period of 21 days

Files not accessed for 30 days will be deleted from your scratch directory. This is because scratch is high-performance space; the fuller scratch gets, the worse its read/write performance becomes. Use ~/data for files you need to keep long-term.

The scratch purge is on individual files. It is by 'atime' which is when the file was last read. You can use 'find' to find files that are at risk of being purged, e.g. to find files in the current directory that have not been accessed in the last 25 days:

find . -atime +25

A good practice is to configure your application to read any initial input data from ~/data and write all output into ~/scratch. Then, when the application has finished, move or copy data you would like to save from ~/scratch to ~/data.
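For example, a sketch of this pattern after a run has finished (directory names are illustrative, not from the original page):

# Keep the results; leave purgeable intermediates in scratch
rsync -av ~/scratch/my_run/results/ ~/data/my_run/results/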

Note: class or temporary accounts may not have a ~/data directory!

To see how much space you are using in your directories, use the command checkquota. Below is an example output:

$ checkquota
Name       Path                 Used(G)    (%) Used   SLIMIT(G)  H-LIMIT(G) Used_Inodes     SLIMIT     HLIMIT     Usage_State  Grace_Period  
ccvdemo1   /oscar/home          3.72       2          100        140        63539           2000000    3000000    OK           None          
ccvdemo1   /oscar/scratch       0.00       0          512        10240      1               4000000    16000000   OK           None          
Now fetching Data directory quotas...
Name        Used(T)   (%) Used   SLIMIT(T)   HLIMIT(T)   Used_Inodes   SLIMIT    HLIMIT    Usage_State   Grace_Period  
data+nopi   0.0       0          0.88        0.98        466           4194304   6291456   OK            None 

You can go over your quota up to the hard limit for a grace period. This grace period is to give you time to manage your files. When the grace period expires you will be unable to write any files until you are back under quota.

There is a quota for space used and for number of files. If you hit the hard limit on either of these you will be unable to write any more files until you are back under quota.

Keep the number of files between 0.5M (preferred) and 1M (upper limit). Going beyond this limit can lead to unexpected problems.
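To get a rough count of the files under a directory, a simple sketch like the following can help (the path is illustrative):

find ~/data/my_project -type f | wc -l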


Best Practices for I/O

Efficient I/O is essential for good performance in data-intensive applications. Often, the file system is a substantial bottleneck on HPC systems, because CPU and memory technology has improved much more drastically in the last few decades than I/O technology.

Parallel I/O libraries such as MPI-IO, HDF5 and netCDF can help parallelize, aggregate and efficiently manage I/O operations. HDF5 and netCDF also have the benefit of using self-describing binary file formats that support complex data models and provide system portability. However, some simple guidelines can be used for almost any type of I/O on Oscar:

  • Try to aggregate small chunks of data into larger reads and writes. For the GPFS file systems, reads and writes in multiples of 512KB provide the highest bandwidth.

  • Avoid using ASCII representations of your data. They will usually require much more space to store, and require conversion to/from binary when reading/writing.

  • Avoid creating directory hierarchies with thousands or millions of files in a directory. This causes a significant overhead in managing file metadata.

While it may seem convenient to use a directory hierarchy for managing large sets of very small files, this causes severe performance problems due to the large amount of file metadata. A better approach might be to implement the data hierarchy inside a single HDF5 file using HDF5's grouping and dataset mechanisms. This single data file would exhibit better I/O performance and would also be more portable than the directory approach.

Using RStudio

RStudio is an IDE for R that can be run on Oscar.

Launching RStudio

Known Issues

Plotting figures may not work within RStudio. If this is the case, save the plots to a file, and view them through the Open On Demand Desktop App. If plots are required for your task, launch RStudio through the Desktop App.

Using File Explorer on OOD

The filesystem on Oscar can be accessed through the file explorer on this web portal. The file explorer allows you to:

  • List files

  • Create a directory

  • Rename files

  • Copy/Move files

To access the file explorer, click "Files" -> "Home Directory" at the top of the screen.

Check the documentation below for some of these services:

Changing directories on File explorer

To access a directory, click "Change directory" and enter the path name

Do not use "~" in your directory path name. The path should start with "/users" or "/gpfs/"

  • To access your home directory, click the "Home Directory" link on the left. The path name at the top of the page should change to "/users/<username>"

  • To access your scratch directory, click the "scratch" directory in your home directory OR click "Change directory" and enter "/users/<username>/scratch"

  • To access your data directory, click the "data" directory in your home directory OR click "Change directory" and enter "/users/<username>/data"

Edit plain-text files

  1. Click the icon with the three dots -> Edit

  2. The file will open in a text editor in a new tab

Download files or directories

  1. Click the icon with the three dots -> Download

To download multiples files:

  1. Click the check-box to the left of the file name.

  2. Scroll to the top of the page and click "Download"

Directories are downloaded as zipped files on your computer.

Upload files or directories

  1. Click the "Upload" button.

  2. Follow the instructions on the screen. You can click the "Browse" buttons or drag and drop files.

Launch a terminal

  1. Click "Open in Terminal" at the top of the page.

  2. A web-based terminal will open in a new tab of your browser. You will be logged into one of the login nodes.

X-Forwarding

Instructions to forward X11 applications from Oscar to local computer

If you have an installation of X11 on your local system, you can access Oscar with X forwarding enabled, so that the windows, menus, cursor, etc. of any X applications running on Oscar are all forwarded to your local X11 server. Here are some resources for setting up X11:

One limitation of X forwarding is its sensitivity to your network connection's latency. We advise against using X forwarding from a connection outside of the Brown campus network, since you will likely experience lag between your actions and their response in the GUI.

Mac/Linux

Once your X11 server is running locally, open a terminal and use

$ ssh -X <user>@ssh.ccv.brown.edu

to establish the X forwarding connection. Then, you can launch GUI applications from Oscar and they will be displayed locally on your X11 server.

Windows (PuTTY)

For Windows users using PuTTY, enable X forwarding under Connections->SSH->X11:

Open OnDemand

Open OnDemand (OOD) is a web portal to the Oscar computing cluster. An Oscar account is required to access Open OnDemand. Visit this link in a web browser and sign in with your Brown username and password to access this portal.

Intro to Open OnDemand Slides

OOD provides several resources for interacting with Oscar.

Features:

  1. No installation needed. Just use your favorite browser!

  2. No need to use two-factor authentication multiple times. Just do it once, when you log into OOD.

  3. Use it with, or without, VPN. Your workflow remains the same.

Web-based Terminal App

Open OnDemand offers a browser-based terminal app to access Oscar. Windows users who do not want to install an SSH client like PuTTY will find this app very useful.

Accessing the terminal

  1. In the top menu, click Clusters -> >_OSCAR Shell Access

A new tab will open and the web-based terminal app will be launched in it. The shell will be launched on one of the login nodes.

The shell DOES NOT start on a compute node. Please do not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes.

Features:

  1. No installation needed. Just use your favorite browser!

  2. No need to enter your password again. SSH into Oscar in seconds!

  3. No need to use two factor authentication again. Just do it once, when you log into OOD.

  4. Use it with, or without, VPN. Your workflow remains the same.

Interactive Apps on OOD

You can launch several different apps on the Open OnDemand (OOD) interface. All of these apps start a Slurm batch job on the Oscar cluster with the requested amount of resources. These jobs can access the filesystem on Oscar, and all output files are written to Oscar's filesystem.

Launching an App on OOD

  1. If prompted, enter your Brown username and password.

  2. Click on the "Interactive Apps" tab at the top of the screen to see the list of available apps. This will open the form to enter the details of the job.

  3. Follow the instructions on the form to complete it. Some of the fields can be left blank, and OOD will choose the default option for you.

  4. Click Launch to submit an OOD job. This will open a new tab in the browser. It may take a few minutes for this job to start.

  5. Click "Launch <APP>" again if prompted in the next tab.

SLURM limits on resources such as CPUs, memory, GPUs, and time for each partition still apply to OOD jobs. Please keep these limits in mind when choosing options on the OOD form.

When submitting a batch job from a terminal in the Desktop app or the Advanced Desktop app, users need to:

  • run "unset SLURM_MEM_PER_NODE" before submitting a job if the job needs to specify --mem-per-cpu

  • run "unset SLURM_EXPORT_ENV" before submitting an MPI job

Transferring Files to and from Oscar

There are several ways to move files between your machine and Oscar. Which method you choose will depend on how much data you need to move and your personal preference for each method.

1. SMB

2. Command line

Mac and Linux

SCP

You can use scp to transfer files. For example to copy a file from your computer to Oscar:

scp /path/to/source/file <username>@ssh.ccv.brown.edu:/path/to/destination/file

To copy a file from Oscar to your computer:

scp <username>@ssh.ccv.brown.edu:/path/to/source/file /path/to/destination/file

RSYNC

You can use rsync to sync files from your local computer to Oscar:

rsync -azvp --progress path/to/source/directory <username>@ssh.ccv.brown.edu:/path/to/destination/directory

Windows On Windows, if you have PuTTY installed, you can use its pscp utility from the terminal.

3. GUI programs for transferring files using the sftp protocol and transfer.ccv.brown.edu hostname

  • DUO is required if you are not connected to approved networks, e.g., home network

    • There is no interactive terminal message, but your phone will get a prompt automatically

  • DUO is NOT required if you are connected to approved Brown networks

In general, you can specify the following for your GUI programs:

  • Protocol: SFTP

  • Host: transfer.ccv.brown.edu

  • User: your Oscar username

  • Password: your Brown password

3.1.1 Limit Concurrent Transfer and Change Reconnect Options

Click the Options and then Preferences menu in WinSCP. In the pop-up window, click Transfer and then Background to (Figure 1)

  • change Maximal number of transfers at the same time to 1

  • uncheck Use multiple connections for single transfer

Click Endurance to (Figure 2)

  • set Automatically reconnect session to 5 seconds

  • uncheck Automatically reconnect session, if it stalls

  • set Keep reconnection for to 10 seconds

3.1.2 Add a New Site

3.2 FileZilla

3.2.1. Disable Timeout

Click the Edit menu and then select the Settings submenu, and then change the Timeout in seconds to 0 to disable, as shown in Figure 2

3.2.2 Add a New Site

Open the Site Manager as shown in Figure 5.

Click the 'New Site' button to add a new site, as shown in Figure 4:

Limit the number of simultaneous connections to 1, as shown in Figure 5.

Click the 'Connect' button to connect to Oscar and transfer files.

3.3 Cyberduck

You may see a popup window about an 'Unknown Fingerprint'. Just check the 'Always' option and click 'Allow'. This window should not pop up again unless the transfer server changes.

4. Globus online

BrownU_CCV_Oscar

5. LFTP

module load lftp  # To load the LFTP module from Oscar
lftp -u login,passwd MyAwesomeUrl  # To connect to your (S)FTP server
ls   # To list files on the (S)FTP server
!ls  # To list files in your directory on Oscar
get MyAwesomeFile  # To download a single file
mirror # To download everything as is from the server
mirror --directory=/name_of_directory/ # To download a specific directory

Remote IDE (VS Code)

You can access Oscar's file-system remotely from Visual Studio Code (VS Code). Note that access of Oscar from VS Code is still considered experimental, and as such, 24x7 support is not available.

VS Code one-time setup

September 10, 2023: Some users have reported issues while connecting to the Oscar VS Code remote extension. This is due to a recent change introduced by VS Code. To address this issue

Ctrl (cmd on Mac) + Shift + P > Remote-SSH: Settings

Disable the Remote.SSH: Use Exec Server option

2. Open VS Code settings and uncheck symlink:

On macOS: Code > Preferences > Settings

On Windows/Linux: File > Preferences > Settings

Search for symlink and make sure the symlink searching is unchecked

If you have Windows Subsystem for Linux (WSL) installed in your computer, you need to follow the instructions for Windows (PowerShell).

4. Edit the config file:

On Mac/Linux, the config file is located at:

~/.ssh/config

On Windows, the config file is located at:

C:\Users\<uname>\.ssh\config

Edit the config file on your local machine, add the following lines. Replace <username> with your Oscar username.

# Jump box with public IP address
Host jump-box
    HostName poodcit4.services.brown.edu
    User <username>

# Target machine with private IP address
Host ccv-vscode-node
    HostName vscode1
    User <username>
    ProxyCommand ssh -q -W %h:%p jump-box
  1. In VS Code, select Remote-SSH: Connect to Host… and after the list populates select ccv-vscode-node

6. Install and set up of VS Code

After a moment, VS Code will connect to the SSH server and set itself up.

After a moment, VS Code will connect to the SSH server and set itself up. You might see the Firewall prompt, please click allow.

  1. Configure VS Code

Important: Please run the following to add a settings.json file to your config. This is because the file watcher and file searcher (rg) index all the files you have access to in your workspace. If you have a large dataset (e.g. machine learning), this can take a lot of resources on the VS Code node.

Connect to VS Code first.

You can either copy the provided settings file with the cp command below,

cp -v /gpfs/runtime/opt/vscode-server/ccv-vscode-config/settings.json /users/$USER/.vscode-server/data/Machine/settings.json

or manually create /users/$USER/.vscode-server/data/Machine/settings.json file with following contents

{
    "files.watcherExclude": {
        "**/.git/objects/**": true,
        "**/.git/subtree-cache/**": true,
        "**/node_modules/**": true,
        "/usr/local/**": true,
        "/gpfs/home/**": true,
        "/gpfs/data/**": true,
        "/gpfs/scratch/**": true
    },
    "search.followSymlinks": false,
    "search.exclude": {
        "**/.git/objects/**": true,
        "**/.git/subtree-cache/**": true,
        "**/node_modules/**": true,
        "/usr/local/**": true,
        "/gpfs/home/**": true,
        "/gpfs/data/**": true,
        "/gpfs/scratch/**": true
    }
}

Reconnect to VS Code

  1. Click the green icon "Open a Remote Window" in the bottom left corner of VS Code Window. Then click "Connect to Host" in the drop down list.

2. Select the ccv-vscode-node option to connect to Oscar.

Transferring Files between Oscar and Campus File Storage (Replicated and Non-Replicated)

You may use either Globus (recommended) or smbclient to transfer data between Oscar and Campus File Storage.

Globus

smbclient

Transfer Instructions

1) Log into Oscar:

   ssh ssh.ccv.brown.edu

2) Start a screen session. This will allow you to reattach to your terminal window if you disconnect.

    screen

3) To use Oscar's high-speed connection to Campus File Storage - Replicated:

    smbclient "//smb.isi.ccv.brown.edu/SHARE_NAME" -D DIRECTORY_NAME -U "ad\BROWN_ID" -m SMB3

Similarly, to access Campus File Storage - Non-Replicated (LRS: Locally Redundant Share):

smbclient "//smblrs.ccv.brown.edu/Research" -D DIRECTORY_NAME -U "ad\BROWN_ID" -m SMB3

Replace SHARE_NAME, DIRECTORY_NAME, and BROWN_ID. DIRECTORY_NAME is an optional parameter. The password required is your Brown password.

4) Upload/download your data using the FTP "put"/"get" commands. Replace DIRECTORY_NAME with the folder you'd like to upload.

   put DIRECTORY_NAME

5) You can detach from the screen session with a "CTRL+A D" keypress. To reattach to your session:

   screen -r

smbclient basics

  • put is upload to Campus File Storage

Usage: put <local_file> [remote file name]

Copy <local_file> from Oscar to Campus File Storage. The remote file name is optional (use if you want to rename the file)

  • get is download to Oscar

Usage: get <remote_file> [local file name] Copy <remote_file> from the Campus File Storage to Oscar. The local file name is optional (use if you want to rename the file)

Moving more than one file:

To move more than one file at once use mput or mget. By default:

recurse is OFF. smbclient will not recurse into any subdirectories when copying files

prompt is ON. smbclient will ask for confirmation for each file in the subdirectories

You can toggle recursion ON/OFF with:

recurse

You can toggle prompt OFF/ON with:

prompt

Understanding Disk Quotas

Checkquota

Use the command checkquota to view your current disk usage and quotas. Here's an example output of this command

[ccvdemo2@login010 ~]$ checkquota
Name       Path                 Used(G)    (%) Used   SLIMIT(G)  H-LIMIT(G) Used_Inodes     SLIMIT     HLIMIT     Usage_State  Grace_Period
ccvdemo2   /oscar/home          23.29      18         100        125        188001          2000000    3000000    OK           None
ccvdemo2   /oscar/scratch       0.00       0          512        10240      2               4000000    16000000   OK           None
Now fetching Data directory quotas...
Name            Used(T)   (%) Used   SLIMIT(T)   HLIMIT(T)   Used_Inodes   SLIMIT    HLIMIT    Usage_State   Grace_Period
data+nopi       0.0       0          0.88        0.98        549           4194304   6291456   OK            None
data+ccvinter   0.015     1          0.50        1.00        122281        4194304   6291456   OK            None
==========================================================================================
Jobtmp Quotas: /jobtmp/$USER is the ultra-fast parallel storage system only for jobs. Not meant for long-term use
==========================================================================================
Block Limits                                    |     File Limits
Filesystem type         blocks      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
jobtmp     USR               0         1T        12T          0     none |        1 1000000  2000000        0     none sss6k.oscar.ccv.brown.edu
Got more Questions?
Read documentation: https://docs.ccv.brown.edu/oscar/managing-files/filesystem
Email: support@ccv.brown.edu
--------------------------------------------------------------------------------

Each line represents a top level directory that you have access to.

Each column represents a usage or quota for these directories.

Types of Quota:

Disk space usage

This usage is expressed in Gigabytes (G) or Terabytes (T) . This is the total size of all the files in that directory and it does not depend upon the number of files. Run the command checkquota to see your disk usage and quota. Here's an example:

Inode usage

This is the total number of files and directories in the particular directory. This number does not depend upon the size of the files. Run the command checkquota to see your inode usage and quota. Here's an example:

Soft Limits vs Hard Limits

All quotas have a soft limit (SLimit) and hard limit (HLimit). When usage exceeds the soft limit, a grace period associated with this limit begins. During the grace period, the usage is allowed to increase up to the hard limit. When the usage reaches the hard limit or when the grace period expires, the user is not allowed to write any files to that particular directory.

Usage State

The "Usage State" column shows the status of the grace period for a particular directory. Here are some of the status messages:

SOFT_EXCEEDED

This indicates that your usage of the disk space or inodes has exceeded the soft limit and you are still within the grace period. Check the Grace_Period column to see the number of days left in the grace period. You may continue writing data into this directory until the end of the grace period, as long as you do not exceed the hard limit

GRACE_EXPIRED

This indicates that your usage has exceeded the soft limit AND the grace period has expired. You will not be able to write data into that directory, but you can remove files.

HARD_EXCEEDED

This indicates that your usage has reached the hard limit. You will not be able to write data into that directory, but you can remove data.

OK

This indicates that your usage of both disk space and inodes is within the soft quota.

Resolving quota issues

Step 1: Identify the directory

Run the checkquota command and identify the line that shows the warning status message.

If this directory is either /oscar/home or /oscar/scratch , you will have to take the subsequent steps to resolve this issue. If the directory is data+<group> you should inform others in your group and take collective action to resolve this issue.

Step 2: Disk Space or Inodes

Check whether you have exceeded your disk space quota or your inodes quota. Disk space usage is specified in GB or TB while inodes usage is just numerical count.

Step 3: Remove files

You will need to take the following steps based on the quota you have exceeded.

Disk Space quota:

The fastest way to reduce this usage is identifying large and unnecessary files. Load the module ncdu using the command module load ncdu and run ncdu in the offending directory. This utility will scan that directory and show you all the directories and files, sorted by their size. If they are not sorted by size, press lowercase s to sort them by size. You can navigate the directory tree using the arrow keys and delete any files or directories that are unnecessary.

Some programs leave a lot of temporary files on the disk that may not be necessary.

  • Core Dump Files: These files are typically named core.<number>. A core dump file is generated when a program crashes; it contains the state of the program at the time of the crash and is useful for debugging. Old core dump files can take up a lot of disk space and can be safely deleted once you know the reason behind the crash.

Inodes quota:

Inode usage can be reduced by removing any files and directories OR tarring up large nested directories. When a directory is converted to a tar ball, it uses a single inode instead of one inode per directory or file. This can drastically decrease your inode usage. Identify directories that contain a large number of files or a very large nested tree of directories with a lot of files.

To identify such directories, load the module ncdu using the command module load ncdu and run ncdu in the offending directory. This utility will scan that directory and show you all the directories and files, sorted by their size. Press uppercase C to switch the sorting criteria to "number of items". You can navigate the directory tree using the arrow keys and delete or tar any files or directories that are unnecessary.

To create a tar ball of a directory:

tar -czvf <directory_name>.tar.gz <directory_name>

If your usage has exceeded quota and you cannot write to the directory, you can create the tar ball in another directory. Using this command, you can create a tar ball in your scratch directory:

tar -czvf /oscar/scratch/$USER/<directory_name>.tar.gz <directory_name>

From Non-compliant Networks (2-FA)

Accessing VSCode from Non-Brown compliant networks

This guide is only for users connecting from Non-Brown Compliant Networks. 2-FA is mandatory.

  1. Open VSCode settings

  • On Windows/Linux - File > Preferences > Settings

  • On macOS - Code > Preferences > Settings

Search for symlink and make sure the symlink searching is unchecked

3. Under VSCode settings, search for remote ssh timeout and manually enter a timeout value, e.g. 50s. This should give you enough time to complete 2-Factor Authentication.

4. Edit the ~/.ssh/config file on your local machine, add the following lines. Replace <username> with your Oscar username.

# Jump box with public IP address
Host jump-box
  HostName ssh8.ccv.brown.edu
  User <username>
# Target machine with private IP address
Host ccv-vscode-node
  HostName vscode1
  User <username>
  ProxyCommand ssh -q -W %h:%p jump-box

6. In VSCode, select Remote-SSH: Connect to Host… and after the list populates select ccv-vscode-node

Restoring Deleted Files

Nightly snapshots of the file system are available for the last 30 days.

CCV does not guarantee that each of the last 30 days will be available in snapshots because occasionally the snapshot process does not complete within 24 hours.

Restore a file from a snapshot in the last 30 days

Nightly snapshots of the file system from the last 30 days can be found in the following directories.

Home directory snapshot

Data directory snapshot

Scratch directory snapshot

To restore a file, copy the file from the snapshot to your directory.

Do not use the links in your home directory snapshot to try and retrieve snapshots of data and scratch. The links will always point to the current versions of these files. An easy way to check what a link points to is to use ls -l.

Running Jobs

Jobs can be run on Oscar in two different ways:

  • Interactive jobs allow the user to interact with programs (e.g., by entering input manually, using a GUI) while they are running. However, if your connection to the system is interrupted, the job will abort. Small jobs with short run times and jobs that require the use of a GUI are best-suited for running interactively.

  • Batch jobs allow you to submit a script that tells the cluster how to run your program. Your program can run for long periods of time in the background, so you don't need to be connected to Oscar. The output of your program is continuously written to an output file that you can view both during and after your program runs.

Jobs are scheduled to run on the cluster according to your account priority and the resources you request (i.e., cores, memory, and runtime). In general, the fewer resources you request, the less time your job will spend waiting in the queue.

Please do not run CPU-intense or long-running programs directly on the login nodes! The login nodes are shared by many users, and you will interrupt other users' work.

Version Control

Git Overview

Version Control refers to the management of changes made to source code or any such large amount of information in a robust manner by multiple collaborators. Git is by far the most popular version control system.

Git enables effective collaboration among developers. In a team setting, multiple developers often work on the same project simultaneously. With Git, each developer can work on their own local copy of the project, making changes and experimenting freely without affecting the main codebase. Git allows developers to merge their changes seamlessly, ensuring that modifications made by different individuals can be consolidated efficiently. It provides mechanisms to track who made specific changes, making it easier to understand the evolution of the project and identify potential issues.

Git Workflow

Nearly all Git operations are performed in your local computing environment, with the exception of a few used purely to synchronize with a remote. Some of the most common Git operations are depicted below. In summary, a typical flow consists of making changes to your files, staging them via git add, marking a save point via git commit, and finally syncing to your remote (e.g., GitHub) via git push. If you are pushing changes to your remote from multiple places, you can bring your copy up to date using git pull, which is the equivalent of doing git fetch followed by a git merge operation.
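A minimal sketch of that flow (the file name and branch are illustrative):

git add analysis.py                  # stage changes
git commit -m "describe the change"  # mark a save point locally
git push origin main                 # sync to the remote (e.g., GitHub)
git pull                             # bring in remote changes (fetch + merge)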

Cheatsheet

Git Configuration

Getting Out of Trouble

Slurm Partitions

Partition Overview

To list partitions on Oscar available to your account, run the following command:
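A sketch of such a query using sinfo's -O output formatting (the field list is an assumption, not the exact command from the original page):

sinfo -O partition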

To view all partitions (including ones you don't have access to), replace the -O in the command above with -aO.

batch is the default partition.

Partition Details

batch

  • General purpose computing

  • Priority is determined by account type (from highest to lowest: condo, priority, exploratory)

Condo limits apply to the group (i.e., they reflect the sum of all users on the condo). Condo users can check the limits on their condo with the command condos.

There is no limit on the time for condo jobs, but users should be aware that planned maintenance on the machine may occur (one month’s notice is given prior to any planned maintenance).‌

debug

  • Short wait time, short run time access for debugging

  • All users have the same limits and priority on the debug partition

vnc

  • These nodes are for running VNC sessions/jobs

  • Account type may affect Priority

gpu

  • For GPU-based jobs

  • GPU Priority users get higher priority and more resources than free users on the GPU partition

  • Condo users submit to the gpu partition with normal or priority access (if they have a priority account in addition to their condo)

gpu-he

  • For GPU-based jobs

  • Uses Tesla V100 GPUs

  • Restricted to High End GPU Priority users

gpu-debug

  • Short wait time, short run time gpu access for debugging

  • All users have the same limits and priority on the gpu-debug partition

bigmem

  • For jobs requiring large amounts of memory

  • Priority users get higher priority and more resources than free users on the bigmem partition

  • Condo users submit to the bigmem partition with normal or priority access (if they have a priority account in addition to their condo)

  • Premium users get higher priority and more resources than free users on the SMP partition

  • Condo users submit to the SMP partition with normal or priority access (if they have a priority account in addition to their condo)

Interactive Jobs

To start an interactive session for running serial or threaded programs on an Oscar compute node, simply run the command interact from the login node:

By default, this will create an interactive session that reserves 1 core and 4GB of memory for a period of 30 minutes. You can change the resources reserved for the session from these default limits by modifying the interact command:

For example, the command shown below requests an interactive session with 20 cores and 10 GB of memory (per node) for a period of 1 hour.
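A sketch of that command (the option names are assumptions; check interact --help on Oscar for the exact flags):

interact -n 20 -t 01:00:00 -m 10g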

Keeping Interactive Jobs Alive:

Job Arrays

A job array is a collection of jobs that all run the same program, but on different values of a parameter. It is very useful for running parameter sweeps, since you don't have to write a separate batch script for each parameter setting.

To use a job array, add the #SBATCH --array=<range> option in your batch script. The range can be a comma-separated list of integers, along with ranges separated by a dash, e.g. 1-16 or 1,4,16-32.

A job will be submitted for each value in the range. The values in the range will be substituted for the variable $SLURM_ARRAY_TASK_ID in the remainder of the script. Here is an example of a script for running a serial Matlab script on 16 different parameters by submitting 16 different jobs as an array:
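The original example script is not reproduced here; a minimal sketch of such an array script, saved for example as array_job.sh (the MATLAB script name and module are assumptions):

#!/bin/bash
#SBATCH -J MyMatlabArray
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --array=1-16

# Each of the 16 jobs receives a different value of $SLURM_ARRAY_TASK_ID
module load matlab
matlab -nodisplay -r "my_script($SLURM_ARRAY_TASK_ID); exit"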

You can then submit the multiple jobs using a single sbatch command:
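For instance, using the sketch above (script name assumed):

sbatch array_job.sh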

The $SLURM_ARRAY_TASK_ID can be manipulated as needed. For example, you can generate a fixed-length number from it. The following example generates a number of length 3 from $SLURM_ARRAY_TASK_ID.
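A sketch of one way to do this inside a batch script, using printf zero-padding:

# Turns task IDs 1, 2, ..., 16 into 001, 002, ..., 016
printf -v padded_id "%03d" "$SLURM_ARRAY_TASK_ID"
echo "$padded_id"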

Managing Jobs

Listing running and queued jobs

The squeue command will list all jobs scheduled in the cluster. We have also written wrappers for squeue on Oscar that you may find more convenient:

Viewing estimated time until completion for pending jobs

This command will list all of your pending jobs and the estimated time until completion.

Canceling jobs
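Jobs can be cancelled with Slurm's scancel command, using the job ID reported by squeue or its wrappers:

scancel <jobid>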

View details about completed jobs

sacct

The sacct command will list all of your running, queued and completed jobs since midnight of the previous day. To pick an earlier start date, specify it with the -S option:

To find out more information about a specific job, such as its exit status or the amount of runtime or memory it used, specify the -l ("long" format) and -j options with the job ID:
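For example (the date and job ID are illustrative):

sacct -S 2024-01-01       # jobs since January 1, 2024
sacct -l -j <jobid>       # detailed ("long") information for one job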

(example)

myjobinfo

The myjobinfo command uses the sacct command to display "Elapsed Time", "Requested Memory" and "Maximum Memory used on any one Node" for your jobs. This can be used to optimize the requested time and memory to have the job started as early as possible. Make sure you request a conservative amount based on how much was used.

ReqMem shows the requested memory: A c at the end of number represents Memory Per CPU, a n represents Memory Per Node. MaxRSS is the maximum memory used on any one node. Note that memory specified to sbatch using --mem is Per Node.

jobstats

The 'jobstats' utility is now available for analyzing recently completed jobs, comparing the resources used to those requested in the job script, including CPU, GPU, and memory. If email notifications are enabled, 'jobstats' sends an email with the results and includes a prompt to contact support for help with resource requests.

Run this command in a bash shell on Oscar. No additional module needs to be loaded.

To send this output to your email after the job is completed, make sure that these lines are in your job submit script
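A sketch of the relevant sbatch directives (the email address is a placeholder):

#SBATCH --mail-type=END
#SBATCH --mail-user=<your-email>@brown.edu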

Condo/Priority Jobs

Note: we do not provide users condo access by default if their group/PI has a condo on the system. You will have to explicitly request a condo access and we will ask for approval from the PI.

To use your condo account to submit jobs, please follow the steps below to check the association of your Oscar account and include condo information in your batch script or command line.

Step 1 - Check your account associations to find your condo Account and Partition information by running the following command:
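A sketch of that check using sacctmgr's parsable output (the exact format fields are assumptions); it produces output like the example below:

sacctmgr -p show associations user=$USER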

In the example below, the user has access to two condos, where their Account and Partition are highlighted.

Cluster|Account|User|Partition|Share|Priority|GrpJobs|GrpTRES|GrpSubmit|GrpWall|GrpTRESMins|MaxJobs|MaxTRES|MaxTRESPerNode|MaxSubmit|MaxWall|MaxTRESMins|QOS|Def QOS|GrpTRESRunMins|

slurmctld|abcd-condo|ccvdemo1|batch|1|||||||||||||abcd-condo|abcd-condo||

slurmctld|default|ccvdemo1|abcd-condo|1|||||||||||||abcd-condo|abcd-condo||

Step 2 - Choose the correct way to submit jobs to a condo according to the condo's Account column:

To see the running and pending jobs in a condo:

condo <condo-name>

Premium Account (priority) jobs

If you have a premium account, that should be your default QOS for submitting jobs. You can check if you have a premium account with the command groups. If you have a priority account, you will see priority in the output from groups.

You can check the qos for a running job by running the command myq. The QOS column should show "pri-<username>"

If you are interested in seeing all your accounts and associations, you can use the following command:

Associations & Quality of Service (QOS)

Associations

Oscar uses associations to control job submissions from users. An association refers to a combination of four factors: Cluster, Account, User, and Partition. For a user to submit jobs to a partition, an association for the user and partition is required in Oscar.

To view a table of association data for a specific user (thegrouch in the example), enter the following command in Oscar:

If thegrouch has an exploratory account, you should see an output similar to this:

Note that the first four columns correspond to the four factors that form an association. Each row of the table corresponds to a unique association (i.e., a unique combination of Cluster, Account, User, and Partition values). Each association is assigned a Quality of Service (see QOS section below for more details).

Some associations have a value for GrpTRESRunMins. This value indicates a limit on the total number of Trackable RESource (TRES) minutes that can be used by jobs running with this association at any given time. The cpu=110000 for the association with the batch partition indicates that all of the jobs running with this association can have at most an accumulated 110,000 core-minute cost. If this limit is reached, new jobs will be delayed until other jobs have completed and freed up resources.

Example of GrpTRESRunMins Limit

Here is an example file that incurs a significant core-minute cost:
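The file contents are not shown above; based on the description that follows (30 cores for 90 hours), a sketch might look like this:

#!/bin/bash
#SBATCH -n 30
#SBATCH -t 90:00:00
#SBATCH -J too_many_cpu_minutes

# 30 cores * 90 hours * 60 minutes/hour = 162,000 core-minutes,
# which exceeds the 110,000 core-minute GrpTRESRunMins limit
sleep 90h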

If this file is named too_many_cpu_minutes.sh, a user with thegrouch's QOS might experience something like this:

The REASON field will be (None) at first, but after a minute or so, it should resemble the output above (after another myq command).

Note that the REASON the job is pending and not yet running is AssocGrpCPURunMinutesLimit. This is because the program requests 30 cores for 90 hours, which is more than the oscar/default/thegrouch/batch association allows (30 cores * 90 hours * 60 minutes/hour = 162,000 core-minutes > 110,000 core-minutes). In fact, this job could be pending indefinitely, so it would be a good idea for thegrouch to run scancel 12345678 and make a less demanding job request (or use an association that allows for that amount of resources).

Account Quality of Service (QoS) and Resources

myaccount - To list the QoS & Resources

The myaccount command serves as a comprehensive tool for users to assess the resources associated with their accounts. By utilizing this command, individuals can gain insights into critical parameters such as Max Resources Per User and Max Jobs Submit Per User.

MPI Jobs

Resources from the web on getting started with MPI:

MPI is a standard that dictates the semantics and features of "message passing". There are different implementations of MPI. Those installed on Oscar are

  • hpcx-mpi

  • OpenMPI

We recommend using hpcx-mpi as it is integrated with the SLURM scheduler and optimized for the Infiniband network.

MPI modules on Oscar

Currently, the two available mpi implementations on Oscar are hpcx-mpi and openmpi. You can check the available versions by running these commands
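For example:

module avail hpcx-mpi
module avail openmpi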

hpcx-mpi/4.1.5rc2s-yflad4v is the recommended version of MPI on Oscar. It can be loaded by running:
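Using the module name given above:

module load hpcx-mpi/4.1.5rc2s-yflad4v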

srun instead of mpirun

Use srun --mpi=pmix to run MPI programs. All MPI implementations are built with SLURM support. Hence, the programs need to be run using SLURM's srun command.

The --mpi=pmix flag is also required to match the configuration with which MPI is installed on Oscar.

Running MPI programs - Interactive

To run an MPI program interactively, first create an allocation from the login nodes using the salloc command:

For example, to request 4 cores to run 4 tasks (MPI processes):
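A sketch of such an allocation request:

salloc -n 4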

Once the allocation is fulfilled, you can run MPI programs with the srun command:
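For example (the program name is a placeholder):

srun --mpi=pmix ./my_mpi_program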

When you are finished running MPI commands, you can release the allocation by exiting the shell:

Also, if you only need to run a single MPI program, you can skip the salloc command and specify the resources in a single srun command:
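A sketch of the combined form:

srun --mpi=pmix -n 4 ./my_mpi_program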

This will create the allocation, run the MPI program, and release the allocation.

Note: It is not possible to run MPI programs on compute nodes by using the interact command.

Running MPI programs - Batch Jobs

Here is a sample batch script to run an MPI program:
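The original script is not reproduced here; a minimal sketch (module version and program name are assumptions):

#!/bin/bash
#SBATCH -J mpi_test
#SBATCH -n 4
#SBATCH -t 00:30:00

module load hpcx-mpi/4.1.5rc2s-yflad4v
srun --mpi=pmix ./my_mpi_program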

Hybrid MPI+OpenMP

If your program has multi-threading capability using OpenMP, you can have several cores attached with a single MPI task using the --cpus-per-task or -c option with sbatch or salloc. The environment variable OMP_NUM_THREADS governs the number of threads that will be used.
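A sketch of such a hybrid script, matching the description below (2 nodes, 2 tasks per node, 4 CPUs per task; module version and program name are assumptions):

#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=2
#SBATCH -c 4
#SBATCH -t 00:30:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module load hpcx-mpi/4.1.5rc2s-yflad4v
srun --mpi=pmix ./my_hybrid_program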

The above batch script will launch 4 MPI tasks - 2 on each node - and allocate 4 CPUs for each task (total 16 cores for the job). Setting OMP_NUM_THREADS governs the number of threads to be used, although this can also be set in the program.

Performance Scaling

The maximum theoretical speedup that can be achieved by a parallel program is governed by the proportion of sequential part in the program (Amdahl's law). Moreover, as the number of MPI processes increases, the communication overhead increases i.e. the amount of time spent in sending and receiving messages among the processes increases. For more than a certain number of processes, this increase starts dominating over the decrease in computational run time. This results in the overall program slowing down instead of speeding up as number of processes are increased.

Hence, MPI programs (or any parallel program) do not run faster as the number of processes are increased beyond a certain point.

If you intend to carry out a lot of runs for a program, the correct approach would be to find out the optimum number of processes which will result in the least run time or a reasonably less run time. Start with a small number of processes like 2 or 4 and first verify the correctness of the results by comparing them with the sequential runs. Then increase the number of processes gradually to find the optimum number beyond which the run time flattens out or starts increasing.

Maximum Number of Nodes for MPI Programs

An MPI program is allowed to run on at most 32 nodes. When a user requests more than 32 nodes for an MPI program/job, the user will receive the following error:

Batch job submission failed: Requested node configuration is not available

Batch Jobs

Submitting jobs using batch scripts

To run a batch job on Oscar, you first have to write a script that describes what resources you need and how your program will run. Some example batch scripts are available in your home directory on Oscar, in the ~/batch_scripts directory.

A batch script starts by specifying the bash shell as its interpreter with the line:
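That line is:

#!/bin/bash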

By default, a batch job will reserve 1 core and 2.8GB of memory per core for your job. You can customize the amount of resources allocated for your job by explicitly requesting them in your batch script with a series of lines starting with #SBATCH, e.g.,
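For example, lines like the following (matching the description in the next sentence):

#SBATCH -n 4
#SBATCH --mem=16G
#SBATCH -t 1:00:00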

The above lines request 4 cores (-n), 16GB of memory per node (--mem), and one hour of runtime (-t). After you have described the resources you want allocated for the job, you then give the commands that you want to be executed.

All of the #SBATCH instructions in your batch script must appear before the commands you want to run.

Once you have your batch script, you can submit a batch job to the queue using the sbatch command:

Submitting jobs from the command line

As an alternative to requesting resources within your batch script, it is possible to define the resources requested as command-line options to sbatch. For example, the command below requests 4 cores (-n), 16GB of memory per node (--mem), and one hour of runtime (-t) to run the job defined in the batch script.
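A sketch of that command (the script name is a placeholder):

sbatch -n 4 --mem=16G -t 1:00:00 my_script.sh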

Note that command-line options passed to sbatch will override the resources specified in the script, so this is a handy way to reuse an existing batch script when you just want to change a few of the resource values.

Output from batch jobs

The sbatch command will return a number, which is your Job ID. You can view the output of your job in the file slurm-<jobid>.out in the directory where you invoked the sbatch command. For instance, you can view the last 10 lines of output with:
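For example (tail prints the last 10 lines by default):

tail slurm-<jobid>.out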

Alternatively, you can mention the file names where you want to dump the standard output and errors using the -o and -e flags. You can use %j within the output/error filenames to add the id of the job. If you would like to change your output file to be MyOutput-<job-id>, you can add the following line to your batch job:
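For example:

#SBATCH -o MyOutput-%j.out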

sbatch command options

The table below summarizes some of the more useful options for sbatch.

Passing environment variables to a batch job

When a user logs into Oscar, there are pre-set environment variables such as HOME, which make up the user's login environment. A user may modify an existing environment variable or add a new one. So when a user submits a Slurm batch job, the user's current environment variables may differ from the user's login environment. By default, a user's current environment variables, rather than the user's login environment variables, are accessible to the user's batch jobs on Oscar.

To modify or add an environment variable, run the following command:

  • run the following command in your shell

  • or have the following line in your batch script

After the step above to modify or add an environment variable, your batch job can access the environment variable my_variable whose value is my_value.

To export more than one environment variables, just list all the name=value pairs separated by commas:
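The exact lines are not shown above; a sketch, assuming Slurm's --export option is the mechanism being described (variable names are taken from the text):

# In your shell, before submitting:
export my_variable=my_value

# Or as a line in your batch script:
#SBATCH --export=ALL,my_variable=my_value

# More than one variable: list the name=value pairs separated by commas
#SBATCH --export=ALL,my_variable=my_value,another_variable=another_value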

Here is an example that a batch script loops over an input file and submits a job for each directory in the input file, where a directory is passed to a batch job for processing.

The input file test.txt has multiple lines where each line is a directory:

The loop.sh script reads each line (directory) from the input file and passes the directory as an environment variable to a batch job:

The test.job is a job script, which runs the test.sh to process the directory passed as an environment variable:

The test.sh is a bash script which simply echoes the directory:
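The four files are not reproduced above; a sketch of the submission loop, using sbatch's --export option to pass the directory (file names are from the text, contents are assumptions):

#!/bin/bash
# loop.sh - read each directory from test.txt and submit a job for it.
# test.txt contains one directory path per line (three lines in this example).
# test.job is a normal batch script that runs ./test.sh, and test.sh simply
# echoes the directory passed in via $data_dir.
while read -r dir; do
    sbatch --export=ALL,data_dir="$dir" test.job
done < test.txt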

If you run ./loop.sh, then three jobs are submitted. Each job generates an output like the following:

Using variables to set slurm job name, output filename, and error filename

Variables can be passed at the sbatch command line to set the job name, output and error file names, as shown in the following example:
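A sketch of such a command (the job script name is a placeholder):

name="run1"
sbatch -J "$name" -o "${name}-%j.out" -e "${name}-%j.err" my_job.sh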

Submitting GPU Jobs

The Oscar GPUs are in a separate partition to the regular compute nodes. The partition is called gpu. To see how many jobs are running and pending in the gpu partition, use

Interactive use

To start a session on a GPU node, use the interact command and specify the gpu partition. You also need to specify the requested number of GPUs using the -g option:
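For example, a sketch of such a request (the -q flag for the partition is an assumption; check interact --help for the exact flags):

interact -q gpu -g 1 -t 01:00:00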

Batch jobs

Here is an example batch script for a cuda job that uses 1 gpu and 1 cpu for 5 minutes
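The script is not reproduced above; a minimal sketch, saved for example as gpu_job.sh (module and program names are assumptions):

#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH -n 1
#SBATCH -t 00:05:00
#SBATCH -J cuda_test

module load cuda
nvidia-smi            # report the GPU allocated to the job
./my_cuda_program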

To submit this script:
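Using the file name from the sketch above:

sbatch gpu_job.sh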

DGX GPU Nodes in the GPU-HE Partition

All the nodes in the gpu-he partition have V100 GPUs. However, two of them are DGX nodes (gpu1404/1405), which have 8 GPUs each. When a gpu-he job requests more than 4 GPUs, the job will automatically be allocated to the DGX nodes.

The other non-DGX nodes actually have a better NVLink interconnect topology, as all of their GPUs have direct links to each other. So the non-DGX nodes are better for a gpu-he job if the job does not require more than 4 GPUs.

Grace Hopper GH200 GPUs

Hardware Specifications

Access

The two GH200 nodes are in the gracehopper partition.

gk-condo Account

A gk-condo user can submit jobs to the GH200 nodes with their gk-gh200-gcondo account, i.e.,

CCV Account

For users who are not a gk-condo user, a High End GPU priority account is required for accessing the gracehopper partition and GH200 nodes. All users with access to the GH200 nodes need to submit jobs to the nodes with the ccv-gh200-gcondo account, i.e.

MIG Access

To request a MIG, the mig feature needs to be specified, i.e.
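A sketch of these submissions using standard Slurm options (the --constraint flag for the mig feature is an assumption; partition and account names are taken from the text):

# gk-condo users
sbatch -p gracehopper --account=gk-gh200-gcondo my_job.sh

# all other users with High End GPU priority access
sbatch -p gracehopper --account=ccv-gh200-gcondo my_job.sh

# requesting a MIG instance via the mig feature
sbatch -p gracehopper --account=ccv-gh200-gcondo --constraint=mig my_job.sh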

Running NGC Containers

An NGC container must be built on a GH200 node for the container to run on GH200 nodes.

Running Modules

The two nodes have Arm CPUs. So Oscar modules do not run on the two GH200 nodes. Please contact support@ccv.brown.edu about installing and running modules on GH200 nodes.

Compiling CUDA

Compiling with CUDA

The CUDA compiler is called nvcc, and for compiling a simple CUDA program it uses syntax similar to gcc:
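For example (file names are placeholders):

nvcc -o my_program my_program.cu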

Optimizations for Fermi

The Oscar GPU nodes feature NVIDIA M2050 cards with the Fermi architecture, which supports CUDA's "compute capability" 2.0. To fully utilize the hardware optimizations available in this architecture, add the -arch=sm_20 flag to your compile line:

This means that the resulting executable will not be backwards-compatible with earlier GPU architectures, but this should not be a problem since CCV nodes only use the M2050.

Memory caching

The Fermi architecture has two levels of memory cache similar to the L1 and L2 caches of a CPU. The 768KB L2 cache is shared by all multiprocessors, while the L1 cache by default uses only 16KB of the available 64KB shared memory on each multiprocessor.

You can increase the amount of L1 cache to 48KB at compile time by adding the flags -Xptxas -dlcm=ca to your compile line:

If your kernel primarily accesses global memory and uses less than 16KB of shared memory, you may see a benefit by increasing the L1 cache size.

If your kernel has a simple memory access pattern, you may have better results by explicitly caching global memory into shared memory from within your kernel. You can turn off the L1 cache using the flags -Xptxas -dlcm=cg.

Ampere Architecture GPUs

The new Ampere architecture GPUs on Oscar (A6000's and RTX 3090's)

The new Ampere architecture GPUs do not support older CUDA modules. Users must re-compile their applications with the newer CUDA 11 (or later) modules. Here are detailed instructions to compile major frameworks such as PyTorch and TensorFlow.

PyTorch

Users can install PyTorch from a pip virtual environment or use pre-built singularity containers provided by Nvidia NGC.

To install via virtual environment:

To use NGC containers via Singularity (a combined sketch of both approaches follows the list below):

  • Pull the image from NGC

  • Export PATHs to mount the Oscar file system

  • To use the image interactively

  • To submit batch jobs
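A sketch of the two approaches above (the Python module, PyTorch version, container tag, and paths are assumptions, not the exact commands from the original page):

# Option 1: pip virtual environment
module load python/3.11
python -m venv ~/pytorch.venv
source ~/pytorch.venv/bin/activate
pip install torch

# Option 2: NGC container via Singularity (container tag assumed)
singularity pull pytorch.sif docker://nvcr.io/nvidia/pytorch:24.01-py3
singularity exec --nv pytorch.sif python -c "import torch; print(torch.cuda.is_available())"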

H100 NVL Tensor Core GPUs

Multiple-Instance GPU (MIG) is not enabled on the DGX H100 nodes

Hardware Specifications

Each DGX H100 node has 112 Intel CPUs with 2TB memory, and 8 Nvidia H100 GPUs. Each H100 GPU has 80G memory.

Access

The two DGX H100 nodes are in the gpu-he partition. To access H100 GPUs, users need to submit jobs to the gpu-he partition and request the h100 feature, i.e.
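A sketch of such a request (the --constraint flag for the h100 feature is an assumption):

sbatch -p gpu-he --constraint=h100 --gres=gpu:1 my_job.sh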

Running NGC Containers

Running Oscar Modules

The two nodes have Intel CPUs. So Oscar modules can still be loaded and run on the two DGX nodes.

Open the Open OnDemand Dashboard at https://ood.ccv.brown.edu. Select RStudio (under "Default GUI's"). Fill in the form to allocate the required resources, and optionally select your R modules. Finally, click the "Launch Session" button.

To learn about using the Open OnDemand Desktop App, see the Desktop App documentation.

Navigate to the directory that contains the plain-text file.

Navigate to the directory that contains the file or directory.

Navigate to the directory where you need to upload the files.

Navigate to the directory where you would like to open the terminal.

Mac OS - https://www.xquartz.org

Windows - https://sourceforge.net/projects/xming

Use the File Explorer in the portal to view, copy, download or delete files on Oscar.

Launch interactive apps/software, like MATLAB and Jupyter Notebook, inside your web browser.

Access the Oscar shell with your browser without needing a separate terminal emulator. This is especially handy for Windows users, since you do not need to install a separate program.

No need to enter your password again. SSH into Oscar in seconds!

Log in to https://ood.ccv.brown.edu

3. You are logged into one of the login nodes. You can launch batch jobs from this terminal or start an interactive job for anything computationally intensive.

Open https://ood.ccv.brown.edu in any browser of your choice.

Command line (scp)

Globus online (best for large transfers)

You can drag and drop files from your machine to the Oscar filesystem via SMB. This is an easy method for a small number of files. Please refer to this page for mounting the filesystem via SMB.

A personal Windows computer must have CrowdStrike installed in order to be on approved Brown networks.

3.1 WinSCP for Windows

Globus is a secure, reliable research data management service. You can move data directly to Oscar from another Globus endpoint. Oscar has one Globus endpoint:

If you want to use Globus Online to move data to/from your own machine, you can install Globus Connect Personal. For more instructions on how to use Globus, see the Oscar section in the Globus documentation.

LFTP is a sophisticated file transfer program supporting a number of network protocols (ftp, http, sftp, fish, torrent). It has bookmarks, a built-in mirror command, can transfer several files in parallel, and was designed with reliability in mind. You can use the LFTP module on Oscar to transfer data from any (S)FTP server you have access to directly to Oscar. Below are the main LFTP commands to get you started:

To use VS Code you must be on a Brown compliant network or connected to the VPN. Please install the Brown VPN client before proceeding.

To use VS Code you will need to be connected to the VPN. Please install the Brown VPN client before proceeding.

Install the Remote Development extension pack for VS Code:

3. Make sure you have set up passwordless SSH authentication to Oscar. If you haven't, please refer to this documentation page.

Follow the instructions for transferring data between files.brown.edu and Oscar.

You can transfer files between Campus File Storage and Oscar using smbclient.

Disk usage and quotas are calculated separately for the top level directories. Two types of quotas are calculated for each of these directories:

This is a quick guide for resolving issues related to file system quotas. To read more details about these quotas, refer to the Understanding Disk Quotas section.

Apptainer: Run the command apptainer cache clean to clear the apptainer cache. This will clear up the cache in your home directory without affecting any container images. However, pulling a new image from a repository may be slower in the future.

Conda: Run the command conda clean --tarballs to delete any tarballs downloaded by conda. This does not affect any existing conda or Python virtual environments. However, it may slow down the installation of some packages in the future.

Install the Remote Development extension pack for VSCode

Oscar is a shared machine used by hundreds of users at once. User requests are called jobs. A job is the combination of the resources requested and the program you want to run on the compute nodes of the Oscar cluster. On Oscar, the Slurm workload manager is used to schedule and manage jobs.

Below are some of the most commonly used Git commands. You can also get much more information by running git --help, and there are excellent Git tutorials available online if you'd like to learn more.

Command
Summary

While using Git on Oscar, make sure that you configure Git with your correct name and email ID to avoid confusion while working with remote repositories (e.g., GitHub, GitLab, BitBucket).

Git can sometimes be a bit tricky, and we all eventually find ourselves in a place where we want to undo something or fix a mistake we made with Git. The website Oh Shit, Git!?! (pardon the profanity) has a bunch of really excellent solutions to common problems we sometimes run into with Git.

Oscar has the following partitions. The number and size of jobs allowed on Oscar vary with both partition and type of user account. You can email support@ccv.brown.edu if you need advice on which partitions to use‌.

Below is a brief summary of the partitions. For details of the nodes in each partition, please see the hardware documentation.

If you lose connectivity to your login node, you lose access to your interactive job. To mitigate this issue you can use screen to keep your connection alive. For more information on using screen on the login nodes, see the screen documentation (man screen).
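For example, a minimal screen workflow on a login node looks like the following sketch; the session name is arbitrary.

screen -S myjob              # start a named screen session on the login node
interact -n 2 -t 2:00:00     # request an interactive job from inside screen
# detach with Ctrl-A then d; the interactive job keeps running
screen -ls                   # list your screen sessions
screen -r myjob              # reattach to the session later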

For more info:

For batch script - Please include the following line:

For command line - You can also provide this option on the command line while submitting the job using sbatch:

For interactive session - Similarly, you can change the account while asking for interactive access too:

For batch script - Please include the following line:

For command line - You can also provide this option on the command line while submitting the job using sbatch:

For interactive session - Similarly, you can change the account while asking for interactive access too:

Quality of Service (QoS) refers to the ability of a system to prioritize and manage resources to ensure a certain level of performance or service quality. An association's QOS is used for job scheduling when a user requests that a job be run. Every QOS is linked to a set of job limits that reflect the limits of the cluster/account/user/partition of the association(s) that has/have that QOS. QOS's can also have information on GrpTRESRunMins limits for their corresponding associations. For example, HPC Priority accounts have job limits of 1,198,080 core-minutes per job, which are associated with those accounts' QOS's. Whenever a job request is made (necessarily through a specific association), the job will only be queued if it meets the requirements of the association's QOS. In some cases, a QOS can be defined to have limits that differ from its corresponding association. In such cases, the limits of the QOS override the limits of the corresponding association. For more information, see the Slurm QOS documentation.

Oscar uses a hierarchical module system where users need to load the required MPI module before they can load any other module that depends on that particular MPI module. You can read more about this module system in the software section.
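For example, loading an MPI module first makes the modules that depend on it visible; the sketch below uses the hpcx-mpi module shown later in this document.

module avail hpcx-mpi    # find the available MPI modules
module load hpcx-mpi     # load the MPI module first
module avail             # modules that depend on this MPI now appear in the listing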

salloc documentation: https://slurm.schedmd.com/salloc.html

srun documentation: https://slurm.schedmd.com/srun.html

If you'd prefer to see the following instructions in a video, we have a tutorial on batch job submission on Oscar.

A full description of all of the options for sbatch can be found online or by using the following command on Oscar:

Oscar has two Grace Hopper GH200 GPU nodes. Each node combines an Nvidia Grace Arm CPU and a Hopper GPU.

Each GH200 node has 72 Arm cores and 550 GB of memory. Multi-Instance GPU (MIG) is enabled on only one GH200 node, which has 4 MIGs. The other GH200 node doesn't have MIG enabled and presents a single GPU. Both CPU and GPU threads on GH200 nodes can now concurrently and transparently access both CPU and GPU memory.

NGC containers provide the best performance from the GH200 nodes. See Running TensorFlow containers for an example of running NGC containers.

To compile a CUDA program on Oscar, first load the CUDA module with:
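module load cuda
# then compile, for example (the program and source file names are placeholders):
nvcc -o program source.cu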

Oscar has two DGX H100 nodes. DGX is based on the Nvidia Hopper architecture, which accelerates the training of AI models. The two DGX nodes provide better performance when multiple GPUs are used, in particular with Nvidia software like NGC containers.

NGC containers provide the best performance from the DGX H100 nodes. See Running TensorFlow containers for an example of running NGC containers.

Deleted or modified files can often be recovered from read-only filesystem snapshots. Snapshots of your home, data, and scratch directories are available at the following paths (substitute the date, group, and user names):

/oscar/home/.snapshot/Oscar_<yyyy-mm-dd>_00_00_00_UTC/<username>/<path_to_file>
/oscar/data/.snapshot/Oscar_<yyyy-mm-dd>_00_00_00_UTC/<groupname>/<username>/<path_to_file>
/oscar/scratch/.snapshot/Oscar_Daily_<yyyy-mm-dd>_00_00_00_UTC/<username>/<path_to_file>

For example, to inspect a snapshot of a home directory from June 22, 2023:

ls -l /oscar/home/.snapshot/Oscar_2023-06-22_00_00_00_UTC/ghopper/data
lrwxrwxrwx 1 ghopper navy 22 Mar  1  2016 /oscar/home/.snapshot/Oscar_2023-06-22_00_00_00_UTC/ghopper/data -> /oscar/data/navy

git add <FILENAME>

Add files to staging area for next commit

git commit -m "my awesome message"

Commit staged files

git push

Upload commit to remote repository

git pull

Get remote repo's commits and download (try and resolve conflicts)

git clone <URL>

Download entire remote repository

$ git config --global user.name "John Smith"
$ git config --global user.email john@example.com
$ sinfo -O "partition"     

Name

Purpose

batch

general purpose computing

debug

short wait time, short run time partition for debugging

vnc

graphical desktop environment

gpu

GPU nodes

gpu-he

High End GPU nodes

gpu-debug

short wait time, short run time partition for gpu debugging

bigmem

large memory nodes

interact
usage: interact [-n cores] [-t walltime] [-m memory] [-q queue]
                [-o outfile] [-X] [-f featurelist] [-h hostname] [-g ngpus]

Starts an interactive job by wrapping the SLURM 'salloc' and 'srun' commands.

options:
  -n cores        (default: 1)
  -t walltime     as hh:mm:ss (default: 30:00)
  -m memory       as #[k|m|g] (default: 4g)
  -q queue        (default: 'batch')
  -o outfile      save a copy of the sessions output to outfile (default: off)
  -X              enable X forwarding (default: no)
  -f featurelist  CCV-defined node features (e.g., 'e5-2600'),
                  combined with '&' and '|' (default: none)
  -h hostname     only run on the specific node 'hostname'
                  (default: none, use any available node)
  -a account      user SLURM accounting account name
  -g ngpus        number of GPUs   
$ interact -n 20 -t 01:00:00 -m 10g
#SBATCH --array=<range>
1-20
1-10,12,14,16-20
#!/bin/bash
#SBATCH -J MATLAB
#SBATCH -t 1:00:00
#SBATCH --array=1-16

# Use '%A' for array-job ID, '%J' for job ID and '%a' for task ID
#SBATCH -e arrayjob-%a.err
#SBATCH -o arrayjob-%a.out

echo "Starting job $SLURM_ARRAY_TASK_ID on $HOSTNAME"
matlab -r "MyMatlabFunction($SLURM_ARRAY_TASK_ID); quit;"
$ sbatch <jobscript>
#!/bin/bash
#SBATCH -J MATLAB
#SBATCH -t 1:00:00
#SBATCH --array=1-16

# Use '%A' for array-job ID, '%J' for job ID and '%a' for task ID
#SBATCH -e arrayjob-%a.err
#SBATCH -o arrayjob-%a.out

echo "Starting job $SLURM_ARRAY_TASK_ID on $HOSTNAME"
t=`printf "%03d" $SLURM_ARRAY_TASK_ID`
matlab -r "MyMatlabFunction($t); quit;"
myq                   List only your own jobs.
allq                  List all jobs, but organized by partition, and a summary of the nodes in use in the
                      partition.
allq <partition>      List all jobs in a single partition.
myjobinfo            Get the time and memory used for your jobs.
squeue -u <your-username> -t PENDING --start
scancel <jobid>
sacct -S 2012-01-01
sacct -lj <jobid>
myjobinfo

Info about jobs for user 'mdave' submitted since 2017-05-19T00:00:00
Use option '-S' for a different date or option '-j' for a specific Job ID.

JobID    JobName                  Submit      State        Elapsed     ReqMem     MaxRSS
1861     ior 2017-05-19T08:31:01  COMPLETED   00:00:09     2800Mc      1744K
1862     ior 2017-05-19T08:31:11  COMPLETED   00:00:54     2800Mc     22908K
1911     ior 2017-05-19T15:02:01  COMPLETED   00:00:06     2800Mc      1748K
1912     ior 2017-05-19T15:02:07  COMPLETED   00:00:21     2800Mc      1744K
jobstats <jobid>
#SBATCH --mail-type=END
#SBATCH --mail-user=<email>
sacctmgr -p list assoc where user=$USER | grep -E 'condo|Account|Partition'
#SBATCH --partition=<Partition>
$ sbatch --partition=<Partition> <batch-script>
$ interact -q <Partition> ... <other_options>
#SBATCH --account=<Account>
$ sbatch --account=<Account> <batch-script>
$ interact -a <Account> ... <other_options>
sacctmgr -p list assoc where user=<username>
(sacctmgr list assoc | head -2; sacctmgr list assoc | grep thegrouch) | cat
   Cluster    Account       User  Partition     Share GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
     oscar    default  thegrouch  gpu-debug         1                                                                                                                                               gpu-debug gpu-debug
     oscar    default  thegrouch     bigmem         1                                                                                                                                             norm-bigmem norm-big+
     oscar    default  thegrouch        smp         1                                                                                                                                                norm-smp  norm-smp
     oscar    default  thegrouch        gpu         1                                                                                                                                                norm-gpu  norm-gpu cpu=34560,gr+
     oscar    default  thegrouch      batch         1                                                                                                                                                  normal    normal    cpu=110000
     oscar    default  thegrouch        vnc         1                                                                                                                                                     vnc       vnc
     oscar    default  thegrouch      debug         1                                                                                                                                                   debug     debug
#!/bin/bash
#SBATCH -n 30
#SBATCH --mem=32G
#SBATCH -t 90:00:00

echo "Is this too much to ask? (Hint: What is the GrpTRESRunMins limit for batch?)"
$ sbatch too_many_cpu_minutes.sh
Submitted batch job 12345678
$ myq
Jobs for user thegrouch

Running:
(none)

Pending:
ID        NAME                     PART.  QOS     CPU  WALLTIME    EST.START  REASON
15726799  too_many_cpu_minutes.sh  batch  normal  30   3-18:00:00  N/A        (AssocGrpCPURunMinutesLimit)
[ccvdemo1@login010 ~]$ myaccount
My QoS                    Total Resources in this QoS              Max Resources Per User                   Max Jobs Submit Per User
------------------------- ------------------------------           ------------------------------           -----------         
debug                                                                                                       1200                
gpu-debug                                                          cpu=8,gres/gpu=4,mem=96G                 1200                
gpu                                                                node=1                                   1200                
normal                                                             cpu=32,mem=246G                          1000                
norm-bigmem                                                        cpu=32,gres/gpu=0,mem=770100M,node=2     1200                
norm-gpu                                                           cpu=12,gres/gpu=2,mem=192G               1200                
vnc                                                                                                         1                   
$ module avail hpcx-mpi

------------------------ /oscar/runtime/software/spack/0.20.1/share/spack/lmod/linux-rhel9-x86_64/Core -------------------------
   hpcx-mpi/4.1.5rc2s-yflad4v
   
$ module avail openmpi

------------------------ /oscar/runtime/software/spack/0.20.1/share/spack/lmod/linux-rhel9-x86_64/Core -------------------------
   openmpi/4.1.2-s5wtoqb    openmpi/4.1.5-hkgv3gi    openmpi/4.1.5-kzuexje (D)
             
module load hpcx-mpi
$ salloc -N <# nodes> -n <# MPI tasks> -p <partition> -t <minutes>
$ salloc -n 4 
$ srun --mpi=pmix ./my-mpi-program ...
$ exit
$ srun -N <# nodes> -n <# MPI tasks> -p <partition> -t <minutes> --mpi=pmix ./my-mpi-program
#!/bin/bash

# Request an hour of runtime:
#SBATCH --time=1:00:00

# Use 2 nodes with 8 tasks each, for 16 MPI tasks:
#SBATCH --nodes=2
#SBATCH --tasks-per-node=8

# Specify a job name:
#SBATCH -J MyMPIJob

# Specify an output file
#SBATCH -o MyMPIJob-%j.out
#SBATCH -e MyMPIJob-%j.err

# Load required modules
module load hpcx-mpi/4.1.5rc2s

srun --mpi=pmix MyMPIProgram
#!/bin/bash

# Use 2 nodes with 2 tasks each (4 MPI tasks)
# And allocate 4 CPUs to each task for multi-threading
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --cpus-per-task=4

# Load required modules
module load hpcx-mpi/4.1.5rc2s

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --mpi=pmix ./MyMPIProgram
~/batch_scripts
#!/bin/bash
#SBATCH -n 4
#SBATCH --mem=16G
#SBATCH -t 1:00:00
sbatch <jobscript>
sbatch -n 4 -t 1:00:00 --mem=16G <jobscript>
tail -10 slurm-<jobid>.out
#SBATCH -o my-output-%j.out
$ man sbatch

option

purpose

-J

Specify the job name that will be displayed when listing the job

-n

Number of tasks (= number of cores, if "--cpus-per-task" or "-c" option is not mentioned)

-c

Number of CPUs or cores per task (on the same node)

-N

Number of nodes

-t

Runtime, as HH:MM:SS

--mem=

Requested memory per node

-p

Request a specific partition

-o

Filename for standard output from the job

-e

Filename for standard error from the job

-C

Add a feature constraint (a tag that describes a type of node). Note: you can view the available features on Oscar with the nodes command or sinfo -o "%20N %10c %10m %25f %10G "

You can also select multiple feature constraints using '|', i.e. #SBATCH -C quadrortx|intel

--mail-type=

Specify the events that you should be notified of by email: BEGIN, END, FAIL, REQUEUE, and ALL

--mail-user=

Email ID where you should be notified

export my_variable=my_value
#SBATCH --export=my_variable=my_value
#SBATCH --export=my_variable1=my_value1,my_variable2=my_value2,my_variable3=my_value3
/users/yliu385/data/yliu385/Test/
/users/yliu385/data/yliu385/Test/pip
/users/yliu385/data/yliu385
#!/bin/bash

if [ "$#" -ne 1 ] || ! [ -f "$1" ]; then
    echo "Usage: $0 FILE"
    exit 1
fi

while IFS= read -r line; do
   sbatch --export=directory=$line test.job 
done < $1
#!/bin/sh

#SBATCH -N 1
#SBATCH -n 1

./test.sh $directory
#!/bin/bash

echo "$0 argument: $1"
/users/yliu385/data/yliu385/Test/

./test.sh argument: /users/yliu385/data/yliu385/Test/
t=`date +"%Y-%m-%d"`
sbatch --job-name=test.$t --output=test.out.$t --error=test.err.$t test.job
user manual
allq gpu
interact -q gpu -g 1
#!/bin/bash

# Request a GPU partition node and access to 1 GPU
#SBATCH -p gpu --gres=gpu:1

# Request 1 CPU core
#SBATCH -n 1
#SBATCH -t 00:05:00

# Load a CUDA module
module load cuda

# Run program
./my_cuda_program
sbatch my_script.sh
#SBATCH --account=gk-gh200-gcondo
#SBATCH --partition=gracehopper
#SBATCH --account=ccv-gh200-gcondo
#SBATCH --partition=gracehopper
#SBATCH --constraint=mig
$ module load cuda
$ nvcc -o program source.cu
$ nvcc -arch=sm_20 -o program source.cu
$ nvcc -Xptxas -dlcm=ca -o program source.cu
# Make sure none of the LMOD modules are loaded
module purge 
module list

# create and activate the environment
python -m venv pytorch.venv
source pytorch.venv/bin/activate
pip install torch torchvision torchaudio

# test if it can detect GPUs 
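# a quick check from the shell (assumes torch was installed in the venv above):
python -c "import torch; print(torch.cuda.is_available())"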
singularity build pytorch:21.06-py3 docker://nvcr.io/nvidia/pytorch:21.06-py3
export SINGULARITY_BINDPATH="/gpfs/home/$USER,/gpfs/scratch/$USER,/gpfs/data/"
singularity shell --nv pytorch\:21.06-py3
#!/bin/bash

# Request a GPU partition node and access to 1 GPU
#SBATCH -p 3090-gcondo,gpu --gres=gpu:1

# Ensures all allocated cores are on the same node
#SBATCH -N 1

# Request 2 CPU cores
#SBATCH -n 2
#SBATCH --mem=40g
#SBATCH --time=10:00:00

#SBATCH -o %j.out

export SINGULARITY_BINDPATH="/gpfs/home/$USER,/gpfs/scratch/$USER,/gpfs/data/"
singularity --version

# Use environment from the singularity image
singularity exec --nv pytorch:21.06-py3 python pytorch-cifar100/train.py -net vgg16 -gpu
#SBATCH --partition=gpu-he
#SBATCH --constraint=h100

Installing JAX

This page describes how to install JAX with Python virtual environments

In this example, we will install Jax.

Step 1: Request an interactive session on a GPU node with Ampere architecture GPUs

interact -q gpu -g 1 -f ampere -m 20g -n 4

Here, -f = feature. We only need to build on Ampere once.

Step 2: Once your session has started on a compute node, run nvidia-smi to verify the GPU and then load the appropriate modules

module purge 
unset LD_LIBRARY_PATH
module load cuda cudnn

Step 3: Create and activate the virtual environment

python -m venv jax.venv
source jax.venv/bin/activate

Step 4: Install the required packages

pip install --upgrade pip
pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Step 5: Test that JAX is able to detect GPUs

python
>>> from jax.lib import xla_bridge
>>> print(xla_bridge.get_backend().platform)
gpu

If the above function returns gpu, then it's working correctly. You are all set, now you can install other necessary packages.

Modify your batch file: see below for an example batch script that uses the environment created above

#!/bin/bash
#SBATCH -J RBC
#SBATCH -N 1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=3:30:00
#SBATCH --mem=64GB
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH -o RBC_job_%j.o
#SBATCH -e RBC_job_%j.e

echo $LD_LIBRARY_PATH
unset LD_LIBRARY_PATH
echo $LD_LIBRARY_PATH

source /oscar/data/gk/psaluja/jax_env.venv/bin/activate
python3 -u kernel.py

Mixing MPI and CUDA

Mixing MPI (C) and CUDA (C++) code requires some care during linking because of differences between the C and C++ calling conventions and runtimes. One option is to compile and link all source files with a C++ compiler, which will enforce additional restrictions on C code. Alternatively, if you wish to compile your MPI/C code with a C compiler and call CUDA kernels from within an MPI task, you can wrap the appropriate CUDA-compiled functions with the extern keyword, as in the following example.

These two source files can be compiled and linked with both a C and C++ compiler into a single executable on Oscar using:

$ module load mvapich2 cuda
$ mpicc -c main.c -o main.o
$ nvcc -c multiply.cu -o multiply.o
$ mpicc main.o multiply.o -lcudart

The CUDA/C++ compiler nvcc is used only to compile the CUDA source file, and the MPI C compiler mpicc is used to compile the C code and to perform the linking.

/* multiply.cu */

#include <cuda.h>
#include <cuda_runtime.h>

__global__ void __multiply__ (const float *a, float *b)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    b[i] *= a[i];
}

extern "C" void launch_multiply(const float *a, const *b)
{
    /* ... load CPU data into GPU buffers a_gpu and b_gpu */

    __multiply__ <<< ...block configuration... >>> (a_gpu, b_gpu);

    safecall(cudaThreadSynchronize());
    safecall(cudaGetLastError());
    
    /* ... transfer data from GPU to CPU */

Note the use of extern "C" around the function launch_multiply, which instructs the C++ compiler (nvcc in this case) to make that function callable from the C runtime. The following C code shows how the function could be called from an MPI task.

/* main.c */
#include <mpi.h>

void launch_multiply(const float *a, float *b);

int main (int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

    /* ... prepare arrays a and b */

    launch_multiply (a, b);
    MPI_Finalize();
    return 0;
}

Installing Frameworks (PyTorch, TensorFlow, Jax)

This page describes installing popular frameworks like TensorFlow, PyTorch & JAX, etc. on your Oscar account.

Preface: Oscar is a heterogeneous cluster meaning we have nodes with different architecture GPUs (Pascal, Volta, Turing, and Ampere). We recommend building the environment first time on Ampere GPUs with the latest CUDA11 modules so it's backward compatible with older architecture GPUs.

In this example, we will install PyTorch (refer to sub-pages for TensorFlow and Jax).

Step 1: Request an interactive session on a GPU node with Ampere architecture GPUs

interact -q gpu -g 1 -f ampere -m 20g -n 4

Here, -f = feature. We only need to build on Ampere once.

Step 2: Once your session has started on a compute node, run nvidia-smi to verify the GPU and then load the appropriate modules

Step 3: Unload the pre-loaded modules, then load the cudnn and cuda dependencies

module purge
unset LD_LIBRARY_PATH
module load cudnn cuda

Step 4: Create and activate a new virtual environment

python -m venv pytorch.venv
source pytorch.venv/bin/activate

Step 5: Install the required packages

pip install --upgrade pip
pip install torch torchvision torchaudio

The above will install the latest version of PyTorch with CUDA 11 compatibility. For an older release, you can pin the versions explicitly, for example:

pip install torch==<version> torchvision==<version> torchaudio==<version>

Step 6: Test that PyTorch is able to detect GPUs

python
>>> import torch 
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 3090'

If the above functions return True and GPU model, then it's working correctly. You are all set, now you can install other necessary packages.

Software on Oscar

Many scientific and HPC software packages are already installed on Oscar, and additional packages can be requested by submitting a ticket to support@ccv.brown.edu. If you want a particular version of the software, do mention it in the email along with a link to the web page from where it can be downloaded. You can also install your own software on Oscar.

CCV cannot, however, supply funding for the purchase of commercial software. This is normally attributed as a direct cost of research, and should be purchased with research funding. CCV can help in identifying other potential users of the software to potentially share the cost of purchase and maintenance. Several commercial software products that are licensed campus-wide at Brown are available on Oscar.

For software that requires a Graphical User Interface (GUI) we recommend using CCV's VNC Client rather than X-Forwarding.

Large Memory Nodes on Oscar

Memory-Intensive Workloads

Users can check the nodes in a partition using the nodes command. As of May 2023, the Oscar cluster has the following nodes in the bigmem partition.

    NODES CORES  CPU/NODE  MEM     Features                           PARTITION
    2	  64	 32-cores  2095GB  32core,intel,scalable,cascade,edr  bigmem
    2	  64	 32-cores  753GB   32core,intel,scalable,cascade,edr  bigmem

All Oscar users have access to this partition, and can submit jobs to it. To submit batch jobs to large memory nodes, include the following in your batch script:

#SBATCH -p bigmem

To run an interactive job on large memory node, launch the interact command with the following flag:

$ interact -q bigmem

Migration of MPI Apps to Slurm 22.05.7

In January 2023, Oscar will be migrating to use Slurm version 22.05.7.

Slurm version 22.05.7

  • improves security and speed,

  • supports both PMI2 and PMIx, and

  • provides REST APIs

  • allows users to prioritize their jobs via scontrol top <job_id>

While most applications will be unaffected by these changes, applications built to make use of MPI may need to be rebuilt to work properly. To help facilitate this, we are providing users who use MPI-based applications (either through Oscar's module system or built by users) with advanced access to a test cluster running the new version of Slurm. Instructions for accessing the test cluster, building MPI-based applications, and submitting MPI jobs using the new Slurm, are provided below.

Please note - some existing modules of MPI-based applications will be deprecated and removed from the system as part of this upgrade. A list of modules that will no longer be available to users following the upgrade is given at the bottom of the page.

Instructions for Testing Applications with Slurm 22.05.7

  1. Request access to the Slurm 22.05.7 test cluster (email support@ccv.brown.edu)

  2. Connect to Oscar via either SSH or Open OnDemand (instructions below)

  3. Build your application using the new MPI applications listed below

  4. Submit your job

Users must contact support@ccv.brown.edu to obtain access to the test cluster in order to submit jobs using Slurm 22.05.7.

Connecting via SSH

  1. Connect to Oscar using the ssh command in a terminal window

  2. From Oscar's command line, connect to the test cluster using the command ssh node1947

  3. From the node1947 command line, submit your jobs (either interactive or batch) as follows:

  • For CPU-only jobs: interact -q image-test

  • For GPU jobs: interact -q gpu

Include the following line within your batch script and then submit using the sbatch command, as usual

  • For CPU-only jobs: #SBATCH -p image-test

  • For GPU jobs: #SBATCH -p gpu

Connecting via Open OnDemand

  1. Open a web browser and connect to poodcit2.services.brown.edu

  2. Login with your Oscar username and password

  3. Start a session using the Advanced Desktop App

  4. Select the gpu partition and click the launch button.

  • Only the Advanced Desktop App will connect to the test cluster

  • The Advanced Desktop App must connect to the gpu partition

MPI Applications

Migrated or New Modules

If the "Current Module Version" for an application is blank, a new version is built for the application.

Application
Current Module Version
Migrated or New Module Version

abaqus

  • 2021.1_intel17

  • 2021_slurm22_a

ambertools

  • amber22

boost

  • 1.69

  • 1.69_openmpi_4.0.7_gcc_10.2_slurm22

CharMM

  • CharMM/c47b1_slurm20

  • CharMM/c47b1

cp2k

  • 2022.2

dedalus

  • 2.1905

  • 2.1905_openmpi_4.05_gcc_10.2_slurm20

  • 2.1905_openmpi_4.0.7_gcc_10.2_slurm22

esmf

  • 8.4.0b12

  • 8.4.0_openmpi_4.0.7_gcc_10.2_slurm22

fftw

  • 3.3.6

  • 3.3.8

  • 3.3.6_openmpi_4.0.7_gcc_10.2_slurm22

  • 3.3.10_slurm22

global_arrays

  • 5.8_openmpi_4.0.5_gcc_10.2_slurm20

  • 5.8_openmpi_4.0.7_gcc_10.2_slurm22

gpaw

  • 21.1.0_hpcx_2.7.0_gcc_10.2_slurm20

  • 21.1.0_openmpi_4.0.5_gcc_10.2_slurm20

  • 21.1.0a_openmpi_4.0.5_gcc_10.2_slurm20

  • 21.1.0_openmpi_4.0.7_gcc_10.2_slurm22

  • 21.1.0_openmpi_4.0.7_gcc_10.2_slurm22

  • 21.1.0_openmpi_4.0.7_gcc_10.2_slurm22

gromacs

  • 2018.2

  • gromacs/2018.2_mvapich2-2.3.5_gcc_10.2_slurm22

hdf5

  • 1.10.8_mvapich2_2.3.5_gcc_10.2_slurm22

  • 1.10.8_openmpi_4.0.7_gcc_10.2_slurm22

  • 1.10.8_openmpi_4.0.7_intel_2020.2_slurm22

  • 1.12.2_openmpi_4.0.7_intel_2020.2_slurm22

ior

  • 3.3.0

lammps

  • 29Sep21_openmpi_4.0.5_gcc_10.2_slurm20

  • 29Sep21_openmpi_4.0.7_gcc_10.2_slurm22

meme

  • 5.3.0

  • 5.3.0_slurm22

Molpro

  • 2021.3.1

  • 2021.3.1_openmpi_4.0.7_gcc_10.2_slurm22

mpi

  • hpcx_2.7.0_gcc_10.2_slurm20

  • mvapich2-2.3.5_gcc_10.2_slurm20

  • hpcx_2.7.0_gcc_10.2_slurm22

  • mvapich2-2.3.5_gcc_10.2_slurm22

  • openmpi_4.0.7_gcc_10.2_slurm22

  • openmpi_4.0.7_intel_2020.2_slurm22

mpi4py

  • 3.1.4_py3.9.0_slurm22

netcdf

  • 4.7.4_gcc_10.2_hdf5_1.10.5

  • 4.7.4_intel_2020.2_hdf5_1.12.0

  • 4.7.4_gcc_10.2_hdf5_1.10.8_slurm22

  • 4.7.4_gcc_10.2_hdf5_1.12.2_slurm22

netcdf4-python

  • 1.6.2

osu-mpi

  • 5.6.3_openmpi_4.0.7_gcc_10.2

petsc

  • petsc/3.18.2_openmpi_4.0.7_gcc_10.2_slurm22

pnetcdf

  • 1.12.3

  • 1.12.3_openmpi_4.0.7_gcc_10.2_slurm22

qmcpack

  • 3.9.2_hpcx_2.7.0_gcc_10.2_slurm20

  • 3.9.2_openmpi_4.0.0_gcc_8.3_slurm20

  • 3.9.2_openmpi_4.0.0_gcc_8.3_slurm20_complex

  • 3.9.2_openmpi_4.0.1_gcc

  • 3.9.2_openmpi_4.0.4_gcc

  • 3.9.2_openmpi_4.0.5_intel_2020.2_slurm20

  • 3.9.2_openmpi_4.0.7_gcc_10.2_slurm22

quantumespresso

  • 6.4_openmpi_4.0.0_gcc_8.3_slurm20

  • 6.4_openmpi_4.0.5_intel_2020.2_slurm20

  • 7.0_openmpi_4.0.5_intel_2020.2_slurm20

  • 6.4_openmpi_4.0.7_gcc_10.2_slurm22

  • 6.4_openmpi_4.0.7_intel_2020.2_slurm22

  • 7.0_openmpi_4.0.7_gcc_10.2_slurm22

vasp

  • 5.4.1

  • 5.4.1_mvapich2-2.3.5_intel_2020.2_slurm20

  • 5.4.4

  • 5.4.4_intel

  • 5.4.4_mvapich2-2.3.5_intel_2020.2_slurm20

  • 5.4.4_openmpi_4.0.5_gcc_10.2_slurm20

  • 5.4.4a

  • 6.1.1_ompi405_yqi27

  • 6.1.1_openmpi_4.0.5_intel_2020.2_yqi27_slurm20

  • 6.1.1_yqi27

  • 6.3.0_cfgoldsm

  • 6.3.2_avandewa

  • 5.4.1_slurm22

  • 5.4.4_slurm22

  • 5.4.4_openmpi_4.0.7_gcc_10.2_slurm22

  • 6.1.1_ompi407_yqi27_slurm22

  • 6.3.0_cfgoldsm_slurm22

  • 6.3.2_avandewa_slurm22

wrf

  • 4.2.1_hpcx_2.7.0_intel_2020.2_slurm20

To build custom applications:

We recommend using following MPI modules to build your custom applications:

MPI
Oscar Module

GCC based OpenMPI

mpi/openmpi_4.0.7_gcc_10.2_slurm22

Intel based OpenMPI

mpi/openmpi_4.0.7_intel_2020.2_slurm22

MVAPICH

mpi/mvapich2-2.3.5_gcc_10.2_slurm22

Mellanox HPC-X

mpi/hpcx_2.7.0_gcc_10.2_slurm22

module load mpi/openmpi_4.0.7_gcc_10.2_slurm22

module load gcc/10.2 cuda/11.7.1

CC=mpicc CXX=mpicxx ./configure --prefix=/path/to/install/dir

module load mpi/openmpi_4.0.7_gcc_10.2_slurm22

module load gcc/10.2 cuda/11.7.1

cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx ..

Deprecated Modules

A new module might be available for a deprecated application module. Please search the table above to check if a new module is available for an application.

Application
Deprecated Module

abaqus

  • 2017

  • 2021

  • 2021.1

  • 6.12sp2

abinit

  • 9.6.2

abyss

  • 2.1.1

ambertools

  • amber16

  • amber16-gpu

  • amber17

  • amber17_lic

  • amber21

bagel

  • 1.2.2

boost

  • 1.55

  • 1.57

  • 1.68

  • 1.44.0

  • 1.62.0-intel

  • 1.63.0

  • 1.75.0_openmpi_4.0.5_intel_2020.2_slurm20

  • 1.76.0_hpcx_2.7.0_gcc_10.2_slurm20

  • 1.76.0_hpcx_2.7.0_intel_2020.2_slurm20

cabana

  • 1

  • 1.1

  • 1.1_hpcx_2.7.0_gcc_10.2_slurm20

campari

  • 3.0

cesm

  • 1.2.1

  • 1.2.2

  • 2.1.1

cp2k

  • 7.1

  • 7.1_mpi

  • 8.1.0

  • 9.1.0

dacapo

  • 2.7.16_mvapich2_intel

dalton

  • 2018

  • 2018.0_mvapich2-2.3.5_intel_2020.2_slurm20

dice

  • 1

esmf

  • 7.1.0r

  • 8.0.0

  • 8.0.0b

  • 8.1.0b11

  • 8.1.9b17

  • 8.3.0

  • 8.3.1b05

fenics

  • 2017.1

  • 2018.1.0

ffte

  • 6.0

  • 6.0/mpi

fftw

  • 2.1.5

  • 2.1.5_slurm2020

  • 2.1.5-double

  • 3.3.8a

gerris

  • 1

global_arrays

  • 5.6.1

  • 5.6.1_i8

  • 5.6.1_openmpi_2.0.3

gpaw

  • 1.2.0

  • 1.2.0_hpcx_2.7.0_gcc

  • 1.2.0_mvapich2-2.3a_gcc

  • 20.10_hpcx_2.7.0_intel_2020.2_slurm20

  • 20.10.0_hpcx_2.7.0_intel_2020.2_slurm20

gromacs

  • 2016.6

  • 2020.1

  • 2018.2_gpu

  • 2018.2_hpcx_2.7.0_gcc_10.2_slurm20

  • 2020.1_hpcx_2.7.0_gcc_10.2_slurm20

  • 2020.4_gpu

  • 2020.4_gpu_hpcx_2.7.0_gcc_10.2_slurm20

  • 2020.4_hpcx_2.7.0_gcc_10.2_slurm20

  • 2020.6_plumed

  • 2021.5_plumed

hande

  • 1.1.1

  • 1.1.1_64

  • 1.1.1_debug

hdf5

  • 1.10.0

  • 1.10.1_parallel

  • 1.10.5

  • 1.10.5_fortran

  • 1.10.5_mvapich2-2.3.5_intel_2020.2_slurm20

  • 1.10.5_openmpi_3.1.3_gcc

  • 1.10.5_openmpi_3.1.6_gcc

  • 1.10.5_openmpi_4.0.0_gcc

  • 1.10.5_openmpi_4.0.5_gcc_10.2_slurm20

  • 1.10.5_parallel

  • 1.10.7_hpcx_2.7.0_intel_2020.2_slurm20

  • 1.10.7_openmpi_4.0.5_gcc_10.2_slurm20

  • 1.10.7_openmpi_4.0.5_intel_2020.2_slurm20

  • 1.12.0_hpcx_2.7.0_intel_2020.2

  • 1.12.0_hpcx_2.7.0_intel_2020.2_slurm20

  • 1.12.0_openmpi_4.0.5_intel_2020.2_slurm20

hnn

  • 1.0

hoomd

  • 2.9.0

horovod

  • 0.19.5

ior

  • 3.0.1

  • 3.3.0

lammps

  • 17-Nov-16

  • 11-Aug-17

  • 16-Mar-18

  • 22-Aug-18

  • 7-Aug-19

  • 11Aug17_serial

  • 29Oct20_hpcx_2.7.0_intel_2020.2

  • 29Oct20_openmpi_4.0.5_gcc_10.2_slurm20

medea

  • 3.2.3.0

meme

  • 5.0.5

meshlab

  • 20190129_qt59

Molpro

  • 2019.2

  • 2020.1

  • 2012.1.15

  • 2015_gcc

  • 2015_serial

  • 2018.2_ga

  • 2019.2_ga

  • 2020.1_ga

  • 2020.1_openmpi_4.0.5_gcc_10.2_slurm20

  • 2021.3.1_openmpi_4.0.5_gcc_10.2_slurm20

mpi4py

  • 3.0.1_py3.6.8

multinest

  • 3.1

n2p2

  • 1.0.0

  • 2.0.0

  • 2.0.0_hpcx

namd

  • 2.11-multicore

  • 2.13b1-multicore

netcdf

  • 3.6.3

  • 4.4.1.1_gcc

  • 4.4.1.1_intel

  • 4.7.0_intel2019.3

  • 4.7.4_gcc8.3

nwchem

  • 7

  • 6.8-openmpi

  • 7.0.2_mvapich2-2.3.5_intel_2020.2_slurm20

  • 7.0.2_openmpi_4.0.5_intel_2020.2_slurm20

  • 7.0.2_openmpi_4.1.1_gcc_10.2_slurm20

openfoam

  • 4.1

  • 7

  • 4.1-openmpi_3.1.6_gcc_10.2_slurm20

  • 4.1a

  • 7.0_hpcx_2.7.0_gcc_10.2_slurm20

openmpi

  • openmpi_4.0.5_gcc_10.2_slurm20

OpenMPI with Intel compilers

  • openmpi_4.0.5_intel_2020.2_slurm20

orca

  • 4.0.1.2

  • 4.1.1

  • 4.2.1

  • 5.0.0

  • 5.0.1

osu-mpi

  • 5.3.2

paraview

  • 5.1.0

  • 5.1.0_yurt

  • 5.4.1

  • 5.6.0_no_scalable

  • 5.6.0_yurt

  • 5.8.0

  • 5.8.0_mesa

  • 5.8.0_release

  • 5.8.1_openmpi_4.0.5_intel_2020.2_slurm20

  • 5.9.0

  • 5.9.0_ui

paris

  • 1.1.3

petsc

  • 3.14.2_hpcx_2.7.0_intel_2020.2_slurm20

  • 3.14.2_mpich3.3a3_intel_2020.2

  • 3.7.5

  • 3.7.7

  • 3.8.3

phyldog

  • 1.0

plumed

  • 2.7.2

  • 2.7.5

pmclib

  • 1.1

polychord

  • 1

  • 2

polyrate

  • 17C

potfit

  • 20201014

  • 0.7.1

prophet

  • augustegm_1.2

pstokes

  • 1.0

pymultinest

  • 2.9

qchem

  • 5.0.2

  • 5.0.2-openmpi

qmcpack

  • 3.10.0_hpcx_2.7.0_intel_2020.2_slurm20

  • 3.10.0_openmpi_4.0.5_intel_2020.2_slurm20

  • 3.7.0

  • 3.9.1

  • 3.9.1_openmpi_3.1.6

quantumespresso

  • 6.1

  • 6.4

  • 6.5

  • 6.6

  • 6.4_hpcx_2.7.0_intel_2020.02_slurm20

  • 6.4_hpcx_2.7.0_intel_2020.2_slurm20

  • 6.4_openmpi_4.0.5_intel_slurm20

  • 6.4.1

  • 6.5_openmpi_4.0.5_intel_slurm20

  • 6.6_openmpi_4.0.5_intel_2020.2_slurm20

  • 6.7_openmpi_4.0.5_intel_2020.2_slurm20

relion

  • 3.1.3

rotd

  • 2014-11-15_mvapich2

scalasca

  • 2.3.1_intel

scorep

  • 3.0_intel_mvapich2

siesta

  • 3.2

  • 4.1

sprng

  • 5

su2

  • 7.0.2

trilinos

  • 12.12.1

vtk

  • 7.1.1

  • 8.1.0

wrf

  • 3.6.1

  • 4.2.1_hpcx_2.7.0_intel_2020.2_slurm20

Installing R Packages

Installing R packages

Users should install R packages for themselves locally. This documentation shows you how to install R packages locally (without root access) on Oscar.

If the package you want to install has operating-system-level dependencies (i.e. the package depends on core libraries), then we can install it as a module.

Installing an R package

First load the R version that you want to use the package with:

module load r/4.2.2

Start an R session

R

Note some packages will require code to be compiled, so it is best to do R package installs on the login node.

To install the package 'wordcloud':

> install.packages("wordcloud", repos="http://cran.r-project.org")

You will see a warning:

Warning in install.packages("wordcloud", repos = "http://cran.r-project.org") :
  'lib = "/gpfs/runtime/opt/R/3.4.2/lib64/R/library"' is not writable
Would you like to use a personal library instead?  (y/n) 

Answer y . If you have not installed any R packages before you will see the following message:

Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/3.4
to install packages into?  (y/n) 

Answer y . The package will then be installed. If the install is successful you will see a message like:

** R
** data
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (wordcloud)

If the installation was not successful you will see a message like:

Warning message:
In install.packages("wordcloud", repos = "http://cran.r-project.org") :
  installation of package ‘wordcloud’ had non-zero exit status

There is normally information in the message that gives the reason why the install failed. Look for the word ERROR in the message.

Possible reasons for an installation failing include:

  • Other software is needed to build the R package, e.g. the R package rgdal needs gdal, so you have to run module load gdal first (see the sketch after this list)

  • A directory needs deleting from a previous failed installation.
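For example, a minimal sketch of installing an R package with a system-level dependency, using the rgdal/gdal case mentioned above:

module load gdal       # load the system library the R package needs
module load r/4.2.2
R
> install.packages("rgdal", repos="http://cran.r-project.org")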

Reinstalling R packages

To reinstall R packages, start an R session and run the update.packages() command

module load r/4.2.2
R
update.packages(checkBuilt=TRUE, ask=FALSE)

Removing an R package

Start an R session:

R

To remove the 'wordcloud' package:

> remove.packages("wordcloud")

Python in batch jobs

By default, print in Python is buffered. When running Python in a batch job in SLURM you may see output less often than you would when running interactively. This is because the output is being buffered - the print statements are collected until there is a large amount to print, then the messages are all printed at once. For debugging or checking that a Python script is producing the correct output, you may want to switch off buffering.

Switch off buffering

For a single python script you can use the -u option, e.g.

python -u my_script.py

The -u stands for "unbuffered". You can use the environment variable PYTHONUNBUFFERED to set unbuffered I/O for your whole batch script.

#!/bin/bash
#SBATCH -n 1

export PYTHONUNBUFFERED=TRUE
python my_script.py

There is some performance penalty for having unbuffered print statements, so you may want to reduce the number of print statements, or run buffered for production runs.


Intro to CUDA

Introduction to CUDA

In either case, you will probably find that because of the differences between GPU and CPU architectures, there are several new concepts you will encounter that do not arise when programming serial or threaded programs for CPUs. These are mainly to do with how CUDA uses threads and how memory is arranged on the GPU, both described in more detail below.

There are several useful documents from NVIDIA that you will want to consult as you become more proficient with CUDA:

There are also many CUDA tutorials available online:

Threads in CUDA

CUDA uses a data-parallel programming model, which allows you to program at the level of what operations an individual thread performs on the data that it owns. This model works best for problems that can be expressed as a few operations that all threads apply in parallel to an array of data. CUDA allows you to define a thread-level function, then execute this function by mapping threads to the elements of your data array.

A thread-level function in CUDA is called a kernel. To launch a kernel on the GPU, you must specify a grid, and a decomposition of the grid into smaller thread blocks. A thread block usually has around 32 to 512 threads, and the grid may have many thread blocks totalling thousands of threads. The GPU uses this high thread count to help it hide the latency of memory references, which can take 100s of clock cycles.

Conceptually, it can be useful to map the grid onto the data you are processing in some meaningful way. For instance, if you have a 2D image, you can create a 2D grid where each thread in the grid corresponds to a pixel in the image. For example, you may have a 512x512 pixel image, on which you impose a grid of 512x512 threads that are subdivided into thread blocks with 8x8 threads each, for a total of 64x64 thread blocks. If your data does not allow for a clean mapping like this, you can always use a flat 1D array for the grid.

The CUDA runtime dynamically schedules the thread blocks to run on the multiprocessors of the GPU. The M2050 GPUs available on Oscar each have 14 multiprocessors. By adjusting the size of the thread block, you can control how much work is done concurrently on each multiprocessor.

Memory on the GPU

The GPU has a separate memory subsystem from the CPU. The M2050 GPUs have GDDR5 memory, which is a higher bandwidth memory than the DDR2 or DDR3 memory used by the CPU. The M2050 can deliver a peak memory bandwidth of almost 150 GB/sec, while a multi-core Nehalem CPU is limited to more like 25 GB/sec.

The trade-off is that there is usually less memory available on a GPU. For instance, on the Oscar GPU nodes, each M2050 has only 3 GB of memory shared by 14 multiprocessors (219 MB per multiprocessor), while the dual quad-core Nehalem CPUs have 24 GB shared by 8 cores (3 GB per core).

Another bottleneck is transferring data between the GPU and CPU, which happens over the PCI Express bus. For a CUDA program that must process a large dataset residing in CPU memory, it may take longer to transfer that data to the GPU than to perform the actual computation. The GPU offers the largest benefit over the CPU for programs where the input data is small, or there is a large amount of computation relative to the size of the input data.

CUDA kernels can access memory from three different locations with very different latencies: global GDDR5 memory (100s of cycles), shared memory (1-2 cycles), and constant memory (1 cycle). Global memory is available to all threads across all thread blocks, and can be transferred to and from CPU memory. Shared memory can only be shared by threads within a thread block and is only accessible on the GPU. Constant memory is accessible to all threads and the CPU, but is limited in size (64KB).

GPUs on Oscar

To view the various GPUs available on Oscar, use the command

nodes gpu

Interactive Use

$ interact -q gpu -g 1

To start an interactive session on a particular GPU type (QuadroRTX, 1080ti, p100 etc) use the feature -f option:

interact -q gpu -f quadrortx

GPU Batch Job

$ sbatch -p gpu --gres=gpu:1 <jobscript>

This can also be mentioned inside the batch script:

#SBATCH -p gpu --gres=gpu:1

You can view the status of the gpu partition with:

$ allq gpu

Sample batch script for CUDA program:

~/batch_scripts/cuda.sh

Getting started with GPUs

While you can program GPUs directly with CUDA, a language and runtime library from NVIDIA, this can be daunting for programmers who do not have experience with C or with the details of computer architecture.

You may find the easiest way to tap the computation power of GPUs is to link your existing CPU program against numerical libraries that target the GPU:

OpenACC

OpenACC is a portable, directive-based parallel programming construct. You can parallelize loops and code segments simply by inserting directives - which are ignored as comments if OpenACC is not enabled while compiling. It works on CPUs as well as GPUs. We have the PGI compiler suite installed on Oscar which has support for compiling OpenACC directives. To get you started with OpenACC:

MATLAB

NVLink Enabled GPU Nodes

NVLink enables GPUs to pool memory over high-speed links (25 GB/s). This will increase the performance of your application code.

Nodes gpu[1210,1211,1212] have 4 fully connected NVLink (SXM2) V100 GPUs.

To submit interactive job to NVLink Enabled GPU nodes:

interact -q gpu -f v100

To submit batch job(s), add the following line to your batch script.

#SBATCH --constraint=v100

Installing TensorFlow

Setting up a GPU-accelerated environment can be challenging due to driver dependencies, version conflicts, and other complexities. Apptainer simplifies this process by encapsulating all these details

Apptainer Using NGC Containers (Our #1 Recommendation)

  1. Build the container:

apptainer build tensorflow-24.03-tf2-py3.simg docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3

This will take some time, and once it completes you should see a .simg file.

For your convenience, the pre-built container images are located in directory:

/oscar/runtime/software/external/ngc-containers/tensorflow.d/x86_64/

You can choose either to build your own or use one of the pre-downloaded images.
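To see which pre-built images are available, you can simply list that directory, for example:

ls /oscar/runtime/software/external/ngc-containers/tensorflow.d/x86_64/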

Working with Apptainer images requires lots of storage space. By default Apptainer will use ~/.apptainer as a cache directory which can cause you to go over your Home quota.

export APPTAINER_CACHEDIR=/tmp
export APPTAINER_TMPDIR=/tmp
  2. Once the container is ready, request an interactive session with a GPU

interact -q gpu -g 1 -f ampere -m 20g -n 4
  3. Run the container with GPU support

export APPTAINER_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data"
# Run a container with GPU support
apptainer run --nv tensorflow-24.03-tf2-py3.simg

The --nv flag is important, as it enables the NVIDIA sub-system.

  4. Or, if you're executing a specific command inside the container:

# Execute a command inside the container with GPU support
$ apptainer exec --nv tensorflow-24.03-tf2-py3.simg nvidia-smi
  5. Make sure your TensorFlow image is able to detect GPUs:

$ python
>>> import tensorflow as tf
>>> tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
True
  6. If you need to install more custom packages: the containers themselves are non-writable, but you can use the --user flag to install packages into ~/.local. Example:

Apptainer> pip install <package-name> --user

Slurm Script:

You can submit a SLURM job script that uses srun to run your container. Here is a basic example:

#!/bin/bash
#SBATCH --nodes=1               # node count
#SBATCH -p gpu --gres=gpu:1     # number of gpus per node
#SBATCH --ntasks-per-node=1     # total number of tasks across all nodes
#SBATCH --cpus-per-task=1       # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=40G               # total memory (4 GB per cpu-core is default)
#SBATCH -t 01:00:00             # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin       # send email when job begins
#SBATCH --mail-type=end         # send email when job ends
#SBATCH --mail-user=<USERID>@brown.edu

module purge
unset LD_LIBRARY_PATH
export APPTAINER_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data"
srun apptainer exec --nv tensorflow-24.03-tf2-py3.simg python examples/tensorflow_examples/models/dcgan/dcgan.py

Using Modules

Module commands

command

module list

Lists all modules that are currently loaded in your software environment.

module avail

Lists all available modules on the system. Note that a module can have multiple versions.

module help <name>

Prints additional information about the given software.

module load <name>

Adds a module to your current environment. If you load using just the name of a module, you will get the default version. To load a specific version, load the module using its full name with the version: "module load gcc/6.2"

module unload <name>

Removes a module from your current environment.

module bin <name>

Prints programs made available by a module

Finding modules

The module avail command allows searching modules based on partial names. For example:

 $ module avail bo

will list all available modules whose name starts with "bo".

Output:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ name: bo*/* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
boost/1.49.0        boost/1.63.0        bowtie2/2.3.0
boost/1.62.0-intel  bowtie/1.2.0

This feature can be used for finding what versions of a module are available.

Auto-completion using tab key

The module load command supports auto-completion of the module name using the "tab" key. For example, typing module load bo at the shell prompt and hitting the "tab" key a couple of times will show results similar to those shown above. Similarly, the module unload command auto-completes using the names of modules which are loaded.

What modules actually do...

Loading a module sets the relevant environment variables like PATH, LD_LIBRARY_PATH and CPATH. For example, PATH contains all the directory paths (colon separated) where executable programs are searched for. So, by setting PATH through a module, now you can execute a program from anywhere in the file-system. Otherwise, you would have to mention the full path to the executable program file to run it which is very inconvenient. Similarly, LD_LIBRARY_PATH has all the directory paths where the run time linker searches for libraries while running a program, and so on. To see the values in an environment variable, use the echo command. For instance, to see what's in PATH:

$ echo $PATH
/gpfs/runtime/opt/perl/5.18.2/bin:/gpfs/runtime/opt/python/2.7.3/bin:/gpfs/runtime/opt/java/7u5/bin:
/gpfs/runtime/opt/intel/2013.1.106/bin:/gpfs/runtime/opt/centos-updates/6.3/bin:/usr/lib64/qt-3.3/bin:
/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/gpfs/runtime/bin

Python on Oscar

Several versions of Python are available on Oscar as modules. However, we recommend using the system Python available at /usr/bin/python . You do not need to load any module to use this version of Python.

$ which python
/usr/bin/python
$ python --version
Python 3.9.16

pip is also installed as a system package, but other common Python packages (e.g., SciPy, NumPy) are not installed on the system. This affords individual users complete control over the packages they are using, thereby avoiding issues that can arise when code written in Python requires specific versions of Python packages.

We do not provide Python version 2 modules since it has reached its end of life. You may install Python 2 locally in your home directory, but CCV will not provide any Python2 modules.

Python 2 has entered End-of-Life (EOL) status and will receive no further official support as of January 2020. As a consequence, you may see the following message when using pip with Python 2.

DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.

Dependent Jobs

Here is an example script for running dependent jobs on Oscar.

#!/bin/bash

# first job - no dependencies
jobID_1=$(sbatch  job1.sh | cut -f 4 -d' ')

# second job - depends on job1
jobID_2=$(sbatch --dependency=afterok:$jobID_1 job2.sh | cut -f 4 -d' ')

# third job - depends on job2
sbatch  --dependency=afterany:$jobID_2  job3.sh

There are 3 batch jobs. Each job has its own batch script: job1.sh, job2.sh, job3.sh. The script above (script.sh) submits the three jobs.

line 4: job1 is submitted.

line 7: job2 depends on job1 finishing successfully.

line 10: job3 depends on job2 finishing successfully.

To use the above script to submit the 3 jobs, run the script as follows:

./script.sh

Installing Python Packages

For Python 3, we recommend using the system Python. You do not need to load any Python module to use system Python3

Python modules do not include other common Python packages (e.g., SciPy, NumPy). This affords individual users complete control over the packages they are using.

There are several ways for users to install python packages on Oscar

  • using a Python environment

  • into their home directory

  • into a custom location

  • from source into a custom location

In this document, we use angular brackets <> to denote command line options that you should replace with an appropriate value

Using Python Enviroments (venv)

Python environments are a cleaner way to install python packages for a specific workflow. In the example below, a virtual environment called my_cool_science is set up in your home directory:

cd ~
python -m venv my_cool_science
source ~/my_cool_science/bin/activate
pip install <your package>
deactivate

line 1: change directory to home

line 2: create the Python environment

line 3: activate the Python environment

line 4: install any packages you need into the environment

line 5: deactivate the environment

When you want to use the environment, e.g. in a batch script or an interactive session

source ~/my_cool_science/bin/activate

When your work is finished, deactivate the environment with

deactivate

Reinstalling environment

Step 1: Generate a list of installed packages

Activate the environment and print the list of installed packages to a file

source ~/old_env/bin/activate
pip freeze > ~/old_env_req.txt

Step 2: Create a new environment and install packages

Here, we create a new environment and install packages inside it from old_env_req.txt file.

cd ~
python -m venv new_env
source ~/new_env/bin/activate
pip install -r ~/old_env_req.txt
deactivate

Install into your home directory

The --user flag will instruct pip to install to your home directory

pip install --user <package>

This will install the package under the following path in user's HOME directory:

~/.local/lib/python<version>/site-packages

If you omit the --user flag you will see

    IOError: [Errno 13] Permission denied: '/gpfs/runtime/opt/python/2.7.3/lib/python2.7/site-packages/ordereddict.py'

This is because users do not have access to the default locations where software is installed.

Python packages can often have conflicting dependencies. For workflows that require a lot of python packages, we recommend using virtual environments.

Install at custom location

Users have a limit of 20GB for their home directories on Oscar. Hence, users might want to use their data directory instead for installing software. Another motivation to do that is to have shared access to the software among the whole research group.

 pip install --target=</path/to/install/location> <package>

This path to install location will have to be added to the PYTHONPATH environment variable so that python can find the python modules to be used. This is not necessary for software installed using the --user option.

export PYTHONPATH=</path/to/install/location>:$PYTHONPATH

This can be added at the end of your .bashrc file in your home directory. This will update the PYTHONPATH environment variable each time during startup. Alternatively, you can update PYTHONPATH in your batch script as required. This can be cleaner as compared to the former method. If you have a lot of python installs at different locations, adding everything to PYTHONPATH can create conflicts and other issues.

A caveat of using this method is that pip will install the packages (along with its requirements) even if the package required is already installed under the global install or the default local install location. Hence, this is more of a brute force method and not the most efficient one.

For example, if your package depends on numpy or scipy, you might want to use the numpy and scipy under our global install as those have been compiled with MKL support. Using the --target option will reinstall numpy with default optimizations and without MKL support at the specified location.

Installing from source

Sometimes, python software is not packaged by the developers to be installed by pip. Or, you may want to use the development version which has not been packaged. In this case, the python package can be installed by downloading the source code itself. Most python packages can be installed by running the setup.py script that should be included in the downloaded files.

You will need to provide a "prefix path" for the install location

python setup.py install --prefix=</path/to/install/location>

This will create the sub-directories bin, lib, etc. at the location provided above and install the packages there. The environment will have to be set up accordingly to use the package:

export PATH=</path/to/install/location>/bin:$PATH
export PYTHONPATH=</path/to/install/location>/lib/python<version>/site-packages:$PYTHONPATH

Anaconda

Anaconda provides Python, R and other packages for scientific computing including data sciences, machine learning, etc.

There is one anaconda module:

Do not load the module in your .modules or .bashrc file. Otherwise, your OOD Desktop session cannot start.

Intro to Parallel Programming

This page serves as a guide for application developers getting started with parallel programming, or users wanting to know more about the working of parallel programs/software they are using.

Although there are several ways to classify parallel programming models, a basic classification is:

  1. Distributed Memory Programming

  2. Shared Memory Programming

Shared Memory Parallelism

Note that most compilers have inherent support for multithreading up to some level. Multithreading comes into play when the compiler converts your code to a set of instructions such that they are divided into several independent instruction sequences (threads) which can be executed in parallel by the Operating System. Apart from multithreading, there are other features like "vectorized instructions" which the compiler uses to optimize the use of compute resources. In some programming languages, the way of writing the sequential code can significantly affect the level of optimization the compiler can induce. However, this is not the focus here.

Multithreading can also be introduced at the code level by the application developer, and this is what we are interested in here. If programmed correctly, it can also be the most "efficient" way of parallel programming, as it is managed at the Operating System level and ensures optimum use of the "available" resources. Here too, there are different parallel programming constructs which support multithreading.

Pthreads

POSIX threads (Pthreads) is a standardized C language threads programming interface. It is a widely accepted standard because it is lightweight, highly efficient, and portable. The routine to create Pthreads in a C program is called pthread_create, and an "entry point" function is defined which is executed by the created threads. There are mechanisms to synchronize the threads, create "locks and mutexes", etc. Help pages:

OpenMP

OpenMP is a popular directive based construct for shared memory programming. Like POSIX threads, OpenMP is also just a "standard" interface which can be implemented in different ways by different vendors.
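As a minimal sketch (assuming a C source file my_openmp_program.c that contains OpenMP directives), you compile with your compiler's OpenMP flag and control the number of threads with the OMP_NUM_THREADS environment variable:

# enable OpenMP support in GCC with -fopenmp
gcc -fopenmp my_openmp_program.c -o my_openmp_program

# run with 4 threads
export OMP_NUM_THREADS=4
./my_openmp_program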

DMTCP

Modules

To access dmtcp, load a dmtcp module. For example:

module load dmtcp/3.0.0

Example Programs

Here's a dummy example that prints increasing integers every 2 seconds. Copy this to a text file on Oscar and name it dmtcp_serial.c

Compile this program by running

You should now have these files in your directory:

  • dmtcp_serial

  • dmtcp_serial.c

Basic Usage

Launch a Program

The dmtcp_launch command launches a program and automatically checkpoints it. To specify the interval (in seconds) between checkpoints, add the "-i num_seconds" option to the dmtcp_launch command.

Example: the following command launches the program dmtcp_serial and checkpoints every 8 seconds.

As shown in the example above, a checkpoint file (ckpt_dmtcp_serial_24f183c2194a7dc4-40000-42af86bb59385.dmtcp) is created, and can be used to restart the program.

Restart from a checkpoint

The dmtcp_restart command restarts a program from a checkpoint, and also automatically checkpoints the program. To specify the interval (in seconds) between checkpoints, add the "-i num_seconds" option to the dmtcp_restart command.

Example: the following command restarts the dmtcp_serial program from a checkpoint, and checkpoints every 12 seconds

Batch Jobs

It is desirable that a single job script can

  • launch the program if there is no checkpoint yet, or

  • automatically restart from a checkpoint if one or more checkpoints exist

The job script dmtcp_serial_job.sh below is an example which shows how to achieve the goal:

  • If there is no checkpoint in the current directory, launch the program dmtcp_serial

  • If one or more checkpoints exist in the current directory, restart the program dmtcp_serial from the latest checkpoint

First Submission - Launch a Program

Submit dmtcp_serial_job.sh and then wait for the job to run until it times out. Below are the beginning and end of the job output file.

Later Submissions - Restart from a Checkpoint

Submit dmtcp_serial_job.sh again and then wait for the job to run until it times out. Below is the beginning of the job output file, which demonstrates that the job restarts from the checkpoint of the previous job.

Job Array

The following example script

  • creates a sub directory for each task of a job array, and then saves a task's checkpoint in the task's own sub directory when the job script is submitted for the first time

  • restarts from the checkpoints in the task subdirectories when the job script is submitted for the second time or later

Screen

screen is a "terminal multiplexer": it enables a number of terminals (or windows) to be accessed and controlled from a single terminal. screen is a great way to keep an interactive session alive between connections to Oscar. You can reconnect to the session from anywhere!

Screen commands

Common commands are:

  • start a new screen session with session name: screen -S <name>

  • list running sessions/screens: screen -ls

  • attach to session by name: screen -r <name>

  • detach: Ctrl+a d

  • detach and logout (quick exit): Ctrl+a d d

  • kill a screen session: screen -X -S <name> quit
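For example, a typical workflow (the session name below is just an illustration) looks like this:

screen -S myanalysis     # start a named session on a login node
# ... do your work, then press Ctrl+a d to detach ...
screen -ls               # later, list your sessions on this login node
screen -r myanalysis     # reattach to the session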

Reconnecting to your screen session

There are several login nodes on Oscar, and the node from which you launched screen matters! You can only reconnect from the login node on which you launched screen.

In order to reconnect to a running screen session, you need to be connected to the same login node that you launched your screen session from. In order to locate and identify your screen sessions correctly, we recommend the following:

  • Create a directory to store the information of your screen sessions. You only need to do this once.

  • Put the following line into your ~/.bashrc. This tells the screen program to save the information of your screen sessions in the directory created in the previous step, which allows you to query your screen sessions across different login nodes. To make this change effective in your current sessions, you need to run 'source ~/.bashrc' in each of them. You do not need to do this in new sessions.

  • Name your new screen session using the name of the login node. For instance, start your screen session with a command similar to

Using CCMake

A guide to building and compiling software using CCMake.

Open-source software refers to any program whose source code is available for use or modification as users or other developers see fit. This is usually developed as a public collaboration and made freely available.

CMake and CCMake

Due to the complexity of some software, we often have to link to third party or external libraries. When working with software that has complicated building and linking steps, it is often impractical to use GCC (or your favorite compiler) directly. GNU Make is a build system that can simplify things somewhat, but "makefiles" can become unwieldy in their own way. Thankfully for us, there is a tool that simplifies this process.

Make sure the source code has a CMakeLists.txt file in the root folder

Getting the source code from a Git Repository

Much of the time, source code is available on platforms such as GitHub, GitLab or BitBucket. Cloning (or downloading) the project from any of those is the same process. First, you need to get the URL of the repository. It usually looks like one of these:

GitHub repository: https://github.com/<username>/<project_name>.git

Bitbucket repository: https://bitbucket.org/<username>/<project_name>.git

Here, <username> indicates the GitHub (or BitBucket, etc.) account of the owner of the project, and <project_name> is the name of the project.

GitHub and BitBucket have a button at the top right side of the repository web page labeled "clone". Copy that URL

Clone The Repository

Create a new folder on a path with the necessary read/write permissions

Go inside that folder

Clone the repository:

URL is the repository's link mentioned above.

Getting the source code from a .tar or .zip file

If you downloaded the project from a different source and it is contained in a .tar or .zip file, just extract the source code into a folder with the necessary read/write permissions.

Build the Project

Create a new folder and name it build

Go inside that folder

Execute CCMake pointing to the root folder which has a CMakeLists.txt file

In this example, let's assume the build folder is at the same level as the CMakeLists.txt file.

The CCMake text interface will pop up with all the necessary attributes to build the software.

Set up the paths to the required libraries and press "c" to configure the project. Some errors might come up about CMake being unable to find specific libraries. This could be because that library does not exist on the system or you have not loaded the right module. Please contact CCV staff for help fixing these types of errors.

Make sure the attribute CMAKE_INSTALL_PREFIX points to a path with the necessary read/write permissions. By default it is set to a system folder (e.g. /usr/bin/), which most users cannot write to.

Once the configuration process has ended successfully, press "g" to generate the project. Generating the project does not compile or execute the program; please continue reading.

Compile the Project

Compile the project using the command make

To speed up the compilation process, add the parameter "-j 8" (or another number of parallel jobs) to parallelize the build.

Once it is done, your project will be installed in the path set in the CMAKE_INSTALL_PREFIX attribute as explained above.

If you have any questions or need help please email support@ccv.brown.edu.

VASP

The Vienna Ab initio Simulation Package (VASP) is a package for performing advanced quantum mechanical computations. This page explains how VASP can be accessed and used on Oscar.

Setting up VASP

In order to use VASP, you must be a part of the vasp group on Oscar. To check your groups, run the groups command in the terminal.

First, you must choose which VASP module to load. You can see the available modules using module avail vasp. You can load your preferred VASP module using module load <module-name>.

Available Versions

  • VASP 5.4.1

  • VASP 5.4.4

  • VASP 6.1.1

Running VASP

Within a batch job, you should specify the number of MPI tasks as

If you would like 40 cores for your calculation, you would include the following in your batch script:

Gaussian

Gaussian is a general purpose computational chemistry package. Oscar provides both the Gaussian 9 (g09) and Gaussian 16 (g16) packages.

Setting Up Gaussian

In order to use Gaussian on Oscar, you must be a part of the ccv-g09 group. To check your groups, run the groups command in the terminal.

You must first choose a Gaussian module to load. To see available Gaussian modules, run module avail gauss. You can load a Gaussian module using the command module load <module-name>.

Available Versions

  • Gaussian 9 (g09)

  • Gaussian 16 (g16)

NOTE: There are three versions of g09; you can load any one of them, but the newer version g16 is now preferred. If using g09, just replace g16 below with g09.

Running Gaussian

Gaussian can be run either interactively or within a batch script using one of two command styles:

  • g16 job-name

  • g16 <input-file >output-file

In the first form, the program reads input from job-name.gjf and writes its output to job-name.log. When no job-name has been specified, the program will read from standard input and write to standard output

Given a valid .gjf file (we'll call it test-file.gjf), we can use the following simple batch script to run Gaussian:

g16-test.sh

Then queue the script using

Once the job has been completed, you should have a g16-test.out, a g16-test.err, and a test-file.out.

MPI4PY

This page documents how to use the MPI for Python package within a Conda environment.

Using MPI4PY in a Python Script

The installation of mpi4py will be discussed in the following sections. This section provides an example of how mpi4py would be used in a python script after such an installation.

To use MPI in a python script through mpi4py, you must first import it using the following code:

Example Script

Here is an example python script mpi4pytest.py that uses MPI:

The file mpi4pytest.py can be found at /gpfs/runtime/softwareexamples/mpi4py/

Conda Environment

Once you have activated your conda environment, run the following commands to install mpi4py:

You may change the python version in the pip command.

To check that the installation process was a success you can run

If no errors result from running the command, the installation has worked correctly.

Here is an example batch job script mpi4pytest_conda.sh that uses mpi4pytest.py and the conda environment setup:

Python Environment

Once you have activated your Python environment, run the following command to install mpi4py:

Below is an example batch job script mpi4pytest_env.sh:

Arm Forge

We recommend you use the Arm Forge remote client to launch your debugging jobs on Oscar. The first time you set up Arm Forge you will need to configure the client with the following steps: configure Remote Launch from the client, and set up your Job Submission Settings (both described in sections below).

Compile your code with -g so you can see the source code in your debugging session

Arm DDT

Arm DDT is a powerful graphical debugger suitable for many different development environments, including:

  • Single process and multithreaded software.

  • OpenMP.

  • Parallel (MPI) software.

Arm MAP

Arm MAP is a parallel profiler that shows you which lines of code took the most time to run, and why. Arm MAP does not require any complicated configuration, and you do not need to have experience with profiling tools to use it.

Arm MAP supports:

  • MPI, OpenMP and single-threaded programs.

  • Small data files. All data is aggregated on the cluster and only a few megabytes written to disk, regardless of the size or duration of the run.

  • Sophisticated source code view, enabling you to analyze performance across individual functions.

  • Both interactive and batch modes for gathering profile data.

  • A rich set of metrics, that show memory usage, floating-point calculations and MPI usage across processes, including:

    • Percentage of vectorized instructions, including AVX extensions, used in each part of the code.

    • Time spent in memory operations, and how it varies over time and processes, to verify if there are any cache bottlenecks.

    • A visual overview across aggregated processes and cores that highlights any regions of imbalance in the code.

IDL

Interactive Data Language (IDL) is a programming language used for data analysis and is popular in several scientific fields. This page explains how to use the IDL module on Oscar to run IDL programs.

Setting Up IDL

First load the IDL module that you want to use with module load idl/version_number:

You can use the command module load idl to simply load the default version. This is demonstrated in the following command followed by system dialogue.

As indicated by the system dialogue, you will need to enter the following command to set up the environment for IDL:

IDL Command Line

Once you've set up IDL in the way outlined above, you can open the IDL command line by simply using the command idl:

Note: To exit this environment, simply use the command exit

IDL Programs

To write an IDL program, you can use any of the text editors on Oscar (such as vim, emacs, and nano) or you can create the program in a file on your own computer and then copy that file to Oscar when you are finished. Here is an example (hello world) IDL program idl_hello_world.pro:

This file and the batch file below can be found at /gpfs/runtime/software_examples/idl/8.5.1 if you wish to copy them and test the process yourself.

Once you have the .pro file on Oscar, you can run it using a batch script. Here is a bare-bones batch script (called idl_hello_world.sh) that will run the program idl_hello_world.pro (note that the .pro extension is omitted in the script).

We can then run the batch file by using the sbatch command:

Configuring Remote Launch

Configuring Remote Launch from the client

You will need to configure remote launch for Oscar

  1. Open the client on your machine

  2. Click 'Remote Launch' -> Configure

  3. Add username@ssh.ccv.brown.edu as the Host Name

  4. Add /gpfs/runtime/opt/forge/19.1.2 as the Remote Installation Directory

  5. Test Remote Launch. You should enter the password used for Oscar. If successful you should see the message Remote Launch test completed successfully

If there is a mismatch between your client version and the version of Forge on Oscar, you will see an error message. To fix this, make sure you are using compatible client and remote versions.

Once you are connected you will see a licence checked out and "Connected to username@ssh.ccv.brown.edu" on the client.

Jupyter Notebooks on Oscar

Installing Jupyter Notebook

Running Jupyter Notebook on Oscar

  • using a batch job

  • in an interactive session

With the batch job or interactive session method, you use a browser on your machine to connect to your Jupyter Notebook server on Oscar.

Start by going to the directory you want to access when using Jupyter Notebook, and then start Jupyter Notebook. The directory where a Jupyter Notebook is started is the working directory for the Notebook.

Do not run Jupyter Notebook on login nodes.

In an OOD Desktop App VNC Session

This will start the Jupyter Notebook server and open up a browser with the notebook.

If you installed Jupyter Notebook with pip, you may need to give the full path:

~/.local/bin/jupyter-notebook

Using a Batch Job

  1. Submit a batch script that starts the Jupyter Notebook server.

  2. Set up an ssh tunnel to the server.

  3. Open a browser to view the notebook.

  4. Use scancel to end the batch job when you are done.

1. Submit batch script

Here is an example batch script to start a Jupyter notebook server on an Oscar compute node. This script assumes that you are not using a Conda or a virtual environment.

If you installed Jupyter notebook with pip you may need to give the full path:

~/.local/bin/jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip

If you are using a Conda environment, replace the last two lines with these lines:

This script can be found in ~/batch_scripts. Copy this example and submit this script with

sbatch jupyter.sh

Once your batch job is running there will be a file named jupyter-log-{jobid}.txt containing the information you need to connect to your Jupyter notebook server on Oscar. To check if your job is running, use myq.

The output from myq will look something like this:

2. Set up an ssh tunnel to the notebook server

In this example the jobID is 7239096. To view the notebook server information, use cat. For this example:

cat jupyter-log-7239096.txt

Open a terminal on your machine and copy and paste the ssh -N -L ........ line into the terminal.

Enter your Oscar password. Note it will appear that nothing has happened.

3. Open a browser to view the notebook

Open a browser on your local machine to the address given in cat jupyter-log-{jobid}.txt.

The notebook will ask for a token. Copy the token from jupyter-log-{jobid}.txt. Then your notebook will start.

Remember to scancel {jobid} when you are done with your notebook session.

In an Interactive Session

  1. Start Jupyter Notebook in an interactive job.

  2. Set up an ssh tunnel to the server.

  3. Open a browser to view the notebook.

  4. Press Ctrl+C twice in your interactive session to stop the server when you are done.

1. Start a Jupyter Notebook in an interactive job

An output similar to the one below indicates that Jupyter Notebook has started:

$ jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip

[I 13:35:25.948 NotebookApp] JupyterLab beta preview extension loaded from /gpfs/runtime/opt/anaconda/3-5.2.0/lib/python3.6/site-packages/jupyterlab

[I 13:35:25.948 NotebookApp] JupyterLab application directory is /gpfs/runtime/opt/anaconda/3-5.2.0/share/jupyter/lab

[I 13:35:25.975 NotebookApp] Serving notebooks from local directory: /gpfs_home/yliu385

[I 13:35:25.975 NotebookApp] 0 active kernels

[I 13:35:25.975 NotebookApp] The Jupyter Notebook is running at:

[I 13:35:25.975 NotebookApp] http://172.20.207.61:8855/?token=c58d7877cfcf1547dd8e6153123568f58dc6d5ce3f4c9d98

[I 13:35:25.975 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

[C 13:35:25.994 NotebookApp]

Copy/paste this URL into your browser when you connect for the first time,

to login with a token:

http://172.20.207.61:8855/?token=c58d7877cfcf1547dd8e6153123568f58dc6d5ce3f4c9d98&token=c58d7877cfcf1547dd8e6153123568f58dc6d5ce3f4c9d98

2. Setup an ssh tunnel to the server

Open a terminal on your machine and enter the following line (replace $ipnip and $ipnport with the values from the two echo commands in the previous step).

Enter your Oscar password. Note it will appear that nothing has happened.

3. Open a browser to view the notebook

Open a browser on your local machine to the address:

Again, you need to replace $ipnport with the value from the first echo command in Step 1. The notebook will ask for a token. You can copy the token from the output in Step 1.

4. Press Ctrl+C twice to kill your Jupyter Notebook server

Once you finish and no longer need the Jupyter Notebook server, you can kill the server by pressing Ctrl+C twice in your interactive session.

Jupyter Labs on Oscar

Installing Jupyter Lab

Running Jupyter Lab on Oscar

  • using a batch job

  • in an interactive session

With the batch job or interactive session method, you use a browser on your machine to connect to your Jupyter Lab server on Oscar.

Do not run Jupyter Lab on login nodes.

In an OOD Desktop App VNC Session

This will start the Jupyter lab server and open up a browser with the lab.

If you installed Jupyter Lab with pip, you may need to give the full path:

~/.local/bin/jupyter-lab

Using a Batch Job

  1. Submit a batch script that starts the Jupyter Lab server.

  2. Set up an ssh tunnel to the server.

  3. Open a browser to view the lab.

  4. Use scancel to end the batch job when you are done.

1. Submit batch script

Here is an example batch script to start a Jupyter Lab server on an Oscar compute node

If you installed Jupyter Lab with pip, you may need to give the full path:

~/.local/bin/jupyter-lab --no-browser --port=$ipnport --ip=$ipnip

This script can be found in ~/batch_scripts. Copy this example and submit this script with

sbatch jupyter.sh

Once your batch job is running there will be a file named jupyter-log-{jobid}.txt containing the information you need to connect to your Jupyter lab server on Oscar. To check if your job is running, use myq.

The output from myq will look something like this:

2. Setup an ssh tunnel to the notebook server

In this example the jobID is 7239096. To view the lab server information, use cat. For this example:

cat jupyter-log-7239096.txt

Open a terminal on your machine and copy and paste the ssh -N -L ........ line into the terminal.

Enter your Oscar password. Note it will appear that nothing has happened.

3. Open a browser to view the lab

Open a browser on your local machine to the address given in cat jupyter-log-{jobid}.txt.

The lab will ask for a token. Copy the token from jupyter-log-{jobid}.txt. Then your lab will start.

Remember to scancel {jobid} when you are done with your lab session.

In an Interactive Session

  1. Start Jupyter Lab in an interactive job

  2. Setup an ssh tunnel to the server.

  3. Open a browser to view the notebook.

  4. Press Ctrl+C twice in your interactive session to stop the server when you are done.

1. Start Jupyter Lab in an interactive job

An output similar to the one below indicates that Jupyter Lab has started:

$ jupyter-lab --no-browser --port=$ipnport --ip=$ipnip

[I 13:12:03.404 LabApp] JupyterLab beta preview extension loaded from /gpfs/runtime/opt/anaconda/3-5.2.0/lib/python3.6/site-packages/jupyterlab

[I 13:12:03.404 LabApp] JupyterLab application directory is /gpfs/runtime/opt/anaconda/3-5.2.0/share/jupyter/lab

[I 13:12:03.410 LabApp] Serving notebooks from local directory: /gpfs_home/yliu385

[I 13:12:03.410 LabApp] 0 active kernels

[I 13:12:03.410 LabApp] The Jupyter Notebook is running at:

[I 13:12:03.410 LabApp] http://172.20.209.7:9414/?token=dd9936098d03b8195fc626f017c97ca56a054887d134cb1e

[I 13:12:03.410 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

[C 13:12:03.411 LabApp]

2. Setup an ssh tunnel to the server

Open a terminal on your machine and enter the following line (replace $ipnip and $ipnport with the values from the two echo commands in the previous step).

Enter your Oscar password. Note it will appear that nothing has happened.

3. Open a browser to view the notebook

Open a browser on your local machine to the address:

Again, you need to replace $ipnport with the value from the first echo command in Step 1. The notebook will ask for a token. You can copy the token from the output in Step 1.

4. Press Ctrl+C twice to kill your Jupyter Lab server

Once you finish and no longer need the Jupyter Lab server, you can kill the server by pressing Ctrl+C twice in your interactive session.

Setting Job Submission Settings

We have provided templates for you to use for job submission settings. These templates are in /gpfs/runtime/opt/forge/19.1.2/templates

Click Run and debug a program to open the following menu

Click Configure next to Submit to Queue and enter /gpfs/runtime/opt/forge/19.1.2/templates/slurm-ccv.qtf as the Submission template file

slurm-ccv.qtf lets you specify the total number of tasks. The number of tasks may not be equal on each node. This option usually gives the shortest wait time in the queue, but may not give you consistent run times.

slurm-ccv-mpi.qtf is for MPI jobs where you want to specify number of nodes and tasks per node

slurm-ccv-threaded.qtf is for threaded (single node) jobs

Tunneling into Jupyter with Windows

This page is for users trying to open Jupyter Notebooks/Labs through Oscar with Windows.

Software that makes it easy

If you are using Windows, you can use any of the following options to open a terminal on your machine (ranked in order of least difficult to set up and use): Windows Terminal, MobaXterm, or WSL2 (we recommend Ubuntu as your Linux distribution).

After opening a terminal using any of these programs, simply enter the ssh command provided by the jupyter-log-{jobid}.txt file. Then continue with the steps given by the documentation that led you to this page.

If you have PuTTY and would prefer to not download any additional software, there are steps (explained below) that you can take to use PuTTY to tunnel into a Jupyter Notebook/Lab.

Using PuTTY

These instructions will use ssh -N -L 9283:172.20.209.14:9283 username@ssh.ccv.brown.edu as an example command that could be found in the jupyter-log-{jobid}.txt file.

Open PuTTY and enter your host name (username@ssh.ccv.brown.edu) in the textbox.

Next, navigate to the 'Tunnels' Menu (click the '+' next to SSH in order to have it displayed).

Enter the source port (9283 in the example) and destination (172.20.209.14:9283 in the example). Click 'Add'. The source port and destination should show up as a pair in the box above. Then click 'Open'. A new window should open requesting your password.

After entering your password, you should be able to access the notebook/lab in a browser using localhost:ipnport (see the documentation that led you here for details).

Conda and Mamba

Both the miniconda3 and miniforge modules include only conda, python, and a few other packages. Only the miniforge module provides mamba.

Conda Initialization

It is not recommended to initialize conda via conda init.

Access Conda via Modules

To access the conda or mamba command, load either a miniconda3 or miniforge module and then run the source command

A conda environment can be either:

  • shared among all users, if the environment is installed in a shared directory

  • private to one user if the environment is installed in a user's private directory

The command 'conda info' shows important configurations for conda environment.

Below are some important configurations:

  • envs directories: a list of directories where conda environments are installed by default. In the output of 'conda info' above, environments created by a regular user go to $HOME/.conda/envs by default.

  • package cache: a list of directories where downloaded packages are stored.

Create a New Conda Environment

To create a new conda environment in a default directory, run the following command:

To create a new conda environment in a different directory, run the following command:

Activate a Conda Environment

After creating a conda environment, users can activate a conda environment to install or access packages in the environment via the following command.

The commands above will only work if:

  • A conda environment with the specified name (conda_environment_name in the example) exists

If you need to activate a conda environment in a bash script, you need to source the conda.sh as shown in the following example bash script:

module load miniconda3/23.11.0s

source /oscar/runtime/software/external/miniconda3/23.11.0/etc/profile.d/conda.sh

conda activate my_env

module load miniforge/23.11.0-0s

source /oscar/runtime/software/external/miniforge/23.11.0-0/etc/profile.d/conda.sh

conda activate my_env

After installing packages in an active environment (instructions below), you do not need to load or install those packages in the bash script; any packages installed in the conda environment (before the script even starts) will be available once the environment is activated by the conda activate line in the script above.

Do NOT activate a conda environment before submitting a batch job if the batch job activates a conda environment. Otherwise, the batch job will not be able to activate the conda environment and hence fail.

To deactivate a conda environment, simply use the following command:

Install Packages in an Active Conda Environment

To install a package, we need to first activate a conda environment, and then run

conda install package_name=version

mamba install package_name=version

The "=version" part is optional. By default, conda installs a package from the anaconda channel. To install a package from a different channel, run conda install with the -c option. For example, to install a package from the bioconda channel, run

conda install -c bioconda package_name

mamba install -c bioconda package_name

Delete a Conda Environment

To delete a conda environment, run

Remove Caches

Conda may download many additional packages when installing a package, and a user may use up all of their quota due to these downloaded packages. To remove the downloaded packages, run

CUDA is an extension of the C language, as well as a runtime library, to facilitate general-purpose programming of NVIDIA GPUs. If you already program in C, you will probably find the syntax of CUDA programs familiar. If you are more comfortable with C++, you may consider instead using the higher-level Thrust library, which resembles the Standard Template Library and is included with CUDA.

Useful references include the CUDA C Programming Guide, CUDA C Best Practices Guide, CUDA Runtime API, and CUDA Training from NVIDIA; CUDA, Supercomputing for the Masses from The Supercomputing Blog; and the CUDA Tutorial.

To start an interactive session on a GPU node, use the interact command and specify the gpu partition. You also need to specify the requested number of GPUs using the -g option:

For production runs, please submit a batch job to the gpu partition. E.g. for using 1 GPU:

CUBLAS is a drop-in replacement for BLAS libraries that runs BLAS routines on the GPU instead of the CPU.

CULA is a similar library for LAPACK routines.

CUFFT, CUSPARSE, and CURAND provide FFT, sparse matrix, and random number generation routines that run on the GPU.

MAGMA combines custom GPU kernels, CUBLAS, and a CPU BLAS library to use both the GPU and CPU simultaneously; it is available in the 'magma' module on Oscar.

Matlab has a GPUArray feature, available through the Parallel Computing Toolkit, for creating arrays on the GPU and operating on them with many built-in Matlab functions. The PCT toolkit is licensed by CIS and is available to any Matlab session running on Oscar or workstations on the Brown campus network. See GPU Programming in Matlab for more information.

PyCUDA is an interface to CUDA from Python. It also has a GPUArray feature and is available in the cuda module on Oscar.

There are multiple ways to install and run TensorFlow. Our recommended approach is via NGC containers. The containers are available via the NGC Registry. In this example we will pull the TensorFlow NGC container.

CCV uses the LMOD package for managing the software environment on OSCAR. The advantage of the modules approach is that it allows multiple versions of the same software to be installed at the same time. With the modules approach, you can "load" and "unload" modules to dynamically control your environment.

Check out our tutorial on using modules on Oscar!

Users can install any Python package they require by following the instructions given on the Installing Python Packages page.

Intel provides optimized packages for numerical and scientific work that you can install through pip or anaconda.

Going forward, the Python Software Foundation recommends using Python 3 for development.

For details on the types of dependencies you can use in slurm, see the sbatch manual page.


We recommend using a Python environment for your workflow if you prefer pip. If you are a conda user, we recommend managing your workflow with conda environments. You can load an anaconda module and then use conda.

Intel provides optimized packages for numerical and scientific work that you can install through pip or anaconda.

The conda command from the anaconda modules does NOT work. Use the conda command from the miniconda3 module.

This model is useful when all threads/processes have access to a common memory space. The most basic form of shared memory parallelism is multithreading. According to Wikipedia, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler (Operating System).

Comprehensive tutorial page on POSIX Threads Programming; see also Compiling programs with Pthreads.

Compiler directives appear as comments in your source code and are ignored by compilers unless you tell them otherwise, usually by specifying the appropriate compiler flag (e.g. -fopenmp for GCC). This makes the code more portable and easier to parallelize: you can parallelize loop iterations and code segments by inserting these directives. OpenMP also makes it simpler to tune the application at run time using environment variables. For example, you can set the number of threads to be used by setting the environment variable OMP_NUM_THREADS before running the program. Help pages: https://computing.llnl.gov/tutorials/openMP and Compiling OpenMP Programs.

Distributed MultiThreaded CheckPointing (DMTCP) checkpoints a running program on Linux with no modifications to the program or the OS. It allows the program to be restarted from a checkpoint.

CMake is a build system generator that one can use to facilitate the software build process. CMake allows one to specify—at a higher level than GNU Make—the instructions for compiling and linking our software. Additionally, CMake comes packaged with CCMake, which is an easy-to-use interactive tool that will let us provide build instructions to the compiler and the linker for projects written in C, Fortran, or C++. For more information about CMake and CCMake, please click here.

If you're not sure how many cores you should include in your calculation, refer to Selecting the right amount of cores for a VASP calculation.

Start by creating and activating a conda environment:

The example script above runs the python script on two nodes by using the #SBATCH --nodes=2 option. For more information on #SBATCH options, see our documentation.

Start by creating and activating a Python environment:

Arm Forge is available on Oscar. There are two products, DDT (debugger) and MAP (profiler).

Download the Arm Forge remote client on your machine.

As is stated in the IDL Documentation, IDL in command-line mode "uses a text-only interface and sends output to your terminal screen or shell window." Thus, this is a mode in which you can enter commands and see their results in real time, but it is not where one should write full IDL programs.

The anaconda modules provide jupyter-notebook. Users can also use pip or anaconda to install jupyter notebook.

There are a couple of ways to use Jupyter Notebook on Oscar. You can run Jupyter Notebook

in an OOD Desktop App (VNC) session

Start an OOD Desktop App (VNC) session, and open up a terminal in the VNC session. To start a Jupyter Notebook, enter

If you are using Windows, follow the Tunneling into Jupyter with Windows documentation to complete this step.

Start an interactive job and then in your interactive session enter the following:

If you are using Windows, follow the Tunneling into Jupyter with Windows documentation to complete this step.

The anaconda modules provide jupyter-lab. Users can also use pip or anaconda to install jupyter lab.

There are a couple of ways to use Jupyter Lab on Oscar. You can run Jupyter Lab

in an OOD Desktop App (VNC) session

Start an OOD Desktop App (VNC) session, and open up a terminal in the VNC session. To start a Jupyter Lab, enter

If you are using Windows, follow the Tunneling into Jupyter with Windows documentation to complete this step.

Start an interactive job and then in your interactive session enter the following:

If you are using Windows, follow the Tunneling into Jupyter with Windows documentation to complete this step.


Mamba is a drop-in replacement for conda, and is faster at resolving dependencies than conda. For commands like conda install and conda search, conda can be replaced with mamba on Oscar. More details can be found in the Mamba User Guide.

The appropriate anaconda module has been loaded (if you are unsure about this one, consult this documentation)

Introduction to OpenACC Online Course
PGI Accelerator Compilers with OpenACC Directives
Getting Started with OpenACC
Running OpenACC Programs on NVIDIA and AMD GPUs
$ module avail anaconda

-------- /oscar/runtime/software/spack/0.20.1/share/spack/lmod/linux-rhel9-x86_64/Core -------
   anaconda/2023.09-0-7nso27y
#include<stdio.h>
#include<unistd.h>

int main(int argc, char* argv[])
{
    int count = 1;
    while (1)
    {
        printf(" %2d\n",count++);
        fflush(stdout);
        sleep(2);
    }
    return 0;
}
gcc dmtcp_serial.c -o dmtcp_serial
$port=$(shuf -i 40000-60000 -n 1)
$dmtcp_launch -p$port -i 8 ./dmtcp_serial  
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
^C
[yliu385@node1317 interact]$ ll
total 2761
-rw------- 1 yliu385 ccvstaff 2786466 May 18 11:18 ckpt_dmtcp_serial_24f183c2194a7dc4-40000-42af86bb59385.dmtcp
lrwxrwxrwx 1 yliu385 ccvstaff      60 May 18 11:18 dmtcp_restart_script.sh -> dmtcp_restart_script_24f183c2194a7dc4-40000-42af82ef922a7.sh
-rwxr--r-- 1 yliu385 ccvstaff   12533 May 18 11:18 dmtcp_restart_script_24f183c2194a7dc4-40000-42af82ef922a7.sh
-rwxr-xr-x 1 yliu385 ccvstaff    8512 May 18 08:36 dmtcp_serial
$port=$(shuf -i 40000-60000 -n 1)
$dmtcp_restart -p $port -i 12 ckpt_dmtcp_serial_24f183c2194a7dc4-40000-42af86bb59385.dmtcp 
  9
 10
 11
 12
 13
 14
 15
^C
[yliu385@node1317 interact]$ dmtcp_restart -p $port -i 12 ckpt_dmtcp_serial_24f183c2194a7dc4-40000-42af86bb59385.dmtcp 
 15
 16
 17
^C
 #!/bin/bash

#SBATCH -n 1
#SBATCH -t 5:00
#SBATCH -J dmtcp_serial

module load dmtcp/3.0.0

checkpoint_file=`ls ckpt_*.dmtcp -t|head -n 1`
checkpoint_interval=8
port=$(shuf -i 40000-60000 -n 1)

if [ -z $checkpoint_file ]; then
    dmtcp_launch -p $port -i $checkpoint_interval ./dmtcp_serial
else
    dmtcp_restart -p $port -i $checkpoint_interval $checkpoint_file
fi
$ head  slurm-5157871.out -n 15
## SLURM PROLOG ###############################################################
##    Job ID : 5157871
##  Job Name : dmtcp_serial
##  Nodelist : node1139
##      CPUs : 1
##   Mem/CPU : 2800 MB
##  Mem/Node : 65536 MB
## Directory : /gpfs/data/ccvstaff/yliu385/Test/dmtcp/serial/batch_job
##   Job Started : Wed May 18 09:38:39 EDT 2022
###############################################################################
ls: cannot access ckpt_*.dmtcp: No such file or directory
  1
  2
  3
  4
$ tail slurm-5157871.out
 147
 148
 149
 150
 151
 152
 153
 154
 155
slurmstepd: error: *** JOB 5157871 ON node1139 CANCELLED AT 2022-05-18T09:43:58 DUE TO TIME LIMIT ***
$ head  slurm-5158218.out -n 15
## SLURM PROLOG ###############################################################
##    Job ID : 5158218
##  Job Name : dmtcp_serial
##  Nodelist : node1327
##      CPUs : 1
##   Mem/CPU : 2800 MB
##  Mem/Node : 65536 MB
## Directory : /gpfs/data/ccvstaff/yliu385/Test/dmtcp/serial/batch_job
##   Job Started : Wed May 18 09:50:39 EDT 2022
###############################################################################
 153
 154
 155
 156
 157
#!/bin/bash

#SBATCH -n 1
#SBATCH --array=1-4
#SBATCH -t 5:00
#SBATCH -J dmtcp_job_array

module load dmtcp/3.0.0

checkpoint_interval=8
port=$((SLURM_JOB_ID %20000 + 40000))
task_dir=jobtask_$SLURM_ARRAY_TASK_ID

if [ ! -d $task_dir ]; then
    mkdir $task_dir
    cd $task_dir
    dmtcp_launch -p $port -i $checkpoint_interval ../dmtcp_serial
else
    cd $task_dir
    checkpoint_file=`ls ckpt_*.dmtcp -t|head -n 1`
    if [ -z $checkpoint_file ]; then
        dmtcp_launch -p $port -i $checkpoint_interval ../dmtcp_serial
    else
        dmtcp_restart -p $port -i $checkpoint_interval $checkpoint_file
    fi
fi
mkdir ~/.screen && chmod 700 ~/.screen
export SCREENDIR=$HOME/.screen
screen -S experiment1-login003
mkdir <new_folder_name>
cd <new_folder_name>
git clone  <URL>
tar -xf archive.tar.gz
mkdir build
cd build
ccmake ../
make
make -j 8
mpirun -n <number-of-tasks> vasp_std
# 2 nodes
#SBATCH -N 2
# 20 tasks per node
#SBATCH --ntasks-per-node=20

# 40 MPI tasks in total (2 nodes x 20 tasks per node)
mpirun -n 40 vasp_std
#!/bin/sh
# Job name
#SBATCH -J g16-test

# One task/node
#SBATCH -n 1

# Eight CPUs per task
#SBATCH -c 8

# batch partition
#SBATCH -p batch

# Run the command
g16 test-file.gjf
sbatch g16-test.sh
from mpi4py import MPI
from mpi4py import MPI
import sys

def print_hello(rank, size, name):
  msg = "Hello World! I am process {0} of {1} on {2}.\n"
  sys.stdout.write(msg.format(rank, size, name))

if __name__ == "__main__":
  size = MPI.COMM_WORLD.Get_size()
  rank = MPI.COMM_WORLD.Get_rank()
  name = MPI.Get_processor_name()

  print_hello(rank, size, name)
$ module load hpcx-mpi/4.1.5rc2-mt
$ pip install mpi4py
$ python -c "import mpi4py"
#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --mem=1G

module load miniconda3/23.11.0s
source /oscar/runtime/software/external/miniconda3/23.11.0/etc/profile.d/conda.sh
conda activate my_env
module load hpcx-mpi/4.1.5rc2-mt

srun --mpi=pmix python mpi4pytest.py
$ python -m pip install mpi4py
$ deactivate
#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --mem=1G


module load hpcx-mpi/4.1.5rc2-mt
source my_env/bin/activate

srun --mpi=pmix python mpi4pytest.py
$ module load idl
module: loading 'idl/8.5.1'
module: idl: License owned by Jonathan Pober. Set up the environment for IDL by running: "shopt -s expand_aliases; source $IDL/envi53/bin/envi_setup.bash".
$ shopt -s expand_aliases; source $IDL/envi53/bin/envi_setup.bash
$ idl
IDL Version 8.5.1 (linux x86_64 m64). (c) 2015, Exelis Visual Information Solutions, Inc., a subsidiary of Harris Corporation.
Installation number: 5501393-2.
Licensed for use by: Brown University

IDL>
PRO IDL_HELLO_WORLD

PRINT, ("Hello World!")

END
#!/bin/bash

module load idl
shopt -s expand_aliases; source $IDL/envi53/bin/envi_setup.bash

idl -e idl_hello_world
$ sbatch idl_hello_world.sh
jupyter-notebook
#!/bin/bash
#SBATCH --nodes 1
#SBATCH -c 6
#SBATCH --time 04:00:00
#SBATCH --mem-per-cpu 3G
#SBATCH --job-name tunnel
#SBATCH --output jupyter-log-%J.txt
## get tunneling info
XDG_RUNTIME_DIR=""
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)
## print tunneling instructions to jupyter-log-{jobid}.txt
echo -e "
    Copy/Paste this in your local terminal to ssh tunnel with remote
    -----------------------------------------------------------------
    ssh -N -L $ipnport:$ipnip:$ipnport $USER@ssh.ccv.brown.edu
    -----------------------------------------------------------------
    Then open a browser on your local machine to the following address
    ------------------------------------------------------------------
    localhost:$ipnport  (prefix w/ https:// if using password)
    ------------------------------------------------------------------
    "
## start an ipcluster instance and launch jupyter server

jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
module purge
module load miniconda3/23.11.0s-odstpk5 
source /oscar/runtime/software/external/miniconda3/23.11.0/etc/profile.d/conda.sh
jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
Jobs for user mhamilton

Running:
ID       NAME    PART.  QOS          CPU  WALLTIME  REMAIN   NODES
7239096  tunnel  batch  pri-mhamilt  6    4:00:00   3:57:33  node1036

Pending:
(none)
 ssh -N -L $ipnport:$ipnip:$ipnport user@ssh.ccv.brown.edu
localhost:9349  (prefix w/ https:// if using password)
unset XDG_RUNTIME_DIR
module load anaconda/3-5.2.0
ipnport=$(shuf -i8000-9999 -n1)
echo $ipnport
ipnip=$(hostname -i)
echo $ipnip
jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
 ssh -N -L $ipnport:$ipnip:$ipnport user@ssh.ccv.brown.edu
localhost:$ipnport  (prefix w/ https:// if using password)
jupyter-lab
#!/bin/bash
#SBATCH --nodes 1
#SBATCH -c 6
#SBATCH --time 04:00:00
#SBATCH --mem-per-cpu 3G
#SBATCH --job-name tunnel
#SBATCH --output jupyter-log-%J.txt
## get tunneling info
XDG_RUNTIME_DIR=""
ipnport=$(shuf -i8000-9999 -n1)
ipnip=$(hostname -i)
## print tunneling instructions to jupyter-log-{jobid}.txt
echo -e "
    Copy/Paste this in your local terminal to ssh tunnel with remote
    -----------------------------------------------------------------
    ssh -N -L $ipnport:$ipnip:$ipnport $USER@ssh.ccv.brown.edu
    -----------------------------------------------------------------
    Then open a browser on your local machine to the following address
    ------------------------------------------------------------------
    localhost:$ipnport  (prefix w/ https:// if using password)
    ------------------------------------------------------------------
    "
## start an ipcluster instance and launch jupyter server
module load anaconda/3-5.2.0
jupyter-lab --no-browser --port=$ipnport --ip=$ipnip
Jobs for user mhamilton

Running:
ID       NAME    PART.  QOS          CPU  WALLTIME  REMAIN   NODES
7239096  tunnel  batch  pri-mhamilt  6    4:00:00   3:57:33  node1036

Pending:
(none)
 ssh -N -L $ipnport:$ipnip:$ipnport user@ssh.ccv.brown.edu
localhost:9349  (prefix w/ https:// if using password)
unset XDG_RUNTIME_DIR
module load anaconda/3-5.2.0
ipnport=$(shuf -i8000-9999 -n1)
echo $ipnport
ipnip=$(hostname -i)
echo $ipnip
jupyter-lab --no-browser --port=$ipnport --ip=$ipnip
 ssh -N -L $ipnport:$ipnip:$ipnport user@ssh.ccv.brown.edu
localhost:$ipnport  (prefix w/ http:// if using password)
module load miniconda3/23.11.0s
source /oscar/runtime/software/external/miniconda3/23.11.0/etc/profile.d/conda.sh
module load miniforge/23.11.0-0s
source /oscar/runtime/software/external/miniforge/23.11.0-0/etc/profile.d/conda.sh
$ conda info 

     active environment : None
            shell level : 0
       user config file : /users/yliu385/.condarc
 populated config files : /users/yliu385/.condarc
          conda version : 23.1.0
    conda-build version : not installed
         python version : 3.10.9.final.0
       virtual packages : __archspec=1=x86_64
                          __glibc=2.34=0
                          __linux=5.14.0=0
                          __unix=0=0
       base environment : /oscar/runtime/software/external/miniconda3/23.11.0  (writable)
      conda av data dir : /oscar/runtime/software/external/miniconda3/23.11.0/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /oscar/runtime/software/external/miniconda3/23.11.0/pkgs
                          /users/yliu385/.conda/pkgs
       envs directories : /oscar/runtime/software/external/miniconda3/23.11.0/envs
                          /users/yliu385/.conda/envs
               platform : linux-64
             user-agent : conda/23.1.0 requests/2.28.1 CPython/3.10.9 Linux/5.14.0-284.11.1.el9_2.x86_64 rhel/9.2 glibc/2.34
                UID:GID : 140348764:2128288
             netrc file : None
           offline mode : False
conda create -n conda_environment_name
conda create -p  /path/to/install/conda_environment_name
conda activate conda_environment_name
conda deactivate
conda env remove -n conda_environment_name
conda clean --all

Getting Started

OSCAR

Oscar is the shared compute cluster operated by CCV.

Oscar has two login nodes and several hundred compute nodes. When users log in through Secure Shell (SSH), they are first put on one of the login nodes which are shared among several users at a time. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes from your own computer. Running computationally intensive or memory intensive programs on the login node slows down the system for all users. Any processes taking up too much CPU or memory on a login node will be killed. Please do not run Matlab on the login nodes.

What username and password should I be using?

  • If you are at Brown and have requested a regular CCV account, your Oscar login will be authenticated using your Brown credentials, i.e. the same username and password that you use to log into any Brown service such as "canvas". We have seen login problems with the Brown credentials for some users, so accounts moved to the RedHat7 system after September 1st, 2018 can also log into RedHat7 with their CCV password.

Connecting to Oscar for the first time

To log in to Oscar you need Secure Shell (SSH) on your computer. Mac and Linux machines normally have SSH available. To log in to Oscar, open a terminal and type

ssh <username>@ssh.ccv.brown.edu

The first time you connect to Oscar you will see a message like:

The authenticity of host 'ssh.ccv.brown.edu (138.16.172.8)' can't be established.
RSA key fingerprint is SHA256:Nt***************vL3cH7A.
Are you sure you want to continue connecting (yes/no)? 

You can type yes . You will be prompted for your password. Note that nothing will show up on the screen when you type in your password; just type it in and press enter. You will now be in your home directory on Oscar. In your terminal you will see a prompt like this:

[mhamilton@login004 ~]$ 

Congratulations, you are now on one of the Oscar login nodes.

Note: Please do not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes.

File system

Users on Oscar have three places to store files:

  • home

  • scratch

  • data

Note that class accounts may not have a data directory. Users who are members of more than one research group may have access to multiple data directories.

From the home directory, you can use the command ls to see your scratch directory and your data directory (if you have one) and use cd to navigate into them if needed.
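For example (you may not have a data directory, as noted above):

cd ~            # go to your home directory
ls              # scratch and data (if you have one) are listed here
cd ~/scratch    # move into your scratch directory
cd ~/data       # move into your data directory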

To see how much space you have used in your directories, use the command checkquota. Below is an example output:

$ checkquota
Name       Path                 Used(G)    (%) Used   SLIMIT(G)  H-LIMIT(G) Used_Inodes     SLIMIT     HLIMIT     Usage_State  Grace_Period  
ccvdemo1   /oscar/home          3.72       2          100        140        63539           2000000    3000000    OK           None          
ccvdemo1   /oscar/scratch       0.00       0          512        10240      1               4000000    16000000   OK           None          
Now fetching Data directory quotas...
Name        Used(T)   (%) Used   SLIMIT(T)   HLIMIT(T)   Used_Inodes   SLIMIT    HLIMIT    Usage_State   Grace_Period  
data+nopi   0.0       0          0.88        0.98        466           4194304   6291456   OK            None 

Files not accessed for 30 days may be deleted from your scratch directory. This is because scratch is high performance space. The fuller scratch is, the worse the read/write performance. Use ~/data for files you need to keep long term.

Software modules

[mhamilton@login001 ~]$ module avail workshop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ name: workshop*/* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
workshop/1.0  workshop/2.0  
[mhamilton@login001 ~]$ module load workshop/2.0
module: loading 'workshop/2.0'
[mhamilton@login001 ~]$ 

Using a Desktop on Oscar

Using VNC, you can run graphical user interface (GUI) applications like Matlab, Mathematica, etc. while having access to Oscar's compute power and file system.

Running Jobs

You are on Oscar's login nodes when you log in through SSH. You should not (and would not want to) run your programs on these nodes as these are shared by all active users to perform tasks like managing files and compiling programs.

With so many active users, a shared cluster has to use a "job scheduler" to assign compute resources to users for running programs. When you submit a job (a set of commands) to the scheduler along with the resources you need, it puts your job in a queue. The job is run when the required resources (cores, memory, etc.) become available. Note that since Oscar is a shared resource, you must be prepared to wait for your job to start running, and it can't be expected to start running straight away.

Oscar uses the SLURM job scheduler. Batch jobs are the preferred mode of running programs, where all commands are mentioned in a "batch script" along with the required resources (number of cores, wall-time, etc.). However, there is also a way to run programs interactively.
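As a minimal sketch (the resource requests and file name here are only examples), a batch script is a shell script with #SBATCH directives followed by the commands you want to run:

#!/bin/bash
#SBATCH -n 1             # one task
#SBATCH -t 00:05:00      # five minutes of walltime
#SBATCH -J hello         # job name

echo "Hello from $(hostname)"

Submit it with sbatch hello.sh; by default the scheduler writes the job's output to a slurm-<jobid>.out file in the directory you submitted from.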

Where to get help

Using Python or Conda environments in the Jupyter App

Python Environments:

One Time Setup:

  1. Open a terminal on Oscar.

  2. Run pip install notebook to install Jupyter notebook, if not already installed.

  3. Run pip install ipykernel to install ipykernel in this environment.

  4. Run python -m ipykernel install --user --name=<myenv> where <myenv> is the name of the environment.
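Put together, the one-time setup might look like the following sketch, where the python module version and the environment name/location are placeholders:

module load python/<version>
source ~/<myenv>/bin/activate
pip install notebook ipykernel
python -m ipykernel install --user --name=<myenv>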

Launching Jupyter Notebook

  1. Open the "Basic Jupyter Notebook for Python Environments" app on the Open OnDemand interface

  2. Under "Python Module on Oscar", choose the python module you loaded when the environment was created.

  3. Under "Python Virtual Environment", add the name of the Virtual Environment you created. Note: If your virtual environment is not at the top level of your home directory, you should input the absolute path to the environment directory.

  4. Under the "Modules" , enter the name of the python module used to create the environment. Add any additional modules you may need separated with a space.

  5. Choose the other options as required.

  6. Click "Launch" to start the job

  7. Click "Connect to Jupyter" on the next screen.

  8. To start a new notebook, click "New" -> <myenv> where <myenv> is the environment.

  9. For starting a pre-existing notebook, open the notebook. In the Jupyter interface, click "Kernel" -> "Change Kernel" -> <myenv> where myenv is the name of the environment.

Conda Environments

One Time Setup:

  1. Open a terminal on Oscar.

  2. Activate the conda environment (the full command sequence is sketched after this list).

  3. Run pip install notebook to install Jupyter notebook, if not already installed.

  4. Run pip install ipykernel to install ipykernel in this environment.

  5. Run python -m ipykernel install --user --name=<myenv> where <myenv> is the name of the environment.
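
A rough equivalent for a conda environment is sketched below, assuming an anaconda module is loaded; my_conda_env is a placeholder environment name:

    $ module load anaconda/2020.02                 # module name as used by the app below
    $ conda activate my_conda_env                  # or: source activate my_conda_env, depending on your setup
    (my_conda_env) $ pip install notebook ipykernel
    (my_conda_env) $ python -m ipykernel install --user --name=my_conda_env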

Launching Jupyter Notebook

  1. Open the "Basic Jupyter Notebook with Anaconda" app on the Open OnDemand interface

  2. Under "Oscar Anaconda module", choose "anaconda/2020.02"

  3. Enter the name of the conda environment in "Conda Env"

  4. Choose the other options as required.

  5. Click "Launch" to start the job

  6. Click "Connect to Jupyter" on the next screen.

  7. To start a new notebook, click "New" -> <myenv> where <myenv> is the environment.

  8. For starting a pre-existing notebook, open the notebook. In the Jupyter interface, click "Kernel" -> "Change Kernel" -> <myenv> where myenv is the name of the environment.

SSH (Terminal)

To log in to Oscar you need Secure Shell (SSH) on your computer.

You need to log in using your Brown password. Old Oscar passwords can no longer be used for SSH.

There are two options for signing into Oscar: with or without VPN.

Summary of SSH Hosts

  • ssh.ccv.brown.edu You can connect from anywhere. You will need Two Factor Authentication. (A sample ~/.ssh/config entry for this host is sketched below.)
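
If you connect often, you can optionally add an entry to ~/.ssh/config on your own machine so that a short alias works; oscar and myusername below are placeholders:

    Host oscar
        # Replace myusername with your Oscar username
        HostName ssh.ccv.brown.edu
        User myusername
        # Optional: enable X11 forwarding (equivalent to ssh -X)
        ForwardX11 yes

You can then connect with just ssh oscar.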

macOS and Linux

To log in to Oscar, open a terminal and

  • If you are not connected to the Brown VPN, use the following command:

    ssh -X username@ssh.ccv.brown.edu

  • If you are connected to the Brown VPN, use the following command:

    ssh -X username@sshcampus.ccv.brown.edu

The -X option allows Oscar to display windows on your machine. This allows you to open and use GUI-based applications, such as the text editor gedit.
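
For example, once connected with -X, a GUI program started on the login node opens a window on your local machine (notes.txt is a placeholder file name):

    [username@login004 ~]$ gedit notes.txt &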

Windows

  • If you are not connected to the Brown VPN, use username@ssh.ccv.brown.edu as the Host Name and click Open.

  • If you are connected to the Brown VPN, use username@sshvpn.ccv.brown.edu as the Host Name and click Open.

Connecting to Oscar for the First Time

The first time you connect to Oscar you will see a message about the authenticity of the host:

    The authenticity of host 'ssh.ccv.brown.edu (138.16.172.8)' can't be established.
    RSA key fingerprint is SHA256:Nt***************vL3cH7A.
    Are you sure you want to continue connecting (yes/no)?

You can type yes and press return. On subsequent logins you should not see this message.

You will then be prompted for your password.

Nothing will show up on the screen as you type in your password. Just type it in and press enter.

You will now be in your home directory on Oscar. In your terminal you will see a prompt like this:

    [username@login004 ~]$

Congratulations, you are now on one of the Oscar login nodes! The login nodes are for administrative tasks such as editing files and compiling code. To use Oscar for computation you will need to use the compute nodes. To get to the compute nodes from the login nodes you can either start an interactive session on a compute node, or submit a batch job.

Please do not run CPU-intense or long-running programs directly on the login nodes! The login nodes are shared by many users, and you will interrupt other users' work.
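
As a quick illustration, a minimal sketch of requesting an interactive session with standard SLURM commands follows; the resource values are placeholders, and the Running Jobs section describes Oscar-specific options:

    $ srun --pty -t 01:00:00 -n 1 --mem=4G bash
    # once the allocation starts, your prompt will be on a compute node; type exit to end the session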

This guide assumes you have an Oscar account. To request an account, see create an account.

If you're confused about any acronyms or terms throughout the guide, check out our Quick Reference page to see definitions of commonly used terms.

Oscar runs the Red Hat Enterprise Linux operating system. General Linux documentation is available from The Linux Documentation Project. We recommend you read up on basic Linux commands before using Oscar. Some of the most common commands you'll be using on Oscar can also be found on our Quick Reference page.

If you'd like a brief introduction to Linux commands, watch our tutorial on Linux basics on Oscar.

If you are an external user, you will have to get a sponsored ID at Brown through the department with which you are associated before requesting an account on Oscar. Once you have the sponsored ID at Brown, you can request an account on Oscar and use your Brown username and password to log in.

Windows users need to install an SSH client. We recommend PuTTY, a free SSH client for Windows. Once you've installed PuTTY, open the client and use <username>@ssh.ccv.brown.edu for the Host Name and click Open. The configuration should look similar to the screenshot below.

A good practice is to configure your application to read any initial input data from ~/data and write all output into ~/scratch. Then, when the application has finished, move or copy the data you would like to save from ~/scratch to ~/data. For more information on which directories are backed up and best practices for reading/writing files, see Oscar's Filesystem and Best Practices. You can go over your quota up to the hard limit for a grace period. This grace period gives you time to manage your files. When the grace period expires, you will be unable to write any files until you are back under quota.

You can also transfer files to and from the Oscar filesystem from your own computer. See Transferring Files to and from Oscar.

CCV uses the PyModules package for managing the software environment on Oscar. To see the software available on Oscar, use the command module avail. You can load any one of these software modules using module load <module>. The command module list shows what modules you have loaded. The Software modules section above shows an example of checking which versions of the module 'workshop' are available and loading a given version.

For a list of all PyModules commands, see Software Modules. If you have a request for software to be installed on Oscar, email support@ccv.brown.edu.

You can connect remotely to a graphical desktop environment on Oscar using CCV's OpenOnDemand. The OOD Desktop integrates with the scheduling system on Oscar to create dedicated, persistent VNC sessions that are tied to a single user.

For information on how to submit jobs on Oscar, see Running Jobs.

There is also extensive documentation on the web on using SLURM (see the quick start guide).

  • Online resources: SLURM, Linux Documentation, Basic Linux Commands, stackoverflow

  • CCV's page detailing common problems you might face on Oscar

  • Email support@ccv.brown.edu

We recommend that all users install Python packages within an environment. This can be a Conda environment or a Python virtual environment. More information can be found here. Follow these steps to use such environments in the Jupyter app.

Load the relevant python module and create and/or activate the environment. See this page for more information about creating virtual environments.

If you are connected to the Brown VPN, you have the option of using an SSH key pair to connect to Oscar without having to enter your password.

  • sshcampus.ccv.brown.edu You can connect when within the Brown WiFi, campus network, or VPN. You will need to set up passwordless authentication.

  • poodcit4.services.brown.edu This is the host to be used when connecting from a remote IDE, e.g., Visual Studio Code.

  • transfer.ccv.brown.edu This host is used to transfer files to/from Oscar using the SFTP protocol.

Watch our videos on SSHing on Linux and SSHing on Mac.

Windows users need to install an SSH client. We recommend PuTTY, a free SSH client for Windows.

Confused? Watch our tutorial on PuTTY installation or SSHing to Oscar on Windows.


Setup virtual environment and debugger

  1. If you have an existing virtual environment, proceed to step 2. Otherwise, to create a new virtual environment:

$ python3 -m venv my_env
$ source my_env/bin/activate
#Install packages manually or from requirements.txt file
$ pip install -r requirements.txt

2. Search for Python.VenvPath as shown in the picture below:

3. VSCode expects you to have a separate virtual environment for each of your Python projects, and it expects them all to live in the same parent directory. Point Python.VenvPath at that parent directory so VSCode can scan it and find all of your virtual environments; you can then easily toggle between them in the interface.

4. Once you have the virtual environment selected, the debugging capabilities should work.

Inspecting Disk Usage (Ncdu)

To determine the sizes of files and discover the largest files in a directory, one can use the Ncdu module.

To get started with NCDU, load the module using the following command:

module load ncdu/1.14

Once the module has been loaded, it can be used to easily show the size of all files within a directory:

ncdu my_directory

To view options you can use with the ncdu command, simply use the command ncdu --help

The ncdu my_directory command above uses Ncdu to rank all of the files within the my_directory directory. Your window should change to show a loading screen (if the directory doesn't have much in it, you may not even see this screen):

Once Ncdu has finished loading, you will see a result like this:

The files will be ordered with the largest file at the top and the smallest file at the bottom. The bottom left corner shows the Total disk usage (which in this case is 25.5 KiB). To quit out of this display, simply press q on your keyboard.

If there is a subdirectory within the directory you're inspecting, the files and directories within that subdirectory can be viewed by selecting the directory with the gray bar (using up and down arrow keys as needed) and then using the right arrow key.
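
Recent Ncdu versions (including 1.14) can also export a scan to a file so you can browse it again later without rescanning; scan_file is a placeholder name:

    ncdu -o scan_file my_directory    # scan my_directory and write the results to scan_file
    ncdu -f scan_file                 # browse the saved scan later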

SMB (Local Mount)

CCV users can access their home, data, and scratch directories as a local mount on their own Windows, Mac, or Linux system using the Common Internet File System (CIFS) protocol (also called Samba). This allows you to use applications on your machine to open files stored on Oscar. It is also a convenient way to move files between Oscar and your own machine, as you can drag and drop files.

Before connecting, ensure that the date and time are set correctly on your machine. Once that is done, you are ready to mount your CCV directories locally; instructions for each operating system are given below.

Since the Jun'23 maintenance, you do not need to put your username in the Server address. Please update your server address if you see issues connecting to Oscar.

macOS

  1. In the Finder, press "Command + K" or select "Connect to Server..."

    from the "Go" menu.

  2. For "Server Address", enter smb://smb.ccv.brown.edu/<volume>/

    and click "Connect".

    • To access your Home directory, enter smb://smb.ccv.brown.edu/home/

    • To access your Scratch space, enter smb://smb.ccv.brown.edu/scratch/

    • To access your Data directory, enter smb://smb.ccv.brown.edu/data/<pi_group>/

      • To check your PI group, run the groups command.

  3. Enter your AD username and password. If you have trouble connecting, enter <username>@ad.brown.edu as your Username

  4. You may choose to add your login credentials to your keychain so you will not need to enter this again.

Optional. If you would like to automatically connect to the share at startup:

  1. Open "System Preferences" (leave the Finder window open).

  2. Go to "Accounts" > "(your account name)".

  3. Select "Login Items".

  4. Drag your data share from the "Finder" window to the "Login Items" window.

Linux

  1. Install the cifs-utils package:

    CentOS/RHEL:   $ sudo yum install cifs-utils
    Ubuntu:        $ sudo apt-get install cifs-utils
  2. Make a directory to mount the share into:

    $ sudo mkdir -p /mnt/rhome /mnt/rscratch /mnt/rdata
  3. Create a credentials file and add your AD account information:

    $ sudo gedit /etc/cifspw
    
    username=user
    password=password
  4. Allow only root access to the credentials files:

    $ sudo chmod 0600 /etc/cifspw
  5. Add an entry to the fstab:

    $ sudo gedit /etc/fstab
  6. The fstab entry should be as follows:

    # Home
    //smb.ccv.brown.edu/home/ /mnt/rhome cifs credentials=/etc/cifspw,nounix,uid=<localuser>,domain=ad.brown.edu 0 0
    
    # Scratch 
    //smb.ccv.brown.edu/scratch/ /mnt/rscratch cifs credentials=/etc/cifspw,nounix,uid=<localuser>,domain=ad.brown.edu 0 0
    
    # Data
    //smb.ccv.brown.edu/data/<pi_group>/ /mnt/rdata cifs credentials=/etc/cifspw,nounix,uid=<localUser>,domain=ad.brown.edu 0 0
  7. Replace <localUser> with the login used on your Linux workstation, and replace <user> and <pi_group> with your Oscar username and PI group, respectively.

  8. Mount the share (a quick check of the mounts is sketched after this list):

    $ sudo mount -a
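
To confirm that the shares mounted, you can check them with df, using the mount points created in step 2:

    $ df -h /mnt/rhome /mnt/rscratch /mnt/rdata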

Windows

  1. Right-click "Computer" and select "Map Network Drive"

  2. Select an unassigned drive letter

  3. To mount specific volumes:

  • For Home directory, enter \\smb.ccv.brown.edu\home\

  • For Scratch space, enter \\smb.ccv.brown.edu\scratch\

  • For Data directory, enter \\smb.ccv.brown.edu\data\<pi_group>\

    • To check your <pi_group>, run the groups command.

  1. Check "Connect using different credentials"

  2. Click "Finish"

  3. Enter your AD user name. If your computer is not in Active Directory (AD), you should enter your username in the format ad\username

  4. Enter your AD password and click "OK"

You can now access your home directory through Windows Explorer with the assigned drive letter. Your data and scratch directories are available as the subdirectories (~/data and ~/scratch) of your home directory.

Desktop App (VNC)

The Desktop app on Open OnDemand is a replacement for the older VNC Java client. This app allows you to launch a Desktop GUI on Oscar.

Do not load any anaconda module in your .modules or .bashrc file. These modules prevent Desktop sessions from starting correctly. You may load them inside the Desktop session.

Launching Desktop App (VNC)

0. Launch Open OnDemand

1. Select the Desktop option in Interactive Apps dropdown list:

2. Choose the resource option:

3. Wait and Launch!

You may change the Image Quality if your internet connection is bad. Image quality can be changed in the middle of the session.

Reconnecting to session

A session may get disconnected if it is not active for a while:

If the session disconnects as shown above, please don't click the "Connect" button on the screen. You may go to Open OnDemand page and click “My Interactive Sessions” to find the session again:

Please don’t launch a new session if you have an existing session. You cannot launch two desktop sessions at the same time.

Sometimes, the “My interactive Sessions” button is shortened to look like:

Copying and pasting text

If you are using Google Chrome, switch on the "Clipboard" permission and you can directly copy and paste text into the OOD Desktop from any other program.

  1. Click the Lock icon to the left of the URL

  2. Switch on the "Clipboard" permission

Click the side panel button on the extreme left hand side of the screen.

  • To copy text into the Desktop session, paste the data into the Clipboard. It will be available to paste inside the Desktop session.

  • To copy text from the Desktop session, open the Clipboard. The copied text will be displayed inside it. You can select and copy the text inside the Clipboard and paste it to an external program.

Desktop (Advanced)

If you need more or different resources than those available from the default Desktop session, you should use the Advanced Desktop app. Resources requested here count against the resources allowed for your Oscar account.

1. Select the Desktop (Advanced) app under Interactive Apps.

2. Choose required resources

Fill out the form with your required resources (an equivalent set of batch directives is sketched after this list for reference).

  • Account: Enter your condo account name. If you are not a member of a condo, leave this blank

  • Desktop Environment: Choose XFCE. KDE works for CPU jobs, but may not be able to use GPU acceleration correctly.

  • Number of hours: Choose appropriately. Your Desktop session will end abruptly after this time has lapsed. Requesting a very long session will result in a lower job priority.

  • Partition: Equivalent to #SBATCH -p option. The desktop session will run on this partition.

  • Num Cores: Equivalent to the #SBATCH -n option.

  • Num GPUs: Equivalent to the #SBATCH --gres=gpu: option. This field is ignored if the partition does not have any GPU nodes, e.g. batch

  • Memory (GB): Equivalent to the #SBATCH --mem= option.

  • Reservation: Equivalent to the #SBATCH --reservation= option. Leave blank if you are not using a reservation.
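
For reference, the form fields map roughly onto the following batch directives; all values shown are placeholders and should be adjusted to your job (omit -A if you are not in a condo and --reservation if you are not using one):

    #SBATCH -A my_condo              # Account
    #SBATCH -p gpu                   # Partition
    #SBATCH -t 04:00:00              # Number of hours
    #SBATCH -n 4                     # Num Cores
    #SBATCH --gres=gpu:1             # Num GPUs
    #SBATCH --mem=32G                # Memory (GB)
    #SBATCH --reservation=my_resv    # Reservation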

3. Wait and Launch!

Wait and launch this session like the regular Desktop session.

Modify the Terminal App

Inside the Desktop session, click on Applications in the top left

Applications -> Settings -> Default Applications

In the new Window, click on the "Utilities" tab and choose "Gnome Terminal" in the drop down menu under "Terminal Emulator"

Then click on "Applications -> Terminal Emulator" to launch the terminal:

If the steps mentioned above do not work:

  1. Close the Desktop session.

  2. Inside a terminal (outside the Desktop session), run this command:

rm -r ~/.ood_config

  3. Start a new desktop session.

Change the Terminal icon for launcher panel

Please drag and drop the "Terminal Emulator" icon from the "Applications" menu to the launcher panel at the bottom of the screen; it will be inserted into the launcher panel:

Then click on "Create Launcher":

You may remove the old terminal icon after adding the new icon:

To use SMB you will need to be connected to the VPN. Please install the Brown VPN client before proceeding.

A user's Windows machine is required to have Crowdstrike installed to use SMB.

Advanced users looking for more resources can try the Desktop (Advanced) app.

Click here to launch Open OnDemand (OOD) and log in with your Brown credentials.
