Oscar


Quick Reference

This page contains Linux commands commonly used on Oscar, basic module commands, and definitions for common terms used within this documentation.

These pages list some common commands and terms you will come across while using Oscar.

Common Acronyms and Terms

Managing Modules

Common Linux Commands

Managing Modules

module list

Lists all modules that are currently loaded in your software environment.

module avail

Lists all available modules on the system. Note that a module can have multiple versions. Use module avail <name> to list available modules which start with <name>

module help <name>

Prints additional information about the given software.

module load <name>

Adds a module to your current environment. If you load using just the name of a module, you will get the default version. To load a specific version, load the module using its full name with the version: "module load gcc/10.2"

module unload <name>

Removes a module from your current environment.
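As a quick sketch of a typical module workflow (gcc/10.2 is just an illustrative module and version; use module avail to see what is actually installed):

$ module avail gcc
$ module load gcc/10.2
$ module list
$ module unload gcc/10.2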

Best Practices for I/O

Efficient I/O is essential for good performance in data-intensive applications. Often, the file system is a substantial bottleneck on HPC systems, because CPU and memory technology has improved much more drastically in the last few decades than I/O technology.

Parallel I/O libraries such as MPI-IO, HDF5 and netCDF can help parallelize, aggregate and efficiently manage I/O operations. HDF5 and netCDF also have the benefit of using self-describing binary file formats that support complex data models and provide system portability. However, some simple guidelines can be used for almost any type of I/O on Oscar:

  • Try to aggregate small chunks of data into larger reads and writes. For the GPFS file systems, reads and writes in multiples of 512KB provide the highest bandwidth.

  • Avoid using ASCII representations of your data. They will usually require much more space to store, and require conversion to/from binary when reading/writing.

  • Avoid creating directory hierarchies with thousands or millions of files in a directory. This causes a significant overhead in managing file metadata.

While it may seem convenient to use a directory hierarchy for managing large sets of very small files, this causes severe performance problems due to the large amount of file metadata. A better approach might be to implement the data hierarchy inside a single HDF5 file using HDF5's grouping and dataset mechanisms. This single data file would exhibit better I/O performance and would also be more portable than the directory approach.

Citing CCV

If you publish research that benefited from the use of CCV services or resources, we would greatly appreciate an acknowledgment that states:

This research [Part of this research] was conducted using [computational/visualization]
resources and services at the Center for Computation and Visualization, Brown University.

Quickstart

How to connect to Oscar and submit your first batch job

Connect to OSCAR

This guide assumes you have an Oscar account. To request an account see create an account.

The simplest way to connect to Oscar is via Open OnDemand (OOD). To connect to OOD, go to https://ood.ccv.brown.edu and log in using your Brown credentials. For more details, see the Open OnDemand section of this documentation.

Alternatively, you can connect to Oscar via SSH (Terminal):

ssh <username>@ssh.ccv.brown.edu

Windows users need an SSH client such as PuTTY installed. SSH is available by default on Linux and macOS. See the SSH section for more details.

Submit a Job

You can submit a job using sbatch:

sbatch batch_scripts/hello.sh

You can confirm that your job ran successfully by running:

cat hello-*.out
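If you do not yet have a batch script, here is a minimal sketch of what a hello.sh script might contain (the job name, time limit, output pattern, and echo message are illustrative):

#!/bin/bash
#SBATCH -J hello              # job name
#SBATCH -n 1                  # one task
#SBATCH -t 00:05:00           # five-minute time limit
#SBATCH -o hello-%j.out       # output file name pattern (%j is the job ID)

# Print a short message from the compute node
echo "Hello from $(hostname)"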

For more detailed information on submitting jobs, see the Submitting Jobs section of the documentation.

Transfer Files

To get specific files on to / off of Oscar, read through the Transferring Files to and from Oscar page of the documentation.

Get Help

If you encounter problems while using Oscar, check out the Getting Help section of the documentation, or read through the Overview page.

Using RStudio

RStudio is an IDE for R that can be run on Oscar.

Launching RStudio

Open the Open On Demand Dashboard by following this link. Select RStudio (under "Default GUI's"). Fill in the form to allocate the required resources, and optionally select your R modules. Finally, click the "Launch Session" button.

Known Issues

Plotting figures may not work within RStudio. If this is the case, save the plots to a file, and view them through the Open On Demand Desktop App. If plots are required for your task, launch RStudio through the Desktop App.

To learn about using the Open OnDemand Desktop App, see the Desktop App section of the Open OnDemand documentation.

Overview

Overview of the Oscar supercomputer

Oscar is Brown University's high performance computing cluster for both research and classes. Oscar is maintained and supported by the Center for Computation and Visualization (CCV).

Please contact [email protected] if you have any questions about Oscar.

Accounts

If you do not have an Oscar account, you can request one; see create an account.

Anyone with a Brown account can get a free Exploratory account on Oscar, or pay for a priority account.

Common Linux Commands

cd

Moves the user into the specified directory. Change Directory.

cd .. to move one directory up

cd by itself to move to home directory

Offboarding

Account and Access

Oscar users keep their access to Oscar as long as their Brown accounts are active. To access Oscar after a Brown account is deactivated, the user needs to get an affiliate account through the department with which the user is associated.

It is best that your affiliate account keep the same username as your previous Brown account. Otherwise, please contact [email protected] to migrate your Oscar account to your affiliate account.

SSH Key Login (Passwordless SSH)

How to set up SSH key authentication.

When connecting from a campus network to sshcampus.ccv.brown.edu, you can set up SSH keys as a form of authentication instead of having to enter your password interactively. Follow the instructions below that correspond to your operating system/connection method.
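As a rough sketch for macOS/Linux (the key type and default file locations are illustrative choices; replace <username> with your Brown username):

# Generate a key pair on your local machine (press Enter to accept the defaults)
ssh-keygen -t ed25519

# Copy the public key to Oscar; you will be prompted for your password once
ssh-copy-id <username>@sshcampus.ccv.brown.edu

# Subsequent logins should no longer ask for a password
ssh <username>@sshcampus.ccv.brown.edu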

X-Forwarding

Instructions to forward X11 applications from Oscar to local computer

If you have an installation of X11 on your local system, you can access Oscar with X forwarding enabled, so that the windows, menus, cursor, etc. of any X applications running on Oscar are all forwarded to your local X11 server. Here are some resources for setting up X11:

  • Mac OS - XQuartz (https://www.xquartz.org)

  • Windows - Xming (https://sourceforge.net/projects/xming)

SSH Agent Forwarding

How to forward local ssh keys to Oscar

SSH provides a method of sharing the ssh keys on your local machine with Oscar. This feature is called Agent Forwarding and can be useful, for instance, when working with version control or other services that authenticate via ssh keys. Below are instructions on how to configure your SSH connection to forward ssh-agent for different operating systems.
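On macOS/Linux, a minimal sketch of the idea is an entry like the following in your local ~/.ssh/config (the host alias and username are illustrative):

Host oscar
    HostName ssh.ccv.brown.edu
    User <username>
    ForwardAgent yes

After adding your key to the local agent with ssh-add, connections made with ssh oscar will carry your agent with them.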

Software on Oscar

Many scientific and HPC software packages are already installed on Oscar, and additional packages can be requested by submitting a ticket to [email protected]. If you want a particular version of the software, do mention it in the email along with a link to the web page from where it can be downloaded. You can also install your own software on Oscar.

CCV cannot, however, supply funding for the purchase of commercial software. This is normally attributed as a direct cost of research, and should be purchased with research funding. CCV can help in identifying other potential users of the software to potentially share the cost of purchase and maintenance. Several commercial software products that are licensed campus-wide at Brown are available on Oscar.

For software that requires a Graphical User Interface (GUI), we recommend using CCV's VNC client rather than X-Forwarding.

Python in batch jobs

By default, print in Python is buffered. When running Python in a batch job in SLURM you may see output less often than you would when running interactively. This is because the output is being buffered - the print statements are collected until there is a large amount to print, then the messages are all printed at once. For debugging or checking that a Python script is producing the correct output, you may want to switch off buffering.

Switch off buffering

For a single python script you can use the -u option, e.g.

python -u my_script.py

Anaconda

Anaconda provides Python, R and other packages for scientific computing including data sciences, machine learning, etc.

The conda command from the anaconda modules does NOT work. Use the conda command from the miniconda3 module instead.

Do not load the module in your .modules or .bashrc file. Otherwise, your OOD Desktop session cannot start.

There is one anaconda module:

$ module avail anaconda

-------- /oscar/runtime/software/spack/0.20.1/share/spack/lmod/linux-rhel9-x86_64/Core -------
   anaconda/2023.09-0-7nso27y

cd - to move to previous directory

cd <directory-path> to move to a directory (can be an absolute path or relative path)

cp <old_filepath> <new_directory_path>

Copies the file into the specified directory.

clear

Clears the terminal

cat <filename>

Lists the contents of a file. Concatenate files.

ls

List contents within the current directory

grep <string_to_match> <filename>

Searches for the string / regular expression within the specified file and prints the line(s) with the result

pwd

Displays the path of the current directory that you are in. Present Working Directory

man <command>

Displays the help manual instruction for the given command

mv <file_name> <new_directory>

Moves a file into a new directory.

mv <old_file_name> <new_file_name> to rename a file

mkdir <directory_name>

Creates a new directory

rm <file_name>

Deletes a file

rm -r <directory_name>

Deletes directories and the contents within them. -r stands for recursive

rmdir <directory_name>

Removes the specified directory (must be empty)

touch

Creates a new blank file


If you are not able to connect to Oscar with your affiliate account, please contact [email protected] for help.

Data

Data Retention

Your data (directories and files) will stay on Oscar for one year after your Brown account is deactivated. After that, your data will be archived.

Data Deletion

You may delete your data when you leave Brown University, or you may request that CCV delete your data on Oscar, especially if you have a large amount of data.

A PI owns their data directories and can delete all files in them.

Retrieve Data

You can download data from Oscar following the instructions here. Globus is recommended for large data transfer.

Billing

If you are a PI and want to keep your priority accounts and/or data directories after leaving Brown University, please contact [email protected] to update your billing information.


CCV Account Information

Account Usage

Oscar users are not permitted to:

  • Share their accounts or passwords with others or enable unauthorized users to access Center for Computation and Visualization resources

  • Use Center for Computation and Visualization resources for personal economic gain

  • Engage in unauthorized activity (e.g., cryptocurrency mining) that intentionally impacts the integrity of resources

Storage

Each user (premium or exploratory) gets a 20GB Home directory, 512GB of short-term Scratch space, and a 256GB Data directory (shared amongst the members of the group).

  • Files in the Scratch directory that have not been accessed for the last 30 days are automatically purged. CCV only stores snapshots for 7 days; after that, files will be automatically deleted.

  • The PI has ultimate access to the Data directory; if a student leaves Brown, the files in the Data directory will be owned by the PI.

Software and Data

All software and data stored or used on Center hosted systems must be appropriately and legally acquired and must be used in compliance with applicable licensing terms. Unauthorized misuse or copying of copyrighted materials is prohibited.

Data Retention

CCV reserves the right to remove any data at any time and/or transfer data to other individuals (such as Principal Investigators working on the same or a similar project) after a user account is deleted or is no longer affiliated with Brown University.

Accounts Validity

  • Once created, Oscar accounts are valid for the duration of one's Brown AD credentials

Student Accounts

CCV provides access to HPC resources for classes, workshops, demonstrations, and other instructional uses. In general, the system is available for most types of instructional use at Brown where HPC resources are required, and we will do what we can to provide the resources necessary to help teach your class. We do ask that you follow these guidelines to help us better support your class.

Account Requests and Software Needs

Requests for class accounts should be made in writing to [email protected] two weeks prior to the beginning of class, and should be made in bulk. Please provide the Brown username (required), name and Brown Email address for the students, TAs and instructor as well as the course number and the semester. Requests for specific software should also be made two weeks before the start of the semester, and should be properly licensed, tested and verified to work by an instructor or TA.

Usage Expectations and System Utilization

Unless prior arrangements are made, student class accounts will have the same priority and access as free accounts on the CCV system. Access can be provided to specialized hardware or higher cores if needed provided it does not impact research use of the CCV systems. Be aware that usage of the CCV system is unpredictable, and high utilization of the system could impact a student's ability to finish assignments in a specific time period. We also encourage instructors to give an overview of the system and discuss computing policies before students use the system. CCV can provide resources (slides, documentation and in class workshops) to help prepare students to use HPC system. CCV staff are always available to meet directly with instructors and TAs to help prepare for classes and help setup specific software or environments for the class.

Support

It is expected that any class being taught using CCV resources will have its own TA. The TA should be the first line of support for any problems or questions the students may have regarding the use of the CCV system. CCV staff may not know specifics about how to use or run the programs the class is using, and can’t provide direct support to students for that software.

Class Guest Accounts

CCV will provide limited-duration guest accounts that are custom tailored for the class's use of the system. These accounts will have a username of “ccvws###”, and each account is associated with an individual student, instructor, or TA. Guest accounts are temporary, are only active for the duration of the class, and are deactivated at the conclusion of the semester/workshop. Account data is kept intact on our system for one semester after the conclusion of the class, and is then permanently deleted from the CCV system.

To request student accounts for a course, please contact us by emailing [email protected].

Running Jobs

Oscar is a shared machine used by hundreds of users at once. User requests are called jobs. A job is the combination of the resource requested and the program you want to run on the compute nodes of the Oscar cluster. On Oscar, Slurm is used to schedule and manage jobs.

Jobs can be run on Oscar in two different ways:

  • Interactive jobs allow the user to interact with programs (e.g., by entering input manually, using a GUI) while they are running. However, if your connection to the system is interrupted, the job will abort. Small jobs with short run times and jobs that require the use of a GUI are best-suited for running interactively.

  • Batch jobs allow you to submit a script that tells the cluster how to run your program. Your program can run for long periods of time in the background, so you don't need to be connected to Oscar. The output of your program is continuously written to an output file that you can view both during and after your program runs.

Jobs are scheduled to run on the cluster according to your account priority and the resources you request (i.e., cores, memory, and runtime). In general, the fewer resources you request, the less time your job will spend waiting in the queue.

Please do not run CPU-intense or long-running programs directly on the login nodes! The login nodes are shared by many users, and you will interrupt other users' work.


Large Memory Nodes on Oscar

Memory-Intensive Workloads

Users can check the nodes in a partition using the nodes command. As of July 2025, the Oscar cluster has the following nodes in the bigmem partition:

$ nodeinfo|grep bigmem
node1609  32-core  770GB 32core,intel,scalable,cascade,edr         idle~      0     0%    766GB  99.6%   bigmem                       
node1610  32-core  770GB 32core,intel,scalable,cascade,edr           mix     17  53.1%    633GB  82.4%   bigmem                       
node1611  32-core  770GB 32core,intel,scalable,cascade,edr         idle~      0     0%    712GB  92.6%   bigmem                       
node1612  32-core  770GB 32core,intel,scalable,cascade,edr           mix     18  56.2%    577GB  75.1%   bigmem                       
node2412 192-core 1540GB 192core,amd,genoa,edr                       mix    100    52%   1061GB    69%   bigmem                       
node2415 192-core 1540GB 192core,amd,genoa,edr                       mix    153  79.6%   1231GB    80%   bigmem    

All Oscar users have access to this partition, and can submit jobs to it. To submit batch jobs to large memory nodes, include the following in your batch script:

#SBATCH -p bigmem

To run an interactive job on large memory node, launch the interact command with the following flag:

$ interact -q bigmem

The batch partition also has many nodes with 1540GB of memory. An HPC Priority account can submit jobs to the batch partition that use up to 1500GB of memory.
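For reference, a minimal sketch of a batch script requesting a large-memory node (the memory, core count, time limit, and program name are illustrative):

#!/bin/bash
#SBATCH -p bigmem          # large memory partition
#SBATCH -n 16              # number of tasks
#SBATCH --mem=500G         # memory per node
#SBATCH -t 04:00:00        # time limit

# Run the memory-intensive program (illustrative name)
./my_big_memory_program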

One limitation of X forwarding is its sensitivity to your network connection's latency. We advise against using X forwarding from a connection outside of the Brown campus network, since you will likely experience lag between your actions and their response in the GUI.

Mac/Linux

Once your X11 server is running locally, open a terminal and use

$ ssh -X <user>@ssh.ccv.brown.edu

to establish the X forwarding connection. Then, you can launch GUI applications from Oscar and they will be displayed locally on your X11 server.

Windows (PuTTY)

For Windows users using PuTTY, enable X forwarding under Connections->SSH->X11:


The -u stands for "unbuffered". You can use the environment variable PYTHONUNBUFFERED to set unbuffered I/O for your whole batch script.

There is some performance penalty for having unbuffered print statements, so you may want to reduce the number of print statements, or run buffered for production runs.

#!/bin/bash
#SBATCH -n 1

export PYTHONUNBUFFERED=TRUE
python my_script.py

More details can be found at the CCV Rates page.

Individuals external to Brown can get access to Oscar by having a sponsored Brown account. Please work with your department to get sponsored Brown accounts for any external collaborators.

Authorized users must comply with the following Brown University policies:

  • Acceptable Use Policy

  • Computing Passwords Policy.

  • Computing Policies.

Hardware

Users can run their compute-intensive and/or long-running jobs and programs on Oscar to take advantage of its high performance computing resources, highlighted below:

  • 2 Login nodes

  • 8 PB of storage

  • Red Hat Enterprise Linux 9.2 (Linux)

  • Mellanox InfiniBand network

  • Slurm Workload manager

Please refer to the details at Oscar hardware.

Scheduler

Hundreds of users can share computing resources in Oscar. Slurm is used in Oscar to manage user jobs and computing resources such as cores and GPUs.

Users should not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your codes, manage files, and launch jobs on the compute nodes.

To allow users to share access to Oscar fairly, there are limits on the maximum number of pending and running jobs a user account may have or submit:

  • 1200 for a priority account

  • 1000 for an exploratory account

Software

  • Operating systems of all Oscar nodes: Red Hat 9.2

  • More than 500 software modules

  • CCV staff install software upon user request and help users with software installation

Storage

Oscar has 8 PB of all-flash storage from VAST, which provides high-performance access to storage. Users have ~/home, ~/scratch, and ~/data directories as their storage with quota in Oscar. Please refer to the details at Oscar's filesystem.

Access and User Accounts - User accounts are controlled via central authentication and directories on Oscar are only deleted on the request of the user, PI, or departmental chair.

Files not accessed for 30 days will be deleted from your ~/scratch directory. Use ~/data for files you wish to keep long term.

Users can transfer files from and to Oscar filesystem. In particular, users can transfer files between Oscar filesystem and Campus File Storage.

Connecting to Oscar

Oscar users can connect to Oscar by

  • SSH

  • Open OnDemand

  • VS Code Remote IDE

Maintenance Schedule

  • Non-disruptive Maintenance:

    • non-disruptive work, including software changes, maintenance, and testing

    • may occur at any time

    • no notification provided

  • Monthly Scheduled Maintenance:

    • no downtime expected, but there may be limited degradation of performance

    • first Tuesday of the month, 8:00 am - 12:00 noon

    • no notification provided

  • Unscheduled Maintenance:

    • maximum 1 day downtime

    • occurs very rarely and includes any unplanned emergency issues that arise

    • Prior notification provided (depending on the issue, 1 day to 4 weeks advance notice provided)

  • Major Upgrade Maintenance:

    • service may be brought down for 3-5 days

    • occurs annually

    • 4-week prior notification provided

Unplanned Outage

  • During Business Hours:

    • Send email to [email protected]. A ticket will get created and CCV staff will attempt to address the issue as soon as possible.

  • During Non-Business Hours:

    • Send email to [email protected].

    • Call CIS Operations Center at (401) 863-7562. A ticket will get created and CCV staff will be contacted to address the issue.

User and Research Support

CCV staff provide support for researchers seeking help with statistical modeling, machine learning, data mining, data visualization, computational biology, high-performance computing, and software engineering.

CCV staff also provide tutorials on using Oscar for classes, groups, and individuals. Please check CCV Events for upcoming trainings and office hours.

CCV provides short videos (coming soon) for users to learn as well.


Oscar's Filesystem

CCV uses an all-flash parallel filesystem (VAST Data). Users have a home, data, and scratch space.

home ~

  • 100GB of space

  • Optimized for many small files

  • 30 days snapshots

  • The quota is per individual user

  • A grace period of 14 days

data ~/data

  • Each PI gets 256GB for free

  • Optimized for reading large files

  • 30 days snapshots

  • The quota is by group

scratch ~/scratch

  • 512G (soft-quota): 12T (hard-quota)

  • Optimized for reading/writing large files

  • 30 days snapshots

  • Purging: Files not accessed for 30 days may be deleted

Files not accessed for 30 days will be deleted from your scratch directory. This is because scratch is high-performance space: the fuller scratch is, the worse the read/write performance. Use ~/data for files you need to keep long-term.

The scratch purge is on individual files. It is by 'atime' which is when the file was last read. You can use 'find' to find files that are at risk of being purged, e.g. to find files in the current directory that have not been accessed in the last 25 days:

find . -atime +25

A good practice is to configure your application to read any initial input data from ~/data and write all output into ~/scratch. Then, when the application has finished, move or copy data you would like to save from ~/scratch to ~/data.
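For example, at the end of a batch script you might copy the results you want to keep back to ~/data (the directory names are illustrative):

# Copy results worth keeping from scratch to data at the end of the job
rsync -av ~/scratch/my_run/results/ ~/data/my_run/results/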

Note: class or temporary accounts may not have a ~/data directory!

To see how much space is used and available in your directories, you can use the command checkquota. Below is an example output.

You can go over your quota up to the hard limit for a grace period. This grace period is to give you time to manage your files. When the grace period expires you will be unable to write any files until you are back under quota.

There is a quota for space used and for number of files. If you hit the hard limit on either of these you will be unable to write any more files until you are back under quota.

Keep the number of files in the range from 0.5M (preferred) to 1M (upper limit). Going beyond this limit can lead to unexpected problems.
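To get a rough count of the files under a directory (the path is illustrative), you can run:

find ~/scratch/my_project -type f | wc -l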

Resolving quota issues

This is a quick guide for resolving issues related to file system quotas. To read more details about these quotas, refer to this page.

Step 1: Identify the directory

Run the checkquota command and identify the line that shows the warning status message.

If this directory is either /oscar/home or /oscar/scratch , you will have to take the subsequent steps to resolve this issue. If the directory is data+<group> you should inform others in your group and take collective action to resolve this issue.

Step 2: Disk Space or Inodes

Check whether you have exceeded your disk space quota or your inodes quota. Disk space usage is specified in GB or TB while inodes usage is just numerical count.

Step 3: Remove files

You will need to take the following steps based on the quota you have exceeded.

Disk Space quota:

The fastest way to reduce this usage is identifying large and unnecessary files. Load the module ncdu using the command module load ncdu and run ncdu in the offending directory. This utility will scan that directory and show you all the directories and files, sorted by their size. If they are not sorted by size, press lowercase s to sort them by size. You can navigate the directory tree using the arrow keys and delete any files or directories that are unnecessary.

Some programs leave a lot of temporary files on the disk that may not be necessary.

  • Apptainer: Run the command apptainer cache clean to clear the Apptainer cache. This will clear up the cache in your home directory without affecting any container images. However, pulling a new image from a repository may be slower in the future.

  • Conda: Run the command conda clean -a to delete any tarballs downloaded by conda. This does not affect any existing conda or Python virtual environments. However, it may slow down the installation of some packages in the future.

  • Core Dump Files: These files are typically named core.<number>. A core dump file is generated when a program crashes; it contains the state of the system and is useful for debugging purposes. Old core dump files can take up a lot of disk space, and they can be safely deleted if you know the reason behind the crash.

Inodes quota:

Inode usage can be reduced by removing any files and directories OR tarring up large nested directories. When a directory is converted to a tar ball, it uses a single inode instead of one inode per directory or file. This can drastically decrease your inode usage. Identify directories that contain a large number of files or a very large nested tree of directories with a lot of files.

To identify such directories, load the module ncdu using the command module load ncdu and run ncdu in the offending directory. This utility will scan that directory and show you all the directories and files, sorted by their size. Press uppercase C to switch the sorting criteria to "number of items". You can navigate the directory tree using the arrow keys and delete or tar any files or directories that are unnecessary.

To create a tar ball of a directory:

tar -czvf <directory_name>.tar.gz <directory_name>

If your usage has exceeded quota and you cannot write to the directory, you can create the tar ball in another directory. Using this command, you can create a tar ball in the scratch directory:

tar -czvf /oscar/scratch/$USER/<directory_name>.tar.gz <directory_name>

Getting Help

Here are some ways to get help with using OSCAR

Filing a Support Ticket

Filing a good support ticket makes it much easier for CCV staff to deal with your request

When you email [email protected] aim to include the following:

  • State the problem/request in the subject of the email

  • Describe which software and which version you are using

  • Error message (if there was one)

  • The job number

  • How you were running, e.g. batch, interactively, vnc

  • Give as small an example as possible that reproduces the problem

Q&A Forum

Ask questions and search for previous problems at our OSCAR Question and Answer Forum.

Slack

Join our CCV-Share Slack workspace to discuss your questions with CCV Staff in the #oscar channel.

Office Hours

CCV holds weekly office hours. These are drop-in sessions where we'll have one or more CCV staff members available to answer questions and help with any problems you have. Please visit CCV Events for upcoming office hours and events.

Arrange a Meeting

You can arrange to meet with a CCV staff member in person to go over difficult problems, or to discuss how best to use Oscar. Email [email protected] to arrange a consultation.

Short "How to" Videos

  • Putty Installation

  • SSH to Oscar from Linux

  • SSH to Oscar from Mac

Interactive Apps on OOD

You can launch several different apps from the Open OnDemand (OOD) interface. Each of these apps starts a Slurm batch job on the Oscar cluster with the requested amount of resources. These jobs can access the filesystem on Oscar, and all output files are written to Oscar's filesystem.

Launching an App on OOD

  1. Open https://ood.ccv.brown.edu in any browser of your choice.

  2. If prompted, enter your Brown username and password.

  3. Click on the "Interactive Apps" tab at the top of the screen to see the list of available apps. Selecting an app will open the form to enter the details of the job.

  4. Follow the instructions on the form to complete it. Some of the fields can be left blank and OOD will choose the default option for you.

  5. Click Launch to submit an OOD job. This will open a new tab in the browser. It may take a few minutes for this job to start.

  6. Click "Launch <APP>" again if prompted in the next tab.

SLURM limits on resources such as CPUs, memory, GPUs, and time for each partition still apply to OOD jobs. Please keep these limits in mind when choosing options on the OOD form.

When submitting a batch job from a terminal in the Desktop app or the Advanced Desktop app, users need to

  • run "unset SLURM_MEM_PER_NODE" before submitting a job, if the job needs to specify --mem-per-cpu

  • run "unset SLURM_EXPORT_ENV" before submitting an MPI job

Transferring Files between Oscar and Campus File Storage (Replicated and Non-Replicated)

You may use either Globus (recommended) or smbclient to transfer data between Oscar and Campus File Storage.

Globus

Follow the instructions here for transferring data between files.brown.edu and Oscar.

smbclient

You can transfer files between Campus File Storage and Oscar using smbclient.

Transfer Instructions

1) Log into Oscar:

ssh ssh.ccv.brown.edu

2) Start a screen session. This will allow you to reattach to your terminal window if you disconnect.

screen

3) To use Oscar's high-speed connection to Campus File Storage - Replicated:

smbclient "//smb.isi.ccv.brown.edu/SHARE_NAME" -D DIRECTORY_NAME -U "ad\BROWN_ID" -m SMB3

Similarly, to access Campus File Storage - Non-Replicated (LRS: Locally Redundant Share):

smbclient "//smblrs.ccv.brown.edu/Research" -D DIRECTORY_NAME -U "ad\BROWN_ID" -m SMB3

Replace SHARE_NAME, DIRECTORY_NAME, and BROWN_ID. DIRECTORY_NAME is an optional parameter. The password required is your Brown password.

4) Upload/download your data using the FTP "put"/"get" commands. Replace DIRECTORY_NAME with the folder you'd like to upload.

5) You can detach from the screen session with a "CTRL+A D" keypress. To reattach to your session, run screen -r.

smbclient basics

  • put is upload to Campus File Storage

Usage: put <local_file> [remote file name]

Copy <local_file> from Oscar to Campus File Storage. The remote file name is optional (use if you want to rename the file)

  • get is download to Oscar

Usage: get <remote_file> [local file name]

Copy <remote_file> from Campus File Storage to Oscar. The local file name is optional (use it if you want to rename the file)

Moving more than one file:

To move more than one file at once use mput or mget. By default:

recurse is OFF. smbclient will not recurse into any subdirectories when copying files

prompt is ON. smbclient will ask for confirmation for each file in the subdirectories

You can toggle recursion ON/OFF with the recurse command. You can toggle prompting OFF/ON with the prompt command.

Setup virtual environment and debugger

  1. If you have an existing virtual environment, proceed to step 2. Otherwise, to create a new virtual environment:

$ python3 -m venv my_env
$ source my_env/bin/activate
#Install packages manually or from requirements.txt file
$ pip install -r requirements.txt

2. Search for Python.VenvPath in the VS Code settings:

Select your virtual environment

3. VSCode expects you to have a separate virtual environment for each of your Python projects, and it expects you to put them all in the same directory. Pointing to the parent directory lets it scan and find all of those virtual environments, and then you can easily toggle between them in the interface.

4. Once you have the virtual environment selected, the debugging capabilities should work.

Submitting GPU Jobs

The Oscar GPUs are in a separate partition to the regular compute nodes. The partition is called gpu. To see how many jobs are running and pending in the gpu partition, use

allq gpu

Interactive use

To start an interactive session on a GPU node, use the interact command and specify the gpu partition. You also need to specify the requested number of GPUs using the -g option:

interact -q gpu -g 1

Batch jobs

Here is an example batch script for a CUDA job that uses 1 GPU and 1 CPU for 5 minutes:

#!/bin/bash

# Request a GPU partition node and access to 1 GPU
#SBATCH -p gpu --gres=gpu:1

# Request 1 CPU core
#SBATCH -n 1
#SBATCH -t 00:05:00

# Load a CUDA module
module load cuda

# Run program
./my_cuda_program

To submit this script:

sbatch my_script.sh

DGX GPU Nodes in the GPU-HE Partition

All the nodes in the gpu-he partition have V100 GPUs. However, two of them are DGX nodes (gpu1404/1405), which have 8 GPUs each. When a gpu-he job requests more than 4 GPUs, the job will automatically be allocated to the DGX nodes.

The other (non-DGX) nodes actually have a better NVLink interconnect topology, as each GPU has a direct link to every other GPU. So the non-DGX nodes are better for a gpu-he job if the job does not require more than 4 GPUs.

Python on Oscar

Several versions of Python are available on Oscar as modules. However, we recommend using the system Python available at /usr/bin/python . You do not need to load any module to use this version of Python.

$ which python
/usr/bin/python
$ python --version
Python 3.9.16

pip is also installed as a system package, but other common Python packages (e.g., SciPy, NumPy) are not installed on the system. This affords individual users complete control over the packages they are using, thereby avoiding issues that can arise when code written in Python requires specific versions of Python packages.

We do not provide Python version 2 modules since it has reached its end of life. You may install Python 2 locally in your home directory, but CCV will not provide any Python2 modules.

Users can install any Python package they require by following the instructions given on the Installing Python Packages page.

Intel provides optimized packages for numerical and scientific work that you can install through pip or anaconda.

Python 2 has entered End-of-Life (EOL) status and will receive no further official support as of January 2020. As a consequence, you may see the following message when using pip with Python 2.

DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.

Going forward, the Python Software Foundation recommends using Python 3 for development.

Screen

screen is a "terminal multiplexer": it enables a number of terminals (or windows) to be accessed and controlled from a single terminal. screen is a great way to save an interactive session between connections to Oscar. You can reconnect to the session from anywhere!

Screen commands

Common commands are:

  • start a new screen session with session name: screen -S <name>

  • list running sessions/screens: screen -ls

  • attach to session by name: screen -r <name>

  • detach: Ctrl+a d

  • detach and logout (quick exit): Ctrl+a d d

  • kill a screen session: screen -XS session_name quit

Reconnecting to your screen session

There are several login nodes on Oscar, and the node from which you launched screen matters! That is, you can only reconnect from the login node on which you launched screen.

In order to reconnect to a running screen session, you need to be connected to the same login node that you launched your screen session from. In order to locate and identify your screen sessions correctly, we recommend the following (see the sketch after this list):

  • Create a directory to store the information of your screen sessions. You only need to do this once.

  • Put the following line into your ~/.bashrc. This tells the screen program to save the information of your screen sessions in the directory created in the previous step, and allows you to query your screen sessions across different login nodes. To make this change effective in your current sessions, you need to run 'source ~/.bashrc' in each of them; you do not need to do this in new sessions.

  • Name your new screen session using the name of the login node. For instance, start your screen session with a command similar to the one shown below.
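A minimal sketch of this setup, assuming the directory name ~/.screendir and the SCREENDIR environment variable that GNU screen uses for its socket directory:

# One-time setup: a private directory for screen session information
mkdir -p ~/.screendir
chmod 700 ~/.screendir

# Line to add to ~/.bashrc
export SCREENDIR=$HOME/.screendir

# Start a session named after the login node you are currently on
screen -S $(hostname)-mysession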

VASP

The Vienna Ab initio Simulation Package (VASP) is a package for performing ab initio quantum mechanical calculations. This page explains how VASP can be accessed and used on Oscar.

Setting up VASP

In order to use VASP, you must be a part of the vasp group on Oscar. To check your groups, run the groups command in the terminal.

First, you must choose which VASP module to load. You can see the available modules using module avail vasp. You can load your preferred VASP module using module load <module-name>.

Available Versions

  • VASP 5.4.1

  • VASP 5.4.4

  • VASP 6.1.1

Running VASP

Within a batch job, you should specify the number of MPI tasks with the -n option. If you would like 40 cores for your calculation, you would include the following in your batch script:

#SBATCH -n 40

If you're not sure how many cores you should include in your calculation, refer to Selecting the right amount of cores for a VASP calculation.
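Putting it together, a rough sketch of a VASP batch script follows; the module version, time limit, and executable name vasp_std are assumptions, so check module avail vasp and module help for the actual names on Oscar:

#!/bin/bash
#SBATCH -n 40              # number of MPI tasks
#SBATCH -t 08:00:00        # time limit

# Load a VASP module (version is illustrative)
module load vasp/6.1.1

# Launch VASP with MPI (executable name may differ)
srun vasp_std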

Compiling CUDA

Compiling with CUDA

To compile a CUDA program on Oscar, first load the CUDA module with:

$ module load cuda

The CUDA compiler is called nvcc, and for compiling a simple CUDA program it uses syntax similar to gcc:

$ nvcc -o program source.cu

Optimizations for Fermi

The Oscar GPU nodes feature NVIDIA M2050 cards with the Fermi architecture, which supports CUDA's "compute capability" 2.0. To fully utilize the hardware optimizations available in this architecture, add the -arch=sm_20 flag to your compile line:

$ nvcc -arch=sm_20 -o program source.cu

This means that the resulting executable will not be backwards-compatible with earlier GPU architectures, but this should not be a problem since CCV nodes only use the M2050.

Memory caching

The Fermi architecture has two levels of memory cache similar to the L1 and L2 caches of a CPU. The 768KB L2 cache is shared by all multiprocessors, while the L1 cache by default uses only 16KB of the available 64KB shared memory on each multiprocessor.

You can increase the amount of L1 cache to 48KB at compile time by adding the flags -Xptxas -dlcm=ca to your compile line:
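For example, combined with the compile line above (the program and source names are illustrative):

$ nvcc -Xptxas -dlcm=ca -o program source.cu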

If your kernel primarily accesses global memory and uses less than 16KB of shared memory, you may see a benefit by increasing the L1 cache size.

If your kernel has a simple memory access pattern, you may have better results by explicitly caching global memory into shared memory from within your kernel. You can turn off the L1 cache using the flags -Xptxas -dlcm=cg.

Interactive Jobs

To start an interactive session for running serial or threaded programs on an Oscar compute node, simply run the command interact from the login node:

interact

By default, this will create an interactive session that reserves 1 core and 4GB of memory for a period of 30 minutes. You can change the resources reserved for the session from these default limits by modifying the interact command:

usage: interact [-n cores] [-t walltime] [-m memory] [-q queue]
                [-o outfile] [-X] [-f featurelist] [-h hostname] [-g ngpus]

Starts an interactive job by wrapping the SLURM 'salloc' and 'srun' commands.

options:
  -n cores        (default: 1)
  -t walltime     as hh:mm:ss (default: 30:00)
  -m memory       as #[k|m|g] (default: 4g)
  -q queue        (default: 'batch')
  -o outfile      save a copy of the sessions output to outfile (default: off)
  -X              enable X forwarding (default: no)
  -f featurelist  CCV-defined node features (e.g., 'e5-2600'),
                  combined with '&' and '|' (default: none)
  -h hostname     only run on the specific node 'hostname'
                  (default: none, use any available node)
  -a account      user SLURM accounting account name
  -g ngpus        number of GPUs   

For example, the command

$ interact -n 20 -t 01:00:00 -m 10g

requests an interactive session with 20 cores and 10 GB of memory (per node) for a period of 1 hour.

Keeping Interactive Jobs Alive:

If you lose connectivity to your login node, you lose access to your interactive job. To mitigate this issue you can use screen to keep your connection alive. For more information on using screen on the login nodes, see the screen page in the software section of this documentation.

Setting Job Submission Settings

We have provided templates for you to use for job submission settings. These templates are in /gpfs/runtime/opt/forge/19.1.2/templates

Click Run and debug a program to open the following menu

Click Configure next to Submit to Queue and enter /gpfs/runtime/opt/forge/19.1.2/templates/slurm-ccv.qtf as the Submission template file

slurm-ccv.qtf lets you specify the total number of tasks. The number of tasks may not be equal for each node. This option will result in the shortest time in the queue, but may not give you consistent run times.

slurm-ccv-mpi.qtf is for MPI jobs where you want to specify number of nodes and tasks per node

slurm-ccv-threaded.qtf is for threaded (single node) jobs

Dependent Jobs

Here is an example script for running dependent jobs on Oscar.

#!/bin/bash

# first job - no dependencies
jobID_1=$(sbatch  job1.sh | cut -f 4 -d' ')

# second job - depends on job1
jobID_2=$(sbatch --dependency=afterok:$jobID_1 job2.sh | cut -f 4 -d' ')

# third job - depends on job2
sbatch  --dependency=afterany:$jobID_2  job3.sh

There are 3 batch jobs. Each job has its own batch script: job1.sh, job2.sh, job3.sh. The script above (script.sh) submits the three jobs.

line 4: job1 is submitted.

line 7: job2 depends on job1 finishing successfully.

line 10: job3 depends on job2 finishing successfully.

To use the above script to submit the 3 jobs, run the script as follows:

./script.sh

For details on the types of dependencies you can use in Slurm, see the sbatch manual page.

Configuring Remote Launch

Configuring Remote Launch from the client

You will need to configure remote launch for Oscar

  1. Open the client on your machine

  2. Click 'Remote Launch' -> Configure

  3. Add [email protected] as the Host Name

  4. Add /gpfs/runtime/opt/forge/19.1.2 as the Remote Installation Directory

  5. Test Remote Launch. You should enter the password used for Oscar. If successful you should see the message Remote Launch test completed successfully

If you have a mismatch between your client version and the version of Forge on Oscar, you will see an error message. To fix this, make sure you are using compatible client and remote versions.

Once you are connected you will see a licence checked out and "Connected to [email protected]" on the client.

Restoring Deleted Files

Nightly snapshots of the file system are available for the last 30 days.

CCV does not guarantee that each of the last 30 days will be available in snapshots because occasionally the snapshot process does not complete within 24 hours.

Restore a file from a snapshot in the last 30 days

Open OnDemand

Open OnDemand (OOD) is a web portal to the Oscar computing cluster. An Oscar account is required to access Open OnDemand. Visit this link in a web browser and sign in with your Brown username and password to access this portal.

Windows (PuTTY)

SSH Agent Forwarding on a Windows system using PuTTY, with an example application to git.

Agent Forwarding with PuTTY

  1. After adding your private key to Pageant, open PuTTY and navigate to the Auth menu.

2. Check the 'Allow agent forwarding' checkbox, and return to the Session menu.

Grace Hopper GH200 GPUs

Oscar has two Grace Hopper GH200 GPU nodes. Each node combines an Nvidia Grace Arm CPU and a Hopper-architecture GPU.

Hardware Specifications

Each GH200 node has 72 Arm cores and 550GB of memory. Multi-Instance GPU (MIG) is enabled on only one GH200 node, which has 4 MIGs. The other GH200 node does not have MIG enabled and has only one GPU. Both CPU and GPU threads on GH200 nodes can concurrently and transparently access both CPU and GPU memory.

Managing Jobs

Listing running and queued jobs

The squeue command will list all jobs scheduled in the cluster. We have also written wrappers for squeue on Oscar that you may find more convenient:

Inspecting Disk Usage (Ncdu)

To determine the sizes of files and discover the largest files in a directory, one can use the Ncdu module.

To get started with Ncdu, load the module using the following command:

module load ncdu

Once the module has been loaded, it can be used to easily show the size of all files within a directory:

ncdu my_directory

The line above uses Ncdu to rank all of the files within the my_directory directory. Your window should change to show a loading screen (if the directory doesn't have a lot in it, you may not even see this screen). To view the options you can use with the ncdu command, simply run ncdu --help

H100 NVL Tensor Core GPUs

Oscar has two DGX H100 nodes. The DGX H100 is based on the Nvidia Hopper architecture, which accelerates the training of AI models. The two DGX nodes provide better performance when multiple GPUs are used, in particular with Nvidia software like NGC containers.

Multi-Instance GPU (MIG) is not enabled on the DGX H100 nodes

Hardware Specifications

Installing R Packages

Installing R packages

Users should install R packages for themselves locally. This documentation shows you how to install R packages locally (without root access) on Oscar.

If the package you want to install has operating-system-level dependencies (i.e. the package depends on core libraries), then we can install it as a module.

Tunneling into Jupyter with Windows

This page is for users trying to open Jupyter Notebooks/Labs through Oscar with Windows.

Software that makes it easy

If you are using Windows, you can use any of the following options to open a terminal on your machine (ranked in order of least difficult to set up and use):

Web-based Terminal App

Open OnDemand offers a browser-based terminal app to access Oscar. Windows users who do not want to install an SSH client like PuTTY will find this app very useful.

Accessing the terminal

  1. Log in to Open OnDemand at https://ood.ccv.brown.edu

Installing JAX

This page describes how to install JAX with Python virtual environments

In this example, we will install Jax.

Step 1: Request an interactive session on a GPU node with Ampere architecture GPUs

Here, -f = feature. We only need to build on Ampere once.

Step 2: Once your session has started on a compute node, run nvidia-smi to verify the GPU and then load the appropriate modules

Step 3: Create and activate the virtual environment

Step 4: Install the required packages

Step 5: Test that JAX is able to detect GPUs
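A minimal sketch of such a test, run inside the activated environment:

python -c "import jax; print(jax.devices())"

If a GPU device is listed (rather than only CPU), JAX can see the GPU.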

Job Arrays

A job array is a collection of jobs that all run the same program, but on different values of a parameter. It is very useful for running parameter sweeps, since you don't have to write a separate batch script for each parameter setting.

To use a job array, add the option

#SBATCH --array=<range>

to your batch script. The range can be a comma-separated list of integers, along with ranges separated by a dash, for example 1-20 or 1-10,12,14,16-20.

A job will be submitted for each value in the range. The values in the range will be substituted for the variable $SLURM_ARRAY_TASK_ID in the remainder of the script. Here is an example of a script for running a serial Matlab script on 16 different parameters by submitting 16 different jobs as an array:

You can then submit the multiple jobs using a single sbatch command.

The $SLURM_ARRAY_TASK_ID can be manipulated as needed. For example, you can generate a fixed-length number from it; the sketch below generates a zero-padded number of length 3 from $SLURM_ARRAY_TASK_ID.
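A minimal bash sketch of that padding step:

# Zero-pad the task ID to three digits, e.g. 7 -> 007
printf -v padded "%03d" "$SLURM_ARRAY_TASK_ID"
echo "$padded"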

Installing Frameworks (PyTorch, TensorFlow, Jax)

This page describes installing popular frameworks like TensorFlow, PyTorch & JAX, etc. on your Oscar account.

Preface: Oscar is a heterogeneous cluster, meaning we have nodes with different architecture GPUs (Pascal, Volta, Turing, and Ampere). We recommend building the environment for the first time on Ampere GPUs with the latest CUDA 11 modules so it is backward compatible with older architecture GPUs.

In this example, we will install PyTorch (refer to sub-pages for TensorFlow and Jax).

Step 1: Request an interactive session on a GPU node with Ampere architecture GPUs

interact -q gpu -g 1 -f ampere -m 20g -n 4




    Nightly snapshots of the file system for the last 30 days can be found in the following directories.

    Home directory snapshot

    Data directory snapshot

    Scratch directory snapshot

    To restore a file, copy the file from the snapshot to your directory.

    Do not use the links in your home directory snapshot to try and retrieve snapshots of data and scratch. The links will always point to the current versions of these files. An easy way to check what a link is pointing to is to use ls -l

    e.g.:

    /oscar/home/.snapshot/Oscar_<yyyy-mm-dd>_00_00_00_UTC/<username>/<path_to_file>
    /oscar/data/.snapshot/Oscar_<yyyy-mm-dd>_00_00_00_UTC/<groupname>/<username>/<path_to_file>
    Access

    The two GH200 nodes are in the gracehopper partition.

    gk-condo Account

    A gk-condo user can submit jobs to the GH200 nodes with their gk-gh200-gcondo account (see the sketch below).

    CCV Account

    For users who are not a gk-condo user, a High End GPU priority account is required for accessing the gracehopper partition and GH200 nodes. All users with access to the GH200 nodes need to submit jobs to the nodes with the ccv-gh200-gcondo account (see the sketch below).

    MIG Access

    To request a MIG, the feature mig needs to be specified, as in the sketch below.
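    A rough sketch of the relevant batch directives; the partition name, account names, and mig feature come from this page, while the GPU count is illustrative:

    #SBATCH -p gracehopper
    #SBATCH --account=ccv-gh200-gcondo    # or gk-gh200-gcondo for gk-condo users
    #SBATCH --gres=gpu:1
    #SBATCH --constraint=mig              # only when requesting a MIG slice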

    Running NGC Containers

    NGC containers provide the best performance from the GH200 nodes. Running tensorflow containers is an example for running NGC containers.

    A NGC container must be built on a GH200 node for the container to run on GH200 nodes

    Running Modules

    The two nodes have Arm CPUs. So Oscar modules do not run on the two GH200 nodes. Please contact [email protected] about installing and running modules on GH200 nodes.

    Viewing estimated time until completion for pending jobs

    This command will list all of your pending jobs and the estimated time until completion.

    Canceling jobs

    View details about completed jobs

    sacct

    The sacct command will list all of your running, queued and completed jobs since midnight of the previous day. To pick an earlier start date, specify it with the -S option:

    To find out more information about a specific job, such as its exit status or the amount of runtime or memory it used, specify the -l ("long" format) and -j options with the job ID:

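    For instance (the start date and job ID are illustrative):

    sacct -S 2024-07-01
    sacct -l -j 1234567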

    myjobinfo

    The myjobinfo command uses the sacct command to display "Elapsed Time", "Requested Memory" and "Maximum Memory used on any one Node" for your jobs. This can be used to optimize the requested time and memory to have the job started as early as possible. Make sure you request a conservative amount based on how much was used.

    ReqMem shows the requested memory: A c at the end of number represents Memory Per CPU, a n represents Memory Per Node. MaxRSS is the maximum memory used on any one node. Note that memory specified to sbatch using --mem is Per Node.

    jobstats

    The 'jobstats' utility is now available for analyzing recently completed jobs, comparing the resources used to those requested in the job script, including CPU, GPU, and memory. If email notifications are enabled, 'jobstats' sends an email with the results and includes a prompt to contact support for help with resource requests.

    Run this command in a bash shell on Oscar. No additional module needs to be loaded.

    To send this output to your email after the job is completed, make sure that these lines are in your job submit script
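    A sketch of typical Slurm email-notification directives (the mail-type value and address are placeholders to adapt):

    #SBATCH --mail-type=END
    #SBATCH --mail-user=<your-email>@brown.edu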

    Each DGX H100 node has 112 Intel CPUs with 2TB memory, and 8 Nvidia H100 GPUs. Each H100 GPU has 80G memory.

    Access

    The two DGX H100 nodes are in the gpu-he partition. To access H100 GPUs, users need to submit jobs to the gpu-he partition and request the h100 feature, as in the sketch below.
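    A rough sketch of the request; the gpu-he partition and h100 feature come from this page, while the GPU count is illustrative:

    #SBATCH -p gpu-he
    #SBATCH --constraint=h100
    #SBATCH --gres=gpu:1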

    Running NGC Containers

    NGC containers provide the best performance from the DGX H100 nodes. Running tensorflow containers is an example for running NGC containers.

    Running Oscar Modules

    The two nodes have Intel CPUs. So Oscar modules can still be loaded and run on the two DGX nodes.

    Installing an R package

    First load the R version that you want to use the package with:

    Start an R session

    Note: some packages will require code to be compiled, so it is best to do R package installs on the login node.

    To install the package 'wordcloud':

    You will see a warning:

    Answer y . If you have not installed any R packages before you will see the following message:

    Answer y . The package will then be installed. If the install is successful you will see a message like:

    If the installation was not successful you will see a message like:

    There is normally information in the message that gives the reason why the install failed. Look for the word ERROR in the message.

    Possible reasons for an installation failing include:

    • Other software is needed to build the R package, e.g. the R package rgdal needs gdal so you have to do module load gdal

    • A directory left over from a previous failed installation needs to be deleted.

    Reinstalling R packages

    To reinstall R packages, start an R session and run the update.packages() command

    Removing an R package

    Start an R session:

    To remove the 'wordcloud' package:

    If the above function returns gpu, then it's working correctly. You are all set; now you can install other necessary packages.

    Modify the batch file: see the example batch file below, which uses the created environment.

    interact -q gpu -g 1 -f ampere -m 20g -n 4
    module purge 
    unset LD_LIBRARY_PATH
    module load cuda cudnn
    python -m venv jax.venv
    source jax.venv/bin/activate
    $SLURM_ARRAY_TASK_ID

    For more info: https://slurm.schedmd.com/job_array.html

    #SBATCH --array=<range>
    1-20
    1-10,12,14,16-20
    #!/bin/bash
    #SBATCH -J MATLAB
    #SBATCH -t 1:00:00
    #SBATCH --array=1-16
    
    # Use '%A' for array-job ID, '%J' for job ID and '%a' for task ID
    #SBATCH -e arrayjob-%a.err
    #SBATCH -o arrayjob-%a.out
    
    echo "Starting job $SLURM_ARRAY_TASK_ID on $HOSTNAME"
    matlab -r "MyMatlabFunction($SLURM_ARRAY_TASK_ID); quit;"
    Here, -f = feature. We only need to build on Ampere once.

    Step 2: Once your session has started on a compute node, run nvidia-smi to verify the GPU and then load the appropriate modules

    Step 3: Unload the pre-loaded modules, then load the cudnn and cuda dependencies

    Step 4: Create and activate a new virtual environment

    Step 5: Install the required packages

    The command above installs the latest version of PyTorch with CUDA 11 compatibility; for older versions, you can specify the version by:

    Step 6: Test that PyTorch is able to detect GPUs

    If the above functions return True and the GPU model name, then it's working correctly. You are all set; now you can install other necessary packages.

    module purge
    unset LD_LIBRARY_PATH
    module load cudnn cuda
    python -m venv pytorch.venv
    source pytorch.venv/bin/activate
    pip install --upgrade pip
    pip install torch torchvision torchaudio
    $ checkquota
    Name       Path                 Used(G)    (%) Used   SLIMIT(G)  H-LIMIT(G) Used_Inodes     SLIMIT     HLIMIT     Usage_State  Grace_Period  
    ccvdemo1   /oscar/home          3.72       2          100        140        63539           2000000    3000000    OK           None          
    ccvdemo1   /oscar/scratch       0.00       0          512        10240      1               4000000    16000000   OK           None          
    Now fetching Data directory quotas...
    Name        Used(T)   (%) Used   SLIMIT(T)   HLIMIT(T)   Used_Inodes   SLIMIT    HLIMIT    Usage_State   Grace_Period  
    data+nopi   0.0       0          0.88        0.98        466           4194304   6291456   OK            None 
    tar -czvf <directory_name>.tar.gz <directory_name>
    tar -czvf /oscar/scratch/$USER/<directory_name>.tar.gz <directory_name>
       ssh ssh.ccv.brown.edu
        screen
        smbclient "//smb.isi.ccv.brown.edu/SHARE_NAME" -D DIRECTORY_NAME -U "ad\BROWN_ID" -m SMB3
    smbclient "//smblrs.ccv.brown.edu/Research" -D DIRECTORY_NAME -U "ad\BROWN_ID" -m SMB3
       put DIRECTORY_NAME
       screen -r
    recurse
    prompt
    mkdir ~/.screen && chmod 700 ~/.screen
    export SCREENDIR=$HOME/.screen
    screen -S experiment1-login003
    mpirun -n <number-of-tasks> vasp_std
    # 2 nodes
    #SBATCH -N 2
    # 20 tasks per node
    #SBATCH --ntasks-per-node=20
    
    mpirun -n 40 vasp_std
    $ nvcc -Xptxas -dlcm=ca -o program source.cu
    /oscar/scratch/.snapshot/Oscar_Daily_<yyyy-mm-dd>_02_00_00_UTC/<username>/<path_to_file>
    ls -l /oscar/home/.snapshot/Oscar_2023-06-22_00_00_00_UTC/ghopper/data
    lrwxrwxrwx 1 ghopper navy 22 Mar  1  2016 /oscar/home/.snapshot/Oscar_2023-06-22_00_00_00_UTC/ghopper/data -> /oscar/data/navy
    #SBATCH --account=gk-gh200-gcondo
    #SBATCH --partition=gracehopper
    #SBATCH --account=ccv-gh200-gcondo
    #SBATCH --partition=gracehopper
    #SBATCH --constraint=mig
    myq                   List only your own jobs.
    allq                  List all jobs, but organized by partition, and a summary of the nodes in use in the
                          partition.
    allq <partition>      List all jobs in a single partition.
    myjobinfo            Get the time and memory used for your jobs.
    squeue -u <your-username> -t PENDING --start
    scancel <jobid>
    sacct -S 2012-01-01
    sacct -lj <jobid>
    myjobinfo
    
    Info about jobs for user 'mdave' submitted since 2017-05-19T00:00:00
    Use option '-S' for a different date or option '-j' for a specific Job ID.
    
    JobID    JobName                  Submit      State        Elapsed     ReqMem     MaxRSS
    1861     ior 2017-05-19T08:31:01  COMPLETED   00:00:09     2800Mc      1744K
    1862     ior 2017-05-19T08:31:11  COMPLETED   00:00:54     2800Mc     22908K
    1911     ior 2017-05-19T15:02:01  COMPLETED   00:00:06     2800Mc      1748K
    1912     ior 2017-05-19T15:02:07  COMPLETED   00:00:21     2800Mc      1744K
    jobstats <jobid>
    #SBATCH --mail-type=END
    #SBATCH --mail-user=<email>
    #SBATCH --partition=gpu-he
    #SBATCH --constraint=h100
    module load r/4.2.2
    R
    > install.packages("wordcloud", repos="http://cran.r-project.org")
    Warning in install.packages("wordcloud", repos = "http://cran.r-project.org") :
      'lib = "/gpfs/runtime/opt/R/3.4.2/lib64/R/library"' is not writable
    Would you like to use a personal library instead?  (y/n) 
    Would you like to create a personal library
    ~/R/x86_64-pc-linux-gnu-library/3.4
    to install packages into?  (y/n) 
    ** R
    ** data
    ** preparing package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded
    * DONE (wordcloud)
    Warning message:
    In install.packages("wordcloud", repos = "http://cran.r-project.org") :
      installation of package ‘wordcloud’ had non-zero exit status
    
    module load r/4.2.2
    R
    update.packages(checkBuilt=TRUE, ask=FALSE)
    R
    > remove.packages("wordcloud")
    pip install --upgrade pip
    pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
    python
    >>> from jax.lib import xla_bridge
    >>> print(xla_bridge.get_backend().platform)
    gpu
    #SBATCH -J RBC
    #SBATCH -N 1
    #SBATCH --ntasks=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=3:30:00
    #SBATCH --mem=64GB
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1
    #SBATCH -o RBC_job_%j.o
    #SBATCH -e RBC_job_%j.e
    
    echo $LD_LIBRARY_PATH
    unset LD_LIBRARY_PATH
    echo $LD_LIBRARY_PATH
    
    source /oscar/data/gk/psaluja/jax_env.venv/bin/activate
    python3 -u kernel.py
    $ sbatch <jobscript>
    #!/bin/bash
    #SBATCH -J MATLAB
    #SBATCH -t 1:00:00
    #SBATCH --array=1-16
    
    # Use '%A' for array-job ID, '%J' for job ID and '%a' for task ID
    #SBATCH -e arrayjob-%a.err
    #SBATCH -o arrayjob-%a.out
    
    echo "Starting job $SLURM_ARRAY_TASK_ID on $HOSTNAME"
    t=`printf "%03d" $SLURM_ARRAY_TASK_ID`
    matlab -r "MyMatlabFunction($t); quit;"
    pip install torch torchvision torchaudio
    python
    >>> import torch 
    >>> torch.cuda.is_available()
    True
    >>> torch.cuda.get_device_name(0)
    'NVIDIA GeForce RTX 3090'
    Intro to Open OnDemand Slides

    OOD provides several resources for interacting with Oscar.

    • Use the File Explorer in the portal to view, copy, download or delete files on Oscar.

    • Launch interactive apps, like Matlab and Jupyter Notebook, inside your web browser.

    • Access the Oscar shell with your browser without needing a separate terminal emulator. This is especially handy for Windows users, since you do not need to install a separate program.

    Features:

    1. No installation needed. Just use your favorite browser!

    2. No need to enter your password again. SSH into Oscar in seconds!

    3. No need to use two-factor authentication multiple times. Just do it once, when you log into OOD.

    4. Use it with, or without, VPN. Your workflow remains the same.

    Open on Demand

    3. Enter the Host Name you usually use to connect to Oscar, and click 'Open'.

    4. Enter your password. If you have SSH keys set up on your local computer to connect to GitHub, you can confirm your ssh-agent was properly forwarded by checking GitHub. If the ssh command fails, your agent has not been properly forwarded.

    ssh -T git@github.com
    Hi JaneDoe! You've successfully authenticated, but GitHub does not provide shell access.
    Connection to github.com closed.
    An example loading screen for Ncdu (the full directory for the "Current item" has been obscured)

    Once Ncdu has finished loading, you will see a result like this:

    A list of files and their sizes as displayed in the output of a call to Ncdu

    The files will be ordered with the largest file at the top and the smallest file at the bottom. The bottom left corner shows the Total disk usage (which in this case is 25.5 KiB). To quit out of this display, simply press q on your keyboard.

    If there is a subdirectory within the directory you're inspecting, the files and directories within that subdirectory can be viewed by selecting the directory with the gray bar (using up and down arrow keys as needed) and then using the right arrow key.

    module load ncdu/1.14
    ncdu my_directory

    MobaXterm

  • WSL2 (we recommend Ubuntu as your Linux distribution)

  • After opening a terminal using any of these programs, simply enter the ssh command provided by the jupyter-log-{jobid}.txt file. Then continue with the steps given by the documentation that led you to this page.

    If you have PuTTY and would prefer to not download any additional software, there are steps (explained below) that you can take to use PuTTY to tunnel into a Jupyter Notebook/Lab.

    Using PuTTY

    These instructions will use ssh -N -L 9283:172.20.209.14:9283 [email protected] as an example command that could be found in the jupyter-log-{jobid}.txt file.

    Open PuTTY and enter your host name ([email protected]) in the textbox.

    Next, navigate to the 'Tunnels' Menu (click the '+' next to SSH in order to have it displayed).

    Enter the source port (9283 in the example) and destination (172.20.209.14:9283 in the example). Click 'Add'. The source port and destination should show up as a pair in the box above. Then click 'Open'. A new window should open requesting your password.

    After entering your password, you should be able to access the notebook/lab in a browser using localhost:ipnport (see the documentation that led you here for details).

    Windows Terminal

    In the top menu, click Clusters -> >_OSCAR Shell Access

    A new tab will open and the web-based terminal app will be launched in it. The shell will be launched on one of the login nodes.

    The shell DOES NOT start on a compute node. Please do not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes.

    3. You are logged into one of the login nodes. You can launch batch jobs from this terminal or start an interactive job for anything computationally intensive.

    Features:

    1. No installation needed. Just use your favorite browser!

    2. No need to enter your password again. SSH into Oscar in seconds!

    3. No need to use two factor authentication again. Just do it once, when you log into OOD.

    4. Use it with, or without, VPN. Your workflow remains the same.

    https://ood.ccv.brown.edu
    [email protected]
    Remote launch settings for Oscar

    Using Python or Conda environments in the Jupyter App

    We recommend that all users install Python packages within an environment. This can be a Conda environment or a Python virtual environment. More information can be found here. Follow these steps to use such environments in the Jupyter app.

    Python Environments:

    One Time Setup:

    1. Open a terminal on Oscar.

    2. Load the relevant python module and create and/or activate the environment (these steps are sketched together after this list). See this page for more information about creating environments.

    3. Run pip install notebook to install Jupyter notebook, if not already installed.
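    A minimal sketch of this one-time setup, assuming a hypothetical module name (python/3.11.0) and environment name (jupyter.venv); substitute the module and path you actually use:

    module load python/3.11.0
    python -m venv ~/jupyter.venv
    source ~/jupyter.venv/bin/activate
    pip install notebook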

    Launching Jupyter Notebook

    1. Open the "Basic Jupyter Notebook for Python Environments" app on the Open OnDemand interface

    2. Under "Python Module on Oscar", choose the python module you loaded when the environment was created.

    3. Under "Python Virtual Environment", add the name of the Virtual Environment you created. Note: If your virtual environment is not at the top level of your home directory, you should input the absolute path to the environment directory.

    Conda Environments

    One Time Setup:

    1. Open a terminal on Oscar.

    2. Activate the conda environment.

    3. Run pip install notebook to install Jupyter notebook, if not already installed.

    4. Run pip install ipykernel (a combined sketch of these steps follows this list)
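    A combined sketch of these four steps, assuming the anaconda/2020.02 module mentioned below and a hypothetical environment name (myenv):

    module load anaconda/2020.02
    source activate myenv        # "conda activate myenv" may also work, depending on your conda setup
    pip install notebook
    pip install ipykernel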

    Launching Jupyter Notebook

    1. Open the "Basic Jupyter Notebook with Anaconda" app on the Open OnDemand interface

    2. Under "Oscar Anaconda module", choose "anaconda/2020.02"

    3. Enter the name of the conda environment in "Conda Env"

    4. Choose the other options as required.

    Version Control

    Git Overview

    Version Control refers to the management of changes made to source code or any such large amount of information in a robust manner by multiple collaborators. Git is by far the most popular version control system.

    Git enables effective collaboration among developers. In a team setting, multiple developers often work on the same project simultaneously. With Git, each developer can work on their own local copy of the project, making changes and experimenting freely without affecting the main codebase. Git allows developers to merge their changes seamlessly, ensuring that modifications made by different individuals can be consolidated efficiently. It provides mechanisms to track who made specific changes, making it easier to understand the evolution of the project and identify potential issues.

    Git Workflow

    Nearly all operations performed by Git happen in your local computing environment, with the exception of a few used purely to synchronize with a remote. Some of the most common Git operations are depicted below. In summary, a typical flow consists of making changes to your files, staging them via git add, marking a save point via git commit, then finally syncing to your remote (e.g., GitHub) via git push. If you are pushing changes to your remote from multiple places, you can bring the most recent changes into your local copy using git pull, which is the equivalent of doing git fetch followed by a git merge operation.

    Cheatsheet

    Below are some of the most commonly used Git commands. You can also get much more information by running git --help. And if you'd like to learn more, there is an excellent and thorough tutorial on Atlassian's website.

    Command
    Summary

    Git Configuration

    While using Git on Oscar, make sure that you configure Git with your correct name and email to avoid confusion while working with remote repositories (e.g., GitHub, GitLab, BitBucket).

    Getting Out of Trouble

    Git can sometimes be a bit tricky, and we all eventually find ourselves in a place where we want to undo something or fix a mistake we made with Git. This website (pardon the profanity) has a bunch of really excellent solutions to common problems we sometimes run into with Git.

    From Non-compliant Networks (2-FA)

    Accessing VSCode from Non-Brown compliant networks

    This guide is only for users connecting from Non-Brown Compliant Networks. 2-FA is mandatory.

    1. Install the Remote Development extension pack for VSCode

    2. Open VSCode settings

    • On Windows/Linux - File > Preferences > Settings

    • On macOS - Code > Preferences > Settings

    Search for symlink and make sure the symlink searching is unchecked

    3. Under VSCode settings, search for remote ssh timeout and manually enter a timeout value, e.g. 50s. This should give you enough time to complete 2-Factor Authentication.

    4. Edit the ~/.ssh/config file on your local machine, add the following lines. Replace <username> with your Oscar username.

    6. In VSCode, select Remote-SSH: Connect to Host… and after the list populates select ccv-vscode-node

    1. When prompted in VSCode, please enter your Brown password and complete the DUO authentication. After that, wait about 30 seconds and VSCode should connect to Oscar.

    Mac/Linux

    Agent Forwarding in Mac and Linux Systems

    Start the SSH-Agent

    First, start your ssh-agent with the command below.

    $ eval $(ssh-agent)

    You should see an output similar to this:

    Add Key(s)

    Next, add your ssh private keys to the running agent (using the ssh-add command on line 1). This step may be repeated for every key pair you use to connect to different git servers. For most, this file is called id_rsa and will live in ~/.ssh/id_rsa. If you set a passphrase for your ssh keys, the agent will prompt you to enter it.

    Confirm the ssh keys have been loaded into the agent with ssh-add -L:

    Connect to Oscar

    Now ssh into Oscar with the -A option as shown on the first line below (replace username with your Oscar username). -A will forward your ssh-agent to Oscar, enabling you to use the ssh keys on your laptop while logged into Oscar.

    If you have ssh keys setup on your local computer to connect to GitHub, you can confirm your ssh-agent was properly forwarded by checking GitHub . If the ssh command fails, your agent has not been properly forwarded.

    Always connecting with Agent Forwarding

    To make these changes permanent, you can add the ForwardAgent yes option to your ssh configuration file. To learn more about configuring your ssh connections, visit the SSH Configuration File page.

    Windows(PuTTY)

    Key Generation & Setup

    1. Open PuTTYgen (this comes as part of the PuTTY package), change the 'Number of bits in a generated key:' to 4096 (recommended), then click 'Generate'

    2. Move your cursor around randomly in order to "salt" your key, while the key is being generated. Once the key is generated, you should see something like this:

    3. Replace the text in the 'Key comment:' field with something recognizable and enter a passphrase in the two fields below.

    4. Copy the text in the 'Public key for pasting...' field (the text continues past what is displayed) and paste it wherever the public key is needed. If you are using GitHub, you can now create a new SSH key in your Personal Settings and paste this text into the 'Key' field.

    5. Click on 'Save private key' and select a logical/recognizable name and directory for the file. Your private key is saved in the selected file.

    6. Open Pageant (also part of the PuTTY package). If a message saying "Pageant is already running" is displayed, open your system tray and double click on the Pageant icon.

    To open your system tray, click on the up arrow (looks like: ^ ) icon at the bottom right of your screen (assuming your taskbar is at the bottom of your screen).

    7. Click on 'Add Key' and select the file you saved when generating your key earlier (Step 5). If it is requested, enter the passphrase you created at Step 3 to complete the process.

    In order to not have to add the key to Pageant after every time your machine reboots, you can add the key file(s) to your Windows startup folder (the directory for the current user is C:\Users\[User Name]\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup). You may still be prompted to enter the passphrase after a reboot, but you will not have to find and add the key to Pageant every time.

    Ampere Architecture GPUs

    The new Ampere architecture GPUs on Oscar (A6000's and RTX 3090's)

    The new Ampere architecture GPUs do not support older CUDA modules. Users must re-compile their applications with the newer CUDA 11 (or later) modules. Here are detailed instructions to compile major frameworks such as PyTorch and TensorFlow.

    PyTorch

    Users can install PyTorch from a pip virtual environment or use pre-built singularity containers provided by Nvidia NGC.

    To install via virtual environment:

    To use NGC containers via Singularity :

    • Pull the image from NGC

    • Export PATHs to mount the Oscar file system

    • To use the image interactively

    • To submit batch jobs

    Understanding Disk Quotas

    Checkquota

    Use the command checkquota to view your current disk usage and quotas. Here's an example output of this command

    Screenshot of the command checkquota

    Each line represents a top level directory that you have access to.

    Each column represents a usage or quota for these directories.

    Types of Quota:

    Disk usage and quotas are calculated separately for top-level directories. Two types of quotas are calculated for each of these directories:

    Disk space usage

    This usage is expressed in Gigabytes (G) or Terabytes (T) . This is the total size of all the files in that directory and it does not depend upon the number of files. Run the command checkquota to see your disk usage and quota. Here's an example:

    Inode usage

    This is the total number of files and directories in the particular directory. This number does not depend upon the size of the files. Run the command checkquota to see your inode usage and quota. Here's an example:

    Soft Limits vs Hard Limits

    All quotas have a soft limit (SLimit) and hard limit (HLimit). When usage exceeds the soft limit, a grace period associated with this limit begins. During the grace period, the usage is allowed to increase up to the hard limit. When the usage reaches the hard limit or when the grace period expires, the user is not allowed to write any files to that particular directory.

    Usage State

    The "Usage State" column shows the status of the grace period for a particular directory. Here are some of the status messages:

    SOFT_EXCEEDED

    This indicates that your usage of the disk space or inodes has exceeded the soft limit and you are still within the grace period. Check the Grace_Period column to see the number of days left in the grace period. You may continue writing data into this directory until the end of the grace period, as long as you do not exceed the hard limit.

    GRACE_EXPIRED

    This indicates that your usage has exceeded the soft limit AND the grace period has expired. You will not be able to write data into that directory, but you can remove files.

    HARD_EXCEEDED

    This indicates that your usage has reached the hard limit. You will not be able to write data into that directory, but you can remove data.

    OK

    This indicates that your usage of the disk space as well as inodes is within the soft quota.

    Intro to CUDA

    Introduction to CUDA

    CUDA is an extension of the C language, as well as a runtime library, to facilitate general-purpose programming of NVIDIA GPUs. If you already program in C, you will probably find the syntax of CUDA programs familiar. If you are more comfortable with C++, you may consider instead using the higher-level Thrust library, which resembles the Standard Template Library and is included with CUDA.

    In either case, you will probably find that because of the differences between GPU and CPU architectures, there are several new concepts you will encounter that do not arise when programming serial or threaded programs for CPUs. These are mainly to do with how CUDA uses threads and how memory is arranged on the GPU, both described in more detail below.

    There are several useful documents from NVIDIA that you will want to consult as you become more proficient with CUDA:

    There are also many CUDA tutorials available online:

    • from NVIDIA

    • from The Supercomputing Blog

    Threads in CUDA

    CUDA uses a data-parallel programming model, which allows you to program at the level of what operations an individual thread performs on the data that it owns. This model works best for problems that can be expressed as a few operations that all threads apply in parallel to an array of data. CUDA allows you to define a thread-level function, then execute this function by mapping threads to the elements of your data array.

    A thread-level function in CUDA is called a kernel. To launch a kernel on the GPU, you must specify a grid, and a decomposition of the grid into smaller thread blocks. A thread block usually has around 32 to 512 threads, and the grid may have many thread blocks totalling thousands of threads. The GPU uses this high thread count to help it hide the latency of memory references, which can take 100s of clock cycles.

    Conceptually, it can be useful to map the grid onto the data you are processing in some meaningful way. For instance, if you have a 2D image, you can create a 2D grid where each thread in the grid corresponds to a pixel in the image. For example, you may have a 512x512 pixel image, on which you impose a grid of 512x512 threads that are subdivided into thread blocks with 8x8 threads each, for a total of 64x64 thread blocks. If your data does not allow for a clean mapping like this, you can always use a flat 1D array for the grid.

    The CUDA runtime dynamically schedules the thread blocks to run on the multiprocessors of the GPU. The M2050 GPUs available on Oscar each have 14 multiprocessors. By adjusting the size of the thread block, you can control how much work is done concurrently on each multiprocessor.

    Memory on the GPU

    The GPU has a separate memory subsystem from the CPU. The M2050 GPUs have GDDR5 memory, which is a higher bandwidth memory than the DDR2 or DDR3 memory used by the CPU. The M2050 can deliver a peak memory bandwidth of almost 150 GB/sec, while a multi-core Nehalem CPU is limited to more like 25 GB/sec.

    The trade-off is that there is usually less memory available on a GPU. For instance, on the Oscar GPU nodes, each M2050 has only 3 GB of memory shared by 14 multiprocessors (219 MB per multiprocessor), while the dual quad-core Nehalem CPUs have 24 GB shared by 8 cores (3 GB per core).

    Another bottleneck is transferring data between the GPU and CPU, which happens over the PCI Express bus. For a CUDA program that must process a large dataset residing in CPU memory, it may take longer to transfer that data to the GPU than to perform the actual computation. The GPU offers the largest benefit over the CPU for programs where the input data is small, or there is a large amount of computation relative to the size of the input data.

    CUDA kernels can access memory from three different locations with very different latencies: global GDDR5 memory (100s of cycles), shared memory (1-2 cycles), and constant memory (1 cycle). Global memory is available to all threads across all thread blocks, and can be transferred to and from CPU memory. Shared memory can only be shared by threads within a thread block and is only accessible on the GPU. Constant memory is accessible to all threads and the CPU, but is limited in size (64KB).

    Mac/Linux/Windows(PowerShell)

    Step 1 : Check for existing SSH key pair

    Before generating a new SSH key pair, first check if you have an SSH key on your local machine.

    If there are existing keys, please move to Step 3

    Step 2 : Generate a new SSH Keypair

    Press Enter to accept the default file location and file name.

    ssh-keygen will ask you to type a secure passphrase. This is optional; if you don't want to use a passphrase, just press Enter.

    Verify the SSH keys are generated correctly; you should see two files, id_rsa and id_rsa.pub, under the ~/.ssh directory.

    DO NOT upload or send the private key.

    Step 3 : Copy the public key to Oscar

    You will now need to copy your public key to Oscar. There are two ways to accomplish this.

    With ssh-copy-id

    If your OS comes with the ssh-copy-id utility, then you'll be able to copy your public key into Oscar as follows:

    You will be prompted for a Password. The public key will be appended to the authorized_keys file on Oscar.

    If you used a custom name for your key instead of the default id_rsa, then you'll need to pass the name of your key to ssh-copy-id, i.e.,

    Without ssh-copy-id

    If your system does not come with the ssh-copy-id utility installed, then you'll need to copy your public key by hand.

    1. Get the contents of the id_rsa.pub file. One option is to use cat in your terminal: cat id_rsa.pub.

    2. Copy the contents of this file to your clipboard, as we need to upload it to Oscar.

    3. Log in to Oscar via regular ssh: ssh <username>@ssh.ccv.brown.edu

    Step 4 : Login to Oscar using your SSH keys

    If everything went well, you will be logged in immediately without being prompted for a password.

    Installing TensorFlow

    Setting up a GPU-accelerated environment can be challenging due to driver dependencies, version conflicts, and other complexities. Apptainer simplifies this process by encapsulating all of these details.

    Apptainer Using NGC Containers (Our #1 Recommendation)

    There are multiple ways to install and run TensorFlow. Our recommended approach is via NGC containers. The containers are available via the NGC Registry. In this example, we will pull the TensorFlow NGC container.

    1. Build the container:

    This will take some time, and once it completes you should see a .simg file.

    For your convenience, the pre-built container images are located in directory:

    /oscar/runtime/software/external/ngc-containers/tensorflow.d/x86_64/

    You can choose either to build your own or use one of the pre-downloaded images.

    Working with Apptainer images requires lots of storage space. By default, Apptainer will use ~/.apptainer as a cache directory, which can cause you to go over your Home quota.

    2. Once the container is ready, request an interactive session with a GPU

    3. Run a container with GPU support

    The --nv flag is important, as it enables the NVIDIA sub-system.

    4. Or, if you're executing a specific command inside the container:

    5. Make sure your TensorFlow image is able to detect GPUs

    6. If you need to install more custom packages: the containers themselves are non-writable, but we can use the --user flag to install packages inside .local. Example:

    Slurm Script:

    You can also submit a Slurm job script that uses the srun command to run your container. Here is a basic example:

    Gaussian

    Gaussian is a general purpose computational chemistry package. Oscar provides both Gaussian 9 (g09) and Gaussian 16 (g16).

    Setting Up Gaussian

    In order to use Gaussian on Oscar, you must be a part of the ccv-g09 group. To check your groups, run the groups command in the terminal.

    You must first choose a Gaussian module to load. To see available Gaussian modules, run module avail gauss. You can load a Gaussian module using the command module load <module-name>.

    Available Versions

    • Gaussian 9 (g09)

    • Gaussian 16 (g16)

    NOTE: There are three versions of g09; you can load any one of them, but the newer version g16 is now preferred. If using g09, just replace g16 below with g09.

    Running Gaussian

    Gaussian can be run either interactively or within a batch script using one of two command styles:

    • g16 job-name

    • g16 <input-file >output-file

    In the first form, the program reads input from job-name.gjf and writes its output to job-name.log. When no job-name has been specified, the program will read from standard input and write to standard output

    Given a valid .gjf file (we'll call it test-file.gjf), we can use the following simple batch script to run Gaussian:

    g16-test.sh

    Then queue the script using

    Once the job has been completed, you should have a g16-test.out, a g16-test.err, and a test-file.out.

    Arm Forge

    Arm Forge is available on Oscar. There are two products: DDT (a debugger) and MAP (a profiler).

    We recommend you use the Arm Forge remote client to launch your debugging jobs on Oscar. The first time you set up Arm Forge you will need to configure the client with the following steps:

    1. Download the arm forge remote client on your machine.

    2. Configuring Remote Launch from the client

    Compile your code with -g so you can see the source code in your debugging session

    Arm DDT

    Arm DDT is a powerful graphical debugger suitable for many different development environments, including:

    • Single process and multithreaded software.

    • OpenMP.

    • Parallel (MPI) software.

    Arm MAP

    Arm MAP is a parallel profiler that shows you which lines of code took the most time to run, and why. Arm MAP does not require any complicated configuration, and you do not need to have experience with profiling tools to use it.

    Arm MAP supports:

    • MPI, OpenMP and single-threaded programs.

    • Small data files. All data is aggregated on the cluster and only a few megabytes written to disk, regardless of the size or duration of the run.

    • Sophisticated source code view, enabling you to analyze performance across individual functions.

    • Both interactive and batch modes for gathering profile data.

    IDL

    Interactive Data Language (IDL) is a programming language used for data analysis and is popular in several scientific fields. This page explains how to use the IDL module on Oscar to run IDL programs.

    Setting Up IDL

    First load the IDL module that you want to use with module load idl/version_number:

    You can use the command module load idl to simply load the default version. This is demonstrated in the following command followed by system dialogue.

    As indicated by the system dialogue, you will need to enter the following command to set up the environment for IDL:

    IDL Command Line

    Once you've set up IDL in the way outlined above, you can open the IDL command line by simply using the command idl:

    Note: To exit this environment, simply use the command exit

    As is stated in the IDL documentation, IDL in command-line mode "uses a text-only interface and sends output to your terminal screen or shell window." Thus, this is a mode in which you can enter commands and see their results in real time, but it is not where one should write full IDL programs.

    IDL Programs

    To write an IDL program, you can use any of the text editors on Oscar (such as vim, emacs, and nano) or you can create the program in a file on your own computer and then copy that file to Oscar when you are finished. Here is an example (hello world) IDL program idl_hello_world.pro:

    This file and the batch file below can be found at /gpfs/runtime/software_examples/idl/8.5.1 if you wish to copy them and test the process yourself.

    Once you have the .pro file on Oscar, you can then run this file using a batch script. Here is a bare-bones version of a batch script (called idl_hello_world.sh) that will run the script idl_hello_world.pro (note that the .pro is omitted in the script).

    We can then run the batch file by using the sbatch command:

    SSH (Terminal)

    To log in to Oscar you need Secure Shell (SSH) on your computer.

    You need to log in using your Brown password. Old Oscar passwords can no longer be used for ssh.

    There are two options for signing into Oscar: with or without VPN.

    If you are connected to the Brown VPN, you have the option of using SSH keys to connect to Oscar without having to enter your password.

    SSH Configuration File

    How to save ssh configurations to a configuration file

    When regularly connecting to multiple remote systems over SSH, you'll find that remembering all the hosts and various command-line options becomes tedious. OpenSSH allows setting up a configuration file to store different SSH options for each remote machine you connect to.

    SSH Config File Location

    The OpenSSH client-side (in this case your personal computer) configuration file is named config, and it is stored in the hidden .ssh directory under your user's home directory (i.e., ~/.ssh).

    Using Modules

    CCV uses the Lmod package for managing the software environment on Oscar. The advantage of the modules approach is that it allows multiple versions of the same software to be installed at the same time. With the modules approach, you can "load" and "unload" modules to dynamically control your environment.

    Check out our !

    Mixing MPI and CUDA

    Combining CUDA and MPI

    Mixing MPI (C) and CUDA (C++) code requires some care during linking because of differences between the C and C++ calling conventions and runtimes. One option is to compile and link all source files with a C++ compiler, which will enforce additional restrictions on C code. Alternatively, if you wish to compile your MPI/C code with a C compiler and call CUDA kernels from within an MPI task, you can wrap the appropriate CUDA-compiled functions with the extern keyword, as in the following example.

    These two source files can be compiled and linked with both a C and C++ compiler into a single executable on Oscar using:

    The CUDA/C++ compiler nvcc is used only to compile the CUDA source file, and the MPI C compiler mpicc is used to compile the C code and to perform the linking.

    Using CCMake

    Guide to build and compile software using CCMake.

    Open-source software refers to any program whose source code is available for use or modification as users or other developers see fit. This is usually developed as a public collaboration and made freely available.

    CMake and CCMake

    Due to the complexity of some software, we often have to link to third party or external libraries. When working with software that has complicated building and linking steps, it is often impractical to use GCC (or your favorite compiler) directly. GNU Make is a build system that can simplify things somewhat, but "makefiles" can become unwieldy in their own way. Thankfully for us, there is a tool that simplifies this process.

    CMake is a build system generator that one can use to facilitate the software build process. CMake allows one to specify—at a higher level than GNU Make—the instructions for compiling and linking our software. Additionally, CMake comes packaged with CCMake, which is an easy-to-use interactive tool that will let us provide build instructions to the compiler and the linker for projects written in C, Fortran, or C++. For more information about CMake and CCMake, please click here.

    Intro to Parallel Programming

    This page serves as a guide for application developers getting started with parallel programming, or users wanting to know more about the working of parallel programs/software they are using.

    Although there are several ways to classify parallel programming models, a basic classification is:

    1. Distributed Memory Programming

    Condo/Priority Jobs

    Note: we do not provide users condo access by default if their group/PI has a condo on the system. You will have to explicitly request condo access, and we will ask for approval from the PI.

    To use your condo account to submit jobs, please follow the steps below to check the association of your Oscar account and include condo information in your batch script or command line.

    Step 1 - Check your account associations to find your condo Account and Partition information by running the following command:
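    The command itself is not reproduced in this export. With the standard Slurm accounting tools, the associations can be listed with something along these lines (a sketch, not necessarily the exact command from the original page):

    sacctmgr show associations user=$USER format=account,partition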

    In the example below, the user has access to two condos, where their

    MPI4PY

    This page documents how to use the MPI for Python package within a Conda environment.

    Using MPI4PY in a Python Script

    The installation of mpi4py will be discussed in the following sections. This section provides an example of how mpi4py would be used in a python script after such an installation.

    To use MPI in a python script through mpi4py, you must first import it at the top of the script with from mpi4py import MPI.

    Slurm Partitions

    Partition Overview

    Oscar has the following partitions. The number and size of jobs allowed on Oscar vary with both partition and type of user account. You can email support@ccv.brown.edu if you need advice on which partitions to use.

    To list partitions on Oscar available to your account, run the following command:
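    The command is not reproduced in this export; a plausible form, consistent with the -O option referenced in the note below, is:

    sinfo -O partition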

    To view all partitions (including ones you don't have access to), replace the -O in the command above with -aO.

    Agent pid 48792
    [ccvdemo2@login010 ~]$ checkquota
    Name       Path                 Used(G)    (%) Used   SLIMIT(G)  H-LIMIT(G) Used_Inodes     SLIMIT     HLIMIT     Usage_State  Grace_Period
    ccvdemo2   /oscar/home          23.29      18         100        125        188001          2000000    3000000    OK           None
    ccvdemo2   /oscar/scratch       0.00       0          512        10240      2               4000000    16000000   OK           None
    Now fetching Data directory quotas...
    Name            Used(T)   (%) Used   SLIMIT(T)   HLIMIT(T)   Used_Inodes   SLIMIT    HLIMIT    Usage_State   Grace_Period
    data+nopi       0.0       0          0.88        0.98        549           4194304   6291456   OK            None
    data+ccvinter   0.015     1          0.50        1.00        122281        4194304   6291456   OK            None
    ==========================================================================================
    Jobtmp Quotas: /jobtmp/$USER is the ultra-fast parallel storage system only for jobs. Not meant for long-term use
    ==========================================================================================
    Block Limits                                    |     File Limits
    Filesystem type         blocks      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
    jobtmp     USR               0         1T        12T          0     none |        1 1000000  2000000        0     none sss6k.oscar.ccv.brown.edu
    Got more Questions?
    Read documentation: https://docs.ccv.brown.edu/oscar/managing-files/filesystem
    Email: support@ccv.brown.edu
    --------------------------------------------------------------------------------
    ls ~/.ssh/id_*.pub
    apptainer build tensorflow-24.03-tf2-py3.simg docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3
  • Run pip install ipykernel to install ipykernel in this environment.
  • Run python -m ipykernel install --user --name=<myenv> where <myenv> is the name of the environment.

  • Under the "Modules" , enter the name of the python module used to create the environment. Add any additional modules you may need separated with a space.
  • Choose the other options as required.

  • Click "Launch" to start the job

  • Click "Connect to Jupyter" on the next screen.

  • To start a new notebook, click "New" -> <myenv> where <myenv> is the environment.

  • For starting a pre-existing notebook, open the notebook. In the Jupyter interface, click "Kernel" -> "Change Kernel" -> <myenv> where myenv is the name of the environment.

  • Run pip install ipykernel to install ipykernel in this environment.
  • Run python -m ipykernel install --user --name=<myenv> where <myenv> is the name of the environment.

  • Click "Launch" to start the job

  • Click "Connect to Jupyter" on the next screen.

  • To start a new notebook, click "New" -> <myenv> where <myenv> is the environment.

  • For starting a pre-existing notebook, open the notebook. In the Jupyter interface, click "Kernel" -> "Change Kernel" -> <myenv> where myenv is the name of the environment.

  • virtual environments
    CUDA C Programming Guide
    CUDA C Best Practices Guide
    CUDA Runtime API
    CUDA Training
    CUDA, Supercomputing for the Masses
    CUDA Tutorial
  • A rich set of metrics, that show memory usage, floating-point calculations and MPI usage across processes, including:

    • Percentage of vectorized instructions, including AVX extensions, used in each part of the code.

    • Time spent in memory operations, and how it varies over time and processes, to verify if there are any cache bottlenecks.

    • A visual overview across aggregated processes and cores that highlights any regions of imbalance in the code.

  • Set up Job Submission Settings
    Shared Memory Parallelism

    This model is useful when all threads/processes have access to a common memory space. The most basic form of shared memory parallelism is Multithreading. According to Wikipedia, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler (Operating System).

    Note that most compilers have inherent support for multithreading up to some level. Multithreading comes into play when the compiler converts your code to a set of instructions such that they are divided into several independent instruction sequences (threads) which can be executed in parallel by the Operating System. Apart from multithreading, there are other features like "vectorized instructions" which the compiler uses to optimize the use of compute resources. In some programming languages, the way of writing the sequential code can significantly affect the level of optimization the compiler can induce. However, this is not the focus here.

    Multithreading can also be induced at code level by the application developer and this is what we are interested in. If programmed correctly, it can also be the most "efficient" way of parallel programming as it is managed at the Operating System level and ensures optimum use of "available" resources. Here too, there are different parallel programming constructs which support multithreading.

    Pthreads

    POSIX threads is a standardized C language threads programming interface. It is a widely accepted standard because it is lightweight, highly efficient, and portable. The routine to create Pthreads in a C program is called pthread_create, and an "entry point" function is defined which is to be executed by the threads created. There are mechanisms to synchronize the threads, create "locks and mutexes", etc. Help pages:

    • Comprehensive tutorial page on POSIX Threads Programming

    • Compiling programs with Pthreads

    OpenMP

    OpenMP is a popular directive based construct for shared memory programming. Like POSIX threads, OpenMP is also just a "standard" interface which can be implemented in different ways by different vendors.

    Compiler directives appear as comments in your source code and are ignored by compilers unless you tell them otherwise - usually by specifying the appropriate compiler flag (https://computing.llnl.gov/tutorials/openMP). This makes the code more portable and easier to parallelize. You can parallelize loop iterations and code segments by inserting these directives. OpenMP also makes it simpler to tune the application at run time using environment variables. For example, you can set the number of threads to be used by setting the environment variable OMP_NUM_THREADS before running the program (see the short example after the links below). Help pages:

    • https://computing.llnl.gov/tutorials/openMP

    • Compiling OpenMP Programs
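    For instance, a minimal run of an OpenMP program with four threads (the program name is illustrative):

    # set the number of OpenMP threads before launching the program
    export OMP_NUM_THREADS=4
    ./my_openmp_program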

    Shared Memory Programming

    git add <FILENAME>

    Add files to staging area for next commit

    git commit -m "my awesome message"

    Commit staged files

    git push

    Upload commit to remote repository

    git pull

    Get remote repo's commits and download (try and resolve conflicts)

    git clone <URL>

    Download entire remote repository

    # Make sure none of the LMOD modules are loaded
    module purge 
    module list
    
    # create and activate the environment
    python -m venv pytorch.venv
    source pytorch.venv/bin/activate
    pip install torch torchvision torchaudio
    
    # test if it can detect GPUs 
    Once you are on the login node, open the authorized_keys file with your text editor of choice, e.g.,
    vim ~/.ssh/authorized_keys
    or
    nano ~/.ssh/authorized_keys
    Add your public keys to the end of this file. Save and exit.
    ssh-keygen -t rsa
    ssh-keygen.exe

    When you use the ssh command for the first time, the ~/.ssh directory is automatically created. If the directory doesn't exist on your system, create it using the command below:

    By default, the SSH configuration file may not exist, so you may need to create it using the touch command :

    This file must be readable and writable only by the user and not accessible by others:

    SSH Config File Structure Basics

    The SSH Config File takes the following structure:

    The contents of the SSH config file are organized into sections. Each section starts with the Host directive and contains specific SSH options used when establishing a connection with the remote SSH server.

    Oscar Hosts

    Here we provide a list of Oscar hosts and typical SSH configuration options. You have two options:

    1. Copy the list of hosts below directly into your SSH Config File (i.e., ~/.ssh/config)

    2. Keep this content in a separate file for Oscar hosts, let's say ~/.ssh/config.oscar, and include that file in your main configuration file. In this case, the first line of ~/.ssh/config will be Include "~/.ssh/config.oscar"

    Don't forget to replace <username> with your username. Also, the configuration assumes your identity key is ~/.ssh/id_rsa - if you named it anything else, please update the value. If you need to generate a key, go here.

    Connecting to your preconfigured host

    You may now connect using the shortcut notation provided by your configuration file. That is, all you need to type is:

    According to the configuration above, this is equivalent to

    Much shorter. Enjoy!

    Mixing MPI and CUDA

    Mixing MPI (C) and CUDA (C++) code requires some care during linking because of differences between the C and C++ calling conventions and runtimes. One option is to compile and link all source files with a C++ compiler, which will enforce additional restrictions on C code. Alternatively, if you wish to compile your MPI/C code with a C compiler and call CUDA kernels from within an MPI task, you can wrap the appropriate CUDA-compiled functions with the extern keyword, as in the following example.

    These two source files can be compiled and linked with both a C and C++ compiler into a single executable on Oscar using:

    The CUDA/C++ compiler nvcc is used only to compile the CUDA source file, and the MPI C compiler mpicc is used to compile the C code and to perform the linking.

    Note the use of extern "C" around the function launch_multiply, which instructs the C++ compiler (nvcc in this case) to make that function callable from the C runtime. The following C code shows how the function could be called from an MPI task.


    Example Script

    Here is an example python script mpi4pytest.py that uses MPI:

    The file mpi4pytest.py can be found at /gpfs/runtime/software_examples/mpi4py/

    Conda Environment

    Start by creating and activating a conda environment:
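    The exact commands are not shown in this export; a minimal sketch, with the environment name and Python version as illustrative placeholders:

    # load an anaconda/miniconda module first (module name varies)
    conda create -n mpi4py_env python=3.9 -y
    conda activate mpi4py_env    # or "source activate mpi4py_env", depending on the conda module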

    Once you have activated your conda environment, run the following commands to install mpi4py:
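    The original command is not reproduced here; a plausible sketch (the interpreter used in the pip command is what the note below refers to changing):

    # an MPI module may need to be loaded first so mpi4py builds against the cluster MPI
    python -m pip install mpi4py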

    You may change the python version in the pip command.

    To check that the installation process was a success you can run
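    One simple check (a sketch; any command that imports the module without an error will do):

    python -c "from mpi4py import MPI; print(MPI.Get_library_version())"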

    If no errors result from running the command, the installation has worked correctly.

    Here is an example batch job script mpi4pytest_conda.sh that uses mpi4pytest.py and the conda environment setup:
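    The script itself is not included in this export; the following sketch is consistent with the description below (two nodes via #SBATCH -N 2), with the job name, task counts, walltime, and environment name as illustrative placeholders:

    #!/bin/bash
    #SBATCH -J mpi4pytest
    #SBATCH -N 2                    # two nodes, as described below
    #SBATCH --ntasks-per-node=2     # illustrative task count
    #SBATCH -t 00:10:00

    # activate the conda environment created above
    source activate mpi4py_env

    # launch the script across the allocated MPI tasks
    # (srun or mpirun, depending on the MPI stack in use)
    srun python mpi4pytest.py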

    The example script above runs the python script on two nodes by using the #SBATCH -N 2 command. For more information on #SBATCH options, see our documentation.

    Python Environment

    Start by creating and activating a Python environment
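    Again, the exact commands are not shown here; a minimal sketch with an illustrative module name and environment name:

    module load python/3.11.0
    python -m venv mpi4py.venv
    source mpi4py.venv/bin/activate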

    Once you have activated your Python environment, run the following command to install mpi4py:

    Below is an example batch job script mpi4pytest_env.sh:

    $ git config --global user.name "John Smith"
    $ git config --global user.email [email protected]
    $ ssh-add ~/.ssh/id_rsa
    Enter passphrase for ~/.ssh/id_rsa:
    Identity added: ~/.ssh/id_rsa 
    $ ssh-add -L
    ssh-rsa AAAAB3NzaC1y...CQ0jPj2VG3Mjx2NR user@computer
    $ ssh -A username@ssh.ccv.brown.edu
    $ ssh git@github.com
    
    Hi JaneDoe! You've successfully authenticated, but GitHub does not provide shell access.
    Connection to github.com closed.
    singularity build pytorch:21.06-py3 docker://nvcr.io/nvidia/pytorch:21.06-py3
    export SINGULARITY_BINDPATH="/gpfs/home/$USER,/gpfs/scratch/$USER,/gpfs/data/"
    singularity shell --nv pytorch\:21.06-py3
    #!/bin/bash
    
    # Request a GPU partition node and access to 1 GPU
    #SBATCH -p 3090-gcondo,gpu --gres=gpu:1
    
    # Ensures all allocated cores are on the same node
    #SBATCH -N 1
    
    # Request 2 CPU cores
    #SBATCH -n 2
    #SBATCH --mem=40g
    #SBATCH --time=10:00:00
    
    #SBATCH -o %j.out
    
    export SINGULARITY_BINDPATH="/gpfs/home/$USER,/gpfs/scratch/$USER,/gpfs/data/"
    singularity --version
    
    # Use environment from the singularity image
    singularity exec --nv pytorch:21.06-py3 python pytorch-cifar100/train.py -net vgg16 -gpu
    ssh-copy-id <username>@ssh.ccv.brown.edu
    ssh-copy-id -i ~/.ssh/<keyname> <username>@ssh.ccv.brown.edu
    ssh <username>@sshcampus.ccv.brown.edu
    export APPTAINER_CACHEDIR=/tmp
    export APPTAINER_TMPDIR=/tmp
    interact -q gpu -g 1 -f ampere -m 20g -n 4
    export APPTAINER_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data"
    # Run a container with GPU support
    apptainer run --nv tensorflow-24.03-tf2-py3.simg
    # Execute a command inside the container with GPU support
    $ apptainer exec --nv tensorflow-24.03-tf2-py3.simg nvidia-smi
    $ python
    >>> import tensorflow as tf
    >>> tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
    True
    Apptainer> pip install <package-name> --user
    #!/bin/bash
    #SBATCH --nodes=1               # node count
    #SBATCH -p gpu --gres=gpu:1     # number of gpus per node
    #SBATCH --ntasks-per-node=1     # total number of tasks across all nodes
    #SBATCH --cpus-per-task=1       # cpu-cores per task (>1 if multi-threaded tasks)
    #SBATCH --mem=40G               # total memory (4 GB per cpu-core is default)
    #SBATCH -t 01:00:00             # total run time limit (HH:MM:SS)
    #SBATCH --mail-type=begin       # send email when job begins
    #SBATCH --mail-type=end         # send email when job ends
    #SBATCH --mail-user=<USERID>@brown.edu
    
    module purge
    unset LD_LIBRARY_PATH
    export APPTAINER_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data"
    srun apptainer exec --nv tensorflow-24.03-tf2-py3.simg python examples/tensorflow_examples/models/dcgan/dcgan.py
    #!/bin/sh
    # Job name
    #SBATCH -J g16-test
    
    # One task/node
    #SBATCH -n 1
    
    # Eight CPUs per task
    #SBATCH -c 8
    
    # batch partition
    #SBATCH -p batch
    
    # Run the command
    g16 test-file.gjf
    sbatch g16-test.sh
    $ module load idl
    module: loading 'idl/8.5.1'
    module: idl: License owned by Jonathan Pober. Set up the environment for IDL by running: "shopt -s expand_aliases; source $IDL/envi53/bin/envi_setup.bash".
    $ shopt -s expand_aliases; source $IDL/envi53/bin/envi_setup.bash
    $ idl
    IDL Version 8.5.1 (linux x86_64 m64). (c) 2015, Exelis Visual Information Solutions, Inc., a subsidiary of Harris Corporation.
    Installation number: 5501393-2.
    Licensed for use by: Brown University
    
    IDL>
    PRO IDL_HELLO_WORLD
    
    PRINT, ("Hello World!")
    
    END
    #!/bin/bash
    
    module load idl
    shopt -s expand_aliases; source $IDL/envi53/bin/envi_setup.bash
    
    idl -e idl_hello_world
    $ sbatch idl_hello_world.sh
    mkdir -p ~/.ssh && chmod 700 ~/.ssh
    touch ~/.ssh/config
    chmod 600 ~/.ssh/config
    Host hostname1
        SSH_OPTION value
        SSH_OPTION value
    
    Host hostname2
        SSH_OPTION value
    
    Host *
        SSH_OPTION value
# Oscar Hosts. Any hosts with the -campus suffix can be accessed
# only within the Brown network, i.e., on campus or via VPN.
# Hosts without the -campus suffix can be accessed from outside Brown
# but will require 2FA.
    
    # Hosts to connect to login nodes
    Host oscar
        HostName ssh.ccv.brown.edu
        User <username>
        IdentityFile ~/.ssh/id_rsa
        ForwardAgent yes
        ForwardX11 yes
        TCPKeepAlive yes
        ServerAliveCountMax 20
        ServerAliveInterval 15
    Host oscar-campus
        HostName sshcampus.ccv.brown.edu
        User <username>
        IdentityFile ~/.ssh/id_rsa
        ForwardAgent yes
        ForwardX11 yes
        TCPKeepAlive yes
        ServerAliveCountMax 20
        ServerAliveInterval 15
        
    # When connecting from VSCODE use the following hosts
    Host vscode-oscar-campus
        HostName oscar2
        User <username>
        ProxyCommand ssh -q -W %h:%p desktop-oscar-campus
    Host vscode-oscar
        HostName oscar2
        User <username>
        ProxyCommand ssh -q -W %h:%p desktop-oscar
    ssh oscar-campus
ssh -X -A -o TCPKeepAlive=yes -o ServerAliveCountMax=20 -o ServerAliveInterval=15 <username>@sshcampus.ccv.brown.edu
    module load mpi cuda
    mpicc -c main.c -o main.o
    nvcc -c multiply.cu -o multiply.o
    mpicc main.o multiply.o -lcudart
     __multiply__ <<< ...block configuration... >>> (a_gpu, b_gpu);
    
     safecall(cudaThreadSynchronize());
     safecall(cudaGetLastError());
    
     /* ... transfer data from GPU to CPU */
     /* ... prepare arrays a and b */
    
     launch_multiply (a, b);
    
     MPI_Finalize();
    return 0;
    $ module load mvapich2 cuda
    $ mpicc -c main.c -o main.o
    $ nvcc -c multiply.cu -o multiply.o
    $ mpicc main.o multiply.o -lcudart
    /* multiply.cu */
    
    #include <cuda.h>
    #include <cuda_runtime.h>
    
    __global__ void __multiply__ (const float *a, float *b)
    {
        const int i = threadIdx.x + blockIdx.x * blockDim.x;
        b[i] *= a[i];
    }
    
    extern "C" void launch_multiply(const float *a, const *b)
    {
        /* ... load CPU data into GPU buffers a_gpu and b_gpu */
    
        __multiply__ <<< ...block configuration... >>> (a_gpu, b_gpu);
    
        safecall(cudaThreadSynchronize());
        safecall(cudaGetLastError());
        
        /* ... transfer data from GPU to CPU */
    /* main.c */
    #include <mpi.h>
    
    void launch_multiply(const float *a, float *b);
    
    int main (int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        MPI_Comm_size (MPI_COMM_WORLD, &nprocs);
    
        /* ... prepare arrays a and b */
    
        launch_multiply (a, b);
        MPI_Finalize();
    return 0;
    }
    $ python -c "import mpi4py"
    from mpi4py import MPI
    from mpi4py import MPI
    import sys
    
    def print_hello(rank, size, name):
      msg = "Hello World! I am process {0} of {1} on {2}.\n"
      sys.stdout.write(msg.format(rank, size, name))
    
    if __name__ == "__main__":
      size = MPI.COMM_WORLD.Get_size()
      rank = MPI.COMM_WORLD.Get_rank()
      name = MPI.Get_processor_name()
    
      print_hello(rank, size, name)
    $ module load hpcx-mpi/4.1.5rc2-mt
    $ pip install mpi4py
    #!/bin/bash
    
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=4
    #SBATCH --mem=1G
    
    module load miniconda3/23.11.0s
    source /oscar/runtime/software/external/miniconda3/23.11.0/etc/profile.d/conda.sh
    conda activate my_env
module load hpcx-mpi/4.1.5rc2-mt
    
    srun --mpi=pmix python mpi4pytest.py
    $ python -m pip install mpi4py
    $ deactivate
    #!/bin/bash
    
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=4
    #SBATCH --mem=1G
    
    
    module load hpcx-mpi/4.1.5rc2-mt
    source my_env/bin/activate
    
    srun --mpi=pmix python mpi4pytest.py

    Summary of SSH Hosts

• ssh.ccv.brown.edu - You can connect from anywhere. You will need Two-Factor Authentication.

• sshcampus.ccv.brown.edu - You can connect when within the Brown WiFi network, wired network, or VPN. You will need to set up passwordless authentication.

• poodcit4.services.brown.edu - This is the host to be used when connecting from a remote IDE, e.g., Visual Studio Code.

• transfer.ccv.brown.edu - This host is used to transfer files using the SFTP protocol.

    macOS and Linux

    To log in to Oscar, open a terminal and

    • If you are not connected to the Brown VPN, use the following command:

    • If you are connected to the Brown VPN, use the following command:

    The -X allows Oscar to display windows on your machine. This allows you to open and use GUI-based applications, such as the text editor gedit.

    Watch our videos on SSHing on Linux and SSHing on Mac.

    Windows

    Windows users need to install an SSH client. We recommend PuTTY, a free SSH client for Windows.

• If you are not connected to the Brown VPN, use <username>@ssh.ccv.brown.edu as the Host Name and click Open.

• If you are connected to the Brown VPN, use <username>@sshcampus.ccv.brown.edu as the Host Name and click Open.

    Confused? Watch our tutorial on PuTTY installation or SSHing to Oscar on Windows.

    Connecting to Oscar for the First Time

    The first time you connect to Oscar you will see a message about the authenticity of the host:

    You can type yes and press return. On subsequent logins you should not see this message.

    You will then be prompted for your password.

    Nothing will show up on the screen as you type in your password. Just type it in and press enter.

    You will now be in your home directory on Oscar. In your terminal you will see a prompt like this:

    Congratulations, you are now on one of the Oscar login nodes! The login nodes are for administrative tasks such as editing files and compiling code. To use Oscar for computation you will need to use the compute nodes. To get to the compute nodes from the login nodes you can either start an interactive session on a compute node, or submit a batch job.

    Please do not run CPU-intense or long-running programs directly on the login nodes! The login nodes are shared by many users, and you will interrupt other users' work.

    using an SSH key pair
    Module commands

    command

    module list

    Lists all modules that are currently loaded in your software environment.

    module avail

    Lists all available modules on the system. Note that a module can have multiple versions.

    module help <name>

    Prints additional information about the given software.

    module load <name>

    Adds a module to your current environment. If you load using just the name of a module, you will get the default version. To load a specific version, load the module using its full name with the version: "module load gcc/6.2"

    module unload <name>

    Removes a module from your current environment.

    Finding modules

    The module avail command allows searching modules based on partial names. For example:

    will list all available modules whose name starts with "bo".

    Output:

    This feature can be used for finding what versions of a module are available.

    Auto-completion using tab key

The module load command supports auto-completion of the module name using the "tab" key. For example, typing "module load bo" at the shell prompt and hitting the "tab" key a couple of times will show results similar to those shown above. Similarly, the module unload command auto-completes using the names of modules which are loaded.

    What modules actually do...

    Loading a module sets the relevant environment variables like PATH, LD_LIBRARY_PATH and CPATH. For example, PATH contains all the directory paths (colon separated) where executable programs are searched for. So, by setting PATH through a module, now you can execute a program from anywhere in the file-system. Otherwise, you would have to mention the full path to the executable program file to run it which is very inconvenient. Similarly, LD_LIBRARY_PATH has all the directory paths where the run time linker searches for libraries while running a program, and so on. To see the values in an environment variable, use the echo command. For instance, to see what's in PATH:

    LMOD
    tutorial on using modules on Oscar
    .

    Make sure the source code has a CMakeLists.txt file in the root folder

    Getting the source code from a Git Repository

    Much of the time, source code is available on platforms such as GitHub, GitLab or BitBucket. Cloning (or downloading) the project from any of those is the same process. First, you need to get the URL from the repository. It usually looks like this:

    GitHub repository

    Bitbucket repository

    Where username indicates the GitHub (or BitBucket, etc) account of the owner of the project, and project_name indicates, well, try to guess.

    GitHub and BitBucket have a button at the top right side of the repository web page labeled "clone". Copy that URL

    Clone The Repository

    Create a new folder on a path with the necessary read/write permissions

Go inside that folder

    Clone the repository:

    URL is the repository's link mentioned above.

    Getting the source code from a .tar or .zip file

If you downloaded the project from a different source and it is contained in a .tar or .zip file, just extract the source code into a folder with the necessary read/write permissions.

    Build the Project

    Create a new folder and name it build

Go inside that folder

    Execute CCMake pointing to the root folder which has a CMakeLists.txt file

    In this example, let's assume the build folder is at the same level as the CMakeLists.txt file.

    The CCMake text interface will pop up with all the necessary attributes to build the software.

Set up the paths to the required libraries and press "c" to configure the project. Some errors might come up about CMake being unable to find specific libraries. This could be because the library does not exist on the system or because you have not loaded the right module. Please contact CCV staff for help fixing this type of error.

Make sure the attribute CMAKE_INSTALL_PREFIX points to a path with the necessary read/write permissions. By default it is set to the folder /usr/bin/, which most users do not have write access to.

Once the configuration process has ended successfully, press "g" to generate the project. Generating the project does not mean compiling or executing the program; please continue reading.

    Compile the Project

    Compile the project using the command make

To speed up the compilation process, you can increase the number of parallel compile jobs by adding the parameter "-j 8".

    Once it is done, your project will be installed in the path set in the CMAKE_INSTALL_PREFIX attribute as explained above.
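Note that make by itself only builds the project; copying the results into CMAKE_INSTALL_PREFIX is usually done with the install target. A minimal sketch (the -j value is only an example):

make -j 8       # compile with 8 parallel jobs
make install    # copy binaries/libraries into CMAKE_INSTALL_PREFIX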

If you have any questions or need help, please email [email protected].

here. Account and Partition are highlighted.

    Cluster|Account|User|Partition|Share|Priority|GrpJobs|GrpTRES|GrpSubmit|GrpWall|GrpTRESMins|MaxJobs|MaxTRES|MaxTRESPerNode|MaxSubmit|MaxWall|MaxTRESMins|QOS|Def QOS|GrpTRESRunMins|

    slurmctld|abcd-condo|ccvdemo1|batch|1|||||||||||||abcd-condo|abcd-condo||

    slurmctld|default|ccvdemo1|abcd-condo|1|||||||||||||abcd-condo|abcd-condo||

    Step 2 - Choose the correct way to submit jobs to a condo according to the condo's Account column:

    For batch script - Please include the following line:

    #SBATCH --partition=<Partition>

    For command line - You can also provide this option on the command line while submitting the job using sbatch:

    $ sbatch --partition=<Partition> <batch-script>

    For interactive session - Similarly, you can change the account while asking for interactive access too:

    $ interact -q <Partition> ... <other_options>

    For batch script - Please include the following line:

    #SBATCH --account=<Account>

    For command line - You can also provide this option on the command line while submitting the job using sbatch:

    $ sbatch --account=<Account> <batch-script>

    For interactive session - Similarly, you can change the account while asking for interactive access too:

    To see the running and pending jobs in a condo:

    condo <condo-name>

    Premium Account (priority) jobs

If you have a premium account, that should be your default QOS for submitting jobs. You can check if you have a premium account with the command groups. If you have a priority account, you will see priority in the output from groups.

    You can check the qos for a running job by running the command myq. The QOS column should show "pri-<username>"

    If you are interested in seeing all your accounts and associations, you can use the following command:
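A typical invocation uses sacctmgr:

# list every cluster/account/partition association for your user
sacctmgr -p list assoc where user=$USER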

    Name

    Purpose

    batch

    general purpose computing

    debug

    short wait time, short run time partition for debugging

    vnc

    graphical desktop environment

    gpu

    GPU nodes

    gpu-he

    High End GPU nodes

gpu-debug

short wait time, short run time partition for gpu debugging

bigmem

large memory nodes

batch is the default partition.

    Partition Details

Below is a brief summary of the partitions. For details of the nodes in each partition, please see here.

    batch

    • General purpose computing

    • Priority is determined by account type (from highest

      to lowest: condo, priority, exploratory)

    Condo limits apply to the group (i.e., they reflect the sum of all users on the condo). Condo users can check the limits on their condo with the command condos.

    There is no limit on the time for condo jobs, but users should be aware that planned maintenance on the machine may occur (one month’s notice is given prior to any planned maintenance).‌

    debug

    • Short wait time, short run time access for debugging

    • All users have the same limits and priority on the debug partition

    vnc

    • These nodes are for running VNC sessions/jobs

    • Account type may affect Priority

    gpu

    • For GPU-based jobs

    • GPU Priority users get higher priority and more resources than free users on the GPU partition

    • Condo users submit to the gpu partition with normal or priority access (if they have a priority account in addition to their condo)

    gpu-he

    • For GPU-based jobs

    • Uses Tesla V100 GPUs

    • Restricted to High End GPU Priority users

    gpu-debug

    • Short wait time, short run time gpu access for debugging

    • All users have the same limits and priority on the gpu-debug partition

    bigmem

    • For jobs requiring large amounts of memory

    • Priority users get higher priority and more resources than free users on the bigmem partition

    • Condo users submit to the bigmem partition with normal or priority access (if they have a priority account in addition to their condo)

    • Premium users get higher priority and more resources than free users on the SMP partition

    • Condo users submit to the SMP partition with normal or priority access (if they have a priority account in addition to their condo)

    slurm

    Account Information

    To request a priority account or a condo, use the account form on the CCV homepage. For more information on resources available to priority accounts and costs, visit the CCV Rates page.

    What username and password should I be using?

• If you are at Brown and have requested a regular CCV account, your Oscar login is authenticated using your Brown credentials, i.e., the same username and password that you use to log in to any Brown service such as "canvas".

• If you are an external user, you will have to get a sponsored ID at Brown through the department with which you are associated before requesting an account on Oscar. Once you have the sponsored ID at Brown, you can request an account on Oscar and use your Brown username and password to log in.

    Changing Passwords

Oscar users should use their Brown passwords to log into Oscar. Users should change their Brown passwords at myaccount.brown.edu.

    Exploratory Account

    • Exploratory accounts are available to all members of the Brown community for free.

• See the CCV Rates page for a detailed description of the resources

• Jobs are submitted to the batch partition. See the System Hardware page for available hardware

    Priority Accounts

The following accounts are billed quarterly and offer more computational resources than the exploratory accounts. See the CCV Rates page for pricing and a detailed description of the resources.

    HPC Priority

    • Intended for users running CPU-intensive jobs. These offer more CPU and memory resources than an exploratory account

    • Two types of accounts:

      • HPC Priority

    Standard GPU Priority

    • Intended for users running GPU intensive jobs. These accounts offer fewer CPU and memory resources but more GPU resources than an exploratory account.

    • Two types of accounts:

      • Standard GPU Priority

    High End GPU Priority

• Intended for GPU jobs requiring high-end GPUs. These offer the same number of CPUs as Standard GPU Priority accounts

• High-end GPUs such as the A40, V100, and A6000 are available

• See the CCV Rates page for pricing and a detailed description of the resources

• Jobs are submitted to the gpu-he partition. See the System Hardware page for available GPU hardware

    Large Memory Priority

    • Intended for jobs requiring large amounts of memory.

    • These accounts offer 2TB of memory and twice the wall-time of exploratory accounts.

• See the CCV Rates page for pricing and a detailed description of the resources

    Condo

    PIs who purchase hardware (compute nodes) for the CCV machine get a Condo account. Condo account users have the highest priority on the number of cores equivalent to the hardware they purchased. Condo accounts last for five years and give their owners access to 25% more CPU cores than they purchase for the first three years of their lifespan. GPU resources do not decrease over the lifetime of the condo.

Investigators may also purchase condos to grant access to computing resources for others working with them. After a condo is purchased, they can have users request to join the condo group through the "Request Access to Existing Condo" option on the account form on the CCV homepage.

    Common Acronyms and Terms

    Anaconda / Conda

    A distribution of Python and R used for scientific computing that is meant to simplify package management and deployment. Conda is used for installing packages and managing their dependencies. [Related Page - Anaconda]

    Association

    Within Oscar, an association refers to a combination of four factors: Cluster, Account, User, and Partition. Associations are used to control job submissions for users. []

    Batch Jobs

    Put simply, batch jobs are scheduled programs that are assigned to run on a computer without further user interaction. []

    CCV

Brown University's Center for Computation and Visualization. Provides software, expertise, and other services for Brown's research community. See our website for more information.

    CESM

    Stands for Community Earth System Model. "CESM is a fully-coupled, community, global climate model that provides state-of-the-art computer simulations of the Earth's past, present, and future climate states." () []

    Condo

    PIs can purchase condos that have a significant amount of computing resources which can be shared with others. []

    CUDA

    " is an extension of the C language, as well as a runtime library, to facilitate general-purpose programming of NVIDIA GPUs." () []

    Desktop App

    This app on Open OnDemand allows users to launch a Desktop GUI on Oscar. This app is based on VNC which is a desktop sharing system that allows you to remotely control another desktop.[]

    HPC

    Stands for High Performance Computing. HPC is the ability to process data and perform highly complex calculations at an accelerated rate. Oscar is the service that CCV offers to the Brown community for their High Performance Computing needs.

    Job Array

    A job array is a collection of jobs that all run the same program but on different values of a parameter. []

    Jupyter Notebook

    "The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text." []

    Interactive Jobs

    Jobs that allow the user to interact in real time with applications within Oscar, often from the command line. This differs from batch jobs in that each command to be run must be put in one at a time. []

    Modules

    Modules are software components that can easily be loaded or unloaded into Oscar. For instance, a user can load the Python 3 module using a module load command. []

    MPI

Stands for Message Passing Interface. MPI is a system that aims to be the standard for portable and efficient message passing. Message passing is a technique often used in object-oriented programming and parallel programming. []

    Open OnDemand (OOD)

    Open OnDemand (OOD) is a web portal to the Oscar computing cluster. It can be used to launch a Desktop session on Oscar []

    OOD app

    OOD app is a web application that runs on the Open OnDemand web portal. It allows users to launch interactive applications like Jupyter Notebook, RStudio, Matlab or Desktop. []

    Partition

    Partitions are essentially groupings of nodes that allocate resources for specific types of tasks. On Oscar, partitions are based on job submissions through the Slurm workload manager. []

    PI

    Stands for Principal Investigator. Mainly used to refer to the individual responsible for conducting and administrating a research grant. Within Oscar, PIs have their own data directories that can be shared to students. PIs may also purchase condos. []

    PuTTY

    A client for SSH for Windows and Unix that emulates a terminal []

    Python

    An object-oriented, high-level, and popular programming language []

    Quality of Service (QOS)

    The job limits that are linked to a given association. For instance, Priority Accounts will generally have a higher quality of service than Exploratory Accounts. []

    Slurm

    A workload manager used within Oscar to schedule jobs []

    SSH

    Stands for Secure Shell Protocol. Used to communicate securely between computers and often used within a command-line interface (CLI) for connections to remote servers []

    SMB

    The Server Message Block (SMB) protocol is a network protocol that allows users to communicate with remote computers for file-sharing and other uses. It is one of the versions of the Common Internet File System (CIFS). Within Oscar, SMB is mainly used for file transfer. []

    Using File Explorer on OOD

The filesystem on Oscar can be accessed through the file explorer on this web portal. The file explorer allows you to:

    • List files

    • Create a directory

    • Rename files

    • Copy/Move files

    To access the file explorer, click "Files" -> "Home Directory" at the top of the screen.

Check the documentation below for some of these services:

    Changing directories on File explorer

    To access a directory, click "Change directory" and enter the path name

    Do not use "~" in your directory path name. The path should start with "/users" or "/gpfs/"

    • To access your home directory, click the "Home Directory" link on the left. The path name at the top of the page should change to "/users/<username>"

    • To access your scratch directory, click the "scratch" directory in your home directory OR click "Change directory" and enter "/users/<username>/scratch"

    • To access your data directory, click the "data" directory in your home directory OR click "Change directory" and enter "/users/<username>/data"

    Edit plain-text files

1. Navigate to the directory that contains the plain-text file.

    2. Click the icon with the three dots -> Edit

    3. The file will open in a text editor in a new tab

    Download files or directories

1. Navigate to the directory that contains the file or directory.

    2. Click the icon with the three dots -> Download

To download multiple files:

    1. Click the check-box to the left of the file name.

    2. Scroll to the top of the page and click "Download"

    Directories are downloaded as zipped files on your computer.

    Upload files or directories

1. Navigate to the directory where you need to upload the files.

    2. Click the "Upload" button.

    3. Follow the instructions on the screen. You can click the "Browse" buttons or drag and drop files.

    Launch a terminal

1. Navigate to the directory where you would like to open the terminal.

    2. Click "Open in Terminal" at the top of the page.

    3. A web-based terminal will open in a new tab of your browser. You will be logged into one of the login nodes.

    Associations & Quality of Service (QOS)

    Associations

    Oscar uses associations to control job submissions from users. An association refers to a combination of four factors: Cluster, Account, User, and Partition. For a user to submit jobs to a partition, an association for the user and partition is required in Oscar.

    To view a table of association data for a specific user (thegrouch in the example), enter the following command in Oscar:

    If thegrouch has an exploratory account, you should see an output similar to this:

    Note that the first four columns correspond to the four factors that form an association. Each row of the table corresponds to a unique association (i.e., a unique combination of Cluster, Account, User, and Partition values). Each association is assigned a Quality of Service (see QOS section below for more details).

    Some associations have a value for GrpTRESRunMins. This value indicates a limit on the total number of Trackable RESource (TRES) minutes that can be used by jobs running with this association at any given time. The cpu=110000 for the association with the batch partition indicates that all of the jobs running with this association can have at most an accumulated 110,000 core-minute cost. If this limit is reached, new jobs will be delayed until other jobs have completed and freed up resources.

    Example of GrpTRESRunMins Limit

    Here is an example file that incurs a significant core-minute cost:

If this file is named too_many_cpu_minutes.sh, a user with thegrouch's QOS might experience something like this:

    The REASON field will be (None) at first, but after a minute or so, it should resemble the output above (after another myq command).

    Note that the REASON the job is pending and not yet running is AssocGrpCPURunMinutesLimit. This is because the program requests 30 cores for 90 hours, which is more than the oscar/default/thegrouch/batch association allows (30 cores * 90 hours * 60 minutes/hour = 162,000 core-minutes > 110,000 core-minutes). In fact, this job could be pending indefinitely, so it would be a good idea for thegrouch to run scancel 12345678 and make a less demanding job request (or use an association that allows for that amount of resources).

    Account Quality of Service (QoS) and Resources

Quality of Service (QoS) refers to the ability of a system to prioritize and manage network resources to ensure a certain level of performance or service quality. An association's QOS is used for job scheduling when a user requests that a job be run. Every QOS is linked to a set of job limits that reflect the limits of the cluster/account/user/partition of the association(s) that has/have that QOS. QOS's can also have information on GrpTRESRunMins limits for their corresponding associations. For example, HPC Priority accounts have job limits of 1,198,080 core-minutes per job, which are associated with those accounts' QOS's. Whenever a job request is made (necessarily through a specific association), the job will only be queued if it meets the requirements of the association's QOS. In some cases, a QOS can be defined to have limits that differ from its corresponding association. In such cases, the limits of the QOS override the limits of the corresponding association. For more information, see the slurm QOS documentation.

    myaccount - To list the QoS & Resources

    The myaccount command serves as a comprehensive tool for users to assess the resources associated with their accounts. By utilizing this command, individuals can gain insights into critical parameters such as Max Resources Per User and Max Jobs Submit Per User.

    Installing Python Packages

    For Python 3, we recommend using the system Python. You do not need to load any Python module to use system Python3

    Python modules do not include other common Python packages (e.g., SciPy, NumPy). This affords individual users complete control over the packages they are using.

    There are several ways for users to install python packages on Oscar

• using a Python environment

• using conda

• installing into their home directory

• installing into a custom location

• installing from source into a custom location

We recommend using a Python environment for your workflow if you prefer pip. If you are a conda user, we recommend managing your workflow with conda environments. You can load an anaconda module and then use conda.

    In this document, we use angular brackets <> to denote command line options that you should replace with an appropriate value

Intel provides optimized packages for numerical and scientific work that you can install through pip or anaconda.

Using Python Environments (venv)

    Python environments are a cleaner way to install python packages for a specific workflow. In the example below, a virtual environment called my_cool_science is set up in your home directory:

    line 1: load the version of python you want to use

    line 2: change directory to home

    line 3: create the Python environment

    line 4: activate the Python environment

    line 5: install any packages you need for the Python environment

    line 6: deactivate the environment

    When you want to use the environment, e.g. in a batch script or an interactive session

    source ~/my_cool_science/bin/activate

    When your work is finished, deactivate the environment with

    deactivate
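For instance, a batch script that uses this environment might look like the following sketch (the resource requests and the script name my_analysis.py are just placeholders):

#!/bin/bash
#SBATCH -n 1
#SBATCH -t 01:00:00

# activate the environment created above
source ~/my_cool_science/bin/activate

# run a script that uses the packages installed in the environment (placeholder name)
python my_analysis.py

deactivate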

    Reinstalling environment

    Step 1: Generate a list of installed packages

    Activate the environment and print the list of installed packages to a file

    Step 2: Create a new environment and install packages

    Here, we create a new environment and install packages inside it from old_env_req.txt file.

    Install into your home directory

The --user flag will instruct pip to install to your home directory

This will install the package under the following path in the user's HOME directory:

    If you omit the --user flag you will see

    This is because users do not have access to the default locations where software is installed.

    Python packages can often have conflicting dependencies. For workflows that require a lot of python packages, we recommend using virtual environments.

    Install at custom location

    Users have a limit of 20GB for their home directories on Oscar. Hence, users might want to use their data directory instead for installing software. Another motivation to do that is to have shared access to the software among the whole research group.

    This path to install location will have to be added to the PYTHONPATH environment variable so that python can find the python modules to be used. This is not necessary for software installed using the --user option.

    This can be added at the end of your .bashrc file in your home directory. This will update the PYTHONPATH environment variable each time during startup. Alternatively, you can update PYTHONPATH in your batch script as required. This can be cleaner as compared to the former method. If you have a lot of python installs at different locations, adding everything to PYTHONPATH can create conflicts and other issues.

    A caveat of using this method is that pip will install the packages (along with its requirements) even if the package required is already installed under the global install or the default local install location. Hence, this is more of a brute force method and not the most efficient one.

    For example, if your package depends on numpy or scipy, you might want to use the numpy and scipy under our global install as those have been compiled with MKL support. Using the --target option will reinstall numpy with default optimizations and without MKL support at the specified location.
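As an illustration of the batch-script approach mentioned above, PYTHONPATH can be set just before Python runs; the install path below is a placeholder:

#!/bin/bash
#SBATCH -n 1

# point Python at the custom install location (placeholder path)
export PYTHONPATH=/gpfs/data/<pi_group>/python-packages:$PYTHONPATH

python my_script.py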

    Installing from source

    Sometimes, python software is not packaged by the developers to be installed by pip. Or, you may want to use the development version which has not been packaged. In this case, the python package can be installed by downloading the source code itself. Most python packages can be installed by running the setup.py script that should be included in the downloaded files.

    You will need to provide a "prefix path" for the install location

    This will create the sub-directories bin, lib, etc. at the location provided above and install the packages there. The environment will have to be set up accordingly to use the package:

    GPUs on Oscar

    To view the various GPUs available on Oscar, use the command

    nodes gpu

    Interactive Use

To start an interactive session on a GPU node, use the interact command and specify the gpu partition. You also need to specify the requested number of GPUs using the -g option:

To start an interactive session on a particular GPU type (QuadroRTX, 1080ti, p100, etc.), use the -f (feature) option:

    GPU Batch Job

For production runs, please submit a batch job to the gpu partition. E.g. for using 1 GPU:

    This can also be mentioned inside the batch script:

    You can view the status of the gpu partition with:

    Sample batch script for CUDA program:
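A minimal sketch of such a batch script (the module name, executable name, and resource requests are assumptions):

#!/bin/bash
#SBATCH -p gpu --gres=gpu:1
#SBATCH -n 1
#SBATCH -t 00:30:00
#SBATCH -o cuda-job-%j.out

# load the CUDA toolkit (version is an assumption)
module load cuda

# run a compiled CUDA executable (placeholder name)
./my_cuda_program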

    Getting started with GPUs

    While you can program GPUs directly with CUDA, a language and runtime library from NVIDIA, this can be daunting for programmers who do not have experience with C or with the details of computer architecture.

    You may find the easiest way to tap the computation power of GPUs is to link your existing CPU program against numerical libraries that target the GPU:

• CUBLAS is a drop-in replacement for BLAS libraries that runs BLAS routines on the GPU instead of the CPU.

• CULA is a similar library for LAPACK routines.

• CUFFT, CUSPARSE, and CURAND provide FFT, sparse matrix, and random number generation routines that run on the GPU.

    OpenACC

    OpenACC is a portable, directive-based parallel programming construct. You can parallelize loops and code segments simply by inserting directives - which are ignored as comments if OpenACC is not enabled while compiling. It works on CPUs as well as GPUs. We have the PGI compiler suite installed on Oscar which has support for compiling OpenACC directives. To get you started with OpenACC:
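As a sketch, compiling a C file that contains OpenACC directives with the PGI compilers might look like this (the module name, source file, and flags are assumptions; -Minfo=accel reports what the compiler parallelized):

# load the PGI compiler suite (module name may differ)
module load pgi

# -acc enables OpenACC directives; -Minfo=accel prints accelerator information
pgcc -acc -Minfo=accel saxpy.c -o saxpy
./saxpy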

    MATLAB

    NVLink Enabled GPU Nodes

NVLink enables GPUs to pool memory over high-speed links (25 GB/s). This can increase the performance of your application code.

Nodes gpu[1210,1211,1212] have 4 fully connected NVLink (SXM2) V100 GPUs.

To submit an interactive job to the NVLink-enabled GPU nodes:

To submit batch jobs, add the following line to your batch script:

    SMB (Local Mount)

    CCV users can access their home, data, and scratch directories as a local mount on their own Windows, Mac, or Linux system using the Common Internet File System (CIFS) protocol (also called Samba). This allows you to use applications on your machine to open files stored on Oscar. It is also a convenient way to move files between Oscar and your own machine, as you can drag and drop files.

To use SMB you will need to be connected to the VPN. Please install the Brown VPN client before proceeding.

    Conda and Mamba

Both the miniconda3 and miniforge modules include only conda, python, and a few other packages. Only the miniforge module provides mamba.

Mamba is a drop-in replacement for conda and is faster at resolving dependencies than conda. For commands like conda install and conda search, conda can be replaced with mamba on Oscar. More details can be found in the Mamba User Guide.

    Conda Initialization

CCV New User Account: brown.co1.qualtrics.com
ssh -X <username>@ssh.ccv.brown.edu
ssh -X <username>@sshcampus.ccv.brown.edu
    The authenticity of host 'ssh.ccv.brown.edu (138.16.172.8)' can't be established.
    RSA key fingerprint is SHA256:Nt***************vL3cH7A.
    Are you sure you want to continue connecting (yes/no)?
    [username@login004 ~]$
     $ module avail bo
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ name: bo*/* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    boost/1.49.0        boost/1.63.0        bowtie2/2.3.0
    boost/1.62.0-intel  bowtie/1.2.0
    $ echo $PATH
    /gpfs/runtime/opt/perl/5.18.2/bin:/gpfs/runtime/opt/python/2.7.3/bin:/gpfs/runtime/opt/java/7u5/bin:
    /gpfs/runtime/opt/intel/2013.1.106/bin:/gpfs/runtime/opt/centos-updates/6.3/bin:/usr/lib64/qt-3.3/bin:
    /usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/gpfs/runtime/bin
    make -j 8
    mkdir <new_folder_name>
    cd <new_folder_name>
    git clone  <URL>
    tar -xf archive.tar.gz
    mkdir build
    cd build
    ccmake ../
    make
    sacctmgr -p list assoc where user=$USER | grep -E 'condo|Account|Partition'
    sacctmgr -p list assoc where user=<username>
    $ sinfo -O "partition"     
    # Jump box with public IP address
    Host jump-box
      HostName ssh8.ccv.brown.edu
      User <username>
    # Target machine with private IP address
    Host ccv-vscode-node
      HostName vscode1
      User <username>
      ProxyCommand ssh -q -W %h:%p jump-box
    (sacctmgr list assoc | head -2; sacctmgr list assoc | grep thegrouch) | cat
       Cluster    Account       User  Partition     Share GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
    ---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
         oscar    default  thegrouch  gpu-debug         1                                                                                                                                               gpu-debug gpu-debug
         oscar    default  thegrouch     bigmem         1                                                                                                                                             norm-bigmem norm-big+
         oscar    default  thegrouch        smp         1                                                                                                                                                norm-smp  norm-smp
         oscar    default  thegrouch        gpu         1                                                                                                                                                norm-gpu  norm-gpu cpu=34560,gr+
         oscar    default  thegrouch      batch         1                                                                                                                                                  normal    normal    cpu=110000
         oscar    default  thegrouch        vnc         1                                                                                                                                                     vnc       vnc
         oscar    default  thegrouch      debug         1                                                                                                                                                   debug     debug

    module bin <name>

    Prints programs made available by a module


    $ interact -a <Account> ... <other_options>
    Related Page - Associations & Quality of Service
    Related Page - Batch Jobs
    our website
    Source
    Related Page - Using a CESM module
    Related Page - Account Types
    CUDA
    Source
    Related Page - Intro to CUDA
    Related Page- Desktop App (VNC)
    Related Page - Job Arrays
    Related Page - Jupyter Notebooks on Oscar
    Related Page - Interactive Jobs
    Related Page - Using Modules
    Message Passing Interface
    Message passing
    Related Page - MPI Jobs
    Related Page - Open OnDemand
    Related Page - Interactive Apps on OOD
    Related Page - Slurm Partitions
    Related Page - Account Types
    Related Page - SSH (Terminal)
    Related Page - Python on Oscar
    Related Page - Associations & Quality of Service (QOS)
    Related Page - Slurm Partitions
    Related Page - SSH (Terminal)
    Related Page - SMB (Local Mount)
    Edit plain-text files
    Upload files or directories
    Download files or directories
    Launch a terminal from the current directory
    Navigate to the directory
    Navigate to the directory
    Navigate to the directory
    Navigate to the directory
    HPC Priority+ (Twice the resources of HPC Priority)

    See the CCV Rates page for pricing and detailed description of the resources.

  • Jobs are submitted to the batch partition. See the System Hardware page for available hardware

  • Standard GPU Priority+ (Twice the resources of Standard GPU Priority)

    See the CCV Rates page for pricing and detailed description of the resources.

  • Jobs are submitted to the gpu partition. See the System Hardware page for available GPU hardware

    Jobs are submitted to the bigmem partition. See the System Hardware page for available hardware

    request an account
    myaccount.brown.edu
    CCV Rates page
    System Hardware
    CCV Rates page
    CCV Rates page
    CCV Rates page
    account form
    System Hardware
    HPC Priority accounts
    slurm QOS documentation
    conda
    conda environments .
    anaconda
    pip
    anaconda
MAGMA combines custom GPU kernels, CUBLAS, and a CPU BLAS library to simultaneously use both the GPU and CPU; it is available in the 'magma' module on Oscar.
  • Matlab has a GPUArray feature, available through the Parallel Computing Toolkit, for creating arrays on the GPU and operating on them with many built-in Matlab functions. The PCT toolkit is licensed by CIS and is available to any Matlab session running on Oscar or workstations on the Brown campus network.

  • PyCUDA is an interface to CUDA from Python. It also has a GPUArray feature and is available in the cuda module on Oscar.

  • interactive
    batch job
    CUBLAS
    CULA
    CUFFT
    CUSPARSE
    CURAND
    Introduction to OpenACC Online Course
    PGI Accelerator Compilers with OpenACC Directives
    Getting Started with OpenACC
    Running OpenACC Programs on NVIDIA and AMD GPUs
    GPU Programming in Matlab
    It is not recommended to initialize conda via conda init.

    Access Conda via Modules

    To access the conda or mamba command, load either a miniconda3 or miniforge module and then run the source command

    module load miniconda3/23.11.0s
    source /oscar/runtime/software/external/miniconda3/23.11.0/etc/profile.d/conda.sh
    module load miniforge/23.11.0-0s
    source /oscar/runtime/software/external/miniforge/23.11.0-0/etc/profile.d/conda.sh
A conda environment can be:

• shared among all users if the environment is installed in a shared directory

    • private to one user if the environment is installed in a user's private directory

    The command 'conda info' shows important configurations for conda environment.

    Below are some important configurations:

• envs directories: a list of directories where a conda environment is installed by default. In the output of 'conda info' above, the first default directory for installing a conda environment is the envs directory of the miniconda3 installation, followed by the user's ~/.conda/envs.

    • package cache: a list of directories where downloaded packages are stored.

    Create a New Conda Environment

    To create a new conda environment in a default directory, run the following command:

    To create a new conda environment in a different directory, run the following command:

    Activate a Conda Environment

    After creating a conda environment, users can activate a conda environment to install or access packages in the environment via the following command.

    The commands above will only work if:

    • A conda environment with the specified name (conda_environment_name in the example) exists

    • The appropriate anaconda module has been loaded (if you are unsure about this one, consult this documentation)

    If you need to activate a conda environment in a bash script, you need to source the conda.sh as shown in the following example bash script:

    module load miniconda3/23.11.0s

    source /oscar/runtime/software/external/miniconda3/23.11.0/etc/profile.d/conda.sh

    conda activate my_env

    module load miniforge/23.11.0-0s

    source /oscar/runtime/software/external/miniforge/23.11.0-0/etc/profile.d/conda.sh

    conda activate my_env

    After installing packages in an active environment (instructions below), you do not need to load or install those packages in the bash script; any packages installed in the conda environment (before the script even starts) will be available through the environment after it is activated (line 4 in the code above).

    Do NOT activate a conda environment before submitting a batch job if the batch job activates a conda environment. Otherwise, the batch job will not be able to activate the conda environment and hence fail.

    To deactivate a conda environment, simply use the following command:

    Install Packages in an Active Conda Environment

    To install a package, we need to first activate a conda environment, and then run

    conda install package_name=version

    mamba install package_name=version

    The "=version" is optional. By default, conda install a package from the anaconda channel. To install a package from a different channel, run conda install with the -c option. For example, to install a package from the bioconda channel, run

    conda install -c bioconda package_name

    mamba install -c bioconda package_name

    Delete a Conda Environment

    To delete a conda environment, run

    Remove Caches

Conda may download lots of additional packages when installing a package. A user may use up all of their quota due to these downloaded packages. To remove the downloaded packages, run

    Mamba User Guide
    #!/bin/bash
    #SBATCH -n 30
    #SBATCH --mem=32G
    #SBATCH -t 90:00:00
    
    echo "Is this too much to ask? (Hint: What is the GrpTRESRunMins limit for batch?)"
    $ sbatch too_many_cpu_minutes.sh
    Submitted batch job 12345678
    $ myq
    Jobs for user thegrouch
    
    Running:
    (none)
    
    Pending:
    ID        NAME                     PART.  QOS     CPU  WALLTIME    EST.START  REASON
    15726799  too_many_cpu_minutes.sh  batch  normal  30   3-18:00:00  N/A        (AssocGrpCPURunMinutesLimit)
    [ccvdemo1@login010 ~]$ myaccount
    My QoS                    Total Resources in this QoS              Max Resources Per User                   Max Jobs Submit Per User
    ------------------------- ------------------------------           ------------------------------           -----------         
    debug                                                                                                       1200                
    gpu-debug                                                          cpu=8,gres/gpu=4,mem=96G                 1200                
    gpu                                                                node=1                                   1200                
    normal                                                             cpu=32,mem=246G                          1000                
    norm-bigmem                                                        cpu=32,gres/gpu=0,mem=770100M,node=2     1200                
    norm-gpu                                                           cpu=12,gres/gpu=2,mem=192G               1200                
    vnc                                                                                                         1                   
module load python/<version>
cd ~
    python -m venv my_cool_science
    source ~/my_cool_science/bin/activate
    pip install <your package>
    deactivate
    source ~/old_env/bin/activate
    pip freeze > ~/old_env_req.txt
    cd ~
    python -m venv new_env
    source ~/new_env/bin/activate
    pip install -r ~/old_env_req.txt
    deactivate
    pip install --user <package>
    ~/.local/lib/python<version>/site-packages
        IOError: [Errno 13] Permission denied: '/gpfs/runtime/opt/python/2.7.3/lib/python2.7/site-packages/ordereddict.py'
     pip install --target=</path/to/install/location> <package>
    export PYTHONPATH=</path/to/install/location>:$PYTHONPATH
    python setup.py install --prefix=</path/to/install/location>
    export PATH=</path/to/install/location>/bin:$PATH
    export PYTHONPATH=</path/to/install/location>/lib/python<version>/site-packages:$PYTHONPATH
    $ interact -q gpu -g 1
    interact -q gpu -f quadrortx
    $ sbatch -p gpu --gres=gpu:1 <jobscript>
    #SBATCH -p gpu --gres=gpu:1
    $ allq gpu
    ~/batch_scripts/cuda.sh
    interact -q gpu -f v100
    #SBATCH --constraint=v100
    $ conda info 
    
         active environment : None
                shell level : 0
           user config file : /users/yliu385/.condarc
     populated config files : /users/yliu385/.condarc
              conda version : 23.1.0
        conda-build version : not installed
             python version : 3.10.9.final.0
           virtual packages : __archspec=1=x86_64
                              __glibc=2.34=0
                              __linux=5.14.0=0
                              __unix=0=0
           base environment : /oscar/runtime/software/external/miniconda3/23.11.0  (writable)
          conda av data dir : /oscar/runtime/software/external/miniconda3/23.11.0/etc/conda
      conda av metadata url : None
               channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                              https://repo.anaconda.com/pkgs/main/noarch
                              https://repo.anaconda.com/pkgs/r/linux-64
                              https://repo.anaconda.com/pkgs/r/noarch
              package cache : /oscar/runtime/software/external/miniconda3/23.11.0/pkgs
                              /users/yliu385/.conda/pkgs
           envs directories : /oscar/runtime/software/external/miniconda3/23.11.0/envs
                              /users/yliu385/.conda/envs
                   platform : linux-64
                 user-agent : conda/23.1.0 requests/2.28.1 CPython/3.10.9 Linux/5.14.0-284.11.1.el9_2.x86_64 rhel/9.2 glibc/2.34
                    UID:GID : 140348764:2128288
                 netrc file : None
               offline mode : False
    
    conda create -n conda_environment_name
    conda create -p  /path/to/install/conda_environment_name
    conda activate conda_environment_name
    conda deactivate
    conda env remove -n conda_environment_name
    conda clean --all
A user's Windows machine is required to have Crowdstrike installed to use SMB.

    Users should ensure that the date and time are set correctly on their machine. Now you are ready to mount your CCV directories locally. Instructions for each of the various operating systems are given below.

    Since the Jun'23 maintenance, you do not need to put your username in the Server address. Please update your server address if you see issues connecting to Oscar.

    macOS

    1. In the Finder, press "Command + K" or select "Connect to Server..."

      from the "Go" menu.

    2. For "Server Address", enter smb://smb.ccv.brown.edu/<volume>/

      and click "Connect".

• To access your Home directory, enter smb://smb.ccv.brown.edu/home/

• To access your Scratch space, enter smb://smb.ccv.brown.edu/scratch/

• To access your Data directory, enter smb://smb.ccv.brown.edu/data/<pi_group>/

        • To check your PI group run 'groups' command.

    3. Enter your AD username and password. If you have trouble connecting, enter <username>@ad.brown.edu as your Username

    4. You may choose to add your login credentials to your keychain so you will not need to enter this again.

    Optional. If you would like to automatically connect to the share at startup:

    1. Open "System Preferences" (leave the Finder window open).

    2. Go to "Accounts" > "(your account name)".

    3. Select "Login Items".

    4. Drag your data share from the "Finder" window to the "Login Items" window.

    Linux

    1. Install the cifs-utils package:

    2. Make a directory to mount the share into:

    3. Create a credentials file and add your AD account information:

    4. Allow only root access to the credentials files:

5. Add an entry to the fstab:

6. The fstab entry should be the following:

7. Replace <localUser> with the login used on your Linux workstation, and replace <user> and <pi_group> with your Oscar username and PI group, respectively.

8. Mount the share (a consolidated sketch of these steps is shown below):
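A consolidated sketch of steps 1 through 8 on an Ubuntu-like system (the package manager, mount point, credentials-file path, and mount options are assumptions; replace <user>, <pi_group>, and <localUser> as described above):

# 1. install the CIFS utilities (use your distribution's package manager)
sudo apt-get install cifs-utils

# 2. create a mount point
sudo mkdir -p /mnt/oscar-home

# 3. create a credentials file with your AD account information
sudo sh -c 'printf "username=<user>\npassword=<AD-password>\ndomain=ad.brown.edu\n" > /etc/oscar-cifs.cred'

# 4. allow only root to read the credentials file
sudo chmod 600 /etc/oscar-cifs.cred

# 5-7. example /etc/fstab entry (one line)
# //smb.ccv.brown.edu/home/ /mnt/oscar-home cifs credentials=/etc/oscar-cifs.cred,uid=<localUser>,gid=<localUser> 0 0

# 8. mount the share
sudo mount /mnt/oscar-home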

    Windows

    1. Right-click "Computer" and select "Map Network Drive"

    2. Select an unassigned drive letter

    3. To mount specific volumes:

• For Home directory, enter \\smb.ccv.brown.edu\home\

• For Scratch space, enter \\smb.ccv.brown.edu\scratch\

• For Data directory, enter \\smb.ccv.brown.edu\data\<pi_group>\

  • To check your <pi_group>, run the 'groups' command.

    1. Check "Connect using different credentials"

    2. Click "Finish"

    3. Enter your AD user name. If your computer is not in Active Directory (AD), you should enter your username in the format ad\username

    4. Enter your AD password and click "OK"

    You can now access your home directory through Windows Explorer with the assigned drive letter. Your data and scratch directories are available as the subdirectories (~/data and ~/scratch) of your home directory.

    Brown VPN client
    Crowdstrike Home
    transfer files to/from oscar

    Arbiter2

    Arbiter2 is a cgroups-based mechanism that is designed to prevent the misuse of login nodes and VSCode node, which are scarce, shared resources. It is installed on shared nodes listed below:

    • login009

    • login010

    • vscode1

    Status and Limits

Arbiter2 applies different limits to a user's processes depending on the user's status: normal, penalty1, penalty2, and penalty3.

    Arbiter2 limits apply only to the shared nodes, not compute nodes.

    Normal Status and Limits

When a user first logs in, the user is in the normal status. These normal limits apply to all the user's processes on the node:

• CPU: 1/3 of the total CPU time. For example, a user's processes can use up to 1/3 of the total CPU time of the 24 cores on a login node.

• Memory: 40GB

    Penalty1 Status and Limits

When a user's processes consume more CPU time than the default CPU time limit for a period of time, the user's status is changed to the penalty1 status. These penalty1 limits are applied:

• CPU: 80% of the normal limit.

• Memory: 0.8 * 40GB = 32GB (80% of the normal limit)

While a user is in penalty1 status, their processes are throttled if they consume more CPU time than the penalty1 limit. However, if a user's processes exceed the penalty1 memory limit, the processes (PIDs) will be terminated by cgroups.

The user's status returns to the normal status after the user's processes consume less CPU time than the penalty1 limit for 30 minutes.

    Penalty restrictions are enforced independently for each shared node, and the penalty status does not carry over between these nodes.

    Penalty2 Status and Limits

    When a user's processes consume more CPU time than the penalty1 limit for a period of time, the user is put in the penalty2 status, and the penalty2 limits apply to the user's processes.

    • CPU: 50% of the normal limit

    • Memory: 20GB (50% of the normal limit)

    In penalty2 status, the user's processes will be throttled if they consume more CPU time than the penalty2 limit. However, if a user's processes exceed the penalty2 memory limit, the processes (PIDs) will be terminated by cgroups.

    The user's status returns to normal after the user's processes have consumed less CPU time than the penalty2 limit for one hour.

    Penalty3 Status and Limits

    When a user's processes consume more CPU time than the penalty2 limit for a period of time, the user is put in the penalty3 status. These penalty3 limits apply to the user's processes.

    • CPU: 30% of the normal limit

    • Memory: 12GB (30% of the normal limit)

    In penalty3 status, the user's processes will be throttled if they consume more CPU time than the penalty3 limit. If a user's processes exceed the penalty3 memory limit, the processes (PIDs) will be terminated by cgroups.

    The user's status returns to normal after the user's processes have consumed less CPU time than the penalty3 limit for two hours.

    Email Notification

    A user receives an email notification upon each violation. Below is an example email:

    Violation of usage policy

    A violation of the usage policy by ccvdemo (CCV Demo,,,,ccvdemo) on login006 was automatically detected starting at 08:53 on 04/25.

    This may indicate that you are running computationally-intensive work on the interactive/login node (when it should be run on compute nodes instead). Please utilize the 'interact' command to initiate a SLURM session on a compute node and run your workloads there.

    You now have the status penalty1 because your usage has exceeded the thresholds for appropriate usage on the node. Your CPU usage is now limited to 80% of your original limit (8.0 cores) for the next 30 minutes. In addition, your memory limit is 80% of your original limit (40.0 GB) for the same period of time.

    These limits will apply on login006.

    High-impact processes

    Usage values are recent averages. Instantaneous usage metrics may differ. The processes listed are probable suspects, but there may be some variation in the processes responsible for your impact on the node. Memory usage is expressed in GB and CPU usage is relative to one core (and may exceed 100% as a result).

    Process                  Average core usage (%)    Average memory usage (GB)
    mamba (1)                1.90                      0.30
    python3.10 (1)           0.56                      0.02
    sshd* (2-4)              0.01                      0.01
    bash (1-4)               0.00                      0.01
    python (1)               0.00                      0.01
    SeekDeep (21)            800.09                    0.24
    mamba-package (1)        90.58                     0.01
    other processes** (1)    3.48                      0.00

    Recent system usage

    *This process is generally permitted on interactive nodes and is only counted against you when considering memory usage (regardless of the process, too much memory usage is still considered bad; it cannot be throttled like CPU). The process is included in this report to show usage holistically.

    **This accounts for the difference between the overall usage and the collected PID usage (which can be less accurate). This may be large if there are a lot of short-lived processes (such as compilers or quick commands) that account for a significant fraction of the total usage. These processes are whitelisted as defined above.

    Required User Actions

    When a user receives an alert email that the user is put in a penalty status, the user should

    • kill the processes that use too many resources on the shared node listed in the alert email, and/or reduce the resources used by those processes (see the sketch after this list)

    • submit an interactive job, a batch job, or an interactive Open OnDemand app to run computationally intensive programs including but not limited to Python, R and MATLAB

    • consider attending CCV workshops or tutorials to learn more about correctly using Oscar.
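    A minimal sketch of how to find and stop your own resource-heavy processes on a shared node (the PID shown is hypothetical):

    # List your own processes, sorted by CPU usage; the top entries are the likely culprits
    ps -u $USER -o pid,pcpu,pmem,comm --sort=-pcpu | head

    # Stop a specific process by its PID
    kill 12345

    # Force-kill it if it does not exit
    kill -9 12345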

    CCV reserves the right to suspend a user's access to Oscar if the user repeatedly violates the limits and is unable to work with CCV to find a solution.

    Exempt Processes

    Essential Linux utilities, such as rsync, cp, scp, SLURM commands, creating Singularity images, and code compilation, are exempt. To obtain a comprehensive list, please get in touch with us.

    The CPU resources used by exempt programs do not count against the CPU limits. However, the memory resources used by exempt programs still count against the memory limits.

    Desktop App (VNC)

    The Desktop app on Open OnDemand is a replacement for the older VNC Java client. This app allows you to launch a Desktop GUI on Oscar.

    Advanced users looking for more resources can try the Desktop (Advanced) app.

    Do not load any anaconda module in your .modules or .bashrc file. These modules prevent Desktop sessions from starting correctly. You may load them inside the Desktop session.

    Launching Desktop App (VNC)

    0. Launch Open OnDemand

    Click here to launch Open OnDemand (OOD) and log in with your Brown credentials.

    1. Select the Desktop option in Interactive Apps dropdown list:

    2. Choose the resource option:

    3. Wait and Launch!

    You may change the Image Quality if your internet connection is bad. Image quality can be changed in the middle of the session.

    Reconnecting to session

    A session may get disconnected if it is not active for a while:

    If the session disconnects as shown above, please don't click the "Connect" button on the screen. Instead, go to the Open OnDemand page and click "My Interactive Sessions" to find the session again:

    Please don’t launch a new session if you have an existing session. You cannot launch two desktop sessions at the same time.

    Sometimes, the “My interactive Sessions” button is shortened to look like:

    Copying and pasting text

    If you are using Google Chrome, switch on the "Clipboard" permission and you can directly copy and paste text into the OOD Desktop from any other program.

    1. Click the Lock icon to the left of the URL

    2. Switch on the "Clipboard" permission

    Click the side panel button on the extreme left hand side of the screen.

    Desktop (Advanced)

    If you need more or different resources than those available from the default Desktop session, you should use the Advanced Desktop app. Resources requested here count against the resources allowed for your Oscar account.

    1. Select the Desktop (Advanced) app under Interactive Apps.

    2. Choose required resources

    Fill out the form with your required resources.

    • Account: Enter your condo account name. If you are not a member of a condo, leave this blank

    • Desktop Environment: Choose XFCE. KDE works for CPU jobs, but may not be able to use GPU acceleration correctly.

    • Number of hours: Choose appropriately. Your Desktop session will end abruptly after this time has lapsed. Requesting a very long session will result in a lower job priority.

    3. Wait and Launch!

    Wait and launch this session like the regular Desktop session.

    Modify the Terminal App

    Inside the Desktop session, click on Applications in the top left

    Applications -> Settings -> Default Applications

    In the new Window, click on the "Utilities" tab and choose "Gnome Terminal" in the drop down menu under "Terminal Emulator"

    Then click on "Applications -> Terminal Emulator" to launch the terminal:

    If the steps mentioned above do not work:

    1. Close the Desktop session

    2. Inside a terminal (outside the Desktop session), run this command:

    rm -r ~/.ood_config

    3. Start a new desktop session.

    Change the Terminal icon for launcher panel

    Drag and drop the "Terminal Emulator" icon from the "Applications" menu onto the launcher panel at the bottom of the screen:

    Then click on "Create Launcher":

    You may remove the old terminal icon after adding the new icon:

    Batch Jobs

    Submitting jobs using batch scripts

    If you'd prefer to see the following instructions in a video, we have a tutorial on batch job submission on Oscar.

    To run a batch job on Oscar, you first have to write a script that describes what resources you need and how your program will run. Some example batch scripts are available in your home directory on Oscar, in the directory:

    A batch script starts by specifying the bash shell as its interpreter with the line:

    By default, a batch job will reserve 1 core and 2.8GB of memory per core for your job. You can customize the amount of resources allocated for your job by explicitly requesting them in your batch script with a series of lines starting with #SBATCH, e.g.,

    The above lines request 4 cores (-n), 16GB of memory per node (--mem), and one hour of runtime (-t). After you have described the resources you want allocated for the job, you then give the commands that you want to be executed.

    All of the #SBATCH instructions in your batch script must appear before the commands you want to run.

    Once you have your batch script, you can submit a batch job to the queue using the sbatch command:

    Submitting jobs from the command line

    As an alternative to requesting resources within your batch script, it is possible to define the resources requested as command-line options to sbatch. For example, the command below requests 4 cores (-n), 16GB of memory per node (--mem), and one hour of runtime (-t) to run the job defined in the batch script.

    Note that command-line options passed to sbatch will override the resources specified in the script, so this is a handy way to reuse an existing batch script when you just want to change a few of the resource values.
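    For example, assuming batch_script.sh is an existing script that already contains its own #SBATCH directives, you could reuse it with a longer runtime and more memory:

    sbatch -t 2:00:00 --mem=32G batch_script.sh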

    Output from batch jobs

    The sbatch command will return a number, which is your Job ID. You can view the output of your job in the file slurm-<jobid>.out in the directory where you invoked the sbatch command. For instance, you can view the last 10 lines of output with:

    Alternatively, you can specify the files where you want the standard output and standard error to be written using the -o and -e flags. You can use %j within the output/error filenames to add the job ID. If you would like to change your output file to be MyOutput-<job-id>, you can add the following line to your batch job:

    sbatch command options

    A full description of all of the options for sbatch can be found online or by using the following command on Oscar:

    The table below summarizes some of the more useful options for sbatch:

    option          purpose
    -J              Specify the job name that will be displayed when listing the job
    -n              Number of tasks (= number of cores, if "--cpus-per-task" or "-c" option is not mentioned)
    -c              Number of CPUs or cores per task (on the same node)
    -N              Number of nodes
    -t              Runtime, as HH:MM:SS
    --mem=          Requested memory per node
    -p              Request a specific partition
    -o              Filename for standard output from the job
    -e              Filename for standard error from the job
    -C              Add a feature constraint (a tag that describes a type of node). You can view the available features on Oscar with the nodes command or sinfo -o "%20N %10c %10m %25f %10G ". You can also select multiple feature constraints using '|', e.g. #SBATCH -C quadrortx|intel
    --mail-type=    Specify the events that you should be notified of by email: BEGIN, END, FAIL, REQUEUE, and ALL
    --mail-user=    Email address where you should be notified
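    For example, to be notified by email when a job ends or fails (the address below is a placeholder), you could add:

    #SBATCH --mail-type=END,FAIL
    #SBATCH --mail-user=your_username@brown.edu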

    Passing environment variables to a batch job

    When a user logs into Oscar, there are pre-set environment variables such as HOME, which make up the user's login environment. A user may modify an existing environment variable or add a new one. So when a user submits a SLURM batch job, the user's current environment variables may differ from the user's login environment. By default, a user's current environment variables, instead of the user's login environment variables, are accessible to the user's batch jobs on Oscar.

    To modify or add an environment variable, you can either:

    • run the following command in your shell

    • or have the following line in your batch script

    After the step above to modify or add an environment variable, your batch job can access the environment variable my_variable whose value is my_value.
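    As a minimal illustration (the job script below is hypothetical), the submitted job can read the exported variable like any other environment variable:

    #!/bin/bash
    #SBATCH -n 1

    # my_variable is inherited from the submission environment or from --export
    echo "my_variable is set to: $my_variable"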

    To export more than one environment variable, just list all the name=value pairs separated by commas:

    Here is an example in which a script loops over an input file and submits a batch job for each directory listed in the file, passing the directory to the job for processing.

    The input file test.txt has multiple lines where each line is a directory:

    The loop.sh script reads each line (directory) from the input file and passes the directory as an environment variable to a batch job:

    The test.job is a job script, which runs the test.sh to process the directory passed as an environment variable:

    The test.sh is a bash script which simply echoes the directory:

    If you run ./loop.sh test.txt, then three jobs are submitted. Each job generates an output like the following:

    Using variables to set slurm job name, output filename, and error filename

    Variables can be passed at the sbatch command line to set the job name, output and error file names, as shown in the following example:

    MPI Jobs

    Resources from the web on getting started with MPI:

    • https://computing.llnl.gov/tutorials/mpi

    • http://mpitutorial.com

    • http://www.math-cs.gordon.edu/courses/cps343/presentations/Intro_to_MPI.pdf

    MPI is a standard that dictates the semantics and features of "message passing". There are different implementations of MPI. Those installed on Oscar are

    • hpcx-mpi

    • OpenMPI

    We recommend using hpcx-mpi as it is integrated with the SLURM scheduler and optimized for the Infiniband network.

    MPI modules on Oscar

    Oscar uses a hierarchical module system where users need to load the required MPI module before they can load any other module that depends upon that particular MPI module. You can read more about this module system here.

    Currently, the two available MPI implementations on Oscar are hpcx-mpi and openmpi. You can check the available versions by running these commands:

    hpcx-mpi/4.1.5rc2s-yflad4v is the recommended version of MPI on Oscar. It can be loaded by running

    srun instead of mpirun

    Use srun --mpi=pmix to run MPI programs. All MPI implementations are built with SLURM support. Hence, the programs need to be run using SLURM's srun command.

    The --mpi=pmix flag is also required to match the configuration with which MPI is installed on Oscar.

    Running MPI programs - Interactive

    To run an MPI program interactively, first create an allocation from the login nodes using the salloc command:

    For example, to request 4 cores to run 4 tasks (MPI processes):

    Once the allocation is fulfilled, you can run MPI programs with the srun command:

    When you are finished running MPI commands, you can release the allocation by exiting the shell:

    Also, if you only need to run a single MPI program, you can skip the salloc command and specify the resources in a single srun command:

    This will create the allocation, run the MPI program, and release the allocation.

    Note: It is not possible to run MPI programs on compute nodes by using the interact command.

    salloc documentation: https://slurm.schedmd.com/salloc.html

    srun documentation: https://slurm.schedmd.com/srun.html

    Running MPI programs - Batch Jobs

    Here is a sample batch script to run an MPI program:

    Hybrid MPI+OpenMP

    If your program has multi-threading capability using OpenMP, you can have several cores attached with a single MPI task using the --cpus-per-task or -c option with sbatch or salloc. The environment variable OMP_NUM_THREADS governs the number of threads that will be used.

    The above batch script will launch 4 MPI tasks - 2 on each node - and allocate 4 CPUs for each task (total 16 cores for the job). Setting OMP_NUM_THREADS governs the number of threads to be used, although this can also be set in the program.

    Performance Scaling

    The maximum theoretical speedup that can be achieved by a parallel program is governed by the proportion of the sequential part of the program (Amdahl's law). Moreover, as the number of MPI processes increases, the communication overhead increases, i.e. the amount of time spent sending and receiving messages among the processes grows. Beyond a certain number of processes, this increase starts dominating over the decrease in computational run time, and the overall program slows down instead of speeding up as more processes are added.

    Hence, MPI programs (or any parallel programs) do not run faster as the number of processes is increased beyond a certain point.

    If you intend to carry out a lot of runs of a program, the correct approach is to find the number of processes that gives the lowest (or a reasonably low) run time. Start with a small number of processes, like 2 or 4, and first verify the correctness of the results by comparing them with sequential runs. Then increase the number of processes gradually to find the optimum number beyond which the run time flattens out or starts increasing.
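    As a rough illustration (my-mpi-program and the job names are hypothetical), you could submit the same program with increasing process counts and compare the elapsed times afterwards:

    # Submit one short test job per MPI process count
    for n in 2 4 8 16 32; do
        sbatch -J scale_$n -n $n -t 30:00 --wrap "srun --mpi=pmix ./my-mpi-program"
    done

    # Once the jobs finish, compare their elapsed times
    sacct -X --name=scale_2,scale_4,scale_8,scale_16,scale_32 -o JobName,NCPUS,Elapsed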

    Maximum Number of Nodes for MPI Programs

    An MPI program is allowed to run on at most 32 nodes. When a user requests more than 32 nodes for an MPI program/job, the user will receive the following error:

    Batch job submission failed: Requested node configuration is not available

    Remote IDE (VS Code)

    You can access Oscar's file-system remotely from Visual Studio Code (VS Code). Note that access of Oscar from VS Code is still considered experimental, and as such, 24x7 support is not available.

    VS Code one-time setup

    To use VS Code you must be on a Brown compliant network or connected to the VPN. Please install the Brown VPN client before proceeding.

    September 10, 2023: Some users have reported issues while connecting to the Oscar VS Code remote extension. This is due to a recent change introduced by VS Code. To address this issue

    Ctrl (cmd on Mac) + Shift + P > Remote-SSH: Settings

    Disable the Remote.SSH: Use Exec Server option

    To use VS Code you will need to be connected to the VPN. Please install the Brown VPN client before proceeding.

    Step 1: Install VSCode Extension

    Install the Remote Development extension pack for VS Code:

    Step 2: Uncheck symlink box

    Open VS Code settings and uncheck symlink:

    Code > Preferences > Settings

    File > Preferences > Settings

    Search for symlink and make sure the symlink searching is unchecked

    Step 3: Setup Passwordless SSH

    Make sure you have set up passwordless SSH authentication to Oscar. If you haven't, please refer to this documentation page.

    If you have Windows Subsystem for Linux (WSL) installed in your computer, you need to follow the instructions for Windows (PowerShell).

    Step 4: Edit the SSH config file

    Edit the config file:

    The config file is located at:

    ~/.ssh/config

    The config file is located at:

    If you have Windows Subsystem for Linux (WSL) installed in your computer, you need to follow the instructions for Windows (PowerShell).

    Edit the config file on your local machine, add the following lines. Replace <username> with your Oscar username.

    Step 5: Fixes

    September 10, 2023: Some users have reported issues while connecting to the Oscar VSCode remote extension. This is due to a recent change introduced by VSCode. To address this issue

    Step 6: Connect for the first time

    In VS Code, select Remote-SSH: Connect to Host… and after the list populates select ccv-vscode-node
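    Optionally, you can sanity-check the SSH configuration from a regular terminal before connecting in VS Code (this assumes the jump-box and ccv-vscode-node entries added in Step 4):

    # Should print the VSCode node's hostname after authenticating through the jump box
    ssh ccv-vscode-node hostname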

    Step 7: Initial Setup

    Install and set up of VS Code

    After a moment, VS Code will connect to the SSH server and set itself up. You might see a Firewall prompt; please click Allow.

    Step 8: Configure VS Code

    Important: Please run the following to add a settings.json file to your config. This is because the file watcher and file searcher (rg) index all the files you have access to in your workspace. If you have a large dataset (e.g. machine learning), this can consume a lot of resources on the VSCode node.

    Connect to VS Code first.

    You can either create the settings file with the command below,

    or manually create the /users/$USER/.vscode-server/data/Machine/settings.json file with the following contents

    Reconnect to VS Code

    1. Click the green icon "Open a Remote Window" in the bottom left corner of VS Code Window. Then click "Connect to Host" in the drop down list.

    2. Select the ccv-vscode-node option to connect to Oscar.

    Getting Started

    This guide assumes you have an Oscar account. To request an account, see create an account.

    If you're confused about any acronyms or terms throughout the guide, check out our Quick Reference page to see definitions of commonly used terms.

    Transferring Files to and from Oscar

    There are several ways to move files between your machine and Oscar. Which method you choose will depend on how much data you need to move and your personal preference for each method.

    1. Command line (scp)

    2. SMB

    3. GUI programs

    4. Globus online (best for large transfers)

    5. LFTP

    Jupyter Labs on Oscar

    Installing Jupyter Lab

    The anaconda modules provide jupyter-lab. Users can also use pip or anaconda to install Jupyter Lab.

    Running Jupyter Lab on Oscar

    CentOS/RHEL:   $ sudo yum install cifs-utils
    Ubuntu:        $ sudo apt-get install cifs-utils
    $ sudo mkdir -p /mnt/rhome /mnt/rscratch /mnt/rdata
    $ sudo gedit /etc/cifspw
    
    username=user
    password=password
    ~/batch_scripts

    There are a couple of ways to use Jupyter Lab on Oscar. You can run a Jupyter Lab
    • in an OOD Desktop App (VNC)

    • using a batch job

    • in an interactive session

    With the batch job or interactive session method, you use a browser on your machine to connect to your Jupyter Lab server on Oscar.

    Do not run Jupyter Lab on login nodes.

    In an OOD Desktop App (VNC) Session

    Start an OOD Desktop App (VNC) session, and open up a terminal in the VNC session. To start a Jupyter Lab, enter

    This will start the Jupyter lab server and open up a browser with the lab.

    If you installed Jupyter Lab with pip, you may need to give the full path:

    ~/.local/bin/jupyter-lab

    Using a Batch Job

    1. Submit a batch script to start the Jupyter Lab server.

    2. Set up an ssh tunnel to the server.

    3. Open a browser to view the lab.

    4. Use scancel to end the batch job when you are done.

    1. Submit batch script

    Here is an example batch script to start a Jupyter Lab server on an Oscar compute node

    If you installed Jupyter Lab with pip, you may need to give the full path:

    ~/.local/bin/jupyter-lab --no-browser --port=$ipnport --ip=$ipnip

    This script can be found in ~/batch_scripts. Copy this example and submit this script with

    sbatch jupyter.sh

    Once your batch job is running, there will be a file named jupyter-log-{jobid}.txt containing the information you need to connect to your Jupyter Lab server on Oscar. To check if your job is running, use myq.

    The output from myq will look something like this:

    2. Setup an ssh tunnel to the notebook server

    In this example the jobID is 7239096. To view the lab server information, use cat. For this example:

    cat jupyter-log-7239096.txt

    Open a terminal on your machine and copy and paste the ssh -N -L ........ line into the terminal.

    If you are using Windows, follow the Tunneling into Jupyter with Windows documentation to complete this step.

    Enter your Oscar password. Note it will appear that nothing has happened.

    3. Open a browser to view the lab

    Open a browser on your local machine to the address given in cat jupyter-log-{jobid}.txt.

    The lab will ask for a token. Copy the token from jupyter-log-{jobid}.txt. Then your lab will start.

    Remember to scancel {jobid} when you are done with your notebook session
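    For the example job above (job ID 7239096), that is:

    scancel 7239096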

    In an Interactive Session

    1. Start Jupyter Lab in an interactive job

    2. Setup an ssh tunnel to the server.

    3. Open a browser to view the notebook.

    4. Use scancel to end the batch job when you are done.

    1. Start Jupyter Lab in an interactive job

    Start an Interactive job and then in your interactive session enter the following:

    An output similar to the one below indicates that Jupyter Lab has started:

    $ jupyter-lab --no-browser --port=$ipnport --ip=$ipnip

    [I 13:12:03.404 LabApp] JupyterLab beta preview extension loaded from /gpfs/runtime/opt/anaconda/3-5.2.0/lib/python3.6/site-packages/jupyterlab

    [I 13:12:03.404 LabApp] JupyterLab application directory is /gpfs/runtime/opt/anaconda/3-5.2.0/share/jupyter/lab

    [I 13:12:03.410 LabApp] Serving notebooks from local directory: /gpfs_home/yliu385

    [I 13:12:03.410 LabApp] 0 active kernels

    [I 13:12:03.410 LabApp] The Jupyter Notebook is running at:

    [I 13:12:03.410 LabApp] http://172.20.209.7:9414/?token=dd9936098d03b8195fc626f017c97ca56a054887d134cb1e

    [I 13:12:03.410 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

    [C 13:12:03.411 LabApp]

    2. Setup an ssh tunnel to the server

    Open a terminal on your machine and enter the following line (replace $ipnip and $ipnport with the values from the two echo commands in the previous step).

    If you are using Windows, follow the Tunneling into Jupyter with Windows documentation to complete this step.

    Enter your Oscar password. Note it will appear that nothing has happened.

    3. Open a browser to view the notebook

    Open a browser on your local machine to the address:

    Again, you need to replace $ipnport with the value from the first echo command in Step 1. The notebook will ask for a token. You can copy the token from the output from Step 2.

    4. Press Ctrl+C twice to kill your Jupyter Lab server

    Once you finish and no longer need the Jupyter Lab server, you can kill the server by pressing Ctrl+C twice in your interactive session.

    $ sudo chmod 0600 /etc/cifspw
    $ sudo gedit /etc/fstab
    # Home
    //smb.ccv.brown.edu/home/ /mnt/rhome cifs credentials=/etc/cifspw,nounix,uid=<localuser>,domain=ad.brown.edu 0 0
    
    # Scratch 
    //smb.ccv.brown.edu/scratch/ /mnt/rscratch cifs credentials=/etc/cifspw,nounix,uid=<localuser>,domain=ad.brown.edu 0 0
    
    # Data
    //smb.ccv.brown.edu/data/<pi_group>/ /mnt/rdata cifs credentials=/etc/cifspw,nounix,uid=<localUser>,domain=ad.brown.edu 0 0
    $ mount -a
    #!/bin/bash
    #SBATCH -n 4
    #SBATCH --mem=16G
    #SBATCH -t 1:00:00
    sbatch <jobscript>
    sbatch -n 4 -t 1:00:00 --mem=16G <jobscript>
    tail -10 slurm-<jobid>.out
    #SBATCH -o my-output-%j.out
    $ man sbatch
    export my_variable=my_value
    #SBATCH --export=my_variable=my_value
    #SBATCH --export=my_variable1=my_value1,my_variable2=my_value2,my_variable3=my_value3
    /users/yliu385/data/yliu385/Test/
    /users/yliu385/data/yliu385/Test/pip
    /users/yliu385/data/yliu385
    #!/bin/bash
    
    if [ "$#" -ne 1 ] || ! [ -f "$1" ]; then
        echo "Usage: $0 FILE"
        exit 1
    fi
    
    while IFS= read -r line; do
       sbatch --export=directory=$line test.job 
    done < $1
    #!/bin/sh
    
    #SBATCH -N 1
    #SBATCH -n 1
    
    ./test.sh $directory
    #!/bin/bash
    
    echo "$0 argument: $1"
    /users/yliu385/data/yliu385/Test/
    
    ./test.sh argument: /users/yliu385/data/yliu385/Test/
    t=`date +"%Y-%m-%d"`
    sbatch --job-name=test.$t --output=test.out.$t --error=test.err.$t test.job
    $ module avail hpcx-mpi
    
    ------------------------ /oscar/runtime/software/spack/0.20.1/share/spack/lmod/linux-rhel9-x86_64/Core -------------------------
       hpcx-mpi/4.1.5rc2s-yflad4v
       
    $ module avail openmpi
    
    ------------------------ /oscar/runtime/software/spack/0.20.1/share/spack/lmod/linux-rhel9-x86_64/Core -------------------------
       openmpi/4.1.2-s5wtoqb    openmpi/4.1.5-hkgv3gi    openmpi/4.1.5-kzuexje (D)
                 
    module load hpcx-mpi
    $ salloc -N <# nodes> -n <# MPI tasks> -p <partition> -t <minutes>
    $ salloc -n 4 
    $ srun --mpi=pmix ./my-mpi-program ...
    $ exit
    $ srun -N <# nodes> -n <# MPI tasks> -p <partition> -t <minutes> --mpi=pmix ./my-mpi-program
    #!/bin/bash
    
    # Request an hour of runtime:
    #SBATCH --time=1:00:00
    
    # Use 2 nodes with 8 tasks each, for 16 MPI tasks:
    #SBATCH --nodes=2
    #SBATCH --tasks-per-node=8
    
    # Specify a job name:
    #SBATCH -J MyMPIJob
    
    # Specify an output file
    #SBATCH -o MyMPIJob-%j.out
    #SBATCH -e MyMPIJob-%j.err
    
    # Load required modules
    module load hpcx-mpi/4.1.5rc2s
    
    srun --mpi=pmix MyMPIProgram
    #!/bin/bash
    
    # Use 2 nodes with 2 tasks each (4 MPI tasks)
    # And allocate 4 CPUs to each task for multi-threading
    #SBATCH --nodes=2
    #SBATCH --tasks-per-node=2
    #SBATCH --cpus-per-task=4
    
    # Load required modules
    module load hpcx-mpi/4.1.5rc2s
    
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    srun --mpi=pmix ./MyMPIProgram
    jupyter-lab
    #!/bin/bash
    #SBATCH --nodes 1
    #SBATCH -c 6
    #SBATCH --time 04:00:00
    #SBATCH --mem-per-cpu 3G
    #SBATCH --job-name tunnel
    #SBATCH --output jupyter-log-%J.txt
    ## get tunneling info
    XDG_RUNTIME_DIR=""
    ipnport=$(shuf -i8000-9999 -n1)
    ipnip=$(hostname -i)
    ## print tunneling instructions to jupyter-log-{jobid}.txt
    echo -e "
        Copy/Paste this in your local terminal to ssh tunnel with remote
        -----------------------------------------------------------------
        ssh -N -L $ipnport:$ipnip:$ipnport <username>@ssh.ccv.brown.edu
        -----------------------------------------------------------------
        Then open a browser on your local machine to the following address
        ------------------------------------------------------------------
        localhost:$ipnport  (prefix w/ https:// if using password)
        ------------------------------------------------------------------
        "
    ## start an ipcluster instance and launch jupyter server
    module load anaconda/3-5.2.0
    jupyter-lab --no-browser --port=$ipnport --ip=$ipnip
    Jobs for user mhamilton
    
    Running:
    ID       NAME    PART.  QOS          CPU  WALLTIME  REMAIN   NODES
    7239096  tunnel  batch  pri-mhamilt  6    4:00:00   3:57:33  node1036
    
    Pending:
    (none)
     ssh -N -L $ipnport:$ipnip:$ipnport <username>@ssh.ccv.brown.edu
    localhost:9349  (prefix w/ https:// if using password)
    unset XDG_RUNTIME_DIR
    module load anaconda/3-5.2.0
    ipnport=$(shuf -i8000-9999 -n1)
    echo $ipnport
    ipnip=$(hostname -i)
    echo $ipnip
    jupyter-lab --no-browser --port=$ipnport --ip=$ipnip
     ssh -N -L $ipnport:$ipnip:$ipnport <username>@ssh.ccv.brown.edu
    localhost:$ipnport  (prefix w/ http:// if using password)
  • To copy text into the Desktop session, paste the data into the Clipboard. It will be available to paste inside the Desktop session.

  • To copy text from the Desktop session, open the Clipboard. The copied text will be displayed inside it. You can select and copy the text inside the Clipboard and paste it to an external program.

  • Partition: Equivalent to #SBATCH -p option. The desktop session will run on this partition.

  • Num Cores: Equivalent to the #SBATCH -n option.

  • Num GPUs: Equivalent to the #SBATCH --gres=gpu: option. This field is ignored if the partition does not have any GPU nodes, e.g. batch

  • Memory (GB): Equivalent to the #SBATCH --mem= option.

  • Reservation: Equivalent to the #SBATCH --reservation= option. Leave blank if you are not using a reservation.

    OSCAR

    Oscar is the shared compute cluster operated by CCV.

    Oscar runs the Linux Red Hat 9 operating system. General Linux documentation is available from The Linux Documentation Project. We recommend you read up on basic Linux commands before using Oscar. Some of the most common commands you'll be using in Oscar can also be found on our Quick Reference page.

    If you'd like a brief introduction to Linux commands, watch our tutorial on Linux basics on Oscar.

    Oscar has two login nodes and several hundred compute nodes. When users log in through Secure Shell (SSH), they are first put on one of the login nodes which are shared among several users at a time. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes from your own computer. Running computationally intensive or memory intensive programs on the login node slows down the system for all users. Any processes taking up too much CPU or memory on a login node will be killed. Please do not run Matlab on the login nodes.

    What username and password should I be using?

    • If you are at Brown and have requested a regular CCV account, your Oscar login will be authenticated using your Brown credentials, i.e. the same username and password that you use to log into any Brown service such as "canvas".

    • If you are an external user, you will have to get a sponsored ID at Brown through the department with which you are associated before requesting an account on Oscar. Once you have the sponsored ID at Brown, you can request an account on Oscar and use your Brown username and password to log in.

    Connecting to Oscar for the first time

    To log in to Oscar you need Secure Shell (SSH) on your computer. Mac and Linux machines normally have SSH available. To login in to Oscar, open a terminal and type

    Windows users need to install an SSH client. We recommend PuTTY, a free SSH client for Windows. Once you've installed PuTTY, open the client and use <username>@ssh.ccv.brown.edu for the Host Name and click Open. The configuration should look similar to the screenshot below.

    The first time you connect to Oscar you will see a message like:

    You can type yes. You will be prompted for your password. Note that nothing will show up on the screen when you type in your password; just type it in and press enter. You will now be in your home directory on Oscar. In your terminal you will see a prompt like this:

    Congratulations, you are now on one of the Oscar login nodes.

    Note: Please do not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes.

    File system

    Users on Oscar have three places to store files:

    • home

    • scratch

    • data

    Note that class accounts may not have a data directory. Users who are members of more than one research group may have access to multiple data directories.

    From the home directory, you can use the command ls to see your scratch directory and your data directory (if you have one) and use cd to navigate into them if needed.
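    For example, from your home directory:

    ls ~          # your scratch and data directories appear as ~/scratch and ~/data
    cd ~/scratch  # move into your scratch directory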

    To see how much space you have used in your directories, use the command checkquota. Below is an example output:

    Files not accessed for 30 days may be deleted from your scratch directory. This is because scratch is high performance space. The fuller scratch is, the worse the read/write performance. Use ~/data for files you need to keep long term.

    A good practice is to configure your application to read any initial input data from ~/data and write all output into ~/scratch. Then, when the application has finished, move or copy data you would like to save from ~/scratch to ~/data. For more information on which directories are backed up and best practices for reading/writing files, see Oscar's Filesystem and Best Practices. You can go over your quota up to the hard limit for a grace period. This grace period is to give you time to manage your files. When the grace period expires you will be unable to write any files until you are back under quota.
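    A minimal sketch of this pattern inside a batch script (the program name and file paths are hypothetical):

    # Read input from ~/data and write intermediate output to ~/scratch
    INPUT=~/data/my_project/input.dat
    WORKDIR=~/scratch/my_project_run_$SLURM_JOB_ID
    mkdir -p "$WORKDIR"

    ./my_program --input "$INPUT" --output "$WORKDIR/results.out"

    # When the run finishes, copy only what you need long term back to ~/data
    cp -v "$WORKDIR/results.out" ~/data/my_project/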

    You can also transfer files to and from the Oscar Filesystem from your own computer. See Transferring Files to and from Oscar.

    Software modules

    CCV uses the Lmod package for managing the software environment on OSCAR. To see the software available on Oscar, use the command module avail. You can load any one of these software modules using module load <module>. The command module list shows what modules you have loaded. Below is an example of checking which versions of the module 'workshop' are available and loading a given version.

    For a list of all Lmod commands, see Software Modules. If you have a request for software to be installed on Oscar, email support@ccv.brown.edu.

    Using a Desktop on Oscar

    You can connect remotely to a graphical desktop environment on Oscar using CCV's OpenOnDemand. The OOD Desktop integrates with the scheduling system on Oscar to create dedicated, persistent VNC sessions that are tied to a single user.

    Using VNC, you can run graphical user interface (GUI) applications like Matlab, Mathematica, etc. while having access to Oscar's compute power and file system.

    Choose a session that suits your needs

    Running Jobs

    You are on Oscar's login nodes when you log in through SSH. You should not (and would not want to) run your programs on these nodes as these are shared by all active users to perform tasks like managing files and compiling programs.

    With so many active users, a shared cluster has to use a "job scheduler" to assign compute resources to users for running programs. When you submit a job (a set of commands) to the scheduler along with the resources you need, it puts your job in a queue. The job is run when the required resources (cores, memory, etc.) become available. Note that since Oscar is a shared resource, you must be prepared to wait for your job to start running, and it can't be expected to start running straight away.

    Oscar uses the SLURM job scheduler. Batch jobs are the preferred mode of running programs, where all commands are mentioned in a "batch script" along with the required resources (number of cores, wall-time, etc.). However, there is also a way to run programs interactively.

    For information on how to submit jobs on Oscar, see Running Jobs.

    There is also extensive documentation on the web on using SLURM (quick start guide).

    Where to get help

    • Online resources: SLURM, Linux Documentation, Basic Linux Commands, stackoverflow

    • CCV's page detailing common problems you might face on Oscar

    • Email support@ccv.brown.edu



    1. SMB

    You can drag and drop files from your machine to the Oscar filesystem via SMB. This is an easy method for a small number of files. Please refer to this page for mounting the filesystem via SMB.

    2. Command line

    Mac and Linux

    SCP

    You can use scp to transfer files. For example to copy a file from your computer to Oscar:

    To copy a file from Oscar to your computer:

    RSYNC

    You can use rsync to sync files from your local computer to Oscar:

    Windows

    On Windows, if you have PuTTY installed, you can use its pscp function from the terminal.

    3. GUI programs for transferring files using the sftp protocol and transfer.ccv.brown.edu hostname

    • DUO is required if you are not connected to approved networks, e.g., your home network

      • There is no interactive terminal message, but your phone will get a prompt automatically

    • DUO is NOT required if you are connected to approved Brown networks

      • A personal Windows computer must have Crowdstrike installed in order to be on an approved Brown network.

    In general, you can specify the following for your GUI programs:

    • Protocol: SFTP

    • Host: transfer.ccv.brown.edu

    • User: your Oscar username

    • Password: your Brown password

    3.1 WinSCP for Windows

    3.1.1 Limit Concurrent Transfer and Change Reconnect Options

    Click the Options and then Preferences menu in WinSCP. In the popup window, click Transfer and then Background to (Figure 1)

    • change Maximal number of transfers at the same time to 1

    • uncheck Use multiple connections for single transfer

    Figure 1 WinSCP Maximal Transfers

    click Endurance to (Figure 2)

    • set Automatically reconnect session to 5 seconds

    • uncheck Automatically reconnect session, if it stalls

    • set Keep reconnection for to 10 seconds

    Figure 2 WinSCP Reconnect

    3.1.2 Add a New Site

    Figure 3 WinSCP Session Creation

    3.2 FileZilla

    3.2.1. Disable Timeout

    Click the Edit menu and then select the Settings submenu, and then change the Timeout in seconds to 0 to disable it, as shown in Figure 4.

    Figure 4 Disable Timeout

    3.2.2 Add a New Site

    Open the Site Manager as shown in Figure 5.

    Figure 5 Open Site Manager

    Click the 'New Site' button to add a new site, as shown in Figure 6:

    Figure 6 New Site

    Limit the number of simultaneous connections to 1, as shown in Figure 7.

    Figure 7 Limit Simultaneous Connections

    Click the 'Connect' button to connect to Oscar and transfer files.

    3.3 Cyberduck

    Figure 8 Cyberduck Connection

    You may see a popup window about an 'Unknown Fingerprint'. You just need to check the 'Always' option and click 'Allow'. This window should not pop up again unless the transfer server is changed.

    Figure 9 Unknown Fingerprint

    4. Globus online

    Globus is a secure, reliable research data management service. You can move data directly to Oscar from another Globus endpoint. Oscar has one Globus endpoint:

    If you want to use Globus Online to move data to/from you own machine, you can install Globus Connect Personal. For more instructions on how to use Globus, see the Oscar section in the Globus documentation.

    5. LFTP

    LFTP is a sophisticated file transfer program supporting a number of network protocols (ftp, http, sftp, fish, torrent). It has bookmarks, a built-in mirror command, can transfer several files in parallel and was designed with reliability in mind. You can use the LFTP module from Oscar to transfer data from any (S)FTP server you have access to directly to Oscar. Below are the main LFTP commands to get you started:


    DMTCP

    Distributed MultiThreaded CheckPointing (DMTCP) checkpoints a running program on Linux with no modifications to the program or the OS. It allows the program to be restarted from a checkpoint.

    Modules

    To access dmtcp, load a dmtcp module. For example:

    module load dmtcp/3.0.0

    Example Programs

    Here's a dummy example that prints increasing integers every 2 seconds. Copy this to a text file on Oscar and name it dmtcp_serial.c

    Compile this program by running

    You should have the files in your directory now:

    • dmtcp_serial

    • dmtcp_serial.c

    Basic Usage

    Launch a Program

    The dmtcp_launch command launches a program and automatically checkpoints it. To specify the interval (in seconds) between checkpoints, add the "-i num_seconds" option to the dmtcp_launch command.

    Example: the following command launches the program dmtcp_serial and checkpoints every 8 seconds.

    As shown in the example above, a checkpoint file (ckpt_dmtcp_serial_24f183c2194a7dc4-40000-42af86bb59385.dmtcp) is created, and can be used to restart the program.

    Restart from a checkpoint

    The dmtcp_restart command restarts a program from a checkpoint, and also automatically checkpoints the program. To specify the interval (in seconds) between checkpoints, add the "-i num_seconds" option to the dmtcp_restart command.

    Example: the following command restarts the dmtcp_serial program from a checkpoint, and checkpoints every 12 seconds

    Batch Jobs

    It is a desirable goal that a single job script can

    • launch a program if there is no checkpoint, or

    • automatically restart from a checkpoint if one or more checkpoints exist

    The job script dmtcp_serial_job.sh below is an example which shows how to achieve the goal:

    • If there is no checkpoint in the current directory, launch the program dmtcp_serial

    • If one or more checkpoints exist in the current directory, restart the program dmtcp_serial from the latest checkpoint

    First Submission - Launch a Program

    Submit dmtcp_serial_job.sh and wait for the job to run until it times out. Below are the beginning and end of the job output file.

    Later Submissions - Restart from a Checkpoint

    Submit dmtcp_serial_job.sh again and wait for the job to run until it times out. Below is the beginning of the job output file, which demonstrates that the job restarted from the checkpoint of the previous job.

    Job Array

    The following example script

    • creates a sub directory for each task of a job array, and then saves a task's checkpoint in the task's own sub directory when the job script is submitted for the first time

    • restarts from the checkpoints in the task subdirectories when the job script is submitted for the second time or later

    Jupyter Notebooks on Oscar

    Installing Jupyter Notebook

    The anaconda modules provide jupyter-notebook. Users can also use pip or anaconda to install jupyter notebook.

    Running Jupyter Notebook on Oscar

    There are a couple of ways to use Jupyter Notebook on Oscar. You can run Jupyter Notebook

    • in an OOD Desktop App (VNC)

    • using a batch job

    • in an interactive session

    With the batch job or interactive session method, you use a browser on your machine to connect to your Jupyter Notebook server on Oscar.

    Start by going to the directory you want to access when using Jupyter Notebook, and then start Jupyter Notebook. The directory where a Jupyter Notebook is started is the working directory for the Notebook.

    Do not run Jupyter Notebook on login nodes.

    In an OOD Desktop App (VNC) Session

    Start an OOD Desktop App (VNC) session, and open up a terminal in the VNC session. To start a Jupyter Notebook, enter

    This will start the Jupyter Notebook server and open up a browser with the notebook.

    If you installed Jupyter Notebook with pip, you may need to give the full path:

    ~/.local/bin/jupyter-notebook

    Using a Batch Job

    1. Submit a batch script to start the Jupyter Notebook server.

    2. Set up an ssh tunnel to the server.

    3. Open a browser to view the notebook.

    4. Use scancel to end the batch job when you are done.

    1. Submit batch script

    Here is an example batch script to start a Jupyter notebook server on an Oscar compute node. This script assumes that you are not using a Conda or a virtual environment.

    If you installed Jupyter notebook with pip you may need to give the full path:

    ~/.local/bin/jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip

    If you are using a Conda environment, replace the last two lines with these lines:

    This script can be found in ~/batch_scripts. Copy this example and submit this script with

    sbatch jupyter.sh

    Once your batch job is running, there will be a file named jupyter-log-{jobid}.txt containing the information you need to connect to your Jupyter Notebook server on Oscar. To check if your job is running, use myq.

    The output from myq will look something like this:

    2. Set up an ssh tunnel to the notebook server

    In this example the jobID is 7239096. To view the notebook server information, use cat. For this example:

    cat jupyter-log-7239096.txt

    Open a terminal on your machine and copy and paste the ssh -N -L ........ line into the terminal.

    If you are using Windows, follow the Tunneling into Jupyter with Windows documentation to complete this step.

    Enter your Oscar password. Note it will appear that nothing has happened.

    3. Open a browser to view the notebook

    Open a browser on your local machine to the address given in cat jupyter-log-{jobid}.txt.

    The notebook will ask for a token. Copy the token from jupyter-log-{jobid}.txt. Then your notebook will start.

    Remember to scancel {jobid} when you are done with your notebook session.

    In an Interactive Session

    1. Start Jupyter Notebook in an interactive job.

    2. Set up an ssh tunnel to the server.

    3. Open a browser to view the notebook.

    4. Use scancel to end the batch job when you are done.

    1. Start a Jupyter Notebook in an interactive job

    Start an interactive job and then in your interactive session enter the following:

    An output similar to the one below indicates that Jupyter Notebook has started:

    $ jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip

    [I 13:35:25.948 NotebookApp] JupyterLab beta preview extension loaded from /gpfs/runtime/opt/anaconda/3-5.2.0/lib/python3.6/site-packages/jupyterlab

    [I 13:35:25.948 NotebookApp] JupyterLab application directory is /gpfs/runtime/opt/anaconda/3-5.2.0/share/jupyter/lab

    [I 13:35:25.975 NotebookApp] Serving notebooks from local directory: /gpfs_home/yliu385

    [I 13:35:25.975 NotebookApp] 0 active kernels

    [I 13:35:25.975 NotebookApp] The Jupyter Notebook is running at:

    [I 13:35:25.975 NotebookApp] http://172.20.207.61:8855/?token=c58d7877cfcf1547dd8e6153123568f58dc6d5ce3f4c9d98

    [I 13:35:25.975 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

    [C 13:35:25.994 NotebookApp]

    2. Setup an ssh tunnel to the server

    Open a terminal on your machine and enter the following line (replace $ipnip and $ipnport with the values from the two echo commands in the previous step).

    If you are using Windows, follow the Tunneling into Jupyter with Windows documentation to complete this step.

    Enter your Oscar password. Note it will appear that nothing has happened.

    3. Open a browser to view the notebook

    Open a browser on your local machine to the address:

    Again, you need to replace $ipnport with the value from the first echo command in Step 1. The notebook will ask for a token. You can copy the token from the output from Step 2.

    4. Press Ctrl+C twice to kill your Jupyter Notebook server

    Once you finish and no longer need the Jupyter Notebook server, you can kill the server by pressing Ctrl+C twice in your interactive session.

    ssh <username>@ssh.ccv.brown.edu
    The authenticity of host 'ssh.ccv.brown.edu (138.16.172.8)' can't be established.
    RSA key fingerprint is SHA256:Nt***************vL3cH7A.
    Are you sure you want to continue connecting (yes/no)? 
    [mhamilton@login004 ~]$ 
    $ checkquota
    Name       Path                 Used(G)    (%) Used   SLIMIT(G)  H-LIMIT(G) Used_Inodes     SLIMIT     HLIMIT     Usage_State  Grace_Period  
    ccvdemo1   /oscar/home          3.72       2          100        140        63539           2000000    3000000    OK           None          
    ccvdemo1   /oscar/scratch       0.00       0          512        10240      1               4000000    16000000   OK           None          
    Now fetching Data directory quotas...
    Name        Used(T)   (%) Used   SLIMIT(T)   HLIMIT(T)   Used_Inodes   SLIMIT    HLIMIT    Usage_State   Grace_Period  
    data+nopi   0.0       0          0.88        0.98        466           4194304   6291456   OK            None 
    [mhamilton@login001 ~]$ module avail workshop
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ name: workshop*/* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    workshop/1.0  workshop/2.0  
    [mhamilton@login001 ~]$ module load workshop/2.0
    module: loading 'workshop/2.0'
    [mhamilton@login001 ~]$ 
    scp /path/to/source/file <username>@ssh.ccv.brown.edu:/path/to/destination/file
    scp <username>@ssh.ccv.brown.edu:/path/to/source/file /path/to/destination/file
    rsync -azvp --progress path/to/source/directory <username>@ssh.ccv.brown.edu:/path/to/destination/directory
    BrownU_CCV_Oscar
    module load lftp  # To load the LFTP module from Oscar
    lftp -u login,passwd MyAwesomeUrl  # To connect to your (S)FTP server
    ls   # To list files on the (S)FTP server
    !ls  # To list files in your directory on Oscar
    get MyAwesomeFile  # To download a single file
    mirror # To download everything as is from the server
    mirror --directory=/name_of_directory/ # To download a specific directory
    C:\Users\<uname>\.ssh\config
    # Jump box with public IP address
    Host jump-box
        HostName poodcit4.services.brown.edu
        User <username>
    
    # Target machine with private IP address
    Host ccv-vscode-node
        HostName vscode1
        User <username>
        ProxyCommand ssh -q -W %h:%p jump-box
    Ctrl (cmd on Mac) + Shift + P > Remote-SSH: Settings
    Disable the Remote.SSH: Use Exec Server option
    cp -v /gpfs/runtime/opt/vscode-server/ccv-vscode-config/settings.json /users/$USER/.vscode-server/data/Machine/settings.json
    {
        "files.watcherExclude": {
            "**/.git/objects/**": true,
            "**/.git/subtree-cache/**": true,
            "**/node_modules/**": true,
            "/usr/local/**": true,
            "/gpfs/home/**": true,
            "/gpfs/data/**": true,
            "/gpfs/scratch/**": true
        },
        "search.followSymlinks": false,
        "search.exclude": {
            "**/.git/objects/**": true,
            "**/.git/subtree-cache/**": true,
            "**/node_modules/**": true,
            "/usr/local/**": true,
            "/gpfs/home/**": true,
            "/gpfs/data/**": true,
            "/gpfs/scratch/**": true
        }
    }
    Copy/paste this URL into your browser when you connect for the first time,

    to login with a token:

    http://172.20.207.61:8855/?token=c58d7877cfcf1547dd8e6153123568f58dc6d5ce3f4c9d98

    #include<stdio.h>
    #include<unistd.h>
    
    int main(int argc, char* argv[])
    {
        int count = 1;
        while (1)
        {
            printf(" %2d\n",count++);
            fflush(stdout);
            sleep(2);
        }
        return 0;
    }
    gcc dmtcp_serial.c -o dmtcp_serial
    $ port=$(shuf -i 40000-60000 -n 1)
    $ dmtcp_launch -p $port -i 8 ./dmtcp_serial
      1
      2
      3
      4
      5
      6
      7
      8
      9
     10
    ^C
    [yliu385@node1317 interact]$ ll
    total 2761
    -rw------- 1 yliu385 ccvstaff 2786466 May 18 11:18 ckpt_dmtcp_serial_24f183c2194a7dc4-40000-42af86bb59385.dmtcp
    lrwxrwxrwx 1 yliu385 ccvstaff      60 May 18 11:18 dmtcp_restart_script.sh -> dmtcp_restart_script_24f183c2194a7dc4-40000-42af82ef922a7.sh
    -rwxr--r-- 1 yliu385 ccvstaff   12533 May 18 11:18 dmtcp_restart_script_24f183c2194a7dc4-40000-42af82ef922a7.sh
    -rwxr-xr-x 1 yliu385 ccvstaff    8512 May 18 08:36 dmtcp_serial
    $ port=$(shuf -i 40000-60000 -n 1)
    $ dmtcp_restart -p $port -i 12 ckpt_dmtcp_serial_24f183c2194a7dc4-40000-42af86bb59385.dmtcp
      9
     10
     11
     12
     13
     14
     15
    ^C
    [yliu385@node1317 interact]$ dmtcp_restart -p $port -i 12 ckpt_dmtcp_serial_24f183c2194a7dc4-40000-42af86bb59385.dmtcp 
     15
     16
     17
    ^C
    
     #!/bin/bash
    
    #SBATCH -n 1
    #SBATCH -t 5:00
    #SBATCH -J dmtcp_serial
    
    module load dmtcp/3.0.0
    
    checkpoint_file=`ls ckpt_*.dmtcp -t|head -n 1`
    checkpoint_interval=8
    port=$(shuf -i 40000-60000 -n 1)
    
    if [ -z $checkpoint_file ]; then
        dmtcp_launch -p $port -i $checkpoint_interval ./dmtcp_serial
    else
        dmtcp_restart -p $port -i $checkpoint_interval $checkpoint_file
    fi
    
    $ head  slurm-5157871.out -n 15
    ## SLURM PROLOG ###############################################################
    ##    Job ID : 5157871
    ##  Job Name : dmtcp_serial
    ##  Nodelist : node1139
    ##      CPUs : 1
    ##   Mem/CPU : 2800 MB
    ##  Mem/Node : 65536 MB
    ## Directory : /gpfs/data/ccvstaff/yliu385/Test/dmtcp/serial/batch_job
    ##   Job Started : Wed May 18 09:38:39 EDT 2022
    ###############################################################################
    ls: cannot access ckpt_*.dmtcp: No such file or directory
      1
      2
      3
      4
    $ tail slurm-5157871.out
     147
     148
     149
     150
     151
     152
     153
     154
     155
    slurmstepd: error: *** JOB 5157871 ON node1139 CANCELLED AT 2022-05-18T09:43:58 DUE TO TIME LIMIT ***
    
    $ head  slurm-5158218.out -n 15
    ## SLURM PROLOG ###############################################################
    ##    Job ID : 5158218
    ##  Job Name : dmtcp_serial
    ##  Nodelist : node1327
    ##      CPUs : 1
    ##   Mem/CPU : 2800 MB
    ##  Mem/Node : 65536 MB
    ## Directory : /gpfs/data/ccvstaff/yliu385/Test/dmtcp/serial/batch_job
    ##   Job Started : Wed May 18 09:50:39 EDT 2022
    ###############################################################################
     153
     154
     155
     156
     157
    
    #!/bin/bash
    
    #SBATCH -n 1
    #SBATCH --array=1-4
    #SBATCH -t 5:00
    #SBATCH -J dmtcp_job_array
    
    module load dmtcp/3.0.0
    
    checkpoint_interval=8
    port=$((SLURM_JOB_ID %20000 + 40000))
    task_dir=jobtask_$SLURM_ARRAY_TASK_ID
    
    if [ ! -d $task_dir ]; then
        mkdir $task_dir
        cd $task_dir
        dmtcp_launch -p $port -i $checkpoint_interval ../dmtcp_serial
    else
        cd $task_dir
        checkpoint_file=`ls ckpt_*.dmtcp -t|head -n 1`
        if [ -z "$checkpoint_file" ]; then
            dmtcp_launch -p $port -i $checkpoint_interval ../dmtcp_serial
        else
            dmtcp_restart -p $port -i $checkpoint_interval $checkpoint_file
        fi
    fi
    
    jupyter-notebook
    #!/bin/bash
    #SBATCH --nodes 1
    #SBATCH -c 6
    #SBATCH --time 04:00:00
    #SBATCH --mem-per-cpu 3G
    #SBATCH --job-name tunnel
    #SBATCH --output jupyter-log-%J.txt
    ## get tunneling info
    XDG_RUNTIME_DIR=""
    ipnport=$(shuf -i8000-9999 -n1)
    ipnip=$(hostname -i)
    ## print tunneling instructions to jupyter-log-{jobid}.txt
    echo -e "
        Copy/Paste this in your local terminal to ssh tunnel with remote
        -----------------------------------------------------------------
        ssh -N -L $ipnport:$ipnip:$ipnport <username>@ssh.ccv.brown.edu
        -----------------------------------------------------------------
        Then open a browser on your local machine to the following address
        ------------------------------------------------------------------
        localhost:$ipnport  (prefix w/ https:// if using password)
        ------------------------------------------------------------------
        "
    ## start an ipcluster instance and launch jupyter server
    module load anaconda/2023.09-0-7nso27y
    jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
    module purge
    module load miniconda3/23.11.0s-odstpk5 
    source /oscar/runtime/software/external/miniconda3/23.11.0/etc/profile.d/conda.sh
    jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
    Jobs for user mhamilton
    
    Running:
    ID       NAME    PART.  QOS          CPU  WALLTIME  REMAIN   NODES
    7239096  tunnel  batch  pri-mhamilt  6    4:00:00   3:57:33  node1036
    
    Pending:
    (none)
     ssh -N -L $ipnport:$ipnip:$ipnport <username>@ssh.ccv.brown.edu
    localhost:9349  (prefix w/ https:// if using password)
    unset XDG_RUNTIME_DIR
    module load anaconda/3-5.2.0
    ipnport=$(shuf -i8000-9999 -n1)
    echo $ipnport
    ipnip=$(hostname -i)
    echo $ipnip
    jupyter-notebook --no-browser --port=$ipnport --ip=$ipnip
     ssh -N -L $ipnport:$ipnip:$ipnport <username>@ssh.ccv.brown.edu
    localhost:$ipnport  (prefix w/ https:// if using password)

    FAQ

    General

    How do I request help?

    Most inquiries can be directed to CCV's support address, support@ccv.brown.edu, which will create a support ticket with one of our staff.

    What are the fees for CCV services?

    All CCV services are billed quarterly, and rates can be found here (requires Brown authentication to view). Questions about rates should be directed to support@ccv.brown.edu.

    How do I acknowledge CCV in a research publication?

    We greatly appreciate acknowledgements in research publications that benefited from the use of CCV services or resources.

    Oscar

    What is Oscar?

    Oscar is our primary research computing cluster with several hundred multi-core nodes sharing a high-performance interconnect and file system. Applications can be run interactively or scheduled as batch jobs.

    How do I request an account on Oscar?

    To request an account, please fill out a New User Account Form. All accounts are subject to our General Terms and Conditions.

    How do I run a job on Oscar?

    Sample batch scripts are available in your home directory at ~/batch_scripts and can be run with the sbatch <jobscript> command. For more information, visit our manual page on Batch Jobs.
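
    For reference, here is a minimal sketch of a batch script; the job name, resource requests, module, and program below are placeholders to adapt to your own work:

    #!/bin/bash

    # Placeholders: adjust the job name, task count, time, and memory to your needs.
    #SBATCH -J my_job
    #SBATCH -n 1
    #SBATCH -t 1:00:00
    #SBATCH --mem=4G

    module load gcc/10.2      # load whatever modules your program needs
    ./my_program              # replace with your own executable

    Saved as, say, my_job.sh (a hypothetical filename), it is submitted with sbatch my_job.sh.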

    Can I use Oscar for teaching?

    See our page on Academic Classes.

    How do I find out when the system is down?

    We post updates to our CCV user mailing list, which you are automatically subscribed to when setting up an account with CCV. If you need to be added to the mailing list, please submit a support ticket to support@ccv.brown.edu. We also have an announcement mailing list, ccv-announce@listserve.brown.edu, for office hours, workshops, and other events relevant to CCV users.

    How do I run a job array on Oscar?

    A job array is a special type of job submission that allows you to submit many related batch jobs with a single command. This makes it easy to do parameter sweeps or other schemes where the submitted jobs are all the same except for a single parameter such as a filename or input variable. Job arrays require special syntax in your job script. Sample batch scripts for job arrays are available in your home directory at ~/batch_scripts and can be run with the sbatch <jobscript> command. For more information, visit our manual page on Running Jobs.
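
    As an illustration only (the array range, program, and input file names are hypothetical), a job array script adds an --array directive and uses the SLURM_ARRAY_TASK_ID variable to vary the input per task:

    #!/bin/bash

    #SBATCH -J array_example
    #SBATCH -n 1
    #SBATCH -t 1:00:00
    #SBATCH --array=1-10

    # Each task sees its own index (1 through 10 here) in SLURM_ARRAY_TASK_ID,
    # which can select a different input file or parameter per task.
    ./my_program input_${SLURM_ARRAY_TASK_ID}.txt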

    How do I run a MPI job on Oscar?

    MPI is a type of programming interface. Programs written with MPI can run on and communicate across multiple nodes. You can run MPI-capable programs by calling srun --mpi=pmix <program> in your batch script. For more detailed info, visit our manual page on MPI programs.
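
    As a minimal sketch (the task count, walltime, and the executable name my_mpi_app are placeholders), an MPI batch script loads an MPI module and launches the program with srun:

    #!/bin/bash

    #SBATCH -J mpi_example
    #SBATCH -n 8                    # total number of MPI tasks
    #SBATCH -t 1:00:00

    module load mpi                 # load an MPI module (see 'module avail mpi')
    srun --mpi=pmix ./my_mpi_app    # srun launches one MPI rank per allocated task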

    I have some MPI-enabled source code. How can I compile it on Oscar?

    Load an MPI module with module load mpi. For a list of available MPI modules, run module avail mpi.
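
    For example, assuming a C source file named hello_mpi.c (a hypothetical name), the compiler wrapper provided by the MPI module can be used directly:

    module load mpi                   # or a specific module from 'module avail mpi'
    mpicc -o hello_mpi hello_mpi.c    # mpicc wraps the underlying C compiler with the MPI flags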

    What applications are available on Oscar?

    Many scientific and HPC software packages are already installed on Oscar, including Python, Perl, R, MATLAB, Mathematica, and Maple. Use the module avail command on Oscar to view the whole list or search for packages. See our manual page on Software to understand how software modules work. Additional packages can be requested by submitting a support ticket to support@ccv.brown.edu.

    What compilers are available on Oscar?

    By default, the gcc compiler is available when you login to Oscar, providing the GNU compiler suite of gcc (C), g++ (C++), and gfortran. We also provide compilers from Intel (intel module) and the Portland Group (pgi module). For more information, visit our manual page on Software.

    How do I get information about finished jobs?

    The sacct command will list all of your completed jobs since midnight of the previous day (as well as running and queued jobs). You can pick an earlier start date with the -S option, e.g. sacct -S 2012-01-01.
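
    For example (the start date and field list are just an illustration), -S can be combined with --format to control which columns are shown:

    sacct -S 2024-01-01 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS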

    How much storage am I using?

    The checkquota command on Oscar will print a summary of the usage of your directories. For more information, see our manual page on File Systems.

    My job keeps terminating unexpectedly with a "Killed" message, or without any errors. What happened?

    These are symptoms of not requesting enough memory for your job. The default memory allocation is about 3 GB. If your job is resource-intensive, you may need to specifically allocate more. See the user manual for instructions on requesting memory and other resources.

    How do I request a certain amount of memory per CPU?

    Specify the SLURM option --mem-per-cpu= in your script.
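
    For example, to request 4 CPUs with 8 GB of memory per CPU (the values are illustrative):

    #SBATCH -c 4
    #SBATCH --mem-per-cpu=8G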

    How do I link against a BLAS and LAPACK library?

    We recommend linking against the Intel Math Kernels Library (MKL) which provides both BLAS and LAPACK. The easiest way to do this on Oscar is to include the special environment variable $MKL at the end of your link line, e.g. gcc -o blas-app blas-app.c $MKL. For more complicated build systems, you may want to consult the MKL Link Line Advisor.

    I am getting a "WARNING: Remote HOST IDENTIFICATION HAS CHANGED?

    We have recently updated the login and VSCode node hardware to improve performance, security, and reliability. As a result of this migration, the SSH host keys for our servers have been updated. To fix this:

    • On MacOS:

    sed -i '' -e '/^oscar/d' -e '/^vscode/d' ~/.ssh/known_hosts

    • On Linux:

    sed -i -e '/^oscar/d' -e '/^vscode/d' ~/.ssh/known_hosts

    • On Windows: from VSCode's internal terminal window, open the file with vi ~/.ssh/known_hosts and delete the lines starting with oscar and vscode. Hopefully, this will make things easier.

    • OpenOnDemand (OOD) Shell Access: either get a Desktop session or login via a regular terminal into ssh.ccv.brown.edu and run

    sed -i -e '/^oscar/d' -e '/^vscode/d' ~/.ssh/known_hosts

    Then login again via OOD > Clusters.

    RUNNING JOBS

    How is a job identified?

    By a unique JobID, e.g. 1318013

    Which of my jobs are running/pending?

    Use the command myq

    How do I check the progress of my running job?

    You can look at the output file. The default output file is slurm-%j.out, where %j is the JobID. If you specified an output file using #SBATCH -o output_filename and/or an error file using #SBATCH -e error_filename, you can check these files for any output from your job. You can view the contents of a text file using the program less, e.g.

    less output_filename

    Use the spacebar to move down the file, b to move back up the file, and q to quit.
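
    You can also follow the output file as new lines are written, for example (the JobID in the filename is illustrative):

    tail -f slurm-1318013.out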

    My job is not running how I intended it to. How do I cancel the job?

    scancel <JobID> where <JobID> is the job allocation number, e.g. 13180139

    How do I save a copy of an interactive session?

    You can use interact -o outfile to save a copy of the session's output to "outfile"

    I've submitted a bunch of jobs. How do I tell which one is which?

    myq will list the running and pending jobs with their JobID and the name of the job. The name of the job is set in the batch script with #SBATCH -J jobname. For jobs that are in the queue (running or pending) you can use the command scontrol show job <JobID>, where <JobID> is the job allocation number, e.g. 13180139, to get more detail about what was submitted.

    How do I ask for a haswell node?

    Use the --constraint (or -C) option:

    #SBATCH --constraint=haswell

    You can use the --constraint option to restrict your allocation according to other features too. The nodes command provides a list of "features" for each type of node.

    Why won't my job start?

    When your job is pending (PD) in the queue, SLURM will display a reason why your job is pending. The table below shows some common reasons for which jobs are kept pending.

    Reason
    Meaning

    (None)

    You may see this for a short time when you first submit a job

    (QOSGrpCpuLimit)

    All your condo cores are currently in use

    (QOSGrpMemLimit)

    The total memory of your running jobs and this pending job is more than the limit for your account.

    (Priority)

    Jobs with higher priority are using the resources

    (Resources)

    There are not enough free resources to fulfill your request

    (JobHeldUser)

    You have put a hold on the job. The job will not run until you lift the hold.

    (ReqNodeNotAvail)

    The resources you have requested are not available. Note this normally means you have requested something impossible, e.g. 100 cores on 1 node, or a 24 core sandy bridge node. Double check your batch script for any errors. Your job will never run if you are requesting something that does not exist on Oscar.

    (PartitionNodeLimit)

    You have asked for more nodes than exist in the partition, for example if you make a typo and specify -N (nodes) when you meant -n (tasks) and ask for more than 64 nodes. Your job will never run. Double check your batch script.

    Why is my job taking so long to start? It is just waiting in (Priority) or (Resources).

    1. Overall system busy: when tens of thousands of jobs are submitted in total by all users, the time it takes SLURM to process them may increase from almost instant to half an hour or more.

    2. Specific resource busy: if you request very specific resources (e.g., a specific processor) you then have to wait for that specific resource to become available while other similar resources may be going unused.

    3. Specified resource not available: if you request something that is not currently, or may never be, available, your job will simply wait in the queue. For example, requesting 64 GB of RAM on a 64 GB node will never run, because the system needs at least 1 GB for itself; reduce your request to less than 64 GB.

    TRANSFERRING FILES

    How do I transfer big files to/from Oscar?

    Please use the server transfer.ccv.brown.edu

    1. Transfer local file to Oscar:

    sftp <username>@transfer.ccv.brown.edu
    put /path/local_file

    2. Transfer remote file on Oscar to the local system:

    sftp <username>@transfer.ccv.brown.edu
    get -r filename.txt

    Alternatively, Oscar has an endpoint for Globus Online (https://www.globus.org) that you can use to transfer files more efficiently. See our manual page on how to use Globus Online to transfer files.

    Cloud HPC Options

    The use of cloud resources for HPC varies according to your demands and circumstances. Cloud options are changing rapidly, both in service providers and in the services being offered. For those who have short-term needs that don't demand the highest computational performance, a cloud option might be appropriate. For others, a local option customized to individual needs may be better. The cost of cloud services also varies quite a bit and includes not only compute time but also data transfer charges. Other issues involve licensing, file synchronization, etc.

    We are actively investigating a number of options to connect Brown users seamlessly to suitable cloud options. We are collecting this information for publication on the CIS website as part of the research services available. At this point, the best course of action is to request an individual consultation to help address your specific needs. Please send email to support@ccv.brown.edu.


    System Hardware

    Oscar Specifications

    Compute Nodes      | 388
    Total CPU Cores    | 20176
    GPU Nodes          | 82
    Total GPUs         | 667
    Large Memory Nodes | 6

    Compute Nodes

    Oscar has compute nodes in the partitions listed below.

    • batch - The batch partition is for programs/jobs which need neither GPUs nor large memory.

    • bigmem - The bigmem partition is for programs/jobs which require large memory.

    • debug - The debug partition is for users to debug programs/jobs.

    • gpu - The gpu partition is for programs/jobs which require GPUs.

    • gpu-debug - The gpu-debug partition is for users to debug GPU programs/jobs.

    • gpu-he - The gpu-he partition is for programs/jobs which need to access high-end GPUs.

    • vnc - The vnc partition is for users to run programs/jobs in a graphical desktop environment.

    Below are node details, including cores and memory, for all partitions.

    Partition | Total Nodes | Total Cores | Cores Per Node | Total GPUs | Memory Per Node (GB)
    batch     | 288         | 12800       | 24-192         | n/a        | 190-1540
    bigmem    | 6           | 512         | 32-192         | n/a        | 770-1540
    debug     | 2           | 96          | 48             | n/a        | 382
    gpu       | 64          | 5000        | 24-128         | 519        | 190-1028
    gpu-debug | 1           | 48          | 48             | 8          | 1028
    gpu-he    | 12          | 552         | 24-64          | 84         | 190-1028
    vnc       | 303         | 13696       | 24-192         | 40         | 102-1540
    viz       | 1           | 48          | 48             | 8          | 1028

    Hardware details

    Hardware details for all partitions. The Features column shows the features available for the --constraint option for SLURM. This includes the available CPU types as well as GPUs.

    Partition | Nodes | CPUs/Node | Total CPUs | GPUs/Node | Total GPUs | Memory (GB) | Features
    batch     | 122   | 48        | 5856       | n/a       | n/a        | 382         | 48core, intel, cascade, edr
    batch     | 100   | 32        | 3200       | n/a       | n/a        | 190         | 32core, intel, scalable, cascade, edr
    batch     | 40    | 32        | 1280       | n/a       | n/a        | 382         | 32core, intel, scalable, cascade, edr, cifs
    batch     | 10    | 192       | 1920       | n/a       | n/a        | 1540        | 192core, amd, genoa, edr
    batch     | 4     | 64        | 256        | n/a       | n/a        | 514         | 64core, intel, icelake, edr
    batch     | 2     | 24        | 48         | n/a       | n/a        | 770         | 24core, intel, e5-2670, e5-2600, scalable, skylake, fdr
    batch     | 10    | 24        | 240        | n/a       | n/a        | 382         | 24core, intel, e5-2670, e5-2600, scalable, skylake, fdr
    bigmem    | 4     | 32        | 128        | n/a       | n/a        | 770         | 32core, intel, scalable, cascade, edr
    bigmem    | 2     | 192       | 384        | n/a       | n/a        | 1540        | 192core, amd, genoa, edr
    gpu       | 2     | 32        | 64         | 5         | 10         | 382         | intel, gpu, titanrtx, turing, skylake, 6142
    gpu       | 1     | 24        | 24         | 5         | 5          | 190         | intel, gpu, titanrtx, turing, skylake, 6142
    gpu       | 1     | 48        | 48         | 10        | 10         | 382         | intel, gpu, quadrortx, turing, cascade
    gpu       | 10    | 32        | 320        | 10        | 100        | 382         | intel, gpu, quadrortx, turing, cascade
    gpu       | 13    | 64        | 832        | 8         | 104        | 1028        | amd, gpu, geforce3090, ampere
    gpu       | 4     | 48        | 192        | 8         | 32         | 1028        | amd, gpu, geforce3090, ampere
    gpu       | 7     | 128       | 896        | 8         | 56         | 1028        | amd, cifs, gpu, a5500, ampere
    gpu       | 10    | 64        | 640        | 8         | 80         | 1028        | amd, cifs, gpu, a5000, ampere
    gpu       | 10    | 128       | 1280       | 8         | 80         | 1028        | amd, cifs, gpu, a5000, ampere
    gpu       | 1     | 64        | 64         | 2         | 2          | 1028        | amd, gpu, a5000, ampere
    gpu       | 2     | 128       | 256        | 8         | 16         | 1028        | amd, gpu, a5500, cifs, ampere
    gpu       | 3     | 128       | 384        | 8         | 24         | 1028        | amd, gpu, cifs, a5000, ampere
    gpu-he    | 3     | 48        | 144        | 8         | 24         | 1028        | amd, gpu, a40, ampere
    gpu-he    | 3     | 24        | 72         | 4         | 12         | 190         | intel, gpu, 4gpu, v100, volta, skylake, 6126
    gpu-he    | 4     | 64        | 256        | 8         | 32         | 1028        | amd, gpu, a6000, ampere
    gpu-he    | 2     | 40        | 80         | 8         | 16         | 512         | intel, cifs, gpu, v100, volta, haswell
    debug     | 2     | 48        | 96         | n/a       | n/a        | 382         | 48core, intel, cascade, edr
    gpu-debug | 1     | 48        | 48         | 8         | 8          | 1028        | amd, gpu, geforce3090, ampere
    vnc       | 100   | 32        | 3200       | n/a       | n/a        | 190         | 32core, intel, scalable, cascade, edr
    vnc       | 134   | 48        | 6432       | n/a       | n/a        | 382         | 48core, intel, cascade, edr
    vnc       | 1     | 64        | 64         | 8         | 8          | 1028        | amd, cifs, gpu, a5000, ampere
    vnc       | 2     | 128       | 256        | 16        | 32         | 102         | amd, gpu, a2, ampere
    vnc       | 40    | 32        | 1280       | n/a       | n/a        | 382         | 32core, intel, scalable, cascade, edr, cifs
    vnc       | 10    | 192       | 1920       | n/a       | n/a        | 1540        | 192core, amd, genoa, edr
    vnc       | 4     | 64        | 256        | n/a       | n/a        | 514         | 64core, intel, icelake, edr
    vnc       | 2     | 24        | 48         | n/a       | n/a        | 770         | 24core, intel, e5-2670, e5-2600, scalable, skylake, fdr
    vnc       | 10    | 24        | 240        | n/a       | n/a        | 382         | 24core, intel, e5-2670, e5-2600, scalable, skylake, fdr
    viz       | 1     | 48        | 48         | 8         | 8          | 1028        | amd, gpu, geforce3090, ampere

    GPU Features and GPU Memory

    GPU Feature  | GPU Memory
    a6000        | 48 GB
    a40          | 45 GB
    v100         | 32 GB
    titanrtx     | 24 GB
    geforce3090  | 24 GB
    a5000        | 24 GB
    quadrortx    | 24 GB
    p100         | 12 GB
    titanv       | 12 GB
    1000ti       | 11 GB

    Migration of MPI Apps to Slurm 22.05.7

    In January 2023, Oscar will be migrating to use Slurm version 22.05.7.

    Slurm version 22.05.7

    • improves security and speed,

    • supports both PMI2 and PMIx,

    • provides REST APIs, and

    • allows users to prioritize their jobs via scontrol top <job_id>

    While most applications will be unaffected by these changes, applications built to make use of MPI may need to be rebuilt to work properly. To help facilitate this, we are providing users who use MPI-based applications (either through Oscar's module system or built by users) with advanced access to a test cluster running the new version of Slurm. Instructions for accessing the test cluster, building MPI-based applications, and submitting MPI jobs using the new Slurm, are provided below.

    Please note - some existing modules of MPI-based applications will be deprecated and removed from the system as part of this upgrade. A list of modules that will no longer be available to users following the upgrade is given at the bottom of the page.

    Instructions for Testing Applications with Slurm 22.05.7

    1. Request access to the Slurm 22.05.7 test cluster (email support@ccv.brown.edu)

    2. Connect to Oscar via either SSH or Open OnDemand (instructions below)

    3. Build your application using the new MPI modules listed below

    4. Submit your job

    Users must contact support@ccv.brown.edu to obtain access to the test cluster in order to submit jobs using Slurm 22.05.7.

    Connecting via SSH

    1. Connect to Oscar using the ssh command in a terminal window

    2. From Oscar's command line, connect to the test cluster using the command ssh node1947

    3. From the node1947 command line, submit your jobs (either interactive or batch) as follows:

    • For CPU-only jobs: interact -q image-test

    • For GPU jobs: interact -q gpu

    Include one of the following lines within your batch script and then submit using the sbatch command, as usual:

    • For CPU-only jobs: #SBATCH -p image-test

    • For GPU jobs: #SBATCH -p gpu

    Connecting via Open OnDemand

    1. Open a web browser and connect to poodcit2.services.brown.edu

    2. Login with your Oscar username and password

    3. Start a session using the Advanced Desktop App

    4. Select the gpu partition and click the launch button.

    • Only the Advanced Desktop App will connect to the test cluster

    • The Advanced Desktop App must connect to the gpu partition

    MPI Applications

    Migrated or New Modules

    If the "Current Module Version" for an application is blank, a new version is built for the application.

    Application
    Current Module Version
    Migrated or New Module Version

    To build custom applications:

    We recommend using the following MPI modules to build your custom applications:

    MPI                 | Oscar Module
    GCC based OpenMPI   | mpi/openmpi_4.0.7_gcc_10.2_slurm22
    Intel based OpenMPI | mpi/openmpi_4.0.7_intel_2020.2_slurm22
    MVAPICH             | mpi/mvapich2-2.3.5_gcc_10.2_slurm22
    Mellanox HPC-X      | mpi/hpcx_2.7.0_gcc_10.2_slurm22

    For a configure-based build:

    module load mpi/openmpi_4.0.7_gcc_10.2_slurm22
    module load gcc/10.2 cuda/11.7.1
    CC=mpicc CXX=mpicxx ./configure --prefix=/path/to/install/dir

    For a CMake-based build:

    module load mpi/openmpi_4.0.7_gcc_10.2_slurm22
    module load gcc/10.2 cuda/11.7.1
    cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx ..

    Deprecated Modules

    A new module might be available for a deprecated application module. Please search the table above to check if a new module is available for an application.

    Application
    Deprecated Module
    • CharMM/c47b1_slurm20

    • CharMM/c47b1

    cp2k

    • 2022.2

    dedalus

    • 2.1905

    • 2.1905_openmpi_4.05_gcc_10.2_slurm20

    • 2.1905_openmpi_4.0.7_gcc_10.2_slurm22

    esmf

    • 8.4.0b12

    • 8.4.0_openmpi_4.0.7_gcc_10.2_slurm22

    fftw

    • 3.3.6

    • 3.3.8

    • 3.3.6_openmpi_4.0.7_gcc_10.2_slurm22

    • 3.3.10_slurm22

    global_arrays

    • 5.8_openmpi_4.0.5_gcc_10.2_slurm20

    • 5.8_openmpi_4.0.7_gcc_10.2_slurm22

    gpaw

    • 21.1.0_hpcx_2.7.0_gcc_10.2_slurm20

    • 21.1.0_openmpi_4.0.5_gcc_10.2_slurm20

    • 21.1.0a_openmpi_4.0.5_gcc_10.2_slurm20

    • 21.1.0_openmpi_4.0.7_gcc_10.2_slurm22

    • 21.1.0_openmpi_4.0.7_gcc_10.2_slurm22

    • 21.1.0_openmpi_4.0.7_gcc_10.2_slurm22

    gromacs

    • 2018.2

    • gromacs/2018.2_mvapich2-2.3.5_gcc_10.2_slurm22

    hdf5

    • 1.10.8_mvapich2_2.3.5_gcc_10.2_slurm22

    • 1.10.8_openmpi_4.0.7_gcc_10.2_slurm22

    • 1.10.8_openmpi_4.0.7_intel_2020.2_slurm22

    • 1.12.2_openmpi_4.0.7_intel_2020.2_slurm22

    ior

    • 3.3.0

    lammps

    • 29Sep21_openmpi_4.0.5_gcc_10.2_slurm20

    • 29Sep21_openmpi_4.0.7_gcc_10.2_slurm22

    meme

    • 5.3.0

    • 5.3.0_slurm22

    Molpro

    • 2021.3.1

    • 2021.3.1_openmpi_4.0.7_gcc_10.2_slurm22

    mpi

    • hpcx_2.7.0_gcc_10.2_slurm20

    • mvapich2-2.3.5_gcc_10.2_slurm20

    • hpcx_2.7.0_gcc_10.2_slurm22

    • mvapich2-2.3.5_gcc_10.2_slurm22

    • openmpi_4.0.7_gcc_10.2_slurm22

    • openmpi_4.0.7_intel_2020.2_slurm22

    mpi4py

    • 3.1.4_py3.9.0_slurm22

    netcdf

    • 4.7.4_gcc_10.2_hdf5_1.10.5

    • 4.7.4_intel_2020.2_hdf5_1.12.0

    • 4.7.4_gcc_10.2_hdf5_1.10.8_slurm22

    • 4.7.4_gcc_10.2_hdf5_1.12.2_slurm22

    netcdf4-python

    • 1.6.2

    osu-mpi

    • 5.6.3_openmpi_4.0.7_gcc_10.2

    petsc

    • petsc/3.18.2_openmpi_4.0.7_gcc_10.2_slurm22

    pnetcdf

    • 1.12.3

    • 1.12.3_openmpi_4.0.7_gcc_10.2_slurm22

    qmcpack

    • 3.9.2_hpcx_2.7.0_gcc_10.2_slurm20

    • 3.9.2_openmpi_4.0.0_gcc_8.3_slurm20

    • 3.9.2_openmpi_4.0.0_gcc_8.3_slurm20_complex

    • 3.9.2_openmpi_4.0.1_gcc

    • 3.9.2_openmpi_4.0.7_gcc_10.2_slurm22

    quantumespresso

    • 6.4_openmpi_4.0.0_gcc_8.3_slurm20

    • 6.4_openmpi_4.0.5_intel_2020.2_slurm20

    • 7.0_openmpi_4.0.5_intel_2020.2_slurm20

    • 6.4_openmpi_4.0.7_gcc_10.2_slurm22

    • 6.4_openmpi_4.0.7_intel_2020.2_slurm22

    • 7.0_openmpi_4.0.7_gcc_10.2_slurm22

    vasp

    • 5.4.1

    • 5.4.1_mvapich2-2.3.5_intel_2020.2_slurm20

    • 5.4.4

    • 5.4.4_intel

    • 5.4.1_slurm22

    • 5.4.4_slurm22

    • 5.4.4_openmpi_4.0.7_gcc_10.2_slurm22

    • 6.1.1_ompi407_yqi27_slurm22

    wrf

    • 4.2.1_hpcx_2.7.0_intel_2020.2_slurm20

    boost

    • 1.55

    • 1.57

    • 1.68

    • 1.44.0

    cabana

    • 1

    • 1.1

    • 1.1_hpcx_2.7.0_gcc_10.2_slurm20

    campari

    • 3.0

    cesm

    • 1.2.1

    • 1.2.2

    • 2.1.1

    cp2k

    • 7.1

    • 7.1_mpi

    • 8.1.0

    • 9.1.0

    dacapo

    • 2.7.16_mvapich2_intel

    dalton

    • 2018

    • 2018.0_mvapich2-2.3.5_intel_2020.2_slurm20

    dice

    • 1

    esmf

    • 7.1.0r

    • 8.0.0

    • 8.0.0b

    • 8.1.0b11

    fenics

    • 2017.1

    • 2018.1.0

    ffte

    • 6.0

    • 6.0/mpi

    fftw

    • 2.1.5

    • 2.1.5_slurm2020

    • 2.1.5-double

    • 3.3.8a

    gerris

    • 1

    global_arrays

    • 5.6.1

    • 5.6.1_i8

    • 5.6.1_openmpi_2.0.3

    gpaw

    • 1.2.0

    • 1.2.0_hpcx_2.7.0_gcc

    • 1.2.0_mvapich2-2.3a_gcc

    • 20.10_hpcx_2.7.0_intel_2020.2_slurm20

    gromacs

    • 2016.6

    • 2020.1

    • 2018.2_gpu

    • 2018.2_hpcx_2.7.0_gcc_10.2_slurm20

    hande

    • 1.1.1

    • 1.1.1_64

    • 1.1.1_debug

    hdf5

    • 1.10.0

    • 1.10.1_parallel

    • 1.10.5

    • 1.10.5_fortran

    hnn

    • 1.0

    hoomd

    • 2.9.0

    horovod

    • 0.19.5

    ior

    • 3.0.1

    • 3.3.0

    lammps

    • 17-Nov-16

    • 11-Aug-17

    • 16-Mar-18

    • 22-Aug-18

    medea

    • 3.2.3.0

    meme

    • 5.0.5

    meshlab

    • 20190129_qt59

    Molpro

    • 2019.2

    • 2020.1

    • 2012.1.15

    • 2015_gcc

    mpi4py

    • 3.0.1_py3.6.8

    multinest

    • 3.1

    n2p2

    • 1.0.0

    • 2.0.0

    • 2.0.0_hpcx

    namd

    • 2.11-multicore

    • 2.13b1-multicore

    netcdf

    • 3.6.3

    • 4.4.1.1_gcc

    • 4.4.1.1_intel

    • 4.7.0_intel2019.3

    nwchem

    • 7

    • 6.8-openmpi

    • 7.0.2_mvapich2-2.3.5_intel_2020.2_slurm20

    • 7.0.2_openmpi_4.0.5_intel_2020.2_slurm20

    openfoam

    • 4.1

    • 7

    • 4.1-openmpi_3.1.6_gcc_10.2_slurm20

    • 4.1a

    openmpi

    • openmpi_4.0.5_gcc_10.2_slurm20

    OpenMPI with Intel compilers

    • openmpi_4.0.5_intel_2020.2_slurm20

    orca

    • 4.0.1.2

    • 4.1.1

    • 4.2.1

    • 5.0.0

    osu-mpi

    • 5.3.2

    paraview

    • 5.1.0

    • 5.1.0_yurt

    • 5.4.1

    • 5.6.0_no_scalable

    paris

    • 1.1.3

    petsc

    • 3.14.2_hpcx_2.7.0_intel_2020.2_slurm20

    • 3.14.2_mpich3.3a3_intel_2020.2

    • 3.7.5

    • 3.7.7

    phyldog

    • 1.0

    plumed

    • 2.7.2

    • 2.7.5

    pmclib

    • 1.1

    polychord

    • 1

    • 2

    polyrate

    • 17C

    potfit

    • 20201014

    • 0.7.1

    prophet

    • augustegm_1.2

    pstokes

    • 1.0

    pymultinest

    • 2.9

    qchem

    • 5.0.2

    • 5.0.2-openmpi

    qmcpack

    • 3.10.0_hpcx_2.7.0_intel_2020.2_slurm20

    • 3.10.0_openmpi_4.0.5_intel_2020.2_slurm20

    • 3.7.0

    • 3.9.1

    quantumespresso

    • 6.1

    • 6.4

    • 6.5

    • 6.6

    relion

    • 3.1.3

    rotd

    • 2014-11-15_mvapich2

    scalasca

    • 2.3.1_intel

    scorep

    • 3.0_intel_mvapich2

    siesta

    • 3.2

    • 4.1

    sprng

    • 5

    su2

    • 7.0.2

    trilinos

    • 12.12.1

    vtk

    • 7.1.1

    • 8.1.0

    wrf

    • 3.6.1

    • 4.2.1_hpcx_2.7.0_intel_2020.2_slurm20

    abaqus

    • 2021.1_intel17

    • 2021_slurm22_a

    ambertools

    • amber22

    boost

    • 1.69

    • 1.69_openmpi_4.0.7_gcc_10.2_slurm22


    abaqus

    • 2017

    • 2021

    • 2021.1

    • 6.12sp2

    abinit

    • 9.6.2

    abyss

    • 2.1.1

    ambertools

    • amber16

    • amber16-gpu

    • amber17

    • amber17_lic

    • amber21

    bagel

    • 1.2.2

    CharMM

    3.9.2_openmpi_4.0.4_gcc

  • 3.9.2_openmpi_4.0.5_intel_2020.2_slurm20

  • 5.4.4_mvapich2-2.3.5_intel_2020.2_slurm20

  • 5.4.4_openmpi_4.0.5_gcc_10.2_slurm20

  • 5.4.4a

  • 6.1.1_ompi405_yqi27

  • 6.1.1_openmpi_4.0.5_intel_2020.2_yqi27_slurm20

  • 6.1.1_yqi27

  • 6.3.0_cfgoldsm

  • 6.3.2_avandewa

  • 6.3.0_cfgoldsm_slurm22

  • 6.3.2_avandewa_slurm22

  • 1.62.0-intel

  • 1.63.0

  • 1.75.0_openmpi_4.0.5_intel_2020.2_slurm20

  • 1.76.0_hpcx_2.7.0_gcc_10.2_slurm20

  • 1.76.0_hpcx_2.7.0_intel_2020.2_slurm20

  • 8.1.9b17

  • 8.3.0

  • 8.3.1b05

  • 20.10.0_hpcx_2.7.0_intel_2020.2_slurm20

    2020.1_hpcx_2.7.0_gcc_10.2_slurm20

  • 2020.4_gpu

  • 2020.4_gpu_hpcx_2.7.0_gcc_10.2_slurm20

  • 2020.4_hpcx_2.7.0_gcc_10.2_slurm20

  • 2020.6_plumed

  • 2021.5_plumed

  • 1.10.5_mvapich2-2.3.5_intel_2020.2_slurm20

  • 1.10.5_openmpi_3.1.3_gcc

  • 1.10.5_openmpi_3.1.6_gcc

  • 1.10.5_openmpi_4.0.0_gcc

  • 1.10.5_openmpi_4.0.5_gcc_10.2_slurm20

  • 1.10.5_parallel

  • 1.10.7_hpcx_2.7.0_intel_2020.2_slurm20

  • 1.10.7_openmpi_4.0.5_gcc_10.2_slurm20

  • 1.10.7_openmpi_4.0.5_intel_2020.2_slurm20

  • 1.12.0_hpcx_2.7.0_intel_2020.2

  • 1.12.0_hpcx_2.7.0_intel_2020.2_slurm20

  • 1.12.0_openmpi_4.0.5_intel_2020.2_slurm20

  • 7-Aug-19

  • 11Aug17_serial

  • 29Oct20_hpcx_2.7.0_intel_2020.2

  • 29Oct20_openmpi_4.0.5_gcc_10.2_slurm20

  • 2015_serial

  • 2018.2_ga

  • 2019.2_ga

  • 2020.1_ga

  • 2020.1_openmpi_4.0.5_gcc_10.2_slurm20

  • 2021.3.1_openmpi_4.0.5_gcc_10.2_slurm20

  • 4.7.4_gcc8.3

    7.0.2_openmpi_4.1.1_gcc_10.2_slurm20

    7.0_hpcx_2.7.0_gcc_10.2_slurm20

    5.0.1

    5.6.0_yurt

  • 5.8.0

  • 5.8.0_mesa

  • 5.8.0_release

  • 5.8.1_openmpi_4.0.5_intel_2020.2_slurm20

  • 5.9.0

  • 5.9.0_ui

  • 3.8.3

    3.9.1_openmpi_3.1.6

    6.4_hpcx_2.7.0_intel_2020.02_slurm20

  • 6.4_hpcx_2.7.0_intel_2020.2_slurm20

  • 6.4_openmpi_4.0.5_intel_slurm20

  • 6.4.1

  • 6.5_openmpi_4.0.5_intel_slurm20

  • 6.6_openmpi_4.0.5_intel_2020.2_slurm20

  • 6.7_openmpi_4.0.5_intel_2020.2_slurm20