Archive Data

The archive feature is useful for researchers who need to preserve data long term. Typical use cases are:

  • Cold data: Data that is not actively accessed but needs to be preserved (e.g., patent datasets). Archiving such data also reduces storage usage and helps with capacity planning.

  • Snapshot/secondary copy: A point-in-time copy of a research dataset that can be recovered if required (e.g., simulation datasets for publications).

Follow these steps to archive data using the Hibernate service:

Step 1 (Identify the data): Navigate to the directory you wish to archive.

Step 2a (Tag the data): Right-click the directory and select Tags > Action Tags > A Hibernate: Archive.

Before tagging the data, please check the Best Practices (Archive) section for tips that will help the archival process run smoothly.

Step 2b: Once selected, the A Hibernate: Archive tag should appear on the directory.

Step 2c: Repeat the procedure to tag all relevant folders and files to be archived.

Step 3 (Archive the data): Once tagging is complete, please complete the Hibernate Service Request form and select the appropriate action tag. This will automatically submit a ticket to the storage team to start the appropriate process.

Currently, we can archive 6-8 TB per day or about 1 million objects per day, whichever threshold is met first.
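The two daily limits above can be turned into a rough planning estimate. The following is a minimal sketch, not an official tool: the function name and constants are hypothetical, and it assumes the lower end of the 6-8 TB range to stay conservative. Because whichever limit is reached first caps a day's work, the estimate takes the larger of the two durations.

```python
# Assumed planning figures based on the stated throughput.
# 6.0 is the conservative end of the 6-8 TB/day range (an assumption).
TB_PER_DAY = 6.0
OBJECTS_PER_DAY = 1_000_000


def estimate_archive_days(size_tb: float, object_count: int) -> float:
    """Return a rough number of days needed to archive a dataset.

    The daily run is capped by whichever limit is hit first, so the
    slower of the two dimensions determines the total duration.
    """
    days_by_size = size_tb / TB_PER_DAY
    days_by_objects = object_count / OBJECTS_PER_DAY
    return max(days_by_size, days_by_objects)


if __name__ == "__main__":
    # A 10 GB (0.01 TB) folder with 500,000 files plus the folder itself.
    print(estimate_archive_days(0.01, 500_001))
    # The same folder compressed into a single object.
    print(estimate_archive_days(0.01, 1))
```

For small-file datasets the object count, not the size, tends to dominate the estimate, which is why the compression tips in the best practices section matter.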

Step 4 (Delete the data from source): Once the data is validated on the two Object Archive platforms, the original data on the respective source(s) will be deleted.

Best Practices (Archive)

Archiving data can be slow and resource-intensive. Factors such as directory and file sizes and the number of files in a single directory play a critical role in how efficiently data can be archived. Here are some best practices to facilitate an efficient archive process:

  • Tagging Directories: Always tag folders, not individual files (except compressed folders - see the compression tip below). If one or more individual files need to be archived, move them into a folder and tag that folder.

  • Delete Empty Directories: It's a best practice to delete empty directories before archiving.

  • Compressing Folders: Folders that are small in size but contain thousands of files should be compressed (zip, tar). For example, a 10 GB folder containing 500,000 files is treated as 500,001 objects (files plus the folder) to archive, and the same number to retrieve. However, if the same folder is compressed, it is treated as one (1) object to archive.

    • Examples:

      • Given the following path with the number of objects at each level, it's very important to determine where to compress folders:

        /root_dir (1,000,000 - 10 GB)/
          sub_dir1 (900,000 - 9 GB)/
            sub_dir2 (800,000 - 8 GB)/
              sub_dir3_1 (8,000 - 80 MB)/
              sub_dir3_2 (8,000 - 80 MB)/
              ...
              sub_dir3_100 (8,000 - 80 MB)/

      • Compressing at 'root_dir' level

        • Pros: Will consolidate 1 million objects into a single object.

        • Cons: You lose the flexibility to recover individual sub-folders, since the entire folder must be recovered. Compressing 1 million objects also requires more system resources and time, even though the objects are small.

      • Compressing at 'sub_dir3' level

        • Pros: Will consolidate 1 million objects into fewer than 200 objects while keeping recovery flexible. Compression at this level does not require many system resources.

        • Cons: Requires an admin's time to analyze the tree, identify the level at which to compress, and automate the compression of multiple sub-directories.
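The analysis and automation mentioned in the tips above can be scripted. The following is a minimal sketch using only the Python standard library. The function names and the `min_objects` threshold are hypothetical and not part of the Hibernate service. It counts objects the same way as the example above (files plus folders), lists empty directories, and creates one `.tar.gz` per qualifying sub-directory. It assumes the originals are left in place so the archives can be verified before anything is removed.

```python
import os
import tarfile
from pathlib import Path


def count_objects(directory: Path) -> int:
    """Count files and folders under `directory`, including the directory itself."""
    total = 1  # the directory itself counts as one object
    for _root, dirs, files in os.walk(directory):
        total += len(dirs) + len(files)
    return total


def find_empty_dirs(directory: Path) -> list[Path]:
    """Return directories under `directory` that contain no files or folders."""
    return sorted(
        Path(root)
        for root, dirs, files in os.walk(directory)
        if not dirs and not files
    )


def compress_subdirs(parent: Path, min_objects: int = 1000) -> list[Path]:
    """Create <name>.tar.gz beside each sub-directory of `parent` that holds
    at least `min_objects` objects. The original folders are not deleted."""
    archives = []
    for child in sorted(parent.iterdir()):
        if child.is_dir() and count_objects(child) >= min_objects:
            target = child.parent / (child.name + ".tar.gz")
            with tarfile.open(target, "w:gz") as tar:
                # arcname keeps paths inside the archive relative to the folder
                tar.add(child, arcname=child.name)
            archives.append(target)
    return archives
```

For the example tree above, calling `compress_subdirs` on the `sub_dir2` path would produce one archive per `sub_dir3_*` folder, so individual sub-folders can still be recovered separately.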
