Archive Data
The archive feature supports several common research data needs. Typical use cases are:
Cold data: Data that is not actively accessed but needs to be preserved (e.g., patent datasets). Archiving such data also reduces storage usage and helps with capacity planning.
Snapshot/secondary copy: A point-in-time copy of a research dataset that can be recovered if required (e.g., simulation datasets for publications).
Follow these steps to archive data using the Hibernate service:
Step 1 (Identify the data): Navigate to the directory you wish to archive.
Step 2a (Tag the data): Right-click the directory and select Tags > Action Tags > A Hibernate: Archive.
Before tagging the data, please check our best practices section for tips and hints that will help the archival process run smoothly.
Step 2b: Once selected, the A Hibernate: Archive tag should appear on the directory.
Step 2c: Repeat the procedure to tag all relevant folders and files to be archived.
Step 3 (Archive the data): Once tagging is complete, please complete the Hibernate Service Request form and select the appropriate action tag. This will automatically submit a ticket to the storage team to start the appropriate process.
Currently, we can archive 6-8 TB per day or about 1 million objects per day, whichever threshold is met first.
Step 4 (Delete the data from source): Once the data is validated on the two Object Archive platforms, the original data on the respective source(s) will be deleted.
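The throughput figures above can be turned into a rough planning estimate. The following is a minimal sketch, not part of the Hibernate service: the function name `estimate_archive_days` and the choice of the lower bound (6 TB/day) are assumptions made for illustration, and actual durations will depend on queueing and validation time.

```python
import math

# Planning figures taken from the text above. The lower end of the
# 6-8 TB/day range is assumed here to keep the estimate conservative.
TB_PER_DAY = 6.0
OBJECTS_PER_DAY = 1_000_000


def estimate_archive_days(total_tb: float, total_objects: int) -> int:
    """Return a rough number of days needed to archive a dataset.

    A day's quota is used up when either the size or the object
    threshold is reached, so the slower of the two limits sets the
    overall duration.
    """
    days_by_size = math.ceil(total_tb / TB_PER_DAY)
    days_by_objects = math.ceil(total_objects / OBJECTS_PER_DAY)
    return max(days_by_size, days_by_objects, 1)


if __name__ == "__main__":
    # 2 TB spread over 5 million small files: limited by object count.
    print(estimate_archive_days(2.0, 5_000_000))
    # 30 TB in a few thousand large files: limited by size.
    print(estimate_archive_days(30.0, 4_000))
```

A dataset made of many small files is typically limited by the object count rather than by its size, which is why compressing such folders before tagging shortens the archive window.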
Archiving data can be a tedious process. Factors such as directory and file sizes and the number of files in a single directory play a critical role in determining how efficiently data can be archived. Here are some best practices to facilitate an efficient archive process:
Tagging Directories: Always tag folders, not individual files (except compressed folders; see the Compressing Folders tip). If individual files need to be archived, move them into a folder first.
Delete Empty Directories: Delete empty directories before archiving.
Organizing Data: Arrange data logically so that it can be recovered efficiently.
Compressing Folders: Folders that are small in size but contain thousands of files should be compressed (zip, tar). For example, a 10 GB folder containing 500,000 files is treated as 500,001 objects (files + folder) to archive, and the same number to retrieve. However, if the same folder is compressed, it is treated as one (1) object to archive.
Example:
Given the folder path as shown above with the number of objects and size at each folder level, it's essential to determine where to compress folders.
Compressing at 'root_dir' level
Pros: Will consolidate 1 million objects into a single object.
Cons: Loses the flexibility to recover individual sub-folders, since the entire folder must be recovered. Also requires more system resources and time to compress 1 million objects (even though they are small in size).
Compressing at 'sub_dir3' level
Pros: Will consolidate 1 million objects into fewer than 200 objects while keeping recovery flexible. It does not require too many system resources while compressing.
Cons: Requires admin's time to analyze and identify the path where compression needs to happen, and automate compressing multiple sub-directories.
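The analysis and automation mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only, not a tool provided by the storage team: the helper names (`count_objects`, `find_empty_dirs`, `compress_children`), the `min_objects` threshold, and the choice of gzip-compressed tar are all assumptions. It also assumes that sub-folders count as objects in the same way the top-level folder does.

```python
import os
import tarfile
from pathlib import Path


def count_objects(folder: Path) -> int:
    """Count objects under folder: all files and sub-folders, plus the folder itself."""
    total = 1  # the folder itself
    for _root, dirs, files in os.walk(folder):
        total += len(dirs) + len(files)
    return total


def find_empty_dirs(folder: Path) -> list[Path]:
    """List directories under folder that contain no files or sub-folders."""
    return sorted(
        Path(root)
        for root, dirs, files in os.walk(folder)
        if not dirs and not files
    )


def compress_children(parent: Path, min_objects: int = 1000) -> list[Path]:
    """Create one .tar.gz per immediate sub-folder holding at least min_objects objects.

    The original sub-folders are left in place so each tarball can be
    verified before anything is removed.
    """
    created = []
    for child in sorted(p for p in parent.iterdir() if p.is_dir()):
        if count_objects(child) < min_objects:
            continue  # small enough to archive as-is
        tar_path = child.parent / (child.name + ".tar.gz")
        with tarfile.open(tar_path, "w:gz") as tar:
            tar.add(str(child), arcname=child.name)
        created.append(tar_path)
    return created
```

Running `count_objects` at each level of a tree shows where most objects sit, `find_empty_dirs` lists candidates to delete before tagging, and `compress_children` applied to the chosen level (for instance the parent of the 'sub_dir3' folders) produces one tarball per sub-folder, so each can still be recovered on its own.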
Once the data is deleted from the source, users can still refer to their archived data through the .