Difference between revisions of "File System"

From CCI User Wiki
(Barn (long-term storage))
(Home)
 
(20 intermediate revisions by 2 users not shown)
Line 1: Line 1:
CCI is moving to the unified GPFS file system.
+
CCI utilizes a single unified GPFS file system across all clusters/nodes.
 
 
  
 
== General Layout ==
 
== General Layout ==
The Unified CCI GPFS filesystem is built using a block size of 8MiB. Performance testing indicated this was about optimal for our storage system hardware.  Applications using large-record I/O will benefit most from the large block size. Performance testing shows that applications with small-record I/O perform at least nearly as well with the large block size as in a file system with a much smaller block size.
+
The Unified CCI GPFS file system is built using a block size of 8MiB. Performance testing indicated this was about optimal for our storage system hardware.  Applications using large-record I/O will benefit most from the large block size. Performance testing shows that applications with small-record I/O perform at least nearly as well with the large block size as in a file system with a much smaller block size.
  
The file system is broken into three main areas: [[#Home directories|home]], [[#Scratch (short-term storage)|scratch]], and [[#Barn (long-term storage)|barn]]. Each area of the filesystem has its own purpose and merits.
+
The file system is divided into three main areas: [[#Home directories|home]], [[#Scratch (short-term storage)|scratch]], and [[#Barn (long-term storage)|barn]]. Each area of the file system has its own purpose and tradeoffs.
  
 
=== Tree ===
 
=== Tree ===
Line 23: Line 22:
  
 
== Home ==
 
== Home ==
Home contains user home directories, organized by projects. User home directories are only writable by the user. This area of the filesystem has a 10 GiB quota and is the only area of the filesystem that uses replication to protect data.
+
The '''home''' area contains user home directories, organized by projects. User home directories are only writable by the user. This area of the file system has a 10 GiB quota.
  
'''Please note:''' In GPFS, files are counted twice during quota calculations when they are replicated. This means the home directory limit is effectively 5 GiB per project (shared + every user's home).
+
'''Please note:''' The home directory limit is '''per project''' (sum of every project user's home directory usage).
  
Home directories are intended to store only your "dot files" and maybe a few other configuration files, scripts, or small programs you need to customize your working environment. Program files, data sets, etc. should be stored in your [[#Barn (long-term storage)|barn]].
+
Home directories are intended to store only files that are used by or during interactive sessions: "dot files", configuration files, scripts, or small programs needed to customize the working environment. Program files, data sets, etc. should be stored in the [[#Barn (long-term storage)|barn]].
  
 
== Scratch (short-term storage) ==
 
== Scratch (short-term storage) ==
There is a scratch data directory for each project and the associated users. as well as corresponding links in each user's home directory. This space is meant as a '''temporary''' staging area for performing computation.  Performance in this directory will be better than in the home directory.  This space does not have a quota.
+
There is a '''scratch''' data directory for each project and its associated users, as well as corresponding links in each user's home directory. This space is meant as a '''temporary''' staging area for performing computation.  Performance in this directory will be better than in the '''home''' or '''barn''' areas.  This space does not have a quota.
  
 
Each home directory contains a link, '''scratch''', to the user's personal scratch space, and a link, '''scratch-shared''', to the project's shared scratch space.
 
Each home directory contains a link, '''scratch''', to the user's personal scratch space, and a link, '''scratch-shared''', to the project's shared scratch space.
  
'''Important:''' This space will periodically be purged of files older than 14 days. This policy is subject to change based on filesystem demands. If longer-term storage of data is necessary it should be stored in the barn area.
+
'''Important:''' This space will periodically be purged of files older than 56 days, and if this is not sufficient to maintain enough working space, may be (with advance warning) purged of all files. This policy is subject to change based on file system demand. If longer-term storage of data is necessary it should be stored in the '''barn''' area.
 +
 
 +
'''Important:''' Because scratch space is not replicated, it is vulnerable to data loss or corruption if we suffered a serious storage system failure.  We may remove files before their normal expiration date if we suspect there is data corruption.
  
 
== Barn (long-term storage) ==
 
== Barn (long-term storage) ==
There is a barn directory for each project and the associated users with corresponding links in each user's home directory. This space is meant to allow for longer-term storage of ''working'' data and programs than allowed by the scratch area. It is not meant for long-term retention of results, but rather is the space to store the tools you need to do your work. This space has a 10GiB quota and is not replicated, nor is it purged of old files.  Project users must manage their own space usage in the barn.
+
There is a '''barn''' directory for each project and its associated users with corresponding links in each user's home directory. This space is meant to allow for longer-term storage of ''working'' data and programs than allowed by the '''scratch''' area. It is not meant for long-term retention of results, but rather is the space to store the tools you need to do your work. Nor is it intended to be the area
 +
where your computations run; they will perform better out of '''scratch'''.  You may want to keep your actual executables in the '''barn''', stage up data in '''scratch''' for several jobs using data sets stored in your '''barn''', and copy back final results to your '''barn''' until you can properly retrieve them to your own local long-term storage.
 +
 
 +
Each project's '''barn''' starts with a 10 GiB quota.  Like '''home''' it is never automatically purged of old files.  Project users must manage their own space usage in the barn.
  
 
Each home directory contains a link, '''barn''', to the user's personal barn space, and a link, '''barn-shared''', to the project's shared barn space.
 
Each home directory contains a link, '''barn''', to the user's personal barn space, and a link, '''barn-shared''', to the project's shared barn space.
  
Additional space may be allocated to a project's barn at the discretion of the CCI Director upon written request by the project PI. Any extended quotas are subject to periodic review and potential reduction at the discretion of the Director.
+
Additional space may be allocated to a project's '''barn''' at the discretion of the CCI Director upon written request by the project PI. Any extended quotas are subject to periodic review and potential reduction at the discretion of the Director.

Latest revision as of 13:08, 20 November 2017

CCI utilizes a single unified GPFS file system across all clusters/nodes.

General Layout

The unified CCI GPFS file system is built with a block size of 8 MiB. Performance testing indicated that this is close to optimal for our storage system hardware. Applications using large-record I/O will benefit most from the large block size. Testing also shows that applications with small-record I/O perform nearly as well with the large block size as they would on a file system with a much smaller block size.

The file system is divided into three main areas: home, scratch, and barn. Each area of the file system has its own purpose and tradeoffs.

Tree

/gpfs
 /u
  /home
   /PROJ
    /USER
  /scratch
   /PROJ
    /shared
    /USER
  /barn
   /PROJ
    /shared
    /USER
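
As an illustration of the layout above, the following minimal Python sketch builds the directories for one project member. It assumes the tree is rooted at /gpfs/u exactly as drawn; the function name project_paths is hypothetical, and PROJ and USER are placeholders rather than real account names.

```python
from pathlib import PurePosixPath

# Root of the tree shown above (/gpfs, then /u).
GPFS_ROOT = PurePosixPath("/gpfs/u")


def project_paths(project: str, user: str) -> dict:
    """Return the per-user and shared directories for one project member."""
    # The tree shows only per-user directories under home.
    paths = {"home": GPFS_ROOT / "home" / project / user}
    # scratch and barn each have a per-user directory and a shared directory.
    for area in ("scratch", "barn"):
        paths[area] = GPFS_ROOT / area / project / user
        paths[f"{area}-shared"] = GPFS_ROOT / area / project / "shared"
    return paths


if __name__ == "__main__":
    # "PROJ" and "USER" are placeholders, not real names.
    for name, path in project_paths("PROJ", "USER").items():
        print(f"{name:15} {path}")
```

The dictionary keys mirror the link names found in each home directory (scratch, scratch-shared, barn, barn-shared).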

Home

The home area contains user home directories, organized by projects. User home directories are only writable by the user. This area of the file system has a 10 GiB quota.

Please note: The home directory limit is per project; it applies to the sum of every project user's home directory usage.
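
Because the limit applies to the whole project, it can help to estimate how much a project's home tree holds in total. The sketch below is one rough way to do that in Python. It sums apparent file sizes, which is an assumption: the file system's own quota accounting may count space differently, so treat the result as an estimate only. The names tree_usage_bytes and quota_fraction are hypothetical.

```python
import os
import stat

# 10 GiB, the per-project home quota stated above.
HOME_QUOTA_BYTES = 10 * 1024**3


def tree_usage_bytes(root: str) -> int:
    """Sum the apparent sizes of regular files under root.

    os.walk does not follow symbolic links to directories by default, so
    links such as scratch and barn in a home directory are not descended into.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):  # ignore symlinks and special files
                total += st.st_size
    return total


def quota_fraction(project_home: str, quota: int = HOME_QUOTA_BYTES) -> float:
    """Return estimated usage of the project's home tree as a fraction of quota."""
    return tree_usage_bytes(project_home) / quota
```

Pointing quota_fraction at a project's directory under the home area (rather than at a single user's home) matches the per-project scope of the limit.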

Home directories are intended to store only files that are used by or during interactive sessions: "dot files", configuration files, scripts, or small programs needed to customize the working environment. Program files, data sets, etc. should be stored in the barn.

Scratch (short-term storage)

There is a scratch data directory for each project and its associated users, as well as corresponding links in each user's home directory. This space is meant as a temporary staging area for performing computation. Performance in this directory will be better than in the home or barn areas. This space does not have a quota.

Each home directory contains a link, scratch, to the user's personal scratch space, and a link, scratch-shared, to the project's shared scratch space.

Important: Files older than 56 days are periodically purged from this space. If that does not free enough working space, all files may be purged, with advance warning. This policy is subject to change based on file system demand. Data that needs longer-term storage should be kept in the barn area.
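
To see which files are at risk under the purge policy, a user could scan their scratch space for old files. The Python sketch below is illustrative only. It assumes file age is judged by modification time; this page does not say which timestamp the purge actually uses. The function name purge_candidates is hypothetical.

```python
import os
import stat
import time

# Purge threshold stated above; the policy is subject to change.
PURGE_AGE_DAYS = 56
SECONDS_PER_DAY = 86400


def purge_candidates(root, max_age_days=PURGE_AGE_DAYS, now=None):
    """Yield (path, age_in_days) for regular files under root whose
    modification time is more than max_age_days in the past."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Assumption: age is measured from mtime, not atime or ctime.
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path, (now - st.st_mtime) / SECONDS_PER_DAY
```

Calling it with a smaller max_age_days gives early warning of files that are approaching the threshold.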

Important: Because scratch space is not replicated, it is vulnerable to data loss or corruption if we suffer a serious storage system failure. We may remove files before their normal expiration date if we suspect data corruption.

Barn (long-term storage)

There is a barn directory for each project and its associated users, with corresponding links in each user's home directory. This space is meant for longer-term storage of working data and programs than the scratch area allows. It is not meant for long-term retention of results; rather, it is the space to store the tools you need to do your work. Nor is it intended to be the area where your computations run; they will perform better out of scratch. You may want to keep your executables and data sets in the barn, stage data into scratch for one or more jobs, and copy final results back to your barn until you can retrieve them to your own local long-term storage.
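
The barn-to-scratch-to-barn pattern described above can be sketched in Python as follows. This is a minimal illustration, not a CCI-provided tool: copy_into and run_staged are hypothetical names, and compute stands in for whatever job actually produces the results.

```python
import shutil
from pathlib import Path


def copy_into(src: Path, dest_dir: Path) -> Path:
    """Copy one file into dest_dir (created if missing) and return the new path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(src, dest_dir))


def run_staged(dataset: Path, scratch_dir: Path, barn_results: Path, compute) -> Path:
    """Stage a data set from barn into scratch, run compute on the staged copy,
    then copy the result file back to barn.

    compute is any callable that takes the staged file's Path and returns
    the Path of a result file it wrote (normally also in scratch).
    """
    staged = copy_into(dataset, scratch_dir)
    result = compute(staged)
    return copy_into(result, barn_results)
```

In practice the directories could be reached through the barn and scratch links in the home directory, for example Path.home() / "barn" and Path.home() / "scratch".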

Each project's barn starts with a 10 GiB quota. Like home, it is never automatically purged of old files. Project users must manage their own space usage in the barn.

Each home directory contains a link, barn, to the user's personal barn space, and a link, barn-shared, to the project's shared barn space.

Additional space may be allocated to a project's barn at the discretion of the CCI Director upon written request by the project PI. Any extended quotas are subject to periodic review and potential reduction at the discretion of the Director.