Frequently Asked Questions

From CCI User Wiki
How do I get an account?
If you're a member of the Rensselaer community, see Project Instructions for RPI Personnel. Otherwise, see Project Instructions for CCI Affiliates. All forms for new accounts should be sent only to accounts[at]ccni.rpi.edu.
How do I change my password? (You must change your password before you can log into the Landing Pads)
Use the Password Change form. If you have multiple accounts for different projects, this form will change the password for all of your accounts together.
How do I ask for help, report a system problem, or make changes to my existing account?
If you need assistance or would like to report a system problem, you can either call CCI support at 518-276-6797 or send an email to support[at]ccni.rpi.edu. Please do not CC the support email address.
What is the meaning of the user name format?
Resource usage is tracked and regulated by the user ID associated with a job or with files on disk. A person who is involved with multiple projects is given a separate user ID for each project. Each ID is formatted to include an identifier for both the project and the user.
How do I check my GPFS quota usage?
Executing "df -h ." in a directory will display usage based on the quota enforced for that particular directory tree.
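As a minimal sketch, the same check can be scripted so that the usage figure is easy to read off. The -P flag and the awk step are additions for illustration and are not part of the instructions above; -P asks df for the POSIX layout, which keeps each file system on one line.

```shell
# Show usage for the directory tree containing the current directory.
# -h prints human-readable sizes; -P (an addition here) keeps each file
# system on a single line so the last line can be parsed reliably.
usage_line=$(df -hP . | tail -n 1)
echo "$usage_line"

# In the POSIX layout the fifth column is the percentage in use.
percent_used=$(echo "$usage_line" | awk '{print $5}')
echo "In use: $percent_used"
```

Run it from inside the directory whose quota you want to inspect, since the result depends on the directory tree you are in.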
How do I access the systems?
First, use ssh to connect to a Landing Pad. Then connect from the Landing Pad to a cluster front-end, such as q/amos to access the Blue Gene/Q. Please check our List of Available Systems for more information.
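The two-step login described above could be wrapped in a small shell function, sketched below. The function name cci_login and the host names in the example call are hypothetical placeholders; take the real host names from the List of Available Systems.

```shell
# Two-hop login: Landing Pad first, then the cluster front-end.
# All three arguments are placeholders supplied by the caller.
#   $1 = your user ID, $2 = Landing Pad host name, $3 = front-end host name
cci_login() {
    # -t allocates a terminal on the Landing Pad so the inner ssh
    # session is interactive.
    ssh -t "$1@$2" ssh "$3"
}

# Example call with hypothetical names:
#   cci_login PROJuser landingpad.example.org frontend
```

This is equivalent to typing the two ssh commands by hand; it only saves a step.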
How do I get data onto or off of the system from off campus?
Use scp to transfer data to the Landing Pad systems. The file systems mounted there match those present on the compute systems.
See this page for large file transfers.
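A rough sketch of scp transfers in both directions follows. The function names cci_put and cci_get are hypothetical, and the host name and paths are whatever you pass in; nothing here is specific to the CCI configuration.

```shell
# Copy a local file or directory to a Landing Pad.
#   $1 = local path, $2 = user ID, $3 = Landing Pad host, $4 = remote path
cci_put() {
    # -r copies directories recursively; it is harmless for single files.
    scp -r "$1" "$2@$3:$4"
}

# Copy results back from a Landing Pad to the local machine.
#   $1 = user ID, $2 = Landing Pad host, $3 = remote path, $4 = local path
cci_get() {
    scp -r "$1@$2:$3" "$4"
}
```

Because the Landing Pads mount the same file systems as the compute systems, a file copied this way should be visible from the cluster front-end without a second transfer.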
What MPI implementations are available? How do I use them?
Several implementations of MPI are available on the clusters, including Open MPI, MVAPICH, and MVAPICH2. We use modules to simplify the process of making these libraries available to users.
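Assuming the standard module command is set up in your login shell, finding and loading an MPI implementation might look like the sketch below. The function names are hypothetical, and the exact module names differ between clusters, so the name to load is passed in rather than hard-coded.

```shell
# List the modules whose names mention an MPI implementation.
# "module avail" writes to stderr on many installations, so redirect it
# before filtering. MVAPICH is matched separately because its name does
# not contain the string "mpi".
list_mpi_modules() {
    module avail 2>&1 | grep -Ei 'mpi|mvapich'
}

# Load one implementation, then show what is currently loaded.
#   $1 = module name exactly as printed by list_mpi_modules
use_mpi_module() {
    module load "$1" && module list
}
```

Load only one MPI implementation at a time; mixing them usually leads to compiler wrappers and libraries that do not match.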
What is the time limit for running jobs?
The default wall clock time limit varies between systems. Please consult the wiki page for each system for details. If you have a different time limit, it will be enforced automatically; you do not have to do anything.
There are enough free nodes - why isn't my job running? / Why are my jobs waiting in the queue for a long time?
Jobs are automatically prioritized based on a number of parameters such as size, project usage, project classification, and, once in the queue, age (time in queue). The queue is not a simple FIFO queue. Jobs may be inserted into the middle of the queue based on initial priority and may even move towards the end of the queue if a project's usage increases significantly while the job is pending. A job will begin once it reaches the head of the queue.
When there is a large job in the queue that requires many nodes, the scheduler will hold nodes free as other jobs complete in anticipation of the large job. Smaller, shorter jobs will fill in those nodes (backfill) if they have priority and do not prevent the large job from beginning at the expected time. The scheduler will also hold nodes free in anticipation of a maintenance outage.
On certain systems, such as the Blue Gene/Q, where nodes must be allocated in blocks, some partitioning has been done at the system level to reduce fragmentation and improve overall system throughput. It is important to check the number of nodes in the partition before assuming there is a problem with the scheduler.
It is also possible that a special reservation is in place for system diagnostics. Nodes in this type of allocation are listed as idle, but they are unavailable to users and will not have jobs scheduled on them.
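On systems scheduled by Slurm, the situations described above can usually be told apart with a few standard queries. The sketch below assumes Slurm and a cluster front-end shell; the function names are hypothetical.

```shell
# Assumes the scheduler is Slurm and that these commands are run on a
# cluster front-end.

# Pending jobs for the current user, with the scheduler's stated reason
# (%r) and its estimated start time (%S).
my_pending_jobs() {
    squeue -u "${USER:-$(id -un)}" -t PENDING -o '%i %P %j %r %S'
}

# Per-partition summary: name, time limit, node count, node state.
partition_summary() {
    sinfo -o '%P %l %D %t'
}

# Active and upcoming reservations, such as those held for maintenance
# or diagnostics.
show_reservations() {
    scontrol show reservation
}
```

Comparing the node count from partition_summary with the size of your request is a quick way to confirm that the job fits in the partition at all.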
Why does Slurm give me this error?
error: Unable to allocate resources: Invalid account or account/partition combination specified
Your account is not authorized to submit to this partition. There may be another partition that your account is authorized to use, or your account may not have authorization at all for that particular cluster/system.
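If Slurm accounting is enabled and sacctmgr is open to regular users (both are assumptions about the site configuration), you may be able to list the account and partition combinations tied to your user ID. The function name below is hypothetical.

```shell
# Assumes Slurm accounting is enabled and sacctmgr is available to users.
# Lists each cluster/account/partition combination the current user is
# associated with. An empty Partition column means the association is
# not tied to a specific partition.
my_associations() {
    sacctmgr show associations user="${USER:-$(id -un)}" \
        format=Cluster,Account,Partition
}

# Usage, with hypothetical names: pick an account and partition from the
# output and pass them explicitly when submitting.
#   sbatch --account=myacct --partition=mypart job.sh
```

If the output does not list the partition you need, contact support rather than retrying the submission.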
Why isn't library or tool foo installed on bar?
The systems have a base set of common libraries and tools installed for their native architecture and operating system. A reasonable effort is made to provide the libraries and tools required by most users, but there will always be some library or tool that someone needs and that we do not provide. In these cases we can offer advice on how to obtain or set up the package, but we will not install it for you, globally or locally. This applies to both free/open source and commercial software.