ITC Linux Clusters
Getting Started on the ITC Linux Clusters
This tutorial is designed for researchers who are new to the Linux
clusters. It covers basic information about the cluster, as well as how to
create and submit batch jobs using the PBS resource management software.
It also contains sample job command files that can be used as templates for
running jobs under PBS. Throughout this tutorial, we shall use the
name aspen or lc0 where necessary to refer to the
cluster frontend, and mst3k to represent the user's login ID.
The user should substitute the appropriate names for the cluster in use
and his or her own login.
Basic information about the current ITC clusters, including their names, can be found at their homepage.
- Cluster Overview
- Logging on to the Cluster
- Configuring your Account
- Using Modules to Load Software
- Compilers
- Compiling Parallel Programs
- Portable Batch System (PBS)
- PBS Job Command Files
- Submitting a Job
- Displaying Job Status
- Canceling a Job
- Debugging
- Large Scratch/Output Files
- Sample PBS Command Scripts
- File Transfer to and from the Cluster
The Linux Clusters at UVa
The Linux clusters use ROCKS as their operating system and the Portable Batch System (PBS) software to distribute the computational workload across the nodes. PBS is a batch job scheduling application that provides the facility for building, submitting and processing batch jobs on the cluster.
Jobs are submitted to the cluster by creating a PBS job command file that specifies certain attributes of the job, such as how long the job is expected to run and how many nodes of the cluster are needed (e.g. for parallel programs). PBS then schedules when the job is to start running on the cluster (based in part on those attributes), runs and monitors the job at the scheduled time, and returns any output to the user once the job completes.
Logging on to the Cluster
Logins to the Linux cluster can be done through the frontend, represented in this document by aspen.itc.virginia.edu, via slogin or ssh. For Windows users, we recommend SecureCRT, available from Software Central, as the ssh client. Logging in places you on the head node of the cluster, which acts as the control console for any interactive work such as source code editing, compilation, and submitting jobs through PBS. Using applications such as Matlab or Mathematica interactively should not be done on the frontend, but rather on other machines. When you log on to the cluster you should be in your ITC Home Directory.
Important notice for Windows users: do not use a standard Windows editor such as Notepad to edit files that will be used on the Linux or other UNIX systems. The two systems use different sequences of control characters to mark the end of line (EOL). If you are using the clusters from a Windows system, there are a number of options:
- Log on to the cluster via SecureCRT or equivalent and use one of the many command-oriented text editors available, such as vi, emacs, or nano. This is the preferred option. If you will be doing a lot of editing, you should make the effort to learn vi or emacs; for occasional use nano is sufficient. On the clusters, the vi editor is actually vim (Vi Improved), information about which can be found at www.vim.org. Emacs documentation is available here. An introduction to nano is at www.nano-editor.org.
- Use NoteTab Light, available for download from Software Central. Be sure to export the file to UNIX format. Then you can copy the file to the cluster via SecureFX (see File Transfer to and from the Cluster).
- If you have an X server for Windows, such as Xming or Cygwin/Xorg, you can use the point-and-click editors gedit or nedit on the clusters.
- If you must use a Windows-only editor, transfer the file and then apply dos2unix on the cluster to convert it to the correct UNIX format. Type man dos2unix for more information about this command.
More information about accessing Unix systems from Windows can be found at Windows tools for Unix.
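If you are unsure whether a transferred file still has Windows line endings, a quick check can save a confusing batch failure. The following is a minimal sketch using only standard POSIX tools (printf, grep, tr); the function names has_crlf and strip_cr are hypothetical, and dos2unix remains the recommended converter where it is installed.

```shell
# Hypothetical helpers: detect and remove Windows (CRLF) line endings.

# Succeeds (exit status 0) if file $1 contains at least one carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Writes a copy of file $1 to file $2 with every carriage return deleted.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, has_crlf myscript.sh && strip_cr myscript.sh myscript.unix.sh produces a UNIX-format copy only when one is needed.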
Configuring Your Account
Use of the Linux clusters assumes familiarity with the UNIX/Linux software environment. In order to use PBS for batch job submission, it may be necessary to configure some of your UNIX account startup files. General information about the UNIX operating system can be found at our introductory page.
When a job is submitted to the cluster through PBS, a new login to your account is initiated, and any initialization commands in your startup files (.profile, .variables.ksh, .kshrc, etc.) are executed. Because the job runs in batch mode without a terminal, interactive commands such as tset and stty must be disabled. If these precautions are not taken, error messages will be written to the batch job's error file and your program may not run.
The recommended way to disable the interactive sections of the startup files is to test the environment variable PBS_ENVIRONMENT, which PBS sets when it runs a job. If the variable is set, meaning that a PBS job initiated the login, the interactive parts of the startup files are skipped. The .profile of every new user should already contain this section, but long-term users of older ITC systems should check their .profile files.
# Automatically export every variable set from here on to your user shell.
set -a
# This command runs your ".variables.ksh" file.
. ${HOME}/.variables.ksh
# Exclude interactive commands if PBS_ENVIRONMENT is set
if [ -z "$PBS_ENVIRONMENT" ] ; then
    # Make /home/mst3k the initial Linux command prompt directory path
    cd /home/$USER
    # Interactive lines such as control key and terminal settings go here
# Close exclusion of interactive section (SP and PBS batch job requirement)
fi
Similar changes must be made to .login by tcsh users. These should already be in place for all new users. However, if at any time you modify your .profile or .login, make sure that any stty commands are placed inside the PBS exclusion test.
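The same test can be wrapped in a small function if you need it in several places in a ksh or sh startup file. This is a sketch, not the ITC-supplied .profile: the function name in_pbs_job is hypothetical, and the extra [ -t 0 ] check (run terminal settings only when standard input really is a terminal) is an added safeguard that the standard .profile does not use. PBS sets PBS_ENVIRONMENT to a value such as PBS_BATCH or PBS_INTERACTIVE inside jobs.

```shell
# Hypothetical helper: succeeds when this shell was started by a PBS job.
in_pbs_job() {
    [ -n "${PBS_ENVIRONMENT:-}" ]
}

# Terminal settings only make sense for a real interactive login,
# so skip them inside PBS jobs and whenever stdin is not a terminal.
if ! in_pbs_job && [ -t 0 ]; then
    stty erase '^?'
fi
```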
Note: if you have trouble using the man command,
in your .variables.ksh file replace the line
PAGER=/usr/bin/more
with
PAGER=more
This should work on all systems since more is normally in the
path automatically.
Note that csh (tcsh) users may get the warning "Warning: no access to tty, thus no job control in this shell" as part of their PBS job output. This is documented in the PBS Pro User Guide and should not affect the job itself.
To allow access to the PBS commands and manual pages, the appropriate paths have been added to the system PATH and MANPATH environment variables. Users should make sure they are including the system PATH and MANPATH variables as part of their account PATH and MANPATH variables (e.g. in .variables.ksh, PATH=${HOME}/bin:${PATH}:/home/loadl/bin:.).
If the man command does not work correctly on the cluster, check the PAGER variable (typically set in the .variables.ksh file) as described in the note above; setting it to more or to /bin/more should work.
Using Modules to Load Software
The clusters use modules to manage the setting of paths and
other environment variables for particular software packages, such as
the compilers and the MPI environment. In particular, the system offers more
than one compiler, as well as several MPI environments. At least one module
must be loaded in order to use a compiler or its libraries; for example:
module load pgi
loads the current version of the PGI compiler suite, while a command such as
module load mcc/7.2
loads a specific version (7.2) of the Matlab mcc compiler.
The module command has a number of subcommands, some of which are synonyms.
For example, module add is synonymous with module load.
A full listing of the available modules, with a one-line description of each, can be obtained by typing
module whatis
Executing module whatis on the frontend at one point in time yielded:
cimsl/7.0 : loads the C IMSL scientific library
imsl/6.0 : loads the IMSL scientific library
intel/10.1 : loads the Intel Compiler Environment
intel/11.0 : loads the Intel Compiler Environment
intel32/10.1 : loads the Intel Compiler Environment
intel32/11.0 : loads the Intel Compiler Environment
pgi/8.0 : loads the PGI Compiler Environment
pgi32/8.0 : loads the PGI Compiler Environment
R/2.9.0 : loads the R statistics package
blender/2.48 : loads the R statistics package
hdf4-gnu/4.2 : Sets the environment for the HDF scientific library
hdf4-intel/4.2 : Sets the environment for the HDF scientific library
hdf4-pgi/4.2 : Sets the environment for the HDF scientific library
hdf5-gnu/1.8 : Sets the environment for the HDF5 scientific library
hdf5-intel/1.8 : Sets the environment for the HDF5 scientific library
hdf5-pgi/1.8 : Sets the environment for the HDF5 scientific library
mcc/7.6 : Sets the environment for the Matlab Component Runtime Libraries
mpich2-eth-gnu/1.0 : loads the mpich environment for Gnu over Ethernet
mpich2-eth-intel/1.0 : loads the mpich environment for Intel over Ethernet
mpich2-eth-intel10/1.0: loads the mpich environment for Intel over Ethernet
mpich2-eth-pgi/1.0 : loads the mpich environment for PGI over Ethernet
ncarg-gnu/5.0 : loads the NCAR Graphics environment for Intel
ncarg-intel/5.0 : loads the NCAR Graphics environment for Intel
ncarg-pgi/5.0 : loads the NCAR Graphics environment for Intel
netcdf-intel/3.6 : Sets the environment for the NetCDF scientific library
netcdf-pgi/3.6 : Sets the environment for the NetCDF scientific library
ompi-gnu/1.3 : loads the OpenMPI environment for Gnu over Ethernet
ompi-intel/1.3 : loads the OpenMPI environment for Intel over Ethernet
ompi-pgi/1.3 : loads the OpenMPI environment for PGI over Ethernet
scalapack-eth-gnu/1.8: Sets the environment for the Scalapack scientific library
scalapack-eth-intel/1.8: Sets the environment for the Scalapack scientific library
scalapack-eth-pgi/1.8: Sets the environment for the Scalapack scientific library
Compilers
Programs for which the user has written the source code must first be compiled on the frontend to run on the cluster. Currently, three sets of compilers are supported on ITC clusters: the Portland Group (PGI), the Intel, and the Gnu compilers. All offer C, C++, and Fortran 95 compilers.
Gnu Compiler Collection
The Gnu compilers are available on Linux platforms at the University. The Gnu compilers available on the clusters are:
gcc [options] file.c (C)
g++ [options] file.cpp file.cxx (C++)
gfortran [options] file.f (Fortran 77, fixed-form F95)
gfortran [options] file.f90 (Fortran 90/95)
For a complete list of options, consult the relevant compiler man page, e.g.
man gcc from your account on the cluster frontend.
Gnu has documentation for gcc at their
GCC page.
PGI
The Portland Group (PGI) Compilers are licensed by ITC to run on Linux platforms at the University. The PGI compilers available on the clusters are:
pgcc [options] file.c (C)
pgCC [options] file.cpp file.cxx (C++)
pgf77 [options] file.f (Fortran 77)
pgf90 [options] file.f90 (Fortran 90/95)
For a complete list of options, consult the relevant compiler man page, e.g.
man pgf77 from your account on the clusters. More detailed
information about the PGI compilers can be found in the documentation on the
Web page,
www.pgroup.com/doc/index.htm
See also our local documentation for PGI.
Intel
The Intel compilers are licensed by ITC to run on Linux platforms at the University. The Intel compilers available on the clusters are:
icc [options] file.c (C)
icpc [options] file.cpp file.cxx (C++)
ifort [options] file.f (Fortran 77, fixed-form F95)
ifort [options] file.f90 (Fortran 90/95)
For a complete list of options, consult the relevant compiler man page, e.g. man ifort on the frontend. More detailed information about the Intel compilers can be found in the online documentation for Fortran and C/C++. See also our local documentation for some important information about using Intel compilers on our system.
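The three suites use different driver names for the same languages. As a compact summary of the commands listed above, here is a sketch of a lookup function; compiler_for and its suite keywords (gnu, pgi, intel) are hypothetical, and you still need to load the matching module before the chosen driver is on your PATH.

```shell
# Hypothetical helper: print the compiler driver for a suite and source file,
# following the command names listed in this section.
compiler_for() {
    suite=$1 file=$2
    case "$suite:$file" in
        gnu:*.c)                 echo gcc ;;
        gnu:*.cpp|gnu:*.cxx)     echo g++ ;;
        gnu:*.f|gnu:*.f90)       echo gfortran ;;
        pgi:*.c)                 echo pgcc ;;
        pgi:*.cpp|pgi:*.cxx)     echo pgCC ;;
        pgi:*.f)                 echo pgf77 ;;
        pgi:*.f90)               echo pgf90 ;;
        intel:*.c)               echo icc ;;
        intel:*.cpp|intel:*.cxx) echo icpc ;;
        intel:*.f|intel:*.f90)   echo ifort ;;
        *) echo "unknown suite or file type: $suite $file" >&2; return 1 ;;
    esac
}
```

For example, after module load pgi, the command $(compiler_for pgi solver.f90) -o solver solver.f90 would invoke pgf90.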
Compiling Parallel Programs
The MPI (Message Passing Interface) is available for parallel programs. We recommend the use of the MPICH2 implementation. A module corresponding to the compiler you wish to use must be loaded in order to set up the correct environment.
To use the Intel compiler with MPICH2 over Ethernet, load the module
module load mpich2-eth-intel
and similarly for mpich2-eth-pgi and mpich2-eth-gnu.
For Infiniband on purchased nodes, use MVAPICH2:
module load mvapich2-intel
or substitute pgi or gnu in place of intel for the PGI and Gnu compilers, respectively.
Once the module is loaded the following commands should be used to compile programs that use MPI code:
mpicc [options] file.c (C)
mpiCC [options] file.C (C++)
mpif77 [options] file.f (Fortran 77)
mpif90 [options] file.f90 (Fortran 90)
Information on using the MPI libraries is available at the UVACSE MPI Web page.
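The MPI module names above follow a regular pattern: mpich2-eth-<compiler> for Ethernet and mvapich2-<compiler> for Infiniband. The sketch below builds the name from that pattern; mpi_module and its eth/ib keywords are hypothetical, and you should confirm the names actually installed with module avail.

```shell
# Hypothetical helper: print the MPI module name for a compiler suite
# (gnu, pgi, intel) and network (eth = Ethernet, ib = Infiniband).
mpi_module() {
    suite=$1 network=${2:-eth}
    case "$suite" in
        gnu|pgi|intel) ;;
        *) echo "unsupported compiler suite: $suite" >&2; return 1 ;;
    esac
    case "$network" in
        eth) echo "mpich2-eth-$suite" ;;
        ib)  echo "mvapich2-$suite" ;;
        *)   echo "unknown network: $network" >&2; return 1 ;;
    esac
}
```

A typical use would be module load "$(mpi_module intel)" followed by mpicc -o myexec myexec.c.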
Once you have an executable version of a program you want to run, whether it's source code you've compiled yourself or a third party software package such as Matlab or Mathematica, you must use the PBS resource management software to run the code on the cluster.
Portable Batch System (PBS)
The PBS resource management system handles the management and monitoring of the computational workload on the cluster. Users submit "jobs" to the resource management system where they are queued up until the system is ready to run them. PBS selects which jobs to run, when, and where, according to a predetermined site policy meant to balance competing user needs and to maximize efficient use of the cluster resources.
To use PBS, you create a batch job command file which you submit to the PBS server to run on the cluster. A batch job file is simply a shell script containing the set of commands you want run on some set of cluster compute nodes. It also contains directives that specify the characteristics (attributes) and resource requirements (e.g. number of compute nodes, maximum runtime, etc.) that your job is requesting. Once you create your PBS job file, you can reuse it if you wish or modify it for subsequent runs.
It is important for users to understand how PBS works. When the job is submitted, the server examines the job script, extracts only the information in the directives, and passes the resource requests along to the scheduler. When the job is initiated ("rolled in"), PBS starts a remote shell process that switches to the submitter's user ID and logs in to the compute node assigned by the PBS scheduler. As usual for UNIX, the shell starts in the user's home directory. The job script is then executed as a normal shell script. The shell on the node ignores the directives because they begin with #, the shell's comment character. When the job starts, PBS sets some environment variables that the user can make use of in the job script, but PBS makes no assumptions at all about what the user wants to do; the node must be given explicit instructions in the form of a shell script.
PBS also provides a special kind of batch job called interactive-batch. An interactive-batch job is treated just like a regular batch job, in that it is placed into the queue and must wait for resources to become available before it can run. Once it is started, however, the user's terminal input and output are connected to the job in what appears to be an rlogin session to one of the compute nodes. Many users find this useful for debugging their applications or for computational steering.
PBS provides two user interfaces for batch job submission: a command line interface (CLI) and a graphical user interface (GUI). The CLI lets you type commands at the system prompt. The GUI is a graphical point-and-click interface, but it requires the presence of an X server on your local system. Both interfaces provide the same functionality and you can use either one to interact with PBS.
The PBS graphical interface is invoked with the command
xpbs. A screen shot of xpbs is
here.
The xpbs interface is composed of three windows: the first is the
"Hosts Panel" and displays the hostnames of the machines running PBS
servers to which jobs can be submitted. In the case of our generic cluster, the
PBS server is running on the front-end login host named lc4.
The second window is the "Queues Panel" and displays
information about the queues managed by the server host selected in the
"Hosts Panel". It shows the single queue "workq" on the cluster. The
third window is the "Jobs Panel" and displays information about jobs that are
found in the queue(s) selected from the Queues listbox.
Further information about how to configure and use the xpbs interface can be found in Chapter 5 of the PBS Pro User Guide. The remainder of this tutorial will focus on the PBS command line interface. More detailed information about using PBS can be found in the PBS Pro User Guide.
PBS Job Command Files
To submit a job to run on the cluster, a PBS job command file must be created. The job command file is a shell script that contains PBS directives; these directives are preceded by #PBS. The following is an example of a PBS command file to run a serial job, which would only require 1 processor on one node. In this example, the executable to be run is named serial_executable.
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o output_filename
#PBS -j oe
#PBS -m bea
#PBS -M mst3k@virginia.edu
cd $PBS_O_WORKDIR
./serial_executable
The first line identifies this file as a shell script. The next several lines are PBS directives that must precede any commands to be executed by the shell (e.g. the last two lines). The PBS directives illustrated are explained in the table below:
PBS Directive                 Function
#PBS -l nodes=1:ppn=1         Requests 1 compute node with 1 processor per node.
#PBS -l walltime=12:00:00     Requests 12 hours of wall clock time to run the job.
#PBS -o output_filename       Names the file where job output is saved. If omitted,
                              PBS generates a file name that ends in the job ID number.
#PBS -j oe                    Joins job output and error messages into one file.
#PBS -m bea                   Sends email notification when the job begins (b),
                              ends (e), or aborts (a).
#PBS -M mst3k@virginia.edu    Sets the email address to which PBS notifications are sent.
#PBS -V                       Exports all environment variables of the submitting
                              session to the batch job.
It is not necessary to use the -j (join) directive; sometimes it is helpful to keep the output and error files separate. If -o or -e directives are not specified, PBS will assign a name to each consisting of the name of the script concatenated with .o<jobid> for output and .e<jobid> for error. This makes it possible for several runs to write to their standard output and standard error files without overwriting one another's results.
The following is an example of a PBS email notification to the user at the end of the job:
Date: Mon, 21 Oct 2002 23:06:47 -0400
From: adm <adm@lc0.itc.virginia.edu>
To: mst3k@virginia.edu
Subject: PBS JOB 1187.lc0
PBS Job Id: 1187.lc0
Job Name: script.sh
Execution terminated
Exit_status=0
resources_used.cpupercent=88
resources_used.cput=00:00:52
resources_used.mem=64248kb
resources_used.ncpus=1
resources_used.vmem=81036kb
resources_used.walltime=01:02:14
The walltime used, as reported in the email, should be used to estimate the walltime requirement in the PBS job command file for future submissions of the same job, so that PBS can schedule it more effectively. When submitting a particular PBS job for the first time, overestimate the walltime requirement to prevent premature job termination. For a compute-bound job, walltime corresponds closely to cpu time, since each job is allocated its own processor for execution.
In this example, after the PBS directives in the command file, the shell executes a change directory command to $PBS_O_WORKDIR, a PBS environment variable that resolves to the directory from which the PBS job was submitted. Normally this will also be where the program executable is located. Other shell commands can be executed as well. In the last line, the executable itself is invoked. The ./ prefix is necessary because under Linux, the user's current working directory is not in the default path.
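To turn the resources_used.walltime value from a notification email into a padded request for the next run, a small helper along these lines could be used. It is a sketch: the name pad_walltime is hypothetical, and the default 50% safety margin is an arbitrary assumption, not an ITC recommendation. awk is used for the arithmetic because shell arithmetic would misread values such as 08 as octal.

```shell
# Hypothetical helper: add a percentage margin (default 50) to an hh:mm:ss
# walltime and print the result as hh:mm:ss, rounded up to a whole second.
pad_walltime() {
    printf '%s\n' "$1" | awk -F: -v pct="${2:-50}" '
        NF != 3 { exit 1 }
        {
            secs = $1 * 3600 + $2 * 60 + $3
            # Integer ceiling of secs * (100 + pct) / 100.
            req = int((secs * (100 + pct) + 99) / 100)
            printf "%02d:%02d:%02d\n", int(req / 3600), int((req % 3600) / 60), req % 60
        }'
}
```

With the email shown above, pad_walltime 01:02:14 prints 01:33:21, which could then go into the #PBS -l walltime= directive.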
Job Scripts for Parallel Programs
If the executable is a parallel program using the Message Passing Interface (MPI), then it will require multiple processors of the cluster to run, and this is specified in the PBS nodes resource requirement. The script mpiexec is used to invoke the parallel executable. The following is an example of a PBS job command file to run a parallel (MPI) job using MPICH2:
#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=12:00:00
#PBS -m abe
#PBS -M mst3k@virginia.edu
cd $PBS_O_WORKDIR
mpiexec -comm mpich2-pmi myexec
In this case the PBS nodes resource requirement specifies 2 processors per node on 4 nodes, for a total of 8 processors. By default, mpiexec automatically uses this number of processors.
Parallel jobs should usually specify a nodes requirement of 2 processors per node to efficiently partition the compute nodes for these jobs.
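The processor count that mpiexec picks up is simply nodes times ppn. A sketch of that arithmetic for the simple nodes=N:ppn=P form used in this tutorial follows; total_procs is a hypothetical helper, it treats a missing ppn as 1, and it does not handle other forms of the nodes resource.

```shell
# Hypothetical helper: total processors implied by "nodes=N:ppn=P".
total_procs() {
    spec=${1#nodes=}            # strip the leading "nodes="
    nodes=${spec%%:*}           # N: everything before the first colon
    case "$spec" in
        *:ppn=*) ppn=${spec##*:ppn=} ;;   # P: everything after ":ppn="
        *)       ppn=1 ;;                  # assume 1 processor per node
    esac
    echo $(( nodes * ppn ))
}
```

For the script above, total_procs nodes=4:ppn=2 prints 8.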
The PBS job command file can be given any name, although it is usually appended with a .sh extension to indicate that it is a shell script. Other common choices include .pbs or .sub. The link pbs_script.sh is an example PBS job script that runs the High Performance Linpack benchmark across 4 nodes using the input file HPL.dat. You can download these to your cluster account and use them to test PBS job submission described below. Remember to change the userid placeholder in the PBS email directive to your own.
Submitting a Job
The PBS qsub command is used to submit job command files for scheduling and execution. For example, to submit your job with a PBS command file called "pbs_script.sh", the syntax would be
lc0: /home/mst3k $ qsub pbs_script.sh
1354.lc0
lc0: /home/mst3k $
Notice that upon successful submission of a job, PBS returns a job identifier of the form <jobid>.lc0, where jobid is an integer number assigned by PBS to that job. You'll need the job identifier for any actions involving the job, such as checking job status, deleting the job, or specifying job dependencies as described below.
There are many options to the qsub command as can be seen by typing man qsub at the Linux command prompt or looking at the PBS Pro User Guide. Three of the more useful ones are the -W option for allowing specification of additional job attributes, the -I option, which declares that the job is to be run "interactively", and the -l option, which allows resource requirements to be listed as part of the qsub command. These are discussed below.
Specifying Job Dependencies
The -W option allows for the specification of additional job attributes. In particular, the "-W depend=dependency_list" option to qsub defines the dependency between multiple jobs, which is useful if the jobs need to execute in a certain order. For example, if pbs_script2.sh should not start executing until pbs_script1.sh successfully completes because it needs a file that pbs_script1.sh creates, then these two jobs should be submitted to PBS in the following manner:
lc0: /home/mst3k $ qsub pbs_script1.sh
543.lc0
lc0: /home/mst3k $ qsub -W depend=afterok:543 pbs_script2.sh
544.lc0
After pbs_script1.sh is submitted, PBS returns its job identifier, which is then used in the dependency list when pbs_script2.sh is submitted. The "afterok" keyword in the dependency list indicates that job 543 must complete successfully before pbs_script2.sh will start.
Other options for arguments of the dependency list are detailed in the PBS Pro User Guide as well as the online manual page for qsub by typing man qsub at the Linux command prompt.
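Because qsub prints the new job identifier on standard output, the identifier does not have to be copied by hand. The following sketch captures it with command substitution and passes the full identifier (e.g. 543.lc0) to the dependency list; the function name submit_chain is hypothetical.

```shell
# Hypothetical helper: submit job script $1, then submit job script $2 so
# that it starts only after the first job completes successfully.
submit_chain() {
    first_id=$(qsub "$1") || return 1
    second_id=$(qsub -W depend=afterok:"$first_id" "$2") || return 1
    printf '%s %s\n' "$first_id" "$second_id"
}
```

Running submit_chain pbs_script1.sh pbs_script2.sh prints both job identifiers, which can then be used with qstat or qdel.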
Submitting an Interactive Job
The -I option of qsub declares that a job has to be run "interactively". The job will be queued and scheduled as any PBS batch job, but when executed, the standard input, output, and error streams of the job are connected through qsub to the terminal session in which qsub is running. Interactive jobs with PBS should be used only for the purposes of testing/debugging the user's code, e.g. in cases using the PGI or TotalView debuggers.
Once the PBS interactive job starts, the terminal session will be logged into one of the compute nodes allocated by PBS. The executable can then be invoked manually from the Linux command prompt.
As discussed under Job Submission Policies below, the PBS scheduler is configured to favor jobs with shorter walltime and smaller node resource requirements. To ensure that a PBS interactive job starts quickly, these reduced resource requirements can be listed as arguments of qsub with the -l option.
The following is an example of running the High Performance Linpack Benchmark as an interactive PBS job using 2 nodes with 2 processors each and requesting 10 minutes of walltime. Note that the terminal session is actually logged into node compute-0-4.
lc0: /home/mst3k $ qsub -I -l nodes=2:ppn=2 -l walltime=00:10:00
qsub: waiting for job 1352.lc0 to start
qsub: job 1352.lc0 ready
localstorage is in /jobtmp/pbstmp.1352.lc0
compute-0-4: /home/mst3k $ mpiexec -comm mpich-p4 \
/opt/hpl-eth/bin/xhpl
============================================================================
HPLinpack 1.0 -- High-Performance Linpack benchmark -- September 27, 2000
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================
[further output not shown]
compute-0-4: /home/mst3k $ exit
lc0: /home/mst3k $
An interactive PBS job submission should require no more than 4 processors (2 nodes, 2 processors each) for testing/debugging purposes. In addition, an interactive PBS job will not terminate until the user exits the terminal session. The allocated nodes will remain reserved as long as the terminal session is open, up to the walltime limit, so it is extremely important that users exit their interactive sessions as soon as their debugging is done so that their nodes are returned to the available pool of processors.
The Test Queue
If interactive use is not required but short test jobs need to be run, the Cedar and Dogwood clusters each have a test queue of four nodes. Jobs can be submitted to the test queue with the command qsub -q testq myscript. The test queue has a limit of 4 cpus and 1 hour of wall clock time per job, with a maximum of 3 simultaneous jobs per user. The default walltime is 30 minutes.
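Before sending a job to testq, it can help to check the request against the limits just stated. This sketch hard-codes those limits (4 cpus, 1 hour) from this page, so it will go stale if they change; fits_testq and its argument order (nodes, processors per node, hh:mm:ss walltime) are hypothetical.

```shell
# Hypothetical helper: succeed if nodes*ppn <= 4 cpus and walltime <= 1 hour.
fits_testq() {
    nodes=$1 ppn=$2 walltime=$3
    secs=$(printf '%s\n' "$walltime" | awk -F: 'NF == 3 { print $1 * 3600 + $2 * 60 + $3 }')
    [ -n "$secs" ] || return 1          # reject malformed walltimes
    [ $(( nodes * ppn )) -le 4 ] && [ "$secs" -le 3600 ]
}
```

For example, fits_testq 2 2 00:30:00 && qsub -q testq myscript submits only when the request fits.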
Job Submission Policies
Users of any cluster may submit as many jobs to PBS as they like, but not all can be run at the same time. Each queue imposes a maximum number of jobs that a single user may run simultaneously. Furthermore, the PBS scheduler will dynamically determine a user's priority based on the number of jobs of other users and the number of available nodes, in order to maximize cluster usage in an equitable fashion. Any jobs that would cause a user to exceed the allowed upper limit on resources (such as cpus) will wait in the queue until a slot opens after one or more of the user's other jobs finishes.
If no queue is specified, the job is submitted to the default small-memory queue. The scheduler first ranks jobs, giving higher run priority to jobs requiring shorter walltime and smaller node resource requirements. The scheduler further modifies these priorities based on a fair-share algorithm which tries to guarantee that, on average, all users will get an equal amount of computing time. Finally, jobs that have been waiting for more than 24 hours to run will be considered "starving" and given higher priority.
PBS is currently configured to limit the maximum amount of walltime a single job can use. When that time limit is reached, the job will be terminated whether it has completed or not. This ensures that no one job can monopolize cluster compute nodes indefinitely, and it underscores the need for users to implement some type of save-restart mechanism in their codes so they can restart the job close to where it was stopped and not lose all the work done up to that point.
PBS also imposes a limit on the number of processors users can request. The maximum number of processors varies by cluster; please see each cluster's home page for its limits. Even when a request is under the limit, the job cannot start until the full number of requested cpus is free at the same time; for very large requests, in practice this is unlikely to occur.
The PBS configuration and scheduling policies used on the cluster will be periodically reviewed and modified as needed to ensure efficient and equitable use of this high performance computing resource. For details, please see our Queueing Policies document.
Researchers with extraordinary needs for the cluster, either in terms of extended compute time or number of nodes, should contact UVACSE at uvacse@virginia.edu to discuss making special arrangements to meet those needs.
Displaying Job Status
The qstat -a command is used to obtain status information about jobs submitted to PBS.
lc0: /home/mst3k $ qstat -a

                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1363.lc0        mst3k    workq    job16x2     19094  16  32     -- 00:20 R 00:02
1364.lc0        teh1m    workq    job12x2      7149  12  24     -- 00:16 R 00:01
1365.lc0        teh1m    workq    job8x2       4166   8  16     -- 00:12 R 00:00
1366.lc0        mst3k    workq    job20x2        --  20  40     -- 00:28 Q   --
1368.lc0        uconsult workq    STDIN       30942   2   4     -- 00:10 R 00:02
lc0: /home/mst3k $
The first five fields of the display are self-explanatory. Note that job ID 1368 has a job name of STDIN, which is short for standard input, indicating that it is an interactive job. The sixth and seventh fields, titled NDS and TSK in the above display, indicate the total number of nodes and processors, respectively, required by each job. The ninth field indicates the required walltime (hrs:min), the tenth field, titled S, indicates the state of the job, and the last field shows the elapsed runtime. The job state can have the following values:
State  Definition
E      Job is exiting after having run
H      Job is held
Q      Job is queued, eligible to run or be routed
R      Job is running
T      Job is in transition (being moved to a new location)
W      Job is waiting for its requested execution time to be reached
S      Job is suspended
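When you have many jobs queued, a per-state count of your own jobs can be easier to read than the full listing. The following sketch reads qstat -a output on standard input and relies on the column layout shown above (Username in field 2, S in field 10); job_states is a hypothetical name, and the field positions may differ with other PBS versions or qstat options.

```shell
# Hypothetical helper: count one user's jobs by state from "qstat -a" output.
job_states() {
    awk -v user="$1" '
        # Job lines start with a numeric job ID such as 1363.lc0;
        # header and separator lines do not, so they are skipped.
        $1 ~ /^[0-9]+\./ && $2 == user { count[$10]++ }
        END { for (s in count) print s, count[s] }' | sort
}
```

For example, qstat -a | job_states mst3k on the listing above would report one queued (Q) and one running (R) job.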
The following example shows how to use the qstat -f command to get detailed information on a specific job using its job identification number.
lc0: /home/mst3k $ qstat -f 1363
Job Id: 1363.lc0
    Job_Name = job16x2
    Job_Owner = mst3k@lc0
    resources_used.cpupercent = 82
    resources_used.cput = 00:01:59
    resources_used.mem = 83384kb
    resources_used.ncpus = 32
    resources_used.vmem = 124920kb
    resources_used.walltime = 00:02:33
    job_state = R
    queue = workq
    server = lc0
    Checkpoint = u
    ctime = Fri Oct 25 03:00:41 2002
    Error_Path = lc0:/h1/u/uc/mst3k/linux_cluster/job16x2.e1363
    exec_host = compute-1-0/0+compute-0-15/0+compute-0-14/0+compute-0-13/0+
        compute-0-12/0+compute-0-11/0+compute-0-10/0+compute-0-9/0+
        compute-0-8/0+compute-0-7/0+compute-0-6/0+compute-0-5/0+
        compute-0-4/0+compute-0-3/0+compute-0-2/0+compute-0-1/0+
        compute-1-0/1+compute-0-15/1+compute-0-14/1+compute-0-13/1+
        compute-0-12/1+compute-0-11/1+compute-0-10/1+compute-0-9/1+
        compute-0-8/1+compute-0-7/1+compute-0-6/1+compute-0-5/1+
        compute-0-4/1+compute-0-3/1+compute-0-2/1+compute-0-1/1
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = e
    mtime = Fri Oct 25 03:00:42 2002
    Output_Path = lc0:/h1/u/uc/uconsult/linux_cluster/16x2
    Priority = 0
    qtime = Fri Oct 25 03:00:41 2002
    Rerunable = True
    Resource_List.ncpus = 32
    Resource_List.neednodes = 16:ppn=2
    Resource_List.nodect = 16
    Resource_List.nodes = 16:ppn=2
    Resource_List.walltime = 20:00:00
    session_id = 19094
    Variable_List = PBS_O_HOME=/home/mst3k,PBS_O_LANG=en_US,
        PBS_O_LOGNAME=mst3k,
        PBS_O_PATH=/home/mst3k/bin:/usr/pbs/bin:/usr/share/mpi/bin:/uva/bin:/usr/pgi/linux86/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:.,
        PBS_O_MAIL=/var/spool/mail/mst3k,PBS_O_SHELL=/bin/ksh,
        PBS_O_HOST=lc0,PBS_O_WORKDIR=/h1/u/uc/mst3k/linux_cluster,
        PBS_O_SYSTEM=Linux,PBS_O_QUEUE=workq
    comment = Job run at started on Fri Oct 25 at 03:00
    etime = Fri Oct 25 03:00:41 2002
For further information about the qstat command, type man qstat on the cluster front-end machine aspen.itc or see the PBS Pro User Guide.
Canceling a Job
PBS provides the qdel command for deleting jobs from the system using the job identification number, as shown below.
lc0: /home/mst3k/linux_cluster $ qstat -a
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1361.lc0        mst3k    workq    job16x2     18136  16  32     -- 48:00 R 00:01
lc0: /home/mst3k/linux_cluster $ qdel 1361
lc0: /home/mst3k/linux_cluster $ qstat -a
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1361.lc0        mst3k    workq    job16x2     18136  16  32     -- 48:00 E 00:01
For further information about the qdel command, type man qdel on the cluster front-end machine lc0.itc or see the PBS Pro User Guide.
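To cancel several jobs at once, you can take the job IDs from the qstat -a output and pass them to qdel one at a time. This is a sketch. It assumes the column layout shown above, where the job ID is in the first column and the user name in the second and the data rows follow the dashed separator line. The helper name user_jobs is hypothetical.

```shell
# Print the job IDs owned by one user, reading `qstat -a` output on stdin.
# user_jobs is a hypothetical helper. Rows after the dashed separator line
# are job rows: column 1 is the job ID, column 2 the user name.
user_jobs() {
    awk -v user="$1" '
        /^-+ / { in_body = 1; next }      # the "------ ------" separator
        in_body && $2 == user { print $1 }'
}
```

Usage, which cancels every job owned by mst3k, so use with care:
for id in $(qstat -a | user_jobs mst3k); do qdel "$id"; done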
Debugging
Debugging is an inevitable part of working with computational tools. Several debuggers are available for compiled languages. The most basic is the GNU debugger, gdb, which is command-line oriented. The commercial compilers provide their own debuggers as well: idb(c) for Intel and pgdbg for PGI. In their default modes, idb and pgdbg are graphical debuggers and require that the user have an X server running on his or her local machine. See the compiler documentation for more details about these debuggers.
A very powerful graphical debugger is Totalview. ITC provides a large number of licenses for this debugger. It can debug code produced by any of the three compilers (GNU, Intel, and PGI), and it is generally the best choice for debugging MPI programs. TotalView Technologies has recently released a VPN-based client that lets users run the debugger without an X server installed locally. (If you already have an X server, you can run Totalview on the frontend through X for short debugging jobs.) The client can also submit a job to PBS so that compute nodes are used for debugging. This is essential for MPI programs, because running many processes on the frontend puts too great a load on it. Totalview's vendor has extensive documentation on its Web site. The UVACSE local documentation discusses local usage and explains how to use the client in our environment.
Large Scratch/Output Files
Each Linux cluster has a storage node with 1 TB of temporary scratch storage. This storage is accessible from the frontend and the nodes as /bigtmp. When the PBS job has completed, all output files from the master compute node are automatically copied to /bigtmp/pbstmp.$PBS_JOBID on the frontend node. The variable $PBS_JOBID is assigned when the job begins and contains the job identification number, so users should make a note of all their job ID numbers. Files older than 72 hours are removed from /bigtmp, so users should download their output files to their own longer-term storage as soon as possible after the run completes.
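To see which of your files in /bigtmp are approaching the 72-hour limit, you could use a sketch like the one below. It assumes that the purge is based on file modification time, which this page does not state. The helper name expiring_files is hypothetical.

```shell
# List regular files under a directory that are more than 72 hours old
# (72 h = 4320 minutes), i.e. candidates for removal from /bigtmp.
# expiring_files is a hypothetical helper; it assumes the purge uses
# modification time. -mmin is supported by GNU and BSD find (not POSIX).
expiring_files() {
    find "$1" -type f -mmin +4320
}
```

For example, expiring_files /bigtmp/pbstmp.1363.lc0 lists the files from job 1363 that are more than three days old.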
File transfer to and from the cluster frontend should be
done using a secure method such as scp or rsync. The
following are examples of transferring files from /bigtmp on the cluster
front-end node aspen.itc to a remote UNIX host, initiating the transfer either
from aspen.itc or from the remote host. These examples use
the ksh line continuation character \ immediately followed
by a newline.
Transfer from lc0.itc (local source and remote destination):
/uva/bin/scp /bigtmp/pbstmp.jobid.lc0/* \
mst3k@remote_host.virginia.edu:/home/mst3k/pbs_output/.
or
/uva/bin/rsync -e ssh -a /bigtmp/pbstmp.jobid.lc0/. \
mst3k@remote_host.virginia.edu:/home/mst3k/pbs_output/.
Transfer to remote_host (remote source and local destination):
/uva/bin/scp2 mst3k@lc0.itc.virginia.edu:/bigtmp/pbstmp.jobid.lc0/* \
/home/mst3k/pbs_output/.
or
/uva/bin/rsync -e ssh -a \
mst3k@lc0.itc.virginia.edu:/bigtmp/pbstmp.jobid.lc0/. \
/home/mst3k/pbs_output/.
Windows users can use an scp client such as SecureFX.
Sample PBS Command Scripts
In this section are a number of sample PBS command files for different types of jobs.
PBS Script Using Scratch (Bigtmp) Storage
#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:00
#PBS -j oe
#PBS -m ea
#PBS -M mst3k@virginia.edu
# Define variable for local storage on compute nodes associated with the job
LS="/jobtmp/pbstmp.$PBS_JOBID"
# Copy executable (e.g. xhpl) and data files (e.g. HPL.dat) from your
# home directory to local storage on the master compute node
cd $LS
/bin/cp $HOME/xhpl .
/bin/cp $HOME/HPL.dat .
tmpsync -scatter
# Run parallel program over Ethernet using MPICH2
mpiexec -comm mpich2-pmi ./xhpl > xhpl_out
Note: in this script, there should be no spaces around the equals sign in the line LS="/jobtmp/pbstmp.$PBS_JOBID".
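The note about the equals sign reflects a general rule of sh and ksh: a variable assignment must be written as a single word. The sketch below shows the correct form. The value 1363.lc0 is only an example, because inside a real job PBS sets PBS_JOBID for you.

```shell
# Inside a real job PBS sets PBS_JOBID; 1363.lc0 is only an example value.
PBS_JOBID=1363.lc0
# Correct: no spaces around "=", so the shell performs an assignment.
LS="/jobtmp/pbstmp.$PBS_JOBID"
echo "$LS"    # prints /jobtmp/pbstmp.1363.lc0
# Wrong:  LS = "/jobtmp/pbstmp.$PBS_JOBID"
# With spaces, the shell looks for a command named LS and passes it the
# arguments "=" and the path, and the variable LS is not assigned.
```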
PBS Select Syntax, Request No Preemption on Elder
PBS has a newer "select" syntax that allows options to be specified as part of
a resource request.
This is a PBS job command file to run a Matlab batch job. The Matlab
program commands are in the file matlab_script.m (note the .m
extension is not included in the command syntax) and the output
of the program will go to the file matlab_output1 at the end of the
job and to matlab_output2 while the job is running.
This is a PBS job command file to run a Mathematica batch job. The Mathematica
program commands are in the file math_script and the output
of the program will go to the file math_output. These file names
are arbitrary and other names could be used.
It is sometimes useful to make the first line in the math_script
file the command AppendTo[$Echo, "stdout"], so that the Mathematica input lines
are also included in the output file. If you have Mathematica commands stored in a notebook that you
would like to transfer to your math_script file, you can use
Mathematica's front end to save the notebook in Package Format.
A dialog box will appear prompting you to give the file a name and location. You can
use this Package Format file as the input file for your Mathematica batch job.
This is a PBS job command file to run an Ansys batch job. The Ansys
program input is in the file ansys.in and the output
of the program will go to the file ansys.out. Output from PBS is saved
in the file ansys.msg.
This is a PBS job command file to run a Gaussian 03 batch job. The Gaussian 03
program input is in the file gaussian.in and the output
of the program will go to the file gaussian.out. Output from PBS is saved
in the file gaussian.msg.
This is a PBS job command file to run a SAS batch job. The SAS program
commands are in the file myfile.sas and the output of the program
will go to the file myfile.out. The log file will be
myfile.log.
This is a PBS job command file to run a serial job that is compiled with the IMSL libraries.
This is a PBS job command file to run a serial R batch job.
This is a PBS job command file to run a parallel Rmpi batch job.
The R program commands are in the file taskpush.R
and the output of the program will go to the file taskpush.Rout.
Note: in order to use this script you will need to have MPICH2 set up to
run an MPD. In most cases, all that is necessary is to create a
.mpd.conf file in your top-level home directory and set the
correct permissions on it. The MPICH2 distribution recommends making
the file readable and writable only by its owner (mode 600).
During a run using local storage on the compute nodes, working files
are stored on the disks on the node or nodes. At the end of the run, those
files are transferred to the appropriate directory in /bigtmp. If it is
necessary to examine those files before completion of the run, the owner of
the job can use the commands
getfiles and getstdout.
The getfiles command copies all files in /jobtmp/pbstmp.$PBS_JOBID to
/bigtmp/pbstmp.$PBS_JOBID. Similarly, getstdout fetches the job's standard
output file, provided the user has not renamed it, i.e. provided it still follows
the PBS naming convention <jobname>.o<jobid>. It also places the file in
/bigtmp/pbstmp.$PBS_JOBID.
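You can work out the default standard-output file name in advance. This hypothetical helper assumes that <jobid> is the numeric part of the full job ID. That assumption matches the Error_Path job16x2.e1363 in the qstat -f example earlier on this page.

```shell
# Default PBS standard-output name, <jobname>.o<jobid>, where <jobid> is the
# numeric part of the full job ID (1363 for 1363.lc0).
# default_stdout_name is a hypothetical helper, not a PBS command.
default_stdout_name() {
    # ${2%%.*} removes everything from the first "." onward
    printf '%s.o%s\n' "$1" "${2%%.*}"
}
```

For example, default_stdout_name job16x2 1363.lc0 prints job16x2.o1363. After getstdout runs, that file should be in /bigtmp/pbstmp.1363.lc0.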
Disk space on the home directory is limited, and space on /bigtmp
is temporary. Once your jobs have run, you will need to transfer your
files to your local system for permanent storage.
File transfer to and from the cluster should be done using a secure method
such as scp or rsync.
If you are transferring
to and from a UNIX system (this includes Linux), the
following are examples of transferring files from a directory
mydirectory on the cluster
front-end node aspen.itc to a remote host, initiating the transfer either
from aspen.itc or from the remote host. These examples use
the ksh line continuation character \ immediately followed
by a newline.
Note: mst3k@ may be omitted if the user's id is the same on both
systems. The colon after the hostname is essential, however. Also, if you
are using Linux on your local workstation and are running OpenSSH rather than
UVa's commercial SSH you should use sftp to transfer from the workstation
to the clusters (scp will work in the opposite direction); sftp takes exactly
the same form and commands as insecure ftp.
Transfer to remote_host (remote source and local destination):
Mac OS X (which is built on Darwin) includes scp and rsync, so these commands can be run
inside the terminal application exactly as in the UNIX examples above.
From a Windows system, use SecureFX,
a commercial product available to students, faculty, and staff.
The cluster runs ssh2; it does not run an ftp daemon, so sftp
is the correct protocol for file transfers to the cluster frontend.
#!/bin/sh
#PBS -l select=2:mpiprocs=2:mem=30GB:nopreempt=true
#PBS -l walltime=00:02:00
#PBS -m bea
#PBS -M mst3k@virginia.edu
# Run parallel program over Ethernet using MPICH2
mpiexec -comm mpich2-pmi ./myprog > myresults
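If you are converting an older script, it may help to see an old-style nodes request rewritten in the select syntax. This is a sketch under an assumption: that PBS Pro treats nodes=N:ppn=P like select=N:ncpus=P:mpiprocs=P. Check the PBS Pro User Guide for your version before you rely on this. The helper name nodes_to_select is hypothetical, and it handles only the plain N:ppn=P form.

```shell
# Rewrite an old-style "nodes" value of the form N:ppn=P in select syntax.
# ASSUMPTION: nodes=N:ppn=P is equivalent to select=N:ncpus=P:mpiprocs=P.
# nodes_to_select is a hypothetical helper; other forms are rejected.
nodes_to_select() {
    case "$1" in
        *:ppn=*)
            n=${1%%:*}        # text before the first colon
            p=${1##*ppn=}     # text after "ppn="
            echo "select=$n:ncpus=$p:mpiprocs=$p" ;;
        *)
            echo "expected N:ppn=P, got: $1" >&2
            return 1 ;;
    esac
}
```

For example, nodes_to_select 16:ppn=2 prints select=16:ncpus=2:mpiprocs=2, which corresponds to the Resource_List.nodes = 16:ppn=2 request in the qstat -f example.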
Matlab
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:02:00
#PBS -o matlab_output1
#PBS -j oe
#PBS -m ea
#PBS -M mst3k@virginia.edu
cd $PBS_O_WORKDIR
matlab -nodesktop -nodisplay -r "matlab_script;exit" -logfile matlab_output2
Mathematica
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:02:00
#PBS -j oe
#PBS -m ea
#PBS -M mst3k@virginia.edu
cd $PBS_O_WORKDIR
math < math_script > math_output
Making AppendTo[$Echo, "stdout"] the first line of math_script causes the Mathematica
input lines to also be included in the output file.
Ansys
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=160:00:00
#PBS -o ansys.msg
#PBS -j oe
# Copy Ansys input file to compute node scratch space
LS="/jobtmp/pbstmp.$PBS_JOBID"
cd $LS
/bin/cp /home/mst3k/ansys/ansys.in .
ansys < $LS/ansys.in > $LS/ansys.out
Gaussian 03
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=160:00:00
#PBS -o gaussian.msg
#PBS -j oe
# Copy Gaussian input file to compute node scratch space
LS="/jobtmp/pbstmp.$PBS_JOBID"
cd $LS
/bin/cp /home/mst3k/gaussian/gaussian.in .
# Define Gaussian scratch directory as compute node scratch space
export GAUSS_SCRDIR=$LS
g03 < $LS/gaussian.in > $LS/gaussian.out
SAS
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -m bea
#PBS -M mst3k@virginia.edu
cd $PBS_O_WORKDIR
sas myfile.sas
IMSL
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:30:00
source /opt/Modules/default/init/sh
module add imsl
cd $PBS_O_WORKDIR
./myprogram
R
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:30:00
cd $PBS_O_WORKDIR
source /opt/Modules/default/init/sh
module add R
R --slave CMD BATCH myRscript.R
cd $HOME
touch .mpd.conf
chmod 600 .mpd.conf
and then use an editor to insert a line
MPD_SECRETWORD=mst3k-mpi
into the file, where you should choose the "secret word" to be reasonably
secure, but do not use any of your regular passwords.
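If you prefer not to use an editor, the same file can be created in one step. This is a sketch, and the helper name make_mpd_conf is hypothetical. Note that a secret word passed on the command line may be recorded in your shell history.

```shell
# Create $HOME/.mpd.conf containing MPD_SECRETWORD=<word>, readable and
# writable only by its owner. make_mpd_conf is a hypothetical helper.
make_mpd_conf() {
    (
        umask 077                                  # a new file gets mode 600
        printf 'MPD_SECRETWORD=%s\n' "$1" > "$HOME/.mpd.conf"
    )
    chmod 600 "$HOME/.mpd.conf"                    # also fixes an existing file
}
```

Usage: make_mpd_conf 'your-secret-word'. The explicit chmod is needed because redirecting into a file that already exists keeps that file's old permissions.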
Once you have set up MPICH2, you can use the following script with Rmpi.
#!/bin/bash
#PBS -l nodes=5:ppn=1
#PBS -l walltime=00:30:00
cd $PBS_O_WORKDIR
source /opt/Modules/default/init/sh
module add R
NP=`wc -l < $PBS_NODEFILE`
NODEFILE=`mktemp`
sort $PBS_NODEFILE | uniq -c | \
awk '{ printf("%s.local:%s\n", $2, $1); }' > $NODEFILE
NN=`wc -l < $NODEFILE`
mpdboot -f $NODEFILE -n $NN
mpirun R --slave CMD BATCH taskpush.R
mpdallexit
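The Rmpi script above turns $PBS_NODEFILE, which has one line per allocated processor, into a host file with one host.local:count line per node for mpdboot. The function below wraps that same pipeline so you can try it on any file outside a job. The function name mpd_hostfile is hypothetical.

```shell
# Collapse a PBS-style node file (one line per processor) into the
# "host.local:count" lines that the Rmpi script passes to mpdboot.
# Same pipeline as the script; mpd_hostfile is a hypothetical name.
mpd_hostfile() {
    sort "$1" | uniq -c | awk '{ printf("%s.local:%s\n", $2, $1); }'
}
```

For a node file listing compute-0-1 twice and compute-0-2 once, mpd_hostfile prints compute-0-1.local:2 followed by compute-0-2.local:1.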
Retrieving Files During a Run
File Transfer to and from the Cluster
Transfer from aspen.itc (local source and remote destination):
/uva/bin/scp mydirectory/* \
mst3k@remote_host.virginia.edu:/home/mst3k/myoutput/.
/uva/bin/rsync -e ssh -a mydirectory/. \
mst3k@remote_host.virginia.edu:/home/mst3k/myoutput/.
/uva/bin/scp2 mst3k@aspen.itc.virginia.edu:mydirectory/* \
/home/mst3k/myoutput/.
/uva/bin/rsync -e ssh -a \
mst3k@aspen.itc.virginia.edu:mydirectory/. \
/home/mst3k/myoutput/.