# Introduction to high-performance computing (HPC)

Computational problems often run up against resource and time limitations: given enough time and computing power, we could find the answer. A supercomputer, or a cluster with a high level of performance, can help us tackle such problems. The following are some great workshops about HPC:

Using HPC systems typically involves working in a shell through a command-line interface, which is a prerequisite for this topic (see here).

This tutorial covers basic scheduling commands, submitting jobs, transferring files from a local computer, and installing software on clusters.

Related document:

## Scheduling jobs

On an HPC system, we need a scheduler to manage how jobs run on the cluster. One of the most common schedulers is SLURM. The following are some practical SLURM commands (quick start user guide):

```
sinfo -s                             # show summary info about all partitions
sjstat -c                            # show computing resources info
srun                                 # run parallel jobs
sbatch                               # submit a job to the scheduler
JOB_ID=$(sbatch --parsable file.sh)  # keep the job ID right after submitting
sbatch --dependency=afterok:JOB_ID file.sh  # submit a job that starts after other jobs finish successfully
sbatch --dependency=singleton        # submit a job that starts after earlier jobs with the same name have ended
sacct                                # display accounting data for all jobs and job steps in the SLURM accounting log
squeue -u <userid>                   # check a user's job status
squeue -u <userid> --start           # show the estimated start time of pending jobs
scancel JOBID                        # cancel the job with JOBID
scancel -u <userid>                  # cancel all of the user's jobs
```

To see more details about these commands, use `<command> --help`.

Let's connect to the cluster through `ssh user@server` and do some practice. For example, use `nano example-job.sh` to make a job file including:

```
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --mem 16G
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 4
#SBATCH --partition hpc0
#SBATCH --account general
#SBATCH --time 02-05:00
#SBATCH --job-name NewJobName
#SBATCH --mail-user your@email.com
#SBATCH --mail-type END

echo 'This script is running on:'
hostname
sleep 120
```

The special characters `#!` (shebang) at the beginning of a script specify which program should run it (e.g. `/bin/bash` or `/usr/bin/python3`). SLURM uses the special comment `#SBATCH` to denote scheduler-specific options; to see more options, use `sbatch --help`. For example, the file above requests 1 node, 16 gigabytes of memory, and 1 task with 4 CPUs per task on the partition `hpc0` under the `general` account, with a walltime of 2 days and 5 hours; it also gives the job a new name and emails you when the job ends.

Now we can submit the job file with `sbatch example-job.sh`. We can use `squeue -u USER` or `sacct` to check the job's status, and `scancel JOBID` to cancel it. You may find more `sbatch` options here.

To run a single command, we can use `srun`. For instance, `srun -c 2 echo "This job will use 2 CPUs."` submits a job and allocates 2 CPUs. We can also use `srun` to open a program in interactive mode.
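Because `#SBATCH` lines are ordinary comments to Bash, a job script can also be run directly, outside the scheduler, as a quick local sanity check. A minimal sketch (the file name `example-check.sh` is just for illustration):

```shell
# Write a tiny job script; to bash itself, the '#SBATCH' lines are plain comments.
cat > example-check.sh <<'EOF'
#!/bin/bash
#SBATCH --mem 16G
#SBATCH --job-name NewJobName
echo 'This script is running on:'
hostname
EOF

# Running it directly skips the scheduler: the #SBATCH options are ignored
# and the commands execute on the current machine.
bash example-check.sh
```

This only checks that the script's commands run; the resource requests in the `#SBATCH` lines take effect only when the file is submitted with `sbatch`.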
For example, `srun --pty bash` will open a Bash shell on a compute node (not a specific one). Note: in general, when we connect to a cluster we land on a so-called login node, which is not meant for heavy computational tasks. So, to do our computations properly, we should always use either `sbatch` or `srun`.

Usually there are many modules available on a cluster. To find and load these modules, use:

```
module avail          # show all available modules (programs) on the cluster
module load <name>    # load a module, e.g. module load R or module load python
module list           # show the list of loaded modules
module unload <name>  # unload a module
module purge          # unload all modules
```

To create a simple template sbatch job file, use the following steps:

1. generate the files containing all the code we want to run on the cluster (these could be several Python, R, or other scripts)
2. generate a Bash file that loads all the modules required by the job (`environment.sh`)
3. generate a Bash file that calls steps 1 and 2 and includes all the `#SBATCH` options (`job_file.sh`)
4. use `sbatch` to run the file from step 3

For example, let's run the following Python code, called `test.py`:

```
#!/usr/bin/python3
print("Hello world")
```

Then use `nano environment.sh` to create the environment file including:

```
#!/bin/bash
module load miniconda3
```

Then use `nano job-test.sh` to make the job file:

```
#!/bin/bash
#SBATCH --mem 1G
#SBATCH --job-name Test1

echo ===$(date)
echo $SLURM_JOB_ID
source ./environment.sh
module list
srun python3 ./test.py
echo ===$(date) $(hostname)
```

Now we can use `sbatch job-test.sh` to run this job.

If there are dependencies between jobs, SLURM can defer the start of a job until the specified dependencies have been satisfied. For instance, let's create another job called `job-test-2.sh`:

```
#!/bin/bash
#SBATCH --mem 1G
#SBATCH --job-name Test2

echo ===$(date)
echo $SLURM_JOB_ID
echo === This is a new job
echo ===$(date) $(hostname)
```

We need another job, called `job-test-3.sh`, to run both `job-test.sh` and `job-test-2.sh`:

```
#!/bin/bash
#SBATCH --mem 1G
#SBATCH --job-name Dependency

echo ===$(date)
JID=$(sbatch --parsable job-test.sh)
echo $JID
```
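The `JID=$(sbatch --parsable ...)` line relies on ordinary shell command substitution: `--parsable` makes `sbatch` print only the job ID, which the shell captures into a variable. A minimal sketch of that pattern, using a hypothetical `fake_submit` function to stand in for `sbatch --parsable` so it can run without a cluster:

```shell
# 'fake_submit' is a stand-in for 'sbatch --parsable job-test.sh',
# which prints only the numeric ID of the submitted job.
fake_submit() { echo 12345; }

JID=$(fake_submit)          # capture the printed job ID in a variable
echo "submitted job $JID"   # prints: submitted job 12345

# On a real cluster, the captured ID can then gate a dependent job, e.g.:
# sbatch --dependency=afterok:$JID job-test-2.sh
```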