Complex submissions

From IBERS Bioinformatics and HPC Wiki

In Submitting your job using SGE we looked at the following script;

#specify the shell type
#$ -S /bin/sh

#run in the current working directory
#$ -cwd

#specify which queue you wish to use
#$ -q amd.q

#run a program command to print hostname and uptime
hostname && uptime

As mentioned in Submitting your job using SGE, unless the HPC is quiet you may find it difficult to get this job to run, or you may find that you need more resources. This section discusses how to request resources using limits and gives some example scripts.

Sun Grid Engine and Limits

Sun Grid Engine does not know what you're attempting to do until you tell it. There are three pieces of information that the scheduler needs in order to load balance your job in the queue: how much memory you need, how many CPU cores (also known as slots) you need, and how long the job is expected to run. This information is passed to the scheduler from your grid engine script. If you fail to specify the number of CPU cores, the memory or the time required, the scheduler will use its defaults, which may not be the best for your task.

Requesting CPU Cores

This is probably the most complicated of the requests to make. This section focuses on running multi-threaded applications on a single node. To do this you first need to specify a parallel environment and include how many cores (AKA slots) you require. This is achieved within your script like so;

#$ -pe multithread 8

This will submit your job and require 8 slots on a single node to run. However, this is not the only thing you have to do. At this point, if you submit this job, it will wait until 8 slots on a node become available. The problem arises if there are many single core jobs in the queue that keep jumping ahead. This is because the scheduler is attempting to 'backfill' all available space to ensure that the HPC is at 100% utilisation. When this happens your job will wait until smaller jobs have completed, which may never happen. To circumvent this, you can make a request to reserve the resources you require.

You can make this reservation when you submit your job;

qsub -R y myscript.sge

IMPORTANT: You should only make a reservation like this if you're requesting more than one slot. Across the whole of the HPC, only 32 reservations are accepted at any one time. The reason for this is to allow users to submit many thousands of small, short, single core jobs at a low priority and forget about them, while other users can submit larger jobs and still get their required resources.

NOTE: Making these types of reservations takes time. Your reservation will stop new shorter jobs from starting in order for yours to begin running. This means that the more cores you reserve, the longer you have to wait for it to begin.
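
The rule above (reserve only when asking for more than one slot) can be wrapped in a small helper. This is a minimal sketch, not part of SGE: the function name reservation_flag is hypothetical, and it assumes the script requests its slots as a plain number on a "#$ -pe <environment> <slots>" line, as in the example earlier.

```shell
# Print "-R y" if the job script requests more than one slot, else print nothing.
# Assumption: the slot request is written as  #$ -pe <environment> <slots>
# with <slots> given as a single whole number.
reservation_flag() {
    slots=$(awk '$1 == "#$" && $2 == "-pe" { print $4; exit }' "$1")
    if [ "${slots:-1}" -gt 1 ]; then
        echo "-R y"
    fi
}

# Possible usage (the unquoted expansion is deliberate, so "-R y" splits into two words):
#   qsub $(reservation_flag myscript.sge) myscript.sge
```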

Requesting Memory

Requesting memory is achieved by using the mem_free and h_vmem limits in your script.

Beware that these two have different meanings and are often used incorrectly. mem_free means that the node running your job must have that amount of memory free. h_vmem means that the scheduler should kill the job if its memory usage exceeds this limit. Specifying h_vmem alone won't check that there's enough memory free to run your job; it will just stop the job if it exceeds the limit.

You can specify the memory limits like this:

#$ -l mem_free=40G
#$ -l h_vmem=40G

This will request 40GB of RAM for your job from the available nodes in the queue you've submitted it to, and will stop your job if it ever uses more than 40GB. For information about the specification of the nodes available, please see Bert and Ernie - An Overview.

If you submit a job requesting more memory (mem_free) than is available on a single node in the queue, your job will fail to run. For example, if you submit a job requiring 512GB RAM to the intel queue, it will fail to run, as the maximum amount of memory a node has in the intel queue is 192GB.

If your job uses more memory than you've requested, your job will fail.

Specifying an h_vmem larger than the memory available on a node will appear to work, because the job is not using more than that amount of memory, but it can result in the job being scheduled to a node that does not have that much memory.
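
Since setting h_vmem without mem_free is an easy mistake to make, a quick check before submitting can help. The following is only a sketch under stated assumptions: check_memory_limits is a hypothetical helper name, and it only looks for "#$" directive lines inside the script, not for limits passed on the qsub command line.

```shell
# Warn when a job script sets h_vmem but never asks for mem_free.
# Only "#$" directive lines in the script itself are inspected.
check_memory_limits() {
    if grep -q '^#\$.*h_vmem=' "$1" && ! grep -q '^#\$.*mem_free=' "$1"; then
        echo "warning: h_vmem is set but mem_free is not"
        return 1
    fi
    echo "ok"
}

# Possible usage:
#   check_memory_limits myscript.sge && qsub myscript.sge
```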

Requesting Time

Requesting time is achieved by using the h_rt limit in your script (h_rt=hours:minutes:seconds), like so;

#$ -l h_rt=24:00:00

This will tell the scheduler that the job will take up to 24 hours to run. h_rt is a hard limit, so a job that runs for longer than this will be stopped.
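
Run times are often thought of in days, while h_rt counts hours in its first field. As a small illustration (days_to_h_rt is a hypothetical helper, and it assumes a whole number of days), the conversion can be done like this:

```shell
# Convert a run time in whole days to the h_rt format hours:minutes:seconds.
days_to_h_rt() {
    printf '%d:00:00\n' "$(( $1 * 24 ))"
}

days_to_h_rt 1     # prints 24:00:00
days_to_h_rt 30    # prints 720:00:00
```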

SGE Defaults

In order for scheduling to work properly, the limits that we've already discussed need to have defaults in case the user does not specify them. If you submit a job using the script from Submitting your job using SGE, then it will assume the following;

h_rt=999999:00:00 /*AKA - about 114 Years*/

It will also assume you've only requested a single slot (CPU core).

As you can see, if you do not specify the resources you require, your job may either fail to schedule, fail once it has started to run or overload a node.

An example script

Here is a more realistic script that you may wish to run.

#specify the shell type
#$ -S /bin/sh

#run in the current working directory
#$ -cwd

#specify which queue you wish to use
#$ -q amd.q

#specify the parallel environment and the number of slots you require
#$ -pe multithread 64

#specify how long the job will take
#$ -l h_rt=720:00:00

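#specify how much memory must be free on the node and the limit at which the job is stopped
#$ -l mem_free=100G
#$ -l h_vmem=100G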

#Load the blast module and then run the blast
module load BLAST/blast-2.2.26
blastall -p blastn -d /ibers/repository/public/blast_db/blast_june/nt -i myfile.fasta -o myfile.blast -a $NSLOTS -m 7

This script uses the amd queue, which we know has four nodes with 64 CPU cores and 256GB RAM and three nodes with 32 CPU cores and 98GB RAM. Because a large sequence is being searched, the run time is set to 720 hours, or 30 days. I have requested 100GB of memory and 64 cores for this job. This means that of the 7 AMD nodes available, only the four larger nodes are actually able to run it.
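
The reasoning above can be checked with a few lines of shell. This is a sketch only: the node table is copied by hand from the figures quoted in the text rather than queried from the scheduler, eligible_nodes is a hypothetical helper, and it compares against each node's total memory, not the memory currently free.

```shell
# Node types in amd.q as described in the text, one per line: "count cores ram_gb".
amd_nodes='4 64 256
3 32 98'

# Count the nodes that have at least the requested cores and memory (in GB).
eligible_nodes() {
    echo "$amd_nodes" | awk -v cores="$1" -v mem="$2" \
        '$2 >= cores && $3 >= mem { n += $1 } END { print n + 0 }'
}

eligible_nodes 64 100   # prints 4
eligible_nodes 1 50     # prints 7
```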

To run this script, unless the amd.q queue is empty, you must run the following command;

qsub -R y myscript.sge

rather than just;

qsub myscript.sge

NOTE: You're only using the -R y flag because you need to reserve multiple CPU cores. If you're only running a job which uses a single slot, you don't need to do this.

As a side note to this, notice that I've selected the blast_june database rather than the latest. This is because it is July when I am writing this, and I am unsure whether the job will be completed before the end of the month, when the blast database is updated. See Blast for an explanation.

Finally, you may notice that the blast command uses the -a flag to specify the number of CPU cores required (the -m flag selects the output format). The $NSLOTS variable is set by the scheduler to the number of slots you requested in your parallel environment. Using this notation you only need to specify the number of CPU cores needed in one place.
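
One practical wrinkle is that $NSLOTS is only set inside a grid engine job, so a script tested by hand outside the scheduler will see it empty. A minimal sketch of a fallback follows, assuming that a single thread is a sensible default; thread_count is a hypothetical helper name.

```shell
# Print the number of threads to use: the scheduler's slot count when NSLOTS
# is set, otherwise 1 (for example when testing outside a grid engine job).
thread_count() {
    echo "${NSLOTS:-1}"
}

threads=$(thread_count)
echo "running with $threads thread(s)"
```

The value of threads can then be passed to the program in place of $NSLOTS.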