Array jobs

From IBERS Bioinformatics and HPC Wiki
Revision as of 12:10, 25 January 2016 by Mjv08 (talk | contribs) (Limiting the number of tasks that run at any one time)

Array jobs allow you to submit thousands of tasks at once as a single job instead of thousands of separate jobs, which reduces the load placed on the server and keeps it responsive.

An example of an array job is:

   
#$ -S /bin/sh
#$ -N Usearch
#$ -j y
#$ -m e
#$ -M USER@aber.ac.uk
#$ -q large.q,intel.q,amd.q
#$ -cwd
#$ -pe multithread 8
#$ -l h_vmem=1G
#$ -l h_rt=5:00:00
#$ -V
#$ -t 1-1000


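# $1 is the database file; $2 is the part of the input file name after "split-Contigs_"
echo "$1"

echo "$2"

base=$(basename "$1")

# Task IDs start at 1 but the input files are numbered from 0
i=$((SGE_TASK_ID - 1))

echo "$i"

usearch -ublast "../../split-Contigs_${2}.fa-${i}.fa" -db "$1" -evalue 0.01 -accel 0.5 -blast6out "${i}.${base}.usearch.txt"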

    

The "-t 1-1000" option turns this from a normal job into an array job with 1000 tasks, and gives each task its own value of the variable $SGE_TASK_ID, from 1 to 1000.

In this example usearch -ublast is run 1000 times, each time on a different input file. The file names are identical except for one number, so $SGE_TASK_ID can be used to select the input file for each task.

However, $SGE_TASK_ID cannot be lower than 1. If your files are numbered from 0, use code like that in the example, which creates the variable $i as $SGE_TASK_ID - 1 so that the numbering starts from 0.
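The mapping from task ID to input file can be checked without submitting anything. The following is a minimal sketch of such a dry run: the function name input_for_task and the value "sampleA" are hypothetical, and the path pattern is copied from the example script.

```shell
#!/bin/sh
# Dry run: print the input file each array task would read.
# input_for_task is a hypothetical helper, not part of Grid Engine.
input_for_task() {
    task_id=$1   # the value Grid Engine would place in $SGE_TASK_ID (1 or higher)
    sample=$2    # stands in for the second script argument ($2) in the example
    i=$((task_id - 1))   # shift to a zero-based file index
    echo "../../split-Contigs_${sample}.fa-${i}.fa"
}

# Show the first, second and last task of a 1-1000 array.
for id in 1 2 1000; do
    echo "task ${id} -> $(input_for_task "$id" sampleA)"
done
```

Running this before submitting makes an off-by-one error in the file numbering easy to spot, since the first and last file names are printed directly.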

An array job still produces a separate output file for each task in the array, so you can monitor each task individually and check, for example, whether one has failed while the others succeeded.
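As a rough sketch of how the per-task output files can be used, the function below lists the task IDs that have no output file yet. It assumes the default Grid Engine naming of <job name>.o<job id>.<task id> and that the files are in the current directory (as with -cwd); the function name missing_task_logs is hypothetical.

```shell
#!/bin/sh
# Print the IDs of array tasks that have no output file in the current directory.
# Assumes the default naming <job name>.o<job id>.<task id>.
# missing_task_logs is a hypothetical helper, not a Grid Engine command.
missing_task_logs() {
    name=$1      # job name, e.g. the value given to -N
    job_id=$2    # job number reported by qsub
    first=$3     # first task ID in the -t range
    last=$4      # last task ID in the -t range
    t=$first
    while [ "$t" -le "$last" ]; do
        [ -e "${name}.o${job_id}.${t}" ] || echo "$t"
        t=$((t + 1))
    done
}
```

For the example job it could be called as missing_task_logs Usearch 12345 1 1000, where 12345 is a made-up job number. A missing file only shows that a task has not written any output yet; to find out whether a task actually failed you still need to read its output file.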

One of the main benefits is that it shortens the output of 'qstat': instead of having to scroll through thousands of jobs, the array's waiting tasks are shown as a single entry and only the tasks currently running are listed separately.


For a more detailed look at what array jobs can do, see: http://wiki.gridengine.info/wiki/index.php/Simple-Job-Array-Howto


Limiting the number of tasks that run at any one time

In the above example, there are 1000 tasks to run. If nothing else is running on the HPC at the time, over 500 may start at once. While this may seem like a good thing, it can cause problems depending on the type of task. For example, if each task is very I/O intensive, the system may not be able to cope with 500 running at the same time. So although a task array is easier to manage as a single job submission, it can cause issues.

One way around this is to amend the task line in the Sun Grid Engine script to say that you want tasks 1-1000 to run, but only 20 at any one time. The syntax for this is as follows:

  #$ -t 1-1000 -tc 20

NOTE: it is the -tc 20 option that limits the array to 20 concurrent tasks.
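Limiting concurrency also lengthens the total run, and a quick calculation gives a feel for by how much. The sketch below is only a rough estimate: it assumes every task takes about the same time and that all 20 slots are always available, and the function name waves_needed is hypothetical.

```shell
#!/bin/sh
# Rough estimate of how many "waves" of tasks a throttled array needs.
# waves_needed is a hypothetical helper, not a Grid Engine command.
waves_needed() {
    tasks=$1   # total number of tasks in the array (the -t range)
    limit=$2   # maximum concurrent tasks (the -tc value)
    # Integer ceiling of tasks / limit
    echo $(( (tasks + limit - 1) / limit ))
}

echo "1000 tasks at -tc 20: $(waves_needed 1000 20) waves"
```

With 1000 tasks and -tc 20 this gives 50 waves, so the whole array takes roughly 50 times as long as a single task. In practice a new task starts as soon as a running one finishes, so the figure is a guide rather than a schedule.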