Array jobs


Array jobs allow you to submit thousands of tasks at once without submitting thousands of separate jobs, which reduces the load placed on the scheduler and keeps it responsive.

An example of an array job script is:

   
# Run under /bin/sh and name the job Usearch
#$ -S /bin/sh
#$ -N Usearch
# Merge stderr into stdout and email USER@aber.ac.uk when the job ends
#$ -j y
#$ -m e
#$ -M USER@aber.ac.uk
#$ -q large.q,intel.q,amd.q
#$ -cwd
# Request 8 slots in the multithread parallel environment, 1G of memory and 5 hours of run time
#$ -pe multithread 8
#$ -l h_vmem=1G
#$ -l h_rt=5:00:00
#$ -V
# -t turns this into an array job with tasks numbered 1 to 1000
#$ -t 1-1000

# $1 is the database file and $2 is the prefix of the split query files,
# both supplied on the qsub command line
echo $1
echo $2

base=$(basename $1)

# Task IDs start at 1, but the split query files are numbered from 0
i=$(expr $SGE_TASK_ID - 1)
echo $i

usearch -ublast ../../split-Contigs_${2}.fa-${i}.fa -db $1 -evalue 0.01 -accel 0.5 -blast6out ${i}.${base}.usearch.txt
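A script like this is submitted with qsub, and anything placed after the script name is passed into the script as a positional argument ($1, $2 and so on). The script name, database file and prefix below are only placeholders for illustration:

   qsub usearch_array.sh reference_db.fasta Contigs

Here $1 becomes reference_db.fasta (the database searched against) and $2 becomes Contigs (the prefix shared by the split query files).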

The "-t 1-1000" command converts this job from a normal job, into an array and produced the variable $SGE_TASK_ID.

In this example usearch -ublast is run 1,000 times, once per task, on input files whose names are identical apart from a number. Using $SGE_TASK_ID, each task picks out its own input file, so between them the tasks work through all of the input files.

However, $SGE_TASK_ID can only start from 1, so if your files are numbered from 0 you need a small offset, as in the example above: the script creates a variable $i equal to $SGE_TASK_ID - 1 so that the numbering starts from 0, as sketched below.
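A minimal, self-contained sketch of that offset, with a hard-coded task ID and a made-up file name purely for illustration:

   # SGE_TASK_ID is normally set by Grid Engine; hard-coded here so the sketch runs on its own
   SGE_TASK_ID=1
   # Task IDs run from 1 to 1000, while the split files are numbered 0 to 999
   i=$(expr $SGE_TASK_ID - 1)
   echo "task $SGE_TASK_ID would process split-Contigs_example.fa-${i}.fa"

Task 1 therefore processes file number 0, task 2 processes file number 1, and so on up to task 1000 and file number 999.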

An array job still produces an output file for each task in the array, so you can monitor each task separately and see, for example, if one task fails while the others succeed.
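With the directives above (-N Usearch and -j y), each task typically writes its output to a file named after the job name, the job ID and its task ID; the job ID below is made up:

   Usearch.o123456.1
   Usearch.o123456.2
   ...
   Usearch.o123456.1000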

One of the main benefits is that it reduces the number of entries listed by 'qstat': instead of scrolling through thousands of jobs, you see a single line covering the waiting tasks of the array, plus a line for each task that is currently running.