Complex submissions

From IBERS Bioinformatics and HPC Wiki

In Submitting your job using Slurm we looked at the following script:

   
#!/bin/bash --login

# Specify the queue (also known as a partition)
#SBATCH --partition=amd

# run a single task, using a single CPU core
#SBATCH --ntasks=1

#write the job's standard output to a file named using the job ID
#SBATCH --output=myScript.o%J

#run a program command to print hostname and uptime
/bin/hostname && /bin/uptime
    

As mentioned in Submitting your job using Slurm, unless the HPC is quiet you may find it difficult to get this job to run, or you may find that you need more resources. This section discusses how to request resources using limits and gives some example scripts.

Slurm and Limits

Slurm does not know what you're attempting to do until you tell it. There are three pieces of information the scheduler needs in order to load balance your job in the queue: how much memory you need, how many CPU cores (also known as slots) you need, and how long the job is expected to run for. This information is passed to the scheduler from your job script. If you fail to specify the number of CPU cores, the memory or the time required, the scheduler defaults are used, and these may not be the best for your task.
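
As a rough sketch (the values here are purely illustrative), a job script that states all three requirements up front might contain directives like these; each one is covered in more detail below:

#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00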


Requesting CPU Cores

In the example above we only requested one task, which Slurm maps to one CPU core. We can request more cores by increasing this number; however, there is no guarantee that they will all be on the same node (and for many tasks we don't care whether they are). Simply increasing the number of tasks Slurm allocates doesn't make multiple copies of our program run. To run multiple copies we have to launch the program with the srun command and tell it how many copies we'd like.

   
#!/bin/bash --login

# Specify the queue (also known as a partition)
#SBATCH --partition=amd

# run 8 tasks, each using a single CPU core
#SBATCH --ntasks=8

#SBATCH --output=myScript.o%J

#run 8 copies of the program to print hostname and uptime
srun --ntasks=8 bash -c "/bin/hostname && /bin/uptime"
    


This requests 8 cores, which may be spread across any number of nodes in the same partition. If you submit this job it will wait in the queue until 8 cores become available.
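
The same request can also be made on the sbatch command line when you submit, rather than inside the script (the filename myscript.slurm is just an example); options given on the command line take precedence over the matching #SBATCH directives:

sbatch --ntasks=8 myscript.slurm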


Requesting all jobs run on a single node

Adding the --nodes=1 parameter to your script will force all tasks to run on the same node. Note that the number of tasks requested must be less than or equal to the number of CPU cores available on a node.
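
For example, to keep the eight tasks from the earlier script together on one node you could combine the two directives (illustrative values only):

#SBATCH --nodes=1
#SBATCH --ntasks=8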


Requesting Memory

Requesting memory is achieved by using the --mem option in your script.

You can specify the memory limits like this:

   
#SBATCH --mem=40G
    

This will request that 40GB of RAM is allocated on each node the job runs on. If the job exceeds this memory usage it will be killed by Slurm. For information about the specification of the nodes available, please see Bert and Ernie - An Overview.

If you submit a job requesting more memory than is available on a single node in the queue then your job will fail to run. For example, if you submit a job requiring 512GB RAM to the intel queue it will fail to run, as the maximum amount of memory a node has in the intel queue is 192GB.
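
For illustration, assuming the intel queue is selected with --partition=intel (the exact partition name may differ), a request that fits within a 192GB node could look like the following, whereas --mem=512G could never be satisfied in that partition:

#SBATCH --partition=intel
#SBATCH --mem=180G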

Requesting Time

You can limit how long a job is allowed to take. When this time limit is exceeded Slurm will stop the job.

Requesting a time limit is achieved by using the --time option in your script. The accepted formats are minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes or days-hours:minutes:seconds.

For example, to request a 24 hour time limit you could use either:

   
#SBATCH --time=24:00:00
    

or

   
#SBATCH --time=1-00
    
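
The other formats work the same way; for example, a limit of two and a half days could be written using the days-hours:minutes form (the value here is purely illustrative):

#SBATCH --time=2-12:00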


An example script

Here is a more realistic script that you may wish to run.

 
#!/bin/bash --login

#specify which queue you wish to use
#SBATCH --partition=amd

#the job name as it appears in the queue
#SBATCH --job-name=blast

#output and error files
#SBATCH --output=blast.o%J
#SBATCH --error=blast.e%J


#specify the number of tasks you require
#SBATCH --ntasks=32

#request all tasks run on the same node
#SBATCH --nodes=1

#specify the maximum time the job is allowed to run
#SBATCH --time=720:00:00

#specify the amount of memory required
#SBATCH --mem=100G

#Load the blast module and then run the blast
module load BLAST/blast-2.2.26
blastall -p blastn -d /ibers/repository/public/blast_db/blast_june/nt -i myfile.fasta -o myfile.blast -a $SLURM_NTASKS -m 7
  

This script uses the amd queue, which we know has four nodes with 64 CPU cores and 256GB RAM, and three nodes with 32 CPU cores and 98GB RAM. Because a large sequence is being run, the time limit is set to 720 hours, or 30 days. The job requests 100GB of RAM and 32 cores, which means that of the 7 AMD nodes available only the four larger ones are actually able to run it.

To run this script:

 
sbatch myscript.slurm
  

Finally, you may notice that the blast command uses the -a flag to set the number of CPU cores it should use. The $SLURM_NTASKS variable is set by Slurm to the number of tasks you requested, so using this notation you only need to specify the number of CPU cores in one place.
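
To see this for yourself, a line such as the following could be added to the job script (purely as an illustration); with --ntasks=32 it would print 32 to the output file:

echo $SLURM_NTASKS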