Monitoring your jobs

From IBERS Bioinformatics and HPC Wiki

Latest revision as of 16:28, 27 October 2022

There are various ways for you to monitor and check up on your running and completed jobs.


Check on what you've submitted

Once you have submitted your job scripts, you may want to check on the progress of what is running. This is done with the squeue command, which lists queued and running jobs along with their state. Its output might look something like this:

[user@login01(aber) ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            200133       amd myScript      cos  R       0:03      1 node008
            200134       amd myScript      cos  R       0:01      1 node008
            200135       amd myScript      cos  R       0:01      1 node008
            200136       amd myScript      cos  R       0:01      1 node008
            200137       amd myScript      cos  R       0:01      1 node008
            200138       amd myScript      cos  R       0:02      1 node008
            200139       amd myScript      cos  R       0:02      1 node008
            200140       amd myScript      cos  R       0:02      1 node008
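On most Slurm clusters squeue lists every user's jobs, not just yours. A minimal sketch of a wrapper that restricts the listing to your own jobs, assuming a bash shell where $USER holds your cluster username; the function name my_jobs is hypothetical, while -u, -t and -p are standard squeue options:

```shell
# my_jobs: hypothetical helper, not a Slurm command.
# Lists only your own jobs; any extra arguments are passed through to squeue, e.g.
#   my_jobs -t PENDING   # only jobs still waiting to start
#   my_jobs -p amd       # only jobs in the amd partition
my_jobs() {
    squeue -u "$USER" "$@"
}
```

Adding the function to your ~/.bashrc would make it available in every login session.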


Check the status of a job

You can use the squeue -j JOB_ID command to get information about a specific running or queued job. Below is what the output might look like for a running job:

[user@login01(aber) ~]$ squeue -j 200133
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            200133       amd myScript      cos  R       0:03      1 node008
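If you need to wait for a particular job to leave the queue before doing something else (for example, collecting its output), squeue -j can be polled in a loop. A minimal sketch, assuming bash; wait_for_job and its 60-second default interval are hypothetical choices, not part of Slurm:

```shell
# wait_for_job JOB_ID [INTERVAL]
# Hypothetical helper: polls squeue until the job no longer appears in the queue.
wait_for_job() {
    local job_id="$1"
    local interval="${2:-60}"   # seconds between checks (assumed default)
    # -h suppresses the header line. Once the job has left the queue, squeue
    # prints nothing (or an "Invalid job id" error, which is discarded).
    while squeue -h -j "$job_id" 2>/dev/null | grep -q .; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}
```

For example, wait_for_job 200133 30 would check every 30 seconds. A job leaving the queue only means it has stopped running, not that it succeeded, so check its output files as well.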


Job states

The ST column of the squeue output shows each job's state. The most common codes are:

R = Job is running
PD = Job is waiting to run (pending)
CG = Job is completing
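To see at a glance how many of your jobs are in each state, squeue can be asked to print only the compact state code (the -o %t format) without the header line (-h), and the codes can then be counted. A minimal sketch, assuming bash and the standard sort, uniq and awk tools; state_summary is a hypothetical helper name:

```shell
# state_summary: hypothetical helper, not a Slurm command.
# Prints one line per job state with the number of your jobs in it, e.g. "R 8".
state_summary() {
    # -h drops the header line; -o %t prints only the compact state (R, PD, CG, ...)
    squeue -h -u "$USER" -o %t |
        sort |
        uniq -c |
        awk '{ print $2, $1 }'   # reorder "count state" as "state count"
}
```

With the eight running jobs from the first example above, this would print a single line, R 8.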