Monitoring your jobs

From IBERS Bioinformatics and HPC Wiki

Revision as of 11:35, 21 July 2014

There are various ways for you to monitor and check up on your running and completed jobs.

See the status of the nodes

The easiest way to see what is happening on the cluster is to check Ganglia first. This is a web-based monitoring application that displays statistics about the cluster and its nodes. To view it, visit:

http://bert.ibers.aber.ac.uk/ganglia

There are a variety of statistics to view. The most useful is probably load_one, which shows the one-minute CPU load average on each node. You can also monitor the cluster-wide averages, along with memory and network usage.

Check on what is running

Once you have submitted your job scripts, you may want to check on their progress. Use the qstat command, which lists your jobs along with their state (for example, r for running and qw for queued and waiting). The output might look something like:

  
[user@bert ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 758061 0.50042 k2bRC-a1.i user         r     07/20/2014 14:13:33 amd.q@node010.cm.cluster           1        
 758062 0.50042 k2bRC-a2.i user         r     07/20/2014 14:13:33 amd.q@node009.cm.cluster           1        
 758063 0.50042 k2bRC-a3.i user         r     07/20/2014 14:13:48 amd.q@node009.cm.cluster           1        
 758064 0.50042 k2bRC-a4.i user         r     07/20/2014 14:13:48 amd.q@node008.cm.cluster           1        
 758065 0.50042 k2bRC-a5.i user         qw    07/20/2014 14:14:03                                    1        
 758066 0.60208 k2bRC-a6.i user         qw    07/20/2014 14:14:18                                    1
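
When you have many jobs, it can help to condense the listing into a count per state. The following is a minimal sketch, assuming the default qstat layout shown above (two header lines, with the state in the fifth column). The function name count_job_states is hypothetical and is not part of Grid Engine.

```shell
# Summarise qstat output by job state (e.g. r = running, qw = queued and waiting).
# Reads qstat's default listing on stdin. count_job_states is a hypothetical
# helper name, not a Grid Engine command.
count_job_states() {
    # Skip the two header lines, then tally the fifth column (the job state).
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage on the cluster:
#   qstat | count_job_states
```

For the example listing above, this would report 2 jobs in state qw and 4 in state r. If your site's qstat prints different columns, adjust the field number accordingly.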