What next?

Overview

Teaching: min
Exercises: min

Questions

What are the best practices for using an HPC system?

How can I take what I’ve learned so far and put it into practice?

Objectives

Understand HPC best practice

Know what steps are needed to use Bert for research

HPC Best Practice

When you start using the machines in your own research, bear the following points in mind:

Don’t run jobs on login nodes.
Don’t run too many jobs at once.
Don’t use all the disk space on scratch. Don’t leave old files on there.
Request data you want stored long term to be transferred to the archive. You will still have read-only access to it.
Try to use all the cores on a node, especially if you take the node exclusively. Sometimes with large memory jobs this isn’t possible.
Make jobs that last at least a few minutes; lots of small jobs create excess load on the scheduler. Use something like GNU Parallel to make each Slurm job do several things.
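
As an illustration of the last two points, here is a minimal Python sketch of one Slurm job working through many short tasks, with one worker per allocated core. It is not GNU Parallel itself, only the same idea expressed with Python's standard library. The names `cores_available`, `small_task` and `run_batch` are hypothetical, and the sketch assumes the job was submitted with `--cpus-per-task`, so that Slurm sets the `SLURM_CPUS_PER_TASK` environment variable.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def cores_available():
    """Return the cores Slurm allocated to this task, or 1 outside a job."""
    # Assumption: the job requested --cpus-per-task, so Slurm sets this variable.
    return int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))


def small_task(n):
    """Hypothetical stand-in for one short piece of work."""
    return n * n


def run_batch(inputs, workers=None):
    """Run many small tasks inside a single job, one worker process per core."""
    workers = workers or cores_available()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize hands tasks to workers in groups to reduce overhead.
        return list(pool.map(small_task, inputs, chunksize=16))


if __name__ == "__main__":
    results = run_batch(range(1000))
    print(f"completed {len(results)} tasks using {cores_available()} core(s)")
```

Submitted as a single job, a script like this replaces a thousand separate submissions with one, which keeps the load on the scheduler low while still keeping every requested core busy.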

Again, working on a cluster is working in a big sandbox, with people of all ages and skills, so it is important to work carefully and be considerate. These pages from Harvard University discuss common pitfalls and fair use on HPC systems in more detail.

Common Pitfalls
Fair Use/Responsibilities

Supercomputing Wales Research Software Engineers

While this training course is aimed at giving you enough experience and knowledge to get started, it can’t cover all possible use cases. The Research Software Engineers who have written and delivered today’s training also work with individual researchers and research groups to advise and assist on making optimal use of the available facilities. Things that they can provide assistance with include:

Converting existing software to run on the HPC system
Optimising code to run more efficiently on HPC systems
Writing new software
Helping with training, on-boarding and project development

If you feel you’d benefit from more bespoke support from your local RSE team, then speak to one of them before you leave and they will let you know the best way to proceed.

Using Supercomputing Wales instead

Supercomputing Wales is a joint partnership of Aberystwyth, Bangor, Cardiff and Swansea Universities. It offers access to shared HPC facilities based in Cardiff and Swansea. These have more CPU cores and more GPUs than Bert, but less memory per node: the largest node has 384GB, compared with 1TB on Bert. To access Supercomputing Wales you will need to apply via the My Supercomputing Wales page.

Key Points

Remember that HPCs are shared systems and try to avoid allocating resources that you don’t use

Don’t make millions of files

Make use of the Research Software Engineers to help you use the system effectively

Introduction to Slurm on Bert