Intermittent login failures on the HPC and some VMs

From IBERS Bioinformatics and HPC Wiki

This issue affected users of Bert, the Repository and some virtual machines in early 2018.

Update, April 12th 2018: Information Services say the problem is now fixed, so you should no longer need any of the workarounds below. If you deployed the permanent fix for Windows, please remove it by following the instructions in the "Removing the entry" section.

Background

A number of you have reported intermittent problems logging into Bert or having your sessions disconnected. The symptoms are timeouts when trying to log in, or "network error: Software caused connection abort" messages in PuTTY when a session that had connected successfully is then dropped. The cause has been identified as a problem with a network switch in the Visualisation Centre, which serves the HPC, the Repository and some VMs. Information Services are aware of the problem, have been in discussion with the switch manufacturer and hope to have a fix soon.

Workarounds

There are two workarounds for this problem, one temporary and one (semi-)permanent. These fixes will NOT work on wireless/eduroam connections, VPN connections or any other connection from outside the IBERS network.

Temporary fix

  • Log in to central.aber.ac.uk
  • From there, log in to bert.ibers.aber.ac.uk
  • On Bert, ping the IP address of your computer (which will be of the format 144.124.1XX.XXX)
  • Leave that session running and, in a separate window, log in directly to Bert from your computer

This fix will probably stop working at some point; depending on which OS you're using, it might only last a few minutes. Leaving the login via central open with ping running should prevent this. If you're having problems with a VM, log in to the VM instead of Bert.


Permanent(ish) fixes

These are for Bert only. If you have a problem with a VM host, you'll need to substitute that host's name, IP address and MAC address. Email ibers-cs@aber.ac.uk if you need help finding these out.
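If you need to adapt the commands on this page for a VM host, the Python sketch below fills a host name, IP address and MAC address into the same command forms used here. It is a hypothetical helper (build_commands and normalise_mac are illustrative names, not an IS tool) and assumes you already have the correct addresses, for example from ibers-cs@aber.ac.uk; it only builds strings and changes nothing on your system. It writes the MAC with colons for Linux/Mac and with hyphens for Windows, matching the examples on this page, and passes Windows arp the IP address rather than the host name, which is the form Windows' own arp help shows.

```python
import re

# Hypothetical helper: fills a host's name, IP and MAC into the command forms
# used on this page. It only builds strings; it does not run anything.

# Six pairs of hex digits separated by ':' or '-'.
MAC_RE = re.compile(r"^[0-9a-fA-F]{2}([:-][0-9a-fA-F]{2}){5}$")


def normalise_mac(mac: str, sep: str) -> str:
    """Return mac in lower case with its octets joined by sep (':' or '-')."""
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return sep.join(re.split(r"[:-]", mac.lower()))


def build_commands(host: str, ip: str, mac: str,
                   interface: str = "Local Area Connection") -> dict:
    colon_mac = normalise_mac(mac, ":")  # Linux/Mac arp uses colons
    dash_mac = normalise_mac(mac, "-")   # Windows netsh/arp use hyphens
    return {
        "linux_mac": f"sudo arp -s {host} {colon_mac}",
        "windows_permanent": (
            f'netsh -c interface ipv4 add neighbors "{interface}" '
            f'"{ip}" "{dash_mac}" store=persistent'
        ),
        "windows_temporary": f"arp -s {ip} {dash_mac}",
    }


if __name__ == "__main__":
    # Bert's details, taken from the commands on this page.
    cmds = build_commands("bert.ibers.aber.ac.uk", "144.124.106.138",
                          "d4:be:d9:b3:b7:45", interface="Ethernet")
    for name, cmd in cmds.items():
        print(f"{name}: {cmd}")
```

Running it prints the three commands for Bert with "Ethernet" as the Windows interface name; swap in your VM's details and interface name as needed.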

Linux/Mac

  • Open a terminal and run the command:
   sudo arp -s bert.ibers.aber.ac.uk d4:be:d9:b3:b7:45

This only lasts until you reboot.
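On Linux you can check that the static entry is in place by reading the kernel's ARP table in /proc/net/arp (macOS has no /proc, so this does not apply there). The sketch below is a hypothetical helper, has_static_entry, that assumes the standard /proc/net/arp column layout and Bert's IP address 144.124.106.138 as given in the Windows commands on this page; an entry added with arp -s should carry the permanent flag (ATF_PERM, 0x4).

```python
import os

# ATF_PERM from <linux/if_arp.h>: set on permanent (static) ARP entries,
# such as those created with `arp -s`.
ATF_PERM = 0x04


def has_static_entry(arp_table: str, ip: str, mac: str) -> bool:
    """Return True if arp_table holds a permanent entry mapping ip to mac.

    arp_table is the text of /proc/net/arp: a header line, then one row per
    entry with columns IP address, HW type, Flags, HW address, Mask, Device.
    """
    want_mac = mac.lower().replace("-", ":")  # accept Windows-style hyphens too
    for line in arp_table.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed rows
        addr, flags, hwaddr = fields[0], fields[2], fields[3]
        if addr == ip and hwaddr.lower() == want_mac:
            # Flags is printed in hex, e.g. "0x6" for a static entry.
            return bool(int(flags, 16) & ATF_PERM)
    return False


if __name__ == "__main__":
    path = "/proc/net/arp"
    if os.path.exists(path):
        with open(path) as f:
            table = f.read()
        # Bert's IP and MAC as given in the commands on this page.
        print(has_static_entry(table, "144.124.106.138", "d4:be:d9:b3:b7:45"))
    else:
        print("No /proc/net/arp found; this check only works on Linux.")
```

It should print True after running the sudo arp -s command above, and False once the entry has gone.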


Windows

Permanent Fix

  • Open a Command Prompt as Administrator and run:
    netsh -c interface ipv4 add neighbors "Local Area Connection" "144.124.106.138" "d4-be-d9-b3-b7-45" store=persistent
  • On some systems the network interface won't be called "Local Area Connection". To find its name, open the "Network Connections" page from the "Network and Internet" section in Control Panel, or run the "ipconfig" command. On Windows 8/10 it seems to be called "Ethernet" instead.
  • This should fix the problem permanently; the entry survives reboots.

Removing the entry

Once IS have fixed the network (see the update above), remove the entry by running the following in a Command Prompt opened as Administrator:

   netsh -c interface ipv4 delete neighbors "Local Area Connection" "144.124.106.138"

Use the same interface name you used when adding the entry.


Temporary Alternative Method

If the method above fails, you can add a temporary entry instead by running the following command, again from a Command Prompt opened as Administrator:

   arp -s 144.124.106.138 d4-be-d9-b3-b7-45

This will be reset when your system is rebooted.