New computing cluster in Koeniginstrasse

Introduction

Since January 2022 we have a new computing cluster, which is installed in the server room of the physics department at Koeniginstrasse.

Hardware

  • there are in total 9 compute nodes available;
  • eight nodes are named "usm-cl-bt01n[1-4]" and "usm-cl-bt02n[1-4]";
  • there is one new node (Nov. 2022) named "usm-cl-1024us01";
  • each node usm-cl-bt[01, 02]n[1-4] has 128 logical cores (64 physical cores) and 512GB RAM available;
  • the node "usm-cl-1024us01" has 256 logical cores (128 physical cores) and 1024GB RAM available;
  • the storage for our group provides 686TB (/project/ls-mohr);

Login

  • public login server (for non-graphical access, e.g. via ssh): login.physik.uni-muenchen.de;
  • Jupyterhub: https://jupyter.physik.uni-muenchen.de;
  • both the server and the Jupyterhub require two-factor authentication;
  • for the second factor you need to register a smartphone app such as Google Authenticator (or any other app that generates time-based one-time passwords) at https://otp.physik.uni-muenchen.de, where you need to create a so-called "soft token";
  • for all logins you need to provide (see the example below):
    • the user name of your physics account;
    • the password of your physics account;
    • the six-digit number (soft token) shown by the smartphone app;
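
A non-graphical login could then look like this (the user name "jane.doe" is just a placeholder for your own physics account name):

  $ ssh jane.doe@login.physik.uni-muenchen.de

After the connection is established you are asked for the password and the six-digit soft token.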

Graphic Remote Login

A graphical remote login from outside the LMU network requires a VPN connection. Since June 2022 the only VPN connection is provided by eduVPN. After establishing a VPN connection, the login is then done with X2GO as explained here. I was pointed to the following login servers:
  • cip-sv-login01.cip.physik.uni-muenchen.de
  • cip-sv-login02.cip.physik.uni-muenchen.de

but I assume that the other login servers recommended on the web page of the physics department (e.g. in Garching) work as well. X2GO opens a KDE desktop, and from there you can of course connect to our cluster.

Processing

  • as on our local cluster, "slurm" is used as the job scheduling system; access to the computing nodes and running jobs requires starting a corresponding slurm job;
  • the partition of our cluster is "usm-cl";
  • from the login node you can start an interactive job via "intjob --partition=usm-cl" (additional slurm arguments are accepted as well);
  • I created a python script ("scontrol.py", attached below) which provides information on our partition (which jobs are running on which node, who owns the job and so on);
  • I have also put together a rather simple slurm script ("test.slurm", attached below) which can be used as a starting point; see also the sketch after this list;
  • note that it is possible to directly "ssh" to all nodes on which one of your batch jobs is running. This can help to supervise the processing;
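
As a starting point, a minimal batch script could look like the following sketch (the job name, the resource requests and the final python call are placeholders; the attached "test.slurm" is the actual example script):

  #!/bin/bash
  #SBATCH --partition=usm-cl          # our partition
  #SBATCH --job-name=test             # placeholder job name
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4           # adjust to your needs
  #SBATCH --time=01:00:00             # adjust to your needs

  # replace this line with the program you actually want to run
  srun python my_script.py

Such a script is submitted from the login node with "sbatch test.slurm"; "squeue --partition=usm-cl" then shows the jobs currently running or pending on our partition.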

The command "sacct" can be used to monitor the jobs running on the cluster and their status. As usual with slurm commands there are plenty of options to refine the search and to format the desired output. The command:

sacct --format Start,End,User,JobID,state,partition  -u <user>  --starttime 2023-07-01

lists all jobs of the user <user> since 1st July 2023. The information provided includes the start and end times, the final job state and so on.
For more details please use "man sacct".

Disk space

  • users can create their own disk space under "/project/ls-mohr/users/", such as "/project/ls-mohr/users/martin.kuemmel" (see the example below);
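
For example, a personal directory can be created via (using the example path from above):

  $ mkdir -p /project/ls-mohr/users/martin.kuemmel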

Installed software

We use a package manager called spack to download and install software that is not directly available from the linux distribution. To see what is already installed, do the following on a computing node:

  • "module load spack"
  • "module avail"
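
For example, a package that shows up in "module avail" can then be made available in the current session ("<package>" is a placeholder for one of the listed entries):

  $ module load <package>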

Adding more software is not a problem.

Euclid processing on the cluster

While the OS, libraries and setup are different from EDEN-?.?, it is possible to load and run an EDEN-3.0/3.1 environment using a container solution. The cluster offers singularity as a container solution. While singularity is not officially supported in Euclid, it is being used there in a limited role, and singularity is able to run docker images, which is the supported container format in Euclid. To work in an EDEN-3.0/3.1 environment on the new cluster you need to do the following:
  • load singularity via:
      $ module load spack
      $ module load singularity
    Note that the singularity version which is directly available on the computing nodes at "/usr/bin/singularity" does not work. The correct version loaded via the modules is at "/software/opt/focal/x86_64/singularity/v3.8.1/bin/singularity".
  • it is recommended to move the singularity cache to somewhere under "/scratch-local", e.g. via:
    $ mkdir -p /scratch-local/$USER/singularity
    $ export SINGULARITY_CACHEDIR=/scratch-local/$USER/singularity
    With the default cache location "$HOME/.cache/singularity" there are problems deleting the entire cache when leaving singularity.

There are docker images available on cvmfs, and one image can be run interactively via:

singularity run --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr /cvmfs/euclid-dev.in2p3.fr/WORKNODE/CentOS7.sif

It is also possible to directly issue a command in EDEN-3.0:
$ singularity run --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr /cvmfs/euclid-dev.in2p3.fr/WORKNODE/CentOS7.sif <command_name>

In both cases the relevant EDEN environment (EDEN-3.0 or EDEN-3.1, respectively) must first be activated with one of:
$ source /cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate
$ source /cvmfs/euclid-dev.in2p3.fr/EDEN-3.1/bin/activate

Information on the usage of singularity in Euclid is available at the Euclid Redmine.
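
Putting the pieces together, a batch job running a command inside the container could look like the following sketch (the slurm parameters and the final "python --version" call are placeholders, and it is assumed that the EDEN activation is done right before the singularity call, as described above):

  #!/bin/bash
  #SBATCH --partition=usm-cl
  #SBATCH --time=01:00:00             # adjust to your needs

  # load the working singularity version (not /usr/bin/singularity)
  module load spack
  module load singularity

  # keep the singularity cache on local scratch
  mkdir -p /scratch-local/$USER/singularity
  export SINGULARITY_CACHEDIR=/scratch-local/$USER/singularity

  # activate EDEN-3.0 and run a command inside the CentOS7 container
  source /cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate
  singularity run \
    --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr \
    --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr \
    /cvmfs/euclid-dev.in2p3.fr/WORKNODE/CentOS7.sif python --version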

Problems with the cvmfs

All Euclid related software is centrally installed and deployed via cvmfs. This means that the two directories:

martin.kuemmel@usm-cl-bt02n4:~$ ls -ltr /cvmfs/
drwxr-xr-x 2 cvmfs cvmfs 0 Feb 13 16:09 euclid-dev.in2p3.fr
drwxr-xr-x 2 cvmfs cvmfs 0 Feb 13 17:22 euclid.in2p3.fr

must exist on the host machine such that they can be mounted in singularity as indicated above. It looks like cvmfs sometimes "gets stuck" and needs to be re-installed or re-mounted. If there are problems mounting cvmfs in singularity and the above directories do not exist on the host, please write a ticket to the sysadmins (see "Support" below) and they will fix it.
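
A quick check on a node is to simply list the two repositories; if this fails or the directories are missing, cvmfs is most likely in the stuck state described above:

  $ ls /cvmfs/euclid.in2p3.fr /cvmfs/euclid-dev.in2p3.fr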

Old

In 2022 the docker image could/had to be downloaded directly:
  • pull the Euclid docker image via:
    singularity pull --docker-login docker://gitlab.euclid-sgs.uk:4567/st-tools/ct_xodeen_builder/dockeen
    With the gitlab credentials the docker image is stored in the file "dockeen_latest.sif"

Now (July 2023) I am not sure whether this is still possible.

Support

Support is provided by the IT support (Rechnerbetriebsgruppe) of the LMU faculty of physics via the helpdesk email: . Please keep Joe Mohr and me (Martin Kuemmel: ) in the loop such that we can maintain an overview of the cluster performance.

test.slurm (434 Bytes) Martin Kuemmel, 03/17/2022 02:09 PM

scontrol.py (9.97 KB) Martin Kuemmel, 05/11/2022 09:52 AM
