Martin Kuemmel, 05/12/2022 01:16 PM


New computing cluster in Koenigstrasse

Introduction

Since January 2022 we have a new computing cluster, which is installed in the server room of the physics department at Koenigstrasse. A 10 TB disk is temporarily attached to the cluster for processing. We are currently (17th March 2022) waiting for a large amount of storage (40 TB), which will then replace this temporary solution.

Hardware

  • there are in total 8 compute nodes available;
  • the compute nodes are named "usm-cl-bt01n[1-4]" and "usm-cl-bt02n[1-4]";
  • each node has 128 cores;
  • each node has 500 GB available;

Login

  • public login server: login.physik.uni-muenchen.de;
  • Jupyterhub: https://workshop.physik.uni-muenchen.de;
  • both the login server and the Jupyterhub require two-factor authentication, with your physics account password as the first factor. For the second factor you can use a smartphone app like Google Authenticator (or any other app that generates time-based one-time passwords). The app needs to be registered at https://otp.physik.uni-muenchen.de, where it is called a soft-token.

Graphical access to the cluster or its login nodes is possible, and I am currently trying to figure out the most efficient way to do this.

Processing

  • as on our local cluster, "slurm" is used as the job scheduling system. Accessing the computing nodes and running jobs requires starting a corresponding slurm job;
  • the partition of our cluster is "usm-cl";
  • from the login node you can start an interactive job via "intjob --partition=usm-cl" (additional slurm arguments are accepted as well);
  • I created a python script (scontrol.py, attached below) which provides information on our partition (which jobs are running on which node, the owner of each job and so on);
  • I have also put together a rather silly slurm script (test.slurm, attached below) which can be used as a starting point;
  • note that it is possible to directly "ssh" to all nodes on which one of your batch jobs is running. This can help to supervise the processing;
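
As a starting point for batch processing, a minimal job script for the "usm-cl" partition might look like the sketch below. It is not the attached test.slurm; the job name, the resource requests, the time limit and the log file name are placeholder assumptions that need to be adapted.

```shell
#!/bin/bash
# placeholder job name (assumption)
#SBATCH --job-name=hello
# the partition of our cluster
#SBATCH --partition=usm-cl
# a single node and a single task
#SBATCH --nodes=1
#SBATCH --ntasks=1
# placeholder request; each node has 128 cores
#SBATCH --cpus-per-task=4
# placeholder wall-time limit
#SBATCH --time=00:10:00
# log file; %j is replaced by the job id
#SBATCH --output=hello_%j.log

# Report where the job runs. The SLURM_* variables are set by slurm inside a job;
# the fallback values only keep the script runnable outside of slurm for a dry test.
report_job() {
    echo "job ${SLURM_JOB_ID:-none} on $(uname -n) with ${SLURM_CPUS_PER_TASK:-1} cpus"
}

report_job
```

Saved under a name such as "hello.slurm" (hypothetical), the script would be submitted from the login node with "sbatch hello.slurm", and "squeue --partition=usm-cl" lists the jobs currently on our partition.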

Disk space

  • users can create their own disk space under "/project/ls-mohr/users/" such as "/project/ls-mohr/users/martin.kuemmel";
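
Creating the personal directory can be sketched as follows. The function name "make_user_dir" is hypothetical, and the sketch assumes that the directory is named after your login name, as in the example above.

```shell
# Create a personal working directory below a base directory and print its path.
# Assumption: the directory is named after the login name of the current user.
make_user_dir() {
    local base="${1:-/project/ls-mohr/users}"
    local dir="$base/$(id -un)"
    mkdir -p "$dir" && echo "$dir"
}
```

Calling "make_user_dir" without an argument uses "/project/ls-mohr/users" as the base directory; a different base directory can be passed as the first argument.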

Installed software

We use a package manager called spack to download and install software that is not directly available from the Linux distribution. To see what is already installed, do the following on a computing node:

  • "module load spack"
  • "module avail"
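
To check whether one particular package is already installed, the listing can be filtered. This is a small sketch, assuming the common behaviour that "module avail" writes its listing to stderr; the function name "find_module" is hypothetical.

```shell
# List the available modules whose name contains a search string.
# "module avail" commonly prints to stderr, hence the redirection 2>&1
# (an assumption about the module system installed on the cluster).
find_module() {
    module load spack
    module avail 2>&1 | grep -i -- "$1"
}

# Example call (the package name is only an illustration):
# find_module singularity
```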

Adding more software is not a problem.

Euclid processing on the cluster

While the OS, libraries and setup are different from EDEN-?.?, it is possible to load and run an EDEN-3.0 environment using a container solution. The cluster offers singularity for this purpose. While singularity is not officially supported in Euclid, it is being used in a limited role, and it is able to run docker images, which are the supported container format in Euclid. To work in an EDEN-3.0 environment on the new cluster you need to get the docker image as follows:
  • load singularity via:
      $ module load spack
      $ module load singularity
    Note that the singularity version which is directly available on the computing nodes at "/usr/bin/singularity" does not work. The correct version loaded via the modules is at "/software/opt/focal/x86_64/singularity/v3.8.1/bin/singularity".
  • pull the Euclid docker image via:
      $ singularity pull --docker-login docker://gitlab.euclid-sgs.uk:4567/st-tools/ct_xodeen_builder/dockeen
    After you have entered your gitlab credentials, the docker image is stored in the file "dockeen_latest.sif".
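
Since the system-wide singularity does not work, it can help to check which executable is picked up before pulling or running an image. This is a minimal sketch; the function name "check_singularity" and its messages are hypothetical.

```shell
# Warn when the singularity found first in PATH is the system-wide one,
# which does not work on the computing nodes.
# An explicit path can be given as first argument; default is the one in PATH.
check_singularity() {
    local sing="${1:-$(command -v singularity)}"
    case "$sing" in
        "")
            echo "singularity not found: load the modules first"
            return 1 ;;
        /usr/bin/singularity)
            echo "system version in use: load the modules first"
            return 1 ;;
        *)
            echo "using $sing" ;;
    esac
}
```

After "module load spack" and "module load singularity" the function should report the version under "/software/opt/focal/x86_64/singularity/v3.8.1/bin/singularity".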

The docker image can be run interactively:

$ singularity run --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr <path_to>dockeen_latest.sif

It is also possible to directly issue a command in EDEN-3.0:

$ singularity exec --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr <path_to>dockeen_latest.sif <command_name>

In both cases the relevant EDEN environment must first be loaded with:
$ source /cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate
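
For non-interactive use, the exec call and the activation step can be combined in one shell function. The following is only a sketch: the function name "eden_exec" is hypothetical, and it assumes that bash is available inside the image and that the activate script can be sourced in a non-interactive shell.

```shell
# Run a command inside the container with the EDEN-3.0 environment activated.
# Usage: eden_exec <path_to_sif> <command_name> [arguments...]
# Assumptions: bash exists in the image; activate works when sourced by "bash -c".
eden_exec() {
    local image="$1"
    shift
    # "eden" only fills $0 of the inner shell; the command and its arguments
    # arrive there as "$@" and replace the inner shell via exec.
    singularity exec \
        --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr \
        --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr \
        "$image" \
        bash -c 'source /cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate && exec "$@"' eden "$@"
}
```

Passing the command as separate arguments (instead of one quoted string) keeps arguments that contain spaces intact.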

Information on the usage of singularity in Euclid is available at the Euclid Redmine.

Support

Support is provided by the IT support group (Rechnerbetriebsgruppe) of the LMU faculty of physics via the helpdesk email: . Please keep Joe Mohr and me (Martin Kuemmel: ) in the loop so that we can maintain an overview of the cluster performance.

test.slurm (434 Bytes) Martin Kuemmel, 03/17/2022 02:09 PM

scontrol.py (9.97 KB) Martin Kuemmel, 05/11/2022 09:52 AM
