New cluster » History » Version 10

Version 9 (Martin Kuemmel, 05/12/2022 09:05 AM) → Version 10/23 (Martin Kuemmel, 05/12/2022 09:39 AM)

h1. New computing cluster in Koenigstrasse

h2. Introduction

Since January 2022 we have a new computing cluster, which is installed in the server room of the physics department at Koenigstrasse. A 10TB disk is temporarily attached to the cluster for processing. We are currently (17th March 2022) waiting for a large amount of storage (40TB), which will then replace this temporary solution.

h2. Hardware

* there are 8 compute nodes available in total;
* the compute nodes are named "usm-cl-bt01n[1-4]" and "usm-cl-bt02n[1-4]";
* each node has 128 cores;
* each node has 500GB available;
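
The bracket notation in the node names is a range. As a minimal sketch, the following function spells out the eight full names; the function and variable names are made up for this illustration.

```shell
# Print the eight compute node names implied by the patterns
# "usm-cl-bt01n[1-4]" and "usm-cl-bt02n[1-4]".
# "list_nodes" and "group" are hypothetical names used only here.
list_nodes() {
    for group in 01 02; do
        for n in 1 2 3 4; do
            printf 'usm-cl-bt%sn%s\n' "$group" "$n"
        done
    done
}

list_nodes
```

On a machine with slurm installed, @scontrol show hostnames 'usm-cl-bt01n[1-4]'@ should perform the same expansion for one pattern.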

h2. Login

* public login server: login.physik.uni-muenchen.de;
* Jupyterhub: https://workshop.physik.uni-muenchen.de;
* both the server and the Jupyterhub require two-factor authentication, with your physics account password as the first factor. For the second factor you can use a smartphone app such as Google Authenticator (or any other app that generates time-based one-time passwords). The app needs to be registered at https://otp.physik.uni-muenchen.de, where it is called a soft token.

Graphical access to the cluster or its login nodes is possible, and I am currently trying to figure out the most efficient way for me.

h2. Processing

* as on our local cluster, "slurm" is used as the job scheduling system. Accessing the computing nodes and running jobs requires starting a corresponding slurm job;
* the partition of our cluster is "usm-cl";
* from the login node you can start an interactive job via "intjob --partition=usm-cl" (additional slurm arguments are accepted as well);
* I created a "python script":https://cosmofs3.kosmo.physik.uni-muenchen.de/attachments/download/285/scontrol.py which provides information on our partition (which jobs are running on which node, the owner of the job and so on);
* I have also put together a rather silly "slurm script":https://cosmofs3.kosmo.physik.uni-muenchen.de/attachments/download/283/test.slurm
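
A batch job for the partition could look roughly like the following minimal sketch. It is not the linked test script: the job name, the resource values and the output file name are placeholders chosen for illustration, and only the partition name is taken from the list above.

```shell
#!/bin/sh
# Partition named in the list above.
#SBATCH --partition=usm-cl
# Hypothetical job name and illustrative resource requests.
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
# %j is replaced by the job ID.
#SBATCH --output=hello_%j.out

# Report where the job is running. The fallback value "none" is used
# when the script is executed outside of slurm.
report() {
    echo "Job ${SLURM_JOB_ID:-none} running on $(uname -n)"
}

report
```

Saved under a name such as "hello.slurm" (a hypothetical file name), it would be submitted from the login node with @sbatch hello.slurm@.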

h2. Disk space

* users can create their own disk space under "/project/ls-mohr/users/" such as "/project/ls-mohr/users/martin.kuemmel";
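
As a minimal sketch, the directory can be created with a small helper like the one below. The base path comes from the text above; taking the login name as the directory name is an assumption based on the example path, and the function name is hypothetical.

```shell
# Create a personal directory below the shared user area and print its path.
# $1: base directory (defaults to the path given in the text above)
# $2: directory name (defaults to the login name, which is an assumption)
make_user_dir() {
    base="${1:-/project/ls-mohr/users}"
    name="${2:-$(id -un)}"
    mkdir -p "$base/$name" && printf '%s\n' "$base/$name"
}
```

Calling @make_user_dir@ without arguments on the cluster would then create the directory under "/project/ls-mohr/users/".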

h2. Installed software

We use a package manager called spack to download and install software that is not directly available from the Linux distribution. To see what is already installed, do the following on a computing node:

* "module load spack"
* "module avail"

Adding more software is not a problem.

h2. Euclid processing on the cluster

While the OS, libraries and setup are different from EDEN-?.?, it is possible to load and run in an EDEN-3.0 environment using a container solution. The cluster offers "singularity":https://sylabs.io/guides/3.0/user-guide/quick_start.html as a container solution. Although singularity is not officially supported in Euclid, it is being used in a limited role, and it is able to run docker images, which is the supported container format in Euclid. To work in an EDEN-3.0 environment on the new cluster, you need to get the docker image as follows:
* load singularity via:
<pre>
$ module load spack
$ module load singularity</pre> Note that the singularity version which is directly available on the computing nodes at "/usr/bin/singularity" does *not* work. The correct version loaded via the modules is at "/software/opt/focal/x86_64/singularity/v3.8.1/bin/singularity".
* pull the Euclid docker image via: <pre>singularity pull --docker-login docker://gitlab.euclid-sgs.uk:4567/st-tools/ct_xodeen_builder/dockeen</pre> After you enter your gitlab credentials, the docker image is stored in the file "dockeen_latest.sif".
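
Since the binary at "/usr/bin/singularity" does not work, it can help to check which one the shell picks up before pulling or running an image. The following is a minimal sketch; the function name is hypothetical, and it only rejects the system path rather than testing for a specific module version.

```shell
# Succeed unless the given path is empty or is the system-wide binary,
# which the text above says does not work.
# "is_usable_singularity" is a hypothetical helper name.
is_usable_singularity() {
    case "$1" in
        ""|/usr/bin/singularity) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage on a computing node, after loading the modules:
#   is_usable_singularity "$(command -v singularity)" || echo "load the singularity module first"
```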

The docker image can be run interactively:
<pre>$ singularity run --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr <path to/>dockeen_latest.sif</pre>
It is also possible to directly issue a command in EDEN-3.0:
<pre>$ singularity run --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr <path to/>dockeen_latest.sif <command_name></pre>
In both cases the relevant EDEN environment must first be loaded with:
<pre>
$ source /cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate
</pre>
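
To avoid retyping the two bind options, the calls above can be wrapped in a small shell function. This is a sketch under assumptions: the function name and the @EDEN_IMAGE@ variable are hypothetical, the image is assumed to sit in the current directory unless the variable says otherwise, and the EDEN activation step described above is not handled here.

```shell
# Location of the pulled image. The default (current directory) is an
# assumption; set EDEN_IMAGE to point at your own copy.
EDEN_IMAGE="${EDEN_IMAGE:-./dockeen_latest.sif}"

# Run the image with both cvmfs repositories bound, as in the commands
# above. Without arguments this starts the interactive session; with
# arguments they are passed on as the command to run.
eden_run() {
    singularity run \
        --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr \
        --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr \
        "$EDEN_IMAGE" "$@"
}
```

With this in place, @eden_run@ corresponds to the interactive call and @eden_run <command_name>@ to the direct one.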

Information on the usage of singularity in Euclid is available at the "Euclid Redmine":https://euclid.roe.ac.uk/projects/codeen-users/wiki/EDEN_SINGULARITY.

h2. Support

Support is provided by the IT support (Rechnerbetriebsgruppe) of the LMU faculty of physics via the helpdesk email: helpdesk@physik.uni-muenchen.de. Please keep Joe Mohr and me (Martin Kuemmel: mkuemmel@usm.lmu.de) in the loop so that we can keep an overview of the cluster performance.
