New cluster » History » Version 16

Martin Kuemmel, 11/23/2022 09:31 AM

h1. New computing cluster in Koeniginstrasse

h2. Introduction

Since January 2022 we have had a new computing cluster, which is installed in the server room of the physics department at Koeniginstrasse.

h2. Hardware

* there are 9 compute nodes in total;
* eight nodes are named "usm-cl-bt01n[1-4]" and "usm-cl-bt02n[1-4]";
* there is one new node (Nov. 2022) named "usm-cl-1024us01";
* each node usm-cl-bt[01, 02]n[1-4] has 128 logical cores (64 physical cores) and 512 GB RAM available;
* the node "usm-cl-1024us01" has 256 logical cores (128 physical cores) and 1024 GB RAM available;
* the storage for our group has 686 TB (/project/ls-mohr);

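For sizing slurm requests it can help to know the totals across the partition. The following is only a small arithmetic sketch based on the per-node figures listed above; the variable names are made up for illustration.

```shell
# Node counts and per-node figures as listed in the hardware section.
BT_NODES=8;  BT_LOGICAL=128;  BT_RAM_GB=512
BIG_NODES=1; BIG_LOGICAL=256; BIG_RAM_GB=1024

# Sum over both node types.
TOTAL_LOGICAL=$(( BT_NODES * BT_LOGICAL + BIG_NODES * BIG_LOGICAL ))
TOTAL_RAM_GB=$(( BT_NODES * BT_RAM_GB + BIG_NODES * BIG_RAM_GB ))

echo "logical cores: $TOTAL_LOGICAL, RAM: $TOTAL_RAM_GB GB"
```

On the cluster itself, "sinfo --partition=usm-cl --Node --long" should report the figures that slurm actually sees for each node.
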
h2. Login

* public login server: login.physik.uni-muenchen.de;
* Jupyterhub: https://workshop.physik.uni-muenchen.de;
* both the login server and the Jupyterhub require two-factor authentication, with your physics account password as the first factor. For the second factor you can use a smartphone app such as Google Authenticator (or any other app that generates time-based one-time passwords). The app needs to be registered at https://otp.physik.uni-muenchen.de, where it is called a soft-token.

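To avoid typing the full host name each time, an entry in the ssh configuration may be convenient. This is a minimal sketch: the alias "physik-login" and the placeholder account name are assumptions, not something prescribed by the IT group.

```shell
# Hypothetical account name; replace it with your physics account.
PHYSICS_USER="${PHYSICS_USER:-your.account}"
LOGIN_HOST="login.physik.uni-muenchen.de"

# Print an ssh_config stanza; "physik-login" is an arbitrary alias.
print_ssh_stanza() {
    printf 'Host physik-login\n'
    printf '    HostName %s\n' "$LOGIN_HOST"
    printf '    User %s\n' "$PHYSICS_USER"
}

print_ssh_stanza
```

After appending the printed stanza to "~/.ssh/config", "ssh physik-login" should ask for the password and then for the one-time password.
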
h2. Graphical Remote Login

A graphical remote login from outside the LMU network requires a VPN connection. Since June 2022 the only VPN connection is provided by "eduVPN":https://doku.lrz.de/display/PUBLIC/VPN+-+eduVPN+-+Installation+und+Konfiguration. After establishing a VPN connection, the login is done with X2GO as explained "here":https://www.en.it.physik.uni-muenchen.de/dienste/netzwerk/rechnerzugriff/zugriff3/remote_login/index.html. I was pointed to the following login servers:

* cip-sv-login01.cip.physik.uni-muenchen.de
* cip-sv-login02.cip.physik.uni-muenchen.de

but I assume the connections for Garching work as well. X2GO opens a KDE desktop, and from that machine you can connect to our cluster.

h2. Processing

* as on our local cluster, "slurm" is used as the job scheduling system. Accessing the computing nodes and running jobs requires starting a corresponding slurm job;
* the partition of our cluster is "usm-cl";
* from the login node you can start an interactive job via "intjob --partition=usm-cl" (additional slurm arguments are accepted as well);
* I created a "python script":https://cosmofs3.kosmo.physik.uni-muenchen.de/attachments/download/285/scontrol.py which provides information on our partition (which jobs are running on which node, the owner of each job and so on);
* I have also put together a simple "slurm script":https://cosmofs3.kosmo.physik.uni-muenchen.de/attachments/download/283/test.slurm which can be used as a starting point;
* note that it is possible to "ssh" directly to any node on which one of your batch jobs is running. This can help to monitor the processing;

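As an illustration of a batch job on the "usm-cl" partition, the sketch below writes a small job script and submits it if slurm is available. It is not the script linked above; the file name, job name and resource values are placeholders chosen for this example.

```shell
# Write a minimal batch script; the resource values are placeholders.
cat > example_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=usm-cl
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=00:10:00
#SBATCH --output=example_%j.log

# Report where the job landed and how many cores it was given.
echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK} cores"
EOF

# Submit only where slurm is available (for example on the login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.slurm
else
    echo "sbatch not found; script written to example_job.slurm"
fi
```

Once submitted, "squeue --partition=usm-cl --user=$USER" lists your jobs together with the nodes they run on.
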
h2. Disk space

* users can create their own directory under "/project/ls-mohr/users/", for example "/project/ls-mohr/users/martin.kuemmel";

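Creating the personal directory could look like the following sketch. It assumes that the directory is named after your account and that "$USER" holds that name; "USERS_ROOT" and "MY_DIR" are variable names invented for this example.

```shell
# Root of the per-user area; overridable so the sketch can be tried elsewhere.
USERS_ROOT="${USERS_ROOT:-/project/ls-mohr/users}"
# Assumption: the directory is named after the account, as in the example path.
MY_DIR="$USERS_ROOT/${USER:-$(id -un)}"

if [ -d "$USERS_ROOT" ]; then
    mkdir -p "$MY_DIR" && echo "Using $MY_DIR"
    # Show the free space on the file system holding the directory.
    df -h "$MY_DIR"
else
    echo "$USERS_ROOT not found; are you on the cluster?"
fi
```
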
h2. Installed software

We use a package manager called spack to download and install software that is not directly available from the Linux distribution. To see what is already installed, run the following on a computing node:

* "module load spack"
* "module avail"

Adding more software is not a problem.

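The module list can be long, so a small search helper may be useful. This is a sketch only: "find_module" is a hypothetical helper name, and the redirection is there because "module avail" may write its listing to stderr depending on the modules implementation.

```shell
# Search the module list for a name (case-insensitive).
# 2>&1 captures the listing whether it goes to stdout or stderr.
find_module() {
    module avail 2>&1 | grep -i -- "$1"
}

if command -v module >/dev/null 2>&1; then
    module load spack
    find_module singularity
else
    echo "module command not found; run this on a computing node"
fi
```
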
h2. Euclid processing on the cluster

While the OS, libraries and setup differ from EDEN-?.?, it is possible to load and run an EDEN-3.0 environment using a container solution. The cluster offers "singularity":https://sylabs.io/guides/3.0/user-guide/quick_start.html for this purpose. While singularity is not officially supported in Euclid, it is being used in a limited role, and it is able to run docker images, which are the supported container format in Euclid. To work in an EDEN-3.0 environment on the new cluster, you need to get the docker image as follows:

* load singularity via:
  <pre>
  $ module load spack
  $ module load singularity</pre> Note that the singularity version which is directly available on the computing nodes at "/usr/bin/singularity" does *not* work. The correct version, loaded via the modules, is at "/software/opt/focal/x86_64/singularity/v3.8.1/bin/singularity".
* it is *recommended* to move the singularity cache to somewhere under "/scratch-local", e.g. via:<pre>$ mkdir -p /scratch-local/$USER/singularity
$ export SINGULARITY_CACHEDIR=/scratch-local/$USER/singularity</pre> With the default cache location "$HOME/.cache/singularity" there are problems deleting the entire cache when leaving singularity.
* pull the Euclid docker image via: <pre>singularity pull --docker-login docker://gitlab.euclid-sgs.uk:4567/st-tools/ct_xodeen_builder/dockeen</pre> After you enter your gitlab credentials, the docker image is stored in the file "dockeen_latest.sif".

The docker image can be run interactively:
 <pre>$ singularity run --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr <path_to>dockeen_latest.sif</pre>
It is also possible to issue a single command directly in EDEN-3.0:
 <pre>$ singularity exec --bind /cvmfs/euclid.in2p3.fr:/cvmfs/euclid.in2p3.fr --bind /cvmfs/euclid-dev.in2p3.fr:/cvmfs/euclid-dev.in2p3.fr <path_to>dockeen_latest.sif  <command_name></pre>
In both cases the relevant EDEN environment must first be loaded with:
<pre>
$ source /cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate
</pre>

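The bind options, the activation and the actual command can be combined into one call. The sketch below only builds and prints such a command line so that it can be inspected first. It rests on several assumptions: that the activation has to happen inside the container, that the image provides "bash", and that the image file sits in the current directory. "eden_exec_cmd", "SIF" and "EDEN_ACTIVATE" are names invented for this example.

```shell
# Assumption: the image sits in the current directory; adjust SIF as needed.
SIF="${SIF:-./dockeen_latest.sif}"
EDEN_ACTIVATE="/cvmfs/euclid-dev.in2p3.fr/CentOS7/EDEN-3.0/bin/activate"

# Build the singularity call that activates EDEN-3.0 and then runs "$@".
# Assumption: activation must happen inside the container, so both steps
# are chained in one "bash -c" call. Arguments containing spaces or quotes
# would need extra quoting.
eden_exec_cmd() {
    printf 'singularity exec'
    printf ' --bind %s:%s' /cvmfs/euclid.in2p3.fr /cvmfs/euclid.in2p3.fr
    printf ' --bind %s:%s' /cvmfs/euclid-dev.in2p3.fr /cvmfs/euclid-dev.in2p3.fr
    printf ' %s bash -c "source %s && %s"\n' "$SIF" "$EDEN_ACTIVATE" "$*"
}

# Print the command for inspection; pipe it to "sh" to actually run it.
eden_exec_cmd python --version
```

Printing instead of executing keeps the sketch harmless on machines without singularity; on a computing node the printed line can be pasted into the shell or piped to "sh".
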
Information on the usage of singularity in Euclid is available at the "Euclid Redmine":https://euclid.roe.ac.uk/projects/codeen-users/wiki/EDEN_SINGULARITY.

h2. Support

Support is provided by the IT support group (Rechnerbetriebsgruppe) of the LMU Faculty of Physics via the helpdesk email: helpdesk@physik.uni-muenchen.de. Please keep Joe Mohr and me (Martin Kuemmel: mkuemmel@usm.lmu.de) in the loop so that we can maintain an overview of the cluster performance.