Slurm » History » Version 64

Martin Kuemmel, 04/24/2017 11:12 AM

{{toc}}

h1. Hardware overview

You access the Euclid cluster through cosmofs1@usm.uni-muenchen.de

* cosmofs1 is the file server and should *not* be used for computing
* There are 10 compute nodes named euclides01--euclides10
* euclides10 is only available for debugging, see below
* Each node has 32 logical CPUs and 64GB of RAM

h1. How to run jobs on the euclides nodes (using Slurm)

Use slurm to submit jobs or log in to the euclides nodes (euclides01--euclides10).

*Please read through this entire wiki page so everyone can make efficient use of this cluster*

h2. cosmofs1

*Please do not use cosmofs1 as a compute node* - its hardware is different from the nodes. It hosts our file server and other services that are important to us.

You should use cosmofs1 to
* transfer files
* compile your code
* submit jobs to the nodes

If you need to debug and would like to log in to a node, please start an interactive job on one of the nodes using slurm. For instructions see below.

h2. euclides nodes

Job submission to the euclides nodes is handled by the slurm jobmanager (see http://slurm.schedmd.com and https://computing.llnl.gov/linux/slurm/).
*Important: In order to run jobs, you need to be added to the slurm accounting system - please contact the admin*

All slurm commands listed below have very helpful man pages (e.g. man slurm, man squeue, ...).

If you are already familiar with another jobmanager, the following information may be helpful to you: http://slurm.schedmd.com/rosetta.pdf

h3. Scheduling of Jobs

At this point there are two queues, called partitions in slurm:
* *normal*, which is the default partition your jobs will be sent to if you do not specify otherwise. At this point there is a time limit of two days. Jobs can currently run on only 1 node.
* *debug*, which is meant for debugging. You can only run one job at a time; other jobs you submit will remain in the queue. The time limit is 12 hours.

The default memory per core is 2GB. If you need more or less, please specify it with the --mem or --mem-per-cpu option.
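
As a rough sketch of how the memory option fits into a batch script, the following requests 4000 MB per core instead of the 2GB default. The value 4000 and the echoed message are only illustrative. The fallback value "unset" is an assumption added so that the script also runs outside of a slurm job, where SLURM_MEM_PER_CPU is not defined.

```shell
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH -p normal
#SBATCH -n 1
#SBATCH --mem-per-cpu=4000    # MB per allocated core (illustrative value)

# Inside a job slurm sets SLURM_MEM_PER_CPU (in MB) when --mem-per-cpu is
# given; outside a job we fall back to the placeholder "unset".
mem_per_cpu="${SLURM_MEM_PER_CPU:-unset}"
echo "memory per core (MB): ${mem_per_cpu}"
```

Note that --mem requests memory per node while --mem-per-cpu requests memory per allocated core, so a script should use one or the other.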

We have also set up a scheduler that goes beyond first come, first served: some jobs will be favoured over others depending on how much you or your group have been using euclides in the past 2 weeks, how long the job has been queued and how many resources it will consume.

This serves as a starting point; we may have to adjust parameters once the slurm jobmanager is in regular use. Job scheduling is a complex issue and we still need to build expertise and gain experience of what the user needs in our groups are. Please feel free to speak out if there is something that can be improved without creating an unfair disadvantage for other users.

You can run interactive jobs on both partitions.

h3. Running an interactive job with slurm (a.k.a. logging in)

To run an interactive job with slurm in the default partition, use

<pre>
srun -u --pty bash
</pre>

If you want to use tcsh, use

<pre>
srun -u --pty tcsh
</pre>

If you want to use more memory per job, use

<pre>
srun -u --mem-per-cpu=8000 --pty tcsh
</pre>

In case you want to open X11 applications, use the --x11=first option, e.g.
<pre>
srun --x11=first -u --pty bash
</pre>

If the 'normal' partition is overcrowded, you can use the 'debug' partition instead:
<pre>
srun --account cosmo_debug -p debug -u --pty bash   # if you are part of the Cosmology group
srun --account euclid_debug -p debug -u --pty bash  # if you are part of the EuclidDM group
</pre>

As soon as a slot is open, slurm will log you in to an interactive session on one of the nodes.

h3. Limited ssh access

If you have an active job (batch or interactive), you can log in to the node the job is running on. Your ssh session will be killed when the job terminates, and it is restricted to the same resources as your job (so you cannot accidentally bypass the job scheduler and harm other users' jobs).

h3. Running a simple one core batch job with slurm using the default partition

* To see what queues are available to you (called partitions in slurm), run:
<pre>
sinfo
</pre>

* To run a job with slurm, create a file myjob.slurm containing the following information:
<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user <put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH -p normal

/bin/hostname
</pre>

* To submit a batch job, use:
<pre>
sbatch myjob.slurm
</pre>

* To see the status of your job, use
<pre>
squeue
</pre>

* To kill a job, use:
<pre>
scancel <jobid>
</pre>
You can get the <jobid> from squeue.

* For some more information on your job, use
<pre>
scontrol show job <jobid>
</pre>
You can get the <jobid> from squeue.
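
The submit and inspect steps above can also be chained in a small script. This is only a sketch: the helper name job_state is hypothetical, it assumes that the output of "scontrol show job" contains a field of the form JobState=..., and it assumes that the installed sbatch supports the --parsable option. The submission part is skipped when sbatch or myjob.slurm is not available.

```shell
#!/bin/bash
# Extract the JobState field from "scontrol show job" output read on stdin.
job_state() {
    grep -o 'JobState=[A-Z_]*' | head -n 1 | cut -d= -f2
}

# Submit only when sbatch and the job script are actually available.
if command -v sbatch >/dev/null 2>&1 && [ -f myjob.slurm ]; then
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    jobid=$(sbatch --parsable myjob.slurm)
    jobid=${jobid%%;*}
    echo "submitted job ${jobid}"
    scontrol show job "$jobid" | job_state
fi
```

Capturing the job ID at submission time avoids having to look it up in squeue before calling scontrol or scancel.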

h3. Running a simple one core batch job with slurm using the debug partition

Change the partition to debug and add the appropriate account, depending on whether you are part of the euclid or cosmology group.

<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user <put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH --account [cosmo_debug/euclid_debug]
#SBATCH -p debug

/bin/hostname
</pre>

h3. Accessing a node where a job is running or starting additional processes on a node

You can attach an srun command to an already existing job (batch or interactive). This means you can start an interactive session on a node where a job of yours is running or start an additional process.

First determine the jobid of the desired job using squeue, then use

<pre>
srun --jobid <jobid> [options] <executable>
</pre>
Or, more concretely:
<pre>
srun --jobid <jobid> -u --pty bash  # to start an interactive session
srun --jobid <jobid> ps -eaFAl      # to get detailed process information
</pre>

The processes will only run on cores that have been allocated to you. This works for batch as well as interactive jobs.
*Important: If the original job that was submitted is finished, any process attached in this fashion will be killed.*

h3. Batch script for running a multi-core job

mpi is installed on cosmofs1.

To run a 4 core job for an executable compiled with mpi, you can use
<pre>
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --mail-user <put your email address here>
#SBATCH --mail-type=BEGIN
#SBATCH -n 4

mpirun <programname>
</pre>
and it will automatically start on the number of cores specified.

To ensure that the job is executed on only one node, add
<pre>
#SBATCH -N 1
</pre>
to the job script.

If you would like to run a program that itself starts processes, you can use the environment variable $SLURM_NPROCS, which is automatically defined for slurm jobs, to explicitly pass the number of cores the program can run on.
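
A minimal sketch of using $SLURM_NPROCS in a job script is shown here. The echo command stands in for a real worker program and is purely illustrative, and the default of 1 is an assumption added so that the script also runs outside of slurm.

```shell
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH -n 4
#SBATCH -N 1    # keep all 4 tasks on a single node

# SLURM_NPROCS is set by slurm inside a job; default to 1 elsewhere.
ncores="${SLURM_NPROCS:-1}"

# Start one background worker per allocated core, then wait for all of them.
for i in $(seq 1 "$ncores"); do
    echo "worker ${i} of ${ncores} on $(uname -n)" &
done
wait
```

In a real job you would replace the echo line with your own program and pass "$ncores" to whatever option controls its number of processes or threads.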

To check whether your job is actually running on the specified number of cores, you can check the PSR column of
<pre>
ps -eaFAl
# or ps -eaFAl | egrep "<yourusername>|UID" if you just want to see your jobs
</pre>

h3. Environment for jobs

By default, slurm does not initialize the environment (using .bashrc, .profile, .tcshrc, ...)

To use your usual system environment, add the following line to the submission script:
<pre>
#SBATCH --get-user-env
</pre>

h2. desdb node

Some specific jobs in cosmodb, such as the "catalog ingest", need to be performed on the machines desdb1/2. For those jobs there is the slurm account "euclid_cat_ing" with the partition "cat_ing". Only selected persons from the Euclid group have access to this node. Please specify "-p cat_ing" and "--account euclid_cat_ing" on the command line or in the slurm script.
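
As an illustration, a batch script for this partition could look like the sketch below. The account and partition names are the ones given above; the payload that prints the machine name is only a placeholder for the actual ingest command, which is not documented here.

```shell
#!/bin/bash
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --account euclid_cat_ing
#SBATCH -p cat_ing

# Placeholder payload: report which machine the job landed on.
node=$(hostname 2>/dev/null || uname -n)
echo "running on ${node}"
```

For an interactive session the same two options would be passed to srun, following the pattern used for the debug partition: @srun --account euclid_cat_ing -p cat_ing -u --pty bash@.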

h2. Software specific setup

h3. Python environment

You can use the python 2.7.3 installed on the euclides cluster by using

<pre>
source /data2/users/ccsoft/etc/setup_all
source /data2/users/ccsoft/etc/setup_python2.7.3
</pre>

h2. Notes For Euclid users

For those submitting jobs to euclides* nodes through the Cosmo DM pipeline, here are some things which need to be specified for customized job submissions, since a different interface to slurm is used.

* To use more memory per block, specify max_memory = 6000 (for 6G) and so on inside the block definition, or in the submit file (in case you want to use it for all blocks).

* If you want to run on multiple nodes/cores, then use nodes='<number of nodes>:ppn=<number of cores>' inside the block definition of a particular block, or in the submit file in case you want to use it for all blocks.

* If you want to use a larger wall time, then specify wall_mod=<wall time in minutes> inside the module definition.

* Note that queue=serial does not work on cosmofs1 (we usually use it for c2pap).

h1. Admin

There is a user "slurm", which however is not really necessary for the administration work. The slurm administrator needs sudo access. Some scripts for adding a user and similar tasks are in "/data1/users/slurm". With sudo access the admin can execute those scripts. In the mysql database there is the username "slurmdb" with a password.

h2. Slurm configuration

h3. Slurm configuration file

The currently valid version of the configuration file is "/data1/users/slurm/slurm.conf". To apply a modified slurm configuration, the script "newconfig.sh" can be used.

The script

* copies the configuration file to the submit node and restarts the submit service;
* copies the configuration file to all computing nodes and triggers the reconfiguration there.

Then the slurm daemon needs to be started on the submit node and on all computing nodes with the script "restart.sh".

h2. User management

h3. Overview of users, accounts, etc.

No sudo access needed:
<pre>
/usr/local/bin/sacctmgr show account withassoc
</pre>

h3. Adding a new user

As root on @cosmofs1@,

<pre>
cd /data1/users/slurm/
./add_user.sh <UserName> <account>   # <account> is cosmo or euclid
/usr/local/bin/scontrol reconfigure
</pre>

h3. To increase memory, cores etc. for a user

The script above contains various commands for changing user settings, e.g.

<pre>
/usr/local/bin/sacctmgr -i modify user name=$1 set GrpCPUs=32
/usr/local/bin/sacctmgr -i modify user name=$1 set GrpMem=128000
</pre>

h2. Troubleshooting

h3. Information on a particular node

The command "/usr/local/bin/scontrol show node <nodename>" gives detailed information on a particular node (status, reason for being down and so on).

h3. Node in state "drain"

When a node is shown in "drain" state by @sinfo@, run
<pre>
/usr/local/bin/scontrol update nodename=NODE_NAME state=resume
</pre>
to put it back into operation.
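
Before resuming a node it can help to look at the recorded reason first. The following is a sketch only: the helper name node_reason is hypothetical, it assumes that sinfo and scontrol are found on the PATH (on cosmofs1 they live in /usr/local/bin), and it only prints the reasons without resuming anything.

```shell
#!/bin/bash
# Extract the text after "Reason=" from "scontrol show node" output on stdin.
node_reason() {
    grep -o 'Reason=.*' | head -n 1 | sed 's/^Reason=//'
}

# List drained nodes and print the recorded reason for each (needs slurm).
if command -v sinfo >/dev/null 2>&1; then
    for node in $(sinfo -h -N -t drain -o "%N" | sort -u); do
        echo "${node}: $(scontrol show node "$node" | node_reason)"
    done
fi
```

Reasons such as "Prolog error" or "Epilog error" are worth noting in the History section before the node is resumed.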

h2. Nodes down

Sometimes nodes are reported as "down". This seems to happen as a result of network problems. Here is some "troubleshooting":https://computing.llnl.gov/linux/slurm/troubleshoot.html#nodes advice for this situation. Also, after a reboot of cosmofs1 some manual work on slurm might be necessary to get going again.

h2. History

* April 07th 2017: Running "/usr/local/bin/scontrol show node euclides11" for the debug node euclides11 reported "Reason=Node unexpectedly rebooted [root@2016-12-14T13:25:01]"; internet research suggested changing "ReturnToService=" from 1 to 2 in the configuration file; after applying the new configuration file and restarting, the debug node works again.

* April 06th 2017: After the reconfiguration of the cluster the slurm configuration file was adjusted (to reflect the new machine names); minor changes also had to be applied to the scripts "newconfig.sh" and "restart.sh" to loop over the new names; the new configuration files were applied and slurm was restarted; all computing nodes for the normal partition came up, the debug partition stayed down.

* March 29th 2017: euclides7 is in drain state; "/usr/local/bin/scontrol show node euclides2" says "Reason=Epilog error"; when resumed, it seems to work normally.

* March 28th 2017: euclides2 is in drain state; when resumed, it goes into drain state again the next time it is used; "/usr/local/bin/scontrol show node euclides2" says "Reason=Prolog error"; after a reboot the machine was in status "idle*"; when resumed, it worked again.