Jobs on EDI (both batch and interactive sessions) should be run through the Slurm resource manager. For a quick overview of Slurm, refer to the video: link
- Two partitions are available: cpu and gpu.
- GPU partition has higher priority.
- No limits are currently enforced on the time of execution.
- Constraints (gtx1080) can be used to select certain GPU architectures.
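As a rough sketch of how a constraint might be passed: the `--constraint` option of `sbatch` and `srun` (short form `-C`) takes a node feature name. The snippet below only assembles and prints the command rather than submitting anything. The helper name `gpu_job_cmd` and the script name `job.sh` are placeholders, and it assumes `gtx1080` is defined as a feature on the nodes you target.

```shell
# Build an sbatch command line that requests one GPU of a given architecture.
# Dry run: the command is printed, not executed.
gpu_job_cmd() {
    # $1 = constraint (node feature) name, $2 = batch script (placeholder name)
    echo "sbatch -p gpu --gres=gpu:1 --constraint=$1 $2"
}

gpu_job_cmd gtx1080 job.sh
```

The same option can also be written inside a batch script as an `#SBATCH --constraint=gtx1080` line.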
To list the jobs that are currently queued or running, run squeue:

    ~$ squeue
    JOBID PARTITION     NAME     USER ST        TIME NODES NODELIST(REASON)
    87719       gpu interact username  R 11-18:07:21     1 edi08

To see the state of the partitions and nodes, run sinfo:

    ~$ sinfo
    PARTITION AVAIL TIMELIMIT NODES STATE  NODELIST
    cpu       up    infinite      1 drain* edi03
    cpu       up    infinite      1 mix    edi08
    cpu       up    infinite      7 idle   edi[00-02,04-07]
    gpu       up    infinite      1 drain* edi03
    gpu       up    infinite      1 mix    edi08
    gpu       up    infinite      6 idle   edi[01-02,04-07]
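When picking a host for a session, it can help to pull the idle nodes out of the sinfo listing. The following is a minimal sketch, assuming the default sinfo column layout shown above; `idle_nodes` is a hypothetical helper name, not an EDI or Slurm command.

```shell
# Print the node lists of idle nodes in one partition, reading sinfo output on stdin.
idle_nodes() {
    awk -v part="$1" '
        NR > 1 {
            p = $1
            sub(/\*$/, "", p)   # sinfo marks the default partition with a trailing "*"
            if (p == part && $5 == "idle") print $6
        }'
}

# Demo on rows from the sample output; on the cluster use: sinfo | idle_nodes gpu
idle_nodes gpu <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpu up infinite 7 idle edi[00-02,04-07]
gpu up infinite 1 mix edi08
gpu up infinite 6 idle edi[01-02,04-07]
EOF
```

To restrict squeue to your own jobs, `squeue -u "$USER"` is the usual filter.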
EDI is commonly used for interactive work with data, e.g. performing ad-hoc analyses and visualizations with Python and Jupyter notebooks.
To facilitate allocating resources for interactive sessions, a convenient wrapper (alloc) has been prepared.
You can tweak your allocation depending on your needs. The alloc options are as follows:

|Option|Description|
|---|---|
|-n|Number of cores allocated for the job (default = 1, max = 36)|
|-g|Number of GPUs allocated for the job (default = 0, max = 2)|
|-m|Amount of memory (in GB) per allocated core (default = 1, max = 60)|
|-w|Host on which to start your session (default = the host you are running alloc on)|
To obtain an allocation on edi02 with 1 GPU, 6 cores, and a total of 12 GB of memory (6 cores × 2 GB per core):

    alloc -n 6 -w edi02 -g 1 -m 2
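Because -m is given per core, the total memory of a request is the product of the -n and -m values. The sketch below checks a request against the maxima listed in the options table and prints the implied total. `check_alloc` is a hypothetical helper, not part of alloc, and it assumes whole-number values.

```shell
# Hypothetical pre-flight check for an alloc request (not part of alloc itself).
# $1 = cores (-n), $2 = GPUs (-g), $3 = GB of memory per core (-m)
check_alloc() {
    if [ "$1" -gt 36 ]; then
        echo "too many cores: $1 (max 36)" >&2
        return 1
    fi
    if [ "$2" -gt 2 ]; then
        echo "too many GPUs: $2 (max 2)" >&2
        return 1
    fi
    if [ "$3" -gt 60 ]; then
        echo "too much memory per core: $3 GB (max 60)" >&2
        return 1
    fi
    # Memory is requested per core, so the total scales with the core count.
    echo "total memory: $(( $1 * $3 )) GB"
}

check_alloc 6 1 2
```

For the example request this reports 12 GB, matching alloc -n 6 -g 1 -m 2.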
Longer, resource-demanding jobs should typically be scheduled in Slurm batch mode. Below is an example of a batch script that you can use to schedule a job.
Suppose the following job.sh batch file:
    #!/bin/bash
    #SBATCH -p gpu          # GPU partition
    #SBATCH -n 8            # 8 cores
    #SBATCH --gres=gpu:1    # 1 GPU
    #SBATCH --mem=30GB      # 30 GB of RAM
    #SBATCH -J job_name     # name of your job

    your_program -i input_file -o output_path

    ~$ sbatch job.sh
    Submitted batch job 1234
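Once a job is submitted, its ID is what you pass to squeue and scancel. As a minimal sketch, the hypothetical helper `job_id_from_sbatch` below extracts the ID from the sbatch message. The demo feeds it the sample message and only prints the follow-up commands instead of running them.

```shell
# Extract the numeric job ID from sbatch's "Submitted batch job <id>" message.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demo on the sample message; on the cluster: jobid=$(sbatch job.sh | job_id_from_sbatch)
jobid=$(echo "Submitted batch job 1234" | job_id_from_sbatch)
echo "squeue -j $jobid"   # would show the state of this job
echo "scancel $jobid"     # would cancel this job
```

Unless the script sets a different output file, Slurm writes the job's output to slurm-<jobid>.out in the directory from which the job was submitted.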