Chaining jobs between machines means launching two dependent jobs, with the second one starting after the first one has finished, but on different systems (Rosa and Julier in this case).
First you have to compile the programs that you need on the two machines: let's call myexe.rosa the program running on Rosa and myexe.julier the one that will run on Julier.
Then you can use the following scripts as templates for your chain of jobs: the first script (rosa.slurm) runs on Rosa and submits the second one (julier.slurm) with a dependency on its own job ID, so that the second job starts on Julier only after the first has finished without errors.
Here is a template for the first script:
#!/bin/bash
#SBATCH --ntasks=32
#SBATCH --time=00:15:00
#SBATCH --job-name="production_run"
#SBATCH --output=rosa.out
aprun -n 32 ./myexe.rosa
sbatch -M julier --dependency=afterok:$SLURM_JOB_ID julier.slurm
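One point worth checking on your system: a batch job's exit status is normally that of the last command in the script, which in the template above is sbatch rather than aprun. The following is a minimal sketch, not part of the original template, that makes the "submit only on success" logic explicit. The helper name run_then_submit is hypothetical, and both command lines are passed in as plain strings so that the logic can be tried out without a scheduler.

```shell
# run_then_submit COMPUTE SUBMIT
# Runs the COMPUTE command line; runs the SUBMIT command line only if
# COMPUTE exited with status 0. Otherwise returns COMPUTE's exit status,
# so that the batch script (and therefore the job) is reported as failed.
# Hypothetical helper: the name and interface are illustrative only.
run_then_submit() {
    # $1 and $2 are left unquoted on purpose so that each string is
    # split into a command and its arguments.
    if $1; then
        $2
    else
        status=$?
        echo "compute step failed with status $status, nothing submitted" >&2
        return "$status"
    fi
}
```

In rosa.slurm the aprun and sbatch lines could then be replaced by a single call such as run_then_submit "aprun -n 32 ./myexe.rosa" "sbatch -M julier --dependency=afterok:$SLURM_JOB_ID julier.slurm", placed as the last line of the script so that its status becomes the job's status.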
The second script (julier.slurm) will look like the following:
#!/bin/bash
#SBATCH --cluster=julier
#SBATCH --ntasks=4
#SBATCH --time=00:01:00
#SBATCH --job-name="postproc"
#SBATCH --output=julier.out
# === do not delete === #
. /etc/bash.bashrc.local
. /etc/profile.d/modules.sh
export MODULEPATH=/apps/julier/modulefiles:/usr/share/Modules/default/modulefiles:/apps/ela/modulefiles
export LOADEDMODULES=
export _LMFILES_=
# ===================== #
module load slurm
module load julier
module load PrgEnv-gnu
srun -n 4 myexe.julier
exit 0
To submit the first one, just type sbatch rosa.slurm on Rosa, as usual. The second one is submitted to Julier by the first script and starts once the first job has exited successfully on Rosa.
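To follow the chain it can help to keep the job ID that sbatch reports. The sketch below assumes the usual confirmation line of the form "Submitted batch job 12345", optionally followed by "on cluster julier" when -M is used. The helper name job_id_from_sbatch is hypothetical, and the exact wording of the message should be checked against your Slurm version.

```shell
# job_id_from_sbatch: read sbatch's confirmation message on standard input
# and print the job ID, i.e. the fourth word of a line such as
#   Submitted batch job 12345 on cluster julier
# Hypothetical helper; it assumes the message format shown above.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}
```

On Rosa this could be used as jobid=$(sbatch rosa.slurm | job_id_from_sbatch), after which squeue -j "$jobid" shows the state of the first job.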