Running Distributed Experiments on the DAS5

In the previous tutorials we have devised experiments that spawn multiple instances on a single computer. In this tutorial, we will show how to run an experiment on the DAS5 compute cluster. The DAS5 is a Dutch nation-wide compute infrastructure that consists of multiple clusters, managed by different universities. This tutorial assumes that the reader has access to the DAS5 head nodes.

The experiment we will run on the DAS5 is the same experiment that we described in the previous tutorial. In this experiment, we spawn multiple (synchronized) instances, and each instance writes its ID to a file five seconds after the experiment starts. The experiment is started on the DAS5 head node. Before the instances spawn, Gumby automatically reserves a number of compute nodes. Each compute node then spawns a number of instances, depending on the experiment configuration. When the experiment ends, the head node collects all data generated by the instances.

The configuration file for this DAS5 experiment looks as follows:

experiment_name = synchronized_instances_das5
instances_to_run = 16
local_instance_cmd = das4_reserve_and_run.sh
post_process_cmd = post_process_write_ids.sh
scenario_file = write_ids.scenario
sync_port = __unique_port__

# The command that is executed prior to starting the experiment. This script prepares the DAS5 environment.
local_setup_cmd = das4_setup.sh

# We use a venv on the DAS5 since installing packages might lead to conflicts with other experiments.
use_local_venv = TRUE

# The number of DAS5 compute nodes to use.
node_amount = 2

# The experiment timeout after which the connection with the compute node is closed.
node_timeout = 20

# What command do we want to run?
das4_node_command = launch_scenario.py

The new configuration options are annotated with a short explanation. The file includes a local_setup_cmd configuration option, which names a command that is executed before the experiment starts. The das4_setup.sh script checks the user quota on the DAS5 and invokes the build_virtualenv.sh script, which prepares a virtual environment with various Python packages. To use this virtual environment, the use_local_venv option is set.
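To make the layout of the configuration file concrete, the following is a minimal sketch that reads "key = value" lines and skips comments. It is not how Gumby itself loads its configuration; the function name parse_config and the shortened CONFIG string are assumptions made for illustration only.

```python
def parse_config(text):
    """Parse simple 'key = value' lines, skipping blank lines and '#' comments.

    Hypothetical helper for illustration; not part of Gumby.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


# A shortened copy of the configuration shown above.
CONFIG = """
experiment_name = synchronized_instances_das5
instances_to_run = 16
local_setup_cmd = das4_setup.sh
use_local_venv = TRUE
# The number of DAS5 compute nodes to use.
node_amount = 2
node_timeout = 20
"""

options = parse_config(CONFIG)
print(options["instances_to_run"], options["node_amount"])
```

All values are returned as strings, so numeric options such as node_amount need an explicit int() conversion before they are used in arithmetic.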

Additionally, there are a few configuration options that are specific to DAS5 experiments. The node_amount configuration option indicates how many DAS5 compute nodes are used. The maximum number of compute nodes in each cluster can be found here. In our experiment, we spawn 16 instances and use 2 compute nodes. Gumby automatically balances instances over compute nodes, so in our experiment each compute node hosts 8 instances. The node_timeout configuration option sets the timeout of the experiment. To prevent premature termination of an experiment, we recommend setting this value somewhat higher than the time of the latest event in the scenario file.
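The arithmetic behind these two options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical and not part of Gumby, the handling of a remainder (when instances do not divide evenly over nodes) is a guess, and node_timeout is assumed to be in the same unit as the scenario event times.

```python
def balance_instances(instances_to_run, node_amount):
    """Split instances over nodes as evenly as possible.

    Hypothetical helper; how Gumby treats a remainder is an assumption here.
    """
    if node_amount <= 0:
        raise ValueError("node_amount must be positive")
    base, extra = divmod(instances_to_run, node_amount)
    # The first `extra` nodes each take one additional instance.
    return [base + 1 if node < extra else base for node in range(node_amount)]


def timeout_covers_scenario(node_timeout, latest_event_time):
    """Return True if the timeout is strictly later than the last scenario event."""
    return node_timeout > latest_event_time


# Values from the configuration above: 16 instances on 2 nodes.
per_node = balance_instances(16, 2)
print(per_node)  # [8, 8]

# The instances write their IDs at five seconds; node_timeout is 20.
print(timeout_covers_scenario(20, 5))  # True
```

With these values each node hosts 8 instances, and the timeout of 20 leaves ample room after the event at five seconds.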

To run this experiment, execute the following command on one of the DAS5 head nodes:

$ gumby/run.py gumby/docs/tutorials/synchronized_instances_das5.conf