[ARVADOS] updated: 728faddf2628bf1ee9123a0673cb75f4d2ce74fe

Git user git@public.curoverse.com
Wed Aug 10 11:09:11 EDT 2016


Summary of changes:
 doc/_includes/_install_compute_docker.liquid               |  4 ++--
 .../crunch2-slurm/install-dispatch.html.textile.liquid     | 14 +++++++-------
 doc/install/crunch2-slurm/install-test.html.textile.liquid | 10 +++++-----
 3 files changed, 14 insertions(+), 14 deletions(-)

  discards  f662db70c8c5707cc51d55b4feaba6e0f74b5ef4 (commit)
  discards  2b89981afe484af4335a079580a0619b8997a27e (commit)
  discards  368fce7c8f4db5cd32427acb62dcc1ce146d0c37 (commit)
       via  728faddf2628bf1ee9123a0673cb75f4d2ce74fe (commit)
       via  19b9a0324dd6ffd179e09fa11e97d3d2c4b98fcb (commit)

This update added new revisions after undoing existing revisions.  That is
to say, the old revision is not a strict subset of the new revision.  This
situation occurs when you --force push a change and generate a repository
containing something like this:

 * -- * -- B -- O -- O -- O (f662db70c8c5707cc51d55b4feaba6e0f74b5ef4)
            \
             N -- N -- N (728faddf2628bf1ee9123a0673cb75f4d2ce74fe)

When this happens we assume that you've already had alert emails for all
of the O revisions, and so we here report only the revisions in the N
branch from the common base, B.

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 728faddf2628bf1ee9123a0673cb75f4d2ce74fe
Author: Brett Smith <brett@curoverse.com>
Date:   Tue Aug 9 13:58:41 2016 -0400

    9705: Add docker-cleaner unit file to Install Guide.

diff --git a/doc/_includes/_install_docker_cleaner.liquid b/doc/_includes/_install_docker_cleaner.liquid
index e26b2be..5671a54 100644
--- a/doc/_includes/_install_docker_cleaner.liquid
+++ b/doc/_includes/_install_docker_cleaner.liquid
@@ -6,34 +6,36 @@ The arvados-docker-cleaner program removes least recently used Docker images as
 This also removes all containers as soon as they exit, as if they were run with @docker run --rm@. If you need to debug or inspect containers after they stop, temporarily stop arvados-docker-cleaner or run it with @--remove-stopped-containers never@.
 {% include 'notebox_end' %}
 
-Install runit to supervise the Docker cleaner daemon.  {% include 'install_runit' %}
-
-Configure runit to run the image cleaner using a suitable quota for your compute nodes and workload:
+Create a file @/etc/systemd/system/arvados-docker-cleaner.service@ in an editor.  Include the text below as its contents.  Make sure to edit the @ExecStart@ line appropriately for your compute node.
 
 <notextile>
-<pre><code>~$ <span class="userinput">sudo mkdir -p /etc/sv</span>
-~$ <span class="userinput">cd /etc/sv</span>
-/etc/sv$ <span class="userinput">sudo mkdir arvados-docker-cleaner; cd arvados-docker-cleaner</span>
-/etc/sv/arvados-docker-cleaner$ <span class="userinput">sudo mkdir log log/main</span>
-/etc/sv/arvados-docker-cleaner$ <span class="userinput">sudo sh -c 'cat >log/run' <<'EOF'
-#!/bin/sh
-exec svlogd -tt main
-EOF</span>
-/etc/sv/arvados-docker-cleaner$ <span class="userinput">sudo sh -c 'cat >run' <<'EOF'
-#!/bin/sh
-if [ -d /opt/rh/python33 ]; then
-  source scl_source enable python33
-fi
-exec python3 -m arvados_docker.cleaner --quota <b>50G</b>
-EOF</span>
-/etc/sv/arvados-docker-cleaner$ <span class="userinput">sudo chmod +x run log/run</span>
-/etc/sv/arvados-docker-cleaner$ <span class="userinput">sudo ln -s "$(pwd)" /etc/service/</span>
+<pre><code>[Service]
+# Most deployments will want a quota that's at least 10G.  From there,
+# a larger quota can help reduce compute overhead by avoiding repeated
+# reloads of the same Docker image, but it leaves less space for other
+# files on the same storage (usually Docker volumes).  Make sure the quota
+# is less than the total space available for Docker images.
+# If your deployment uses a Python 3 Software Collection, uncomment the
+# ExecStart line below, and delete the following one:
+# ExecStart=scl enable python33 "python3 -m arvados_docker.cleaner --quota <span class="userinput">20G</span>"
+ExecStart=python3 -m arvados_docker.cleaner --quota <span class="userinput">20G</span>
+Restart=always
+RestartPreventExitStatus=2
+
+[Install]
+WantedBy=default.target
+
+[Unit]
+After=docker.service
 </code></pre>
 </notextile>
 
-If you are using a different daemon supervisor, or if you want to test the daemon in a terminal window, an equivalent shell command to run arvados-docker-cleaner is:
+Then enable and start the service:
 
 <notextile>
-<pre><code><span class="userinput">python3 -m arvados_docker.cleaner --quota <b>50G</b></span>
+<pre><code>~$ <span class="userinput">sudo systemctl enable arvados-docker-cleaner.service</span>
+~$ <span class="userinput">sudo systemctl start arvados-docker-cleaner.service</span>
 </code></pre>
 </notextile>
+
+If you are using a different daemon supervisor, or if you want to test the daemon in a terminal window, use the command on the @ExecStart@ line above.
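As a rough aid for turning the quota guidance in the unit file comments into a number, here is a small illustrative Python helper (the function name and the half-of-free-space heuristic are this guide's sketch, not part of Arvados):

```python
def suggest_docker_quota(free_bytes, floor_gb=10):
    """Suggest a cleaner quota string: half of the free space on the
    filesystem holding Docker's data, but never below the 10G floor
    recommended in the unit file comments above."""
    return "%dG" % max(floor_gb, (free_bytes // 10**9) // 2)
```

You could feed it the free space reported for Docker's data directory, e.g. @shutil.disk_usage("/var/lib/docker").free@, assuming Docker's default storage location.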

commit 19b9a0324dd6ffd179e09fa11e97d3d2c4b98fcb
Author: Brett Smith <brett@curoverse.com>
Date:   Mon Aug 8 13:56:08 2016 -0400

    9705: Add crunch-dispatch-slurm to the Install Guide.

diff --git a/doc/_config.yml b/doc/_config.yml
index b3b213b..8fb2ff7 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -162,6 +162,12 @@ navbar:
       - install/configure-azure-blob-storage.html.textile.liquid
       - install/install-keepproxy.html.textile.liquid
       - install/install-keep-web.html.textile.liquid
+    - Install Crunch v2 on SLURM:
+      - install/crunch2-slurm/install-prerequisites.html.textile.liquid
+      - install/crunch2-slurm/install-compute-node.html.textile.liquid
+      - install/crunch2-slurm/install-dispatch.html.textile.liquid
+      - install/crunch2-slurm/install-test.html.textile.liquid
+    - Install Crunch v1:
       - install/install-crunch-dispatch.html.textile.liquid
       - install/install-compute-node.html.textile.liquid
     - Helpful hints:
diff --git a/doc/_includes/_install_compute_docker.liquid b/doc/_includes/_install_compute_docker.liquid
index 915db02..1a2e21c 100644
--- a/doc/_includes/_install_compute_docker.liquid
+++ b/doc/_includes/_install_compute_docker.liquid
@@ -4,7 +4,7 @@ Compute nodes must have Docker installed to run containers.  This requires a rel
 
 For Debian-based systems, the Arvados package repository includes a backported @docker.io@ package with a known-good version you can install.
 
-h2. Configure the Docker daemon
+h2(#configure_docker_daemon). Configure the Docker daemon
 
 Crunch runs Docker containers with relatively little configuration.  You may need to start the Docker daemon with specific options to make sure these jobs run smoothly in your environment.  This section highlights options that are useful to most installations.  Refer to the "Docker daemon reference":https://docs.docker.com/reference/commandline/daemon/ for complete information about all available options.
 
@@ -31,14 +31,14 @@ To enable cgroups accounting, you must boot Linux with the command line paramete
 On Debian-based systems, open the file @/etc/default/grub@ in an editor.  Find where the string @GRUB_CMDLINE_LINUX@ is set.  Add @cgroup_enable=memory swapaccount=1@ to that string.  Save the file and exit the editor.  Then run:
 
 <notextile>
-<pre><code>$ <span class="userinput">sudo update-grub</span>
+<pre><code>~$ <span class="userinput">sudo update-grub</span>
 </code></pre>
 </notextile>
 
 On Red Hat-based systems, run:
 
 <notextile>
-<pre><code>$ <span class="userinput">sudo grubby --update-kernel=ALL --args='cgroup_enable=memory swapaccount=1'</span>
+<pre><code>~$ <span class="userinput">sudo grubby --update-kernel=ALL --args='cgroup_enable=memory swapaccount=1'</span>
 </code></pre>
 </notextile>
 
diff --git a/doc/install/crunch2-slurm/install-compute-node.html.textile.liquid b/doc/install/crunch2-slurm/install-compute-node.html.textile.liquid
new file mode 100644
index 0000000..19f8662
--- /dev/null
+++ b/doc/install/crunch2-slurm/install-compute-node.html.textile.liquid
@@ -0,0 +1,39 @@
+---
+layout: default
+navsection: installguide
+title: Set up a compute node
+...
+
+h2. Install dependencies
+
+First, "add the appropriate package repository for your distribution":{{ site.baseurl }}/install/install-manual-prerequisites.html#repos.
+
+{% include 'note_python_sc' %}
+
+On CentOS 6 and RHEL 6:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo yum install python27-python-arvados-fuse crunch-run arvados-docker-cleaner</span>
+</code></pre>
+</notextile>
+
+On other Red Hat-based systems:
+
+<notextile>
+<pre><code>~$ <span class="userinput">echo 'exclude=python2-llfuse' | sudo tee -a /etc/yum.conf</span>
+~$ <span class="userinput">sudo yum install python-arvados-fuse crunch-run arvados-docker-cleaner</span>
+</code></pre>
+</notextile>
+
+On Debian-based systems:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo apt-get install python-arvados-python-client crunch-run arvados-docker-cleaner</span>
+</code></pre>
+</notextile>
+
+{% include 'install_compute_docker' %}
+
+{% include 'install_compute_fuse' %}
+
+{% include 'install_docker_cleaner' %}
diff --git a/doc/install/crunch2-slurm/install-dispatch.html.textile.liquid b/doc/install/crunch2-slurm/install-dispatch.html.textile.liquid
new file mode 100644
index 0000000..1b4edc9
--- /dev/null
+++ b/doc/install/crunch2-slurm/install-dispatch.html.textile.liquid
@@ -0,0 +1,114 @@
+---
+layout: default
+navsection: installguide
+title: Install the SLURM dispatcher
+
+...
+
+The SLURM dispatcher can run on any node that can submit requests to both the Arvados API server and the SLURM controller.  It is not resource-intensive, so you can run it on the API server node.
+
+h2. Install the dispatcher
+
+First, "add the appropriate package repository for your distribution":{{ site.baseurl }}/install/install-manual-prerequisites.html#repos.
+
+On Red Hat-based systems:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo yum install crunch-dispatch-slurm</span>
+~$ <span class="userinput">sudo systemctl enable crunch-dispatch-slurm</span>
+</code></pre>
+</notextile>
+
+On Debian-based systems:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo apt-get install crunch-dispatch-slurm</span>
+</code></pre>
+</notextile>
+
+h2. Create a dispatcher token
+
+Create a privileged Arvados API token for use by the dispatcher. If you have multiple dispatch processes, you should give each one a different token.  *On the API server*, run:
+
+<notextile>
+<pre><code>apiserver:~$ <span class="userinput">cd /var/www/arvados-api/current</span>
+apiserver:/var/www/arvados-api/current$ <span class="userinput">sudo -u <b>webserver-user</b> RAILS_ENV=production bundle exec script/create_superuser_token.rb</span>
+zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
+</code></pre>
+</notextile>
+
+h2. Configure the dispatcher
+
+Set up crunch-dispatch-slurm's configuration directory:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo mkdir -p /etc/arvados</span>
+~$ <span class="userinput">sudo install -d -o root -g <b>crunch</b> -m 0750 /etc/arvados/crunch-dispatch-slurm</span>
+</code></pre>
+</notextile>
+
+Edit @/etc/arvados/crunch-dispatch-slurm/config.json@ to authenticate to your Arvados API server, using the token you generated in the previous step.  Follow this JSON format:
+
+<notextile>
+<pre><code class="userinput">{
+  "Client": {
+    "APIHost": <b>"zzzzz.arvadosapi.com"</b>,
+    "AuthToken": <b>"zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"</b>
+  }
+}
+</code></pre>
+</notextile>
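To sanity-check the file before starting the dispatcher, a quick Python snippet can confirm the JSON parses and the required keys are present (the helper below is hypothetical, not shipped with Arvados):

```python
import json

def check_dispatcher_config(text):
    """Parse a crunch-dispatch-slurm config and return (APIHost,
    AuthToken); raises ValueError or KeyError if anything is missing."""
    client = json.loads(text)["Client"]
    return client["APIHost"], client["AuthToken"]
```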
+
+This is the only configuration required by crunch-dispatch-slurm.  The subsections below describe optional configuration flags you can set inside the main configuration object.
+
+h3. PollPeriod
+
+crunch-dispatch-slurm polls the API server periodically for new containers to run.  The @PollPeriod@ option controls how often this poll happens.  Set this to a duration string: a sequence of decimal numbers, each suffixed with one of the time units @ns@, @us@, @ms@, @s@, @m@, or @h@.  For example:
+
+<notextile>
+<pre><code class="userinput">"PollPeriod": "3m30s"
+</code></pre>
+</notextile>
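crunch-dispatch-slurm parses this value with Go's duration syntax.  As an illustration of the accepted format, here is a rough Python equivalent (a sketch for understanding the syntax, not the dispatcher's actual parser):

```python
import re

# Unit multipliers matching Go-style durations, in seconds.
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1, "m": 60, "h": 3600}

def parse_duration(s):
    """Parse a duration like "3m30s" into seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)", s)
    # Reject strings with leftover characters the pattern didn't consume.
    if not parts or "".join(n + u for n, u in parts) != s:
        raise ValueError("invalid duration: %r" % s)
    return sum(float(n) * _UNITS[u] for n, u in parts)
```

For example, @parse_duration("3m30s")@ yields 210 seconds.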
+
+h3. SbatchArguments
+
+When crunch-dispatch-slurm invokes @sbatch@, you can add switches to the command by specifying @SbatchArguments@.  You can use this to send the jobs to specific cluster partitions or add resource requests.  Set @SbatchArguments@ to an array of strings.  For example:
+
+<notextile>
+<pre><code class="userinput">"SbatchArguments": ["--partition=PartitionName"]
+</code></pre>
+</notextile>
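Putting the pieces together, here is a hypothetical helper showing how an optional key combines with the required @Client@ section (the helper and the partition name are illustrative):

```python
import json

def dispatcher_config(host, token, partition=None):
    """Build a config.json body: the required Client section, plus an
    optional SbatchArguments entry targeting a cluster partition."""
    conf = {"Client": {"APIHost": host, "AuthToken": token}}
    if partition:
        conf["SbatchArguments"] = ["--partition=%s" % partition]
    return json.dumps(conf, indent=2)
```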
+
+h3. CrunchRunCommand: Dispatch to SLURM cgroups
+
+If your SLURM cluster uses the @task/cgroup@ TaskPlugin, you can configure Crunch's Docker containers to be dispatched inside SLURM's cgroups.  This provides consistent enforcement of resource constraints.  To do this, add the following to your crunch-dispatch-slurm configuration:
+
+<notextile>
+<pre><code class="userinput">"CrunchRunCommand": ["crunch-run", "-cgroup-parent-subsystem=<b>memory</b>"]
+</code></pre>
+</notextile>
+
+The choice of subsystem ("memory" in this example) must correspond to one of the resource types enabled in SLURM's @cgroup.conf@. Limits for other resource types will also be respected.  The specified subsystem is singled out only to let Crunch determine the name of the cgroup provided by SLURM.
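To see which subsystems are candidates on a given node, you can read @/proc/cgroups@.  The helper below (illustrative, not part of Arvados) lists the enabled ones:

```python
def enabled_subsystems(proc_cgroups_text):
    """Parse /proc/cgroups content and return the names of enabled
    cgroup subsystems, candidates for -cgroup-parent-subsystem."""
    subsystems = []
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        # Columns: subsys_name  hierarchy  num_cgroups  enabled
        if len(fields) == 4 and fields[3] == "1":
            subsystems.append(fields[0])
    return subsystems
```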
+
+{% include 'notebox_begin' %}
+
+Some versions of Docker (at least 1.9), when run under systemd, require the cgroup parent to be specified as a systemd slice.  This causes an error when specifying a cgroup parent created outside systemd, such as those created by SLURM.
+
+You can work around this issue by disabling the Docker daemon's systemd integration.  This makes it more difficult to manage Docker services with systemd, but Crunch does not require that functionality, and it will be able to use SLURM's cgroups as container parents.  To do this, "configure the Docker daemon on all compute nodes":install-compute-node.html#configure_docker_daemon to run with the option @--exec-opt native.cgroupdriver=cgroupfs@.
+
+{% include 'notebox_end' %}
+
+h2. Restart the dispatcher
+
+{% include 'notebox_begin' %}
+
+The crunch-dispatch-slurm package includes configuration files for systemd.  If you're using a different init system, you'll need to configure a service to start and stop a @crunch-dispatch-slurm@ process as desired.  The process should run from a directory where the @crunch@ user has write permission on all compute nodes, such as its home directory or @/tmp@.  You do not need to specify any additional switches or environment variables.
+
+{% include 'notebox_end' %}
+
+Restart the dispatcher to run with your new configuration:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo systemctl restart crunch-dispatch-slurm</span>
+</code></pre>
+</notextile>
diff --git a/doc/install/crunch2-slurm/install-prerequisites.html.textile.liquid b/doc/install/crunch2-slurm/install-prerequisites.html.textile.liquid
new file mode 100644
index 0000000..c4dc929
--- /dev/null
+++ b/doc/install/crunch2-slurm/install-prerequisites.html.textile.liquid
@@ -0,0 +1,9 @@
+---
+layout: default
+navsection: installguide
+title: Crunch v2 SLURM prerequisites
+...
+
+Crunch v2 containers can be dispatched to a SLURM cluster.  The dispatcher sends work to the cluster using SLURM's @sbatch@ command, so it works in a variety of SLURM configurations.
+
+In order to run containers, you must run the dispatcher as a user that has permission to set up FUSE mounts and run Docker containers on each compute node.  This install guide refers to this user as the @crunch@ user.  We recommend you create this user on each compute node with the same UID and GID, and add it to the @fuse@ and @docker@ system groups to grant it the necessary permissions.  However, you can run the dispatcher under any account with sufficient permissions across the cluster.
diff --git a/doc/install/crunch2-slurm/install-test.html.textile.liquid b/doc/install/crunch2-slurm/install-test.html.textile.liquid
new file mode 100644
index 0000000..4d101ee
--- /dev/null
+++ b/doc/install/crunch2-slurm/install-test.html.textile.liquid
@@ -0,0 +1,109 @@
+---
+layout: default
+navsection: installguide
+title: Test SLURM dispatch
+...
+
+h2. Test compute node setup
+
+You should now be able to submit SLURM jobs that run in Docker containers.  On the node where you're running the dispatcher, you can test this by running:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo -u <b>crunch</b> srun -N1 docker run busybox echo OK</span>
+</code></pre>
+</notextile>
+
+If it works, this command should print @OK@ (it may also show some status messages from SLURM and/or Docker).  If it does not print @OK@, double-check your compute node setup, and that the @crunch@ user can submit SLURM jobs.
+
+h2. Test the dispatcher
+
+On the dispatch node, start monitoring the crunch-dispatch-slurm logs:
+
+<notextile>
+<pre><code>~$ <span class="userinput">sudo journalctl -o cat -fu crunch-dispatch-slurm.service</span>
+</code></pre>
+</notextile>
+
+*On your shell server*, submit a simple container request:
+
+<notextile>
+<pre><code>shell:~$ <span class="userinput">arv container_request create --container-request '{
+  "name":            "test",
+  "state":           "Committed",
+  "priority":        1,
+  "container_image": "arvados/jobs:latest",
+  "command":         ["echo", "Hello, Crunch!"],
+  "output_path":     "/out",
+  "mounts": {
+    "/out": {
+      "kind":        "tmp",
+      "capacity":    1000
+    }
+  },
+  "runtime_constraints": {
+    "vcpus": 1,
+    "ram": 8388608
+  }
+}'</span>
+</code></pre>
+</notextile>
+
+This command should return a record with a @container_uuid@ field.  Once crunch-dispatch-slurm polls the API server for new containers to run, you should see it dispatch that same container.  It will log messages like:
+
+<notextile>
+<pre><code>2016/08/05 13:52:54 Monitoring container zzzzz-dz642-hdp2vpu9nq14tx0 started
+2016/08/05 13:53:04 About to submit queued container zzzzz-dz642-hdp2vpu9nq14tx0
+2016/08/05 13:53:04 sbatch succeeded: Submitted batch job 8102
+</code></pre>
+</notextile>
+
+If you do not see crunch-dispatch-slurm try to dispatch the container, double-check that it is running and that the API hostname and token in @/etc/arvados/crunch-dispatch-slurm/config.json@ are correct.
+
+Before the container finishes, SLURM's @squeue@ command will show the new job in the list of queued and running jobs.  For example, you might see:
+
+<notextile>
+<pre><code>~$ <span class="userinput">squeue --long</span>
+Fri Aug  5 13:57:50 2016
+  JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
+   8103   compute zzzzz-dz   crunch  RUNNING       1:56 UNLIMITED      1 compute0
+</code></pre>
+</notextile>
+
+The job's name corresponds to the container's UUID.  You can get more information about it by running, e.g., <notextile><code>scontrol show job Name=<b>UUID</b></code></notextile>.
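Because @squeue@ truncates the NAME column, matching a job back to its container takes a prefix comparison.  A hypothetical helper (not part of Arvados) for scripting that lookup:

```python
def job_for_container(squeue_output, container_uuid):
    """Scan `squeue --long` output for the row whose NAME column is a
    (possibly truncated) prefix of the container UUID; return its JOBID,
    or None if no row matches."""
    for line in squeue_output.splitlines():
        fields = line.split()
        # Columns: JOBID PARTITION NAME USER STATE TIME TIMELIMIT ...
        if len(fields) >= 3 and container_uuid.startswith(fields[2]):
            return fields[0]
    return None
```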
+
+When the container finishes, the dispatcher will log that, with the final result:
+
+<notextile>
+<pre><code>2016/08/05 13:53:14 Container zzzzz-dz642-hdp2vpu9nq14tx0 now in state "Complete" with locked_by_uuid ""
+2016/08/05 13:53:14 Monitoring container zzzzz-dz642-hdp2vpu9nq14tx0 finished
+</code></pre>
+</notextile>
+
+After the container finishes, you can get the container record by UUID *from a shell server* to see its results:
+
+<notextile>
+<pre><code>shell:~$ <span class="userinput">arv get <b>zzzzz-dz642-hdp2vpu9nq14tx0</b></span>
+{
+ ...
+ "exit_code":0,
+ "log":"a01df2f7e5bc1c2ad59c60a837e90dc6+166",
+ "output":"d41d8cd98f00b204e9800998ecf8427e+0",
+ "state":"Complete",
+ ...
+}
+</code></pre>
+</notextile>
+
+You can use standard Keep tools to view the container's output and logs from their corresponding fields.  For example, to see the logs from the collection referenced in the @log@ field:
+
+<notextile>
+<pre><code>~$ <span class="userinput">arv keep ls <b>a01df2f7e5bc1c2ad59c60a837e90dc6+166</b></span>
+./crunch-run.txt
+./stderr.txt
+./stdout.txt
+~$ <span class="userinput">arv keep get <b>a01df2f7e5bc1c2ad59c60a837e90dc6+166</b>/stdout.txt</span>
+2016-08-05T13:53:06.201011Z Hello, Crunch!
+</code></pre>
+</notextile>
+
+If the container does not dispatch successfully, refer to the crunch-dispatch-slurm logs for information about why it failed.

-----------------------------------------------------------------------

