[ARVADOS] created: 2.1.0-1828-g7f2bd2f6e

Git user git@public.arvados.org
Thu Jan 13 22:21:46 UTC 2022


        at  7f2bd2f6ed4081252e650ad0e6c0eea35433e132 (commit)


commit 7f2bd2f6ed4081252e650ad0e6c0eea35433e132
Author: Peter Amstutz <peter.amstutz@curii.com>
Date:   Thu Jan 13 17:20:39 2022 -0500

    18326: Update documentation to cover configuring GPU support
    
    Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>

diff --git a/doc/_includes/_install_cuda.liquid b/doc/_includes/_install_cuda.liquid
new file mode 100644
index 000000000..cb1519a61
--- /dev/null
+++ b/doc/_includes/_install_cuda.liquid
@@ -0,0 +1,21 @@
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+h2(#cuda). Install NVIDIA CUDA Toolkit (optional)
+
+If you want to use NVIDIA GPUs, "install the CUDA toolkit.":https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
+
+You must also install the NVIDIA Container Toolkit:
+
+<pre>
+DIST=$(. /etc/os-release; echo $ID$VERSION_ID)
+curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
+  sudo apt-key add -
+curl -s -L https://nvidia.github.io/libnvidia-container/$DIST/libnvidia-container.list | \
+  sudo tee /etc/apt/sources.list.d/libnvidia-container.list
+sudo apt-get update
+sudo apt-get install libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
+</pre>
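+
+As a quick sanity check (a sketch: this assumes Docker is your container runtime, the NVIDIA driver is already installed on the host, and the CUDA image tag shown is only an example), confirm that containers can see the GPU:
+
+<pre>
+sudo systemctl restart docker
+docker run --rm --gpus all nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
+</pre>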
diff --git a/doc/install/crunch2-cloud/install-compute-node.html.textile.liquid b/doc/install/crunch2-cloud/install-compute-node.html.textile.liquid
index a56519fb1..6b8c85f09 100644
--- a/doc/install/crunch2-cloud/install-compute-node.html.textile.liquid
+++ b/doc/install/crunch2-cloud/install-compute-node.html.textile.liquid
@@ -113,10 +113,16 @@ Options:
       Path to the public key file that a-d-c will use to log into the compute node
   --mksquashfs-mem (default: 256M)
       Only relevant when using Singularity. This is the amount of memory mksquashfs is allowed to use.
-  --debug
-      Output debug information (default: false)
+  --nvidia-gpu-support (default: false)
+      Install all the necessary tooling for Nvidia GPU support
+  --debug (default: false)
+      Output debug information
 </code></pre></notextile>
 
+h2(#nvidia). NVIDIA GPU support
+
+If you plan on using instance types with NVIDIA GPUs, add @--nvidia-gpu-support@ to the build command line.  Arvados uses the same compute image for both GPU and non-GPU instance types.  The GPU tooling is ignored when using the image with a non-GPU instance type.
+
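+For example, to build a GPU-enabled AWS image (a sketch; all other arguments are the same as in the AWS example below):
+
+<notextile><pre><code>~$ <span class="userinput">./build.sh --json-file arvados-images-aws.json \
+           --nvidia-gpu-support \
+           ...</span>
+</code></pre></notextile>
+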
 h2(#aws). Build an AWS image
 
 <notextile><pre><code>~$ <span class="userinput">./build.sh --json-file arvados-images-aws.json \
diff --git a/doc/install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid b/doc/install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid
index b4987f443..cb3143b02 100644
--- a/doc/install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid
+++ b/doc/install/crunch2-cloud/install-dispatch-cloud.html.textile.liquid
@@ -74,6 +74,28 @@ Add or update the following portions of your cluster configuration file, @config
 </code></pre>
 </notextile>
 
+h4. NVIDIA GPU support
+
+To specify instance types with NVIDIA GPUs, you must include an additional @CUDA@ section:
+
+<notextile>
+<pre><code>    InstanceTypes:
+      g4dn:
+        ProviderType: g4dn.xlarge
+        VCPUs: 4
+        RAM: 16GiB
+        IncludedScratch: 125GB
+        Price: 0.56
+        CUDA:
+          DriverVersion: "11.4"
+          HardwareCapability: "7.5"
+          DeviceCount: 1
+</code></pre>
+</notextile>
+
+The @DriverVersion@ is the version of the CUDA toolkit installed in your compute image (in X.Y format; do not include the patch level).  The @HardwareCapability@ is the CUDA compute capability of the GPUs available for this instance type.  The @DeviceCount@ is the number of GPU devices available for this instance type.
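+
+One way to find workable values (a sketch; the exact output format varies by driver version): the header printed by @nvidia-smi@ on a node of this instance type reports the highest CUDA version the installed driver supports, and NVIDIA's documentation lists the compute capability for each GPU model.
+
+<notextile>
+<pre><code>~$ <span class="userinput">nvidia-smi</span>
+| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
+...
+</code></pre>
+</notextile>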
+
 h4. Minimal configuration example for Amazon EC2
 
 The <span class="userinput">ImageID</span> value is the compute node image that was built in "the previous section":install-compute-node.html#aws.
diff --git a/doc/install/crunch2-lsf/install-dispatch.html.textile.liquid b/doc/install/crunch2-lsf/install-dispatch.html.textile.liquid
index 7e44c8ec4..46588897e 100644
--- a/doc/install/crunch2-lsf/install-dispatch.html.textile.liquid
+++ b/doc/install/crunch2-lsf/install-dispatch.html.textile.liquid
@@ -64,17 +64,40 @@ Alternatively, you can arrange for the arvados-dispatch-lsf process to run as an
 
 h3(#SbatchArguments). Containers.LSF.BsubArgumentsList
 
-When arvados-dispatch-lsf invokes @bsub@, you can add arguments to the command by specifying @BsubArgumentsList@.  You can use this to send the jobs to specific cluster partitions or add resource requests.  Set @BsubArgumentsList@ to an array of strings.  For example:
+When arvados-dispatch-lsf invokes @bsub@, you can add arguments to the command by specifying @BsubArgumentsList@.  You can use this to send the jobs to specific cluster partitions or add resource requests.  Set @BsubArgumentsList@ to an array of strings.
+
+Template variables starting with % will be substituted as follows:
+
+table(table table-bordered table-condensed).
+|_. Variable |_. Substitution |
+|%U|uuid|
+|%C|number of VCPUs|
+|%M|memory in MB|
+|%T|tmp in MB|
+|%G|number of GPU devices (@runtime_constraints.cuda.device_count@)|
+
+Use %% to express a literal %.  The %%J in the default value will be changed to %J, which is interpreted by @bsub@ itself.
+
+For example:
 
 <notextile>
 <pre>    Containers:
       LSF:
-        <code class="userinput">BsubArgumentsList: <b>["-C", "0", "-o", "/tmp/crunch-run.%J.out", "-e", "/tmp/crunch-run.%J.err"]</b></code>
+        <code class="userinput">BsubArgumentsList: <b>["-o", "/tmp/crunch-run.%%J.out", "-e", "/tmp/crunch-run.%%J.err", "-J", "%U", "-n", "%C", "-D", "%MMB", "-R", "rusage[mem=%MMB:tmp=%TMB] span[hosts=1]", "-R", "select[mem>=%MMB]", "-R", "select[tmp>=%TMB]", "-R", "select[ncpus>=%C]"]</b></code>
 </pre>
 </notextile>
 
 Note that the default value for @BsubArgumentsList@ uses the @-o@ and @-e@ arguments to write stdout/stderr data to files in @/tmp@ on the compute nodes, which is helpful for troubleshooting installation/configuration problems. Ensure you have something in place to delete old files from @/tmp@, or adjust these arguments accordingly.
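+
+For illustration, with the example @BsubArgumentsList@ above, a container (hypothetical UUID shown) requesting 4 VCPUs, 16384 MB of RAM, and 7000 MB of tmp space would be submitted with flags roughly like:
+
+<notextile>
+<pre><code>bsub -o /tmp/crunch-run.%J.out -e /tmp/crunch-run.%J.err -J zzzzz-dz642-0123456789abcde -n 4 -D 16384MB -R "rusage[mem=16384MB:tmp=7000MB] span[hosts=1]" -R "select[mem>=16384MB]" -R "select[tmp>=7000MB]" -R "select[ncpus>=4]"
+</code></pre>
+</notextile>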
 
+h3(#BsubCUDAArguments). Containers.LSF.BsubCUDAArguments
+
+If the container requests access to GPUs (@runtime_constraints.cuda.device_count@ of the container request is greater than zero), the command line arguments in @BsubCUDAArguments@ will be added to the command line _after_ @BsubArgumentsList@.  This should consist of the additional @bsub@ flags your site requires to schedule the job on a node with GPU support.  Set @BsubCUDAArguments@ to an array of strings.  For example:
+
+<notextile>
+<pre>    Containers:
+      LSF:
+        <code class="userinput">BsubCUDAArguments: <b>["-gpu", "num=%G"]</b></code>
+</pre>
+</notextile>
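+
+With this setting, a container request with @runtime_constraints.cuda.device_count@ of 1 would have the following appended to the @bsub@ command line (illustrative):
+
+<notextile>
+<pre><code>-gpu num=1
+</code></pre>
+</notextile>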
 
 h3(#PollPeriod). Containers.PollInterval
 
diff --git a/doc/install/crunch2/install-compute-node-docker.html.textile.liquid b/doc/install/crunch2/install-compute-node-docker.html.textile.liquid
index 66bd85b7c..6204d524f 100644
--- a/doc/install/crunch2/install-compute-node-docker.html.textile.liquid
+++ b/doc/install/crunch2/install-compute-node-docker.html.textile.liquid
@@ -31,6 +31,8 @@ h2(#docker). Set up Docker
 
 See "Set up Docker":../install-docker.html
 
+{% include 'install_cuda' %}
+
 {% assign arvados_component = 'python-arvados-fuse crunch-run arvados-docker-cleaner' %}
 
 {% include 'install_compute_fuse' %}
diff --git a/doc/install/crunch2/install-compute-node-singularity.html.textile.liquid b/doc/install/crunch2/install-compute-node-singularity.html.textile.liquid
index 14f95e48f..e61b6cbe3 100644
--- a/doc/install/crunch2/install-compute-node-singularity.html.textile.liquid
+++ b/doc/install/crunch2/install-compute-node-singularity.html.textile.liquid
@@ -32,6 +32,8 @@ This page describes how to configure a compute node so that it can be used to ru
 
 {% include 'install_packages' %}
 
+{% include 'install_cuda' %}
+
 h2(#singularity). Set up Singularity
 
 Follow the "Singularity installation instructions":https://sylabs.io/guides/3.7/user-guide/quick_start.html. Make sure @singularity@ and @mksquashfs@ are working:
diff --git a/doc/user/cwl/cwl-extensions.html.textile.liquid b/doc/user/cwl/cwl-extensions.html.textile.liquid
index 0580dca28..dd78e989f 100644
--- a/doc/user/cwl/cwl-extensions.html.textile.liquid
+++ b/doc/user/cwl/cwl-extensions.html.textile.liquid
@@ -58,7 +58,7 @@ hints:
       property1: value1
       property2: $(inputs.value2)
 
-  arv:CUDARequirement:
+  cwltool:CUDARequirement:
     cudaVersionMin: "11.0"
     cudaComputeCapabilityMin: "9.0"
     deviceCountMin: 1
@@ -153,7 +153,7 @@ table(table table-bordered table-condensed).
 |_. Field |_. Type |_. Description |
 |processProperties|key-value map, or list of objects with the fields {propertyName, propertyValue}|The properties that will be set on the container request.  May include expressions that reference `$(inputs)` of the current workflow or tool.|
 
-h2(#CUDARequirement). arv:CUDARequirement
+h2(#CUDARequirement). cwltool:CUDARequirement
 
 Request support for Nvidia CUDA GPU acceleration in the container.  Assumes that the CUDA runtime (SDK) is installed in the container, and the host will inject the CUDA driver libraries into the container (equal to or later than the version requested).
 
diff --git a/doc/user/cwl/cwl-style.html.textile.liquid b/doc/user/cwl/cwl-style.html.textile.liquid
index bd07161ce..853ed3b3e 100644
--- a/doc/user/cwl/cwl-style.html.textile.liquid
+++ b/doc/user/cwl/cwl-style.html.textile.liquid
@@ -11,9 +11,36 @@ SPDX-License-Identifier: CC-BY-SA-3.0
 
 h2(#performance). Performance
 
-To get the best perfomance from your workflows, be aware of the following Arvados features, behaviors, and best practices:
+To get the best performance from your workflows, be aware of the following Arvados features, behaviors, and best practices.
 
-If you have a sequence of short-running steps (less than 1-2 minutes each), use the Arvados extension "arv:RunInSingleContainer":cwl-extensions.html#RunInSingleContainer to avoid scheduling and data transfer overhead by running all the steps together at once.  To use this feature, @cwltool@ must be installed in the container image.
+Does your application support NVIDIA GPU acceleration?  Use "cwltool:CUDARequirement":cwl-extensions.html#CUDARequirement to request nodes with GPUs.
+
+If you have a sequence of short-running steps (less than 1-2 minutes each), use the Arvados extension "arv:RunInSingleContainer":cwl-extensions.html#RunInSingleContainer to avoid scheduling and data transfer overhead by running all the steps together in the same container on the same node.  To use this feature, @cwltool@ must be installed in the container image.  Example:
+
+{% codeblock as yaml %}
+class: Workflow
+cwlVersion: v1.0
+$namespaces:
+  arv: "http://arvados.org/cwl#"
+inputs:
+  file: File
+outputs: []
+requirements:
+  SubworkflowFeatureRequirement: {}
+steps:
+  subworkflow-with-short-steps:
+    in:
+      file: file
+    out: [out]
+    # This hint indicates that the subworkflow should be bundled and
+    # run in a single container, instead of the normal behavior, which
+    # is to run each step in a separate container.  This greatly
+    # reduces overhead if you have a series of short jobs, without
+    # requiring any changes to the CWL definition of the subworkflow.
+    hints:
+      - class: arv:RunInSingleContainer
+    run: subworkflow-with-short-steps.cwl
+{% endcodeblock %}
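+
+For reference, @subworkflow-with-short-steps.cwl@ is an ordinary CWL workflow.  A minimal sketch of what it might look like (the step tool @short-step.cwl@ is hypothetical):
+
+{% codeblock as yaml %}
+class: Workflow
+cwlVersion: v1.0
+inputs:
+  file: File
+outputs:
+  out:
+    type: File
+    outputSource: step2/out
+steps:
+  step1:
+    in: {file: file}
+    out: [out]
+    run: short-step.cwl
+  step2:
+    in: {file: step1/out}
+    out: [out]
+    run: short-step.cwl
+{% endcodeblock %}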
 
 Avoid declaring @InlineJavascriptRequirement@ or @ShellCommandRequirement@ unless you specifically need them.  Don't include them "just in case" because they change the default behavior and may add extra overhead.
 
@@ -123,7 +150,7 @@ To write workflows that are easy to modify and portable across CWL runners (in t
 
 Workflows should always provide @DockerRequirement@ in the @hints@ or @requirements@ section.
 
-Build a reusable library of components.  Share tool wrappers and subworkflows between projects.  Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-language/workflows and tool registries such as "Dockstore":http://dockstore.org .
+Build a reusable library of components.  Share tool wrappers and subworkflows between projects.  Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-library and tool registries such as "Dockstore":http://dockstore.org .
 
 CommandLineTools wrapping custom scripts should represent the script as an input parameter with the script file as a default value.  Use @secondaryFiles@ for scripts that consist of multiple files.  For example:
 
