[arvados] updated: 2.1.0-2616-g013413d9b

git repository hosting git at public.arvados.org
Thu Jun 23 13:47:09 UTC 2022


Summary of changes:
 doc/_config.yml                          |  1 +
 doc/architecture/hpc.html.textile.liquid | 29 +++++++++++++++++++++++++++++
 lib/dispatchcloud/worker/worker.go       |  6 ++++++
 3 files changed, 36 insertions(+)
 create mode 100644 doc/architecture/hpc.html.textile.liquid

       via  013413d9b0b9fa61ac2aefadb43e6b3dc6c2c7b1 (commit)
       via  13e15fd1404d82b0ee35e9dcc36a686f71a9ffbb (commit)
      from  c2ac4e7b6cd0e0ab4d8ae5dfe0d426a35d5ff875 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 013413d9b0b9fa61ac2aefadb43e6b3dc6c2c7b1
Author: Tom Clegg <tom at curii.com>
Date:   Thu Jun 23 09:42:25 2022 -0400

    19166: Allow multiple clusters to use loopback driver on same host.
    
    If they don't ignore foreign UUIDs, they kill one another's processes
    because A's container is never in B's queue.
    
    Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom at curii.com>

diff --git a/lib/dispatchcloud/worker/worker.go b/lib/dispatchcloud/worker/worker.go
index 1c8d62c20..277448c74 100644
--- a/lib/dispatchcloud/worker/worker.go
+++ b/lib/dispatchcloud/worker/worker.go
@@ -418,6 +418,12 @@ func (wkr *worker) probeRunning() (running []string, reportsBroken, ok bool) {
 			// empty string following final newline
 		} else if s == "broken" {
 			reportsBroken = true
+		} else if !strings.HasPrefix(s, wkr.cluster.ClusterID) {
+			// Ignore crunch-run processes that belong to
+			// a different cluster (this arises in
+			// multi-cluster test cases that use the
+			// loopback driver)
+			continue
 		} else if toks := strings.Split(s, " "); len(toks) == 1 {
 			running = append(running, s)
 		} else if toks[1] == "stale" {
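
Note on the hunk above: the new branch skips any probe line that does not start with the local cluster ID, so a process owned by another cluster is never treated as "running but not in my queue". The following is a minimal standalone sketch of that filtering idea, not the Arvados implementation. The function name filterRunning, the string-based report format, and the sample UUIDs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// filterRunning is a hypothetical helper (not part of Arvados). It scans a
// newline-separated probe report and returns only the single-token entries
// whose prefix matches clusterID, plus whether a "broken" line was seen.
func filterRunning(report, clusterID string) (running []string, reportsBroken bool) {
	for _, s := range strings.Split(report, "\n") {
		switch {
		case s == "":
			// Empty string following the final newline.
		case s == "broken":
			reportsBroken = true
		case !strings.HasPrefix(s, clusterID):
			// Entry belongs to a different cluster: ignore it.
		default:
			// Entries with extra tokens (such as the "stale" marker
			// handled in the diff) are skipped in this sketch.
			if toks := strings.Split(s, " "); len(toks) == 1 {
				running = append(running, s)
			}
		}
	}
	return running, reportsBroken
}

func main() {
	// Sample report: one local entry, one foreign entry, one "broken" line.
	report := "zzzzz-dz642-000000000000001\nyyyyy-dz642-000000000000002\nbroken\n"
	running, broken := filterRunning(report, "zzzzz")
	fmt.Println(running, broken)
}
```

As in the diff, the "broken" check comes before the prefix check, so a broken report is still recorded even though the word itself does not start with the cluster ID.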

commit 13e15fd1404d82b0ee35e9dcc36a686f71a9ffbb
Author: Tom Clegg <tom at curii.com>
Date:   Thu Jun 23 00:56:31 2022 -0400

    19166: Explain HPC container shell in architecture docs.
    
    Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom at curii.com>

diff --git a/doc/_config.yml b/doc/_config.yml
index 7c5e6d986..2f3133618 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -161,6 +161,7 @@ navbar:
     - Computation with Crunch:
       - api/execution.html.textile.liquid
       - architecture/dispatchcloud.html.textile.liquid
+      - architecture/hpc.html.textile.liquid
       - architecture/singularity.html.textile.liquid
     - Other:
       - api/permission-model.html.textile.liquid
diff --git a/doc/architecture/hpc.html.textile.liquid b/doc/architecture/hpc.html.textile.liquid
new file mode 100644
index 000000000..03a464971
--- /dev/null
+++ b/doc/architecture/hpc.html.textile.liquid
@@ -0,0 +1,29 @@
+---
+layout: default
+navsection: architecture
+title: Dispatching containers to HPC
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+Arvados can be configured to run containers on an HPC cluster using Slurm or LSF, as an alternative to "dispatching to cloud VMs":dispatchcloud.html.
+
+In this configuration, the appropriate Arvados dispatcher service -- @crunch-dispatch-slurm@ or @arvados-dispatch-lsf@ -- picks up each container as it appears in the Arvados queue and submits a short shell script as a batch job to the HPC job queue. The shell script executes the @crunch-run@ container supervisor which retrieves the container specification from the Arvados controller, starts an arv-mount process, runs the container using @docker exec@ or @singularity exec@, and sends updates (logs, outputs, exit code, etc.) back to the Arvados controller.
+
+h2. Container communication channel (reverse https tunnel)
+
+The crunch-run program runs a gateway server to facilitate the “container shell” feature. However, depending on the site's network topology, the Arvados controller may not be able to connect directly to the compute node where a given crunch-run process is running.
+
+Instead, in the HPC configuration, crunch-run connects to the Arvados controller at startup and sets up a multiplexed tunnel, allowing the controller process to connect to crunch-run's gateway server without initiating a connection to the compute node, or even knowing the compute node's IP address.
+
+This means that when a client requests a container shell connection, the traffic goes through two or three servers:
+# The client connects to a controller host C1.
+# If the multiplexed tunnel is connected to a different controller host C2, then C1 proxies the incoming request to C2, using C2's InternalURL.
+# The controller host (C1 or C2) uses the multiplexed tunnel to connect to crunch-run's container gateway.
+
+h2. Scaling
+
+The @API.MaxConcurrentRequests@ configuration should not be set too low, or the long-lived tunnel connections can starve other clients.
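
Note on the new hpc.html.textile.liquid page: the numbered list in its tunnel section describes a path of two or three servers for a container shell connection. That rule can be sketched as a small pure function. This is only an illustration of the hop logic as documented. The function name shellRoute and the host labels are hypothetical, and the real controller code is not shown in this commit.

```go
package main

import "fmt"

// shellRoute is a hypothetical helper (not part of Arvados). Given the
// controller host that received the client request and the controller host
// that holds the multiplexed tunnel, it lists the servers the traffic
// passes through, in order.
func shellRoute(receivedBy, tunnelOwner string) []string {
	hops := []string{receivedBy}
	if tunnelOwner != receivedBy {
		// The receiving controller proxies the request to the controller
		// that owns the tunnel (via its InternalURL, per the doc text).
		hops = append(hops, tunnelOwner)
	}
	// The tunnel-owning controller reaches crunch-run's gateway through
	// the tunnel, without dialing the compute node itself.
	return append(hops, "crunch-run gateway")
}

func main() {
	fmt.Println(shellRoute("C1", "C1")) // tunnel is on the receiving host
	fmt.Println(shellRoute("C1", "C2")) // tunnel is on another controller
}
```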

-----------------------------------------------------------------------


hooks/post-receive
-- 