[ARVADOS] updated: 1.1.3-73-g52e26b4

Git user git at public.curoverse.com
Mon Feb 26 11:30:50 EST 2018


Summary of changes:
 doc/_includes/_mount_types.liquid                  |  2 +-
 .../install-dispatch.html.textile.liquid           | 38 +++++++++-------------
 services/crunch-dispatch-slurm/priority.go         | 14 ++++++--
 services/crunch-dispatch-slurm/priority_test.go    | 22 ++++++-------
 services/crunch-dispatch-slurm/squeue_test.go      |  9 ++---
 5 files changed, 44 insertions(+), 41 deletions(-)

       via  52e26b4e8bbbf505d6641becc435a939cee8c285 (commit)
       via  6664c6b18ded2d97e53b0b0a853e2c7a1a86fe1c (commit)
      from  a837c67de6903827f7dfb3b19adfc82c30a87861 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 52e26b4e8bbbf505d6641becc435a939cee8c285
Author: Tom Clegg <tclegg at veritasgenetics.com>
Date:   Mon Feb 26 11:14:18 2018 -0500

    12552: Fix accidental textile link syntax.
    
    Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg at veritasgenetics.com>

diff --git a/doc/_includes/_mount_types.liquid b/doc/_includes/_mount_types.liquid
index fc8a799..edf8edf 100644
--- a/doc/_includes/_mount_types.liquid
+++ b/doc/_includes/_mount_types.liquid
@@ -64,7 +64,7 @@ When a container's output_path is a tmp mount backed by local disk, this output
 
 1. Only mount points of kind @collection@ are supported.
 
-2. Mount points underneath output_path which have "writable":true are copied into output_path during container initialization and may be updated, renamed, or deleted by the running container.  The original collection is not modified.  On container completion, files remaining in the output are saved to the output collection.   The mount at output_path must be big enough to accommodate copies of the inner writable mounts.
+2. Mount points underneath output_path which have @"writable":true@ are copied into output_path during container initialization and may be updated, renamed, or deleted by the running container.  The original collection is not modified.  On container completion, files remaining in the output are saved to the output collection.   The mount at output_path must be big enough to accommodate copies of the inner writable mounts.
 
 3. If any such mount points are configured as @exclude_from_output":true@, they will be excluded from the output.
 

commit 6664c6b18ded2d97e53b0b0a853e2c7a1a86fe1c
Author: Tom Clegg <tclegg at veritasgenetics.com>
Date:   Mon Feb 26 10:42:16 2018 -0500

    12552: Document PrioritySpread. Default to 10 if not configured.
    
    Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg at veritasgenetics.com>

diff --git a/doc/install/crunch2-slurm/install-dispatch.html.textile.liquid b/doc/install/crunch2-slurm/install-dispatch.html.textile.liquid
index 27f15b1..b3b59cb 100644
--- a/doc/install/crunch2-slurm/install-dispatch.html.textile.liquid
+++ b/doc/install/crunch2-slurm/install-dispatch.html.textile.liquid
@@ -69,8 +69,8 @@ Override Keep service discovery with a predefined list of Keep URIs. This can be
 
 <notextile>
 <pre><code class="userinput">Client:
-  APIHost: <b>zzzzz.arvadosapi.com</b>
-  AuthToken: <b>zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz</b>
+  APIHost: zzzzz.arvadosapi.com
+  AuthToken: zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
   KeepServiceURIs:
   - <b>http://127.0.0.1:25107</b>
 </code></pre>
@@ -81,10 +81,16 @@ h3. PollPeriod
 crunch-dispatch-slurm polls the API server periodically for new containers to run.  The @PollPeriod@ option controls how often this poll happens.  Set this to a string of numbers suffixed with one of the time units @ns@, @us@, @ms@, @s@, @m@, or @h@.  For example:
 
 <notextile>
-<pre><code class="userinput">Client:
-  APIHost: <b>zzzzz.arvadosapi.com</b>
-  AuthToken: <b>zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz</b>
-PollPeriod: <b>3m30s</b>
+<pre><code class="userinput">PollPeriod: <b>3m30s</b>
+</code></pre>
+</notextile>
+
+h3. PrioritySpread
+
+crunch-dispatch-slurm adjusts the "nice" values of its SLURM jobs to ensure containers are prioritized correctly relative to one another. If non-Arvados jobs run on your SLURM cluster, a lower @PrioritySpread@ helps Arvados containers compete with them. If you have an older SLURM system that limits nice values to 10000, a smaller @PrioritySpread@ can help avoid reaching that limit. In other cases, a larger value is beneficial because it reduces the total number of adjustments made by executing @scontrol@. The smallest usable value is @1@. The default value of @10@ is used if this option is zero or negative. Example:
+
+<notextile>
+<pre><code class="userinput">PrioritySpread: <b>1000</b>
 </code></pre>
 </notextile>
 
@@ -93,10 +99,7 @@ h3. SbatchArguments
 When crunch-dispatch-slurm invokes @sbatch@, you can add switches to the command by specifying @SbatchArguments@.  You can use this to send the jobs to specific cluster partitions or add resource requests.  Set @SbatchArguments@ to an array of strings.  For example:
 
 <notextile>
-<pre><code class="userinput">Client:
-  APIHost: <b>zzzzz.arvadosapi.com</b>
-  AuthToken: <b>zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz</b>
-SbatchArguments:
+<pre><code class="userinput">SbatchArguments:
 - <b>"--partition=PartitionName"</b>
 </code></pre>
 </notextile>
@@ -106,10 +109,7 @@ h3. CrunchRunCommand: Dispatch to SLURM cgroups
 If your SLURM cluster uses the @task/cgroup@ TaskPlugin, you can configure Crunch's Docker containers to be dispatched inside SLURM's cgroups.  This provides consistent enforcement of resource constraints.  To do this, use a crunch-dispatch-slurm configuration like the following:
 
 <notextile>
-<pre><code class="userinput">Client:
-  APIHost: <b>zzzzz.arvadosapi.com</b>
-  AuthToken: <b>zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz</b>
-CrunchRunCommand:
+<pre><code class="userinput">CrunchRunCommand:
 - <b>crunch-run</b>
 - <b>"-cgroup-parent-subsystem=memory"</b>
 </code></pre>
@@ -130,10 +130,7 @@ h3. CrunchRunCommand: Using host networking for containers
 Older Linux kernels (prior to 3.18) have bugs in network namespace handling which can lead to compute node lockups.  This by is indicated by blocked kernel tasks in "Workqueue: netns cleanup_net".   If you are experiencing this problem, as a workaround you can disable use of network namespaces by Docker across the cluster.  Be aware this reduces container isolation, which may be a security risk.
 
 <notextile>
-<pre><code class="userinput">Client:
-  APIHost: <b>zzzzz.arvadosapi.com</b>
-  AuthToken: <b>zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz</b>
-CrunchRunCommand:
+<pre><code class="userinput">CrunchRunCommand:
 - <b>crunch-run</b>
 - <b>"-container-enable-networking=always"</b>
 - <b>"-container-network-mode=host"</b>
@@ -145,10 +142,7 @@ h3. MinRetryPeriod: Rate-limit repeated attempts to start containers
 If SLURM is unable to run a container, the dispatcher will submit it again after the next PollPeriod. If PollPeriod is very short, this can be excessive. If MinRetryPeriod is set, the dispatcher will avoid submitting the same container to SLURM more than once in the given time span.
 
 <notextile>
-<pre><code class="userinput">Client:
-  APIHost: <b>zzzzz.arvadosapi.com</b>
-  AuthToken: <b>zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz</b>
-MinRetryPeriod: <b>30s</b>
+<pre><code class="userinput">MinRetryPeriod: <b>30s</b>
 </code></pre>
 </notextile>
 
diff --git a/services/crunch-dispatch-slurm/priority.go b/services/crunch-dispatch-slurm/priority.go
index 8c27043..2312ce5 100644
--- a/services/crunch-dispatch-slurm/priority.go
+++ b/services/crunch-dispatch-slurm/priority.go
@@ -4,19 +4,27 @@
 
 package main
 
+const defaultSpread int64 = 10
+
 // wantNice calculates appropriate nice values for a set of SLURM
 // jobs. The returned slice will have len(jobs) elements.
 //
-// spread is a non-negative amount of space to leave between adjacent
+// spread is a positive amount of space to leave between adjacent
 // priorities when making adjustments. Generally, increasing spread
 // reduces the total number of adjustments made. A smaller spread
 // produces lower nice values, which is useful for old SLURM versions
 // with a limited "nice" range and for sites where SLURM is also
 // running non-Arvados jobs with low nice values.
+//
+// If spread<1, a sensible default (10) is used.
 func wantNice(jobs []*slurmJob, spread int64) []int64 {
 	if len(jobs) == 0 {
 		return nil
 	}
+
+	if spread < 1 {
+		spread = defaultSpread
+	}
 	renice := make([]int64, len(jobs))
 
 	// highest usable priority (without going out of order)
@@ -27,13 +35,13 @@ func wantNice(jobs []*slurmJob, spread int64) []int64 {
 			// priority container gets the highest
 			// possible slurm priority.
 			target = job.priority + job.nice
-		} else if space := target - job.priority; space >= 0 && space < spread*10 {
+		} else if space := target - job.priority; space >= 0 && space < (spread-1)*10 {
 			// Ordering is correct, and interval isn't too
 			// large. Leave existing nice value alone.
 			renice[i] = job.nice
 			target = job.priority
 		} else {
-			target -= spread
+			target -= (spread - 1)
 			if possible := job.priority + job.nice; target > possible {
 				// renice[i] is already 0, that's the
 				// best we can do
diff --git a/services/crunch-dispatch-slurm/priority_test.go b/services/crunch-dispatch-slurm/priority_test.go
index bc9c4dc..e80984c 100644
--- a/services/crunch-dispatch-slurm/priority_test.go
+++ b/services/crunch-dispatch-slurm/priority_test.go
@@ -41,7 +41,7 @@ func (s *PrioritySuite) TestReniceCorrect(c *C) {
 				{priority: 4294000111, nice: 10000},
 				{priority: 4294000111, nice: 10000},
 			},
-			[]int64{0, 11, 22, 33},
+			[]int64{0, 10, 20, 30},
 		},
 		{ // smaller spread than necessary, but correctly ordered => leave nice alone
 			10,
@@ -56,10 +56,10 @@ func (s *PrioritySuite) TestReniceCorrect(c *C) {
 			10,
 			[]*slurmJob{
 				{priority: 4294000144, nice: 0},
-				{priority: 4294000122, nice: 22},
-				{priority: 4294000111, nice: 33},
+				{priority: 4294000122, nice: 20},
+				{priority: 4294000111, nice: 30},
 			},
-			[]int64{0, 22, 33},
+			[]int64{0, 20, 30},
 		},
 		{ // > 10x spread => reduce nice to achieve spread=10
 			10,
@@ -68,7 +68,7 @@ func (s *PrioritySuite) TestReniceCorrect(c *C) {
 				{priority: 3000, nice: 999},  // max pri 3999
 				{priority: 2000, nice: 1998}, // max pri 3998
 			},
-			[]int64{0, 10, 20},
+			[]int64{0, 9, 18},
 		},
 		{ // > 10x spread, but spread=10 is impossible without negative nice
 			10,
@@ -77,19 +77,19 @@ func (s *PrioritySuite) TestReniceCorrect(c *C) {
 				{priority: 3000, nice: 500},  // max pri 3500
 				{priority: 2000, nice: 2000}, // max pri 4000
 			},
-			[]int64{0, 0, 511},
+			[]int64{0, 0, 510},
 		},
-		{ // reorder
-			10,
+		{ // default spread, needs reorder
+			0,
 			[]*slurmJob{
 				{priority: 4000, nice: 0}, // max pri 4000
 				{priority: 5000, nice: 0}, // max pri 5000
 				{priority: 6000, nice: 0}, // max pri 6000
 			},
-			[]int64{0, 1011, 2022},
+			[]int64{0, 1000 + defaultSpread, 2000 + defaultSpread*2},
 		},
-		{ // zero spread
-			0,
+		{ // minimum spread
+			1,
 			[]*slurmJob{
 				{priority: 4000, nice: 0}, // max pri 4000
 				{priority: 5000, nice: 0}, // max pri 5000
diff --git a/services/crunch-dispatch-slurm/squeue_test.go b/services/crunch-dispatch-slurm/squeue_test.go
index 4df469b..f1ffda9 100644
--- a/services/crunch-dispatch-slurm/squeue_test.go
+++ b/services/crunch-dispatch-slurm/squeue_test.go
@@ -23,24 +23,25 @@ func (s *SqueueSuite) TestReniceAll(c *C) {
 		expect [][]string
 	}{
 		{
-			spread: 0,
+			spread: 1,
 			squeue: uuids[0] + " 10000 4294000000\n",
 			want:   map[string]int64{uuids[0]: 1},
 			expect: [][]string{{uuids[0], "0"}},
 		},
 		{ // fake0 priority is too high
-			spread: 0,
+			spread: 1,
 			squeue: uuids[0] + " 10000 4294000777\n" + uuids[1] + " 10000 4294000444\n",
 			want:   map[string]int64{uuids[0]: 1, uuids[1]: 999},
 			expect: [][]string{{uuids[1], "0"}, {uuids[0], "334"}},
 		},
-		{ // non-zero spread
+		{ // specify spread
 			spread: 100,
 			squeue: uuids[0] + " 10000 4294000777\n" + uuids[1] + " 10000 4294000444\n",
 			want:   map[string]int64{uuids[0]: 1, uuids[1]: 999},
-			expect: [][]string{{uuids[1], "0"}, {uuids[0], "434"}},
+			expect: [][]string{{uuids[1], "0"}, {uuids[0], "433"}},
 		},
 		{ // ignore fake2 because SetPriority() not called
+			spread: 1,
 			squeue: uuids[0] + " 10000 4294000000\n" + uuids[1] + " 10000 4294000111\n" + uuids[2] + " 10000 4294000222\n",
 			want:   map[string]int64{uuids[0]: 999, uuids[1]: 1},
 			expect: [][]string{{uuids[0], "0"}, {uuids[1], "112"}},

-----------------------------------------------------------------------
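
For readers who want to try the new spread handling outside the
dispatcher, here is a minimal standalone sketch in Go. It is not the
committed file. The head of wantNice follows the priority.go hunk
above, but the end of the loop (the two assignments after "best we can
do" and the final "target--") is not shown in the diff and is an
assumption, chosen so that the results match the updated expectations
in priority_test.go. The slurmJob struct here is also a stand-in with
only the two fields the hunk uses.

```go
package main

import "fmt"

// slurmJob is a stand-in with only the fields used below; the real
// struct in crunch-dispatch-slurm may have more.
type slurmJob struct {
	priority int64 // current SLURM priority
	nice     int64 // current nice value; priority+nice is the "max pri"
}

const defaultSpread int64 = 10

// wantNice returns one nice value per job. jobs is expected to be
// ordered from most to least important.
func wantNice(jobs []*slurmJob, spread int64) []int64 {
	if len(jobs) == 0 {
		return nil
	}
	if spread < 1 {
		// Zero or negative means "not configured".
		spread = defaultSpread
	}
	renice := make([]int64, len(jobs))

	// highest usable priority (without going out of order)
	var target int64
	for i, job := range jobs {
		if i == 0 {
			// renice[0] stays zero, so the first job gets
			// its highest possible priority.
			target = job.priority + job.nice
		} else if space := target - job.priority; space >= 0 && space < (spread-1)*10 {
			// Ordering is correct and the gap is small
			// enough: keep the existing nice value.
			renice[i] = job.nice
			target = job.priority
		} else {
			target -= (spread - 1)
			if possible := job.priority + job.nice; target > possible {
				// Assumed tail: nice 0 is the best we can do.
				target = possible
			} else {
				// Assumed tail: lower the job to the target.
				renice[i] = possible - target
			}
		}
		// Assumed: the next job must rank strictly lower.
		target--
	}
	return renice
}

func main() {
	// Same input as the "default spread, needs reorder" test case.
	jobs := []*slurmJob{
		{priority: 4000, nice: 0},
		{priority: 5000, nice: 0},
		{priority: 6000, nice: 0},
	}
	fmt.Println(wantNice(jobs, 0)) // spread 0 falls back to defaultSpread
	fmt.Println(wantNice(jobs, 1)) // minimum usable spread
}
```

With spread 0 the sketch falls back to defaultSpread and leaves a gap
of 10 between adjacent jobs. With spread 1 it leaves the minimum gap
of 1, which keeps nice values as low as the ordering allows.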


hooks/post-receive
-- 



