[ARVADOS] updated: 3d371ed190546daf09f044f3a1dbcecf21aa28c7
git at public.curoverse.com
Mon Jun 9 15:12:54 EDT 2014
Summary of changes:
services/api/script/crunch-dispatch.rb | 30 +++++++++++-----------
.../arvados/v1/collections_controller_test.rb | 29 ++++++++++++++++-----
2 files changed, 37 insertions(+), 22 deletions(-)
via 3d371ed190546daf09f044f3a1dbcecf21aa28c7 (commit)
via 139728cc017e87f424b52d93f20ed680f0adfc62 (commit)
via f0d8ab52b77f74e9294fe634207ce6e1ff9748a1 (commit)
via 488a811374ff4bdeed9f2f2f57d9ef31d9369b5b (commit)
via 2f3d49bde80526060d3337f13dfa91cd581ac222 (commit)
via cea92754dfacf2b409d1f5b45dd0775fc44c842d (commit)
from 34b16e07d26438d6e7736cfa5a2e0c100e9a67c7 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
commit 3d371ed190546daf09f044f3a1dbcecf21aa28c7
Merge: 139728c f0d8ab5
Author: Brett Smith <brett at curoverse.com>
Date: Mon Jun 9 15:13:43 2014 -0400
Merge branch 'master' into 2880-crunch-dispatch-node-constraints-wip
commit 139728cc017e87f424b52d93f20ed680f0adfc62
Author: Brett Smith <brett at curoverse.com>
Date: Mon Jun 9 15:10:50 2014 -0400
2880: Avoid long sleeps in crunch-dispatch.
From feedback in refs #2880. Now instead of sleeping, we set a
deadline that decides whether to break or continue through start_jobs'
main loop.
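
The deadline approach described above can be sketched in isolation as
follows. This is a minimal illustration, not the Arvados code: the class
name DeadlineDispatcher, the idle-node counter, the injectable clock, and
the @waited flag (a one-shot stand-in for did_recently) are all
hypothetical and exist only to make the example self-contained.

```ruby
# Hypothetical, simplified model of a dispatcher that sets a deadline
# instead of sleeping. None of these names come from crunch-dispatch.rb
# except @node_wait_deadline, nodes_available_for_job and start_jobs.
class DeadlineDispatcher
  attr_reader :started

  def initialize(idle_nodes, wait_seconds: 300, clock: -> { Time.now })
    @idle_nodes = idle_nodes      # assumed model: a plain count of idle nodes
    @wait_seconds = wait_seconds
    @clock = clock                # injectable so the example needs no real waiting
    @node_wait_deadline = nil
    @waited = false               # one-shot stand-in for did_recently(...)
    @started = []
  end

  # Return the number of nodes to use, or nil if there are too few.
  # On the first shortfall, record a deadline rather than sleeping.
  def nodes_available_for_job(job)
    return job[:min_nodes] if job[:min_nodes] <= @idle_nodes
    unless @waited
      @waited = true
      @node_wait_deadline = @clock.call + @wait_seconds
    end
    nil
  end

  def start_jobs(queue)
    queue.each do |job|
      needed = nodes_available_for_job(job)
      if needed.nil?
        # Before the deadline: stop the pass so later jobs do not take
        # the idle nodes. After it: skip this job and try the others.
        break if @node_wait_deadline && @clock.call < @node_wait_deadline
        next
      end
      @idle_nodes -= needed
      @started << job[:uuid]
    end
  end
end

# Usage: drive two passes with a fake clock.
now = Time.at(0)
dispatcher = DeadlineDispatcher.new(2, clock: -> { now })
queue = [{ uuid: "big", min_nodes: 4 }, { uuid: "small", min_nodes: 1 }]
dispatcher.start_jobs(queue)   # "big" cannot run; the pass stops early
now = Time.at(301)
dispatcher.start_jobs(queue)   # deadline has passed; "small" may start
```

Because each pass returns immediately, the caller's main loop stays
responsive between passes, which is the point of replacing the sleep.
The sketch guards against a nil deadline explicitly; whether the real
script needs such a guard depends on how did_recently behaves, which is
not shown in this message.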
diff --git a/services/api/script/crunch-dispatch.rb b/services/api/script/crunch-dispatch.rb
index 247c0e6..59e3aff 100755
--- a/services/api/script/crunch-dispatch.rb
+++ b/services/api/script/crunch-dispatch.rb
@@ -143,28 +143,22 @@ class Dispatcher
def nodes_available_for_job(job)
# Check if there are enough idle nodes with the Job's minimum
# hardware requirements to run it. If so, return an array of
- # their names. If not, we'll wait a little bit to see if the Node
- # Manager makes some available--up to five minutes every
- # hour--before returning nil.
+ # their names. If not, up to once per hour, signal start_jobs to
+ # hold off launching Jobs. This delay is meant to give the Node
+ # Manager an opportunity to make new resources available for new
+ # Jobs.
#
# The exact timing parameters here might need to be adjusted for
# the best balance between helping the longest-waiting Jobs run,
# and making efficient use of immediately available resources.
# These are all just first efforts until we have more data to work
# with.
- if nodelist = nodes_available_for_job_now(job)
- nodelist
- elsif did_recently(:wait_for_available_nodes, 3600)
- nil
- else
+ nodelist = nodes_available_for_job_now(job)
+ if nodelist.nil? and not did_recently(:wait_for_available_nodes, 3600)
$stderr.puts "dispatch: waiting for nodes for #{job.uuid}"
- deadline = Time.now + 300
- while (Time.now < deadline) and not $signal[:term]
- sleep(60)
- break if nodelist = nodes_available_for_job_now(job)
- end
- nodelist
+ @node_wait_deadline = Time.now + 5.minutes
end
+ nodelist
end
def start_jobs
@@ -177,7 +171,13 @@ class Dispatcher
cmd_args = []
when :slurm_immediate
nodelist = nodes_available_for_job(job)
- next if nodelist.nil?
+ if nodelist.nil?
+ if Time.now < @node_wait_deadline
+ break
+ else
+ next
+ end
+ end
cmd_args = ["salloc",
"--chdir=/",
"--immediate",
-----------------------------------------------------------------------
hooks/post-receive