[ARVADOS] updated: 3d371ed190546daf09f044f3a1dbcecf21aa28c7

git at public.curoverse.com
Mon Jun 9 15:12:54 EDT 2014


Summary of changes:
 services/api/script/crunch-dispatch.rb             | 30 +++++++++++-----------
 .../arvados/v1/collections_controller_test.rb      | 29 ++++++++++++++++-----
 2 files changed, 37 insertions(+), 22 deletions(-)

       via  3d371ed190546daf09f044f3a1dbcecf21aa28c7 (commit)
       via  139728cc017e87f424b52d93f20ed680f0adfc62 (commit)
       via  f0d8ab52b77f74e9294fe634207ce6e1ff9748a1 (commit)
       via  488a811374ff4bdeed9f2f2f57d9ef31d9369b5b (commit)
       via  2f3d49bde80526060d3337f13dfa91cd581ac222 (commit)
       via  cea92754dfacf2b409d1f5b45dd0775fc44c842d (commit)
      from  34b16e07d26438d6e7736cfa5a2e0c100e9a67c7 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 3d371ed190546daf09f044f3a1dbcecf21aa28c7
Merge: 139728c f0d8ab5
Author: Brett Smith <brett at curoverse.com>
Date:   Mon Jun 9 15:13:43 2014 -0400

    Merge branch 'master' into 2880-crunch-dispatch-node-constraints-wip


commit 139728cc017e87f424b52d93f20ed680f0adfc62
Author: Brett Smith <brett at curoverse.com>
Date:   Mon Jun 9 15:10:50 2014 -0400

    2880: Avoid long sleeps in crunch-dispatch.
    
    From feedback in refs #2880.  Now instead of sleeping, we set a
    deadline that decides whether to break or continue through start_jobs'
    main loop.

diff --git a/services/api/script/crunch-dispatch.rb b/services/api/script/crunch-dispatch.rb
index 247c0e6..59e3aff 100755
--- a/services/api/script/crunch-dispatch.rb
+++ b/services/api/script/crunch-dispatch.rb
@@ -143,28 +143,22 @@ class Dispatcher
   def nodes_available_for_job(job)
     # Check if there are enough idle nodes with the Job's minimum
     # hardware requirements to run it.  If so, return an array of
-    # their names.  If not, we'll wait a little bit to see if the Node
-    # Manager makes some available--up to five minutes every
-    # hour--before returning nil.
+    # their names.  If not, up to once per hour, signal start_jobs to
+    # hold off launching Jobs.  This delay is meant to give the Node
+    # Manager an opportunity to make new resources available for new
+    # Jobs.
     #
     # The exact timing parameters here might need to be adjusted for
     # the best balance between helping the longest-waiting Jobs run,
     # and making efficient use of immediately available resources.
     # These are all just first efforts until we have more data to work
     # with.
-    if nodelist = nodes_available_for_job_now(job)
-      nodelist
-    elsif did_recently(:wait_for_available_nodes, 3600)
-      nil
-    else
+    nodelist = nodes_available_for_job_now(job)
+    if nodelist.nil? and not did_recently(:wait_for_available_nodes, 3600)
       $stderr.puts "dispatch: waiting for nodes for #{job.uuid}"
-      deadline = Time.now + 300
-      while (Time.now < deadline) and not $signal[:term]
-        sleep(60)
-        break if nodelist = nodes_available_for_job_now(job)
-      end
-      nodelist
+      @node_wait_deadline = Time.now + 5.minutes
     end
+    nodelist
   end
 
   def start_jobs
@@ -177,7 +171,13 @@ class Dispatcher
         cmd_args = []
       when :slurm_immediate
         nodelist = nodes_available_for_job(job)
-        next if nodelist.nil?
+        if nodelist.nil?
+          if Time.now < @node_wait_deadline
+            break
+          else
+            next
+          end
+        end
         cmd_args = ["salloc",
                     "--chdir=/",
                     "--immediate",

-----------------------------------------------------------------------
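
The deadline approach described in commit 139728c can be sketched in isolation as follows. This is a minimal illustration, not the code from crunch-dispatch.rb: the class name DeadlineDispatcher, the injectable clock, the hash-based jobs, and the idle_nodes array are all assumptions made so the example runs on plain Ruby without Rails (hence 300 seconds rather than 5.minutes). It also guards against a deadline that has not been set yet, which the real script may not need.

```ruby
# Hypothetical, simplified stand-in for the Dispatcher in the diff above.
class DeadlineDispatcher
  WAIT_SECONDS  = 300   # length of the wait window (5 minutes, as in the commit)
  WAIT_INTERVAL = 3600  # open at most one wait window per hour

  attr_reader :started

  # clock is injectable so the behavior can be exercised without real waiting.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @node_wait_deadline = nil
    @last_wait_started = nil
    @started = []
  end

  # Return an array of node names if enough idle nodes exist, else nil.
  # On nil, open a wait window unless one was opened within the last hour.
  def nodes_available_for_job(job, idle_nodes)
    nodelist = nil
    if idle_nodes.size >= job[:min_nodes]
      nodelist = idle_nodes.shift(job[:min_nodes])
    end
    now = @clock.call
    recently = @last_wait_started && (now - @last_wait_started) < WAIT_INTERVAL
    if nodelist.nil? && !recently
      @last_wait_started = now
      @node_wait_deadline = now + WAIT_SECONDS
    end
    nodelist
  end

  # One pass over the queue. Instead of sleeping, consult the deadline:
  # inside the window, stop launching (break); after it, skip the job (next).
  def start_jobs(queue, idle_nodes)
    queue.each do |job|
      nodelist = nodes_available_for_job(job, idle_nodes)
      if nodelist.nil?
        if @node_wait_deadline && @clock.call < @node_wait_deadline
          break
        else
          next
        end
      end
      @started << job[:uuid]
    end
  end
end
```

With a queue holding a large job ahead of a small one, a pass made inside the wait window launches nothing, which leaves the idle nodes free for the longest-waiting job. A pass made after the deadline skips the large job and launches the small one. Because start_jobs returns instead of sleeping, the caller's main loop stays responsive to signals between passes.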


hooks/post-receive
-- 



