[ARVADOS] updated: fc5257c18b24ab0e28b248655dcabfafe9665bf3

git at public.curoverse.com
Tue Dec 1 16:22:23 EST 2015


Summary of changes:
 services/api/lib/crunch_dispatch.rb | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

       via  fc5257c18b24ab0e28b248655dcabfafe9665bf3 (commit)
       via  44995fc2895a304737e324ea05f7e75e87f1458c (commit)
      from  8788c145b860e19a1f04c4dc6abdcda14c859403 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit fc5257c18b24ab0e28b248655dcabfafe9665bf3
Merge: 8788c14 44995fc
Author: Brett Smith <brett at curoverse.com>
Date:   Tue Dec 1 16:21:43 2015 -0500

    Merge branch '7870-crunch-dispatch-retry-fail-lock-wip'
    
    Closes #7870, #7877.


commit 44995fc2895a304737e324ea05f7e75e87f1458c
Author: Brett Smith <brett at curoverse.com>
Date:   Tue Dec 1 10:38:09 2015 -0500

    7870: Teach crunch-dispatch to fail jobs it already locked.
    
    The fail_job method has been written with the assumption that the job
    should be unlocked.  The retry support added in #4410 breaks this
    assumption.  Teach fail_job to acquire the lock only when we don't
    know we already have it.
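The locking rule this commit describes can be sketched in plain Ruby. This is a minimal, self-contained illustration, not the Arvados code. `FakeJob`, `DispatcherSketch`, `remember_locked_job`, and the standalone `AlreadyLockedError` are hypothetical stand-ins for the real `Job` model, `CrunchDispatch`, and `ArvadosModel::AlreadyLockedError`. The sketch also assumes that `lock` raises whenever the job is already locked, even by the caller. Under that assumption, calling `lock` unconditionally on a job the dispatcher is retrying would raise, and the job would never be marked failed. Checking `have_job_lock?` first avoids that.

```ruby
# Hypothetical stand-in for ArvadosModel::AlreadyLockedError.
class AlreadyLockedError < StandardError; end

# Hypothetical stand-in for the Job model. Assumption: lock raises if the
# job is already locked by anyone, including the caller itself.
class FakeJob
  attr_reader :uuid, :locked_by
  attr_accessor :state

  def initialize(uuid, locked_by: nil)
    @uuid = uuid
    @locked_by = locked_by
    @state = "Queued"
  end

  def lock(user_uuid)
    raise AlreadyLockedError, "#{@uuid} is locked by #{@locked_by}" if @locked_by
    @locked_by = user_uuid
  end

  def save
    true # pretend the database write always succeeds
  end
end

# Hypothetical, stripped-down dispatcher showing only the fail_job logic.
class DispatcherSketch
  def initialize(user_uuid)
    @user_uuid = user_uuid
    # Assumption: keyed by job UUID; holds jobs this dispatcher already locked.
    @todo_job_retries = {}
  end

  # Hypothetical helper: record that we hold the lock (e.g. after running crunch-job).
  def remember_locked_job(job)
    @todo_job_retries[job.uuid] = job
  end

  def have_job_lock?(job)
    @todo_job_retries.include?(job.uuid)
  end

  # Returns true if the job was marked Failed and saved, false otherwise.
  def fail_job(job)
    unless have_job_lock?(job)
      begin
        job.lock(@user_uuid)
      rescue AlreadyLockedError
        $stderr.puts "dispatch: job #{job.uuid} is locked by someone else; not failing it"
        return false
      end
    end
    job.state = "Failed"
    job.save
  end
end

dispatcher = DispatcherSketch.new("dispatcher-user")
retried = FakeJob.new("job-retry", locked_by: "dispatcher-user")
dispatcher.remember_locked_job(retried)
dispatcher.fail_job(retried) # => true; retried.state is now "Failed"
```

A job locked by some other client is left alone. A job that nobody holds is locked first and then failed, which matches the behavior before this change.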

diff --git a/services/api/lib/crunch_dispatch.rb b/services/api/lib/crunch_dispatch.rb
index bd1591d..858bb67 100644
--- a/services/api/lib/crunch_dispatch.rb
+++ b/services/api/lib/crunch_dispatch.rb
@@ -205,14 +205,18 @@ class CrunchDispatch
       $stderr.puts "dispatch: log.create failed"
     end
 
-    begin
-      job.lock @authorizations[job.uuid].user.uuid
-      job.state = "Failed"
-      if not job.save
-        $stderr.puts "dispatch: save failed setting job #{job.uuid} to failed"
+    if not have_job_lock?(job)
+      begin
+        job.lock @authorizations[job.uuid].user.uuid
+      rescue ArvadosModel::AlreadyLockedError
+        $stderr.puts "dispatch: tried to mark job #{job.uuid} as failed but it was already locked by someone else"
+        return
       end
-    rescue ArvadosModel::AlreadyLockedError
-      $stderr.puts "dispatch: tried to mark job #{job.uuid} as failed but it was already locked by someone else"
+    end
+
+    job.state = "Failed"
+    if not job.save
+      $stderr.puts "dispatch: save failed setting job #{job.uuid} to failed"
     end
   end
 
@@ -391,7 +395,7 @@ class CrunchDispatch
         cmd_args += ['--docker-bin', @docker_bin]
       end
 
-      if @todo_job_retries.include?(job.uuid)
+      if have_job_lock?(job)
         cmd_args << "--force-unlock"
       end
 
@@ -803,6 +807,12 @@ class CrunchDispatch
 
   protected
 
+  def have_job_lock?(job)
+    # Return true if the given job is locked by this crunch-dispatch, normally
+    # because we've run crunch-job for it.
+    @todo_job_retries.include?(job.uuid)
+  end
+
   def did_recently(thing, min_interval)
     if !@did_recently[thing] or @did_recently[thing] < Time.now - min_interval
       @did_recently[thing] = Time.now

-----------------------------------------------------------------------


hooks/post-receive
-- 
