[ARVADOS] created: 48ae216660755039de37983d430b93dfcbc6ec56

git at public.curoverse.com
Tue Dec 1 10:38:17 EST 2015


        at  48ae216660755039de37983d430b93dfcbc6ec56 (commit)


commit 48ae216660755039de37983d430b93dfcbc6ec56
Author: Brett Smith <brett at curoverse.com>
Date:   Tue Dec 1 10:38:09 2015 -0500

    7870: Teach crunch-dispatch to fail jobs it already locked.
    
    The fail_job method was written on the assumption that the job is not
    yet locked.  The retry support added in #4410 breaks this assumption,
    because a job being retried is already locked by this crunch-dispatch.
    Teach fail_job to acquire the lock only when we don't already know we
    hold it.
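
A minimal, self-contained Ruby sketch of the conditional-locking pattern described above, for illustration only. FakeJob, SketchDispatcher, note_retry and the local AlreadyLockedError are hypothetical stand-ins for the real Job model, CrunchDispatch and ArvadosModel::AlreadyLockedError, and a Set stands in for @todo_job_retries (the real container may differ). The sketch assumes that lock raises whenever the job is already locked, even by the caller. That assumption is what makes an unconditional lock fail for retried jobs.

```ruby
require 'set'

# Hypothetical stand-in for ArvadosModel::AlreadyLockedError.
class AlreadyLockedError < StandardError; end

# Hypothetical job model with just enough behavior for fail_job.
class FakeJob
  attr_reader :uuid, :locked_by
  attr_accessor :state

  def initialize(uuid, locked_by: nil)
    @uuid = uuid
    @locked_by = locked_by
    @state = "Running"
  end

  # Assumption: locking fails if anyone already holds the lock,
  # including the caller itself.
  def lock(user_uuid)
    raise AlreadyLockedError, "#{@uuid} locked by #{@locked_by}" if @locked_by
    @locked_by = user_uuid
  end

  # Always succeeds in this sketch.
  def save
    true
  end
end

class SketchDispatcher
  attr_reader :messages

  def initialize(user_uuid)
    @user_uuid = user_uuid
    @todo_job_retries = Set.new # UUIDs of jobs this dispatcher already holds locked
    @messages = []
  end

  # Hypothetical helper: record that we hold the lock for a retried job.
  def note_retry(job)
    @todo_job_retries << job.uuid
  end

  # Mirrors the have_job_lock? helper added in the diff.
  def have_job_lock?(job)
    @todo_job_retries.include?(job.uuid)
  end

  # Lock only when we don't already know we hold the lock, then mark failed.
  def fail_job(job)
    unless have_job_lock?(job)
      begin
        job.lock(@user_uuid)
      rescue AlreadyLockedError
        @messages << "tried to fail #{job.uuid} but someone else holds the lock"
        return false
      end
    end

    job.state = "Failed"
    unless job.save
      @messages << "save failed setting #{job.uuid} to failed"
      return false
    end
    true
  end
end

dispatcher = SketchDispatcher.new("user-1")

retried = FakeJob.new("job-a", locked_by: "user-1")
dispatcher.note_retry(retried)
dispatcher.fail_job(retried) # lock is skipped; job becomes "Failed"

foreign = FakeJob.new("job-b", locked_by: "user-2")
dispatcher.fail_job(foreign) # lock raises; job stays "Running"

fresh = FakeJob.new("job-c")
dispatcher.fail_job(fresh)   # lock is acquired, then job becomes "Failed"
```

Checking have_job_lock? before calling lock leaves the "someone else holds the lock" path unchanged. It also lets the dispatcher fail jobs that it locked itself.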

diff --git a/services/api/lib/crunch_dispatch.rb b/services/api/lib/crunch_dispatch.rb
index bd1591d..858bb67 100644
--- a/services/api/lib/crunch_dispatch.rb
+++ b/services/api/lib/crunch_dispatch.rb
@@ -205,14 +205,18 @@ class CrunchDispatch
       $stderr.puts "dispatch: log.create failed"
     end
 
-    begin
-      job.lock @authorizations[job.uuid].user.uuid
-      job.state = "Failed"
-      if not job.save
-        $stderr.puts "dispatch: save failed setting job #{job.uuid} to failed"
+    if not have_job_lock?(job)
+      begin
+        job.lock @authorizations[job.uuid].user.uuid
+      rescue ArvadosModel::AlreadyLockedError
+        $stderr.puts "dispatch: tried to mark job #{job.uuid} as failed but it was already locked by someone else"
+        return
       end
-    rescue ArvadosModel::AlreadyLockedError
-      $stderr.puts "dispatch: tried to mark job #{job.uuid} as failed but it was already locked by someone else"
+    end
+
+    job.state = "Failed"
+    if not job.save
+      $stderr.puts "dispatch: save failed setting job #{job.uuid} to failed"
     end
   end
 
@@ -391,7 +395,7 @@ class CrunchDispatch
         cmd_args += ['--docker-bin', @docker_bin]
       end
 
-      if @todo_job_retries.include?(job.uuid)
+      if have_job_lock?(job)
         cmd_args << "--force-unlock"
       end
 
@@ -803,6 +807,12 @@ class CrunchDispatch
 
   protected
 
+  def have_job_lock?(job)
+    # Return true if the given job is locked by this crunch-dispatch, normally
+    # because we've run crunch-job for it.
+    @todo_job_retries.include?(job.uuid)
+  end
+
   def did_recently(thing, min_interval)
     if !@did_recently[thing] or @did_recently[thing] < Time.now - min_interval
       @did_recently[thing] = Time.now

-----------------------------------------------------------------------


hooks/post-receive
-- 




More information about the arvados-commits mailing list