[ARVADOS] updated: 8bc77d6ee612217cfb50bca997ce3b94c19637e9
git at public.curoverse.com
Thu Sep 25 14:27:04 EDT 2014
Summary of changes:
services/api/script/crunch-dispatch.rb | 51 ++++++++++++++++------------------
1 file changed, 24 insertions(+), 27 deletions(-)
via 8bc77d6ee612217cfb50bca997ce3b94c19637e9 (commit)
from caa5dd776dfad5e50592a5cc2824c70ac3474b46 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
commit 8bc77d6ee612217cfb50bca997ce3b94c19637e9
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date: Thu Sep 25 14:26:58 2014 -0400
Setting running = false is ok for successful jobs, so take out extra exit_code == 0 check.
diff --git a/services/api/script/crunch-dispatch.rb b/services/api/script/crunch-dispatch.rb
index 68af85d..c152841 100755
--- a/services/api/script/crunch-dispatch.rb
+++ b/services/api/script/crunch-dispatch.rb
@@ -422,34 +422,31 @@ class Dispatcher
exit_status = j_done[:wait_thr].value.exitstatus
jobrecord = Job.find_by_uuid(job_done.uuid)
- if exit_status != 0
- # crunch-job exited with some kind of failure.
- if exit_status != 75 and jobrecord.started_at
- # Clean up state fields in case crunch-job exited without
- # putting the job in a suitable "finished" state.
- jobrecord.running = false
- jobrecord.finished_at ||= Time.now
- if jobrecord.success.nil?
- jobrecord.success = false
- end
- jobrecord.save!
- else
- # Don't fail the job if crunch-job didn't even get as far as
- # starting it. If the job failed to run due to an infrastructure
- # issue with crunch-job or slurm, we want the job to stay in the
- # queue. If crunch-job exited after losing a race to another
- # crunch-job process, it exits 75 and we should leave the job
- # record alone so the winner of the race do its thing.
- #
- # There is still an unhandled race condition: If our crunch-job
- # process is about to lose a race with another crunch-job
- # process, but crashes before getting to its "exit 75" (for
- # example, "cannot fork" or "cannot reach API server") then we
- # will assume incorrectly that it's our process's fault
- # jobrecord.started_at is non-nil, and mark the job as failed
- # even though the winner of the race is probably still doing
- # fine.
+ if exit_status != 75 and jobrecord.started_at
+ # Clean up state fields in case crunch-job exited without
+ # putting the job in a suitable "finished" state.
+ jobrecord.running = false
+ jobrecord.finished_at ||= Time.now
+ if jobrecord.success.nil?
+ jobrecord.success = false
end
+ jobrecord.save!
+ else
+ # Don't fail the job if crunch-job didn't even get as far as
+ # starting it. If the job failed to run due to an infrastructure
+ # issue with crunch-job or slurm, we want the job to stay in the
+ # queue. If crunch-job exited after losing a race to another
+ # crunch-job process, it exits 75 and we should leave the job
+ # record alone so the winner of the race do its thing.
+ #
+ # There is still an unhandled race condition: If our crunch-job
+ # process is about to lose a race with another crunch-job
+ # process, but crashes before getting to its "exit 75" (for
+ # example, "cannot fork" or "cannot reach API server") then we
+ # will assume incorrectly that it's our process's fault
+ # jobrecord.started_at is non-nil, and mark the job as failed
+ # even though the winner of the race is probably still doing
+ # fine.
end
# Invalidate the per-job auth token
-----------------------------------------------------------------------
hooks/post-receive
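
For readers who want to trace the post-change control flow outside the dispatcher, here is a minimal sketch of the decision the new hunk makes, assuming a stand-in record type. `JobRecord`, `EXIT_LOST_RACE`, and `clean_up_after_exit` are hypothetical names introduced for illustration only; the real script uses the `Job` model and the literal `75` inline.

```ruby
# Hypothetical stand-in for the Job model: only the fields the diff touches.
JobRecord = Struct.new(:started_at, :running, :finished_at, :success) do
  # No-op here; the real model would persist the record.
  def save!
    true
  end
end

# Hypothetical constant name; the diff compares against the literal 75.
EXIT_LOST_RACE = 75

# Mirrors the branch in the new hunk. Returns true if the state fields
# were cleaned up, false if the record was left alone.
def clean_up_after_exit(jobrecord, exit_status, now = Time.now)
  # Leave the record alone if crunch-job lost a race (exit 75) or the
  # job never got as far as starting.
  return false if exit_status == EXIT_LOST_RACE || jobrecord.started_at.nil?

  # Otherwise clean up, whatever the exit status (including 0).
  jobrecord.running = false
  jobrecord.finished_at ||= now
  jobrecord.success = false if jobrecord.success.nil?
  jobrecord.save!
  true
end
```

In this sketch a record that already has `success` set to `true` keeps that value, and `finished_at` is only filled in when it is still nil, which matches the `||=` and `nil?` guards in the diff.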