[ARVADOS] created: 82225f2eeb39f6798ff83e979c28698ff617d414
Git user
git at public.curoverse.com
Fri Apr 1 14:37:54 EDT 2016
at 82225f2eeb39f6798ff83e979c28698ff617d414 (commit)
commit 82225f2eeb39f6798ff83e979c28698ff617d414
Author: Brett Smith <brett at curoverse.com>
Date: Fri Apr 1 14:37:34 2016 -0400
8782: Remove WIFEXITED check from crunch-job reapchildren.

The intent of this check was to avoid reaping children that got
SIGSTOP. But according to the waitpid(2) man page, waitpid reports
stopped children only when called with specific flags (such as
WUNTRACED). Without those flags, waitpid returns only the pids of
children that have terminated.

Meanwhile, WIFEXITED returns true only if the wait status indicates
that the child terminated normally. It returns false if the child was
killed by a signal like SIGINT or SIGKILL. This means children killed
that way were never reaped by reapchildren, leading to infinite loops.
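
The two claims above can be checked with a small experiment. The following
is a minimal sketch in Python rather than Perl (Perl's POSIX module exposes
the same WIFEXITED/WIFSIGNALED macros); the helper names spawn_sleeper,
describe, poll_stopped_child and reap_killed_child are hypothetical and do
not appear in crunch-job. It assumes a POSIX system and a parent that polls
with WNOHANG only, without WUNTRACED.

```python
import os
import signal
import time


def spawn_sleeper():
    """Fork a child that only sleeps; return its pid to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: sleep until signalled, then leave without parent cleanup.
        time.sleep(60)
        os._exit(0)
    return pid


def describe(status):
    """Classify a wait status the way the W* macros do."""
    if os.WIFEXITED(status):
        return "exited"
    if os.WIFSIGNALED(status):
        return "signaled"
    return "other"


def poll_stopped_child():
    """SIGSTOP a child, then poll it with WNOHANG but without WUNTRACED."""
    pid = spawn_sleeper()
    os.kill(pid, signal.SIGSTOP)
    time.sleep(0.2)  # give the stop time to take effect
    result = os.waitpid(pid, os.WNOHANG)  # stopped child is not reported
    # Clean up: SIGKILL also terminates a stopped process.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return result


def reap_killed_child():
    """SIGKILL a child and classify the status that waitpid returns."""
    pid = spawn_sleeper()
    os.kill(pid, signal.SIGKILL)
    reaped, status = os.waitpid(pid, 0)
    return reaped == pid, describe(status)


if __name__ == "__main__":
    print(poll_stopped_child())
    print(reap_killed_child())
```

In this sketch the stopped child is never handed back by waitpid, so a
WIFEXITED guard has nothing to filter out, while the killed child is handed
back with a status for which WIFEXITED is false. A guard like the removed
one would skip exactly that second case.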
diff --git a/sdk/cli/bin/crunch-job b/sdk/cli/bin/crunch-job
index 689609d..86e018c 100755
--- a/sdk/cli/bin/crunch-job
+++ b/sdk/cli/bin/crunch-job
@@ -1152,13 +1152,6 @@ sub reapchildren
       . $slot[$proc{$pid}->{slot}]->{cpu});
   my $jobstepidx = $proc{$pid}->{jobstepidx};
 
-  if (!WIFEXITED($childstatus))
-  {
-    # child did not exit (may be temporarily stopped)
-    Log ($jobstepidx, "child $pid did not actually exit in reapchildren, ignoring for now.");
-    next;
-  }
-
   $children_reaped++;
   my $elapsed = time - $proc{$pid}->{time};
   my $Jobstep = $jobstep[$jobstepidx];
-----------------------------------------------------------------------
hooks/post-receive
--