[ARVADOS] created: 82225f2eeb39f6798ff83e979c28698ff617d414

Git user git at public.curoverse.com
Fri Apr 1 14:37:54 EDT 2016


        at  82225f2eeb39f6798ff83e979c28698ff617d414 (commit)


commit 82225f2eeb39f6798ff83e979c28698ff617d414
Author: Brett Smith <brett at curoverse.com>
Date:   Fri Apr 1 14:37:34 2016 -0400

    8782: Remove WIFEXITED check from crunch-job reapchildren.
    
    The intent of this check was to avoid reaping children that got
    SIGSTOP.  But per the waitpid(2) man page, waitpid only reports
    stopped children when you pass specific flags such as WUNTRACED.
    Without those flags, waitpid will only return the pids of children
    that have terminated.
    
    Meanwhile, WIFEXITED only returns true if the wait status indicates
    that the child terminated normally.  It returns false if the child
    was killed by a signal like SIGINT or SIGKILL.  This means children
    so killed were not reaped by reapchildren, leading to infinite loops.

diff --git a/sdk/cli/bin/crunch-job b/sdk/cli/bin/crunch-job
index 689609d..86e018c 100755
--- a/sdk/cli/bin/crunch-job
+++ b/sdk/cli/bin/crunch-job
@@ -1152,13 +1152,6 @@ sub reapchildren
                     . $slot[$proc{$pid}->{slot}]->{cpu});
     my $jobstepidx = $proc{$pid}->{jobstepidx};
 
-    if (!WIFEXITED($childstatus))
-    {
-      # child did not exit (may be temporarily stopped)
-      Log ($jobstepidx, "child $pid did not actually exit in reapchildren, ignoring for now.");
-      next;
-    }
-
     $children_reaped++;
     my $elapsed = time - $proc{$pid}->{time};
     my $Jobstep = $jobstep[$jobstepidx];

-----------------------------------------------------------------------


hooks/post-receive
-- 



