[ARVADOS] updated: baeb7dbe5929012dea22985b11ae4c5584f76891

git at public.curoverse.com
Tue Feb 9 17:10:22 EST 2016


Summary of changes:
 sdk/cli/bin/crunch-job | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

       via  baeb7dbe5929012dea22985b11ae4c5584f76891 (commit)
      from  19199a75e41004ea776622c305c3ca43e5367bf2 (commit)

Those revisions listed above that are new to this repository have
not appeared in any other notification email, so we list those
revisions in full below.


commit baeb7dbe5929012dea22985b11ae4c5584f76891
Author: Brett Smith <brett at curoverse.com>
Date:   Tue Feb 9 17:10:08 2016 -0500

    crunch-job detects more "io aborted" SLURM errors.
    
    It's seemingly random whether SLURM reports "Aborting, io aborted and
    missing step" or "Aborting, missing step and io aborted".  Extend the
    regexp to catch both.  No issue #.

diff --git a/sdk/cli/bin/crunch-job b/sdk/cli/bin/crunch-job
index 5eb2f90..baaf795 100755
--- a/sdk/cli/bin/crunch-job
+++ b/sdk/cli/bin/crunch-job
@@ -1461,7 +1461,7 @@ sub preprocess_stderr
       # whoa.
       $main::please_freeze = 1;
     }
-    elsif ($line =~ /srun: error: (Node failure on|Aborting, io error)/) {
+    elsif ($line =~ /srun: error: (Node failure on|Aborting, .*\bio error\b)/) {
       my $job_slot_index = $jobstep[$job]->{slotindex};
       $slot[$job_slot_index]->{node}->{fail_count}++;
       $jobstep[$job]->{tempfail} = 1;

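The effect of the regexp change can be checked with a short standalone Perl script. It copies the old and new patterns from the diff and runs both against a few stderr lines. The sample lines are illustrative assumptions, not captured SLURM output. They assume the message contains the phrase "io error", which is what the pattern requires, and the last line is made up purely to exercise the \b word boundaries.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Patterns copied from the crunch-job diff. The old one only matched when
# "io error" came immediately after "Aborting, "; the new one allows any
# text in between, with \b anchors so "io error" must be whole words.
my $old_re = qr/srun: error: (Node failure on|Aborting, io error)/;
my $new_re = qr/srun: error: (Node failure on|Aborting, .*\bio error\b)/;

# Hypothetical stderr lines (wording assumed for illustration only).
my @samples = (
    "srun: error: Aborting, io error and missing step on node 3",
    "srun: error: Aborting, step missing and io error on node 3",
    "srun: error: Node failure on compute7",
    "srun: error: Aborting, radio errors detected",   # made-up non-match
);

for my $line (@samples) {
    my $old = $line =~ $old_re ? "match" : "no match";
    my $new = $line =~ $new_re ? "match" : "no match";
    printf "%-60s old=%-8s new=%s\n", $line, $old, $new;
}
```

Only the reordered line changes outcome: the old pattern misses it and the new one catches it. The \b anchors keep the looser `.*` from accepting substrings such as "radio errors".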
-----------------------------------------------------------------------


hooks/post-receive
-- 

More information about the arvados-commits mailing list