[ARVADOS] updated: bee4fa82c15b2188c2105fff3b52c305b38f04a6

Git user git at public.curoverse.com
Wed May 24 10:36:44 EDT 2017


Summary of changes:
 doc/user/cwl/cwl-extensions.html.textile.liquid  |  4 ++--
 doc/user/cwl/cwl-run-options.html.textile.liquid | 20 ++++++++++++++++++++
 sdk/cwl/arvados_cwl/__init__.py                  | 22 +++++++++++++++++-----
 sdk/cwl/arvados_cwl/arv-cwl-schema.yml           | 17 ++++++++---------
 sdk/cwl/arvados_cwl/arvcontainer.py              |  3 +--
 5 files changed, 48 insertions(+), 18 deletions(-)

       via  bee4fa82c15b2188c2105fff3b52c305b38f04a6 (commit)
      from  8645888f12c25edaaac8e03fb5691cfcfbcdb9b2 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit bee4fa82c15b2188c2105fff3b52c305b38f04a6
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date:   Wed May 24 10:36:40 2017 -0400

    11100: Separate "trash intermediate on success" behavior from "output intermediate TTL" option.  Update documentation.

diff --git a/doc/user/cwl/cwl-extensions.html.textile.liquid b/doc/user/cwl/cwl-extensions.html.textile.liquid
index 5e5e497..8a62034 100644
--- a/doc/user/cwl/cwl-extensions.html.textile.liquid
+++ b/doc/user/cwl/cwl-extensions.html.textile.liquid
@@ -82,5 +82,5 @@ Specify desired handling of intermediate output collections.
 
 table(table table-bordered table-condensed).
 |_. Field |_. Type |_. Description |
-|outputTTL|int|If the value is greater than zero, consider intermediate output collections to be temporary and should be automatically trashed. Temporary collections will be trashed `outputTTL` seconds after creation, or on successful completion of workflow (whichever comes first).  A value of zero means intermediate output should be retained indefinitely (this is the default behavior).
-Note: arvados-cwl-runner currently does not take workflow dependencies into account when setting the TTL on an intermediate output collection. If the TTL is too short, it is possible for a collection to be trashed before before downstream steps that consume it are started.  The recommend minimum value for TTL is the time required to complete entire the workflow.|
+|outputTTL|int|If the value is greater than zero, intermediate output collections are considered temporary and will be automatically trashed. Temporary collections will be trashed @outputTTL@ seconds after creation.  A value of zero means intermediate output should be retained indefinitely (this is the default behavior).
+Note: arvados-cwl-runner currently does not take workflow dependencies into account when setting the TTL on an intermediate output collection. If the TTL is too short, it is possible for a collection to be trashed before downstream steps that consume it are started.  The recommended minimum value for TTL is the expected duration of the entire workflow.|
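The @outputTTL@ rule above is simple date arithmetic, which the following minimal Python sketch spells out. It assumes only what the table states: a TTL of zero means "keep forever", a positive TTL means "trash N seconds after creation", and the recommended minimum TTL is the expected duration of the whole workflow. The helper names `trash_at` and `ttl_is_safe` are hypothetical and are not part of arvados-cwl-runner.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def trash_at(created_at: datetime, output_ttl: int) -> Optional[datetime]:
    """Return when a collection would be trashed under outputTTL, or None if it is kept."""
    if output_ttl <= 0:
        # Zero (the default) means intermediate output is retained indefinitely.
        return None
    # Positive TTL: trashed output_ttl seconds after the collection was created.
    return created_at + timedelta(seconds=output_ttl)


def ttl_is_safe(output_ttl: int, expected_workflow_seconds: int) -> bool:
    """Check the documented recommendation: TTL >= expected duration of the entire workflow."""
    # A TTL of zero never trashes anything, so it cannot trash output too early.
    return output_ttl <= 0 or output_ttl >= expected_workflow_seconds


if __name__ == "__main__":
    created = datetime(2017, 5, 24, 14, 0, tzinfo=timezone.utc)
    print(trash_at(created, 3600))          # one hour after creation
    print(ttl_is_safe(3600, 2 * 3600))      # False: shorter than a two-hour workflow
```

Because the runner does not look at step dependencies, a check like `ttl_is_safe` is only a rough guard: a TTL shorter than the workflow's run time can trash a collection before a later step reads it.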
diff --git a/doc/user/cwl/cwl-run-options.html.textile.liquid b/doc/user/cwl/cwl-run-options.html.textile.liquid
index c9b18e6..8d1ec19 100644
--- a/doc/user/cwl/cwl-run-options.html.textile.liquid
+++ b/doc/user/cwl/cwl-run-options.html.textile.liquid
@@ -10,6 +10,7 @@ table(table table-bordered table-condensed).
 |_. Option |_. Description |
 |==--basedir== BASEDIR|     Base directory used to resolve relative references in the input, default to directory of input object file or current directory (if inputs piped/provided on command line).|
 |==--version==|             Print version and exit|
+|==--validate==|            Validate CWL document only.|
 |==--verbose==|             Default logging|
 |==--quiet==|               Only print warnings and errors.|
 |==--debug==|               Print even more logging|
@@ -30,7 +31,14 @@ table(table table-bordered table-condensed).
 |==--api== WORK_API|        Select work submission API, one of 'jobs' or 'containers'. Default is 'jobs' if that API is available, otherwise 'containers'.|
 |==--compute-checksum==|    Compute checksum of contents while collecting outputs|
 |==--submit-runner-ram== SUBMIT_RUNNER_RAM|RAM (in MiB) required for the workflow runner job (default 1024)|
+|==--submit-runner-image== SUBMIT_RUNNER_IMAGE|Docker image for workflow runner job, default arvados/jobs|
 |==--name== NAME|           Name to use for workflow execution instance.|
+|==--on-error== {stop,continue}|Desired workflow behavior when a step fails. One of 'stop' or 'continue'. Default is 'continue'.|
+|==--enable-dev==|          Enable loading and running development versions of CWL spec.|
+|==--intermediate-output-ttl== N|If N > 0, intermediate output collections will be trashed N seconds after creation. Default is 0 (don't trash).|
+|==--trash-intermediate==|  Immediately trash intermediate outputs on workflow success.|
+|==--no-trash-intermediate==|Do not trash intermediate outputs (default).|
+
 
 h3. Specify workflow and output names
 
@@ -92,3 +100,15 @@ arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107,
 }
 </code></pre>
 </notextile>
+
+h3. Automatically delete intermediate outputs
+
+Use the @--intermediate-output-ttl@ and @--trash-intermediate@ options to specify how long intermediate outputs should be kept (in seconds) and whether to trash them immediately upon successful workflow completion.
+
+Temporary collections will be trashed @intermediate-output-ttl@ seconds after creation.  A value of zero (default) means intermediate output should be retained indefinitely.
+
+Note: arvados-cwl-runner currently does not take workflow dependencies into account when setting the TTL on an intermediate output collection. If the TTL is too short, it is possible for a collection to be trashed before downstream steps that consume it are started.  The recommended minimum value for TTL is the expected duration of the entire workflow.
+
+Using @--trash-intermediate@ without @--intermediate-output-ttl@ means that intermediate files will be trashed on successful completion, but will remain on workflow failure.
+
+Using @--intermediate-output-ttl@ without @--trash-intermediate@ means that intermediate files will be trashed only after the TTL expires (regardless of workflow success or failure).
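The four combinations of @--intermediate-output-ttl@ and @--trash-intermediate@ described in this section can be modeled as a small decision table. This is a minimal Python sketch of the documented behavior, not arvados-cwl-runner code: `trash_events` is a hypothetical helper, and any status other than `"success"` is treated as a non-successful run.

```python
from typing import List


def trash_events(intermediate_output_ttl: int, trash_intermediate: bool,
                 final_status: str) -> List[str]:
    """List the events that trash intermediate collections for one option combination."""
    events: List[str] = []
    if intermediate_output_ttl > 0:
        # The TTL applies to every intermediate collection, whether the
        # workflow succeeds or fails.
        events.append(f"after {intermediate_output_ttl}s TTL")
    if trash_intermediate and final_status == "success":
        # --trash-intermediate only acts when the whole workflow succeeds.
        events.append("on workflow success")
    return events


if __name__ == "__main__":
    for ttl, trash in [(0, False), (0, True), (3600, False), (3600, True)]:
        for status in ("success", "permanentFail"):
            print(ttl, trash, status, trash_events(ttl, trash, status))
```

With both options set, whichever event happens first wins in practice; with neither set (the default), nothing is ever trashed.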
diff --git a/sdk/cwl/arvados_cwl/__init__.py b/sdk/cwl/arvados_cwl/__init__.py
index a4e2f8a..be1ec27 100644
--- a/sdk/cwl/arvados_cwl/__init__.py
+++ b/sdk/cwl/arvados_cwl/__init__.py
@@ -76,6 +76,7 @@ class ArvCwlRunner(object):
         self.project_uuid = None
         self.intermediate_output_ttl = 0
         self.intermediate_output_collections = []
+        self.trash_intermediate = False
 
         if keep_client is not None:
             self.keep_client = keep_client
@@ -346,6 +347,7 @@ class ArvCwlRunner(object):
         self.fs_access = make_fs_access(kwargs["basedir"])
 
         self.intermediate_output_ttl = kwargs["intermediate_output_ttl"]
+        self.trash_intermediate = kwargs["trash_intermediate"]
         if self.intermediate_output_ttl and self.work_api != "containers":
             raise Exception("--intermediate-output-ttl is only supported when using the containers api.")
 
@@ -533,7 +535,7 @@ class ArvCwlRunner(object):
             adjustDirObjs(self.final_output, partial(get_listing, self.fs_access))
             adjustFileObjs(self.final_output, partial(compute_checksums, self.fs_access))
 
-        if self.final_status == "success":
+        if self.trash_intermediate and self.final_status == "success":
             self.trash_intermediate_output()
 
         return (self.final_output, self.final_status)
@@ -582,10 +584,10 @@ def arg_parser():  # type: () -> argparse.ArgumentParser
     exgroup = parser.add_mutually_exclusive_group()
     exgroup.add_argument("--enable-reuse", action="store_true",
                         default=True, dest="enable_reuse",
-                        help="")
+                        help="Enable job or container reuse (default)")
     exgroup.add_argument("--disable-reuse", action="store_false",
                         default=True, dest="enable_reuse",
-                        help="")
+                        help="Disable job or container reuse")
 
     parser.add_argument("--project-uuid", type=str, metavar="UUID", help="Project that will own the workflow jobs, if not provided, will go to home project.")
     parser.add_argument("--output-name", type=str, help="Name to use for collection that stores the final output.", default=None)
@@ -618,7 +620,8 @@ def arg_parser():  # type: () -> argparse.ArgumentParser
 
     parser.add_argument("--api", type=str,
                         default=None, dest="work_api",
-                        help="Select work submission API, one of 'jobs' or 'containers'. Default is 'jobs' if that API is available, otherwise 'containers'.")
+                        choices=("jobs", "containers"),
+                        help="Select work submission API.  Default is 'jobs' if that API is available, otherwise 'containers'.")
 
     parser.add_argument("--compute-checksum", action="store_true", default=False,
                         help="Compute checksum of contents while collecting outputs",
@@ -643,10 +646,19 @@ def arg_parser():  # type: () -> argparse.ArgumentParser
     parser.add_argument("--enable-dev", action="store_true",
                         help="Enable loading and running development versions "
                              "of CWL spec.", default=False)
+
     parser.add_argument("--intermediate-output-ttl", type=int, metavar="N",
-                        help="If N > 0, intermediate output collections will be trashed N seconds after creation, or on successful completion of workflow (whichever comes first).",
+                        help="If N > 0, intermediate output collections will be trashed N seconds after creation.  Default is 0 (don't trash).",
                         default=0)
 
+    exgroup = parser.add_mutually_exclusive_group()
+    exgroup.add_argument("--trash-intermediate", action="store_true",
+                        default=False, dest="trash_intermediate",
+                         help="Immediately trash intermediate outputs on workflow success.")
+    exgroup.add_argument("--no-trash-intermediate", action="store_false",
+                        default=False, dest="trash_intermediate",
+                        help="Do not trash intermediate outputs (default).")
+
     parser.add_argument("workflow", type=str, nargs="?", default=None, help="The workflow to execute")
     parser.add_argument("job_order", nargs=argparse.REMAINDER, help="The input object to the workflow.")
 
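The argument-parser changes above rely on two standard `argparse` features: `choices` to reject unknown `--api` values, and a mutually exclusive group in which a `store_true` and a `store_false` action share one `dest`. The following is a trimmed, standalone Python sketch of that pattern so the parsing behavior can be checked in isolation. The `prog` name is hypothetical and only the relevant options are included.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild only the options touched by this commit (illustrative subset)."""
    parser = argparse.ArgumentParser(prog="arvados-cwl-runner-sketch")
    # 'choices' makes argparse reject anything other than these two values.
    parser.add_argument("--api", type=str, default=None, dest="work_api",
                        choices=("jobs", "containers"))
    parser.add_argument("--intermediate-output-ttl", type=int, metavar="N",
                        default=0)
    # Both flags write the same dest; the group forbids passing both at once.
    exgroup = parser.add_mutually_exclusive_group()
    exgroup.add_argument("--trash-intermediate", action="store_true",
                         default=False, dest="trash_intermediate")
    exgroup.add_argument("--no-trash-intermediate", action="store_false",
                         default=False, dest="trash_intermediate")
    return parser


if __name__ == "__main__":
    p = build_parser()
    print(p.parse_args([]))
    print(p.parse_args(["--trash-intermediate", "--intermediate-output-ttl", "3600"]))
```

Since both flags default to `False`, `--no-trash-intermediate` simply restates the default; it exists so scripts can be explicit. Passing both flags, or an `--api` value outside the choices, makes `argparse` print an error and exit with status 2.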
diff --git a/sdk/cwl/arvados_cwl/arv-cwl-schema.yml b/sdk/cwl/arvados_cwl/arv-cwl-schema.yml
index 1af6e38..b45378d 100644
--- a/sdk/cwl/arvados_cwl/arv-cwl-schema.yml
+++ b/sdk/cwl/arvados_cwl/arv-cwl-schema.yml
@@ -132,15 +132,14 @@ $graph:
       type: int
       doc: |
         If the value is greater than zero, consider intermediate output
-        collections to be temporary and should be automatically trashed.
-        Temporary collections will be trashed `outputTTL` seconds after
-        creation, or on successful completion of workflow (whichever comes
-        first).  A value of zero means intermediate output should be retained
-        indefinitely (this is the default behavior).
+        collections to be temporary and should be automatically
+        trashed. Temporary collections will be trashed `outputTTL` seconds
+        after creation.  A value of zero means intermediate output should be
+        retained indefinitely (this is the default behavior).
 
         Note: arvados-cwl-runner currently does not take workflow dependencies
-        into account when setting the TTL on an intermediate output collection.
-        If the TTL is too short, it is possible for a collection to be trashed
-        before before downstream steps that consume it are started.  The
-        recommend minimum value for TTL is the time required to complete
+        into account when setting the TTL on an intermediate output
+        collection. If the TTL is too short, it is possible for a collection to
+        be trashed before downstream steps that consume it are started.  The
+        recommended minimum value for TTL is the expected duration of the
         entire workflow.
diff --git a/sdk/cwl/arvados_cwl/arvcontainer.py b/sdk/cwl/arvados_cwl/arvcontainer.py
index 374f368..dc2d02f 100644
--- a/sdk/cwl/arvados_cwl/arvcontainer.py
+++ b/sdk/cwl/arvados_cwl/arvcontainer.py
@@ -211,8 +211,7 @@ class ArvadosContainer(object):
     def done(self, record):
         outputs = {}
         try:
-            if self.output_ttl:
-                self.arvrunner.add_intermediate_output(record["output_uuid"])
+            self.arvrunner.add_intermediate_output(record["output_uuid"])
 
             container = self.arvrunner.api.containers().get(
                 uuid=record["container_uuid"]

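Taken together, the `arvcontainer.py` and `__init__.py` changes mean every finished container's output is now recorded as intermediate output (not only when an output TTL is set), and trashing at the end is gated on `--trash-intermediate` plus a successful run. The following minimal Python sketch models that bookkeeping. `IntermediateTracker` is a hypothetical stand-in for the relevant parts of `ArvCwlRunner`, and `trash_fn` is an assumed placeholder for whatever API call actually trashes a collection, which this diff does not show.

```python
from typing import Callable, List


class IntermediateTracker:
    """Hypothetical model of the runner's intermediate-output bookkeeping."""

    def __init__(self, trash_intermediate: bool, trash_fn: Callable[[str], None]):
        self.trash_intermediate = trash_intermediate
        self.intermediate_output_collections: List[str] = []
        self._trash_fn = trash_fn  # placeholder for the real trash/delete call

    def add_intermediate_output(self, uuid: str) -> None:
        # After this commit, done() records every container output,
        # regardless of whether an output TTL was requested.
        self.intermediate_output_collections.append(uuid)

    def finish(self, final_status: str) -> None:
        # Trash only when explicitly requested AND the workflow succeeded.
        if self.trash_intermediate and final_status == "success":
            for uuid in self.intermediate_output_collections:
                self._trash_fn(uuid)


if __name__ == "__main__":
    trashed: List[str] = []
    tracker = IntermediateTracker(trash_intermediate=True, trash_fn=trashed.append)
    tracker.add_intermediate_output("collection-1")
    tracker.finish("success")
    print(trashed)
```

Recording unconditionally is what lets `--trash-intermediate` work without `--intermediate-output-ttl`; before this change, outputs were only tracked when a TTL was set.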
-----------------------------------------------------------------------

