[ARVADOS] updated: bee4fa82c15b2188c2105fff3b52c305b38f04a6
Git user
git at public.curoverse.com
Wed May 24 10:36:44 EDT 2017
Summary of changes:
doc/user/cwl/cwl-extensions.html.textile.liquid | 4 ++--
doc/user/cwl/cwl-run-options.html.textile.liquid | 20 ++++++++++++++++++++
sdk/cwl/arvados_cwl/__init__.py | 22 +++++++++++++++++-----
sdk/cwl/arvados_cwl/arv-cwl-schema.yml | 17 ++++++++---------
sdk/cwl/arvados_cwl/arvcontainer.py | 3 +--
5 files changed, 48 insertions(+), 18 deletions(-)
via bee4fa82c15b2188c2105fff3b52c305b38f04a6 (commit)
from 8645888f12c25edaaac8e03fb5691cfcfbcdb9b2 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
commit bee4fa82c15b2188c2105fff3b52c305b38f04a6
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date: Wed May 24 10:36:40 2017 -0400
11100: Separate "trash intermediate on success" behavior from "output intermediate TTL" option. Update documentation.
diff --git a/doc/user/cwl/cwl-extensions.html.textile.liquid b/doc/user/cwl/cwl-extensions.html.textile.liquid
index 5e5e497..8a62034 100644
--- a/doc/user/cwl/cwl-extensions.html.textile.liquid
+++ b/doc/user/cwl/cwl-extensions.html.textile.liquid
@@ -82,5 +82,5 @@ Specify desired handling of intermediate output collections.
table(table table-bordered table-condensed).
|_. Field |_. Type |_. Description |
-|outputTTL|int|If the value is greater than zero, consider intermediate output collections to be temporary and should be automatically trashed. Temporary collections will be trashed `outputTTL` seconds after creation, or on successful completion of workflow (whichever comes first). A value of zero means intermediate output should be retained indefinitely (this is the default behavior).
-Note: arvados-cwl-runner currently does not take workflow dependencies into account when setting the TTL on an intermediate output collection. If the TTL is too short, it is possible for a collection to be trashed before before downstream steps that consume it are started. The recommend minimum value for TTL is the time required to complete entire the workflow.|
+|outputTTL|int|If the value is greater than zero, intermediate output collections are considered temporary and will be automatically trashed @outputTTL@ seconds after creation. A value of zero means intermediate output should be retained indefinitely (this is the default behavior).
+Note: arvados-cwl-runner currently does not take workflow dependencies into account when setting the TTL on an intermediate output collection. If the TTL is too short, it is possible for a collection to be trashed before downstream steps that consume it are started. The recommended minimum value for TTL is the expected duration of the entire workflow.|
diff --git a/doc/user/cwl/cwl-run-options.html.textile.liquid b/doc/user/cwl/cwl-run-options.html.textile.liquid
index c9b18e6..8d1ec19 100644
--- a/doc/user/cwl/cwl-run-options.html.textile.liquid
+++ b/doc/user/cwl/cwl-run-options.html.textile.liquid
@@ -10,6 +10,7 @@ table(table table-bordered table-condensed).
|_. Option |_. Description |
|==--basedir== BASEDIR| Base directory used to resolve relative references in the input, default to directory of input object file or current directory (if inputs piped/provided on command line).|
|==--version==| Print version and exit|
+|==--validate==| Validate CWL document only.|
|==--verbose==| Default logging|
|==--quiet==| Only print warnings and errors.|
|==--debug==| Print even more logging|
@@ -30,7 +31,14 @@ table(table table-bordered table-condensed).
|==--api== WORK_API| Select work submission API, one of 'jobs' or 'containers'. Default is 'jobs' if that API is available, otherwise 'containers'.|
|==--compute-checksum==| Compute checksum of contents while collecting outputs|
|==--submit-runner-ram== SUBMIT_RUNNER_RAM|RAM (in MiB) required for the workflow runner job (default 1024)|
+|==--submit-runner-image== SUBMIT_RUNNER_IMAGE|Docker image for workflow runner job, default arvados/jobs|
|==--name== NAME| Name to use for workflow execution instance.|
+|==--on-error== {stop,continue}|Desired workflow behavior when a step fails. One of 'stop' or 'continue'. Default is 'continue'.|
+|==--enable-dev==| Enable loading and running development versions of CWL spec.|
+|==--intermediate-output-ttl== N|If N > 0, intermediate output collections will be trashed N seconds after creation. Default is 0 (don't trash).|
+|==--trash-intermediate==| Immediately trash intermediate outputs on workflow success.|
+|==--no-trash-intermediate==|Do not trash intermediate outputs (default).|
+
h3. Specify workflow and output names
@@ -92,3 +100,15 @@ arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107,
}
</code></pre>
</notextile>
+
+h3. Automatically delete intermediate outputs
+
+Use the @--intermediate-output-ttl@ and @--trash-intermediate@ options to specify how long intermediate outputs should be kept (in seconds) and whether to trash them immediately upon successful workflow completion.
+
+Temporary collections will be trashed @intermediate-output-ttl@ seconds after creation. A value of zero (default) means intermediate output should be retained indefinitely.
+
+Note: arvados-cwl-runner currently does not take workflow dependencies into account when setting the TTL on an intermediate output collection. If the TTL is too short, it is possible for a collection to be trashed before downstream steps that consume it are started. The recommended minimum value for TTL is the expected duration of the entire workflow.
+
+Using @--trash-intermediate@ without @--intermediate-output-ttl@ means that intermediate files will be trashed on successful completion, but will remain on workflow failure.
+
+Using @--intermediate-output-ttl@ without @--trash-intermediate@ means that intermediate files will be trashed only after the TTL expires (regardless of workflow success or failure).
diff --git a/sdk/cwl/arvados_cwl/__init__.py b/sdk/cwl/arvados_cwl/__init__.py
index a4e2f8a..be1ec27 100644
--- a/sdk/cwl/arvados_cwl/__init__.py
+++ b/sdk/cwl/arvados_cwl/__init__.py
@@ -76,6 +76,7 @@ class ArvCwlRunner(object):
self.project_uuid = None
self.intermediate_output_ttl = 0
self.intermediate_output_collections = []
+ self.trash_intermediate = False
if keep_client is not None:
self.keep_client = keep_client
@@ -346,6 +347,7 @@ class ArvCwlRunner(object):
self.fs_access = make_fs_access(kwargs["basedir"])
self.intermediate_output_ttl = kwargs["intermediate_output_ttl"]
+ self.trash_intermediate = kwargs["trash_intermediate"]
if self.intermediate_output_ttl and self.work_api != "containers":
raise Exception("--intermediate-output-ttl is only supported when using the containers api.")
@@ -533,7 +535,7 @@ class ArvCwlRunner(object):
adjustDirObjs(self.final_output, partial(get_listing, self.fs_access))
adjustFileObjs(self.final_output, partial(compute_checksums, self.fs_access))
- if self.final_status == "success":
+ if self.trash_intermediate and self.final_status == "success":
self.trash_intermediate_output()
return (self.final_output, self.final_status)
@@ -582,10 +584,10 @@ def arg_parser(): # type: () -> argparse.ArgumentParser
exgroup = parser.add_mutually_exclusive_group()
exgroup.add_argument("--enable-reuse", action="store_true",
default=True, dest="enable_reuse",
- help="")
+ help="Enable job or container reuse (default)")
exgroup.add_argument("--disable-reuse", action="store_false",
default=True, dest="enable_reuse",
- help="")
+ help="Disable job or container reuse")
parser.add_argument("--project-uuid", type=str, metavar="UUID", help="Project that will own the workflow jobs, if not provided, will go to home project.")
parser.add_argument("--output-name", type=str, help="Name to use for collection that stores the final output.", default=None)
@@ -618,7 +620,8 @@ def arg_parser(): # type: () -> argparse.ArgumentParser
parser.add_argument("--api", type=str,
default=None, dest="work_api",
- help="Select work submission API, one of 'jobs' or 'containers'. Default is 'jobs' if that API is available, otherwise 'containers'.")
+ choices=("jobs", "containers"),
+ help="Select work submission API. Default is 'jobs' if that API is available, otherwise 'containers'.")
parser.add_argument("--compute-checksum", action="store_true", default=False,
help="Compute checksum of contents while collecting outputs",
@@ -643,10 +646,19 @@ def arg_parser(): # type: () -> argparse.ArgumentParser
parser.add_argument("--enable-dev", action="store_true",
help="Enable loading and running development versions "
"of CWL spec.", default=False)
+
parser.add_argument("--intermediate-output-ttl", type=int, metavar="N",
- help="If N > 0, intermediate output collections will be trashed N seconds after creation, or on successful completion of workflow (whichever comes first).",
+ help="If N > 0, intermediate output collections will be trashed N seconds after creation. Default is 0 (don't trash).",
default=0)
+ exgroup = parser.add_mutually_exclusive_group()
+ exgroup.add_argument("--trash-intermediate", action="store_true",
+ default=False, dest="trash_intermediate",
+ help="Immediately trash intermediate outputs on workflow success.")
+ exgroup.add_argument("--no-trash-intermediate", action="store_false",
+ default=False, dest="trash_intermediate",
+ help="Do not trash intermediate outputs (default).")
+
parser.add_argument("workflow", type=str, nargs="?", default=None, help="The workflow to execute")
parser.add_argument("job_order", nargs=argparse.REMAINDER, help="The input object to the workflow.")
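
The option handling added in this hunk can be tried in isolation. The following is a minimal sketch, not the actual arvados-cwl-runner parser: it keeps only the three options touched by this commit, and the function name build_parser and the prog string are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical reduced parser: only the options discussed in this commit.
    parser = argparse.ArgumentParser(prog="cwl-runner-sketch")

    parser.add_argument("--intermediate-output-ttl", type=int, metavar="N",
                        default=0,
                        help="If N > 0, intermediate output collections will be "
                             "trashed N seconds after creation. Default is 0 "
                             "(don't trash).")

    # Both flags write to the same destination, so the group guarantees that
    # at most one of them appears on the command line.
    exgroup = parser.add_mutually_exclusive_group()
    exgroup.add_argument("--trash-intermediate", action="store_true",
                         default=False, dest="trash_intermediate",
                         help="Immediately trash intermediate outputs on "
                              "workflow success.")
    exgroup.add_argument("--no-trash-intermediate", action="store_false",
                         default=False, dest="trash_intermediate",
                         help="Do not trash intermediate outputs (default).")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--intermediate-output-ttl", "3600", "--trash-intermediate"])
    print(args.intermediate_output_ttl, args.trash_intermediate)
```

Passing both --trash-intermediate and --no-trash-intermediate makes argparse exit with a usage error, which is the point of placing them in a mutually exclusive group.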
diff --git a/sdk/cwl/arvados_cwl/arv-cwl-schema.yml b/sdk/cwl/arvados_cwl/arv-cwl-schema.yml
index 1af6e38..b45378d 100644
--- a/sdk/cwl/arvados_cwl/arv-cwl-schema.yml
+++ b/sdk/cwl/arvados_cwl/arv-cwl-schema.yml
@@ -132,15 +132,14 @@ $graph:
type: int
doc: |
If the value is greater than zero, consider intermediate output
- collections to be temporary and should be automatically trashed.
- Temporary collections will be trashed `outputTTL` seconds after
- creation, or on successful completion of workflow (whichever comes
- first). A value of zero means intermediate output should be retained
- indefinitely (this is the default behavior).
+ collections to be temporary and should be automatically
+ trashed. Temporary collections will be trashed `outputTTL` seconds
+ after creation. A value of zero means intermediate output should be
+ retained indefinitely (this is the default behavior).
Note: arvados-cwl-runner currently does not take workflow dependencies
- into account when setting the TTL on an intermediate output collection.
- If the TTL is too short, it is possible for a collection to be trashed
- before before downstream steps that consume it are started. The
- recommend minimum value for TTL is the time required to complete
+ into account when setting the TTL on an intermediate output
+ collection. If the TTL is too short, it is possible for a collection to
+ be trashed before downstream steps that consume it are started. The
+ recommended minimum value for TTL is the expected duration of the
entire the workflow.
diff --git a/sdk/cwl/arvados_cwl/arvcontainer.py b/sdk/cwl/arvados_cwl/arvcontainer.py
index 374f368..dc2d02f 100644
--- a/sdk/cwl/arvados_cwl/arvcontainer.py
+++ b/sdk/cwl/arvados_cwl/arvcontainer.py
@@ -211,8 +211,7 @@ class ArvadosContainer(object):
     def done(self, record):
         outputs = {}
         try:
-            if self.output_ttl:
-                self.arvrunner.add_intermediate_output(record["output_uuid"])
+            self.arvrunner.add_intermediate_output(record["output_uuid"])
             container = self.arvrunner.api.containers().get(
                 uuid=record["container_uuid"]
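
Taken together, the changes above make the two cleanup mechanisms independent. The following is a minimal sketch of the resulting behavior as described in the updated documentation, not code from the repository: the function names check_options and cleanup_plan are hypothetical, and the real runner applies the TTL when the collection is created rather than returning it.

```python
def check_options(intermediate_output_ttl, work_api):
    # Mirrors the guard in __init__.py: a TTL requires the containers API.
    if intermediate_output_ttl and work_api != "containers":
        raise ValueError("--intermediate-output-ttl is only supported "
                         "when using the containers api.")


def cleanup_plan(intermediate_output_ttl, trash_intermediate, final_status):
    """Return (trash_now, ttl_seconds) for a finished workflow.

    trash_now   -- True if intermediate outputs are trashed immediately
    ttl_seconds -- seconds after creation at which collections are trashed,
                   or None if they are retained indefinitely
    """
    # After this commit, immediate trashing needs the explicit flag AND success.
    trash_now = bool(trash_intermediate) and final_status == "success"
    # The TTL applies regardless of workflow success or failure.
    ttl_seconds = intermediate_output_ttl if intermediate_output_ttl > 0 else None
    return trash_now, ttl_seconds


if __name__ == "__main__":
    for ttl, trash, status in [(0, True, "success"), (0, True, "permanentFail"),
                               (3600, False, "success"), (0, False, "success")]:
        print(ttl, trash, status, cleanup_plan(ttl, trash, status))
```

The status string "permanentFail" is used here only as an example of a non-success value; any status other than "success" leaves the intermediate outputs in place until the TTL, if set, expires.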
-----------------------------------------------------------------------
hooks/post-receive