[arvados] updated: 2.1.0-3081-gda952d583
git repository hosting
git at public.arvados.org
Tue Nov 29 14:59:05 UTC 2022
Summary of changes:
doc/user/cwl/cwl-style.html.textile.liquid | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
via da952d583d65e9c6c7ff24ae40c4e0d0a21efd22 (commit)
from 3cdc1e47bf435c364644ce8ef792cb42e95ac183 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
commit da952d583d65e9c6c7ff24ae40c4e0d0a21efd22
Author: Peter Amstutz <peter.amstutz at curii.com>
Date: Tue Nov 29 09:58:50 2022 -0500
19699: Update from review comments
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz at curii.com>
diff --git a/doc/user/cwl/cwl-style.html.textile.liquid b/doc/user/cwl/cwl-style.html.textile.liquid
index 2466fe43a..911c9ba5a 100644
--- a/doc/user/cwl/cwl-style.html.textile.liquid
+++ b/doc/user/cwl/cwl-style.html.textile.liquid
@@ -172,7 +172,7 @@ Workflows should always provide @DockerRequirement@ in the @hints@ or @requireme
h3. Build a reusable library of components
-Build a reusable library of components. Share tool wrappers and subworkflows between projects. Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-library and tool registries such as "Dockstore":http://dockstore.org .
+Share tool wrappers and subworkflows between projects. Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-library and tool registries such as "Dockstore":http://dockstore.org .
h3. Supply scripts as input parameters
@@ -208,7 +208,7 @@ h3. Getting the temporary and output directories
You can get the designated temporary directory using @$(runtime.tmpdir)@ in your CWL file, or from the @$TMPDIR@ environment variable in your script.
-Similarly, you can get the designated output directory using $(runtime.outdir), or from the @HOME@ environment variable in your script.
+Similarly, you can get the designated output directory using @$(runtime.outdir)@, or from the @HOME@ environment variable in your script.
h3. Specifying @ResourceRequirement@
@@ -237,7 +237,7 @@ steps:
h3. Importing data into Keep
-You can use HTTP URLs in your input document and @arvados-cwl-runner@ will download them to Keep for you:
+You can use HTTP URLs as File input parameters and @arvados-cwl-runner@ will download them to Keep for you:
{% codeblock as yaml %}
fastq1:
@@ -250,14 +250,20 @@ fastq2:
Files are downloaded and stored in Keep collections with HTTP header information stored in metadata. If a file was previously downloaded, @arvados-cwl-runner@ uses HTTP caching rules to decide if a file should be re-downloaded or not.
-@arvados-cwl-runner@ also provides several additional options to control when the download happens, and caching behavior.
+The default behavior is to transfer the files on the client, prior to submitting the workflow run. This guarantees the data is available when the workflow is submitted. However, if data transfer is time consuming and you are submitting multiple workflow runs in a row, or the node submitting the workflow has limited bandwidth, you can use the @--defer-download@ option to have the data transfer performed by workflow runner process on a compute node, after the workflow is submitted.
-* ==--defer-download== will perform the download after the workflow is submitted (in the runner process on the compute node).
-* ==--varying-url-params== will ignore the listed URL query parameters from any HTTP URLs when checking if a URL has already been downloaded to Keep.
-* ==--prefer-cached-downloads== will search Keep for a copy of the HTTP URL's content and use that if found before downloading the resource. This means changes in the upstream resource won't be detected, but it also means the workflow will not fail if the upstream resource becomes inaccessible.
+@arvados-cwl-runner@ provides two additional options to control caching behavior.
+
+* @--varying-url-params@ will ignore the listed URL query parameters from any HTTP URLs when checking if a URL has already been downloaded to Keep.
+* @--prefer-cached-downloads@ will search Keep for the previously downloaded URL and use that if found, without checking the upstream resource. This means changes in the upstream resource won't be detected, but it also means the workflow will not fail if the upstream resource becomes inaccessible.
One use of this is to import files from "AWS S3 signed URLs":https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html
Here is an example usage. The use of @--varying-url-params=AWSAccessKeyId,Signature,Expires@ is especially relevant, this removes these parameters from the cached URL, which means that if a new signed URL for the same object is generated later, it can be found in the cache.
-@arvados-cwl-runner --defer-download --varying-url-params=AWSAccessKeyId,Signature,Expires --prefer-cached-downloads workflow.cwl params.yml@
+{% codeblock as sh %}
+arvados-cwl-runner --defer-download \
+ --varying-url-params=AWSAccessKeyId,Signature,Expires \
+ --prefer-cached-downloads \
+ workflow.cwl params.yml
+{% endcodeblock %}
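
Note on the @--varying-url-params@ text in the diff above: the idea of removing signature parameters from the cached URL can be illustrated with a minimal sketch. This is not the arvados-cwl-runner implementation. The helper name @cache_key_url@, the bucket hostname, and the parameter values are all hypothetical and only show why two differently signed URLs for the same object can map to the same cache entry.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_key_url(url, varying_params):
    """Return url with the listed query parameters removed.

    Hypothetical helper for illustration only; it is not part of
    arvados-cwl-runner.
    """
    parts = urlsplit(url)
    # Keep every query parameter except the ones declared as varying.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in varying_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Same list as in the example command line above.
varying = {"AWSAccessKeyId", "Signature", "Expires"}

# Two made-up signed URLs for the same object, generated at different times.
first = (
    "https://examplebucket.s3.amazonaws.com/sample.fastq"
    "?AWSAccessKeyId=AKIDEXAMPLE&Signature=abc&Expires=1669734000"
)
second = (
    "https://examplebucket.s3.amazonaws.com/sample.fastq"
    "?AWSAccessKeyId=AKIDEXAMPLE&Signature=xyz&Expires=1669820400"
)

# Both reduce to the same key once the varying parameters are dropped.
assert cache_key_url(first, varying) == cache_key_url(second, varying)
print(cache_key_url(first, varying))
```

With the varying parameters dropped, both URLs reduce to the same string, so a lookup keyed on that string would find the earlier download even though the signature and expiry differ.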
-----------------------------------------------------------------------
hooks/post-receive
--