[arvados] created: 2.1.0-3080-g3cdc1e47b

git repository hosting git at public.arvados.org
Wed Nov 23 20:08:59 UTC 2022


        at  3cdc1e47bf435c364644ce8ef792cb42e95ac183 (commit)


commit 3cdc1e47bf435c364644ce8ef792cb42e95ac183
Author: Peter Amstutz <peter.amstutz at curii.com>
Date:   Wed Nov 23 15:08:37 2022 -0500

    19699: Add section about data import
    
    Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz at curii.com>

diff --git a/doc/user/cwl/cwl-style.html.textile.liquid b/doc/user/cwl/cwl-style.html.textile.liquid
index 303ae37e9..2466fe43a 100644
--- a/doc/user/cwl/cwl-style.html.textile.liquid
+++ b/doc/user/cwl/cwl-style.html.textile.liquid
@@ -234,3 +234,30 @@ steps:
         coresMin: 2
         tmpdirMin: 90000
 {% endcodeblock %}
+
+h3. Importing data into Keep
+
+You can use HTTP URLs in your input document and @arvados-cwl-runner@ will download them to Keep for you:
+
+{% codeblock as yaml %}
+fastq1:
+  class: File
+  location: https://example.com/genomes/sampleA_1.fastq
+fastq2:
+  class: File
+  location: https://example.com/genomes/sampleA_2.fastq
+{% endcodeblock %}
+
+Files are downloaded and stored in Keep collections, with the HTTP header information recorded in metadata.  If a file was previously downloaded, @arvados-cwl-runner@ uses HTTP caching rules to decide whether it should be downloaded again.
+
+@arvados-cwl-runner@ also provides options to control when the download happens and how downloads are cached.
+
+* ==--defer-downloads== performs the download after the workflow is submitted (in the runner process on the compute node) instead of before submission.
+* ==--varying-url-params== ignores the listed URL query parameters when checking whether an HTTP URL has already been downloaded to Keep.
+* ==--prefer-cached-downloads== searches Keep for a copy of the HTTP URL's content and, if one is found, uses it instead of downloading the resource. This means changes in the upstream resource won't be detected, but it also means the workflow will not fail if the upstream resource becomes inaccessible.
+
+One use of these options is to import files from "AWS S3 signed URLs":https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html.
+
+Here is an example.  The use of @--varying-url-params=AWSAccessKeyId,Signature,Expires@ is especially relevant: it removes these parameters from the cached URL, so that if a new signed URL for the same object is generated later, the object can still be found in the cache.
+
+@arvados-cwl-runner --defer-downloads --varying-url-params=AWSAccessKeyId,Signature,Expires --prefer-cached-downloads workflow.cwl params.yml@
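
Note on the section added above: the effect of --varying-url-params on cache lookups can be illustrated with a small Python sketch. This is not the arvados-cwl-runner implementation; the helper name strip_varying_params and the example bucket URLs are hypothetical, and the sketch only assumes that the listed query parameters are dropped before URLs are compared.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_varying_params(url, varying_params):
    """Return url with the named query parameters removed.

    Hypothetical helper for illustration only; it is not part of
    arvados-cwl-runner.
    """
    parts = urlsplit(url)
    # Keep every query parameter except the ones listed as varying.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in varying_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Two signed URLs for the same object, generated at different times
# (made-up values).
varying = {"AWSAccessKeyId", "Signature", "Expires"}
first = (
    "https://example-bucket.s3.amazonaws.com/sampleA_1.fastq"
    "?AWSAccessKeyId=AKIAEXAMPLE&Signature=abc&Expires=1669300000"
)
second = (
    "https://example-bucket.s3.amazonaws.com/sampleA_1.fastq"
    "?AWSAccessKeyId=AKIAEXAMPLE&Signature=xyz&Expires=1669400000"
)

# Both URLs reduce to the same key, so a cache lookup keyed on the
# stripped URL would match the earlier download.
assert strip_varying_params(first, varying) == strip_varying_params(second, varying)
```

Under that assumption, a freshly signed URL for an object that was already imported maps to the same cache key as the original URL, which is why the signature-related parameters are listed in the example command.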

commit 7d5593ff7e42845c03e35ba732f308ad00491a56
Author: Peter Amstutz <peter.amstutz at curii.com>
Date:   Wed Nov 23 14:53:27 2022 -0500

    19699: Refresh arvados-cwl-runner table of options
    
    Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz at curii.com>

diff --git a/doc/user/cwl/cwl-run-options.html.textile.liquid b/doc/user/cwl/cwl-run-options.html.textile.liquid
index 94e46ae1b..703ec8913 100644
--- a/doc/user/cwl/cwl-run-options.html.textile.liquid
+++ b/doc/user/cwl/cwl-run-options.html.textile.liquid
@@ -38,34 +38,40 @@ table(table table-bordered table-condensed).
 |==--output-name OUTPUT_NAME==|Name to use for collection that stores the final output.|
 |==--output-tags OUTPUT_TAGS==|Tags for the final output collection separated by commas, e.g., =='--output-tags tag0,tag1,tag2'==.|
 |==--ignore-docker-for-reuse==|Ignore Docker image version when deciding whether to reuse past containers.|
-|==--submit==|              Submit workflow runner to Arvados to manage the workflow (default).|
-|==--local==|               Run workflow on local host (still submits containers to Arvados).|
+|==--submit==|              Submit workflow to run on Arvados.|
+|==--local==|               Run workflow on local host (submits containers to Arvados).|
 |==--create-template==|     (Deprecated) synonym for --create-workflow.|
 |==--create-workflow==|     Register an Arvados workflow that can be run from Workbench|
-|==--update-workflow== UUID|Update an existing Arvados workflow or pipeline template with the given UUID.|
+|==--update-workflow== UUID|Update an existing Arvados workflow with the given UUID.|
 |==--wait==|                After submitting workflow runner, wait for completion.|
 |==--no-wait==|             Submit workflow runner and exit.|
 |==--log-timestamps==|      Prefix logging lines with timestamp|
 |==--no-log-timestamps==|   No timestamp on logging lines|
 |==--compute-checksum==|    Compute checksum of contents while collecting outputs|
-|==--submit-runner-ram== SUBMIT_RUNNER_RAM|RAM (in MiB) required for the workflow runner (default 1024)|
-|==--submit-runner-image== SUBMIT_RUNNER_IMAGE|Docker image for workflow runner|
+|==--submit-runner-ram== SUBMIT_RUNNER_RAM|RAM (in MiB) required for the workflow runner job (default 1024)|
+|==--submit-runner-image== SUBMIT_RUNNER_IMAGE|Docker image for workflow runner job|
 |==--always-submit-runner==|When invoked with --submit --wait, always submit a runner to manage the workflow, even when only running a single CommandLineTool|
 |==--match-submitter-images==|Where Arvados has more than one Docker image of the same name, use image from the Docker instance on the submitting node.|
 |==--submit-request-uuid== UUID|Update and commit to supplied container request instead of creating a new one.|
 |==--submit-runner-cluster== CLUSTER_ID|Submit workflow runner to a remote cluster|
-|==--name NAME==|Name to use for workflow execution instance.|
+|==--collection-cache-size== COLLECTION_CACHE_SIZE|Collection cache size (in MiB, default 256).|
+|==--name== NAME|Name to use for workflow execution instance.|
 |==--on-error== {stop,continue}|Desired workflow behavior when a step fails.  One of 'stop' (do not submit any more steps) or 'continue' (may submit other steps that are not downstream from the error). Default is 'continue'.|
-|==--enable-dev==|Enable loading and running development versions of CWL spec.|
-|==--storage-classes== STORAGE_CLASSES|Specify comma separated list of storage classes to be used when saving the final workflow output to Keep.|
-|==--intermediate-storage-classes== STORAGE_CLASSES|Specify comma separated list of storage classes to be used when intermediate workflow output to Keep.|
+|==--enable-dev==|Enable loading and running development versions of the CWL standards.|
+|==--storage-classes== STORAGE_CLASSES|Specify comma-separated list of storage classes to be used when saving final workflow output to Keep.|
+|==--intermediate-storage-classes== INTERMEDIATE_STORAGE_CLASSES|Specify comma-separated list of storage classes to be used when saving intermediate workflow output to Keep.|
 |==--intermediate-output-ttl== N|If N > 0, intermediate output collections will be trashed N seconds after creation. Default is 0 (don't trash).|
 |==--priority== PRIORITY|Workflow priority (range 1..1000, higher has precedence over lower)|
-|==--thread-count== THREAD_COUNT|Number of threads to use for container submit and output collection.|
+|==--thread-count== THREAD_COUNT|Number of threads to use for job submit and output collection.|
 |==--http-timeout== HTTP_TIMEOUT|API request timeout in seconds. Default is 300 seconds (5 minutes).|
-|==--enable-preemptible==|Use preemptible instances. Control individual steps with "arv:UsePreemptible":cwl-extensions.html#UsePreemptible hint.|
+|==--defer-downloads==|When submitting a workflow, defer downloading HTTP URLs to workflow launch instead of downloading to Keep before submit.|
+|==--varying-url-params== VARYING_URL_PARAMS|A comma-separated list of URL query parameters that should be ignored when storing HTTP URLs in Keep.|
+|==--prefer-cached-downloads==|If an HTTP URL is found in Keep, skip the upstream URL freshness check (will not notice if the upstream has changed, but will also not fail if the upstream is unavailable).|
+|==--enable-preemptible==|Use preemptible instances. Control individual steps with arv:UsePreemptible hint.|
 |==--disable-preemptible==|Don't use preemptible instances.|
-|==--skip-schemas==|Skip loading of extension schemas (the $schemas section).|
+|==--copy-deps==|         Copy dependencies into the destination project.|
+|==--no-copy-deps==|      Leave dependencies where they are.|
+|==--skip-schemas==|      Skip loading of schemas|
 |==--trash-intermediate==|Immediately trash intermediate outputs on workflow success.|
 |==--no-trash-intermediate==|Do not trash intermediate outputs (default).|
 

-----------------------------------------------------------------------


hooks/post-receive
-- 



