[ARVADOS] created: 7af78c9694656d42c2bbb090e5aac62e7b3a0362
Git user
git at public.curoverse.com
Mon Sep 5 11:46:28 EDT 2016
at 7af78c9694656d42c2bbb090e5aac62e7b3a0362 (commit)
commit 7af78c9694656d42c2bbb090e5aac62e7b3a0362
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date: Mon Sep 5 11:46:09 2016 -0400
9932: Add CWL best practices guide
diff --git a/doc/_config.yml b/doc/_config.yml
index 8fb2ff7..96b8a52 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -44,8 +44,8 @@ navbar:
- user/topics/keep.html.textile.liquid
- user/topics/arv-copy.html.textile.liquid
- Using Common Workflow Language:
- - user/cwl/intro-cwl.html.textile.liquid
- user/cwl/cwl-runner.html.textile.liquid
+ - user/cwl/cwl-style.html.textile.liquid
- Working on the command line:
- user/topics/running-pipeline-command-line.html.textile.liquid
- user/topics/arv-run.html.textile.liquid
diff --git a/doc/user/cwl/cwl-style.html.textile.liquid b/doc/user/cwl/cwl-style.html.textile.liquid
new file mode 100644
index 0000000..a99c7bc
--- /dev/null
+++ b/doc/user/cwl/cwl-style.html.textile.liquid
@@ -0,0 +1,166 @@
+---
+layout: default
+navsection: userguide
+title: Best Practices for writing CWL
+...
+
+* Build a reusable library of components. Share tool wrappers and subworkflows between projects. Make use of and contribute to community resources such as https://github.com/common-workflow-language/workflows and http://dockstore.org
+
+* When combining a parameter value with a string, such as adding a filename extension, write @$(inputs.file.basename).ext@ instead of @$(inputs.file.basename + 'ext')@. The first form is evaluated as a simple text substitution, the second form (using the @+@ operator) is evaluated as an arbitrary Javascript expression and requires that you declare @InlineJavascriptRequirement at .
+
+* Avoid including @InlineJavascriptRequirement@ or @ShellCommandRequirement@ unless you specifically need them. Don't include them "just in case" because they change the default behavior and may imply extra overhead.
+
+* Don't write CWL scripts that access the Arvados SDK. This is non-portable; a script that access Arvados directly won't work with @cwltool@ or crunch v2.
+
+* CommandLineTools wrapping custom scripts should represent the script as an input parameter with the script file as a default value. Use @secondaryFiles@ for scripts that consist of multiple files. For example:
+
+<pre>
+cwlVersion: v1.0
+class: CommandLineTool
+baseCommand: python
+inputs:
+ script:
+ type: File
+ inputBinding: {position: 1}
+ default:
+ class: File
+ location: bclfastq.py
+ secondaryFiles:
+ - class: File
+ location: helper1.py
+ - class: File
+ location: helper2.py
+ inputfile:
+ type: File
+ inputBinding: {position: 2}
+outputs:
+ out:
+ type: File
+ outputBinding:
+ glob: "*.fastq"
+</pre>
+
+* You can get the designated temporary directory using @$(runtime.tmpdir)@ in your CWL file, or from the @$TMPDIR@ environment variable in your script.
+
+* Use @ExpressionTool@ to efficiently rearrange input files between steps of a Workflow. For example, the following expression accepts a directory containing files paired by @_R1_@ and @_R2_@ and produces an array of Directories containing each pair.
+
+<pre>
+class: ExpressionTool
+cwlVersion: v1.0
+inputs:
+ inputdir: Directory
+outputs:
+ out: Directory[]
+requirements:
+ InlineJavascriptRequirement: {}
+expression: |
+ ${
+ var samples = {};
+ for (var i = 0; i < inputs.inputdir.listing.length; i++) {
+ var file = inputs.inputdir.listing[i];
+ var groups = file.basename.match(/^(.+)(_R[12]_)(.+)$/);
+ if (groups) {
+ if (!samples[groups[1]]) {
+ samples[groups[1]] = [];
+ }
+ samples[groups[1]].push(file);
+ }
+ }
+ var dirs = [];
+ for (var key in samples) {
+ dirs.push({"class": "Directory",
+ "basename": key,
+ "listing": [samples[key]]});
+ }
+ return {"out": dirs};
+ }
+</pre>
+
+* Don't specifying resource requirements in CommandLineTool. Prefer to specify them in the workflow. You can provide a default resource requirement in the top level @hints@ section, and individual steps can override it with their own resource requirement.
+
+<pre>
+cwlVersion: v1.0
+class: Workflow
+inputs:
+ inp: File
+hints:
+ ResourceRequirement:
+ ramMin: 1000
+ coresMin: 1
+ tmpdirMin: 45000
+steps:
+ step1:
+ in: {inp: inp}
+ out: [out]
+ run: tool1.cwl
+ step2:
+ in: {inp: step1/inp}
+ out: [out]
+ run: tool2.cwl
+ hints:
+ ResourceRequirement:
+ ramMin: 2000
+ coresMin: 2
+ tmpdirMin: 90000
+</pre>
+
+* Instead of scattering separate steps, prefer to scatter over a subworkflow.
+
+With the following pattern, @step1@ has to complete for all samples to complete before @step2@ can start on any samples. This means a single long-running sample can prevent the rest of the workflow from moving on:
+
+<pre>
+cwlVersion: v1.0
+class: Workflow
+inputs:
+ inp: File
+steps:
+ step1:
+ in: {inp: inp}
+ scatter: inp
+ out: [out]
+ run: tool1.cwl
+ step2:
+ in: {inp: step1/inp}
+ scatter: inp
+ out: [out]
+ run: tool2.cwl
+ step3:
+ in: {inp: step2/inp}
+ scatter: inp
+ out: [out]
+ run: tool3.cwl
+</pre>
+
+Instead, scatter over a subworkflow. In this pattern, a sample can proceed from to @step2@ as soon as @step1@ is done, independently of any other samples.
+Example: (note, the subworkflow can also be put in a separate file)
+
+<pre>
+cwlVersion: v1.0
+class: Workflow
+steps:
+ step1:
+ in: {inp: inp}
+ scatter: inp
+ out: [out]
+ run:
+ class: Workflow
+ inputs:
+ inp: File
+ outputs:
+ out:
+ type: File
+ outputSource: step3/out
+ steps:
+ step1:
+ in: {inp: inp}
+ out: [out]
+ run: tool1.cwl
+ step2:
+ in: {inp: step1/inp}
+ out: [out]
+ run: tool2.cwl
+ step3:
+ in: {inp: step2/inp}
+ out: [out]
+ run: tool3.cwl
+</pre>
-----------------------------------------------------------------------
hooks/post-receive
--
More information about the arvados-commits
mailing list