[ARVADOS] created: 7af78c9694656d42c2bbb090e5aac62e7b3a0362

Git user git at public.curoverse.com
Mon Sep 5 11:46:28 EDT 2016


        at  7af78c9694656d42c2bbb090e5aac62e7b3a0362 (commit)


commit 7af78c9694656d42c2bbb090e5aac62e7b3a0362
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date:   Mon Sep 5 11:46:09 2016 -0400

    9932: Add CWL best practices guide

diff --git a/doc/_config.yml b/doc/_config.yml
index 8fb2ff7..96b8a52 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -44,8 +44,8 @@ navbar:
       - user/topics/keep.html.textile.liquid
       - user/topics/arv-copy.html.textile.liquid
     - Using Common Workflow Language:
-      - user/cwl/intro-cwl.html.textile.liquid
       - user/cwl/cwl-runner.html.textile.liquid
+      - user/cwl/cwl-style.html.textile.liquid
     - Working on the command line:
       - user/topics/running-pipeline-command-line.html.textile.liquid
       - user/topics/arv-run.html.textile.liquid
diff --git a/doc/user/cwl/cwl-style.html.textile.liquid b/doc/user/cwl/cwl-style.html.textile.liquid
new file mode 100644
index 0000000..a99c7bc
--- /dev/null
+++ b/doc/user/cwl/cwl-style.html.textile.liquid
@@ -0,0 +1,166 @@
+---
+layout: default
+navsection: userguide
+title: Best Practices for writing CWL
+...
+
+* Build a reusable library of components.  Share tool wrappers and subworkflows between projects.  Make use of and contribute to community resources such as https://github.com/common-workflow-language/workflows and http://dockstore.org
+
+* When combining a parameter value with a string, such as adding a filename extension, write @$(inputs.file.basename).ext@ instead of @$(inputs.file.basename + '.ext')@.  The first form is evaluated as a simple text substitution; the second form (using the @+@ operator) is evaluated as an arbitrary JavaScript expression and requires that you declare @InlineJavascriptRequirement@.
+
+* Avoid including @InlineJavascriptRequirement@ or @ShellCommandRequirement@ unless you specifically need them.  Don't include them "just in case" because they change the default behavior and may imply extra overhead.
+
+* Don't write CWL scripts that access the Arvados SDK.  This is non-portable; a script that accesses Arvados directly won't work with @cwltool@ or crunch v2.
+
+* CommandLineTools wrapping custom scripts should represent the script as an input parameter with the script file as a default value.  Use @secondaryFiles@ for scripts that consist of multiple files.  For example:
+
+<pre>
+cwlVersion: v1.0
+class: CommandLineTool
+baseCommand: python
+inputs:
+  script:
+    type: File
+    inputBinding: {position: 1}
+    default:
+      class: File
+      location: bclfastq.py
+      secondaryFiles:
+        - class: File
+          location: helper1.py
+        - class: File
+          location: helper2.py
+  inputfile:
+    type: File
+    inputBinding: {position: 2}
+outputs:
+  out:
+    type: File
+    outputBinding:
+      glob: "*.fastq"
+</pre>
+
+* You can get the designated temporary directory using @$(runtime.tmpdir)@ in your CWL file, or from the @$TMPDIR@ environment variable in your script.
+
+* Use @ExpressionTool@ to efficiently rearrange input files between steps of a Workflow.  For example, the following expression accepts a directory containing files paired by @_R1_@ and @_R2_@ and produces an array of Directories containing each pair.
+
+<pre>
+class: ExpressionTool
+cwlVersion: v1.0
+inputs:
+  inputdir: Directory
+outputs:
+  out: Directory[]
+requirements:
+  InlineJavascriptRequirement: {}
+expression: |
+  ${
+    var samples = {};
+    for (var i = 0; i < inputs.inputdir.listing.length; i++) {
+      var file = inputs.inputdir.listing[i];
+      var groups = file.basename.match(/^(.+)(_R[12]_)(.+)$/);
+      if (groups) {
+        if (!samples[groups[1]]) {
+          samples[groups[1]] = [];
+        }
+        samples[groups[1]].push(file);
+      }
+    }
+    var dirs = [];
+    for (var key in samples) {
+      dirs.push({"class": "Directory",
+                 "basename": key,
+                 "listing": samples[key]});
+    }
+    return {"out": dirs};
+  }
+</pre>
+
+* Avoid specifying resource requirements in CommandLineTool.  Prefer to specify them in the workflow.  You can provide a default resource requirement in the top level @hints@ section, and individual steps can override it with their own resource requirement.
+
+<pre>
+cwlVersion: v1.0
+class: Workflow
+inputs:
+  inp: File
+hints:
+  ResourceRequirement:
+    ramMin: 1000
+    coresMin: 1
+    tmpdirMin: 45000
+steps:
+  step1:
+    in: {inp: inp}
+    out: [out]
+    run: tool1.cwl
+  step2:
+    in: {inp: step1/out}
+    out: [out]
+    run: tool2.cwl
+    hints:
+      ResourceRequirement:
+        ramMin: 2000
+        coresMin: 2
+        tmpdirMin: 90000
+</pre>
+
+* Instead of scattering separate steps, prefer to scatter over a subworkflow.
+
+With the following pattern, @step1@ has to complete for all samples before @step2@ can start on any sample.  This means a single long-running sample can prevent the rest of the workflow from moving on:
+
+<pre>
+cwlVersion: v1.0
+class: Workflow
+inputs:
+  inp: File
+steps:
+  step1:
+    in: {inp: inp}
+    scatter: inp
+    out: [out]
+    run: tool1.cwl
+  step2:
+    in: {inp: step1/out}
+    scatter: inp
+    out: [out]
+    run: tool2.cwl
+  step3:
+    in: {inp: step2/out}
+    scatter: inp
+    out: [out]
+    run: tool3.cwl
+</pre>
+
+Instead, scatter over a subworkflow.  In this pattern, a sample can proceed to @step2@ as soon as @step1@ is done, independently of any other samples.
+Example (note that the subworkflow can also be put in a separate file):
+
+<pre>
+cwlVersion: v1.0
+class: Workflow
+steps:
+  step1:
+    in: {inp: inp}
+    scatter: inp
+    out: [out]
+    run:
+      class: Workflow
+      inputs:
+        inp: File
+      outputs:
+        out:
+          type: File
+          outputSource: step3/out
+      steps:
+        step1:
+          in: {inp: inp}
+          out: [out]
+          run: tool1.cwl
+        step2:
+          in: {inp: step1/out}
+          out: [out]
+          run: tool2.cwl
+        step3:
+          in: {inp: step2/out}
+          out: [out]
+          run: tool3.cwl
+</pre>
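
Note on the ExpressionTool example in the patch above: the R1/R2 grouping can be checked outside a CWL runner with a rough Python equivalent. This is only a sketch of the same regular-expression grouping, not part of the commit; the function names and the sample file names used to exercise it are hypothetical.

```python
import re
from collections import defaultdict

# Same pattern as the ExpressionTool: sample prefix, read marker, remainder.
PAIR_RE = re.compile(r"^(.+)(_R[12]_)(.+)$")


def group_by_sample(basenames):
    """Group file basenames by the text preceding the _R1_/_R2_ marker."""
    samples = defaultdict(list)
    for name in basenames:
        m = PAIR_RE.match(name)
        if m:  # names without a read marker are skipped, as in the expression
            samples[m.group(1)].append(name)
    return dict(samples)


def to_directories(samples):
    """Build CWL-style Directory objects, one per sample key."""
    return [
        {
            "class": "Directory",
            "basename": key,
            # "listing" is a flat list of File objects, not a nested list
            "listing": [{"class": "File", "basename": n} for n in names],
        }
        for key, names in samples.items()
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    names = ["s1_R1_001.fastq", "s1_R2_001.fastq", "s2_R1_001.fastq", "README"]
    for d in to_directories(group_by_sample(names)):
        print(d["basename"], [f["basename"] for f in d["listing"]])
```

Each sample key maps to a flat list of files, which is the shape the @listing@ field of a Directory object expects.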

-----------------------------------------------------------------------


hooks/post-receive
-- 



