[ARVADOS] updated: 83a9390a05bbffc2e4ea95dd693af3ab3547fa12

git at public.curoverse.com git at public.curoverse.com
Thu Oct 9 09:32:41 EDT 2014


Summary of changes:
 crunch_scripts/run-command                        |  2 +-
 doc/_includes/_run_command_foreach_example.liquid |  2 +-
 doc/user/topics/run-command.html.textile.liquid   | 80 ++++++++++-------------
 3 files changed, 35 insertions(+), 49 deletions(-)

       via  83a9390a05bbffc2e4ea95dd693af3ab3547fa12 (commit)
      from  bdd309b073b6e836b78de28e82da89baba66a2a9 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 83a9390a05bbffc2e4ea95dd693af3ab3547fa12
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date:   Thu Oct 9 09:32:36 2014 -0400

    4042: Typo fixes.  Highlight run-command and script_parameters in text.  Rename
    --job-parameters to --script-parameters and add mention of --dry-run mode.

diff --git a/crunch_scripts/run-command b/crunch_scripts/run-command
index 7e86e9d..c1e7475 100755
--- a/crunch_scripts/run-command
+++ b/crunch_scripts/run-command
@@ -31,7 +31,7 @@ import tempfile
 
 parser = argparse.ArgumentParser()
 parser.add_argument('--dry-run', action='store_true')
-parser.add_argument('--job-parameters', type=str, default="{}")
+parser.add_argument('--script-parameters', type=str, default="{}")
 args = parser.parse_args()
 
 os.umask(0077)
diff --git a/doc/_includes/_run_command_foreach_example.liquid b/doc/_includes/_run_command_foreach_example.liquid
index d5643e6..3fb754f 100644
--- a/doc/_includes/_run_command_foreach_example.liquid
+++ b/doc/_includes/_run_command_foreach_example.liquid
@@ -26,7 +26,7 @@
                 "sample": {
                     "required": true,
                     "dataclass": "Collection"
-                }
+                },
                 "sample_subdir": "$(dir $(samples))",
                 "read_pair": {
                     "value": {
diff --git a/doc/user/topics/run-command.html.textile.liquid b/doc/user/topics/run-command.html.textile.liquid
index dc4bc1f..6d3e87b 100644
--- a/doc/user/topics/run-command.html.textile.liquid
+++ b/doc/user/topics/run-command.html.textile.liquid
@@ -8,21 +8,31 @@ The @run-command@ crunch script enables you run command line programs.
 
 h1. Using run-command
 
-The basic run-command process evaluates its inputs and builds a command line, executes the command, and saves the contents of the output directory back to Keep.  For large datasets, run-command can schedule concurrent tasks to execute the wrapped program over a range of inputs (see @task.foreach@ below.)
+The basic @run-command@ process evaluates its inputs and builds a command line, executes the command, and saves the contents of the output directory back to Keep.  For large datasets, @run-command@ can schedule concurrent tasks to execute the wrapped program over a range of inputs (see @task.foreach@ below.)
 
-Run-command is controlled through the script_parameters section of a pipeline component.  Script_parameters is a JSON object consisting of key-value pairs.  There are three categories of keys that are meaningful to run-command:
+@run-command@ is controlled through the @script_parameters@ section of a pipeline component.  @script_parameters@ is a JSON object consisting of key-value pairs.  There are three categories of keys that are meaningful to run-command:
 * The @command@ section defining the template to build the command line of task
 * Special processing directives such as @task.foreach@ @task.cwd@ @task.vwd@ @task.stdin@ @task.stdout@
 * User-defined parameters (everything else)
 
+In the following examples, you can use "dry run mode" to determine the command line that @run-command@ will use without actually running the command.  For example:
+
+<notextile>
+<pre><code>~$ <span class="userinput">./run-command --dry-run --script-parameters '{
+  "command": ["echo", "hello world"]
+}'</span>
+run-command: echo hello world
+</code></pre>
+</notextile>
+
 h2. Command template
 
-The value of the "command" key is a list.  The first parameter of the list is the actual program to invoke, followed by the command arguments.  The simplest run-command invocation simply runs a program with static parameters.  In this example, run "echo" with the first argument "hello world":
+The value of the "command" key is a list.  The first parameter of the list is the actual program to invoke, followed by the command arguments.  The simplest @run-command@ invocation simply runs a program with static parameters.  In this example, run "echo" with the first argument "hello world":
 
 <pre>
-  "script_parameters": {
-    "command": ["echo", "hello world"]
-  }
+{
+  "command": ["echo", "hello world"]
+}
 </pre>
 
 Running this job will print "hello world" to the job log.
@@ -32,9 +42,9 @@ By default, the command will start with the current working directory set to the
 Items in the "command" list may include lists and objects in addition to strings.  Lists are flattened to produce the final command line.  JSON objects are evaluated as list item functions (see below).  For example, the following evaluates to @["echo", "hello", "world"]@:
 
 <pre>
-  "script_parameters": {
-    "command": ["echo", ["hello", "world"]]
-  }
+{
+  "command": ["echo", ["hello", "world"]]
+}
 </pre>
 
 h2. Parameter substitution
@@ -42,31 +52,7 @@ h2. Parameter substitution
 The "command" list can include parameter substitutions.  Substitutions are enclosed in "$(...)" and may contain the name of a user-defined parameter.  In the following example, the value of "a" is "hello world"; so when "command" is evaluated, it will substitute "hello world" for "$(a)":
 
 <pre>
-"script_parameters": {
-  "command": ["echo", "$(a)"],
-  "a": "hello world"
-}
-</pre>
-
-h2. Special parameters
-
-In addition to user-defined parameters, there are special parameters supplied by run-command that provide some information about the runtime environment.
-
-table(table table-bordered table-condensed).
-|_. Parameter   |_. Value |
-|$(node.cores)     |Number of cores on the current node|
-|$(task.tmpdir)    |Path to the temporary directory for this task      |
-|$(task.outdir)    |Path to the task's designated output directory.  Files written to this directory are automatically uploaded to Keep when the command completes.|
-|$(task.uuid)      |The current task's unique identifier      |
-|$(job.srcdir)     |The directory containing the source code for the run-command script|
-|$(job.uuid)       |The current job's unique identifier      |
-
-h2. Substitution functions
-
-Substitutions can also make use of functions.  Functions take a single parameter and substitution is performed recursively from the inside out.  In the following example, the parameter $(a) is evaluated first, then the $(file ...) function applied to get a local filesystem path, to produce a command like @["echo", "/path/to/keep/mount/c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2"]@:
-
-<pre>
-"script_parameters": {
+{
   "command": ["echo", "$(file $(a))"],
   "a": "c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2"
 }
@@ -77,7 +63,7 @@ table(table table-bordered table-condensed).
 |$(file ...)       | Takes a reference to a file within an Arvados collection and evaluates to a file path on the local file system where that file can be accessed by your command.  Will raise an error if the file is not accessible.|
 |$(dir ...)        | Takes a reference to an Arvados collection or directory within an Arvados collection and evaluates to a directory path on the local file system where that directory can be accessed by your command.  The path may include a file name, in which case it will evaluate to the parent directory of the file.  Uses Python's os.path.dirname(), so "/foo/bar" will evaluate to "/foo" but "/foo/bar/" will evaluate to "/foo/bar".  Will raise an error if the directory is not accessible. |
 |$(basename ...)   | Strip leading directory and trailing file extension from the path provided.  For example, $(basename /foo/bar.baz.txt) will evaluate to "bar.baz".|
-|$(glob ...)       | Take a unix shell path pattern (supports @*@ @?@ and @[]@) and search the local filesystem, returning the first match found.  Use together with $(dir ...) to get a local filesystem path for Arvados collections.  For example: $(glob $(dir $(mycollection)/*.bam)) will find the first .bam file in the collection specified by the user parameter "mycollection".  If there is more than one match, which one is returned is undefined.  Will raise an error if no matches are found.|
+|$(glob ...)       | Take a Unix shell path pattern (supports @*@ @?@ and @[]@) and search the local filesystem, returning the first match found.  Use together with $(dir ...) to get a local filesystem path for Arvados collections.  For example: $(glob $(dir $(mycollection)/*.bam)) will find the first .bam file in the collection specified by the user parameter "mycollection".  If there is more than one match, which one is returned is undefined.  Will raise an error if no matches are found.|
 
 h2. List context
 
@@ -91,14 +77,14 @@ If the value is a JSON object, it is evaluated as a list function described belo
 
 h2. List functions
 
-When run-command is evaluating a list (such as "command"), in addition to string parameter substitution, you can use list item functions.  Note: in the following functions, you specify the name of a user parameter to act on; you cannot provide the list value directly in line.
+When @run-command@ is evaluating a list (such as "command"), in addition to string parameter substitution, you can use list item functions.  Note: in the following functions, you specify the name of a user parameter to act on; you cannot provide the list value directly in line.
 
 h3. foreach
 
 The @foreach@ list item function (not to be confused with the @task.foreach@ directive) expands a command template for each item in the specified user parameter (the value of the user parameter is evaluated in a list context, as described above).  The following example will evaluate "command" to @["echo", "--something", "alice", "--something", "bob"]@:
 
 <pre>
-"script_parameters": {
+{
   "command": ["echo", {"foreach": "a", "command": ["--something", "$(a)"]}],
   "a": ["alice", "bob"]
 }
@@ -109,7 +95,7 @@ h3. index
 This function extracts a single item from a list.  The value of @index@ is zero-based (i.e. the first item is at index 0, the second item index 1, etc).  The following example will evaluate "command" to @["echo", "--something", "bob"]@:
 
 <pre>
-"script_parameters": {
+{
   "command": ["echo", {"list": "a", "index": 1, "command": ["--something", "$(a)"]}],
   "a": ["alice", "bob"]
 }
@@ -120,7 +106,7 @@ h3. filter
 Filter the list so that it only includes items that match a regular expression.  The following example will evaluate to @["echo", "bob"]@
 
 <pre>
-"script_parameters": {
+{
   "command": ["echo", {"filter": "a", "regex": "b.*"}],
   "a": ["alice", "bob"]
 }
@@ -131,7 +117,7 @@ h3. group
 Generate a list of lists, where items are grouped on common subexpression match.  Items which don't match the regular expression are excluded.  The following example evaluates to @["echo", "--group", "alice", "carol", "dave", "--group", "bob"]@:
 
 <pre>
-"script_parameters": {
+{
   "command": ["echo", {"foreach": "b", "command":["--group", {"foreach": "b", "command":"$(b)"}]}],
   "a": ["alice", "bob", "carol", "dave"],
   "b": {"group": "a", "regex": "[^a]*(a?).*"}
@@ -143,7 +129,7 @@ h3. extract
 Generate a list of lists, where items are split by subexpression match.  Items which don't match the regular expression are excluded.  The following example evaluates to @["echo", "c", "a", "rol", "d", "a", "ve"]@:
 
 <pre>
-"script_parameters": {
+{
   "command": ["echo", {"foreach": "b", "command":[{"foreach": "b", "command":"$(b)"}]}],
   "a": ["alice", "bob", "carol", "dave"],
   "b": {"extract": "a", "regex": "(.+)(a)(.*)"}
@@ -170,16 +156,16 @@ h3. task.vwd
 
 Background: because Keep collections are read-only, this does not play well with certain tools that expect to be able to write their outputs alongside their inputs (such as tools that generate indexes that are closely associated with the original file.)  The run-command's solution to this is the "virtual working directory".
 
-@task.vwd@ specifies a Keep collection with the starting contents of the directory.  Run-command will then populate @task.outdir@ with directories and symlinks to mirror the contents of the @task.vwd@ collection.  Your command will then be able to both access its input files and write its output files in @task.outdir@.  When the command completes, the output collection will merge the output of your command with the contents of the starting collection.  Note that files in the starting collection remain read-only and cannot be altered or deleted.
+@task.vwd@ specifies a Keep collection with the starting contents of the directory.  @run-command@ will then populate @task.outdir@ with directories and symlinks to mirror the contents of the @task.vwd@ collection.  Your command will then be able to both access its input files and write its output files in @task.outdir@.  When the command completes, the output collection will merge the output of your command with the contents of the starting collection.  Note that files in the starting collection remain read-only and cannot be altered or deleted.
 
 h3. task.foreach
 
 Using @task.foreach@, you can run your command concurrently over large datasets.
 
-@task.foreach@ takes the names of one or more user-defined parameters.  The value of these parameters are evaluated in a list context.  Run-command then generates tasks based on the Cartesian product (i.e. all combinations) of the input lists.  The outputs of all tasks are merged to create the final output collection.  Note that if two tasks output a file in the same directory with the same name, that file will be concatenated in the final output.  In the following example, three tasks will be created for the "grep" command, based on the contents of user parameter "a":
+@task.foreach@ takes the names of one or more user-defined parameters.  The value of these parameters are evaluated in a list context.  @run-command@ then generates tasks based on the Cartesian product (i.e. all combinations) of the input lists.  The outputs of all tasks are merged to create the final output collection.  Note that if two tasks output a file in the same directory with the same name, that file will be concatenated in the final output.  In the following example, three tasks will be created for the "grep" command, based on the contents of user parameter "a":
 
 <pre>
-"script_parameters": {
+{
   "command": ["echo", "$(a)"],
   "task.foreach": "a",
   "a": ["alice", "bob", "carol"]
@@ -198,7 +184,7 @@ This evaluates to the commands:
 You can also specify multiple parameters:
 
 <pre>
-"script_parameters": {
+{
   "command": ["echo", "$(a)", "$(b)"],
   "task.foreach": ["a", "b"],
   "a": ["alice", "bob"],
@@ -217,10 +203,10 @@ This evaluates to the commands:
 
 h1. Examples
 
-The following is a single task pipeline using run-command to run the bwa alignment tool to align a single paired-end read fastq sample.  The input to this pipeline is the reference genome and a collection consisting of two fastq files for the read pair.
+The following is a single task pipeline using @run-command@ to run the bwa alignment tool to align a single paired-end read fastq sample.  The input to this pipeline is the reference genome and a collection consisting of two fastq files for the read pair.
 
 <notextile>{% code 'run_command_simple_example' as javascript %}</notextile>
 
-The following is a concurrent task pipeline using run-command to run the bwa alignment tool to align a set of fastq reads over multiple sample.  The input to this pipeline is the reference genome and a collection consisting subdirectories for each sample, with each subdirectory containing pairs of fastq files for each set of reads.
+The following is a concurrent task pipeline using @run-command@ to run the bwa alignment tool to align a set of fastq reads over multiple samples.  The input to this pipeline is the reference genome and a collection consisting subdirectories for each sample, with each subdirectory containing pairs of fastq files for each set of reads.
 
 <notextile>{% code 'run_command_foreach_example' as javascript %}</notextile>
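
Illustrative note on the renamed flag: the sketch below is not the code in crunch_scripts/run-command. It only shows how the two argparse options from the first hunk (--dry-run and --script-parameters) might be combined with the list flattening described in the documentation to reproduce the "run-command: echo hello world" line of the dry-run example. The helper names flatten and build_command_line are hypothetical, and parameter substitution and list functions are deliberately left out.

```python
import argparse
import json


def flatten(items):
    """Recursively flatten nested lists into one flat list (hypothetical helper)."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def build_command_line(argv):
    """Parse the two options from the diff and return (dry_run, command)."""
    # These two add_argument calls mirror the diff; the rest is an assumption.
    parser = argparse.ArgumentParser()
    parser.add_argument('--dry-run', action='store_true')
    parser.add_argument('--script-parameters', type=str, default="{}")
    args = parser.parse_args(argv)

    # argparse exposes --script-parameters as args.script_parameters.
    params = json.loads(args.script_parameters)
    return args.dry_run, flatten(params.get("command", []))


if __name__ == "__main__":
    dry_run, command = build_command_line(
        ["--dry-run", "--script-parameters",
         '{"command": ["echo", ["hello", "world"]]}'])
    if dry_run:
        # Report the command line without executing anything.
        print("run-command: " + " ".join(command))
```

Because the option is now --script-parameters, argparse will reject the old --job-parameters spelling as an unrecognized argument, so any wrapper that still passes the old flag would need updating.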

-----------------------------------------------------------------------
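
Illustrative note on task.foreach: the documentation in this commit says tasks are generated from the Cartesian product of the named input lists. A minimal sketch of that expansion, assuming plain string values and simple "$(name)" substitution, could look like the following. The helpers expand_tasks and render are hypothetical and are not taken from run-command; the values given for "b" are also made up for the example, since the diff does not show them.

```python
import itertools


def expand_tasks(params):
    """Return one dict of bindings per task (hypothetical helper)."""
    names = params["task.foreach"]
    if isinstance(names, str):
        # A single parameter name is treated as a one-element list.
        names = [names]
    value_lists = [params[name] for name in names]
    # itertools.product yields every combination of the input lists.
    return [dict(zip(names, combo))
            for combo in itertools.product(*value_lists)]


def render(template, bindings):
    """Replace each "$(name)" in the template with its bound value."""
    rendered = []
    for word in template:
        for name, value in bindings.items():
            word = word.replace("$(%s)" % name, value)
        rendered.append(word)
    return rendered


params = {
    "command": ["echo", "$(a)", "$(b)"],
    "task.foreach": ["a", "b"],
    "a": ["alice", "bob"],
    "b": ["carol", "dave"],  # hypothetical values, not shown in the diff
}

for bindings in expand_tasks(params):
    print(render(params["command"], bindings))
```

With two lists of two items each, this produces four tasks, one per combination. Merging the task outputs into a final collection is outside the scope of the sketch.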


hooks/post-receive
-- 



