[ARVADOS] updated: 88b5b089f38a587292bb68772251b8275c0e27c7
git at public.curoverse.com
Tue Mar 11 10:51:01 EDT 2014
Summary of changes:
doc/_config.yml | 7 +-
doc/_includes/_run_md5sum_py.liquid | 33 ++-----
doc/_includes/_tutorial_hash_script_py.liquid | 9 +-
doc/_layouts/default.html.liquid | 1 +
.../examples/crunch-examples.html.textile.liquid | 4 -
.../check-environment.html.textile.liquid | 4 -
.../getting_started/community.html.textile.liquid | 4 -
.../getting_started/ssh-access.html.textile.liquid | 4 -
.../getting_started/workbench.html.textile.liquid | 3 -
doc/user/index.html.textile.liquid | 2 -
doc/user/reference/api-tokens.html.textile.liquid | 4 -
doc/user/reference/sdk-cli.html.textile.liquid | 3 -
doc/user/topics/keep.html.textile.liquid | 3 -
...nning-pipeline-command-line.html.textile.liquid | 86 +++++++++++++++
...rial-gatk-variantfiltration.html.textile.liquid | 2 -
.../topics/tutorial-job-debug.html.textile.liquid | 4 -
.../tutorial-job1.html.textile.liquid | 3 -
.../topics/tutorial-parallel.html.textile.liquid | 4 -
.../tutorial-trait-search.html.textile.liquid | 4 -
.../running-external-program.html.textile.liquid | 48 ++++-----
.../tutorial-firstscript.html.textile.liquid | 66 +++++++-----
.../tutorials/tutorial-keep.html.textile.liquid | 5 +-
.../tutorial-new-pipeline.html.textile.liquid | 109 ++------------------
...tutorial-pipeline-workbench.html.textile.liquid | 11 ++-
24 files changed, 186 insertions(+), 237 deletions(-)
create mode 100644 doc/user/topics/running-pipeline-command-line.html.textile.liquid
rename doc/user/{tutorials => topics}/tutorial-job1.html.textile.liquid (99%)
rename doc/user/{tutorials => topics}/tutorial-trait-search.html.textile.liquid (99%)
rename doc/user/{topics => tutorials}/running-external-program.html.textile.liquid (68%)
via 88b5b089f38a587292bb68772251b8275c0e27c7 (commit)
from 491f4f3023d0f45be94e5f7091da85094e887212 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
commit 88b5b089f38a587292bb68772251b8275c0e27c7
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date: Tue Mar 11 10:52:19 2014 -0400
Reorganizing documentation work in progress, checkpointing so that Brett can start looking through it.
diff --git a/doc/_config.yml b/doc/_config.yml
index 18b37ea..bc71282 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -23,14 +23,15 @@ navbar:
- Tutorials:
- user/tutorials/tutorial-keep.html.textile.liquid
- user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
- - user/tutorials/tutorial-job1.html.textile.liquid
- user/tutorials/tutorial-firstscript.html.textile.liquid
- user/tutorials/tutorial-new-pipeline.html.textile.liquid
- - user/tutorials/tutorial-trait-search.html.textile.liquid
+ - user/tutorials/running-external-program.html.textile.liquid
- Intermediate topics:
+ - user/topics/tutorial-job1.html.textile.liquid
+ - user/topics/running-pipeline-command-line.html.textile.liquid
- user/topics/tutorial-job-debug.html.textile.liquid
- - user/topics/running-external-program.html.textile.liquid
- user/topics/tutorial-parallel.html.textile.liquid
+ - user/topics/tutorial-trait-search.html.textile.liquid
- user/topics/tutorial-gatk-variantfiltration.html.textile.liquid
- user/topics/keep.html.textile.liquid
- Examples:
diff --git a/doc/_includes/_run_md5sum_py.liquid b/doc/_includes/_run_md5sum_py.liquid
index a770c86..16516a8 100644
--- a/doc/_includes/_run_md5sum_py.liquid
+++ b/doc/_includes/_run_md5sum_py.liquid
@@ -2,34 +2,17 @@
import arvados
-arvados.job_setup.one_task_per_input_file(if_sequence=0, and_end_task=True)
-this_task = arvados.current_task()
+# Automatically parallelize this job by running one task per file.
+arvados.job_setup.one_task_per_input_file(if_sequence=0, and_end_task=True, input_as_path=True)
-# Get the input collection for this task
-this_task_input = this_task['parameters']['input']
+# Get the input file for the task
+input_file = arvados.get_task_param_mount('input')
-# Create a CollectionReader to access the collection
-input_collection = arvados.CollectionReader(this_task_input)
+# Run the external 'md5sum' program on the input file
+stdoutdata, stderrdata = arvados.util.run_command(['md5sum', input_file])
-# Get the name of the first file in the collection
-input_file = list(input_collection.all_files())[0].name()
-
-# Extract the file to a temporary directory
-# Returns the directory that the file was written to
-input_dir = arvados.util.collection_extract(this_task_input,
- 'tmp',
- files=[input_file],
- decompress=False)
-
-# Run the external 'md5sum' program on the input file, with the current working
-# directory set to the location the input file was extracted to.
-stdoutdata, stderrdata = arvados.util.run_command(
- ['md5sum', input_file],
- cwd=input_dir)
-
-# Save the standard output (stdoutdata) "md5sum.txt" in the output collection
+# Save the standard output (stdoutdata) to "md5sum.txt" in the output collection
out = arvados.CollectionWriter()
out.set_current_file_name("md5sum.txt")
out.write(stdoutdata)
-
-this_task.set_output(out.finish())
+arvados.current_task().set_output(out.finish())
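
For readability, here is the updated run_md5sum script as it reads once this hunk is applied, assembled directly from the added lines above (nothing here goes beyond the patch itself):

    import arvados

    # Automatically parallelize this job by running one task per file.
    arvados.job_setup.one_task_per_input_file(if_sequence=0, and_end_task=True,
                                              input_as_path=True)

    # Get the input file for the task
    input_file = arvados.get_task_param_mount('input')

    # Run the external 'md5sum' program on the input file
    stdoutdata, stderrdata = arvados.util.run_command(['md5sum', input_file])

    # Save the standard output (stdoutdata) to "md5sum.txt" in the output collection
    out = arvados.CollectionWriter()
    out.set_current_file_name("md5sum.txt")
    out.write(stdoutdata)
    arvados.current_task().set_output(out.finish())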
diff --git a/doc/_includes/_tutorial_hash_script_py.liquid b/doc/_includes/_tutorial_hash_script_py.liquid
index 6462aab..0dcabae 100644
--- a/doc/_includes/_tutorial_hash_script_py.liquid
+++ b/doc/_includes/_tutorial_hash_script_py.liquid
@@ -10,11 +10,14 @@ import arvados # Import the Arvados sdk module
arvados.job_setup.one_task_per_input_file(if_sequence=0, and_end_task=True,
input_as_path=True)
-# Create the object that will actually compute the md5 hash
+# Create the message digest object that will compute the md5 hash
digestor = hashlib.new('md5')
-# Get the input file for the task and open it for reading
-with open(arvados.get_task_param_mount('input')) as f:
+# Get the input file for the task
+input_file = arvados.get_task_param_mount('input')
+
+# Open the input file for reading
+with open(input_file) as f:
while True:
buf = f.read(2**20) # read a 1 megabyte block from the file
if len(buf) == 0: # break when there is no more data left
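
For context, the opening section of the tutorial hash script reads roughly as follows after this hunk. Only the lines visible in the hunk are certain; the @import hashlib@ line and the @digestor.update(buf)@ call fall outside the hunk and are inferred from the surrounding code, so treat them as assumptions:

    import hashlib  # inferred: needed for hashlib.new('md5') below
    import arvados  # Import the Arvados sdk module

    # Automatically parallelize this job by running one task per file.
    arvados.job_setup.one_task_per_input_file(if_sequence=0, and_end_task=True,
                                              input_as_path=True)

    # Create the message digest object that will compute the md5 hash
    digestor = hashlib.new('md5')

    # Get the input file for the task
    input_file = arvados.get_task_param_mount('input')

    # Open the input file for reading
    with open(input_file) as f:
        while True:
            buf = f.read(2**20)   # read a 1 megabyte block from the file
            if len(buf) == 0:     # break when there is no more data left
                break
            digestor.update(buf)  # inferred: feed each block into the digest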
diff --git a/doc/_layouts/default.html.liquid b/doc/_layouts/default.html.liquid
index 1f42d80..f9c0db5 100644
--- a/doc/_layouts/default.html.liquid
+++ b/doc/_layouts/default.html.liquid
@@ -75,6 +75,7 @@
<div class="row">
{% include 'navbar_left' %}
<div class="col-sm-9">
+ <h1>{{ page.title }}</h1>
{{ content }}
</div>
</div>
diff --git a/doc/user/examples/crunch-examples.html.textile.liquid b/doc/user/examples/crunch-examples.html.textile.liquid
index b657a68..13bb1ae 100644
--- a/doc/user/examples/crunch-examples.html.textile.liquid
+++ b/doc/user/examples/crunch-examples.html.textile.liquid
@@ -1,13 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Examples
title: "Crunch examples"
-
...
-h1. Crunch examples
-
Several crunch scripts are included with Arvados in the "/crunch_scripts directory":https://arvados.org/projects/arvados/repository/revisions/master/show/crunch_scripts. They are intended to provide examples and starting points for writing your own scripts.
h4. bwa-aln
diff --git a/doc/user/getting_started/check-environment.html.textile.liquid b/doc/user/getting_started/check-environment.html.textile.liquid
index 6cf35f3..bb29373 100644
--- a/doc/user/getting_started/check-environment.html.textile.liquid
+++ b/doc/user/getting_started/check-environment.html.textile.liquid
@@ -1,13 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Getting Started
title: "Checking your environment"
-
...
-h1. Checking your environment
-
First you should "log into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login if you have not already done so.
If @arv user current@ is able to access the API server, it will print out information about your account. Check that you are able to access the Arvados API server using the following command:
diff --git a/doc/user/getting_started/community.html.textile.liquid b/doc/user/getting_started/community.html.textile.liquid
index c910ac1..8b6e22d 100644
--- a/doc/user/getting_started/community.html.textile.liquid
+++ b/doc/user/getting_started/community.html.textile.liquid
@@ -1,12 +1,8 @@
---
layout: default
navsection: userguide
-navmenu: Getting Started
title: Arvados Community and Getting Help
-
...
-h1. Arvados Community and Getting Help
-
h2. On the web
diff --git a/doc/user/getting_started/ssh-access.html.textile.liquid b/doc/user/getting_started/ssh-access.html.textile.liquid
index d08ad87..ddaf192 100644
--- a/doc/user/getting_started/ssh-access.html.textile.liquid
+++ b/doc/user/getting_started/ssh-access.html.textile.liquid
@@ -1,13 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Getting Started
title: Accessing an Arvados VM over ssh
-
...
-h1. Accessing an Arvados Virtual Machine over ssh
-
Arvados requires a public @ssh@ key in order to securely log in to an Arvados VM instance, or to access an Arvados @git@ repository.
This document is divided up into three sections.
diff --git a/doc/user/getting_started/workbench.html.textile.liquid b/doc/user/getting_started/workbench.html.textile.liquid
index 0dbb151..74cee77 100644
--- a/doc/user/getting_started/workbench.html.textile.liquid
+++ b/doc/user/getting_started/workbench.html.textile.liquid
@@ -1,11 +1,8 @@
---
layout: default
navsection: userguide
-navmenu: Getting Started
title: Accessing Arvados Workbench
-
...
-h1. Accessing Arvados Workbench
Access the Arvados beta test instance available using this link:
diff --git a/doc/user/index.html.textile.liquid b/doc/user/index.html.textile.liquid
index fd66764..950d7dc 100644
--- a/doc/user/index.html.textile.liquid
+++ b/doc/user/index.html.textile.liquid
@@ -4,8 +4,6 @@ navsection: userguide
title: Welcome to Arvados!
...
-h1. Welcome to Arvados!
-
This guide is intended to introduce new users to the Arvados system. It covers initial configuration required to access the system and then presents several tutorials on using Arvados to do data processing.
This user guide introduces how to use the major components of Arvados. These are:
diff --git a/doc/user/reference/api-tokens.html.textile.liquid b/doc/user/reference/api-tokens.html.textile.liquid
index d341c43..018c71c 100644
--- a/doc/user/reference/api-tokens.html.textile.liquid
+++ b/doc/user/reference/api-tokens.html.textile.liquid
@@ -1,13 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Reference
title: "Getting an API token"
-
...
-h1. Reference: Getting an API token
-
The Arvados API token is a secret key that enables the @arv@ command line client to access Arvados with the proper permissions.
Access the Arvados workbench using this link: "https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/
diff --git a/doc/user/reference/sdk-cli.html.textile.liquid b/doc/user/reference/sdk-cli.html.textile.liquid
index c795631..f44fef2 100644
--- a/doc/user/reference/sdk-cli.html.textile.liquid
+++ b/doc/user/reference/sdk-cli.html.textile.liquid
@@ -1,12 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Reference
title: "Command line interface"
...
-h1. Reference: Command Line Interface
-
*First, you should be "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
h3. Usage
diff --git a/doc/user/topics/keep.html.textile.liquid b/doc/user/topics/keep.html.textile.liquid
index f7c5926..4d169af 100644
--- a/doc/user/topics/keep.html.textile.liquid
+++ b/doc/user/topics/keep.html.textile.liquid
@@ -1,12 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Topics
title: "How Keep works"
...
-h1. Getting Data from Keep
-
In Keep, information is stored in *data blocks*. Data blocks are normally between 1 byte and 64 megabytes in size. If a file exceeds the maximum size of a single data block, the file will be split across multiple data blocks until the entire file can be stored. These data blocks may be stored and replicated across multiple disks, servers, or clusters. Each data block has its own identifier for the contents of that specific data block.
In order to reassemble the file, Keep stores a *collection* data block which lists in sequence the data blocks that make up the original file. A collection data block may store the information for multiple files, including a directory structure.
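
To make the collection concept concrete, here is a minimal sketch (not part of the patch) that lists the files in a collection using the @arvados.CollectionReader@ API that appears elsewhere in this diff; the locator is the tutorial collection used throughout these docs:

    import arvados

    # Open a collection by its Keep locator and list the files it contains.
    # Each file is reassembled from one or more data blocks, as described above.
    collection = arvados.CollectionReader('c1bad4b39ca5a924e481008009d94e32+210')
    for f in collection.all_files():
        print(f.name())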
diff --git a/doc/user/topics/running-pipeline-command-line.html.textile.liquid b/doc/user/topics/running-pipeline-command-line.html.textile.liquid
new file mode 100644
index 0000000..b8ee8ed
--- /dev/null
+++ b/doc/user/topics/running-pipeline-command-line.html.textile.liquid
@@ -0,0 +1,86 @@
+---
+layout: default
+navsection: userguide
+title: "Running a pipeline on the command line"
+...
+
+Run the pipeline using @arv pipeline run@, using the UUID that you received from @arv pipeline create@:
+
+<notextile>
+<pre><code>$ <span class="userinput">arv pipeline run --template qr1hi-p5p6p-xxxxxxxxxxxxxxx</span>
+2013-12-16 14:08:40 +0000 -- pipeline_instance qr1hi-d1hrv-vxzkp38nlde9yyr
+do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 queued 2013-12-16T14:08:40Z
+filter - -
+2013-12-16 14:08:51 +0000 -- pipeline_instance qr1hi-d1hrv-vxzkp38nlde9yyr
+do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 e2ccd204bca37c77c0ba59fc470cd0f7+162
+filter qr1hi-8i9sb-w5k40fztqgg9i2x queued 2013-12-16T14:08:50Z
+2013-12-16 14:09:01 +0000 -- pipeline_instance qr1hi-d1hrv-vxzkp38nlde9yyr
+do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 e2ccd204bca37c77c0ba59fc470cd0f7+162
+filter qr1hi-8i9sb-w5k40fztqgg9i2x 735ac35adf430126cf836547731f3af6+56
+</code></pre>
+</notextile>
+
+This instantiates your pipeline and displays a live feed of its status. The new pipeline instance will also show up on the Workbench %(rarr)→% Compute %(rarr)→% Pipeline instances page.
+
+Arvados adds each pipeline component to the job queue as its dependencies are satisfied (or immediately if it has no dependencies), and finishes when all components have completed or failed and there is no more work left to do.
+
+The Keep locators for the output of each of the @"do_hash"@ and @"filter"@ components are available from the output log shown above. The output is also available on the Workbench by navigating to %(rarr)→% Compute %(rarr)→% Pipeline instances %(rarr)→% pipeline uuid under the *id* column %(rarr)→% components.
+
+<notextile>
+<pre><code>$ <span class="userinput">arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162/md5sum.txt</span>
+0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
+504938460ef369cd275e4ef58994cffe bob.txt
+8f3b36aff310e06f3c5b9e95678ff77a carol.txt
+$ <span class="userinput">arv keep get 735ac35adf430126cf836547731f3af6+56</span>
+0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
+</code></pre>
+</notextile>
+
+Indeed, the filter has picked out just the "alice" file as having a hash that starts with 0.
+
+h3. Running a pipeline with different parameters
+
+Notice that the pipeline definition explicitly specifies the Keep locator for the input:
+
+<notextile>
+<pre><code>...
+ "do_hash":{
+ "script_parameters":{
+ "input": "887cd41e9c613463eab2f0d885c6dd96+83"
+ },
+ }
+...
+</code></pre>
+</notextile>
+
+What if we want to run the pipeline on a different input block? One option is to define a new pipeline template, but this could clutter the system with many pipeline templates defined for one-off jobs. Instead, you can override the component's input value on the command line, like this:
+
+<notextile>
+<pre><code>$ <span class="userinput">arv pipeline run --template qr1hi-d1hrv-vxzkp38nlde9yyr do_hash::input=33a9f3842b01ea3fdf27cc582f5ea2af+242</span>
+2013-12-17 20:31:24 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 queued 2013-12-17T20:31:24Z
+filter - -
+2013-12-17 20:31:34 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 {:done=>1, :running=>1, :failed=>0, :todo=>0}
+filter - -
+2013-12-17 20:31:44 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 {:done=>1, :running=>1, :failed=>0, :todo=>0}
+filter - -
+2013-12-17 20:31:55 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 880b55fb4470b148a447ff38cacdd952+54
+filter qr1hi-8i9sb-j347g1sqovdh0op queued 2013-12-17T20:31:55Z
+2013-12-17 20:32:05 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
+do_hash qr1hi-8i9sb-rffhuay4jryl2n2 880b55fb4470b148a447ff38cacdd952+54
+filter qr1hi-8i9sb-j347g1sqovdh0op fb728f0ffe152058fa64b9aeed344cb5+54
+</code></pre>
+</notextile>
+
+Now check the output:
+
+<notextile>
+<pre><code>$ <span class="userinput">arv keep ls -s fb728f0ffe152058fa64b9aeed344cb5+54</span>
+0 0-filter.txt
+</code></pre>
+</notextile>
+
+Here the filter script output is empty, so none of the files in the collection have a hash that starts with 0.
diff --git a/doc/user/topics/tutorial-gatk-variantfiltration.html.textile.liquid b/doc/user/topics/tutorial-gatk-variantfiltration.html.textile.liquid
index d01703e..4e9a1eb 100644
--- a/doc/user/topics/tutorial-gatk-variantfiltration.html.textile.liquid
+++ b/doc/user/topics/tutorial-gatk-variantfiltration.html.textile.liquid
@@ -4,8 +4,6 @@ navsection: userguide
title: "Using GATK with Arvados"
...
-h1. Using GATK with Arvados
-
This tutorial demonstrates how to use the Genome Analysis Toolkit (GATK) with Arvados. In this example we will install GATK and then create a VariantFiltration job to assign pass/fail scores to variants in a VCF file.
*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
diff --git a/doc/user/topics/tutorial-job-debug.html.textile.liquid b/doc/user/topics/tutorial-job-debug.html.textile.liquid
index 6d0a6e1..f07a7da 100644
--- a/doc/user/topics/tutorial-job-debug.html.textile.liquid
+++ b/doc/user/topics/tutorial-job-debug.html.textile.liquid
@@ -1,13 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Tutorials
title: "Debugging a Crunch script"
-
...
-h1. Debugging a Crunch script
-
To test changes to a script by running a job, the change must be pushed into @git@, the job queued asynchronously, and the actual execution may be run on any compute server. As a result, debugging a script can be difficult and time consuming. This tutorial demonstrates using @arv-crunch-job@ to run your job in your local VM. This avoids the job queue and allows you to execute the script from your uncommitted git tree.
*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
diff --git a/doc/user/tutorials/tutorial-job1.html.textile.liquid b/doc/user/topics/tutorial-job1.html.textile.liquid
similarity index 99%
rename from doc/user/tutorials/tutorial-job1.html.textile.liquid
rename to doc/user/topics/tutorial-job1.html.textile.liquid
index 463f86d..796f684 100644
--- a/doc/user/tutorials/tutorial-job1.html.textile.liquid
+++ b/doc/user/topics/tutorial-job1.html.textile.liquid
@@ -1,12 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Tutorials
title: "Running a Crunch job on the command line"
...
-h1. Running a Crunch job on the command line
-
This tutorial introduces the concepts and use of the Crunch job system using the @arv@ command line tool and Arvados Workbench.
*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
diff --git a/doc/user/topics/tutorial-parallel.html.textile.liquid b/doc/user/topics/tutorial-parallel.html.textile.liquid
index 23b7cfb..2e4cf78 100644
--- a/doc/user/topics/tutorial-parallel.html.textile.liquid
+++ b/doc/user/topics/tutorial-parallel.html.textile.liquid
@@ -1,13 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Tutorials
title: "Parallel Crunch tasks"
-
...
-h1. Parallel Crunch tasks
-
In the tutorial "writing a crunch script,":tutorial-firstscript.html our script used a "for" loop to compute the md5 hashes for each file in sequence. This approach, while simple, is not able to take advantage of the compute cluster with multiple nodes and cores to speed up computation by running tasks in parallel. This tutorial will demonstrate how to create parallel Crunch tasks.
Start by entering the @crunch_scripts@ directory of your git repository:
diff --git a/doc/user/tutorials/tutorial-trait-search.html.textile.liquid b/doc/user/topics/tutorial-trait-search.html.textile.liquid
similarity index 99%
rename from doc/user/tutorials/tutorial-trait-search.html.textile.liquid
rename to doc/user/topics/tutorial-trait-search.html.textile.liquid
index 6402c7e..001fbbc 100644
--- a/doc/user/tutorials/tutorial-trait-search.html.textile.liquid
+++ b/doc/user/topics/tutorial-trait-search.html.textile.liquid
@@ -1,13 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Tutorials
title: "Querying the Metadata Database"
-
...
-h1. Querying the Metadata Database
-
This tutorial introduces the Arvados Metadata Database. The Metadata Database stores information about files in Keep. This example will use the Python SDK to find public WGS (Whole Genome Sequencing) data for people who have reported a certain medical condition.
*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
diff --git a/doc/user/topics/running-external-program.html.textile.liquid b/doc/user/tutorials/running-external-program.html.textile.liquid
similarity index 68%
rename from doc/user/topics/running-external-program.html.textile.liquid
rename to doc/user/tutorials/running-external-program.html.textile.liquid
index be84962..be257f8 100644
--- a/doc/user/topics/running-external-program.html.textile.liquid
+++ b/doc/user/tutorials/running-external-program.html.textile.liquid
@@ -1,13 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Tutorials
-title: "Running external programs"
-
+title: "Using Crunch to run external programs"
...
-h1. Running external programs
-
This tutorial demonstrates how to use Crunch to run an external program by writing a wrapper using the Python SDK.
*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
@@ -45,27 +41,29 @@ Next, add the file to @git@ staging, commit and push:
You should now be able to run your new script using Crunch, with "script" referring to our new "run-md5sum.py" script.
<notextile>
-<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
-{
- "script": "run-md5sum.py",
- "script_version": "you:master",
- "script_parameters":
- {
- "input": "c1bad4b39ca5a924e481008009d94e32+210"
- }
-}
-EOF</span>
-~/<b>you</b>/crunch_scripts$ <span class="userinput">arv job create --job "$(cat the_job)"</span>
-{
- ...
- "uuid":"qr1hi-xxxxx-xxxxxxxxxxxxxxx"
- ...
-}
-~/<b>you</b>/crunch_scripts$ <span class="userinput">arv job get --uuid qr1hi-xxxxx-xxxxxxxxxxxxxxx</span>
+<pre><code>$ <span class="userinput">cat >the_pipeline <<EOF
{
- ...
- "output":"4d164b1658c261b9afc6b479130016a3+54",
- ...
+ "name":"Run external md5sum program",
+ "components":{
+ "do_hash":{
+ "script":"run-md5sum.py",
+ "script_parameters":{
+ "input":{
+ "required": true,
+ "dataclass": "Collection"
+ }
+ },
+ "script_version":"you:master"
+ }
+ }
}
+EOF
+</span></code></pre>
+</notextile>
+
+<notextile>
+<pre><code>$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</span>
</code></pre>
</notextile>
+
+Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":http://{{ site.arvados_workbench_host }}/pipeline_templates page. You can run the "pipeline using workbench":tutorial-pipeline-workbench.html
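
As an illustrative aside (not part of the patch), a short Python check can confirm that @the_pipeline@ written by the heredoc above is valid JSON and that every component names a script before it is handed to @arv pipeline_template create@:

    import json

    # Load the pipeline definition written by the heredoc above.
    with open('the_pipeline') as f:
        template = json.load(f)

    # Every component must at least name a script to run.
    for name, component in template['components'].items():
        assert 'script' in component, "component %s is missing a script" % name
        print(name, '->', component['script'])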
diff --git a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
index fb70c76..0582d53 100644
--- a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
@@ -2,11 +2,9 @@
layout: default
navsection: userguide
navmenu: Tutorials
-title: "Writing a Crunch script"
+title: "Writing a pipeline"
...
-h1. Writing a Crunch script
-
In this tutorial, we will write the "hash" script demonstrated in the first tutorial.
*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
@@ -95,36 +93,46 @@ To git at git.qr1hi.arvadosapi.com:you.git
* [new branch] master -> master</code></pre>
</notextile>
-You should now be able to run your script using Crunch, as described in "running a crunch job on the command line.":tutorial-job1.html The field @"script_version"@ should be <notextile><code><b>you</b>:master</code></notextile> to tell Crunch to run the script at the head of the "master" git branch, which you just pushed to the repository.
+h2. Create a pipeline template
+
+Next, create a file that contains the pipeline definition:
<notextile>
-<pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
-{
- "script": "hash.py",
- "script_version": "<b>you</b>:master",
- "script_parameters":
- {
- "input": "c1bad4b39ca5a924e481008009d94e32+210"
- }
-}
-EOF</span>
-~/<b>you</b>/crunch_scripts$ <span class="userinput">arv job create --job "$(cat ~/the_job)"</span>
-{
- ...
- "uuid":"qr1hi-xxxxx-xxxxxxxxxxxxxxx"
- ...
-}
-~/<b>you</b>/crunch_scripts$ <span class="userinput">arv job get --uuid qr1hi-xxxxx-xxxxxxxxxxxxxxx</span>
+<pre><code>$ <span class="userinput">cat >the_pipeline <<EOF
{
- ...
- "output":"880b55fb4470b148a447ff38cacdd952+54",
- ...
+ "name":"My first pipeline",
+ "components":{
+ "do_hash":{
+ "script":"hash.py",
+ "script_parameters":{
+ "input":{
+ "required": true,
+ "dataclass": "Collection"
+ }
+ },
+ "script_version":"you:master"
+ }
+ }
}
-~/<b>you</b>/crunch_scripts$ <span class="userinput">arv keep get 880b55fb4470b148a447ff38cacdd952+54/md5sum.txt</span>
-44b8ae3fde7a8a88d2f7ebd237625b4f var-GS000016015-ASM.tsv.bz2
-</code></pre>
+EOF
+</span></code></pre>
</notextile>
-<hr>
+* @cat@ is a standard Unix utility that simply copies standard input to standard output
+* @<<EOF@ tells the shell to direct the following lines into the standard input for @cat@ up until it sees the line @EOF@
+* @>the_pipeline@ redirects standard output to a file called @the_pipeline@
+* @"name"@ is a human-readable name for the pipeline
+* @"components"@ is a set of scripts that make up the pipeline
+* The component is listed with a human-readable name (@"do_hash"@ in this example)
+* @"script"@ specifies the name of the script to run. The script is searched for in the "crunch_scripts/" subdirectory of the @git@ checkout specified by @"script_version"@.
+* @"script_version"@ specifies the version of the script that you wish to run. This can be in the form of an explicit @git@ revision hash, or in the form "repository:branch" (in which case it will take the HEAD of the specified branch). Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run. You can access a list of available @git@ repositories on the Arvados workbench under "Compute %(rarr)→% Code repositories":http://{{site.arvados_workbench_host}}/repositiories .
+* @"script_parameters"@ describes the parameters for the script. In this example, there is one parameter called @input@ which is @required@ and is a @Collection at .
+
+Now, use @arv pipeline_template create@ to tell Arvados about your pipeline template:
+
+<notextile>
+<pre><code>$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</span>
+</code></pre>
+</notextile>
-Next, "writing a crunch pipeline.":tutorial-new-pipeline.html
+Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":http://{{ site.arvados_workbench_host }}/pipeline_templates page. You can run the "pipeline using workbench":tutorial-pipeline-workbench.html
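
As an alternative to the shell heredoc above (an illustrative sketch, not part of the patch), the same pipeline template can be built in Python and written to @the_pipeline@ before running @arv pipeline_template create@:

    import json

    # The same "My first pipeline" definition shown above, built as a Python dict.
    template = {
        "name": "My first pipeline",
        "components": {
            "do_hash": {
                "script": "hash.py",
                "script_version": "you:master",
                "script_parameters": {
                    "input": {"required": True, "dataclass": "Collection"}
                }
            }
        }
    }

    # Write it out for `arv pipeline_template create --pipeline-template "$(cat the_pipeline)"`.
    with open('the_pipeline', 'w') as f:
        json.dump(template, f, indent=2)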
diff --git a/doc/user/tutorials/tutorial-keep.html.textile.liquid b/doc/user/tutorials/tutorial-keep.html.textile.liquid
index f736bd0..9fbdb2a 100644
--- a/doc/user/tutorials/tutorial-keep.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-keep.html.textile.liquid
@@ -1,12 +1,9 @@
---
layout: default
navsection: userguide
-navmenu: Tutorials
title: "Storing and Retrieving data using Keep"
...
-h1. Storing and Retrieving data using Keep
-
This tutorial introduces you to the Arvados file storage system.
@@ -58,7 +55,7 @@ c1bad4b39ca5a924e481008009d94e32+210
The output value @c1bad4b39ca5a924e481008009d94e32+210@ from @arv keep put@ is the Keep locator. This enables you to access the file you just uploaded, and is explained in the next section.
-h2. Putting a directory
+h2(#dir). Putting a directory
You can also use @arv keep put@ to add an entire directory:
diff --git a/doc/user/tutorials/tutorial-new-pipeline.html.textile.liquid b/doc/user/tutorials/tutorial-new-pipeline.html.textile.liquid
index c881617..fe849a5 100644
--- a/doc/user/tutorials/tutorial-new-pipeline.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-new-pipeline.html.textile.liquid
@@ -1,19 +1,16 @@
---
layout: default
navsection: userguide
-navmenu: Tutorials
-title: "Writing a Crunch pipeline"
+title: "Writing a multi-step pipeline"
...
-h1. Writing a Crunch pipeline
-
A pipeline in Arvados is a collection of crunch scripts, in which the output from one script may be used as the input to another script.
*This tutorial assumes that you are "logged into an Arvados VM instance":{{site.baseurl}}/user/getting_started/ssh-access.html#login, and have a "working environment.":{{site.baseurl}}/user/getting_started/check-environment.html*
h2. Create a new script
-Our second script will filter the output of @parallel_hash.py@ and only include hashes that start with 0. Create a new script in @crunch_scripts/@ called @0-filter.py@:
+Our second script will filter the output of @hash.py@ and only include hashes that start with 0. Create a new script in @crunch_scripts/@ called @0-filter.py@:
<notextile> {% code '0_filter_py' as python %} </notextile>
@@ -33,12 +30,15 @@ Next, create a file that contains the pipeline definition:
<notextile>
<pre><code>$ <span class="userinput">cat >the_pipeline <<EOF
{
- "name":"my_first_pipeline",
+ "name":"Filter md5 hash values",
"components":{
"do_hash":{
- "script":"parallel-hash.py",
+ "script":"hash.py",
"script_parameters":{
- "input": "887cd41e9c613463eab2f0d885c6dd96+83"
+ "input":{
+ "required": true,
+ "dataclass": "Collection"
+ }
},
"script_version":"you:master"
},
@@ -54,104 +54,17 @@ Next, create a file that contains the pipeline definition:
}
}
EOF
-</code></pre>
+</span></code></pre>
</notextile>
-* @"name"@ is a human-readable name for the pipeline
-* @"components"@ is a set of scripts that make up the pipeline
-* Each component is listed with a human-readable name (@"do_hash"@ and @"filter"@ in this example)
-* Each item in @"components"@ is a single Arvados job, and uses the same format that we saw previously with @arv job create@
-* @"output_of"@ indicates that the @"input"@ of @"filter"@ is the @"output"@ of the @"do_hash"@ component. This is a _dependency_. Arvados uses the dependencies between jobs to automatically determine the correct order to run the jobs.
+* @"output_of"@ indicates that the @input@ of the @do_hash@ component is connected to the @output@ of @filter at . This is a _dependency_. Arvados uses the dependencies between jobs to automatically determine the correct order to run the jobs.
Now, use @arv pipeline_template create@ to tell Arvados about your pipeline template:
<notextile>
<pre><code>$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</span>
-qr1hi-p5p6p-xxxxxxxxxxxxxxx
-</code></pre>
-</notextile>
-
-Your new pipeline template will appear on the Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates page.
-
-h3. Running a pipeline
-
-Run the pipeline using @arv pipeline run@, using the UUID that you received from @arv pipeline create@:
-
-<notextile>
-<pre><code>$ <span class="userinput">arv pipeline run --template qr1hi-p5p6p-xxxxxxxxxxxxxxx</span>
-2013-12-16 14:08:40 +0000 -- pipeline_instance qr1hi-d1hrv-vxzkp38nlde9yyr
-do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 queued 2013-12-16T14:08:40Z
-filter - -
-2013-12-16 14:08:51 +0000 -- pipeline_instance qr1hi-d1hrv-vxzkp38nlde9yyr
-do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 e2ccd204bca37c77c0ba59fc470cd0f7+162
-filter qr1hi-8i9sb-w5k40fztqgg9i2x queued 2013-12-16T14:08:50Z
-2013-12-16 14:09:01 +0000 -- pipeline_instance qr1hi-d1hrv-vxzkp38nlde9yyr
-do_hash qr1hi-8i9sb-hoyc2u964ecv1s6 e2ccd204bca37c77c0ba59fc470cd0f7+162
-filter qr1hi-8i9sb-w5k40fztqgg9i2x 735ac35adf430126cf836547731f3af6+56
-</code></pre>
-</notextile>
-
-This instantiates your pipeline and displays a live feed of its status. The new pipeline instance will also show up on the Workbench %(rarr)→% Compute %(rarr)→% Pipeline instances page.
-
-Arvados adds each pipeline component to the job queue as its dependencies are satisfied (or immediately if it has no dependencies) and finishes when all components are completed or failed and there is no more work left to do.
-
-The Keep locators of the output of each of @"do_hash"@ and @"filter"@ component are available from the output log shown above. The output is also available on the Workbench by navigating to %(rarr)→% Compute %(rarr)→% Pipeline instances %(rarr)→% pipeline uuid under the *id* column %(rarr)→% components.
-
-<notextile>
-<pre><code>$ <span class="userinput">arv keep get e2ccd204bca37c77c0ba59fc470cd0f7+162/md5sum.txt</span>
-0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
-504938460ef369cd275e4ef58994cffe bob.txt
-8f3b36aff310e06f3c5b9e95678ff77a carol.txt
-$ <span class="userinput">arv keep get 735ac35adf430126cf836547731f3af6+56</span>
-0f1d6bcf55c34bed7f92a805d2d89bbf alice.txt
</code></pre>
</notextile>
-Indeed, the filter has picked out just the "alice" file as having a hash that starts with 0.
-
-h3. Running a pipeline with different parameters
-
-Notice that the pipeline definition explicitly specifies the Keep locator for the input:
-
-<notextile>
-<pre><code>...
- "do_hash":{
- "script_parameters":{
- "input": "887cd41e9c613463eab2f0d885c6dd96+83"
- },
- }
-...
-</code></pre>
-</notextile>
-
-What if we want to run the pipeline on a different input block? One option is to define a new pipeline template, but would potentially result in clutter with many pipeline templates defined for one-off jobs. Instead, you can override values in the input of the component like this:
-
-<notextile>
-<pre><code>$ <span class="userinput">arv pipeline run --template qr1hi-d1hrv-vxzkp38nlde9yyr do_hash::input=33a9f3842b01ea3fdf27cc582f5ea2af+242</span>
-2013-12-17 20:31:24 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
-do_hash qr1hi-8i9sb-rffhuay4jryl2n2 queued 2013-12-17T20:31:24Z
-filter - -
-2013-12-17 20:31:34 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
-do_hash qr1hi-8i9sb-rffhuay4jryl2n2 {:done=>1, :running=>1, :failed=>0, :todo=>0}
-filter - -
-2013-12-17 20:31:44 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
-do_hash qr1hi-8i9sb-rffhuay4jryl2n2 {:done=>1, :running=>1, :failed=>0, :todo=>0}
-filter - -
-2013-12-17 20:31:55 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
-do_hash qr1hi-8i9sb-rffhuay4jryl2n2 880b55fb4470b148a447ff38cacdd952+54
-filter qr1hi-8i9sb-j347g1sqovdh0op queued 2013-12-17T20:31:55Z
-2013-12-17 20:32:05 +0000 -- pipeline_instance qr1hi-d1hrv-tlkq20687akys8e
-do_hash qr1hi-8i9sb-rffhuay4jryl2n2 880b55fb4470b148a447ff38cacdd952+54
-filter qr1hi-8i9sb-j347g1sqovdh0op fb728f0ffe152058fa64b9aeed344cb5+54
-</code></pre>
-</notextile>
-
-Now check the output:
-
-<notextile>
-<pre><code>$ <span class="userinput">arv keep ls -s fb728f0ffe152058fa64b9aeed344cb5+54</span>
-0 0-filter.txt
-</code></pre>
-</notextile>
+Your new pipeline template will appear on the "Workbench %(rarr)→% Compute %(rarr)→% Pipeline templates":http://{{ site.arvados_workbench_host }}/pipeline_templates page.
-Here the filter script output is empty, so none of the files in the collection have hash code that start with 0.
diff --git a/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid b/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
index da968c2..fbe6786 100644
--- a/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
@@ -1,8 +1,15 @@
---
layout: default
navsection: userguide
-navmenu: Tutorials
title: "Running a pipeline using Workbench"
...
-h1. Running a pipeline using Workbench
+# Go to "Collections":http://{{ site.arvados_workbench_host }}/collections
+# Go to the search box <span class="glyphicon glyphicon-search"></span> and search for "tutorial".
+# This should yield a collection with the contents "var-GS000016015-ASM.tsv.bz2"
+# Click on the check box to the left of "var-GS000016015-ASM.tsv.bz2". This puts the collection in your persistent selection list. Click on the paperclip <span class="glyphicon glyphicon-paperclip"></span> in the upper right to see your selections.
+# Click on "Pipeline templates":http://{{ site.arvados_workbench_host }}/pipeline_templates
+# Look for a pipeline named "Tutorial pipeline"
+# Click on the play button <span class="glyphicon glyphicon-play"></span> on the left. This will take you to the new pipeline page.
+# Next to "input" click on "none" to set the input value. At the top of the list will be the collection that you selected in step 4.
+# You can now click on "Run pipeline" in the upper right to start the pipeline.
-----------------------------------------------------------------------
hooks/post-receive
--