[ARVADOS] updated: 491f4f3023d0f45be94e5f7091da85094e887212

git at public.curoverse.com git at public.curoverse.com
Mon Mar 10 17:00:27 EDT 2014


Summary of changes:
 doc/_config.yml                                    |   11 ++-
 doc/_includes/_tutorial_hash_script_py.liquid      |   71 +++++----------
 .../getting_started/ssh-access.html.textile.liquid |    2 +-
 .../getting_started/workbench.html.textile.liquid  |    2 +-
 doc/user/index.html.textile.liquid                 |    5 +-
 doc/user/reference/api-tokens.html.textile.liquid  |    2 +-
 doc/user/topics/keep.html.textile.liquid           |   40 ++++++++
 .../running-external-program.html.textile.liquid   |    0
 ...rial-gatk-variantfiltration.html.textile.liquid |    2 -
 .../tutorial-job-debug.html.textile.liquid         |    1 -
 .../tutorial-parallel.html.textile.liquid          |    0
 .../tutorial-firstscript.html.textile.liquid       |   13 ++--
 .../tutorials/tutorial-job1.html.textile.liquid    |   21 +++--
 .../tutorials/tutorial-keep.html.textile.liquid    |   94 +++++++++++++-------
 .../tutorial-new-pipeline.html.textile.liquid      |    5 +-
 ...tutorial-pipeline-workbench.html.textile.liquid |    8 ++
 16 files changed, 170 insertions(+), 107 deletions(-)
 create mode 100644 doc/user/topics/keep.html.textile.liquid
 rename doc/user/{tutorials => topics}/running-external-program.html.textile.liquid (100%)
 rename doc/user/{tutorials => topics}/tutorial-gatk-variantfiltration.html.textile.liquid (99%)
 rename doc/user/{tutorials => topics}/tutorial-job-debug.html.textile.liquid (99%)
 rename doc/user/{tutorials => topics}/tutorial-parallel.html.textile.liquid (100%)
 create mode 100644 doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid

       via  491f4f3023d0f45be94e5f7091da85094e887212 (commit)
       via  7791c7e1b09341ce1fed131c6b11c91da8217c3f (commit)
      from  cc414357e4eda3ed2060307f5b821385ae3d74bb (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 491f4f3023d0f45be94e5f7091da85094e887212
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date:   Mon Mar 10 17:00:49 2014 -0400

    Removed out of order link to next tutorial.

diff --git a/doc/user/topics/tutorial-job-debug.html.textile.liquid b/doc/user/topics/tutorial-job-debug.html.textile.liquid
index 2805208..6d0a6e1 100644
--- a/doc/user/topics/tutorial-job-debug.html.textile.liquid
+++ b/doc/user/topics/tutorial-job-debug.html.textile.liquid
@@ -153,4 +153,3 @@ Read and write data to @/tmp/@ instead of Keep. This only works with the Python
 
 notextile. <pre><code>~$ <span class="userinput">export KEEP_LOCAL_STORE=/tmp</span></code></pre>
 
-Next, "parallel tasks.":tutorial-parallel.html

commit 7791c7e1b09341ce1fed131c6b11c91da8217c3f
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date:   Mon Mar 10 17:00:20 2014 -0400

    Updating and reorganizing tutorials based on new features and feedback.

diff --git a/doc/_config.yml b/doc/_config.yml
index 784ac61..18b37ea 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -22,14 +22,17 @@ navbar:
       - user/getting_started/community.html.textile.liquid
     - Tutorials:
       - user/tutorials/tutorial-keep.html.textile.liquid
+      - user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
       - user/tutorials/tutorial-job1.html.textile.liquid
       - user/tutorials/tutorial-firstscript.html.textile.liquid
-      - user/tutorials/tutorial-job-debug.html.textile.liquid
-      - user/tutorials/tutorial-parallel.html.textile.liquid
       - user/tutorials/tutorial-new-pipeline.html.textile.liquid
       - user/tutorials/tutorial-trait-search.html.textile.liquid
-      - user/tutorials/tutorial-gatk-variantfiltration.html.textile.liquid
-      - user/tutorials/running-external-program.html.textile.liquid
+    - Intermediate topics:
+      - user/topics/tutorial-job-debug.html.textile.liquid
+      - user/topics/running-external-program.html.textile.liquid
+      - user/topics/tutorial-parallel.html.textile.liquid
+      - user/topics/tutorial-gatk-variantfiltration.html.textile.liquid
+      - user/topics/keep.html.textile.liquid
     - Examples:
       - user/examples/crunch-examples.html.textile.liquid
     - Reference:
diff --git a/doc/_includes/_tutorial_hash_script_py.liquid b/doc/_includes/_tutorial_hash_script_py.liquid
index f9b2ec0..6462aab 100644
--- a/doc/_includes/_tutorial_hash_script_py.liquid
+++ b/doc/_includes/_tutorial_hash_script_py.liquid
@@ -1,63 +1,42 @@
 #!/usr/bin/env python
 
-# Import the hashlib module (part of the Python standard library) to compute md5.
-import hashlib
+import hashlib      # Import the hashlib module to compute md5.
+import arvados      # Import the Arvados sdk module
 
-# Import the Arvados sdk module
-import arvados
+# Automatically parallelize this job by running one task per file.
+# This means that if the input consists of many files, each file will
+# be processed in parallel on different nodes enabling the job to 
+# be completed quicker.
+arvados.job_setup.one_task_per_input_file(if_sequence=0, and_end_task=True, 
+                                          input_as_path=True)
 
-# Get information about the task from the environment
-this_task = arvados.current_task()
-
-# Get the "input" field from "script_parameters" on the job creation object
-this_job_input = arvados.getjobparam('input')
-
-# Create the object access to the collection referred to in the input
-collection = arvados.CollectionReader(this_job_input)
-
-# Create an object to write a new collection as output
-out = arvados.CollectionWriter()
-
-# Set the name of output file within the collection
-out.set_current_file_name("md5sum.txt")
-
-# Get an iterator over the files listed in the collection
-all_files = collection.all_files()
-
-# Iterate over each file
-for input_file in all_files:
-    # Create the object that will actually compute the md5 hash
-    digestor = hashlib.new('md5')
+# Create the object that will actually compute the md5 hash
+digestor = hashlib.new('md5')
 
+# Get the input file for the task and open it for reading
+with open(arvados.get_task_param_mount('input')) as f:
     while True:
-        # read a 1 megabyte block from the file
-        buf = input_file.read(2**20)
-
-        # break when there is no more data left
-        if len(buf) == 0:
+        buf = f.read(2**20)      # read a 1 megabyte block from the file
+        if len(buf) == 0:        # break when there is no more data left
             break
+        digestor.update(buf)     # update the md5 hash object
 
-        # update the md5 hash object
-        digestor.update(buf)
-
-    # Get the final hash code
-    hexdigest = digestor.hexdigest()
+# Get object representing the current task
+this_task = arvados.current_task()
 
-    # Get the file name from the StreamFileReader object
-    file_name = input_file.name()
+ # Write a new collection as output
+out = arvados.CollectionWriter()
 
-    # The "stream name" is the subdirectory inside the collection in which
-    # the file is located; '.' is the root of the collection.
-    if input_file.stream_name() != '.':
-        file_name = os.join(input_file.stream_name(), file_name)
+ # Set output file within the collection
+out.set_current_file_name("md5sum.txt")
 
-    # Write an output line with the md5 value and file name.
-    out.write("%s %s\n" % (hexdigest, file_name))
+# Write an output line with the md5 value and input
+out.write("%s %s\n" % (digestor.hexdigest(), this_task['parameters']['input']))
 
-# Commit the output to keep.  This returns a Keep id.
+ # Commit the output to keep.  This returns a Keep id.
 output_id = out.finish()
 
 # Set the output for this task to the Keep id
-this_task.set_output(output_id)
+this_task.set_output(output_id) 
 
 # Done!
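As a standalone illustration of the block-wise hashing loop in the revised script above, here is a minimal sketch that runs without the Arvados SDK. The helper name md5_of_stream and the in-memory test stream are assumptions made for illustration and are not part of this commit; the stream is assumed to be opened in binary mode.

```python
import hashlib
import io


def md5_of_stream(stream, block_size=2**20):
    """Return the hex md5 digest of a binary stream, read block by block."""
    digestor = hashlib.new('md5')
    while True:
        buf = stream.read(block_size)   # read up to block_size bytes (1 MiB by default)
        if len(buf) == 0:               # an empty read means there is no more data
            break
        digestor.update(buf)            # feed the block into the md5 hash object
    return digestor.hexdigest()


if __name__ == "__main__":
    # Hypothetical input: any binary file-like object works here.
    print(md5_of_stream(io.BytesIO(b"hello world")))
```

Reading in fixed-size blocks keeps memory use bounded regardless of file size, which is presumably why the tutorial script reads 2**20 bytes at a time instead of calling read() once.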
diff --git a/doc/user/getting_started/ssh-access.html.textile.liquid b/doc/user/getting_started/ssh-access.html.textile.liquid
index 3c40315..d08ad87 100644
--- a/doc/user/getting_started/ssh-access.html.textile.liquid
+++ b/doc/user/getting_started/ssh-access.html.textile.liquid
@@ -133,7 +133,7 @@ h1(#workbench). Adding your key to Arvados Workbench
 
 h3. From the workbench dashboard
 
-If you have no @ssh@ keys registered, there should be a notification asking you to provide your @ssh@ public key.  On the Workbench dashboard (in this guide, this is "https://workbench.{{ site.arvados_api_host }}/":https://workbench.{{ site.arvados_api_host }}/ ), look for the envelope icon <span class="glyphicon glyphicon-envelope"></span> <span class="badge badge-alert">1</span> in upper right corner (the number indicates there are new notifications).  Click on this icon and a dropdown menu should appear with a message asking you to add your public key.  Paste your public key into the text area provided and click on the check button to submit the key.  You are now ready to "log into an Arvados VM":#login.
+If you have no @ssh@ keys registered, there should be a notification asking you to provide your @ssh@ public key.  On the Workbench dashboard (in this guide, this is "https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/ ), look for the envelope icon <span class="glyphicon glyphicon-envelope"></span> <span class="badge badge-alert">1</span> in upper right corner (the number indicates there are new notifications).  Click on this icon and a dropdown menu should appear with a message asking you to add your public key.  Paste your public key into the text area provided and click on the check button to submit the key.  You are now ready to "log into an Arvados VM":#login.
 
 h3. Alternate way to add ssh keys
 
diff --git a/doc/user/getting_started/workbench.html.textile.liquid b/doc/user/getting_started/workbench.html.textile.liquid
index 71041b3..0dbb151 100644
--- a/doc/user/getting_started/workbench.html.textile.liquid
+++ b/doc/user/getting_started/workbench.html.textile.liquid
@@ -9,7 +9,7 @@ h1. Accessing Arvados Workbench
 
 Access the Arvados beta test instance available using this link:
 
-"https://workbench.{{ site.arvados_api_host }}/":https://workbench.{{ site.arvados_api_host }}/
+"https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/
 
 If you are accessing Arvados for the first time, you will be asked to log in using a Google account.  Arvados uses only your name and email address from Google services for identification, and will never access any personal information.  Once you are logged in, the Workbench page may indicate your account status is *New / inactive*.  If this is the case, contact the administrator of the Arvados instance to activate your account.
 
diff --git a/doc/user/index.html.textile.liquid b/doc/user/index.html.textile.liquid
index 48d3f86..fd66764 100644
--- a/doc/user/index.html.textile.liquid
+++ b/doc/user/index.html.textile.liquid
@@ -2,12 +2,11 @@
 layout: default
 navsection: userguide
 title: Welcome to Arvados!
-
 ...
 
 h1. Welcome to Arvados!
 
-This guide is intended to introduce new users to the Arvados system.  It covers initial configuration required to use the system and then presents several tutorials on using Arvados to do data processing.
+This guide is intended to introduce new users to the Arvados system.  It covers initial configuration required to access the system and then presents several tutorials on using Arvados to do data processing.
 
 This user guide introduces how to use the major components of Arvados.  These are:
 
@@ -26,6 +25,8 @@ To get the most value out of this guide, you should be comfortable with the foll
 # Programming in @python@
 # Revision control using @git@
 
+We also recommend you read the "Arvados Platform Overview":https://arvados.org/projects/arvados/wiki#Platform-Overview for an introduction and background information about Arvados.
+
 The examples in this guide use the Arvados instance located at "https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/ .  If you are using a different Arvados instance, replace @{{ site.arvados_workbench_host }}@ with your private instance in all of the examples in this guide.
 
 The Arvados public beta instance is located at "https://workbench.qr1hi.arvadosapi.com/":https://workbench.qr1hi.arvadosapi.com/ .  You must have an account in order to use this service.  If you would like to request an account, please send an email to "arvados at curoverse.com":mailto:arvados at curoverse.com .
diff --git a/doc/user/reference/api-tokens.html.textile.liquid b/doc/user/reference/api-tokens.html.textile.liquid
index d47c1cc..d341c43 100644
--- a/doc/user/reference/api-tokens.html.textile.liquid
+++ b/doc/user/reference/api-tokens.html.textile.liquid
@@ -10,7 +10,7 @@ h1. Reference: Getting an API token
 
 The Arvados API token is a secret key that enables the @arv@ command line client to access Arvados with the proper permissions.
 
-Access the Arvados workbench using this link: "https://workbench.{{ site.arvados_api_host }}/":https://workbench.{{ site.arvados_api_host }}/
+Access the Arvados workbench using this link: "https://{{ site.arvados_workbench_host }}/":https://{{ site.arvados_workbench_host }}/
 
 (Replace @{{ site.arvados_api_host }}@ with the hostname of your local Arvados instance if necessary.)
 
diff --git a/doc/user/topics/keep.html.textile.liquid b/doc/user/topics/keep.html.textile.liquid
new file mode 100644
index 0000000..f7c5926
--- /dev/null
+++ b/doc/user/topics/keep.html.textile.liquid
@@ -0,0 +1,40 @@
+---
+layout: default
+navsection: userguide
+navmenu: Topics
+title: "How Keep works"
+...
+
+h1. Getting Data from Keep
+
+In Keep, information is stored in *data blocks*.  Data blocks are normally between 1 byte and 64 megabytes in size.  If a file exceeds the maximum size of a single data block, the file will be split across multiple data blocks until the entire file can be stored.  These data blocks may be stored and replicated across multiple disks, servers, or clusters.  Each data block has its own identifier for the contents of that specific data block.
+
+In order to reassemble the file, Keep stores a *collection* data block which lists in sequence the data blocks that make up the original file.  A collection data block may store the information for multiple files, including a directory structure.
+
+In this example we will use @c1bad4b39ca5a924e481008009d94e32+210@ which we added to keep in the previous section.  First let us examine the contents of this collection using @arv keep get@:
+
+<notextile>
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep get c1bad4b39ca5a924e481008009d94e32+210</span>
+. 204e43b8a1185621ca55a94839582e6f+67108864 b9677abbac956bd3e86b1deb28dfac03+67108864 fc15aff2a762b13f521baf042140acec+67108864 323d2a3ce20370c4ca1d3462a344f8fd+25885655 0:227212247:var-GS000016015-ASM.tsv.bz2
+</code></pre>
+</notextile>
+
+The command @arv keep get@ fetches the contents of the locator @c1bad4b39ca5a924e481008009d94e32+210@.  This is a locator for a collection data block, so it fetches the contents of the collection.  In this example, this collection consists of a single file @var-GS000016015-ASM.tsv.bz2@ which is 227212247 bytes long, and is stored using four sequential data blocks, <code>204e43b8a1185621ca55a94839582e6f+67108864</code>, <code>b9677abbac956bd3e86b1deb28dfac03+67108864</code>, <code>fc15aff2a762b13f521baf042140acec+67108864</code>, <code>323d2a3ce20370c4ca1d3462a344f8fd+25885655</code>.
+
+Let's use @arv keep get@ to download the first datablock:
+
+notextile. <pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep get 204e43b8a1185621ca55a94839582e6f+67108864 > block1</span></code></pre>
+
+Let's look at the size and compute the md5 hash of @block1@:
+
+<notextile>
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">ls -l block1</span>
+-rw-r--r-- 1 you group 67108864 Dec  9 20:14 block1
+/scratch/<b>you</b>$ <span class="userinput">md5sum block1</span>
+204e43b8a1185621ca55a94839582e6f  block1
+</code></pre>
+</notextile>
+
+Notice that the block identifier <code>204e43b8a1185621ca55a94839582e6f+67108864</code> consists of:
+* the md5 hash @204e43b8a1185621ca55a94839582e6f@ which matches the md5 hash of @block1@
+* a size hint @67108864@ which matches the size of @block1@
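The locator and manifest structure described in the new keep.html page above can be sketched in a few lines of Python. This is a rough illustration only: the function names are hypothetical, and the parser assumes the simple layout shown in the example (a stream name, then block locators, then position:size:filename tokens with no spaces in file names). It is not the Arvados SDK's own parser.

```python
import hashlib


def block_locator(data):
    """Build a locator of the form <md5 hex digest>+<size in bytes> for one block."""
    return "%s+%d" % (hashlib.md5(data).hexdigest(), len(data))


def parse_manifest_line(line):
    """Split one manifest line into (stream_name, block_locators, files).

    Assumes the simple layout from the example output: tokens containing
    ':' are position:size:filename entries, the rest are block locators.
    """
    tokens = line.split()
    stream_name = tokens[0]
    blocks = [t for t in tokens[1:] if ":" not in t]
    files = []
    for t in tokens[1:]:
        if ":" in t:
            position, size, name = t.split(":", 2)
            files.append((int(position), int(size), name))
    return stream_name, blocks, files


def total_block_size(blocks):
    """Sum the size hints (the number after the first '+') of the locators."""
    return sum(int(b.split("+")[1]) for b in blocks)


if __name__ == "__main__":
    manifest = (". 204e43b8a1185621ca55a94839582e6f+67108864 "
                "b9677abbac956bd3e86b1deb28dfac03+67108864 "
                "fc15aff2a762b13f521baf042140acec+67108864 "
                "323d2a3ce20370c4ca1d3462a344f8fd+25885655 "
                "0:227212247:var-GS000016015-ASM.tsv.bz2")
    stream, blocks, files = parse_manifest_line(manifest)
    # The four size hints add up to the file size recorded in the manifest.
    print(total_block_size(blocks) == files[0][1])
```

Applied to the manifest line from the example, the three 67108864-byte blocks plus the final 25885655-byte block add up to the 227212247 bytes listed for the file.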
diff --git a/doc/user/tutorials/running-external-program.html.textile.liquid b/doc/user/topics/running-external-program.html.textile.liquid
similarity index 100%
rename from doc/user/tutorials/running-external-program.html.textile.liquid
rename to doc/user/topics/running-external-program.html.textile.liquid
diff --git a/doc/user/tutorials/tutorial-gatk-variantfiltration.html.textile.liquid b/doc/user/topics/tutorial-gatk-variantfiltration.html.textile.liquid
similarity index 99%
rename from doc/user/tutorials/tutorial-gatk-variantfiltration.html.textile.liquid
rename to doc/user/topics/tutorial-gatk-variantfiltration.html.textile.liquid
index 3bf05a5..d01703e 100644
--- a/doc/user/tutorials/tutorial-gatk-variantfiltration.html.textile.liquid
+++ b/doc/user/topics/tutorial-gatk-variantfiltration.html.textile.liquid
@@ -1,9 +1,7 @@
 ---
 layout: default
 navsection: userguide
-navmenu: Tutorials
 title: "Using GATK with Arvados"
-
 ...
 
 h1. Using GATK with Arvados
diff --git a/doc/user/tutorials/tutorial-job-debug.html.textile.liquid b/doc/user/topics/tutorial-job-debug.html.textile.liquid
similarity index 100%
rename from doc/user/tutorials/tutorial-job-debug.html.textile.liquid
rename to doc/user/topics/tutorial-job-debug.html.textile.liquid
diff --git a/doc/user/tutorials/tutorial-parallel.html.textile.liquid b/doc/user/topics/tutorial-parallel.html.textile.liquid
similarity index 100%
rename from doc/user/tutorials/tutorial-parallel.html.textile.liquid
rename to doc/user/topics/tutorial-parallel.html.textile.liquid
diff --git a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
index 254d9db..fb70c76 100644
--- a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
@@ -3,7 +3,6 @@ layout: default
 navsection: userguide
 navmenu: Tutorials
 title: "Writing a Crunch script"
-
 ...
 
 h1. Writing a Crunch script
@@ -25,7 +24,7 @@ First, you should do some basic configuration for git (you only need to do this
 ~$ <span class="userinput">git config --global user.email <b>you</b>@example.com</span></code></pre>
 </notextile>
 
-On the Arvados Workbench, navigate to _Compute %(rarr)→% Code repositories._  You should see two repositories, one named "arvados" (under the *name* column) and a second with your user name.  Next to *name* is the column *push_url*.  Copy the *push_url* cell associated with your repository.  This should look like <notextile><code>git at git.{{ site.arvados_api_host }}:<b>you</b>.git</code></notextile>.
+On the Arvados Workbench, navigate to "Compute %(rarr)→% Code repositories":http://{{site.arvados_workbench_host}}/repositories .  You should see a repository with your user name listed in the *name* column.  Next to *name* is the column *push_url*.  Copy the *push_url* value associated with your repository.  This should look like <notextile><code>git at git.{{ site.arvados_api_host }}:<b>you</b>.git</code></notextile>.
 
 Next, on the Arvados virtual machine, clone your git repository:
 
@@ -67,7 +66,7 @@ Make the file executable:
 notextile. <pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">chmod +x hash.py</span></code></pre>
 
 {% include 'notebox_begin' %}
-The below steps describe how to execute the script after committing changes to git. To test the script locally, please see the "debugging a crunch script":tutorial-job-debug.html page.
+The steps below describe how to execute the script after committing changes to git. To test the script locally, please see the "debugging a crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html page.
 
 {% include 'notebox_end' %}
 
@@ -96,13 +95,13 @@ To git at git.qr1hi.arvadosapi.com:you.git
  * [new branch]      master -> master</code></pre>
 </notextile>
 
-You should now be able to run your script using Crunch, similar to how we did it in the "first tutorial.":tutorial-job1.html  The field @"script_version"@ should be @you:master@ to tell Crunch to run the script at the head of the "master" git branch, which you just uploaded.
+You should now be able to run your script using Crunch, as described in "running a crunch job on the command line.":tutorial-job1.html  The field @"script_version"@ should be <notextile><code><b>you</b>:master</code></notextile> to tell Crunch to run the script at the head of the "master" git branch, which you just pushed to the repository.
 
 <notextile>
 <pre><code>~/<b>you</b>/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
 {
  "script": "hash.py",
- "script_version": "you:master",
+ "script_version": "<b>you</b>:master",
  "script_parameters":
  {
   "input": "c1bad4b39ca5a924e481008009d94e32+210"
@@ -126,4 +125,6 @@ EOF</span>
 </code></pre>
 </notextile>
 
-Next, "debugging a crunch script.":tutorial-job-debug.html
+<hr>
+
+Next, "writing a crunch pipeline.":tutorial-new-pipeline.html
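The job description used in the tutorial above is plain JSON, so it can also be generated instead of typed into a here-document. The sketch below is an assumption-laden illustration: make_job_request is a hypothetical helper, and only the field names ("script", "script_version", "script_parameters", "input") come from the tutorial text.

```python
import json


def make_job_request(script, repository, branch, input_locator):
    """Return a JSON job description like the one written to ~/the_job."""
    job = {
        "script": script,
        # "repository:branch" selects the HEAD of that branch.
        "script_version": "%s:%s" % (repository, branch),
        "script_parameters": {"input": input_locator},
    }
    return json.dumps(job, indent=1)


if __name__ == "__main__":
    # Replace "you" with your own repository name.
    print(make_job_request("hash.py", "you", "master",
                           "c1bad4b39ca5a924e481008009d94e32+210"))
```

The resulting text could be saved to a file and submitted in the same way as the hand-written job description.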
diff --git a/doc/user/tutorials/tutorial-job1.html.textile.liquid b/doc/user/tutorials/tutorial-job1.html.textile.liquid
index a0dd896..463f86d 100644
--- a/doc/user/tutorials/tutorial-job1.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-job1.html.textile.liquid
@@ -2,11 +2,10 @@
 layout: default
 navsection: userguide
 navmenu: Tutorials
-title: "Running a Crunch job"
-
+title: "Running a Crunch job on the command line"
 ...
 
-h1. Running a crunch job
+h1. Running a Crunch job on the command line
 
 This tutorial introduces the concepts and use of the Crunch job system using the @arv@ command line tool and Arvados Workbench.
 
@@ -26,7 +25,11 @@ The Arvados "Crunch" framework is designed to support processing very large data
 
 For your first job, you will run the "hash" crunch script using the Arvados system.  The "hash" script computes the md5 hash of each file in a collection.
 
-Crunch jobs are described using JSON objects.  For example:
+h2. Jobs
+
+A "job" is a single run of a specific version of a crunch script with a specific input.
+
+A request to run a crunch job is described using a JSON object.  For example:
 
 <notextile>
 <pre><code>~$ <span class="userinput">cat >the_job <<EOF
@@ -46,7 +49,7 @@ EOF
 * @<<EOF@ tells the shell to direct the following lines into the standard input for @cat@ up until it sees the line @EOF@
 * @>the_job@ redirects standard output to a file called @the_job@
 * @"script"@ specifies the name of the script to run.  The script is searched for in the "crunch_scripts/" subdirectory of the @git@ checkout specified by @"script_version"@.
-* @"script_version"@ specifies the version of the script that you wish to run.  This can be in the form of an explicit @git@ revision hash, or in the form "repository:branch" (in which case it will take the HEAD of the specified branch).  Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.  You can access a list of available @git@ repositories on the Arvados workbench under _Compute %(rarr)→% Code repositories_.
+* @"script_version"@ specifies the version of the script that you wish to run.  This can be in the form of an explicit @git@ revision hash, or in the form "repository:branch" (in which case it will take the HEAD of the specified branch).  Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.  You can access a list of available @git@ repositories on the Arvados workbench under "Compute %(rarr)→% Code repositories":http://{{site.arvados_workbench_host}}/repositories .
 * @"script_parameters"@ are provided to the script.  In this case, the input is the locator for the collection that we inspected in the previous section.
 
 Use @arv job create@ to actually submit the job.  It should print out a JSON object which describes the newly created job:
@@ -98,7 +101,7 @@ The job is now queued and will start running as soon as it reaches the front of
 
 h2. Monitor job progress
 
-Go to the Workbench dashboard.  Your job should be at the top of the "Recent jobs" table.  This table refreshes automatically.  When the job has completed successfully, it will show <span class="label label-success">finished</span> in the *Status* column.
+Go to the "Workbench dashboard":http://{{site.arvados_workbench_host}}.  Your job should be at the top of the "Recent jobs" table.  This table refreshes automatically.  When the job has completed successfully, it will show <span class="label label-success">finished</span> in the *Status* column.
 
 On the command line, you can access log messages while the job runs using @arv job log_tail_follow@:
 
@@ -108,7 +111,7 @@ This will print out the last several lines of the log for that job.
 
 h2. Inspect the job output
 
-On the workbench dashboard, look for the *Output* column of the *Recent jobs* table.  Click on the link under *Output* for your job to go to the files page with the job output.  The files page lists all the files that were output by the job.  Click on the link under the *files* column to view a file, or click on the download icon <span class="glyphicon glyphicon-download-alt"></span> to download the output file.
+On the "Workbench dashboard":http://{{site.arvados_workbench_host}}, look for the *Output* column of the *Recent jobs* table.  Click on the link under *Output* for your job to go to the files page with the job output.  The files page lists all the files that were output by the job.  Click on the link under the *files* column to view a file, or click on the download icon <span class="glyphicon glyphicon-download-alt"></span> to download the output file.
 
 On the command line, you can use @arv job get@ to access a JSON object describing the output:
 
@@ -232,4 +235,6 @@ The log collection consists of one log file named with the job id.  You can acce
 </code></pre>
 </notextile>
 
-This concludes the first tutorial.  In the next tutorial, we will "write a script to compute the hash.":tutorial-firstscript.html
+<hr>
+
+This concludes the first tutorial.  In the next tutorial, we will "write a crunch job script.":tutorial-firstscript.html
diff --git a/doc/user/tutorials/tutorial-keep.html.textile.liquid b/doc/user/tutorials/tutorial-keep.html.textile.liquid
index 8196363..f736bd0 100644
--- a/doc/user/tutorials/tutorial-keep.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-keep.html.textile.liquid
@@ -2,11 +2,10 @@
 layout: default
 navsection: userguide
 navmenu: Tutorials
-title: "Storing and Retrieving data using Arvados Keep"
-
+title: "Storing and Retrieving data using Keep"
 ...
 
-h1. Storing and Retrieving data using Arvados Keep
+h1. Storing and Retrieving data using Keep
 
 This tutorial introduces you to the Arvados file storage system.
 
@@ -74,47 +73,56 @@ You can also use @arv keep put@ to add an entire directory:
 </code></pre>
 </notextile>
 
+The locator @887cd41e9c613463eab2f0d885c6dd96+83@ represents a collection with multiple files.
+
 h1. Getting Data from Keep
 
-In Keep, information is stored in *data blocks*.  Data blocks are normally between 1 byte and 64 megabytes in size.  If a file exceeds the maximum size of a single data block, the file will be split across multiple data blocks until the entire file can be stored.  These data blocks may be stored and replicated across multiple disks, servers, or clusters.  Each data block has its own identifier for the contents of that specific data block.
+h2. Using Workbench
+
+You may access collections through the "Collections section of Arvados Workbench":https://{{ site.arvados_workbench_host }}/collections located at "https://{{ site.arvados_workbench_host }}/collections":https://{{ site.arvados_workbench_host }}/collections .  You can also access individual collections and individual files within a collection.  Some examples:
+
+* "https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210":https://{{ site.arvados_workbench_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210
+* "https://{{ site.arvados_workbench_host }}/collections/887cd41e9c613463eab2f0d885c6dd96+83/alice.txt":https://{{ site.arvados_workbench_host }}/collections/887cd41e9c613463eab2f0d885c6dd96+83/alice.txt
 
-In order to reassemble the file, Keep stores a *collection* data block which lists in sequence the data blocks that make up the original file.  A collection data block may store the information for multiple files, including a directory structure.
+h2. Using arv-get
 
-In this example we will use @c1bad4b39ca5a924e481008009d94e32+210@ which we added to keep in the previous section.  First let us examine the contents of this collection using @arv keep get@:
+You can view the contents of a collection using @arv keep ls@:
 
 <notextile>
-<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep get c1bad4b39ca5a924e481008009d94e32+210</span>
-. 204e43b8a1185621ca55a94839582e6f+67108864 b9677abbac956bd3e86b1deb28dfac03+67108864 fc15aff2a762b13f521baf042140acec+67108864 323d2a3ce20370c4ca1d3462a344f8fd+25885655 0:227212247:var-GS000016015-ASM.tsv.bz2
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep ls c1bad4b39ca5a924e481008009d94e32+210</span>
+var-GS000016015-ASM.tsv.bz2
 </code></pre>
-</notextile>
 
-The command @arv keep get@ fetches the contents of the locator @c1bad4b39ca5a924e481008009d94e32+210@.  This is a locator for a collection data block, so it fetches the contents of the collection.  In this example, this collection consists of a single file @var-GS000016015-ASM.tsv.bz2@ which is 227212247 bytes long, and is stored using four sequential data blocks, <code>204e43b8a1185621ca55a94839582e6f+67108864</code>, <code>b9677abbac956bd3e86b1deb28dfac03+67108864</code>, <code>fc15aff2a762b13f521baf042140acec+67108864</code>, <code>323d2a3ce20370c4ca1d3462a344f8fd+25885655</code>.
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep ls 887cd41e9c613463eab2f0d885c6dd96+83</span>
+alice.txt
+bob.txt
+carol.txt
+</code></pre>
+</notextile>
 
-Let's use @arv keep get@ to download the first datablock:
+Use @-s@ to print file sizes rounded up to the nearest kilobyte:
 
-notextile. <pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep get 204e43b8a1185621ca55a94839582e6f+67108864 > block1</span></code></pre>
+<notextile>
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep ls -s c1bad4b39ca5a924e481008009d94e32+210</span>
+221887 var-GS000016015-ASM.tsv.bz2
+</code></pre>
+</notextile>
 
-Let's look at the size and compute the md5 hash of @block1@:
+Use @arv keep get@ to download the contents of a collection and place it in the directory specified in the second argument (in this example, @.@ for the current directory):
 
 <notextile>
-<pre><code>/scratch/<b>you</b>$ <span class="userinput">ls -l block1</span>
--rw-r--r-- 1 you group 67108864 Dec  9 20:14 block1
-/scratch/<b>you</b>$ <span class="userinput">md5sum block1</span>
-204e43b8a1185621ca55a94839582e6f  block1
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep get c1bad4b39ca5a924e481008009d94e32+210/ .</span>
 </code></pre>
 </notextile>
 
-Notice that the block identifier <code>204e43b8a1185621ca55a94839582e6f+67108864</code> consists of:
-* the md5 hash @204e43b8a1185621ca55a94839582e6f@ which matches the md5 hash of @block1@
-* a size hint @67108864@ which matches the size of @block1@
-
-Next, let's use @arv keep get@ to download and reassemble @var-GS000016015-ASM.tsv.bz2@ using the following command:
+You can also download individual files:
 
 <notextile>
-<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep get c1bad4b39ca5a924e481008009d94e32+210/var-GS000016015-ASM.tsv.bz2 .</span>
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep get 887cd41e9c613463eab2f0d885c6dd96+83/alice.txt .</span>
 </code></pre>
+</notextile>
 
-This downloads the file @var-GS000016015-ASM.tsv.bz2@ described by collection @c1bad4b39ca5a924e481008009d94e32+210@ from Keep and places it into the local directory.  Now that we have the file, we can compute the md5 hash of the complete file:
+With a local copy of the file, we can do some computation, for example computing the md5 hash of the complete file:
 
 <notextile>
 <pre><code>/scratch/<b>you</b>$ <span class="userinput">md5sum var-GS000016015-ASM.tsv.bz2</span>
@@ -122,22 +130,44 @@ This downloads the file @var-GS000016015-ASM.tsv.bz2@ described by collection @c
 </code></pre>
 </notextile>
 
-h2. Accessing Collections
+h2. Using arv-mount
 
-There are a couple of other ways to access a collection.  You may view the contents of a collection using @arv keep ls@:
+Use @arv-mount@ to mount a Keep collection as if it were a regular directory tree, using the "Filesystem in Userspace (FUSE)":http://fuse.sourceforge.net/ feature of the Linux kernel.
 
 <notextile>
-<pre><code>/scratch/<b>you</b>$ <span class="userinput">arv keep ls c1bad4b39ca5a924e481008009d94e32+210</span>
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">mkdir mnt</span>
+/scratch/<b>you</b>$ <span class="userinput">arv-mount --collection c1bad4b39ca5a924e481008009d94e32+210 mnt &</span>
+/scratch/<b>you</b>$ <span class="userinput">cd mnt</span>
+/scratch/<b>you</b>/mnt$ <span class="userinput">ls</span>
 var-GS000016015-ASM.tsv.bz2
-/scratch/<b>you</b>$ <span class="userinput">arv keep ls -s c1bad4b39ca5a924e481008009d94e32+210</span>
-221887 var-GS000016015-ASM.tsv.bz2
+/scratch/<b>you</b>/mnt$ <span class="userinput">md5sum var-GS000016015-ASM.tsv.bz2</span>
+44b8ae3fde7a8a88d2f7ebd237625b4f  var-GS000016015-ASM.tsv.bz2
+/scratch/<b>you</b>/mnt$ <span class="userinput">cd ..</span>
+/scratch/<b>you</b>$ <span class="userinput">fusermount -u mnt</span>
+</code></pre>
+</notextile>
+
+You can also mount the entire Keep namespace in "magic directory" mode:
+
+<notextile>
+<pre><code>/scratch/<b>you</b>$ <span class="userinput">mkdir mnt</span>
+/scratch/<b>you</b>$ <span class="userinput">arv-mount mnt &</span>
+/scratch/<b>you</b>$ <span class="userinput">cd mnt/c1bad4b39ca5a924e481008009d94e32+210</span>
+/scratch/<b>you</b>/mnt/c1bad4b39ca5a924e481008009d94e32+210$ <span class="userinput">ls</span>
+var-GS000016015-ASM.tsv.bz2
+/scratch/<b>you</b>/mnt/c1bad4b39ca5a924e481008009d94e32+210$ <span class="userinput">md5sum var-GS000016015-ASM.tsv.bz2</span>
+44b8ae3fde7a8a88d2f7ebd237625b4f  var-GS000016015-ASM.tsv.bz2
+/scratch/<b>you</b>/mnt/c1bad4b39ca5a924e481008009d94e32+210$ <span class="userinput">cd ../..</span>
+/scratch/<b>you</b>$ <span class="userinput">fusermount -u mnt</span>
 </code></pre>
 </notextile>
 
-* @-s@ prints file sizes in kilobytes
+Using @arv-mount@ has several significant benefits:
 
-You may also access through the Arvados Workbench using a URI similar to this, where the last part of the path is the Keep locator:
+* You can browse, open and read Keep entries as if they are regular files.
+* It is easy for existing tools to access files in Keep.
+* Data is downloaded on demand; it is not necessary to download an entire file or collection to start processing.
 
-"https://workbench.{{ site.arvados_api_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210":https://workbench.{{ site.arvados_api_host }}/collections/c1bad4b39ca5a924e481008009d94e32+210
+<hr>
 
 You are now ready to proceed to the next tutorial, "running a crunch job.":tutorial-job1.html
diff --git a/doc/user/tutorials/tutorial-new-pipeline.html.textile.liquid b/doc/user/tutorials/tutorial-new-pipeline.html.textile.liquid
index ed319b4..c881617 100644
--- a/doc/user/tutorials/tutorial-new-pipeline.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-new-pipeline.html.textile.liquid
@@ -2,11 +2,10 @@
 layout: default
 navsection: userguide
 navmenu: Tutorials
-title: "Constructing a Crunch pipeline"
-
+title: "Writing a Crunch pipeline"
 ...
 
-h1. Constructing a Crunch pipeline
+h1. Writing a Crunch pipeline
 
 A pipeline in Arvados is a collection of crunch scripts, in which the output from one script may be used as the input to another script.
 
diff --git a/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid b/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
new file mode 100644
index 0000000..da968c2
--- /dev/null
+++ b/doc/user/tutorials/tutorial-pipeline-workbench.html.textile.liquid
@@ -0,0 +1,8 @@
+---
+layout: default
+navsection: userguide
+navmenu: Tutorials
+title: "Running a pipeline using Workbench"
+...
+
+h1. Running a pipeline using Workbench

-----------------------------------------------------------------------
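
Note on the tutorial-keep changes above: the numbers quoted in the old and new text are consistent with each other, and the short Python sketch below checks that. It is only an illustration, not part of the commit and not the Arvados SDK. The helper names (block_locator, split_locator) are hypothetical, and the locator format is assumed to be just "md5 hash + size", as the removed tutorial text describes it.

```python
import hashlib

def block_locator(data):
    """Build an 'md5+size' locator for one block (hypothetical helper)."""
    return "{}+{}".format(hashlib.md5(data).hexdigest(), len(data))

def split_locator(locator):
    """Split a locator into (md5 hex digest, size hint in bytes)."""
    parts = locator.split("+")
    return parts[0], int(parts[1])

# The four data blocks listed in the removed tutorial text.
BLOCKS = [
    "204e43b8a1185621ca55a94839582e6f+67108864",
    "b9677abbac956bd3e86b1deb28dfac03+67108864",
    "fc15aff2a762b13f521baf042140acec+67108864",
    "323d2a3ce20370c4ca1d3462a344f8fd+25885655",
]

# The size hints should add up to the stated file size, 227212247 bytes.
total = sum(split_locator(b)[1] for b in BLOCKS)

# "arv keep ls -s" is described as rounding up to the nearest kilobyte;
# 1 kilobyte is assumed to mean 1024 bytes here.
kilobytes = (total + 1023) // 1024

print(total)
print(kilobytes)
print(block_locator(b"foo"))
```

The sum of the four size hints gives the file size quoted in the old text, and rounding that up to 1024-byte units gives the figure shown in the "arv keep ls -s" output.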


hooks/post-receive
-- 



