[ARVADOS] created: fb0585a11cc10929cd8121ddd219855172754ef3

git at public.curoverse.com git at public.curoverse.com
Thu Aug 7 10:44:36 EDT 2014


        at  fb0585a11cc10929cd8121ddd219855172754ef3 (commit)


commit fb0585a11cc10929cd8121ddd219855172754ef3
Author: Peter Amstutz <peter.amstutz at curoverse.com>
Date:   Thu Aug 7 10:44:31 2014 -0400

    "Writing a crunch script" now shows how to run locally, "Running on an Arvados cluster" shows how to commit to git and create a pipeline template.

diff --git a/doc/_config.yml b/doc/_config.yml
index d8b0d88..e89c89c 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -37,7 +37,7 @@ navbar:
       - user/tutorials/intro-crunch.html.textile.liquid
       - user/tutorials/running-external-program.html.textile.liquid
       - user/tutorials/tutorial-firstscript.html.textile.liquid
-      - user/topics/tutorial-job-debug.html.textile.liquid
+      - user/tutorials/tutorial-submit-job.html.textile.liquid
       - user/topics/tutorial-parallel.html.textile.liquid
       - user/examples/crunch-examples.html.textile.liquid
     - Query the metadata database:
diff --git a/doc/user/topics/tutorial-job-debug.html.textile.liquid b/doc/user/topics/tutorial-job-debug.html.textile.liquid
deleted file mode 100644
index 63f152e..0000000
--- a/doc/user/topics/tutorial-job-debug.html.textile.liquid
+++ /dev/null
@@ -1,163 +0,0 @@
----
-layout: default
-navsection: userguide
-title: "Debugging a Crunch script"
-...
-
-To test changes to a script by running a job, the change must be pushed to your hosted repository, and the job might have to wait in the queue before it runs. This cycle can be an inefficient way to develop and debug scripts. This tutorial demonstrates an alternative: using @arv-crunch-job@ to run your job in your local VM.  This avoids the job queue and allows you to execute the script directly from your git working tree without committing or pushing.
-
-{% include 'tutorial_expectations' %}
-
-This tutorial uses @$USER@ to denote your username.  Replace @$USER@ with your user name in all the following examples.
-
-h2. Create a new script
-
-Change to your Git working directory and create a new script in @crunch_scripts/@.
-
-<notextile>
-<pre><code>~$ <span class="userinput">cd $USER/crunch_scripts</span>
-~/$USER/crunch_scripts$ <span class="userinput">cat >hello-world.py <<EOF
-#!/usr/bin/env python
-
-print "hello world"
-print "this script will fail, and that is expected!"
-EOF</span>
-~/$USER/crunch_scripts$ <span class="userinput">chmod +x hello-world.py</span>
-</code></pre>
-</notextile>
-
-h2. Using arv-crunch-job to run the job in your VM
-
-Instead of a Git commit hash, we provide the path to the directory in the "script_version" parameter.  The script specified in "script" is expected to be in the @crunch_scripts/@ subdirectory of the directory specified "script_version".  Although we are running the script locally, the script still requires access to the Arvados API server and Keep storage service. The job will be recorded in the Arvados job history, and visible in Workbench.
-
-<notextile>
-<pre><code>~/$USER/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
-{
- "repository":"",
- "script":"hello-world.py",
- "script_version":"$HOME/$USER",
- "script_parameters":{}
-}
-EOF</span>
-</code></pre>
-</notextile>
-
-Your shell should fill in values for @$HOME@ and @$USER@ so that the saved JSON points "script_version" at the directory with your checkout.  Now you can run that job:
-
-<notextile>
-<pre><code>~/$USER/crunch_scripts</span>$ <span class="userinput">arv-crunch-job --job "$(cat ~/the_job)"</span>
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  check slurm allocation
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  node localhost - 1 slots
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  start
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  script hello-world.py
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  script_version /home/$USER/$USER
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  script_parameters {}
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  runtime_constraints {"max_tasks_per_node":0}
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  start level 0
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  status: 0 done, 0 running, 1 todo
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 job_task qr1hi-ot0gb-4zdajby8cjmlguh
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 child 29834 started on localhost.1
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827  status: 0 done, 1 running, 0 todo
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 stderr hello world
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 stderr this script will fail, and that is expected!
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 child 29834 on localhost.1 exit 0 signal 0 success=
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 failure (#1, permanent) after 0 seconds
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 output
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827  Every node has failed -- giving up on this round
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827  wait for last 0 children to finish
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827  status: 0 done, 0 running, 0 todo
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827  Freeze not implemented
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827  collate
-2013-12-12_21:36:43 qr1hi-8i9sb-okzukfzkpbrnhst 29827  output d41d8cd98f00b204e9800998ecf8427e+0
-2013-12-12_21:36:44 qr1hi-8i9sb-okzukfzkpbrnhst 29827  meta key is c00bfbd58e6f58ce3bebdd47f745a70f+1857
-</code></pre>
-</notextile>
-
-These are the lines of interest:
-
-bc. 2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 stderr hello world
-2013-12-12_21:36:42 qr1hi-8i9sb-okzukfzkpbrnhst 29827 0 stderr this script will fail, and that is expected!
-
-The script's output is captured in the log, which is useful for print statement debugging. However, although this script returned a status code of 0 (success), the job failed.  Why?  For a job to complete successfully scripts must explicitly add their output to Keep, and then tell Arvados about it.  Here is a second try:
-
-<notextile>
-<pre><code>~/$USER/crunch_scripts$ <span class="userinput">cat >hello-world-fixed.py <<EOF
-#!/usr/bin/env python
-
-import arvados
-
-# Create a new collection
-out = arvados.CollectionWriter()
-
-# Set the name of the file in the collection to write to
-out.set_current_file_name('hello.txt')
-
-# Actually output our text
-out.write('hello world')
-
-# Commit the collection to Keep
-out_collection = out.finish()
-
-# Tell Arvados which Keep object is our output
-arvados.current_task().set_output(out_collection)
-
-# Done!
-EOF</span>
-~/$USER/crunch_scripts$ <span class="userinput">chmod +x hello-world-fixed.py</span>
-~/$USER/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
-{
- "repository":"",
- "script":"hello-world-fixed.py",
- "script_version":"$HOME/$USER",
- "script_parameters":{}
-}
-EOF</span>
-~/$USER/crunch_scripts$ <span class="userinput">arv-crunch-job --job "$(cat ~/the_job)"</span>
-2013-12-12_21:56:59 qr1hi-8i9sb-79260ykfew5trzl 31578  check slurm allocation
-2013-12-12_21:56:59 qr1hi-8i9sb-79260ykfew5trzl 31578  node localhost - 1 slots
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578  start
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578  script hello-world-fixed.py
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578  script_version /home/$USER/$USER
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578  script_parameters {}
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578  runtime_constraints {"max_tasks_per_node":0}
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578  start level 0
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578  status: 0 done, 0 running, 1 todo
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 0 job_task qr1hi-ot0gb-u8g594ct0wt7f3f
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578 0 child 31585 started on localhost.1
-2013-12-12_21:57:00 qr1hi-8i9sb-79260ykfew5trzl 31578  status: 0 done, 1 running, 0 todo
-2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 0 child 31585 on localhost.1 exit 0 signal 0 success=true
-2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 0 success in 1 seconds
-2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578 0 output 576c44d762ba241b0a674aa43152b52a+53
-2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578  wait for last 0 children to finish
-2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578  status: 1 done, 0 running, 0 todo
-2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578  Freeze not implemented
-2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578  collate
-2013-12-12_21:57:02 qr1hi-8i9sb-79260ykfew5trzl 31578  output 576c44d762ba241b0a674aa43152b52a+53
-WARNING:root:API lookup failed for collection 576c44d762ba241b0a674aa43152b52a+53 (<class 'apiclient.errors.HttpError'>: <HttpError 404 when requesting https://qr1hi.arvadosapi.com/arvados/v1/collections/576c44d762ba241b0a674aa43152b52a%2B53?alt=json returned "Not Found">)
-2013-12-12_21:57:03 qr1hi-8i9sb-79260ykfew5trzl 31578  finish
-</code></pre>
-</notextile>
-
-(The WARNING issued near the end of the script may be safely ignored here; it is the Arvados SDK letting you know that it could not find a collection named @576c44d762ba241b0a674aa43152b52a+53@ and that it is going to try looking up a block by that name instead.)
-
-The job succeeded, with output in Keep object @576c44d762ba241b0a674aa43152b52a+53 at .  Let's look at our output:
-
-<notextile>
-<pre><code>~/$USER/crunch_scripts$ <span class="userinput">arv keep get 576c44d762ba241b0a674aa43152b52a+53/hello.txt</span>
-hello world
-</code></pre>
-</notextile>
-
-h3. Location of temporary files
-
-Crunch job tasks are supplied with @TASK_WORK@ and @JOB_WORK@ environment variables, to be used as scratch space.  When running in local development mode using @arv-crunch-job@, Crunch sets these variables to point to directory called @crunch-job-{USERID}@ in @TMPDIR@ (or @/tmp@ if @TMPDIR@ is not set).
-
-* Set @TMPDIR@ to @/scratch@ to make Crunch use a directory like @/scratch/crunch-job-{USERID}/@ for temporary space.
-
-* Set @CRUNCH_TMP@ to @/scratch/foo@ to make Crunch use @/scratch/foo/@ for temporary space (omitting the default @crunch-job-{USERID}@ leaf name)
-
-h3. Testing job scripts without SDKs and Keep access
-
-Read and write data to @/tmp/@ instead of Keep. This only works with the Python SDK.
-
-notextile. <pre><code>~$ <span class="userinput">export KEEP_LOCAL_STORE=/tmp</span></code></pre>
diff --git a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
index 9c3242f..35710c0 100644
--- a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
@@ -5,55 +5,22 @@ navmenu: Tutorials
 title: "Writing a Crunch script"
 ...
 
-This tutorial demonstrates how to create a new Arvados pipeline using the Arvados Python SDK.  The Arvados SDK supports access to advanced features not available using the @run-command@ wrapper, such as scheduling parallel tasks across nodes.
+This tutorial demonstrates how to write a script using Arvados Python SDK.  The Arvados SDK supports access to advanced features not available using the @run-command@ wrapper, such as scheduling parallel tasks across nodes.
 
 {% include 'tutorial_expectations' %}
 
 This tutorial uses @$USER@ to denote your username.  Replace @$USER@ with your user name in all the following examples.
 
-h2. Setting up Git
-
-All Crunch scripts are managed through the Git revision control system.  Before you start using Git, you should do some basic configuration (you only need to do this the first time):
+Start by creating a directory called @$USER@ .  Next, create a subdirectory called @crunch_scripts@ and change to that directory:
 
 <notextile>
-<pre><code>~$ <span class="userinput">git config --global user.name "Your Name"</span>
-~$ <span class="userinput">git config --global user.email $USER at example.com</span></code></pre>
-</notextile>
-
-On the Arvados Workbench, navigate to "Code repositories":https://{{site.arvados_workbench_host}}/repositories.  You should see a repository with your user name listed in the *name* column.  Next to *name* is the column *push_url*.  Copy the *push_url* value associated with your repository.  This should look like <notextile><code>git at git.{{ site.arvados_api_host }}:$USER.git</code></notextile>.
-
-Next, on the Arvados virtual machine, clone your Git repository:
-
-<notextile>
-<pre><code>~$ <span class="userinput">cd $HOME</span> # (or wherever you want to install)
-~$ <span class="userinput">git clone git at git.{{ site.arvados_api_host }}:$USER.git</span>
-Cloning into '$USER'...</code></pre>
-</notextile>
-
-This will create a Git repository in the directory called @$USER@ in your home directory. Say yes when prompted to continue with connection.
-Ignore any warning that you are cloning an empty repository.
-
-{% include 'notebox_begin' %}
-For more information about using Git, try
-
-notextile. <pre><code>$ <span class="userinput">man gittutorial</span></code></pre>
-
-or *"search Google for Git tutorials":http://google.com/#q=git+tutorial*.
-{% include 'notebox_end' %}
-
-h2. Creating a Crunch script
-
-Start by entering the @$USER@ directory created by @git clone at .  Next create a subdirectory called @crunch_scripts@ and change to that directory:
-
-<notextile>
-<pre><code>~$ <span class="userinput">cd $USER</span>
-~/$USER$ <span class="userinput">mkdir crunch_scripts</span>
-~/$USER$ <span class="userinput">cd crunch_scripts</span></code></pre>
+<pre><code>~$ <span class="userinput">mkdir -p tutorial/crunch_scripts</span>
+~$ <span class="userinput">cd tutorial/crunch_scripts</span></code></pre>
 </notextile>
 
 Next, using @nano@ or your favorite Unix text editor, create a new file called @hash.py@ in the @crunch_scripts@ directory.
 
-notextile. <pre>~/$USER/crunch_scripts$ <code class="userinput">nano hash.py</code></pre>
+notextile. <pre>~/tutorial/crunch_scripts$ <code class="userinput">nano hash.py</code></pre>
 
 Add the following code to compute the MD5 hash of each file in a collection:
 
@@ -61,82 +28,74 @@ Add the following code to compute the MD5 hash of each file in a collection:
 
 Make the file executable:
 
-notextile. <pre><code>~/$USER/crunch_scripts$ <span class="userinput">chmod +x hash.py</span></code></pre>
-
-{% include 'notebox_begin' %}
-The steps below describe how to execute the script after committing changes to Git. To run a single script locally for testing (bypassing the job queue) please see "debugging a crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html.
-
-{% include 'notebox_end' %}
+notextile. <pre><code>~/tutorial/crunch_scripts$ <span class="userinput">chmod +x hash.py</span></code></pre>
 
-Next, add the file to the staging area.  This tells @git@ that the file should be included on the next commit.
-
-notextile. <pre><code>~/$USER/crunch_scripts$ <span class="userinput">git add hash.py</span></code></pre>
-
-Next, commit your changes.  All staged changes are recorded into the local git repository:
-
-<notextile>
-<pre><code>~/$USER/crunch_scripts$ <span class="userinput">git commit -m"my first script"</span>
-[master (root-commit) 27fd88b] my first script
- 1 file changed, 45 insertions(+)
- create mode 100755 crunch_scripts/hash.py</code></pre>
-</notextile>
-
-Finally, upload your changes to the Arvados server:
-
-<notextile>
-<pre><code>~/$USER/crunch_scripts$ <span class="userinput">git push origin master</span>
-Counting objects: 4, done.
-Compressing objects: 100% (2/2), done.
-Writing objects: 100% (4/4), 682 bytes, done.
-Total 4 (delta 0), reused 0 (delta 0)
-To git at git.qr1hi.arvadosapi.com:$USER.git
- * [new branch]      master -> master</code></pre>
-</notextile>
-
-h2. Create a pipeline template
-
-Next, create a file that contains the pipeline definition:
+Next, create a submission job record.  This describes a specific invocation of your script:
 
 <notextile>
-<pre><code>~/$USER/crunch_scripts$ <span class="userinput">cd ~</span>
-~$ <span class="userinput">cat >the_pipeline <<EOF
+<pre><code>~/tutorial/crunch_scripts$ <span class="userinput">cat >~/the_job <<EOF
 {
-  "name":"My md5 pipeline",
-  "components":{
-    "do_hash":{
-      "script":"hash.py",
-      "script_parameters":{
-        "input":{
-          "required": true,
-          "dataclass": "Collection"
-        }
-      },
-      "repository":"$USER",
-      "script_version":"master",
-      "output_is_persistent":true,
-      "runtime_constraints":{
-        "docker_image":"arvados/jobs-java-bwa-samtools"
-      }
-    }
-  }
+ "repository":"",
+ "script":"hash.py",
+ "script_version":"$HOME/tutorial",
+ "script_parameters":{
+   "input":"c1bad4b39ca5a924e481008009d94e32+210"
+ }
 }
-EOF
-</span></code></pre>
+EOF</span>
+</code></pre>
 </notextile>
 
-* @"repository"@ is the name of a git repository to search for the script version.  You can access a list of available git repositories on the Arvados Workbench under "Code repositories":https://{{site.arvados_workbench_host}}/repositories.
-* @"script_version"@ specifies the version of the script that you wish to run.  This can be in the form of an explicit Git revision hash, a tag, or a branch (in which case it will use the HEAD of the specified branch).  Arvados logs the script version that was used in the run, enabling you to go back and re-run any past job with the guarantee that the exact same code will be used as was used in the previous run.
-* @"script"@ specifies the filename of the script to run.  Crunch expects to find this in the @crunch_scripts/@ subdirectory of the Git repository.
-
-Now, use @arv pipeline_template create@ to register your pipeline template in Arvados:
+You can now run your script on your local workstation or VM using @arv-crunch-job@:
 
 <notextile>
-<pre><code>~$ <span class="userinput">arv pipeline_template create --pipeline-template "$(cat the_pipeline)"</span>
+<pre><code>~/tutorial/crunch_scripts</span>$ <span class="userinput">arv-crunch-job --job "$(cat ~/the_job)"</span>
+2014-08-06_15:16:22 qr1hi-8i9sb-qyrat80ef927lam 14473  check slurm allocation
+2014-08-06_15:16:22 qr1hi-8i9sb-qyrat80ef927lam 14473  node localhost - 1 slots
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  start
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  script hash.py
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  script_version /home/peter/peter
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  script_parameters {"input":"c1bad4b39ca5a924e481008009d94e32+210"}
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  runtime_constraints {"max_tasks_per_node":0}
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  start level 0
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 0 done, 0 running, 1 todo
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473 0 job_task qr1hi-ot0gb-lptn85mwkrn9pqo
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473 0 child 14478 started on localhost.1
+2014-08-06_15:16:23 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 0 done, 1 running, 0 todo
+2014-08-06_15:16:24 qr1hi-8i9sb-qyrat80ef927lam 14473 0 stderr crunchstat: Running [stdbuf --output=0 --error=0 /home/$USER/tutorial/crunch_scripts/hash.py]
+2014-08-06_15:16:24 qr1hi-8i9sb-qyrat80ef927lam 14473 0 child 14478 on localhost.1 exit 0 signal 0 success=true
+2014-08-06_15:16:24 qr1hi-8i9sb-qyrat80ef927lam 14473 0 success in 1 seconds
+2014-08-06_15:16:24 qr1hi-8i9sb-qyrat80ef927lam 14473 0 output
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  wait for last 0 children to finish
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 1 done, 0 running, 1 todo
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  start level 1
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 1 done, 0 running, 1 todo
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473 1 job_task qr1hi-ot0gb-e3obm0lv6k6p56a
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473 1 child 14504 started on localhost.1
+2014-08-06_15:16:25 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 1 done, 1 running, 0 todo
+2014-08-06_15:16:26 qr1hi-8i9sb-qyrat80ef927lam 14473 1 stderr crunchstat: Running [stdbuf --output=0 --error=0 /home/$USER/tutorial/crunch_scripts/hash.py]
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 1 child 14504 on localhost.1 exit 0 signal 0 success=true
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 1 success in 10 seconds
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473 1 output 50cafdb29cc21dd6eaec85ba9e0c6134+56+Aef0f991b80fa0b75f802e58e70b207aa184d24ff at 53f4bbd3
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473  wait for last 0 children to finish
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473  status: 2 done, 0 running, 0 todo
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473  Freeze not implemented
+2014-08-06_15:16:35 qr1hi-8i9sb-qyrat80ef927lam 14473  collate
+2014-08-06_15:16:36 qr1hi-8i9sb-qyrat80ef927lam 14473  output d6338df28d6b8e5d14929833b417e20e+107+Adf1ce81222b6992ce5d33d8bfb28a6b5a1497898 at 53f4bbd4
+2014-08-06_15:16:37 qr1hi-8i9sb-qyrat80ef927lam 14473  finish
+2014-08-06_15:16:38 qr1hi-8i9sb-qyrat80ef927lam 14473  log manifest is 7fe8cf1d45d438a3ca3ac4a184b7aff4+83
 </code></pre>
 </notextile>
 
-h2. Running your pipeline
+Although the job runs locally, the output of the job has been saved to Keep, the Arvados file store.  The "output" line (third from the bottom) provides the "Keep locator":/user/topics/tutorial-keep-get.html to which the script's output has been saved.  Copy the output identifier and use @arv-ls@ to list the contents of your output collection, and @arv-get@ to download it to the current directory:
 
-Your new pipeline template should appear at the top of the Workbench "pipeline templates":https://{{ site.arvados_workbench_host }}/pipeline_templates page.  You can run your pipeline "using Workbench":tutorial-pipeline-workbench.html or the "command line.":{{site.baseurl}}/user/topics/running-pipeline-command-line.html
+<notextile>
+<pre><code>~/tutorial/crunch_scripts$ <span class="userinput">arv-ls d6338df28d6b8e5d14929833b417e20e+107+Adf1ce81222b6992ce5d33d8bfb28a6b5a1497898 at 53f4bbd4</span>
+./md5sum.txt
+~/tutorial/crunch_scripts$ <span class="userinput">arv-get d6338df28d6b8e5d14929833b417e20e+107+Adf1ce81222b6992ce5d33d8bfb28a6b5a1497898 at 53f4bbd4/ .</span>
+~/tutorial/crunch_scripts$ <span class="userinput">cat md5sum.txt</span>
+44b8ae3fde7a8a88d2f7ebd237625b4f c1bad4b39ca5a924e481008009d94e32+210/./var-GS000016015-ASM.tsv.bz2
+</code></pre>
+</notextile>
 
-For more information and examples for writing pipelines, see the "pipeline template reference":{{site.baseurl}}/api/schema/PipelineTemplate.html
+Running locally is convenient for development and debugging, as it permits a fast iterative development cycle.  Your job run is also recorded by Arvados, and will show up in the "Recent jobs and pipelines" panel on the "Workbench dashboard":https://{{site.arvados_workbench_host}}.  This provides limited provenance, by recording the input parameters, the execution log, and the output.  However, running locally does not allow you to scale out to multiple nodes, and does not store the complete system snapshot required to achieve reproducibilty; to that you need to "submit a job to the Arvados cluster":/user/tutorials/tutorial-submit-job.html
diff --git a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid b/doc/user/tutorials/tutorial-submit-job.html.textile.liquid
similarity index 90%
copy from doc/user/tutorials/tutorial-firstscript.html.textile.liquid
copy to doc/user/tutorials/tutorial-submit-job.html.textile.liquid
index 9c3242f..8be72a2 100644
--- a/doc/user/tutorials/tutorial-firstscript.html.textile.liquid
+++ b/doc/user/tutorials/tutorial-submit-job.html.textile.liquid
@@ -2,10 +2,10 @@
 layout: default
 navsection: userguide
 navmenu: Tutorials
-title: "Writing a Crunch script"
+title: "Running on an Arvados cluster"
 ...
 
-This tutorial demonstrates how to create a new Arvados pipeline using the Arvados Python SDK.  The Arvados SDK supports access to advanced features not available using the @run-command@ wrapper, such as scheduling parallel tasks across nodes.
+This tutorial demonstrates how to create a pipeline to run your crunch script on an Arvados cluster.  Cluster jobs can scale out to multiple nodes, and use @git@ and @docker@ to store the complete system snapshot required to achieve reproducibilty.
 
 {% include 'tutorial_expectations' %}
 
@@ -55,7 +55,7 @@ Next, using @nano@ or your favorite Unix text editor, create a new file called @
 
 notextile. <pre>~/$USER/crunch_scripts$ <code class="userinput">nano hash.py</code></pre>
 
-Add the following code to compute the MD5 hash of each file in a collection:
+Add the following code to compute the MD5 hash of each file in a collection (if you already completed "Writing a Crunch script":tutorial-firstscript.html you can just copy the @hash.py@ file you created previously.)
 
 <notextile> {% code 'tutorial_hash_script_py' as python %} </notextile>
 
@@ -63,11 +63,6 @@ Make the file executable:
 
 notextile. <pre><code>~/$USER/crunch_scripts$ <span class="userinput">chmod +x hash.py</span></code></pre>
 
-{% include 'notebox_begin' %}
-The steps below describe how to execute the script after committing changes to Git. To run a single script locally for testing (bypassing the job queue) please see "debugging a crunch script":{{site.baseurl}}/user/topics/tutorial-job-debug.html.
-
-{% include 'notebox_end' %}
-
 Next, add the file to the staging area.  This tells @git@ that the file should be included on the next commit.
 
 notextile. <pre><code>~/$USER/crunch_scripts$ <span class="userinput">git add hash.py</span></code></pre>

-----------------------------------------------------------------------


hooks/post-receive
-- 




More information about the arvados-commits mailing list