[arvados] Creating pipeline templates without keep docker

George Chlipala gchlip2 at uic.edu
Wed Aug 24 17:11:41 EDT 2016

Tom -

Thanks for the response.

Primarily, I am having issues with the keep docker.  Docker is providing
fully qualified names, e.g. docker.io/arvados/jobs, and when the job runs it
seems to be looking for the relative name of the image, e.g. arvados/jobs.
I can run the CWL script using cwltool just fine.  It seems that Arvados
has a problem with the docker image name.  I had problems with the
<hash>/json vs. <hash>.json issue and was able to apply the fix.  It just
seemed like the name issue was yet another problem, and I thought that if I
could eliminate the "middle man" it would simplify the system and we could
actually use Arvados.
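
To illustrate the name mismatch, here is a minimal sketch of how the two
spellings could be compared.  The helper names (short_image_name,
same_image) and the list of default registry prefixes are assumptions for
illustration only; this is not how Arvados actually resolves image names.

```python
# Hypothetical helpers: treat "docker.io/arvados/jobs" and "arvados/jobs"
# as the same image by dropping a default-registry prefix.
# The prefix list is an assumption, not taken from Arvados or Docker code.
DEFAULT_REGISTRY_PREFIXES = ("docker.io/", "index.docker.io/")


def short_image_name(name: str) -> str:
    """Return the image name without a leading default-registry prefix."""
    for prefix in DEFAULT_REGISTRY_PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


def same_image(a: str, b: str) -> bool:
    """Compare two image names after normalizing the registry prefix."""
    return short_image_name(a) == short_image_name(b)


if __name__ == "__main__":
    print(short_image_name("docker.io/arvados/jobs"))
    print(same_image("docker.io/arvados/jobs", "arvados/jobs"))
```

A lookup that normalized both sides this way would match the fully
qualified name against the relative one.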

Also, I have noticed that the arvados keep docker cannot handle docker
image registries that have a port.  For example we have a private registry
of  If I try to add an image from the registry to the
keep docker, e.g., the keep docker will see the
image name as  and the tag as 5000/mytest:latest.  My solution
is to set up a reverse proxy for this, but it is just another Arvados-ism
that requires extra effort.
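
The behavior described above looks like what happens when a reference is
split at the first colon.  The sketch below contrasts that with splitting
at a colon that comes after the last slash.  The function names are
hypothetical, registry.example.com:5000 is a placeholder host (not our
actual registry), and digest references (name@sha256:...) are not handled.

```python
# Hypothetical sketch: splitting an image reference into repository and tag.
# "registry.example.com:5000" is a placeholder registry host.

def naive_split(ref: str):
    """Split at the FIRST colon; this mistakes a registry port for the tag."""
    repo, _, tag = ref.partition(":")
    return repo, tag


def split_repo_tag(ref: str, default_tag: str = "latest"):
    """Split at a colon only if it appears after the last slash."""
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    if last_colon > last_slash:
        return ref[:last_colon], ref[last_colon + 1:]
    # No tag present; any colon seen belongs to the registry host:port.
    return ref, default_tag


if __name__ == "__main__":
    ref = "registry.example.com:5000/mytest:latest"
    print(naive_split(ref))
    print(split_repo_tag(ref))
```

With the second approach the port stays part of the repository name, so no
reverse proxy would be needed just to hide it.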

I am also very concerned about storage.  We plan to have a number of tools
that we would use both with Arvados and outside of Arvados.  Considering
that Arvados stores the docker image as a tarball of all layers, I am
concerned that if we make a large number of tools available to Arvados,
there will be multiple copies of dependency layers in the keep.  Also, if
we needed to update a dependency layer and decided to update the tools, it
seems that we would need to update Arvados as well as our private
registry.  I do understand that the keep has some level of data
deduplication using blocks, which should help some; however, if the image
tarballs are not, for lack of a better word, aligned properly, the resulting
blocks may split differently and would not be properly deduplicated.  Even
if the images are 100% deduplicated, if these are tools in our private
registry we would still have two copies, one in the private registry and
one in the keep.
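
The alignment concern can be shown with a toy example.  This is only a
sketch of fixed-size, hash-addressed blocks; the 8-byte block size and the
byte strings standing in for image tarballs are made up for illustration
and do not reflect real Keep block sizes or real tarball layouts.

```python
import hashlib

BLOCK_SIZE = 8  # toy value for illustration; real storage blocks are far larger


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE):
    """Hash each fixed-size block of the data, in order."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(x: bytes, y: bytes) -> int:
    """Count distinct block hashes that the two byte strings have in common."""
    return len(set(block_hashes(x)) & set(block_hashes(y)))


# A stand-in for a dependency layer that several images share.
shared_layer = bytes(range(64))

image_a = b"A" * 8 + shared_layer  # shared layer starts on a block boundary
image_b = b"B" * 8 + shared_layer  # same alignment as image_a
image_c = b"C" * 5 + shared_layer  # shared layer shifted by 5 bytes

if __name__ == "__main__":
    print(shared_blocks(image_a, image_b))  # aligned copies share blocks
    print(shared_blocks(image_a, image_c))  # shifted copy shares none
```

In this toy case the two aligned images share every block of the common
layer, while the shifted image shares none, even though all three contain
identical layer bytes.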

I can appreciate the issue of having reproducible pipelines, but couldn't
you just use the image hashes, now provided by docker, to ensure that
the tools of the pipeline have not changed?

George Chlipala, Ph.D.
Senior Research Specialist
Research Resources Center
University of Illinois at Chicago

phone: 312-413-1700
email: gchlip2 at uic.edu

On Wed, Aug 24, 2016 at 3:26 PM, Tom Morris <tfmorris at curoverse.com> wrote:

> On Wed, Aug 24, 2016 at 12:06 PM, George Chlipala <gchlip2 at uic.edu> wrote:
> > Is it possible to configure arvados to create and run jobs (pipelines)
> > without having to store the docker images in keep, i.e. just have the
> > docker on the compute nodes download docker images directly from a
> > docker repository?
> This isn't currently possible. One problem with allowing this is that
> it would negatively impact the reproducibility of the pipeline.
> What is your motivation for wanting to do this?
> Tom Morris
> Director, Product Management
> Curoverse