[arvados] Tuning keep performance

George Chlipala gchlip2 at uic.edu
Fri Aug 4 14:08:25 EDT 2017


We have an application that we use to download data from Illumina
Basespace directly to our servers.  Previously we wrote directly
to disk, and average data transfer speeds were > 10 MB/s.  We modified the
application (a Python script) to push the data into Arvados via a
CollectionWriter.  Now we are seeing data transfer speeds of 200-400 kB/s.
Both the Arvados servers and our Basespace application are on the same
subnet and connected via 1 Gbps ethernet.
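
For scale, 1 Gbps is roughly 125 MB/s, so 200-400 kB/s is well under 1% of
the link capacity.  To compare the disk path and the Arvados path on equal
terms, the write rate can be timed against any destination that accepts
byte chunks.  The following is only a minimal sketch: measure_throughput is
a hypothetical helper name, not part of the Arvados SDK, and the in-memory
buffer stands in for a real destination.

```python
import io
import time


def measure_throughput(write, chunks):
    """Call write(chunk) for each chunk; return (bytes_written, MB_per_second)."""
    total = 0
    start = time.perf_counter()
    for chunk in chunks:
        write(chunk)
        total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast in-memory writes.
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__":
    # Stand-in destination; a real run would pass the write method of the
    # open disk file or of the collection file instead.
    sink = io.BytesIO()
    chunks = [b"\0" * 65536 for _ in range(64)]
    written, mb_per_s = measure_throughput(sink.write, chunks)
    print("wrote {0} bytes at {1:.1f} MB/s".format(written, mb_per_s))
```

Running the same chunks through both destinations would show whether the
slowdown is in the write path itself or somewhere upstream of it.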

I have set up the keepstore volume to serialize, and I am using the default
buffer setting.

Here is our keepstore configuration (keepstore.yml).

BlobSignatureTTL: 96h0m0s
BlobSigningKeyFile: /etc/arvados/keepstore/blob-signing.key
Debug: false
EnableDelete: true
Listen: :25107
LogFormat: text
MaxBuffers: 100
MaxRequests: 0
PIDFile: ""
RequireSignatures: false
SystemAuthTokenFile: /etc/arvados/keepstore/system-auth.key
TrashCheckInterval: 24h0m0s
TrashLifetime: 96h0m0s
Volumes:
- DirectoryReplication: 0
  ReadOnly: false
  Root: /mnt/keep
  Serialize: true
  Type: Directory

I have also checked the socket connections on the system hosting the
application, and it connects directly to the keepstore server.

Are there any other items to look at in order to improve performance?

For reference, here are snippets from our push application.  The following
are the lines associated with creating the CollectionWriter.

self.arv = arvados.api(token=arv_token, host=arvados_api_host)
self.writer = CollectionWriter(self.arv, replication=replication)

The following lines show how we push the data.  The fileinfo object
is a custom class that holds the path and filename of the file fetched from
Basespace.  We fetch the file from Basespace and save it to a temp
directory first in case there are issues during the download.  I have
checked, and the download speed is > 10 MB/s.

with open(fileinfo.path, 'rb') as filein, \
        self.writer.open('./raw_data/' + fileinfo.filename) as col_file:
    logging.info("Adding file {0} to Arvados collection".format(fileinfo.filename))
    for data in filein.read():
        col_file.write(data)
        fileinfo.byte_count += len(data)

    col_file.close()
    filein.close()
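
One thing worth checking in the loop above: filein.read() with no argument
loads the whole file into memory, and iterating over the returned string
(presumably under Python 2, where this code runs as written) yields one
character at a time, so each col_file.write(data) call passes a single
byte.  A chunked copy avoids that.  The following is a minimal sketch,
assuming both objects behave like ordinary Python file objects; the helper
name copy_in_chunks and the 1 MiB chunk size are illustrative choices, not
something taken from the Arvados SDK.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an assumed value, tune as needed


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst in fixed-size chunks and return the byte count."""
    total = 0
    while True:
        data = src.read(chunk_size)
        if not data:  # an empty read means end of file
            break
        dst.write(data)
        total += len(data)
    return total


if __name__ == "__main__":
    # Stand-in file objects; in the script above these would be
    # filein and col_file.
    src = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 5))
    dst = io.BytesIO()
    print(copy_in_chunks(src, dst))
```

In the push script, the returned count could be added to
fileinfo.byte_count in place of the per-byte increment.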

Any help would be greatly appreciated!

George Chlipala, Ph.D.
Senior Research Specialist
Research Resources Center
University of Illinois at Chicago

phone: 312-413-1700
email: gchlip2 at uic.edu