[arvados] Tuning keep performance

George Chlipala gchlip2 at uic.edu
Fri Aug 4 16:55:57 EDT 2017


Peter -

Thanks for the info!

George Chlipala, Ph.D.
Senior Research Specialist
Research Resources Center
University of Illinois at Chicago

phone: 312-413-1700
email: gchlip2 at uic.edu

On Fri, Aug 4, 2017 at 1:25 PM, Peter Amstutz <peter.amstutz at curoverse.com>
wrote:

> Try using the Collection class instead of CollectionWriter, and setting
> put_threads in the Collection constructor (in our experiments I think we
> found that 4-6 threads gave the best throughput).
>
>
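A minimal sketch of the Collection/put_threads suggestion, assuming an Arvados Python SDK in which `arvados.collection.Collection` accepts `put_threads` and `Collection.open()` accepts mode `'wb'` (older SDK releases may use different mode strings). The helper names `add_file` and `upload`, the 1 MiB chunk size, and the collection name are illustrative assumptions, not part of the application described in this thread; the `raw_data/` layout mirrors George's snippet. `arvados` is imported inside `upload()` so the chunked-copy helper can be exercised without a cluster.

```python
import os
import shutil

# Illustrative read size (assumption): hand the SDK 1 MiB per write() call.
CHUNK_SIZE = 1024 * 1024


def add_file(collection, src, collection_path):
    """Copy an open binary file object into a writable Collection in chunks."""
    with collection.open(collection_path, 'wb') as dst:
        # copyfileobj reads CHUNK_SIZE bytes at a time and writes each chunk.
        shutil.copyfileobj(src, dst, CHUNK_SIZE)


def upload(paths, host, token, name, put_threads=4):
    """Upload local files into a new collection and return its locator."""
    # Imported here so add_file() can be reused without the SDK installed.
    import arvados
    import arvados.collection

    api = arvados.api('v1', host=host, token=token)
    # put_threads sets how many blocks are uploaded to Keep concurrently;
    # 4-6 is the range suggested above (tune for your own cluster).
    coll = arvados.collection.Collection(api_client=api,
                                         put_threads=put_threads)
    for path in paths:
        with open(path, 'rb') as src:
            add_file(coll, src, 'raw_data/' + os.path.basename(path))
    coll.save_new(name=name)
    return coll.manifest_locator()
```

It could be called as, e.g., `upload(local_paths, arvados_api_host, arv_token, 'Basespace run', put_threads=6)`, reusing the host and token variables from the original script; `local_paths` and the name are placeholders.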
> On Fri, Aug 4, 2017 at 2:08 PM, George Chlipala <gchlip2 at uic.edu> wrote:
>
>> We have an application that we use to download data from Illumina
>> Basespace directly to our servers.  Previously we had been writing directly
>> to disk, and average data transfer speeds were > 10 MB/s.  We modified the
>> application (a Python script) to push the data into Arvados via a
>> CollectionWriter instead.  Now we are seeing data transfer speeds of
>> 200-400 kB/s.  Both the Arvados servers and our Basespace application are
>> on the same subnet and connected via 1 Gbps Ethernet.
>>
>> I have set up the keepstore volume to serialize, and I am using the default
>> buffer setting.
>>
>> Here is our keepstore configuration (keepstore.yml).
>>
>> BlobSignatureTTL: 96h0m0s
>> BlobSigningKeyFile: /etc/arvados/keepstore/blob-signing.key
>> Debug: false
>> EnableDelete: true
>> Listen: :25107
>> LogFormat: text
>> MaxBuffers: 100
>> MaxRequests: 0
>> PIDFile: ""
>> RequireSignatures: false
>> SystemAuthTokenFile: /etc/arvados/keepstore/system-auth.key
>> TrashCheckInterval: 24h0m0s
>> TrashLifetime: 96h0m0s
>> Volumes:
>> - DirectoryReplication: 0
>>   ReadOnly: false
>>   Root: /mnt/keep
>>   Serialize: true
>>   Type: Directory
>>
>> I have also checked the socket connections on the system hosting the
>> application, and it connects directly to the keepstore server.
>>
>> Are there any other items to look at in order to improve performance?
>>
>> For reference, here are snippets from our push application.  The
>> following lines create the CollectionWriter.
>>
>> self.arv = arvados.api(token=arv_token, host=arvados_api_host)
>> self.writer = CollectionWriter(self.arv, replication=replication)
>>
>> The following lines show how we push the data.  The fileinfo object is a
>> custom class that holds the path and filename of the file fetched from
>> Basespace.  We fetch the file from Basespace and save it to a temp
>> directory in case there are issues during the download.  I have checked,
>> and the download speed is > 10 MB/s.
>>
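>> # Read in fixed-size chunks; iterating over filein.read() walks the file
>> # one byte at a time and issues a separate write() per byte.
>> with open(fileinfo.path, 'rb') as filein, \
>>         self.writer.open('./raw_data/' + fileinfo.filename) as col_file:
>>     logging.info("Adding file {0} to Arvados collection".format(
>>         fileinfo.filename))
>>     while True:
>>         data = filein.read(1024 * 1024)  # 1 MiB per write() call
>>         if not data:
>>             break
>>         col_file.write(data)
>>         fileinfo.byte_count += len(data)
>> # Both files are closed automatically when the with block exits.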
>>
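A `for data in filein.read():` loop reads the whole file and then visits it one element at a time (a 1-character string on Python 2, which the 2017-era Arvados SDK targeted; an int on Python 3), so every byte becomes its own `write()` call. The sketch below counts those calls with a hypothetical `CountingWriter` stand-in instead of a real collection file, and compares that pattern with `shutil.copyfileobj` reading 64 KiB at a time. On Python 3 it wraps each int back into a one-byte `bytes` object to mimic the Python 2 behaviour.

```python
import io
import shutil


class CountingWriter:
    """Hypothetical stand-in for a collection file that counts write() calls."""

    def __init__(self):
        self.calls = 0
        self.size = 0

    def write(self, data):
        self.calls += 1
        self.size += len(data)


def copy_per_element(src, dst):
    # Iterate over the entire read() result: one write() per byte.
    for item in src.read():
        dst.write(bytes([item]) if isinstance(item, int) else item)


payload = b'\0' * (1024 * 1024)  # 1 MiB of sample data

per_byte = CountingWriter()
copy_per_element(io.BytesIO(payload), per_byte)

chunked = CountingWriter()
shutil.copyfileobj(io.BytesIO(payload), chunked, 64 * 1024)

print(per_byte.calls, chunked.calls)  # 1048576 16
```

Both writers receive the same number of bytes, but the per-byte loop makes 65,536 times as many calls, so per-call overhead in the client is a likely contributor to the low transfer rate, independent of network or keepstore settings.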
>> Any help would be greatly appreciated!
>>
>> George Chlipala, Ph.D.
>> Senior Research Specialist
>> Research Resources Center
>> University of Illinois at Chicago
>>
>> phone: 312-413-1700
>> email: gchlip2 at uic.edu
>>
>> _______________________________________________
>> arvados mailing list
>> arvados at arvados.org
>> http://lists.arvados.org/mailman/listinfo/arvados
>>
>>
>