<div dir="ltr">We have an application that we use to download data from Illumina Basespace directly to our servers. Previously we wrote directly to disk, and average data transfer speeds were > 10 MB/s. We modified the application (a Python script) to push the data into Arvados via a CollectionWriter instead, and we are now seeing data transfer speeds of 200-400 kB/s. Both the Arvados servers and our Basespace application are on the same subnet and connected via 1 Gbps Ethernet.<br><br>I have set up the keepstore volume to serialize, and I am using the default buffer setting.<br><div><div><div class="gmail_signature"><br></div><div class="gmail_signature">Here is our keepstore configuration (keepstore.yml).<br><br>BlobSignatureTTL: 96h0m0s<br>BlobSigningKeyFile: /etc/arvados/keepstore/blob-signing.key<br>Debug: false<br>EnableDelete: true<br>Listen: :25107<br>LogFormat: text<br>MaxBuffers: 100<br>MaxRequests: 0<br>PIDFile: ""<br>RequireSignatures: false<br>SystemAuthTokenFile: /etc/arvados/keepstore/system-auth.key<br>TrashCheckInterval: 24h0m0s<br>TrashLifetime: 96h0m0s<br>Volumes:<br>- DirectoryReplication: 0<br>&nbsp;&nbsp;ReadOnly: false<br>&nbsp;&nbsp;Root: /mnt/keep<br>&nbsp;&nbsp;Serialize: true<br>&nbsp;&nbsp;Type: Directory<br><br></div><div class="gmail_signature">Also, I have checked the socket connections on the system hosting the application, and it is connecting directly to the keepstore server.<br><br></div><div class="gmail_signature">Are there any other items to look at in order to improve performance?<br><br></div><div class="gmail_signature">For reference, here are snippets from our push application. The following lines create the CollectionWriter.<br><br>self.arv = arvados.api(token=arv_token, host=arvados_api_host)<br>self.writer = CollectionWriter(self.arv, replication=replication)<br><br></div><div class="gmail_signature">The following lines show how we push the data. 
The fileinfo object is an instance of a custom class that holds the path and filename of the file fetched from Basespace. We fetch the file from Basespace and save it to a temp directory first, in case there are issues during the download. I have checked, and the download speed is > 10 MB/s.<br><br>with open(fileinfo.path, 'rb') as filein, self.writer.open('./raw_data/' + fileinfo.filename) as col_file:<br>&nbsp;&nbsp;&nbsp;&nbsp;logging.info("Adding file {0} to Arvados collection".format(fileinfo.filename))<br>&nbsp;&nbsp;&nbsp;&nbsp;for data in filein.read():<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;col_file.write(data)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fileinfo.byte_count += len(data)<br><br>&nbsp;&nbsp;&nbsp;&nbsp;col_file.close()<br>&nbsp;&nbsp;&nbsp;&nbsp;filein.close()<br><br></div><div class="gmail_signature">Any help would be greatly appreciated!<br></div><div class="gmail_signature"><br>George Chlipala, Ph.D.<br>Senior Research Specialist<br>Research Resources Center<br>University of Illinois at Chicago<br><br>phone: 312-413-1700<br>email: <a href="mailto:gchlip2@uic.edu" target="_blank">gchlip2@uic.edu</a></div></div>
</div></div>
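<div>One thing that may be worth checking in the push loop: <code>for data in filein.read()</code> reads the whole file into memory and then iterates over it one byte at a time (in Python 2), so <code>col_file.write</code> is called once per byte. A minimal sketch of a chunked copy follows, assuming the object returned by <code>self.writer.open</code> accepts <code>write</code> calls of arbitrary size like an ordinary file object. The helper name <code>copy_in_chunks</code> and the 1 MiB chunk size are hypothetical, and the sketch uses in-memory buffers as stand-ins rather than an actual collection.</div>

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB per write; an assumed value, tune as needed


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst in fixed-size chunks and return the number of bytes copied.

    src and dst are any binary file-like objects with read(n) and write(data).
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of file
            break
        dst.write(chunk)  # one write per chunk instead of one per byte
        total += len(chunk)
    return total


if __name__ == "__main__":
    # In-memory stand-ins for the temp file and the collection file (hypothetical data).
    source = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 17))
    target = io.BytesIO()
    copied = copy_in_chunks(source, target)
    print("copied {0} bytes".format(copied))
```

<div>In the push code this would roughly amount to <code>fileinfo.byte_count += copy_in_chunks(filein, col_file)</code> inside the <code>with</code> block. The explicit <code>close()</code> calls should then be unnecessary, since the <code>with</code> statement is expected to close both files on exit.</div>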