[ARVADOS] created: df05c25bca0a058aa7840c6ff1e25b0917a21a6f

git at public.curoverse.com
Mon Mar 16 10:10:02 EDT 2015


        at  df05c25bca0a058aa7840c6ff1e25b0917a21a6f (commit)


commit df05c25bca0a058aa7840c6ff1e25b0917a21a6f
Author: Brett Smith <brett at curoverse.com>
Date:   Mon Mar 16 10:09:57 2015 -0400

    5319: Improve performance of Collection manifest fixing migration.
    
    * Use PostgreSQL's native regular expression search to limit the
      number of records we pull through ActiveRecord.
    * Use a smaller batch size to avoid pulling pathological batches of
      records that cause swapping.
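
The two changes above can be illustrated with a small in-memory sketch. It is hypothetical, not the migration's code. `each_bad_manifest`, `worst_case_gib`, and the constants are illustrative names. `Array#grep` and `each_slice` stand in for PostgreSQL's `~` filter and ActiveRecord's `find_each(batch_size: 50)`. The 64 MiB ceiling comes from the comment in the diff. For comparison, the sketch also uses ActiveRecord's default `find_each` batch size of 1000.

```ruby
# Illustrative stand-in: the real migration filters in SQL with
# "manifest_text ~ '\\+[A-Z]'" and batches with ActiveRecord's find_each.
MAX_MANIFEST_BYTES = 64 * 1024 * 1024  # 64 MiB ceiling, per the diff's comment
DEFAULT_BATCH = 1000                   # ActiveRecord's default find_each batch size
NEW_BATCH = 50                         # batch size chosen in this commit

HINTED = /\+[A-Z]/  # same pattern the SQL filter uses

# Worst-case memory for one batch if every manifest is maximum size.
def worst_case_gib(batch_size)
  batch_size * MAX_MANIFEST_BYTES / 1024.0**3
end

# Filter first, so that only hinted manifests are ever loaded, then
# hand them out in small batches (hypothetical helper).
def each_bad_manifest(manifests, batch_size: NEW_BATCH)
  manifests.grep(HINTED).each_slice(batch_size) do |batch|
    batch.each { |m| yield m }
  end
end

puts worst_case_gib(DEFAULT_BATCH)  # 62.5
puts worst_case_gib(NEW_BATCH)      # 3.125
```

Filtering before batching means unhinted manifests never leave the database. The smaller batch caps the worst case at roughly 3 GiB instead of 62.5 GiB.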

diff --git a/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb b/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb
index 7f65450..d983e7b 100644
--- a/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb
+++ b/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb
@@ -55,8 +55,13 @@ class FixCollectionPortableDataHashWithHintedManifest < ActiveRecord::Migration
   end
 
   def each_bad_collection
-    Collection.find_each do |coll|
-      next unless (coll.manifest_text =~ /\+[A-Z]/)
+    # It's important to make sure that this line doesn't swap.  The
+    # worst case scenario is that it finds a batch of collections that
+    # all have maximum size manifests (64MiB).  With a batch size of
+    # 50, that's about 3GiB.  Figure it will end up being 4GiB after
+    # other ActiveRecord overhead.  That's a size we're comfortable with.
+    Collection.where("manifest_text ~ '\\+[A-Z]'").
+        find_each(batch_size: 50) do |coll|
       stripped_manifest = coll.manifest_text.
         gsub(/( [0-9a-f]{32}(\+\d+)?)(\+\S+)/, '\1')
       stripped_pdh = sprintf("%s+%i",

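The hint-stripping `gsub` in the hunk above can be tried on a single signed locator. This is a sketch. The regular expression is copied from the diff. The `portable_data_hash` helper is an assumption: the diff cuts off at the `sprintf` call, and this sketch uses Arvados's documented definition instead, the MD5 of the stripped manifest followed by "+" and its length in bytes. Both helper names are hypothetical.

```ruby
require "digest/md5"

# The substitution used by the migration, copied from the diff.
HINT_RE = /( [0-9a-f]{32}(\+\d+)?)(\+\S+)/

# Keep " hash+size" and drop any trailing "+hint" run on each locator.
def strip_hints(manifest)
  manifest.gsub(HINT_RE, '\1')
end

# Assumed PDH definition: MD5 of the (already stripped) text plus "+bytesize".
def portable_data_hash(stripped)
  format("%s+%i", Digest::MD5.hexdigest(stripped), stripped.bytesize)
end

signed = ". acbd18db4cc2f85cedef654fccc4a4d8+3+K@zzzzz 0:3:foo.txt\n"
stripped = strip_hints(signed)
puts stripped  # . acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt
puts portable_data_hash(stripped)
```

The size group in the pattern is optional, so a bare `hash+size` locator can also match and lose its size. The sketch therefore strips only the original hinted text, once, before hashing.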
-----------------------------------------------------------------------