[ARVADOS] updated: 9877c8914fa0bb17fcb9d6f2e30e067f8b135d79

git at public.curoverse.com
Mon Mar 16 12:42:31 EDT 2015

Summary of changes:
 ...106_fix_collection_portable_data_hash_with_hinted_manifest.rb | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

       via  9877c8914fa0bb17fcb9d6f2e30e067f8b135d79 (commit)
       via  4f3b7339bef1286a34745c7ecd97476f56af469c (commit)
      from  d16e54da7e751807685f576d089d69417c9094b0 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

commit 9877c8914fa0bb17fcb9d6f2e30e067f8b135d79
Merge: d16e54d 4f3b733
Author: Brett Smith <brett at curoverse.com>
Date:   Mon Mar 16 12:41:36 2015 -0400

    Merge branch '5319-collection-pdh-fix-performance-wip'
    Refs #5319.

commit 4f3b7339bef1286a34745c7ecd97476f56af469c
Author: Brett Smith <brett at curoverse.com>
Date:   Mon Mar 16 10:09:57 2015 -0400

    5319: Improve performance of Collection PDH fix migration.
    * Use PostgreSQL's native regular expression search to limit the
      number of records we pull through ActiveRecord.
    * Use a smaller batch size to avoid pulling pathological batches of
      records that cause swapping.
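
The batch-size reasoning in the second bullet can be checked with a little arithmetic. The following is only a sketch, not part of the migration: `worst_case_batch_gib` is a hypothetical helper, and the 64 MiB maximum manifest size is taken from the comment in the diff. It compares the batch size of 50 chosen here with 1000, which is the default batch size of ActiveRecord's `find_each` and therefore what the previous code used.

```ruby
# Hypothetical helper (not in the migration): estimate the worst-case size
# of one find_each batch if every manifest in it is at the maximum size.
MIB = 1024**2
GIB = 1024**3
MAX_MANIFEST_BYTES = 64 * MIB # maximum manifest size cited in the diff comment

def worst_case_batch_gib(batch_size, max_bytes = MAX_MANIFEST_BYTES)
  (batch_size * max_bytes).to_f / GIB
end

puts worst_case_batch_gib(50)   # batch size chosen in this commit
puts worst_case_batch_gib(1000) # ActiveRecord's default find_each batch size
```

The first figure covers manifest text only; the diff comment rounds it up to about 4GiB to allow for other ActiveRecord overhead.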

diff --git a/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb b/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb
index 7f65450..d983e7b 100644
--- a/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb
+++ b/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb
@@ -55,8 +55,13 @@ class FixCollectionPortableDataHashWithHintedManifest < ActiveRecord::Migration
   def each_bad_collection
-    Collection.find_each do |coll|
-      next unless (coll.manifest_text =~ /\+[A-Z]/)
+    # It's important to make sure that this line doesn't swap.  The
+    # worst case scenario is that it finds a batch of collections that
+    # all have maximum size manifests (64MiB).  With a batch size of
+    # 50, that's about 3GiB.  Figure it will end up being 4GiB after
+    # other ActiveRecord overhead.  That's a size we're comfortable with.
+    Collection.where("manifest_text ~ '\\+[A-Z]'").
+        find_each(batch_size: 50) do |coll|
       stripped_manifest = coll.manifest_text.
         gsub(/( [0-9a-f]{32}(\+\d+)?)(\+\S+)/, '\1')
       stripped_pdh = sprintf("%s+%i",


