[ARVADOS] created: df05c25bca0a058aa7840c6ff1e25b0917a21a6f
git at public.curoverse.com
Mon Mar 16 10:10:02 EDT 2015
at df05c25bca0a058aa7840c6ff1e25b0917a21a6f (commit)
commit df05c25bca0a058aa7840c6ff1e25b0917a21a6f
Author: Brett Smith <brett at curoverse.com>
Date: Mon Mar 16 10:09:57 2015 -0400
5319: Improve performance of Collection manifest fixing migration.
* Use PostgreSQL's native regular expression search to limit the
  number of records we pull through ActiveRecord.
* Use a smaller batch size to avoid pulling pathological batches of
  records that cause swapping.
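
The batch-size reasoning can be checked with a little arithmetic. The sketch below is illustrative only: the helper name and constants are hypothetical, the 64MiB maximum manifest size is the figure cited in the diff comment, and it counts manifest bytes alone, not ActiveRecord overhead. The 1000 used for comparison is the default batch size of ActiveRecord's find_each.

```ruby
MIB = 1024**2
GIB = 1024**3

# Maximum manifest size, as cited in the comment added by this commit.
MAX_MANIFEST_BYTES = 64 * MIB

# Hypothetical helper: worst-case manifest bytes held in memory for one
# batch, in GiB, if every record in the batch has a maximum-size manifest.
def worst_case_batch_gib(batch_size, max_bytes = MAX_MANIFEST_BYTES)
  (batch_size * max_bytes).to_f / GIB
end

puts worst_case_batch_gib(50)    # batch size chosen in this commit
puts worst_case_batch_gib(1000)  # find_each default batch size
```

With a batch of 50 the worst case is 3.125 GiB of manifest text, which matches the "about 3GiB" in the comment; with the default of 1000 it would be 62.5 GiB.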
diff --git a/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb b/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb
index 7f65450..d983e7b 100644
--- a/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb
+++ b/services/api/db/migrate/20150303210106_fix_collection_portable_data_hash_with_hinted_manifest.rb
@@ -55,8 +55,13 @@ class FixCollectionPortableDataHashWithHintedManifest < ActiveRecord::Migration
end
def each_bad_collection
- Collection.find_each do |coll|
- next unless (coll.manifest_text =~ /\+[A-Z]/)
+ # It's important to make sure that this line doesn't swap. The
+ # worst case scenario is that it finds a batch of collections that
+ # all have maximum size manifests (64MiB). With a batch size of
+ # 50, that's about 3GiB. Figure it will end up being 4GiB after
+ # other ActiveRecord overhead. That's a size we're comfortable with.
+ Collection.where("manifest_text ~ '\\+[A-Z]'").
+ find_each(batch_size: 50) do |coll|
stripped_manifest = coll.manifest_text.
gsub(/( [0-9a-f]{32}(\+\d+)?)(\+\S+)/, '\1')
stripped_pdh = sprintf("%s+%i",
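
For readers unfamiliar with the two patterns in this hunk, here is a minimal standalone sketch, not the migration itself. The sample manifest line, its locators and hints, and the names `sql_filter`, `STRIP_HINTS`, and `strip_hints` are made up for illustration; only the two regular expressions are taken from the diff.

```ruby
# The Ruby string literal from the diff. The doubled backslash in the
# double-quoted literal becomes a single backslash in the SQL text.
sql_filter = "manifest_text ~ '\\+[A-Z]'"
puts sql_filter

# The stripping pattern from the diff: keep the block hash and optional
# size (capture group 1), drop the hint text that follows it.
STRIP_HINTS = /( [0-9a-f]{32}(\+\d+)?)(\+\S+)/

# Hypothetical helper wrapping the gsub call shown in the diff.
def strip_hints(manifest_text)
  manifest_text.gsub(STRIP_HINTS, '\1')
end

# Made-up manifest line with two hinted locators.
manifest = ". d41d8cd98f00b204e9800998ecf8427e+10+Aabc@def " \
           "d41d8cd98f00b204e9800998ecf8427e+20+Kzzzzz 0:30:file.txt\n"

puts (manifest =~ /\+[A-Z]/) ? "has hints" : "no hints"
puts strip_hints(manifest)
```

In this sample both locators carry a size, so each is reduced to its hash and size and the file token is left untouched.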
-----------------------------------------------------------------------
hooks/post-receive