[ARVADOS] updated: 1.3.0-991-gd56b21609
Git user
git at public.curoverse.com
Mon Jun 3 21:00:32 UTC 2019
Summary of changes:
doc/admin/upgrading.html.textile.liquid | 8 +-
services/api/app/models/user.rb | 2 +-
.../20190322174136_add_file_info_to_collection.rb | 52 ++----------
.../populate-file-info-columns-in-collections.rb | 97 ++++++++++++++++++++++
4 files changed, 110 insertions(+), 49 deletions(-)
mode change 100755 => 100644 services/api/db/migrate/20190322174136_add_file_info_to_collection.rb
create mode 100755 services/api/script/populate-file-info-columns-in-collections.rb
via d56b21609ddd2e2e096f5c30e991d24aa213f7f4 (commit)
via 1a18a29729f91dd23f89387f5277c91607376fd9 (commit)
via 65c515131a8243fd30c32a609b6b37a1fb4d8fc2 (commit)
via 7f68130d098483c5169add6f0454e62fa2d7befa (commit)
via 6f17bfcae56d0f6032e1cf4087ba7c2e7b092424 (commit)
from c792e4991e1d77620d61efaa2600a93d75227f06 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
commit d56b21609ddd2e2e096f5c30e991d24aa213f7f4
Merge: c792e4991 1a18a2972
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date: Mon Jun 3 16:51:19 2019 -0400
Merge branch '15286-fixes'
Fixes from the 1.4 branch.
refs #15286
Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>
commit 1a18a29729f91dd23f89387f5277c91607376fd9
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date: Mon Jun 3 15:14:37 2019 -0400
Do not blow up when Rails.configuration.Users.UserProfileNotificationAddress is
set to the empty string, which is the default since #13996 (it defaulted to a
dummy e-mail address before).
refs #15286
refs #13996
Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>
diff --git a/services/api/app/models/user.rb b/services/api/app/models/user.rb
index 989a97592..fc5ae0a49 100644
--- a/services/api/app/models/user.rb
+++ b/services/api/app/models/user.rb
@@ -580,7 +580,7 @@ class User < ArvadosModel
if self.prefs_changed?
if self.prefs_was.andand.empty? || !self.prefs_was.andand['profile']
profile_notification_address = Rails.configuration.Users.UserProfileNotificationAddress
- ProfileNotifier.profile_created(self, profile_notification_address).deliver_now if profile_notification_address
+ ProfileNotifier.profile_created(self, profile_notification_address).deliver_now if profile_notification_address and !profile_notification_address.empty?
end
end
end
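Note on the guard added in the diff above: in Ruby an empty string is truthy, so a bare `if profile_notification_address` still passes when the setting is "". The following is a minimal standalone sketch of that behaviour; the helper name `should_notify?` is hypothetical and is not part of the Arvados code base.

```ruby
# Hypothetical helper illustrating the guard; not part of Arvados.
def should_notify?(address)
  # nil is falsy, but "" is truthy in Ruby, so emptiness is checked explicitly.
  !address.nil? && !address.empty?
end

puts should_notify?(nil)                  # false
puts should_notify?("")                   # false
puts should_notify?("admin@example.com")  # true
```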
commit 65c515131a8243fd30c32a609b6b37a1fb4d8fc2
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date: Fri May 31 15:42:37 2019 -0400
Update the 'upgrading' documentation to reflect the v1.4.0 release, and warn
about the db migration that can take some time during upgrade.
refs #15093
Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>
diff --git a/doc/admin/upgrading.html.textile.liquid b/doc/admin/upgrading.html.textile.liquid
index def8bed79..053acb220 100644
--- a/doc/admin/upgrading.html.textile.liquid
+++ b/doc/admin/upgrading.html.textile.liquid
@@ -30,11 +30,13 @@ Note to developers: Add new items at the top. Include the date, issue number, co
TODO: extract this information based on git commit messages and generate changelogs / release notes automatically.
{% endcomment %}
-h3. current master branch
+h3. v1.4.0 (2019-05-31)
h4. Populating the new file_count and file_size_total columns on the collections table
-As part of story "#14484":https://dev.arvados.org/issues/14484, two new columns were added to the collections table in a database migration. These columns are initialized with a zero value. In order to populate them, it is necessary to run a script called <code class="userinput">populate-file-info-columns-in-collections.rb</code> from the scripts directory of the API server. This can be done out of band, ideally directly after the API server has been upgraded to v1.4.0.
+As part of story "#14484":https://dev.arvados.org/issues/14484, two new columns were added to the collections table in a database migration. If your installation has a large collections table, this migration may take some time. We've seen it take ~5 minutes on an installation with 250k collections, but your mileage may vary.
+
+The new columns are initialized with a zero value. In order to populate them, it is necessary to run a script called <code class="userinput">populate-file-info-columns-in-collections.rb</code> from the scripts directory of the API server. This can be done out of band, ideally directly after the API server has been upgraded to v1.4.0.
h4. Stricter collection manifest validation on the API server
commit 7f68130d098483c5169add6f0454e62fa2d7befa
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date: Fri May 31 15:27:38 2019 -0400
Address review comments.
refs #15093
Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>
diff --git a/services/api/script/populate-file-info-columns-in-collections.rb b/services/api/script/populate-file-info-columns-in-collections.rb
index b0bc5a21a..f7cb024b2 100755
--- a/services/api/script/populate-file-info-columns-in-collections.rb
+++ b/services/api/script/populate-file-info-columns-in-collections.rb
@@ -70,7 +70,7 @@ require "group_pdhs"
def main
distinct_pdh_count = ActiveRecord::Base.connection.exec_query(
- "SELECT DISTINCT portable_data_hash FROM collections"
+ "SELECT DISTINCT portable_data_hash FROM collections where file_count=0"
).rows.count
# Generator that queries for all the distinct pdhs greater than last_pdh
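Note on the generator mentioned in the comment above: it pages through distinct portable data hashes by asking for values greater than the last one seen. A rough in-memory sketch of that pattern follows, assuming made-up rows and hypothetical helper names (`pending_pdhs`, `next_page`, `each_pending_pdh`). It also shows why counting only rows with file_count=0 keeps the total in line with what the generator yields. How GroupPdhs actually uses the count is not shown in this email.

```ruby
# In-memory stand-in for the collections table (made-up data).
ROWS = [
  { pdh: "aaa+1", file_count: 0 },
  { pdh: "bbb+2", file_count: 3 },   # already populated, so it is skipped
  { pdh: "ccc+3", file_count: 0 },
  { pdh: "ccc+3", file_count: 0 },   # duplicate pdh, counted once
  { pdh: "ddd+4", file_count: 0 },
].freeze

# Distinct pdhs that still need populating, mirroring the file_count=0 filter.
def pending_pdhs
  ROWS.select { |r| r[:file_count] == 0 }.map { |r| r[:pdh] }.uniq.sort
end

# One page: pdhs strictly greater than last_pdh, in order, at most `limit`.
def next_page(last_pdh, limit)
  pending_pdhs.select { |pdh| pdh > last_pdh }.first(limit)
end

# Walk all pages, starting from "", which sorts before any pdh.
def each_pending_pdh(limit)
  last_pdh = ""
  loop do
    page = next_page(last_pdh, limit)
    break if page.empty?
    page.each { |pdh| yield pdh }
    last_pdh = page.last
  end
end

seen = []
each_pending_pdh(2) { |pdh| seen << pdh }
puts seen.length == pending_pdhs.length   # true
```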
commit 6f17bfcae56d0f6032e1cf4087ba7c2e7b092424
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date: Fri May 31 14:51:50 2019 -0400
Move the population of the new columns on the collections table to a standalone
script that should be run separate from the migration. Add a note to the
upgrade documentation along those lines. Make the script not blow up on
collections with invalid manifests, but rather just skip them.
refs #15093
refs #14484
Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>
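Note on the skip-on-error behaviour described in the commit message above: the idea is to rescue the parse error for each row, report it, and move on to the next row. The sketch below is a simplified illustration. `toy_file_count` is a hypothetical stand-in for the real manifest parser, and its validity rule is invented purely for the example.

```ruby
# Hypothetical stand-in for a manifest parser; raises ArgumentError on bad input.
# The "must start with '. '" rule is a toy, not the real validation.
def toy_file_count(manifest_text)
  raise ArgumentError, "invalid manifest" unless manifest_text.start_with?(". ")
  manifest_text.lines.length
end

# Process every row; record and skip rows that fail to parse instead of aborting.
def populate(rows)
  results = {}
  skipped = []
  rows.each do |pdh, manifest_text|
    begin
      results[pdh] = toy_file_count(manifest_text)
    rescue ArgumentError => detail
      skipped << [pdh, detail.message]
      next
    end
  end
  [results, skipped]
end

rows = [
  ["pdh1", ". abc 0:1:f\n"],
  ["pdh2", "garbage"],
  ["pdh3", ". x 0:1:a\n. y 0:1:b\n"],
]
results, skipped = populate(rows)
puts results.keys.inspect   # ["pdh1", "pdh3"]
puts skipped.length         # 1
```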
diff --git a/doc/admin/upgrading.html.textile.liquid b/doc/admin/upgrading.html.textile.liquid
index 09bef2a62..def8bed79 100644
--- a/doc/admin/upgrading.html.textile.liquid
+++ b/doc/admin/upgrading.html.textile.liquid
@@ -32,6 +32,10 @@ TODO: extract this information based on git commit messages and generate changel
h3. current master branch
+h4. Populating the new file_count and file_size_total columns on the collections table
+
+As part of story "#14484":https://dev.arvados.org/issues/14484, two new columns were added to the collections table in a database migration. These columns are initialized with a zero value. In order to populate them, it is necessary to run a script called <code class="userinput">populate-file-info-columns-in-collections.rb</code> from the scripts directory of the API server. This can be done out of band, ideally directly after the API server has been upgraded to v1.4.0.
+
h4. Stricter collection manifest validation on the API server
As a consequence of "#14482":https://dev.arvados.org/issues/14482, the Ruby SDK does a more rigorous collection manifest validation. Collections created after 2015-05 are unlikely to be invalid, however you may check for invalid manifests using the script below.
diff --git a/services/api/db/migrate/20190322174136_add_file_info_to_collection.rb b/services/api/db/migrate/20190322174136_add_file_info_to_collection.rb
old mode 100755
new mode 100644
index 61f9b2d88..c0cd40d28
--- a/services/api/db/migrate/20190322174136_add_file_info_to_collection.rb
+++ b/services/api/db/migrate/20190322174136_add_file_info_to_collection.rb
@@ -2,58 +2,16 @@
#
# SPDX-License-Identifier: AGPL-3.0
-require "arvados/keep"
-require "group_pdhs"
-
class AddFileInfoToCollection < ActiveRecord::Migration[4.2]
- def do_batch(pdhs)
- pdhs_str = ''
- pdhs.each do |pdh|
- pdhs_str << "'" << pdh << "'" << ","
- end
-
- collections = ActiveRecord::Base.connection.exec_query(
- "SELECT DISTINCT portable_data_hash, manifest_text FROM collections "\
- "WHERE portable_data_hash IN (#{pdhs_str[0..-2]}) "
- )
-
- collections.rows.each do |row|
- manifest = Keep::Manifest.new(row[1])
- ActiveRecord::Base.connection.exec_query("BEGIN")
- ActiveRecord::Base.connection.exec_query("UPDATE collections SET file_count=#{manifest.files_count}, "\
- "file_size_total=#{manifest.files_size} "\
- "WHERE portable_data_hash='#{row[0]}'")
- ActiveRecord::Base.connection.exec_query("COMMIT")
- end
- end
-
def up
add_column :collections, :file_count, :integer, default: 0, null: false
add_column :collections, :file_size_total, :integer, limit: 8, default: 0, null: false
- distinct_pdh_count = ActiveRecord::Base.connection.exec_query(
- "SELECT DISTINCT portable_data_hash FROM collections"
- ).rows.count
-
- # Generator that queries for all the distinct pdhs greater than last_pdh
- ordered_pdh_query = lambda { |last_pdh, &block|
- pdhs = ActiveRecord::Base.connection.exec_query(
- "SELECT DISTINCT portable_data_hash FROM collections "\
- "WHERE portable_data_hash > '#{last_pdh}' "\
- "ORDER BY portable_data_hash LIMIT 1000"
- )
- pdhs.rows.each do |row|
- block.call(row[0])
- end
- }
-
- batch_size_max = 1 << 28 # 256 MiB
- GroupPdhs.group_pdhs_for_multiple_transactions(ordered_pdh_query,
- distinct_pdh_count,
- batch_size_max,
- "AddFileInfoToCollection") do |pdhs|
- do_batch(pdhs)
- end
+ puts "Collections now have two new columns, file_count and file_size_total."
+ puts "They were initialized with a zero value. If you are upgrading an Arvados"
+ puts "installation, please run the populate-file-info-columns-in-collections.rb"
+ puts "script to populate the columns. If this is a new installation, that is not"
+ puts "necessary."
end
def down
diff --git a/services/api/script/populate-file-info-columns-in-collections.rb b/services/api/script/populate-file-info-columns-in-collections.rb
new file mode 100755
index 000000000..b0bc5a21a
--- /dev/null
+++ b/services/api/script/populate-file-info-columns-in-collections.rb
@@ -0,0 +1,97 @@
+#!/usr/bin/env ruby
+# Copyright (C) The Arvados Authors. All rights reserved.
+#
+# SPDX-License-Identifier: AGPL-3.0
+
+# Arvados version 1.4.0 introduces two new columns on the collections table named
+# file_count
+# file_size_total
+#
+# The database migration that adds these columns does not populate them with data,
+# it initializes them set to zero.
+#
+# This script will populate the columns, if file_count is zero. It will ignore
+# collections that have invalid manifests, but it will spit out details for those
+# collections.
+#
+# Run the script as
+#
+# cd scripts
+# RAILS_ENV=production bundle exec populate-file-info-columns-in-collections.rb
+#
+
+ENV["RAILS_ENV"] = ARGV[0] || ENV["RAILS_ENV"] || "development"
+require File.dirname(__FILE__) + '/../config/boot'
+require File.dirname(__FILE__) + '/../config/environment'
+
+require "arvados/keep"
+require "group_pdhs"
+
+ def do_batch(pdhs)
+ pdhs_str = ''
+ pdhs.each do |pdh|
+ pdhs_str << "'" << pdh << "'" << ","
+ end
+
+ collections = ActiveRecord::Base.connection.exec_query(
+ "SELECT DISTINCT portable_data_hash, manifest_text FROM collections "\
+ "WHERE portable_data_hash IN (#{pdhs_str[0..-2]}) "
+ )
+ collections.rows.each do |row|
+ begin
+ manifest = Keep::Manifest.new(row[1])
+ ActiveRecord::Base.connection.exec_query("BEGIN")
+ ActiveRecord::Base.connection.exec_query("UPDATE collections SET file_count=#{manifest.files_count}, "\
+ "file_size_total=#{manifest.files_size} "\
+ "WHERE portable_data_hash='#{row[0]}'")
+ ActiveRecord::Base.connection.exec_query("COMMIT")
+ rescue ArgumentError => detail
+ require 'pp'
+ puts
+ puts "*************** Row detail ***************"
+ puts
+ pp row
+ puts
+ puts "************ Collection detail ***********"
+ puts
+ pp Collection.find_by_portable_data_hash(row[0])
+ puts
+ puts "************** Error detail **************"
+ puts
+ pp detail
+ puts
+ puts "Skipping this collection, continuing!"
+ next
+ end
+ end
+ end
+
+
+def main
+
+ distinct_pdh_count = ActiveRecord::Base.connection.exec_query(
+ "SELECT DISTINCT portable_data_hash FROM collections"
+ ).rows.count
+
+ # Generator that queries for all the distinct pdhs greater than last_pdh
+ ordered_pdh_query = lambda { |last_pdh, &block|
+ pdhs = ActiveRecord::Base.connection.exec_query(
+ "SELECT DISTINCT portable_data_hash FROM collections "\
+ "WHERE file_count=0 and portable_data_hash > '#{last_pdh}' "\
+ "ORDER BY portable_data_hash LIMIT 1000"
+ )
+ pdhs.rows.each do |row|
+ block.call(row[0])
+ end
+ }
+
+ batch_size_max = 1 << 28 # 256 MiB
+ GroupPdhs.group_pdhs_for_multiple_transactions(ordered_pdh_query,
+ distinct_pdh_count,
+ batch_size_max,
+ "AddFileInfoToCollection") do |pdhs|
+ do_batch(pdhs)
+ end
+end
+
+main
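Note on the IN (...) list built in do_batch above: the script appends each quoted hash followed by a comma, then strips the trailing comma with [0..-2]. A possible equivalent using map/join is sketched below. The function names are hypothetical and the sample values are illustrative only. Neither form escapes quotes, so both rely on the values being portable data hashes read back from the same table.

```ruby
# Approach used in the script: append each quoted pdh plus a comma,
# then drop the trailing comma with [0..-2].
def in_list_concat(pdhs)
  str = +''
  pdhs.each { |pdh| str << "'" << pdh << "'" << "," }
  str[0..-2]
end

# Equivalent formulation with map/join; there is no trailing comma to strip.
def in_list_join(pdhs)
  pdhs.map { |pdh| "'#{pdh}'" }.join(",")
end

sample = ["d41d8cd98f00b204e9800998ecf8427e+0", "a+1"]   # illustrative values
puts in_list_concat(sample) == in_list_join(sample)      # true
```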
-----------------------------------------------------------------------
hooks/post-receive