[ARVADOS] updated: 1.3.0-991-gd56b21609

Git user git at public.curoverse.com
Mon Jun 3 21:00:32 UTC 2019


Summary of changes:
 doc/admin/upgrading.html.textile.liquid            |  8 +-
 services/api/app/models/user.rb                    |  2 +-
 .../20190322174136_add_file_info_to_collection.rb  | 52 ++----------
 .../populate-file-info-columns-in-collections.rb   | 97 ++++++++++++++++++++++
 4 files changed, 110 insertions(+), 49 deletions(-)
 mode change 100755 => 100644 services/api/db/migrate/20190322174136_add_file_info_to_collection.rb
 create mode 100755 services/api/script/populate-file-info-columns-in-collections.rb

       via  d56b21609ddd2e2e096f5c30e991d24aa213f7f4 (commit)
       via  1a18a29729f91dd23f89387f5277c91607376fd9 (commit)
       via  65c515131a8243fd30c32a609b6b37a1fb4d8fc2 (commit)
       via  7f68130d098483c5169add6f0454e62fa2d7befa (commit)
       via  6f17bfcae56d0f6032e1cf4087ba7c2e7b092424 (commit)
      from  c792e4991e1d77620d61efaa2600a93d75227f06 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit d56b21609ddd2e2e096f5c30e991d24aa213f7f4
Merge: c792e4991 1a18a2972
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date:   Mon Jun 3 16:51:19 2019 -0400

    Merge branch '15286-fixes'
    
    Fixes from the 1.4 branch.
    
    refs #15286
    
    Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>


commit 1a18a29729f91dd23f89387f5277c91607376fd9
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date:   Mon Jun 3 15:14:37 2019 -0400

    Do not blow up when Rails.configuration.Users.UserProfileNotificationAddress is
    set to the empty string, which is the default since #13996 (it defaulted to a
    dummy e-mail address before).
    
    refs #15286
    refs #13996
    
    Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>

diff --git a/services/api/app/models/user.rb b/services/api/app/models/user.rb
index 989a97592..fc5ae0a49 100644
--- a/services/api/app/models/user.rb
+++ b/services/api/app/models/user.rb
@@ -580,7 +580,7 @@ class User < ArvadosModel
     if self.prefs_changed?
       if self.prefs_was.andand.empty? || !self.prefs_was.andand['profile']
         profile_notification_address = Rails.configuration.Users.UserProfileNotificationAddress
-        ProfileNotifier.profile_created(self, profile_notification_address).deliver_now if profile_notification_address
+        ProfileNotifier.profile_created(self, profile_notification_address).deliver_now if profile_notification_address and !profile_notification_address.empty?
       end
     end
   end
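
The guard added in the hunk above treats both nil and the empty string as "no address configured". A minimal sketch of that check in plain Ruby follows; the helper name notify? is hypothetical and does not exist in the Arvados code base.

```ruby
# Hypothetical helper (not part of Arvados): true only when the address
# is neither nil nor the empty string.
def notify?(address)
  !address.nil? && !address.empty?
end

puts notify?(nil)                  # false
puts notify?("")                   # false
puts notify?("admin@example.com")  # true
```

In a Rails context, ActiveSupport's present? expresses a similar check, with the difference that it also treats whitespace-only strings as blank.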

commit 65c515131a8243fd30c32a609b6b37a1fb4d8fc2
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date:   Fri May 31 15:42:37 2019 -0400

    Update the 'upgrading' documentation to reflect the v1.4.0 release, and warn
    about the db migration that can take some time during upgrade.
    
    refs #15093
    
    Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>

diff --git a/doc/admin/upgrading.html.textile.liquid b/doc/admin/upgrading.html.textile.liquid
index def8bed79..053acb220 100644
--- a/doc/admin/upgrading.html.textile.liquid
+++ b/doc/admin/upgrading.html.textile.liquid
@@ -30,11 +30,13 @@ Note to developers: Add new items at the top. Include the date, issue number, co
 TODO: extract this information based on git commit messages and generate changelogs / release notes automatically.
 {% endcomment %}
 
-h3. current master branch
+h3. v1.4.0 (2019-05-31)
 
 h4. Populating the new file_count and file_size_total columns on the collections table
 
-As part of story "#14484":https://dev.arvados.org/issues/14484, two new columns were added to the collections table in a database migration. These columns are initialized with a zero value. In order to populate them, it is necessary to run a script called <code class="userinput">populate-file-info-columns-in-collections.rb</code> from the scripts directory of the API server. This can be done out of band, ideally directly after the API server has been upgraded to v1.4.0.
+As part of story "#14484":https://dev.arvados.org/issues/14484, two new columns were added to the collections table in a database migration. If your installation has a large collections table, this migration may take some time. We've seen it take ~5 minutes on an installation with 250k collections, but your mileage may vary.
+
+The new columns are initialized with a zero value. In order to populate them, it is necessary to run a script called <code class="userinput">populate-file-info-columns-in-collections.rb</code> from the scripts directory of the API server. This can be done out of band, ideally directly after the API server has been upgraded to v1.4.0.
 
 h4. Stricter collection manifest validation on the API server
 

commit 7f68130d098483c5169add6f0454e62fa2d7befa
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date:   Fri May 31 15:27:38 2019 -0400

    Address review comments.
    
    refs #15093
    
    Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>

diff --git a/services/api/script/populate-file-info-columns-in-collections.rb b/services/api/script/populate-file-info-columns-in-collections.rb
index b0bc5a21a..f7cb024b2 100755
--- a/services/api/script/populate-file-info-columns-in-collections.rb
+++ b/services/api/script/populate-file-info-columns-in-collections.rb
@@ -70,7 +70,7 @@ require "group_pdhs"
 def main
 
   distinct_pdh_count = ActiveRecord::Base.connection.exec_query(
-    "SELECT DISTINCT portable_data_hash FROM collections"
+    "SELECT DISTINCT portable_data_hash FROM collections where file_count=0"
   ).rows.count
 
   # Generator that queries for all the distinct pdhs greater than last_pdh

commit 6f17bfcae56d0f6032e1cf4087ba7c2e7b092424
Author: Ward Vandewege <wvandewege at veritasgenetics.com>
Date:   Fri May 31 14:51:50 2019 -0400

    Move the population of the new columns on the collections table to a standalone
    script that should be run separately from the migration. Add a note to the
    upgrade documentation along those lines. Make the script not blow up on
    collections with invalid manifests, but rather just skip them.
    
    refs #15093
    refs #14484
    
    Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <wvandewege at veritasgenetics.com>
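
The "skip invalid manifests" behaviour described in this commit message can be sketched in isolation. The example below is only an illustration under stated assumptions: count_files is a hypothetical stand-in for a manifest parser that raises ArgumentError on bad input (the exception the script rescues), and populate is a hypothetical driver, not the script's real function.

```ruby
# Hypothetical stand-in for a manifest parser: raises ArgumentError on
# input that does not look like a manifest stream line.
def count_files(manifest_text)
  raise ArgumentError, "invalid manifest" unless manifest_text.start_with?(". ")
  manifest_text.lines.length
end

# Process every [pdh, manifest_text] row; report and skip rows that fail
# to parse instead of aborting the whole run.
def populate(rows)
  results = {}
  skipped = []
  rows.each do |pdh, manifest_text|
    begin
      results[pdh] = count_files(manifest_text)
    rescue ArgumentError => e
      warn "Skipping #{pdh}: #{e.message}"
      skipped << pdh
    end
  end
  [results, skipped]
end
```

Rescuing only ArgumentError, as the script does, keeps unrelated failures (for example a lost database connection) visible rather than silently skipping them.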

diff --git a/doc/admin/upgrading.html.textile.liquid b/doc/admin/upgrading.html.textile.liquid
index 09bef2a62..def8bed79 100644
--- a/doc/admin/upgrading.html.textile.liquid
+++ b/doc/admin/upgrading.html.textile.liquid
@@ -32,6 +32,10 @@ TODO: extract this information based on git commit messages and generate changel
 
 h3. current master branch
 
+h4. Populating the new file_count and file_size_total columns on the collections table
+
+As part of story "#14484":https://dev.arvados.org/issues/14484, two new columns were added to the collections table in a database migration. These columns are initialized with a zero value. In order to populate them, it is necessary to run a script called <code class="userinput">populate-file-info-columns-in-collections.rb</code> from the scripts directory of the API server. This can be done out of band, ideally directly after the API server has been upgraded to v1.4.0.
+
 h4. Stricter collection manifest validation on the API server
 
 As a consequence of "#14482":https://dev.arvados.org/issues/14482, the Ruby SDK does a more rigorous collection manifest validation. Collections created after 2015-05 are unlikely to be invalid, however you may check for invalid manifests using the script below.
diff --git a/services/api/db/migrate/20190322174136_add_file_info_to_collection.rb b/services/api/db/migrate/20190322174136_add_file_info_to_collection.rb
old mode 100755
new mode 100644
index 61f9b2d88..c0cd40d28
--- a/services/api/db/migrate/20190322174136_add_file_info_to_collection.rb
+++ b/services/api/db/migrate/20190322174136_add_file_info_to_collection.rb
@@ -2,58 +2,16 @@
 #
 # SPDX-License-Identifier: AGPL-3.0
 
-require "arvados/keep"
-require "group_pdhs"
-
 class AddFileInfoToCollection < ActiveRecord::Migration[4.2]
-  def do_batch(pdhs)
-    pdhs_str = ''
-    pdhs.each do |pdh|
-      pdhs_str << "'" << pdh << "'" << ","
-    end
-
-    collections = ActiveRecord::Base.connection.exec_query(
-      "SELECT DISTINCT portable_data_hash, manifest_text FROM collections "\
-      "WHERE portable_data_hash IN (#{pdhs_str[0..-2]}) "
-    )
-
-    collections.rows.each do |row|
-      manifest = Keep::Manifest.new(row[1])
-      ActiveRecord::Base.connection.exec_query("BEGIN")
-      ActiveRecord::Base.connection.exec_query("UPDATE collections SET file_count=#{manifest.files_count}, "\
-                                               "file_size_total=#{manifest.files_size} "\
-                                               "WHERE portable_data_hash='#{row[0]}'")
-      ActiveRecord::Base.connection.exec_query("COMMIT")
-    end
-  end
-
   def up
     add_column :collections, :file_count, :integer, default: 0, null: false
     add_column :collections, :file_size_total, :integer, limit: 8, default: 0, null: false
 
-    distinct_pdh_count = ActiveRecord::Base.connection.exec_query(
-      "SELECT DISTINCT portable_data_hash FROM collections"
-    ).rows.count
-
-    # Generator that queries for all the distinct pdhs greater than last_pdh
-    ordered_pdh_query = lambda { |last_pdh, &block|
-      pdhs = ActiveRecord::Base.connection.exec_query(
-        "SELECT DISTINCT portable_data_hash FROM collections "\
-        "WHERE portable_data_hash > '#{last_pdh}' "\
-        "ORDER BY portable_data_hash LIMIT 1000"
-      )
-      pdhs.rows.each do |row|
-        block.call(row[0])
-      end
-    }
-
-    batch_size_max = 1 << 28 # 256 MiB
-    GroupPdhs.group_pdhs_for_multiple_transactions(ordered_pdh_query,
-                                                   distinct_pdh_count,
-                                                   batch_size_max,
-                                                   "AddFileInfoToCollection") do |pdhs|
-      do_batch(pdhs)
-    end
+    puts "Collections now have two new columns, file_count and file_size_total."
+    puts "They were initialized with a zero value. If you are upgrading an Arvados"
+    puts "installation, please run the populate-file-info-columns-in-collections.rb"
+    puts "script to populate the columns. If this is a new installation, that is not"
+    puts "necessary."
   end
 
   def down
diff --git a/services/api/script/populate-file-info-columns-in-collections.rb b/services/api/script/populate-file-info-columns-in-collections.rb
new file mode 100755
index 000000000..b0bc5a21a
--- /dev/null
+++ b/services/api/script/populate-file-info-columns-in-collections.rb
@@ -0,0 +1,97 @@
+#!/usr/bin/env ruby
+# Copyright (C) The Arvados Authors. All rights reserved.
+#
+# SPDX-License-Identifier: AGPL-3.0
+
+# Arvados version 1.4.0 introduces two new columns on the collections table named
+#   file_count
+#   file_size_total
+#
+# The database migration that adds these columns does not populate them with data;
+# it initializes them to zero.
+#
+# This script will populate the columns, if file_count is zero. It will ignore
+# collections that have invalid manifests, but it will spit out details for those
+# collections.
+#
+# Run the script as
+#
+# cd script
+# RAILS_ENV=production bundle exec populate-file-info-columns-in-collections.rb
+#
+
+ENV["RAILS_ENV"] = ARGV[0] || ENV["RAILS_ENV"] || "development"
+require File.dirname(__FILE__) + '/../config/boot'
+require File.dirname(__FILE__) + '/../config/environment'
+
+require "arvados/keep"
+require "group_pdhs"
+
+  def do_batch(pdhs)
+    pdhs_str = ''
+    pdhs.each do |pdh|
+      pdhs_str << "'" << pdh << "'" << ","
+    end
+
+    collections = ActiveRecord::Base.connection.exec_query(
+      "SELECT DISTINCT portable_data_hash, manifest_text FROM collections "\
+      "WHERE portable_data_hash IN (#{pdhs_str[0..-2]}) "
+    )
+    collections.rows.each do |row|
+      begin
+        manifest = Keep::Manifest.new(row[1])
+        ActiveRecord::Base.connection.exec_query("BEGIN")
+        ActiveRecord::Base.connection.exec_query("UPDATE collections SET file_count=#{manifest.files_count}, "\
+                                                 "file_size_total=#{manifest.files_size} "\
+                                                 "WHERE portable_data_hash='#{row[0]}'")
+        ActiveRecord::Base.connection.exec_query("COMMIT")
+      rescue ArgumentError => detail
+        require 'pp'
+        puts
+        puts "*************** Row detail ***************"
+        puts
+        pp row
+        puts
+        puts "************ Collection detail ***********"
+        puts
+        pp Collection.find_by_portable_data_hash(row[0])
+        puts
+        puts "************** Error detail **************"
+        puts
+        pp detail
+        puts
+        puts "Skipping this collection, continuing!"
+        next
+      end
+    end
+  end
+
+
+def main
+
+  distinct_pdh_count = ActiveRecord::Base.connection.exec_query(
+    "SELECT DISTINCT portable_data_hash FROM collections"
+  ).rows.count
+
+  # Generator that queries for all the distinct pdhs greater than last_pdh
+  ordered_pdh_query = lambda { |last_pdh, &block|
+    pdhs = ActiveRecord::Base.connection.exec_query(
+      "SELECT DISTINCT portable_data_hash FROM collections "\
+      "WHERE file_count=0 and portable_data_hash > '#{last_pdh}' "\
+      "ORDER BY portable_data_hash LIMIT 1000"
+    )
+    pdhs.rows.each do |row|
+      block.call(row[0])
+    end
+  }
+
+  batch_size_max = 1 << 28 # 256 MiB
+  GroupPdhs.group_pdhs_for_multiple_transactions(ordered_pdh_query,
+                                                 distinct_pdh_count,
+                                                 batch_size_max,
+                                                 "AddFileInfoToCollection") do |pdhs|
+    do_batch(pdhs)
+  end
+end
+
+main
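
The ordered_pdh_query lambda in the script pages through the table by key: each query asks for rows whose portable_data_hash is greater than the last one seen, ordered and limited to 1000. A minimal sketch of that keyset pagination pattern, using an in-memory sorted array instead of a database, might look like this. The names fetch_page and each_key_in_pages are hypothetical and are not part of Arvados or GroupPdhs.

```ruby
# Hypothetical stand-in for the SQL query
#   WHERE key > last_key ORDER BY key LIMIT limit
# operating on an already sorted array of keys.
def fetch_page(sorted_keys, last_key, limit)
  sorted_keys.select { |k| k > last_key }.first(limit)
end

# Yield every key, fetching one page at a time and resuming after the
# last key of the previous page.
def each_key_in_pages(sorted_keys, limit)
  last_key = ""
  loop do
    page = fetch_page(sorted_keys, last_key, limit)
    break if page.empty?
    page.each { |k| yield k }
    last_key = page.last
  end
end

seen = []
each_key_in_pages(%w[a1 b2 c3 d4 e5], 2) { |k| seen << k }
p seen  # ["a1", "b2", "c3", "d4", "e5"]
```

Compared with OFFSET-based paging, resuming from the last key keeps each query cheap on a large table, since the database can seek directly to the starting point when the key column is indexed.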

-----------------------------------------------------------------------


hooks/post-receive
-- 



