[ARVADOS] updated: 1af2d4f71f6a7ba4374f8490ef1b4f0b972e2dec

git at public.curoverse.com git at public.curoverse.com
Tue Dec 2 16:29:22 EST 2014


Summary of changes:
 services/api/lib/eventbus.rb | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

       via  1af2d4f71f6a7ba4374f8490ef1b4f0b972e2dec (commit)
       via  bce68441123f4081e63b3fbefc82b60ee071b457 (commit)
       via  d51224880c6524fd1b47543f3f312c3a000805c1 (commit)
      from  0ecbb08536d02433272e62cf1878edd7927f66e1 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 1af2d4f71f6a7ba4374f8490ef1b4f0b972e2dec
Merge: 0ecbb08 bce6844
Author: Brett Smith <brett at curoverse.com>
Date:   Tue Dec 2 16:28:49 2014 -0500

    Merge branch '4591-websockets-raise-oom-wip'
    
    Closes #4591, #4701.


commit bce68441123f4081e63b3fbefc82b60ee071b457
Author: Brett Smith <brett at curoverse.com>
Date:   Tue Dec 2 10:23:21 2014 -0500

    4591: Websockets server fetches fewer logs at a time.
    
    Most of the out of memory errors we're seeing happen in the PostgreSQL
    driver, which runs out of space to store results.  Because Log records
    are relatively large (holding two other records as JSON text),
    fetching fewer in a batch should noticeably improve memory use.  I
    don't expect this to end the crashing, though—it seems like the
    Websockets server grows large for a variety of reasons.  Hopefully
    this change will help make some of the others clearer.

diff --git a/services/api/lib/eventbus.rb b/services/api/lib/eventbus.rb
index 3582cb4..35671d6 100644
--- a/services/api/lib/eventbus.rb
+++ b/services/api/lib/eventbus.rb
@@ -116,7 +116,7 @@ class EventBus
 
         # Execute query and actually send the matching log rows
         count = 0
-        limit = 100
+        limit = 20
 
         logs.limit(limit).each do |l|
           ws.send(l.as_api_response.to_json)

commit d51224880c6524fd1b47543f3f312c3a000805c1
Author: Brett Smith <brett at curoverse.com>
Date:   Tue Dec 2 09:59:55 2014 -0500

    4591: Avoid capturing critical exceptions in Websockets server.
    
    Based on the current logs, the troubles we're currently hitting in
    Websockets happen in push_events, where all the database work
    happens.  These exceptions wrap PostgreSQL driver errors; they inherit
    from StandardError, so they're being caught by the rescue block.
    This commit re-raises those exceptions, which will cause the server to
    crash (and presumably be restarted by a supervisor like runit).
    
    We do sometimes see NoMemoryError, but the block to catch is in
    ineffective because it usually manifests earlier in on_connect, when
    the connection is first made.  In this case, Ruby's default exception
    handling provides the behavior we want, so just remove the block.
    
    In keeping with the theme of improved exception handling, I tightened
    up the bad request detection.

diff --git a/services/api/lib/eventbus.rb b/services/api/lib/eventbus.rb
index 1754fc0..3582cb4 100644
--- a/services/api/lib/eventbus.rb
+++ b/services/api/lib/eventbus.rb
@@ -1,3 +1,7 @@
+# If any threads raise an unhandled exception, make them all die.
+# We trust a supervisor like runit to restart the server in this case.
+Thread.abort_on_exception = true
+
 require 'eventmachine'
 require 'oj'
 require 'faye/websocket'
@@ -141,14 +145,24 @@ class EventBus
       Rails.logger.warn "Backtrace:\n\t#{e.backtrace.join("\n\t")}"
       ws.send ({status: 500, message: 'error'}.to_json)
       ws.close
+      # These exceptions typically indicate serious server trouble:
+      # out of memory issues, database connection problems, etc.  Go ahead and
+      # crash; we expect that a supervisor service like runit will restart us.
+      raise
     end
   end
 
   # Handle inbound subscribe or unsubscribe message.
   def handle_message ws, event
     begin
-      # Parse event data as JSON
-      p = (Oj.load event.data).symbolize_keys
+      begin
+        # Parse event data as JSON
+        p = (Oj.load event.data).symbolize_keys
+        filter = Filter.new(p)
+      rescue Oj::Error => e
+        ws.send ({status: 400, message: "malformed request"}.to_json)
+        return
+      end
 
       if p[:method] == 'subscribe'
         # Handle subscribe event
@@ -162,7 +176,7 @@ class EventBus
         if ws.filters.length < MAX_FILTERS
           # Add a filter.  This gets the :filters field which is the same
           # format as used for regular index queries.
-          ws.filters << Filter.new(p)
+          ws.filters << filter
           ws.send ({status: 200, message: 'subscribe ok', filter: p}.to_json)
 
           # Send any pending events
@@ -185,8 +199,6 @@ class EventBus
       else
         ws.send ({status: 400, message: "missing or unrecognized method"}.to_json)
       end
-    rescue Oj::Error => e
-      ws.send ({status: 400, message: "malformed request"}.to_json)
     rescue => e
       Rails.logger.warn "Error handling message: #{$!}"
       Rails.logger.warn "Backtrace:\n\t#{e.backtrace.join("\n\t")}"
@@ -252,9 +264,6 @@ class EventBus
                   @channel.push payload.to_i
                 end
               end
-            rescue NoMemoryError
-              EventMachine::stop_event_loop
-              abort "Out of memory"
             ensure
               # Don't want the connection to still be listening once we return
               # it to the pool - could result in weird behavior for the next

-----------------------------------------------------------------------


hooks/post-receive
-- 




More information about the arvados-commits mailing list