[ARVADOS] created: d7f8e442bb91243484c75e2e1278293f0a476068

git at public.curoverse.com
Tue Dec 2 10:59:36 EST 2014


        at  d7f8e442bb91243484c75e2e1278293f0a476068 (commit)


commit d7f8e442bb91243484c75e2e1278293f0a476068
Author: Brett Smith <brett at curoverse.com>
Date:   Tue Dec 2 10:23:21 2014 -0500

    4591: Websockets server fetches fewer logs at a time.
    
    Most of the out-of-memory errors we're seeing happen in the PostgreSQL
    driver, which runs out of space to store results.  Because Log records
    are relatively large (holding two other records as JSON text),
    fetching fewer in a batch should noticeably improve memory use.  I
    don't expect this to end the crashing, though; it seems like the
    Websockets server grows large for a variety of reasons.  Hopefully
    this change will help make some of the others clearer.

diff --git a/services/api/lib/eventbus.rb b/services/api/lib/eventbus.rb
index 96bc866..d1809e5 100644
--- a/services/api/lib/eventbus.rb
+++ b/services/api/lib/eventbus.rb
@@ -112,7 +112,7 @@ class EventBus
 
         # Execute query and actually send the matching log rows
         count = 0
-        limit = 100
+        limit = 20
 
         logs.limit(limit).each do |l|
           ws.send(l.as_api_response.to_json)
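
The effect of the smaller limit in the change above can be sketched in
plain Ruby.  This is only an illustration under stated assumptions, not
the code in eventbus.rb: LOG_ROWS, fetch_batch, and push_in_batches are
hypothetical names, and an in-memory array stands in for the logs table.
The point is that at most `limit` rows are held at once, at the cost of
more round trips.

```ruby
require 'json'

# Hypothetical stand-in for the logs table: an in-memory array of rows.
LOG_ROWS = (1..45).map { |id| { id: id, event_type: "update" } }

# Hypothetical helper: return at most `limit` rows with an id above last_id,
# mimicking a query of the form "WHERE id > ? ORDER BY id LIMIT ?".
def fetch_batch(last_id, limit)
  LOG_ROWS.select { |row| row[:id] > last_id }.first(limit)
end

# Push every row newer than last_id into sink, holding at most `limit`
# rows at a time.  Returns the number of batches fetched.
def push_in_batches(sink, last_id, limit)
  batches = 0
  loop do
    rows = fetch_batch(last_id, limit)
    break if rows.empty?
    rows.each { |row| sink << row.to_json }
    last_id = rows.last[:id]
    batches += 1
    # A short batch means there is nothing left to fetch.
    break if rows.length < limit
  end
  batches
end

sent = []
batches = push_in_batches(sent, 0, 20)
puts "sent #{sent.length} rows in #{batches} batches"
```

With 45 rows, a limit of 20 takes three fetches where a limit of 100
takes one, so the smaller limit trades query count for a lower peak
result size.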

commit 3bbde3e33fea2fadd0b86abef35bdaa4400d9883
Author: Brett Smith <brett at curoverse.com>
Date:   Tue Dec 2 09:59:55 2014 -0500

    4591: Avoid capturing critical exceptions in Websockets server.
    
    Based on the current logs, the troubles we're hitting in
    Websockets happen in push_events, where all the database work
    happens.  These exceptions wrap PostgreSQL driver errors; they inherit
    from StandardError, so they're being caught by the rescue block.
    This commit re-raises those exceptions, which will cause the server to
    crash (and presumably be restarted by a supervisor like runit).
    
    We do sometimes see NoMemoryError, but the block to catch it is
    ineffective because the error usually manifests earlier in on_connect, when
    the connection is first made.  In this case, Ruby's default exception
    handling provides the behavior we want, so just remove the block.
    
    In keeping with the theme of improved exception handling, I tightened
    up the bad request detection.

diff --git a/services/api/lib/eventbus.rb b/services/api/lib/eventbus.rb
index 1754fc0..96bc866 100644
--- a/services/api/lib/eventbus.rb
+++ b/services/api/lib/eventbus.rb
@@ -141,14 +141,24 @@ class EventBus
       Rails.logger.warn "Backtrace:\n\t#{e.backtrace.join("\n\t")}"
       ws.send ({status: 500, message: 'error'}.to_json)
       ws.close
+      # These exceptions typically indicate serious server trouble:
+      # out of memory issues, database connection problems, etc.  Go ahead and
+      # crash; we expect that a supervisor service like runit will restart us.
+      raise
     end
   end
 
   # Handle inbound subscribe or unsubscribe message.
   def handle_message ws, event
     begin
-      # Parse event data as JSON
-      p = (Oj.load event.data).symbolize_keys
+      begin
+        # Parse event data as JSON
+        p = (Oj.load event.data).symbolize_keys
+        filter = Filter.new(p)
+      rescue Oj::Error => e
+        ws.send ({status: 400, message: "malformed request"}.to_json)
+        return
+      end
 
       if p[:method] == 'subscribe'
         # Handle subscribe event
@@ -162,7 +172,7 @@ class EventBus
         if ws.filters.length < MAX_FILTERS
           # Add a filter.  This gets the :filters field which is the same
           # format as used for regular index queries.
-          ws.filters << Filter.new(p)
+          ws.filters << filter
           ws.send ({status: 200, message: 'subscribe ok', filter: p}.to_json)
 
           # Send any pending events
@@ -185,8 +195,6 @@ class EventBus
       else
         ws.send ({status: 400, message: "missing or unrecognized method"}.to_json)
       end
-    rescue Oj::Error => e
-      ws.send ({status: 400, message: "malformed request"}.to_json)
     rescue => e
       Rails.logger.warn "Error handling message: #{$!}"
       Rails.logger.warn "Backtrace:\n\t#{e.backtrace.join("\n\t")}"
@@ -252,9 +260,6 @@ class EventBus
                   @channel.push payload.to_i
                 end
               end
-            rescue NoMemoryError
-              EventMachine::stop_event_loop
-              abort "Out of memory"
             ensure
               # Don't want the connection to still be listening once we return
               # it to the pool - could result in weird behavior for the next
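
The two exception-handling ideas in this commit can be sketched in
isolation.  This is a minimal illustration under stated assumptions, not
the code in eventbus.rb: FakeSocket, send_message, parse_request, and
the block-taking push_events here are hypothetical, and the standard
library's JSON parser is swapped in for Oj so the example has no gem
dependencies.

```ruby
require 'json'

# Hypothetical stand-in for a websocket connection; it records what is sent.
class FakeSocket
  attr_reader :sent

  def initialize
    @sent = []
    @closed = false
  end

  # Named send_message so that Object#send is not overridden.
  def send_message(text)
    @sent << text
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Narrow rescue: only a parse failure is reported as a client error (400).
# Returns the parsed hash, or nil if the request was malformed.
def parse_request(ws, data)
  JSON.parse(data, symbolize_names: true)
rescue JSON::ParserError
  ws.send_message({status: 400, message: "malformed request"}.to_json)
  nil
end

# Broad rescue: tell the client, close the socket, then re-raise so the
# process can crash and be restarted by a supervisor.
def push_events(ws)
  yield
rescue => e
  ws.send_message({status: 500, message: "error"}.to_json)
  ws.close
  raise
end
```

A bare `rescue` only catches StandardError and its subclasses.
NoMemoryError inherits directly from Exception, so it passes through a
bare `rescue` without any dedicated block.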
