[ARVADOS] updated: 1.1.4-698-g3ae140fa0

Git user git at public.curoverse.com
Tue Jul 24 15:03:14 EDT 2018


Summary of changes:
 doc/admin/health-checks.html.textile.liquid        | 10 +--
 doc/admin/management-token.html.textile.liquid     | 19 ++++--
 doc/admin/metrics.html.textile.liquid              | 74 +++++++++++++++++++++-
 services/nodemanager/tests/fake_azure.cfg.template |  6 +-
 4 files changed, 96 insertions(+), 13 deletions(-)

       via  3ae140fa072b2f2fbc8576c20ffd81fe463e78a5 (commit)
      from  0a3d7a02236cbec448203a1b2218b5e0630d1c00 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 3ae140fa072b2f2fbc8576c20ffd81fe463e78a5
Author: Peter Amstutz <pamstutz at veritasgenetics.com>
Date:   Tue Jul 24 15:02:52 2018 -0400

    13791: More detail about monitoring
    
    Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <pamstutz at veritasgenetics.com>

diff --git a/doc/admin/health-checks.html.textile.liquid b/doc/admin/health-checks.html.textile.liquid
index 9370c6ce6..630c6a178 100644
--- a/doc/admin/health-checks.html.textile.liquid
+++ b/doc/admin/health-checks.html.textile.liquid
@@ -10,11 +10,11 @@ Copyright (C) The Arvados Authors. All rights reserved.
 SPDX-License-Identifier: CC-BY-SA-3.0
 {% endcomment %}
 
-Health check endpoints are found at @/_health/ping@ on many Arvados services.  The purpose of the health check is to be a simple method of determining if a service can be contacted and if it believes it is functioning properly, suitable for integrating into operational alert systems.
+Health check endpoints are found at @/_health/ping@ on many Arvados services.  The purpose of the health check is to offer a simple method of determining if a service can be reached and allow the service to self-report any problems, suitable for integrating into operational alert systems.
 
-Health check endpoints must be configured with a "management token":management-token.html .
+To access health check endpoints, services must be configured with a "management token":management-token.html .
 
-This endpoint returns a JSON object with the field @health@.  This has a value of either @OK@ or @ERROR@.  On error, it may also include a  field @error@ with additional information.  Examples:
+Health check endpoints return a JSON object with the field @health@.  This has a value of either @OK@ or @ERROR@.  On error, it may also include a  field @error@ with additional information.  Examples:
 
 <pre>
 {
@@ -25,7 +25,7 @@ This endpoint returns a JSON object with the field @health@.  This has a value o
 <pre>
 {
   "health": "ERROR"
-  "error": "Inverted polarity of the warp core"
+  "error": "Inverted polarity in the warp core"
 }
 </pre>
 
@@ -33,7 +33,7 @@ h2. Healthcheck aggregator
 
 The service @arvados-health@ performs health checks on all configured services and returns a single value of @OK@ or @ERROR@ for the entire cluster.  It exposes the endpoint @/_health/all@ .
 
-The healthcheck aggregator uses the "NodeProfile" section of the cluster-wide configuration file.  Here is an example.
+The healthcheck aggregator uses the @NodeProfile@ section of the cluster-wide @arvados.yml@ configuration file.  Here is an example.
 
 <pre>
 Cluster:
diff --git a/doc/admin/management-token.html.textile.liquid b/doc/admin/management-token.html.textile.liquid
index 33027ad88..306314337 100644
--- a/doc/admin/management-token.html.textile.liquid
+++ b/doc/admin/management-token.html.textile.liquid
@@ -18,12 +18,13 @@ To access a monitoring endpoint, the requester must provide the HTTP header @Aut
 
 h2. API server
 
-Set @MangementToken@ in @application.yml@
+Set @MangementToken@ in the appropriate section of @application.yml@
 
 <pre>
+production:
   # Token to be included in all healthcheck requests. Disabled by default.
   # Server expects request header of the format "Authorization: Bearer xxx"
-  ManagementToken: ...
+  ManagementToken: xxx
 </pre>
 
 h2. Node Manager
@@ -32,13 +33,21 @@ Set @port@ (the listen port) and @MangementToken@ in the @Manage@ section of @no
 
 <pre>
 [Manage]
-port=8888
-ManagementToken=...
+# The management server responds to http://addr:port/status.json with
+# a snapshot of internal state.
+
+# Management server listening address (default 127.0.0.1)
+#address = 0.0.0.0
+
+# Management server port number (default -1, server is disabled)
+#port = 8989
+
+ManagementToken = xxx
 </pre>
 
 h2. Other services
 
-The following services also support health check.  Set @MangementToken@ in the respective yaml config file for each service.
+The following services also support monitoring.  Set @MangementToken@ in the respective yaml config file for each service.
 
 * keepstore
 * keep-web
diff --git a/doc/admin/metrics.html.textile.liquid b/doc/admin/metrics.html.textile.liquid
index 107431267..e41a96ffc 100644
--- a/doc/admin/metrics.html.textile.liquid
+++ b/doc/admin/metrics.html.textile.liquid
@@ -12,7 +12,7 @@ SPDX-License-Identifier: CC-BY-SA-3.0
 
 Metrics endpoints are found at @/status.json@ on many Arvados services.  The purpose of metrics are to provide statistics about the operation of a service, suitable for diagnosing how well a service is performing under load.
 
-Metrics endpoints must be configured with a "management token":management-token.html .
+To access metrics endpoints, services must be configured with a "management token":management-token.html .
 
 h2. Keepstore
 
@@ -73,6 +73,53 @@ table(table table-bordered table-condensed).
 |InProgress| int||
 |Queued|     int||
 
+h3. Example response
+
+<pre>
+{
+  "Volumes": [
+    {
+      "Label": "[UnixVolume /var/lib/arvados/keep0]",
+      "Status": {
+        "MountPoint": "/var/lib/arvados/keep0",
+        "DeviceNum": 65029,
+        "BytesFree": 222532972544,
+        "BytesUsed": 435456679936
+      },
+      "InternalStats": {
+        "Errors": 0,
+        "InBytes": 1111,
+        "OutBytes": 0,
+        "OpenOps": 1,
+        "StatOps": 4,
+        "FlockOps": 0,
+        "UtimesOps": 0,
+        "CreateOps": 0,
+        "RenameOps": 0,
+        "UnlinkOps": 0,
+        "ReaddirOps": 0
+      }
+    }
+  ],
+  "BufferPool": {
+    "BytesAllocatedCumulative": 67108864,
+    "BuffersMax": 20,
+    "BuffersInUse": 0
+  },
+  "PullQueue": {
+    "InProgress": 0,
+    "Queued": 0
+  },
+  "TrashQueue": {
+    "InProgress": 0,
+    "Queued": 0
+  },
+  "RequestsCurrent": 1,
+  "RequestsMax": 40,
+  "Version": "dev"
+}
+</pre>
+
 h2. Node manager
 
 The node manager status end point provides a snapshot of internal status at the time of the most recent wishlist update.
@@ -89,3 +136,28 @@ table(table table-bordered table-condensed).
 |nodes_wish|int|Number of nodes in the current wishlist|
 |node_quota|int|Current node count ceiling due to cloud quota limits|
 |config_max_nodes|int|Configured max node count|
+
+h3. Example
+
+<pre>
+{
+  "actor_exceptions": 0,
+  "idle_times": {
+    "compute1": 0,
+    "compute3": 0,
+    "compute2": 0,
+    "compute4": 0
+  },
+  "create_node_errors": 0,
+  "destroy_node_errors": 0,
+  "nodes_idle": 0,
+  "config_max_nodes": 8,
+  "list_nodes_errors": 0,
+  "node_quota": 8,
+  "Version": "1.1.4.20180719160944",
+  "nodes_wish": 0,
+  "nodes_unpaired": 0,
+  "nodes_busy": 4,
+  "boot_failures": 0
+}
+</pre>
diff --git a/services/nodemanager/tests/fake_azure.cfg.template b/services/nodemanager/tests/fake_azure.cfg.template
index a11a6d807..e5deac85d 100644
--- a/services/nodemanager/tests/fake_azure.cfg.template
+++ b/services/nodemanager/tests/fake_azure.cfg.template
@@ -10,10 +10,12 @@
 # a snapshot of internal state.
 
 # Management server listening address (default 127.0.0.1)
-#address = 0.0.0.0
+address = 0.0.0.0
 
 # Management server port number (default -1, server is disabled)
-#port = 8989
+port = 8989
+
+MangementToken = xxx
 
 [Daemon]
 # The dispatcher can customize the start and stop procedure for

-----------------------------------------------------------------------
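
The health check behaviour documented in the diff above (a GET to
/_health/ping carrying "Authorization: Bearer <token>", answered with a JSON
object holding "health" and optionally "error") could be exercised roughly as
follows. This is only a sketch under assumptions: the function names, the
example host and port, and the timeout are hypothetical and do not come from
the Arvados code base.

```python
import json
import urllib.request


def build_health_request(base_url, management_token):
    """Build a GET request for <base_url>/_health/ping with the bearer token."""
    url = base_url.rstrip("/") + "/_health/ping"
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + management_token}
    )


def interpret_health(body):
    """Return (is_healthy, error_message) from a health check JSON body."""
    doc = json.loads(body)
    is_healthy = doc.get("health") == "OK"
    # "error" is optional, so this is None when the service did not send it.
    return is_healthy, doc.get("error")


def check_health(base_url, management_token, timeout=5):
    """Query the endpoint and interpret the reply (performs network I/O)."""
    req = build_health_request(base_url, management_token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return interpret_health(resp.read().decode("utf-8"))
```

Calling check_health("http://keep0.example:25107", "xxx") would return a
(bool, str-or-None) pair. urlopen raises urllib.error.HTTPError for non-2xx
replies and the sketch does not catch it, so an alerting script would
probably want to treat that exception as a failed check as well.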


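The Node Manager example response added to metrics.html.textile.liquid in this
commit can be condensed for an alerting script along these lines. Again a
sketch only: the helper name and the choice of which counters to add up are
assumptions made for illustration, while the field names are taken from the
example response in the diff.

```python
import json

# Counters from the example response that record failures of some kind.
# Grouping them like this is an illustrative choice, not an Arvados convention.
ERROR_COUNTERS = (
    "actor_exceptions",
    "boot_failures",
    "create_node_errors",
    "destroy_node_errors",
    "list_nodes_errors",
)


def summarize_node_manager_status(body):
    """Reduce a Node Manager status.json body to a few headline numbers."""
    status = json.loads(body)
    return {
        # Missing counters are treated as zero.
        "total_errors": sum(status.get(name, 0) for name in ERROR_COUNTERS),
        "nodes_busy": status.get("nodes_busy", 0),
        "nodes_idle": status.get("nodes_idle", 0),
        "config_max_nodes": status.get("config_max_nodes"),
    }
```

Fetching the body would use the same bearer-token header as the health check,
pointed at the /status.json path named in the documentation.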