[ARVADOS] updated: 1.1.4-698-g3ae140fa0
Git user
git at public.curoverse.com
Tue Jul 24 15:03:14 EDT 2018
Summary of changes:
doc/admin/health-checks.html.textile.liquid | 10 +--
doc/admin/management-token.html.textile.liquid | 19 ++++--
doc/admin/metrics.html.textile.liquid | 74 +++++++++++++++++++++-
services/nodemanager/tests/fake_azure.cfg.template | 6 +-
4 files changed, 96 insertions(+), 13 deletions(-)
via 3ae140fa072b2f2fbc8576c20ffd81fe463e78a5 (commit)
from 0a3d7a02236cbec448203a1b2218b5e0630d1c00 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
commit 3ae140fa072b2f2fbc8576c20ffd81fe463e78a5
Author: Peter Amstutz <pamstutz at veritasgenetics.com>
Date: Tue Jul 24 15:02:52 2018 -0400
13791: More detail about monitoring
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <pamstutz at veritasgenetics.com>
diff --git a/doc/admin/health-checks.html.textile.liquid b/doc/admin/health-checks.html.textile.liquid
index 9370c6ce6..630c6a178 100644
--- a/doc/admin/health-checks.html.textile.liquid
+++ b/doc/admin/health-checks.html.textile.liquid
@@ -10,11 +10,11 @@ Copyright (C) The Arvados Authors. All rights reserved.
SPDX-License-Identifier: CC-BY-SA-3.0
{% endcomment %}
-Health check endpoints are found at @/_health/ping@ on many Arvados services. The purpose of the health check is to be a simple method of determining if a service can be contacted and if it believes it is functioning properly, suitable for integrating into operational alert systems.
+Health check endpoints are found at @/_health/ping@ on many Arvados services. The purpose of the health check is to offer a simple method of determining if a service can be reached and allow the service to self-report any problems, suitable for integrating into operational alert systems.
-Health check endpoints must be configured with a "management token":management-token.html .
+To access health check endpoints, services must be configured with a "management token":management-token.html .
-This endpoint returns a JSON object with the field @health@. This has a value of either @OK@ or @ERROR@. On error, it may also include a field @error@ with additional information. Examples:
+Health check endpoints return a JSON object with the field @health@. This has a value of either @OK@ or @ERROR@. On error, it may also include a field @error@ with additional information. Examples:
<pre>
{
@@ -25,7 +25,7 @@ This endpoint returns a JSON object with the field @health@. This has a value o
<pre>
{
"health": "ERROR"
- "error": "Inverted polarity of the warp core"
+ "error": "Inverted polarity in the warp core"
}
</pre>
@@ -33,7 +33,7 @@ h2. Healthcheck aggregator
The service @arvados-health@ performs health checks on all configured services and returns a single value of @OK@ or @ERROR@ for the entire cluster. It exposes the endpoint @/_health/all@ .
-The healthcheck aggregator uses the "NodeProfile" section of the cluster-wide configuration file. Here is an example.
+The healthcheck aggregator uses the @NodeProfile@ section of the cluster-wide @arvados.yml@ configuration file. Here is an example.
<pre>
Cluster:
diff --git a/doc/admin/management-token.html.textile.liquid b/doc/admin/management-token.html.textile.liquid
index 33027ad88..306314337 100644
--- a/doc/admin/management-token.html.textile.liquid
+++ b/doc/admin/management-token.html.textile.liquid
@@ -18,12 +18,13 @@ To access a monitoring endpoint, the requester must provide the HTTP header @Aut
h2. API server
-Set @MangementToken@ in @application.yml@
+Set @MangementToken@ in the appropriate section of @application.yml@
<pre>
+production:
# Token to be included in all healthcheck requests. Disabled by default.
# Server expects request header of the format "Authorization: Bearer xxx"
- ManagementToken: ...
+ ManagementToken: xxx
</pre>
h2. Node Manager
@@ -32,13 +33,21 @@ Set @port@ (the listen port) and @MangementToken@ in the @Manage@ section of @no
<pre>
[Manage]
-port=8888
-ManagementToken=...
+# The management server responds to http://addr:port/status.json with
+# a snapshot of internal state.
+
+# Management server listening address (default 127.0.0.1)
+#address = 0.0.0.0
+
+# Management server port number (default -1, server is disabled)
+#port = 8989
+
+ManagementToken = xxx
</pre>
h2. Other services
-The following services also support health check. Set @MangementToken@ in the respective yaml config file for each service.
+The following services also support monitoring. Set @MangementToken@ in the respective yaml config file for each service.
* keepstore
* keep-web
diff --git a/doc/admin/metrics.html.textile.liquid b/doc/admin/metrics.html.textile.liquid
index 107431267..e41a96ffc 100644
--- a/doc/admin/metrics.html.textile.liquid
+++ b/doc/admin/metrics.html.textile.liquid
@@ -12,7 +12,7 @@ SPDX-License-Identifier: CC-BY-SA-3.0
Metrics endpoints are found at @/status.json@ on many Arvados services. The purpose of metrics are to provide statistics about the operation of a service, suitable for diagnosing how well a service is performing under load.
-Metrics endpoints must be configured with a "management token":management-token.html .
+To access metrics endpoints, services must be configured with a "management token":management-token.html .
h2. Keepstore
@@ -73,6 +73,53 @@ table(table table-bordered table-condensed).
|InProgress| int||
|Queued| int||
+h3. Example response
+
+<pre>
+{
+ "Volumes": [
+ {
+ "Label": "[UnixVolume /var/lib/arvados/keep0]",
+ "Status": {
+ "MountPoint": "/var/lib/arvados/keep0",
+ "DeviceNum": 65029,
+ "BytesFree": 222532972544,
+ "BytesUsed": 435456679936
+ },
+ "InternalStats": {
+ "Errors": 0,
+ "InBytes": 1111,
+ "OutBytes": 0,
+ "OpenOps": 1,
+ "StatOps": 4,
+ "FlockOps": 0,
+ "UtimesOps": 0,
+ "CreateOps": 0,
+ "RenameOps": 0,
+ "UnlinkOps": 0,
+ "ReaddirOps": 0
+ }
+ }
+ ],
+ "BufferPool": {
+ "BytesAllocatedCumulative": 67108864,
+ "BuffersMax": 20,
+ "BuffersInUse": 0
+ },
+ "PullQueue": {
+ "InProgress": 0,
+ "Queued": 0
+ },
+ "TrashQueue": {
+ "InProgress": 0,
+ "Queued": 0
+ },
+ "RequestsCurrent": 1,
+ "RequestsMax": 40,
+ "Version": "dev"
+}
+</pre>
+
h2. Node manager
The node manager status end point provides a snapshot of internal status at the time of the most recent wishlist update.
@@ -89,3 +136,28 @@ table(table table-bordered table-condensed).
|nodes_wish|int|Number of nodes in the current wishlist|
|node_quota|int|Current node count ceiling due to cloud quota limits|
|config_max_nodes|int|Configured max node count|
+
+h3. Example
+
+<pre>
+{
+ "actor_exceptions": 0,
+ "idle_times": {
+ "compute1": 0,
+ "compute3": 0,
+ "compute2": 0,
+ "compute4": 0
+ },
+ "create_node_errors": 0,
+ "destroy_node_errors": 0,
+ "nodes_idle": 0,
+ "config_max_nodes": 8,
+ "list_nodes_errors": 0,
+ "node_quota": 8,
+ "Version": "1.1.4.20180719160944",
+ "nodes_wish": 0,
+ "nodes_unpaired": 0,
+ "nodes_busy": 4,
+ "boot_failures": 0
+}
+</pre>
diff --git a/services/nodemanager/tests/fake_azure.cfg.template b/services/nodemanager/tests/fake_azure.cfg.template
index a11a6d807..e5deac85d 100644
--- a/services/nodemanager/tests/fake_azure.cfg.template
+++ b/services/nodemanager/tests/fake_azure.cfg.template
@@ -10,10 +10,12 @@
# a snapshot of internal state.
# Management server listening address (default 127.0.0.1)
-#address = 0.0.0.0
+address = 0.0.0.0
# Management server port number (default -1, server is disabled)
-#port = 8989
+port = 8989
+
+MangementToken = xxx
[Daemon]
# The dispatcher can customize the start and stop procedure for
-----------------------------------------------------------------------
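The documentation changed in this commit describes a health check at @/_health/ping@ that expects the header "Authorization: Bearer xxx" and returns a JSON object with a @health@ field and, on error, an optional @error@ field. A minimal client sketch under those assumptions follows. The function names, the example host and port, and the timeout are hypothetical and are not part of Arvados. How a service signals an unhealthy state at the HTTP level is not stated in the docs, so the sketch only interprets the JSON body.

```python
import json
import urllib.request


def build_ping_request(base_url, management_token):
    """Build a GET request for the /_health/ping endpoint of one service.

    base_url is a hypothetical service address such as
    "http://keep0.example:25107"; management_token is the configured
    ManagementToken value.
    """
    url = base_url.rstrip("/") + "/_health/ping"
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + management_token}
    )


def interpret_health(body):
    """Return (ok, error) from a health check JSON body.

    ok is True only when the "health" field equals "OK"; error is the
    optional "error" field, or None when it is absent.
    """
    data = json.loads(body)
    return data.get("health") == "OK", data.get("error")


def check_service(base_url, management_token, timeout=5):
    """Query one service and interpret its response.

    urlopen raises urllib.error.HTTPError for non-2xx responses and
    urllib.error.URLError when the service cannot be reached; callers
    that feed an alerting system would treat both as failures.
    """
    request = build_ping_request(base_url, management_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return interpret_health(response.read().decode("utf-8"))
```

A monitoring script could call @check_service@ once per configured service and raise an alert whenever the first element of the result is false or an exception is raised.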
hooks/post-receive
--