[ARVADOS] updated: 2.1.0-56-g0a1c47884

Git user git at public.arvados.org
Wed Jan 20 22:13:31 UTC 2021


Summary of changes:
 doc/_config.yml                                    |  1 +
 .../keep-components-overview.html.textile.liquid   | 61 ++++++++++++++++++++++
 2 files changed, 62 insertions(+)
 create mode 100644 doc/architecture/keep-components-overview.html.textile.liquid

       via  0a1c47884b83135c13bbcf565a2325f0b0583903 (commit)
       via  1f8e31ec1c9b04d85fb18e4504ff3f61d06b84b6 (commit)
      from  06996071999666e6cc6cfb29d87e242ee981d1ef (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 0a1c47884b83135c13bbcf565a2325f0b0583903
Author: Ward Vandewege <ward at curii.com>
Date:   Wed Jan 20 16:39:54 2021 -0500

    17222: implement review feedback.
    
    Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <ward at curii.com>

diff --git a/doc/architecture/keep-components-overview.html.textile.liquid b/doc/architecture/keep-components-overview.html.textile.liquid
index 25fb0c6ae..b07716aac 100644
--- a/doc/architecture/keep-components-overview.html.textile.liquid
+++ b/doc/architecture/keep-components-overview.html.textile.liquid
@@ -11,7 +11,7 @@ SPDX-License-Identifier: CC-BY-SA-3.0
 
 Keep has a number of components. This page describes each component and the role it plays.
 
-h3. Keep client
+h3. Keep clients for data access
 
 In order to access data in Keep, a client is needed to store data in and retrieve data from Keep. Different types of Keep clients exist:
 * a command line client like "@arv-get@":/user/tutorials/tutorial-keep-get.html#download-using-arv or "@arv-put@":/user/tutorials/tutorial-keep.html#upload-using-command
@@ -20,7 +20,7 @@ In order to access data in Keep, a client is needed to store data in and retriev
 * an S3-compatible endpoint provided by @keep-web@
 * programmatic access via the "Arvados SDKs":/sdk/index.html
 
-In essense, these clients all do the same thing: they translate file and directory references into requests for Keep blocks and collection manifests.
+In essense, these clients all do the same thing: they translate file and directory references into requests for Keep blocks and collection manifests. How Keep clients work, and how they use rendezvous hashing, is described in greater detail in "the next section":/architecture/keep-clients.html.
 
 For example, when a request comes in to read a file from Keep, the client will
 * request the collection object (including its manifest) from the API server
@@ -32,7 +32,7 @@ All of those steps are subject to access control, which applies at the level of
 
 h3. API server
 
-The API server stores collection objects. It also stores the ACLs that control access to the collections.
+The API server stores collection objects and all associated metadata. That includes data about where the blocks for a collection are to be stored, e.g. when "storage classes":/admin/storage-classes.html are configured, as well as the desired and confirmed replication count for each block. It also stores the ACLs that control access to the collections. Finally, the API server provides Keep clients with time-based block signatures for access.
 
 h3. Keepstore
 
@@ -42,7 +42,9 @@ So what happens if the content of a file changes? When a client changes a file,
 
 A keepstore can store its blocks in object storage (S3 or an S3-compatible system, or Azure Blob Storage). It can also store blocks on a POSIX file system. A keepstore can be configured with multiple storage volumes. Each keepstore volume is configured with a replication number; e.g. a POSIX file system backed by a single disk would have a replication factor of 1, while an Azure 'LRS'  storage volume could be configured with a replication factor of 3 (that is how many copies LRS stores under the hood, according to the Azure documentation).
 
-By default, Arvados uses a replication factor of 2. See the @DefaultReplication@ configuration parameter in "the configuration reference":https://doc.arvados.org/admin/config.html. Additionally, each collection can be configured with its own replication factor.
+By default, Arvados uses a replication factor of 2. See the @DefaultReplication@ configuration parameter in "the configuration reference":https://doc.arvados.org/admin/config.html. Additionally, each collection can be configured with its own replication factor. It's worth noting that it is the responsibility of the Keep clients to make sure that all blocks are stored subject to their desired replica count, which is derived from the collections the blocks belong to. @keepstore@ itself does not provide replication; all it does is store blocks on the volumes it knows about. The @keepproxy@ and @keep-balance@ processes (see below) make sure that blocks are replicated properly.
+
+The maximum block size for @keepstore@ is 64 MiB, and keep clients typically combine small files into larger blocks. In a typical Arvados installation, the majority of blocks stored in Keep will be 64 MiB, though some fraction will be smaller.
 
 h3. Keepproxy
 
@@ -50,10 +52,10 @@ The @keepproxy@ server is a gateway into your Keep storage. Unlike the Keepstore
 
 h3. Keep-web
 
-The @keep-web@ server provides read/write access to files stored in Keep using WebDAV and S3 protocols. This makes it easy to access files in Keep from a browser, or mount Keep as a network folder using WebDAV support in various operating systems. It serves public data to unauthenticated clients, and serves private data to clients that supply Arvados API tokens.
+The @keep-web@ server provides read/write access to files stored in Keep using the HTTP, WebDAV and S3 protocols. This makes it easy to access files in Keep from a browser, or mount Keep as a network folder using WebDAV support in various operating systems. It serves public data to unauthenticated clients, and serves private data to clients that supply Arvados API tokens.
 
 h3. Keep-balance
 
 Keep is a garbage-collected system. When a block is no longer referenced in any collection manifest in the system, it becomes eligible for garbage collection. When the desired replication factor for a block (derived from the default replication factor, in addition to the replication factor of any collection(s) the block belongs to) does not match reality, the number of copies stored in the available Keepstore servers needs to be adjusted.
 
-The @keep-balance@ service takes care of these things. It runs as a service, and wakes up periodically to do a scan of the system and send instructions to the Keepstore servers. That process is described in more detail at "Balancing Keep servers":https://doc.arvados.org/admin/keep-balance.html.
+The @keep-balance@ program takes care of these things. It runs as a service, and wakes up periodically to do a scan of the system and send instructions to the Keepstore servers. That process is described in more detail at "Balancing Keep servers":https://doc.arvados.org/admin/keep-balance.html.

commit 1f8e31ec1c9b04d85fb18e4504ff3f61d06b84b6
Author: Ward Vandewege <ward at curii.com>
Date:   Wed Jan 20 13:38:20 2021 -0500

    17222: add overview documentation for the components of Keep.
    
    Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <ward at curii.com>

diff --git a/doc/_config.yml b/doc/_config.yml
index d56a95c1e..52cc6888c 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -150,6 +150,7 @@ navbar:
       - architecture/index.html.textile.liquid
     - Storage in Keep:
       - architecture/storage.html.textile.liquid
+      - architecture/keep-components-overview.html.textile.liquid
       - architecture/keep-clients.html.textile.liquid
       - architecture/keep-data-lifecycle.html.textile.liquid
       - architecture/manifest-format.html.textile.liquid
diff --git a/doc/architecture/keep-components-overview.html.textile.liquid b/doc/architecture/keep-components-overview.html.textile.liquid
new file mode 100644
index 000000000..25fb0c6ae
--- /dev/null
+++ b/doc/architecture/keep-components-overview.html.textile.liquid
@@ -0,0 +1,59 @@
+---
+layout: default
+navsection: architecture
+title: Keep components overview
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+Keep has a number of components. This page describes each component and the role it plays.
+
+h3. Keep client
+
+In order to access data in Keep, a client is needed to store data in and retrieve data from Keep. Different types of Keep clients exist:
+* a command line client like "@arv-get@":/user/tutorials/tutorial-keep-get.html#download-using-arv or "@arv-put@":/user/tutorials/tutorial-keep.html#upload-using-command
+* a FUSE mount provided by "@arv-mount@":/user/tutorials/tutorial-keep-mount-gnu-linux.html
+* a WebDAV mount provided by @keep-web@
+* an S3-compatible endpoint provided by @keep-web@
+* programmatic access via the "Arvados SDKs":/sdk/index.html
+
+In essense, these clients all do the same thing: they translate file and directory references into requests for Keep blocks and collection manifests.
+
+For example, when a request comes in to read a file from Keep, the client will
+* request the collection object (including its manifest) from the API server
+* look up the file in the collection manifest, and retrieve the hashes of the block(s) that contain its content
+* ask the keepstore(s) for the block hashes
+* return the contents of the file to the requestor
+
+All of those steps are subject to access control, which applies at the level of the collection: in the example above, the API server and the keepstore daemons verify that the client has permission to read the collection, and will reject the request if it does not.
+
+h3. API server
+
+The API server stores collection objects. It also stores the ACLs that control access to the collections.
+
+h3. Keepstore
+
+The @keepstore@ daemon is Keep's workhorse, the storage server that stores and retrieves data from an underlying storage system. Keepstore exposes an HTTP REST API. Keepstore only handles requests for blocks. Because blocks are content-addressed, they can be written and deleted, but there is no _update_ operation: blocks are immutable.
+
+So what happens if the content of a file changes? When a client changes a file, it first writes any new blocks to the keepstore(s). Then, it updates the manifest for the collection the file belongs to with the references to the new blocks.
+
+A keepstore can store its blocks in object storage (S3 or an S3-compatible system, or Azure Blob Storage). It can also store blocks on a POSIX file system. A keepstore can be configured with multiple storage volumes. Each keepstore volume is configured with a replication number; e.g. a POSIX file system backed by a single disk would have a replication factor of 1, while an Azure 'LRS'  storage volume could be configured with a replication factor of 3 (that is how many copies LRS stores under the hood, according to the Azure documentation).
+
+By default, Arvados uses a replication factor of 2. See the @DefaultReplication@ configuration parameter in "the configuration reference":https://doc.arvados.org/admin/config.html. Additionally, each collection can be configured with its own replication factor.
+
+h3. Keepproxy
+
+The @keepproxy@ server is a gateway into your Keep storage. Unlike the Keepstore servers, which are only accessible on the local LAN, Keepproxy is suitable for clients located elsewhere on the internet. A client writing through Keepproxy only writes one copy of each block; the Keepproxy server will write additional copies of the data to the Keepstore servers, to fulfill the requested replication factor. Keepproxy also checks API token validity before processing requests.
+
+h3. Keep-web
+
+The @keep-web@ server provides read/write access to files stored in Keep using WebDAV and S3 protocols. This makes it easy to access files in Keep from a browser, or mount Keep as a network folder using WebDAV support in various operating systems. It serves public data to unauthenticated clients, and serves private data to clients that supply Arvados API tokens.
+
+h3. Keep-balance
+
+Keep is a garbage-collected system. When a block is no longer referenced in any collection manifest in the system, it becomes eligible for garbage collection. When the desired replication factor for a block (derived from the default replication factor, in addition to the replication factor of any collection(s) the block belongs to) does not match reality, the number of copies stored in the available Keepstore servers needs to be adjusted.
+
+The @keep-balance@ service takes care of these things. It runs as a service, and wakes up periodically to do a scan of the system and send instructions to the Keepstore servers. That process is described in more detail at "Balancing Keep servers":https://doc.arvados.org/admin/keep-balance.html.

-----------------------------------------------------------------------


hooks/post-receive
-- 




More information about the arvados-commits mailing list