[ARVADOS] updated: 2.1.0-7-g948c9603d

Git user git at public.arvados.org
Fri Oct 16 21:27:40 UTC 2020


Summary of changes:
 doc/_config.yml                           |   3 +
 doc/api/keep-s3.html.textile.liquid       |  74 ++++++++++++++
 doc/api/keep-web-urls.html.textile.liquid |  75 ++++++++++++++
 doc/api/keep-webdav.html.textile.liquid   | 103 ++++++++++++++++++++
 services/keep-web/doc.go                  | 156 +-----------------------------
 5 files changed, 256 insertions(+), 155 deletions(-)
 create mode 100644 doc/api/keep-s3.html.textile.liquid
 create mode 100644 doc/api/keep-web-urls.html.textile.liquid
 create mode 100644 doc/api/keep-webdav.html.textile.liquid

       via  948c9603d86597d5455f66ce56834f43d7bf908c (commit)
      from  f8e0c7ac7a834733e189a906c2f9299f9ed010f5 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.


commit 948c9603d86597d5455f66ce56834f43d7bf908c
Author: Peter Amstutz <peter.amstutz at curii.com>
Date:   Fri Oct 9 18:48:07 2020 -0400

    Add WebDAV and S3 API documentation
    
    Remove sections of keep-web/doc.go are now part of the API
    documentation
    
    refs #16558
    
    Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz at curii.com>

diff --git a/doc/_config.yml b/doc/_config.yml
index 97db92f18..ef9d1d771 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -124,6 +124,9 @@ navbar:
       - api/methods/virtual_machines.html.textile.liquid
       - api/methods/keep_disks.html.textile.liquid
     - Data management:
+      - api/keep-webdav.html.textile.liquid
+      - api/keep-s3.html.textile.liquid
+      - api/keep-web-urls.html.textile.liquid
       - api/methods/collections.html.textile.liquid
       - api/methods/repositories.html.textile.liquid
     - Container engine:
diff --git a/doc/api/keep-s3.html.textile.liquid b/doc/api/keep-s3.html.textile.liquid
new file mode 100644
index 000000000..2cae81761
--- /dev/null
+++ b/doc/api/keep-s3.html.textile.liquid
@@ -0,0 +1,74 @@
+---
+layout: default
+navsection: api
+navmenu: API Methods
+title: "S3 API"
+
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+The Simple Storage Service (S3) API is a de-facto standard for object storage originally developed by Amazon Web Services.  Arvados supports accessing files in Keep using the S3 API.
+
+S3 is supported by many "cloud native" applications, and client libraries exist in many languages for programmatic access.
+
+h3. Endpoints and Buckets
+
+To access Arvados S3 using an S3 client library, you must tell it to use the URL of the keep-web server (this is @Services.WebDAVDownload.ExternalURL@ in the public configuration) as the custom endpoint.  The keep-web server will decide to treat it as an S3 API request based on the presence of an AWS-format Authorization header.  Requests without an Authorization header, or differently formatted Authorization, will be treated as "WebDAV":keep-webdav.html .
+
+The "bucket name" is an Arvados collection uuid, portable data hash, or project uuid.
+
+The bucket name must be encoded as the first path segment of every request.  This is what the S3 documentation calls "Path-Style Requests".
+
+h3. Supported Operations
+
+h4. ListObjects
+
+Supports the following request query parameters:
+
+* delimiter
+* marker
+* max-keys
+* prefix
+
+h4. GetObject
+
+Supports the @Range@ header.
+
+h4. PutObject
+
+Can be used to create or replace a file in a collection.
+
+An empty PUT with a trailing slash and @Content-Type: application/x-directory@ will create a directory within a collection if Arvados configuration option @Collections.S3FolderObjects@ is true.
+
+Missing parent/intermediate directories within a collection are created automatically.
+
+Cannot be used to create a collection or project.
+
+h4. DeleteObject
+
+Can be used to remove files from a collection.
+
+If used on a directory marker, it will delete the directory only if the directory is empty.
+
+h4. HeadBucket
+
+Can be used to determine if a bucket exists and if client has read access to it.
+
+h4. HeadObject
+
+Can be used to determine if an object exists and if client has read access to it.
+
+h4. GetBucketVersioning
+
+Bucket versioning is presently not supported, so this will always respond that bucket versioning is not enabled.
+
+h3. Authorization mechanisms
+
+Keep-web accepts AWS Signature Version 4 (AWS4-HMAC-SHA256) as well as the older V2 AWS signature.
+
+* If your client uses V4 signatures exclusively: use the Arvados token's UUID part as AccessKey, and its secret part as SecretKey.  This is preferred.
+* If your client uses V2 signatures, or a combination of V2 and V4, or the Arvados token UUID is unknown: use the secret part of the Arvados token for both AccessKey and SecretKey.
diff --git a/doc/api/keep-web-urls.html.textile.liquid b/doc/api/keep-web-urls.html.textile.liquid
new file mode 100644
index 000000000..91e4f2085
--- /dev/null
+++ b/doc/api/keep-web-urls.html.textile.liquid
@@ -0,0 +1,75 @@
+---
+layout: default
+navsection: api
+navmenu: API Methods
+title: "Keep-web URL patterns"
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+Files served by @keep-web@ can be rendered directly in the browser, or @keep-web@ can instruct the browser to only download the file.
+
+When serving files that will render directly in the browser, it is important to properly configure the keep-web service to migitate cross-site-scripting (XSS) attacks.  A HTML page can be stored in a collection.  If an attacker causes a victim to visit that page through Workbench, the HTML will be rendered by the browser.  If all collections are served at the same domain, the browser will consider collections as coming from the same origin, which will grant access to the same browsing data (cookies and local storage).  This would enable malicious Javascript on that page to access Arvados on behalf of the victim.
+
+This can be mitigated by having separate domains for each collection, or limiting preview to circumstances where the collection is not accessed with the user's regular full-access token.  For cluster administrators that understand the risks, this protection can also be turned off.
+
+The following "same origin" URL patterns are supported for public collections and collections shared anonymously via secret links (i.e., collections which can be served by keep-web without making use of any implicit credentials like cookies). See "Same-origin URLs" below.
+
+<pre>
+http://collections.example.com/c=uuid_or_pdh/path/file.txt
+http://collections.example.com/c=uuid_or_pdh/t=TOKEN/path/file.txt
+</pre>
+
+The following "multiple origin" URL patterns are supported for all collections:
+
+<pre>
+http://uuid_or_pdh--collections.example.com/path/file.txt
+http://uuid_or_pdh--collections.example.com/t=TOKEN/path/file.txt
+</pre>
+
+In the "multiple origin" form, the string @--@ can be replaced with @.@ with identical results (assuming the downstream proxy is configured accordingly). These two are equivalent:
+
+<pre>
+http://uuid_or_pdh--collections.example.com/path/file.txt
+http://uuid_or_pdh.collections.example.com/path/file.txt
+</pre>
+
+The first form (with @--@ instead of @.@) avoids the cost and effort of deploying a wildcard TLS certificate for @*.collections.example.com@ at sites that already have a wildcard certificate for @*.example.com@ . The second form is likely to be easier to configure, and more efficient to run, on a downstream proxy.
+
+In all of the above forms, the @collections.example.com@ part can be anything at all: keep-web itself ignores everything after the first @.@ or @-- at . (Of course, in order for clients to connect at all, DNS and any relevant proxies must be configured accordingly.)
+
+In all of the above forms, the @uuid_or_pdh@ part can be either a collection UUID or a portable data hash with the @+@ character optionally replaced by @-@ . (When @uuid_or_pdh@ appears in the domain name, replacing @+@ with @-@ is mandatory, because @+@ is not a valid character in a domain name.)
+
+In all of the above forms, a top level directory called @_@ is skipped. In cases where the @path/file.txt@ part might start with @t=@ or @c=@ or @_/@, links should be constructed with a leading @_/@ to ensure the top level directory is not interpreted as a token or collection ID.
+
+Assuming there is a collection with UUID @zzzzz-4zz18-znfnqtbbv4spc3w@ and portable data hash @1f4b0bc7583c2a7f9102c395f4ffc5e3+45@, the following URLs are interchangeable:
+
+<pre>
+http://zzzzz-4zz18-znfnqtbbv4spc3w.collections.example.com/foo/bar.txt
+http://zzzzz-4zz18-znfnqtbbv4spc3w.collections.example.com/_/foo/bar.txt
+http://zzzzz-4zz18-znfnqtbbv4spc3w--collections.example.com/_/foo/bar.txt
+</pre>
+
+The following URLs are read-only, but will return the same content as above:
+
+<pre>
+http://1f4b0bc7583c2a7f9102c395f4ffc5e3-45--foo.example.com/foo/bar.txt
+http://1f4b0bc7583c2a7f9102c395f4ffc5e3-45--.invalid/foo/bar.txt
+http://collections.example.com/by_id/1f4b0bc7583c2a7f9102c395f4ffc5e3%2B45/foo/bar.txt
+http://collections.example.com/by_id/zzzzz-4zz18-znfnqtbbv4spc3w/foo/bar.txt
+</pre>
+
+If the collection is named "MyCollection" and located in a project called "MyProject" which is in the home project of a user with username is "bob", the following read-only URL is also available when authenticating as bob:
+
+pre. http://collections.example.com/users/bob/MyProject/MyCollection/foo/bar.txt
+
+An additional form is supported specifically to make it more convenient to maintain support for existing Workbench download links:
+
+pre. http://collections.example.com/collections/download/uuid_or_pdh/TOKEN/foo/bar.txt
+
+A regular Workbench "download" link is also accepted, but credentials passed via cookie, header, etc. are ignored. Only public data can be served this way:
+
+pre. http://collections.example.com/collections/uuid_or_pdh/foo/bar.txt
diff --git a/doc/api/keep-webdav.html.textile.liquid b/doc/api/keep-webdav.html.textile.liquid
new file mode 100644
index 000000000..f068a49c2
--- /dev/null
+++ b/doc/api/keep-webdav.html.textile.liquid
@@ -0,0 +1,103 @@
+---
+layout: default
+navsection: api
+navmenu: API Methods
+title: "WebDAV"
+...
+
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+"Web Distributed Authoring and Versioning (WebDAV)":https://tools.ietf.org/html/rfc4918 is an IETF standard set of extensions to HTTP to manipulate and retrieve hierarchical web resources, similar to directories in a file system.  Arvados supports accessing files in Keep using WebDAV.
+
+Most major operating systems include built-in support for mounting WebDAV resources as network file systems, see user guide sections for "Windows":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-windows.html , "macOS":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-os-x.html , "Linux (Gnome)":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-gnu-linux.html#gnome .  WebDAV is also supported by various standalone storage browser applications such as "Cyberduck":https://cyberduck.io/ and client libraries exist in many languages for programmatic access.
+
+Keep-web provides read/write HTTP (WebDAV) access to files stored in Keep. It serves public data to anonymous and unauthenticated clients, and serves private data to clients that supply Arvados API tokens.
+
+h3. Supported Operations
+
+Supports WebDAV HTTP methods @GET@, @PUT@, @DELETE@, @PROPFIND@, @COPY@, and @MOVE at .
+
+Does not support @LOCK@ or @UNLOCK at .  These methods will be accepted, but are no-ops.
+
+h3. Browsing
+
+Requests can be authenticated a variety of ways as described below in "Authentication mechanisms":#auth .  An unauthenticated request will return a 401 Unauthorized response with a @WWW-Authenticate@ header indicating "support for RFC 7617 Basic Authentication":https://tools.ietf.org/html/rfc7617 .
+
+Getting a listing from keep-web starting at the root path @/@ will return two folders, @by_id@ and @users at .
+
+The @by_id@ folder will return an empty listing.  However, a path which starts with /by_id/ followed by a collection uuid, portable data hash, or project uuid will return the listing of that object.
+
+The @users@ folder will return a listing of the users for whom the client has permission to read the "home" project of that user.  Browsing an individual user will return the collections and projects directly owned by that user.  Browsing those collections and projects return listings of the files, directories, collections, and subprojects they contain, and so forth.
+
+In addition to the @/by_id/@ path prefix, the collection or project can be specified using a path prefix of @/c=<uuid or pdh>/@ or (if the cluster is properly configured) as a virtual host.  This is described on "Keep-web URLs":keep-web-urls.html
+
+h3(#auth). Authentication mechanisms
+
+A token can be provided in an Authorization header as a @Bearer@ token:
+
+<pre>
+Authorization: Bearer o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
+</pre>
+
+A token can also be provided with "RFC 7617 Basic Authentication":https://tools.ietf.org/html/rfc7617 in this case, the payload is formatted as @username:token@ and encoded with base64.  The username must be non-empty, but is ignored.  In this example, the username is "user":
+
+<pre>
+Authorization: Basic dXNlcjpvMDdqNHB4N1JsSks0Q3VNWXA3QzBMRFQ0Q3pSMUoxcUJFNUF2bzdlQ2NVak9UaWt4Swo=
+</pre>
+
+A base64-encoded token can be provided in a cookie named "api_token":
+
+<pre>
+Cookie: api_token=bzA3ajRweDdSbEpLNEN1TVlwN0MwTERUNEN6UjFKMXFCRTVBdm83ZUNjVWpPVGlreEs=
+</pre>
+
+A token can be provided in an URL-encoded query string:
+
+<pre>
+GET /foo/bar.txt?api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
+</pre>
+
+A token can be provided in a URL-encoded path (as described in the previous section):
+
+<pre>
+GET /t=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK/_/foo/bar.txt
+</pre>
+
+A suitably encoded token can be provided in a POST body if the request has a content type of application/x-www-form-urlencoded or multipart/form-data:
+
+<pre>
+POST /foo/bar.txt
+Content-Type: application/x-www-form-urlencoded
+[...]
+api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
+</pre>
+
+If a token is provided in a query string or in a POST request, the response is an HTTP 303 redirect to an equivalent GET request, with the token stripped from the query string and added to a cookie instead.
+
+h3. Indexes
+
+Keep-web returns a generic HTML index listing when a directory is requested with the GET method. It does not serve a default file like "index.html". Directory listings are also returned for WebDAV PROPFIND requests.
+
+h3. Range requests
+
+Keep-web supports partial resource reads using the HTTP @Range@ header as specified in "RFC 7233":https://tools.ietf.org/html/rfc7233 .
+
+h3. Compatibility
+
+Client-provided authorization tokens are ignored if the client does not provide a @Host@ header.
+
+In order to use the query string or a POST form authorization mechanisms, the client must follow 303 redirects; the client must accept cookies with a 303 response and send those cookies when performing the redirect; and either the client or an intervening proxy must resolve a relative URL ("//host/path") if given in a response Location header.
+
+h3. Intranet mode
+
+Normally, Keep-web accepts requests for multiple collections using the same host name, provided the client's credentials are not being used. This provides insufficient XSS protection in an installation where the "anonymously accessible" data is not truly public, but merely protected by network topology.
+
+In such cases -- for example, a site which is not reachable from the internet, where some data is world-readable from Arvados's perspective but is intended to be available only to users within the local network -- the downstream proxy should configured to return 401 for all paths beginning with "/c=".
+
+h3. Same-origin URLs
+
+Without the same-origin protection outlined above, a web page stored in collection X could execute JavaScript code that uses the current viewer's credentials to download additional data from collection Y -- data which is accessible to the current viewer, but not to the author of collection X -- from the same origin (``https://collections.example.com/'') and upload it to some other site chosen by the author of collection X.
diff --git a/services/keep-web/doc.go b/services/keep-web/doc.go
index 8682eac2d..be81bb68c 100644
--- a/services/keep-web/doc.go
+++ b/services/keep-web/doc.go
@@ -91,161 +91,7 @@
 //
 // Download URLs
 //
-// The following "same origin" URL patterns are supported for public
-// collections and collections shared anonymously via secret links
-// (i.e., collections which can be served by keep-web without making
-// use of any implicit credentials like cookies). See "Same-origin
-// URLs" below.
-//
-//   http://collections.example.com/c=uuid_or_pdh/path/file.txt
-//   http://collections.example.com/c=uuid_or_pdh/t=TOKEN/path/file.txt
-//
-// The following "multiple origin" URL patterns are supported for all
-// collections:
-//
-//   http://uuid_or_pdh--collections.example.com/path/file.txt
-//   http://uuid_or_pdh--collections.example.com/t=TOKEN/path/file.txt
-//
-// In the "multiple origin" form, the string "--" can be replaced with
-// "." with identical results (assuming the downstream proxy is
-// configured accordingly). These two are equivalent:
-//
-//   http://uuid_or_pdh--collections.example.com/path/file.txt
-//   http://uuid_or_pdh.collections.example.com/path/file.txt
-//
-// The first form (with "--" instead of ".") avoids the cost and
-// effort of deploying a wildcard TLS certificate for
-// *.collections.example.com at sites that already have a wildcard
-// certificate for *.example.com. The second form is likely to be
-// easier to configure, and more efficient to run, on a downstream
-// proxy.
-//
-// In all of the above forms, the "collections.example.com" part can
-// be anything at all: keep-web itself ignores everything after the
-// first "." or "--". (Of course, in order for clients to connect at
-// all, DNS and any relevant proxies must be configured accordingly.)
-//
-// In all of the above forms, the "uuid_or_pdh" part can be either a
-// collection UUID or a portable data hash with the "+" character
-// optionally replaced by "-". (When "uuid_or_pdh" appears in the
-// domain name, replacing "+" with "-" is mandatory, because "+" is
-// not a valid character in a domain name.)
-//
-// In all of the above forms, a top level directory called "_" is
-// skipped. In cases where the "path/file.txt" part might start with
-// "t=" or "c=" or "_/", links should be constructed with a leading
-// "_/" to ensure the top level directory is not interpreted as a
-// token or collection ID.
-//
-// Assuming there is a collection with UUID
-// zzzzz-4zz18-znfnqtbbv4spc3w and portable data hash
-// 1f4b0bc7583c2a7f9102c395f4ffc5e3+45, the following URLs are
-// interchangeable:
-//
-//   http://zzzzz-4zz18-znfnqtbbv4spc3w.collections.example.com/foo/bar.txt
-//   http://zzzzz-4zz18-znfnqtbbv4spc3w.collections.example.com/_/foo/bar.txt
-//   http://zzzzz-4zz18-znfnqtbbv4spc3w--collections.example.com/_/foo/bar.txt
-//
-// The following URLs are read-only, but otherwise interchangeable
-// with the above:
-//
-//   http://1f4b0bc7583c2a7f9102c395f4ffc5e3-45--foo.example.com/foo/bar.txt
-//   http://1f4b0bc7583c2a7f9102c395f4ffc5e3-45--.invalid/foo/bar.txt
-//   http://collections.example.com/by_id/1f4b0bc7583c2a7f9102c395f4ffc5e3%2B45/foo/bar.txt
-//   http://collections.example.com/by_id/zzzzz-4zz18-znfnqtbbv4spc3w/foo/bar.txt
-//
-// If the collection is named "MyCollection" and located in a project
-// called "MyProject" which is in the home project of a user with
-// username is "bob", the following read-only URL is also available
-// when authenticating as bob:
-//
-//   http://collections.example.com/users/bob/MyProject/MyCollection/foo/bar.txt
-//
-// An additional form is supported specifically to make it more
-// convenient to maintain support for existing Workbench download
-// links:
-//
-//   http://collections.example.com/collections/download/uuid_or_pdh/TOKEN/foo/bar.txt
-//
-// A regular Workbench "download" link is also accepted, but
-// credentials passed via cookie, header, etc. are ignored. Only
-// public data can be served this way:
-//
-//   http://collections.example.com/collections/uuid_or_pdh/foo/bar.txt
-//
-// Collections can also be accessed (read-only) via "/by_id/X" where X
-// is a UUID or portable data hash.
-//
-// Authorization mechanisms
-//
-// A token can be provided in an Authorization header:
-//
-//   Authorization: OAuth2 o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
-//
-// A base64-encoded token can be provided in a cookie named "api_token":
-//
-//   Cookie: api_token=bzA3ajRweDdSbEpLNEN1TVlwN0MwTERUNEN6UjFKMXFCRTVBdm83ZUNjVWpPVGlreEs=
-//
-// A token can be provided in an URL-encoded query string:
-//
-//   GET /foo/bar.txt?api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
-//
-// A suitably encoded token can be provided in a POST body if the
-// request has a content type of application/x-www-form-urlencoded or
-// multipart/form-data:
-//
-//   POST /foo/bar.txt
-//   Content-Type: application/x-www-form-urlencoded
-//   [...]
-//   api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
-//
-// If a token is provided in a query string or in a POST request, the
-// response is an HTTP 303 redirect to an equivalent GET request, with
-// the token stripped from the query string and added to a cookie
-// instead.
-//
-// Indexes
-//
-// Keep-web returns a generic HTML index listing when a directory is
-// requested with the GET method. It does not serve a default file
-// like "index.html". Directory listings are also returned for WebDAV
-// PROPFIND requests.
-//
-// Compatibility
-//
-// Client-provided authorization tokens are ignored if the client does
-// not provide a Host header.
-//
-// In order to use the query string or a POST form authorization
-// mechanisms, the client must follow 303 redirects; the client must
-// accept cookies with a 303 response and send those cookies when
-// performing the redirect; and either the client or an intervening
-// proxy must resolve a relative URL ("//host/path") if given in a
-// response Location header.
-//
-// Intranet mode
-//
-// Normally, Keep-web accepts requests for multiple collections using
-// the same host name, provided the client's credentials are not being
-// used. This provides insufficient XSS protection in an installation
-// where the "anonymously accessible" data is not truly public, but
-// merely protected by network topology.
-//
-// In such cases -- for example, a site which is not reachable from
-// the internet, where some data is world-readable from Arvados's
-// perspective but is intended to be available only to users within
-// the local network -- the downstream proxy should configured to
-// return 401 for all paths beginning with "/c=".
-//
-// Same-origin URLs
-//
-// Without the same-origin protection outlined above, a web page
-// stored in collection X could execute JavaScript code that uses the
-// current viewer's credentials to download additional data from
-// collection Y -- data which is accessible to the current viewer, but
-// not to the author of collection X -- from the same origin
-// (``https://collections.example.com/'') and upload it to some other
-// site chosen by the author of collection X.
+// See http://doc.arvados.org/api/keep-web-urls.html
 //
 // Attachment-Only host
 //

-----------------------------------------------------------------------


hooks/post-receive
-- 




More information about the arvados-commits mailing list