[ARVADOS] created: 2.1.0-8-gc01459a74

Git user git at public.arvados.org
Wed Oct 14 22:23:54 UTC 2020


        at  c01459a74632120eef952eec68780d02aa26c94c (commit)


commit c01459a74632120eef952eec68780d02aa26c94c
Author: Peter Amstutz <peter.amstutz at curii.com>
Date:   Wed Oct 14 18:23:19 2020 -0400

    16558: Document S3 support.  Additional detail about WebDAV support.
    
    Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz at curii.com>

diff --git a/doc/api/keep-s3.html.textile.liquid b/doc/api/keep-s3.html.textile.liquid
index 2efadfb0b..d0346cb69 100644
--- a/doc/api/keep-s3.html.textile.liquid
+++ b/doc/api/keep-s3.html.textile.liquid
@@ -14,3 +14,65 @@ SPDX-License-Identifier: CC-BY-SA-3.0
 The Simple Storage Service (S3) is a de-facto standard for object storage originally developed by Amazon Web Services.  Arvados supports accessing files in Keep using the S3 API.
 
 S3 is supported by many "cloud native" applications, and client libraries exist in many languages for programmatic access.
+
+h3. Endpoints and Buckets
+
+To access Arvados S3 using an S3 client library, you must tell it to use the URL of the keep-web server (this is @Services.WebDAVDownload.ExternalURL@ in the public configuration) as the custom endpoint.
+
+The "bucket name" must be encoded as the first path segment of every request.  This is what Amazon calls "Path-Style Requests".
+
+The bucket name must be an Arvados collection uuid, portable data hash, or project uuid.
+
+h3. Supported Operations
+
+h4. ListObjects
+
+Supports the following request query parameters:
+
+* delimiter
+* marker
+* max-keys
+* prefix
+
+h4. GetObject
+
+Supports the @Range@ header.
+
+h4. PutObject
+
+Can be used to create or replace a file in a collection.
+
+An empty PUT with a trailing slash and @Content-Type: application/x-directory@ will create a directory within a collection if the Arvados configuration option @Collections.S3FolderObjects@ is true.
+
+Missing parent/intermediate directories within a collection are created automatically.
+
+Cannot be used to create a collection or project.
+
+h4. DeleteObject
+
+Can be used to remove files from a collection.
+
+If used on a directory marker, it will delete the directory only if the directory is empty.
+
+h4. HeadBucket
+
+Can be used to determine if a bucket exists and if the client has read access to it.
+
+h4. HeadObject
+
+Can be used to determine if an object exists and if the client has read access to it.
+
+h4. GetBucketVersioning
+
+This is a stub to avoid breaking clients that request information about versioning.  Versioning is not actually supported.
+
+h3. Authorization mechanisms
+
+Accepts the AWS Signature Version 4 format @Authorization: AWS4-HMAC-SHA256 Credential=...,SignedHeaders=...,Signature=...@
+
+Can be used in one of two ways:
+
+* The "Credential" is the Arvados token uuid, and the "Signature" is derived from request headers and the Arvados token secret
+* The "Credential" is the Arvados token secret, and "Signature" is derived from the same Arvados token secret
+
+Also accepts the older format @Authorization: AWS AccessKey:signature@ .  Provide the Arvados token secret as the "Access Key".  In this case, the signature is ignored.
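
To illustrate the path-style addressing and the older authorization format described in the keep-s3 text above, here is a minimal Python sketch. The endpoint host, the collection UUID used as the bucket, and the helper names are assumptions made for illustration; they are not part of Arvados or of any S3 client library. Only the standard library is used.

```python
from urllib.parse import quote, urlencode

# Hypothetical values, for illustration only.
ENDPOINT = "https://collections.example.com"  # stands in for Services.WebDAVDownload.ExternalURL
BUCKET = "zzzzz-4zz18-znfnqtbbv4spc3w"        # a collection UUID used as the bucket name


def path_style_url(endpoint, bucket, key="", params=None):
    """Build a path-style URL: the bucket is the first path segment."""
    url = endpoint.rstrip("/") + "/" + quote(bucket, safe="")
    if key:
        # Keep "/" so that directories inside the collection stay path segments.
        url += "/" + quote(key, safe="/")
    if params:
        url += "?" + urlencode(params)
    return url


def legacy_authorization(token_secret):
    """Older 'AWS AccessKey:signature' form; the text says the signature is ignored."""
    return "AWS " + token_secret + ":ignored"


# ListObjects with some of the supported query parameters.
list_url = path_style_url(
    ENDPOINT, BUCKET, params={"prefix": "foo/", "delimiter": "/", "max-keys": 100}
)
# GetObject for a file inside the collection.
get_url = path_style_url(ENDPOINT, BUCKET, "foo/bar.txt")

print(list_url)
print(get_url)
```

With a real S3 client library, the same effect would normally be obtained by setting its custom-endpoint and path-style options rather than by building URLs by hand.
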
diff --git a/doc/api/keep-webdav.html.textile.liquid b/doc/api/keep-webdav.html.textile.liquid
index 939bc23da..4b3f63d9b 100644
--- a/doc/api/keep-webdav.html.textile.liquid
+++ b/doc/api/keep-webdav.html.textile.liquid
@@ -17,8 +17,24 @@ Most major operating systems include built-in support for mounting WebDAV resour
 
 Keep-web provides read/write HTTP (WebDAV) access to files stored in Keep. It serves public data to anonymous and unauthenticated clients, and serves private data to clients that supply Arvados API tokens. It can be installed anywhere with access to Keep services, typically behind a web proxy that supports TLS.
 
+h3. Browsing
+
+Fetching the root path of an Arvados WebDAV service will return a 401 Unauthorized response with a @WWW-Authenticate@ header indicating "support for RFC 7617 Basic Authentication":https://tools.ietf.org/html/rfc7617 .  Requests may provide an Arvados token using @Authorization: Basic@ as described in "Authorization mechanisms":#auth .
+
+Getting a listing from keep-web starting at the root path @/@ will return two folders, @by_id@ and @users@ .
+
+The @by_id@ folder will return an empty listing.  However, a path which starts with /by_id/ followed by a collection uuid, portable data hash, or project uuid will return the listing of that object.
+
+The @users@ folder will return a listing of the users whose "home" project the client has permission to read.  Browsing an individual user will return the collections and projects directly owned by that user.  Browsing those collections and projects returns listings of the files, directories, collections, and subprojects they contain, and so forth.
+
 h3. URL structure
 
+Files served by @keep-web@ can be rendered directly in the browser, or @keep-web@ can instruct the browser to only download the file.
+
+When serving files that will render directly in the browser, it is important to properly configure the WebDAV service to mitigate cross-site-scripting (XSS) attacks.  An HTML page can be stored in a collection.  If an attacker causes a victim to visit that page through Workbench, the HTML will be rendered by the browser.  If all collections are served at the same domain, the browser will consider collections as coming from the same origin, which will grant access to the same browsing data (cookies and local storage).  This would enable malicious JavaScript on that page to access Arvados on behalf of the victim.
+
+This can be mitigated by having separate domains for each collection, or limiting preview to circumstances where the collection is not accessed with the user's regular full-access token.  For cluster administrators that understand the risks, this protection can also be turned off.
+
 The following "same origin" URL patterns are supported for public collections and collections shared anonymously via secret links (i.e., collections which can be served by keep-web without making use of any implicit credentials like cookies). See "Same-origin URLs" below.
 
 <pre>
@@ -79,12 +95,18 @@ pre. http://collections.example.com/collections/uuid_or_pdh/foo/bar.txt
 
 Collections can also be accessed (read-only) via "/by_id/X" where X is a UUID or portable data hash.
 
-h3. Authorization mechanisms
+h3(#auth). Authorization mechanisms
 
-A token can be provided in an Authorization header:
+A token can be provided in an Authorization header as a @Bearer@ token:
 
 <pre>
-Authorization: OAuth2 o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
+Authorization: Bearer o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
+</pre>
+
+A token can also be provided with "RFC 7617 Basic Authentication":https://tools.ietf.org/html/rfc7617 .  In this case, the payload is formatted as @username:token@ and encoded with base64.  The username must be non-empty, but is ignored.  In this example, the username is "user":
+
+<pre>
+Authorization: Basic dXNlcjpvMDdqNHB4N1JsSks0Q3VNWXA3QzBMRFQ0Q3pSMUoxcUJFNUF2bzdlQ2NVak9UaWt4Sw==
 </pre>
 
 A base64-encoded token can be provided in a cookie named "api_token":
@@ -99,6 +121,12 @@ A token can be provided in an URL-encoded query string:
 GET /foo/bar.txt?api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
 </pre>
 
+A token can be provided in a URL-encoded path (as described in the previous section):
+
+<pre>
+GET /t=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK/_/foo/bar.txt
+</pre>
+
 A suitably encoded token can be provided in a POST body if the request has a content type of application/x-www-form-urlencoded or multipart/form-data:
 
 <pre>
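
The header and cookie encodings described in the keep-webdav authorization text above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the token is the example value used in the documentation text. The payload is encoded exactly as the bytes of @username:token@, with no trailing newline.

```python
import base64

# Example token taken from the documentation text above.
TOKEN = "o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK"


def bearer_authorization(token):
    """Authorization header value carrying the token as a Bearer token."""
    return "Bearer " + token


def basic_authorization(token, username="user"):
    """RFC 7617 Basic value; the username must be non-empty but is ignored."""
    if not username:
        raise ValueError("username must be non-empty")
    payload = (username + ":" + token).encode("ascii")
    return "Basic " + base64.b64encode(payload).decode("ascii")


def api_token_cookie(token):
    """Cookie header value carrying the base64-encoded token."""
    return "api_token=" + base64.b64encode(token.encode("ascii")).decode("ascii")


print("Authorization:", bearer_authorization(TOKEN))
print("Authorization:", basic_authorization(TOKEN))
print("Cookie:", api_token_cookie(TOKEN))
```
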

commit a5815b3fa0065e395c2da68b28880350847e8eeb
Author: Peter Amstutz <peter.amstutz at curii.com>
Date:   Fri Oct 9 18:48:07 2020 -0400

    16558: Added WebDAV notes
    
    Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz at curii.com>

diff --git a/doc/_config.yml b/doc/_config.yml
index 97db92f18..7117f5f4a 100644
--- a/doc/_config.yml
+++ b/doc/_config.yml
@@ -124,6 +124,8 @@ navbar:
       - api/methods/virtual_machines.html.textile.liquid
       - api/methods/keep_disks.html.textile.liquid
     - Data management:
+      - api/keep-webdav.html.textile.liquid
+      - api/keep-s3.html.textile.liquid
       - api/methods/collections.html.textile.liquid
       - api/methods/repositories.html.textile.liquid
     - Container engine:
diff --git a/doc/api/keep-s3.html.textile.liquid b/doc/api/keep-s3.html.textile.liquid
new file mode 100644
index 000000000..2efadfb0b
--- /dev/null
+++ b/doc/api/keep-s3.html.textile.liquid
@@ -0,0 +1,16 @@
+---
+layout: default
+navsection: api
+navmenu: API Methods
+title: "S3 API"
+
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+The Simple Storage Service (S3) is a de-facto standard for object storage originally developed by Amazon Web Services.  Arvados supports accessing files in Keep using the S3 API.
+
+S3 is supported by many "cloud native" applications, and client libraries exist in many languages for programmatic access.
diff --git a/doc/api/keep-webdav.html.textile.liquid b/doc/api/keep-webdav.html.textile.liquid
new file mode 100644
index 000000000..939bc23da
--- /dev/null
+++ b/doc/api/keep-webdav.html.textile.liquid
@@ -0,0 +1,135 @@
+---
+layout: default
+navsection: api
+navmenu: API Methods
+title: "WebDAV"
+
+...
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+"Web Distributed Authoring and Versioning (WebDAV)":https://tools.ietf.org/html/rfc2518 is an IETF standard set of extensions to HTTP to manipulate and retrieve hierarchical web resources, similar to directories in a file system.  Arvados supports accessing files in Keep using WebDAV.
+
+Most major operating systems include built-in support for mounting WebDAV resources as network file systems; see the user guide sections for "Windows":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-windows.html , "macOS":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-os-x.html , and "Linux (Gnome)":{{site.baseurl}}/user/tutorials/tutorial-keep-mount-gnu-linux.html#gnome .  WebDAV is also supported by various standalone storage browser applications such as "Cyberduck":https://cyberduck.io/ , and client libraries exist in many languages for programmatic access.
+
+Keep-web provides read/write HTTP (WebDAV) access to files stored in Keep. It serves public data to anonymous and unauthenticated clients, and serves private data to clients that supply Arvados API tokens. It can be installed anywhere with access to Keep services, typically behind a web proxy that supports TLS.
+
+h3. URL structure
+
+The following "same origin" URL patterns are supported for public collections and collections shared anonymously via secret links (i.e., collections which can be served by keep-web without making use of any implicit credentials like cookies). See "Same-origin URLs" below.
+
+<pre>
+http://collections.example.com/c=uuid_or_pdh/path/file.txt
+http://collections.example.com/c=uuid_or_pdh/t=TOKEN/path/file.txt
+</pre>
+
+The following "multiple origin" URL patterns are supported for all collections:
+
+<pre>
+http://uuid_or_pdh--collections.example.com/path/file.txt
+http://uuid_or_pdh--collections.example.com/t=TOKEN/path/file.txt
+</pre>
+
+In the "multiple origin" form, the string "--" can be replaced with "." with identical results (assuming the downstream proxy is configured accordingly). These two are equivalent:
+
+<pre>
+http://uuid_or_pdh--collections.example.com/path/file.txt
+http://uuid_or_pdh.collections.example.com/path/file.txt
+</pre>
+
+The first form (with "--" instead of ".") avoids the cost and effort of deploying a wildcard TLS certificate for *.collections.example.com at sites that already have a wildcard certificate for *.example.com. The second form is likely to be easier to configure, and more efficient to run, on a downstream proxy.
+
+In all of the above forms, the "collections.example.com" part can be anything at all: keep-web itself ignores everything after the first "." or "--". (Of course, in order for clients to connect at all, DNS and any relevant proxies must be configured accordingly.)
+
+In all of the above forms, the "uuid_or_pdh" part can be either a collection UUID or a portable data hash with the "+" character optionally replaced by "-". (When "uuid_or_pdh" appears in the domain name, replacing "+" with "-" is mandatory, because "+" is not a valid character in a domain name.)
+
+In all of the above forms, a top level directory called "_" is skipped. In cases where the "path/file.txt" part might start with "t=" or "c=" or "_/", links should be constructed with a leading "_/" to ensure the top level directory is not interpreted as a token or collection ID.
+
+Assuming there is a collection with UUID zzzzz-4zz18-znfnqtbbv4spc3w and portable data hash 1f4b0bc7583c2a7f9102c395f4ffc5e3+45, the following URLs are interchangeable:
+
+<pre>
+http://zzzzz-4zz18-znfnqtbbv4spc3w.collections.example.com/foo/bar.txt
+http://zzzzz-4zz18-znfnqtbbv4spc3w.collections.example.com/_/foo/bar.txt
+http://zzzzz-4zz18-znfnqtbbv4spc3w--collections.example.com/_/foo/bar.txt
+</pre>
+
+The following URLs are read-only, but otherwise interchangeable with the above:
+
+<pre>
+http://1f4b0bc7583c2a7f9102c395f4ffc5e3-45--foo.example.com/foo/bar.txt
+http://1f4b0bc7583c2a7f9102c395f4ffc5e3-45--.invalid/foo/bar.txt
+http://collections.example.com/by_id/1f4b0bc7583c2a7f9102c395f4ffc5e3%2B45/foo/bar.txt
+http://collections.example.com/by_id/zzzzz-4zz18-znfnqtbbv4spc3w/foo/bar.txt
+</pre>
+
+If the collection is named "MyCollection" and located in a project called "MyProject" which is in the home project of a user whose username is "bob", the following read-only URL is also available when authenticating as bob:
+
+pre. http://collections.example.com/users/bob/MyProject/MyCollection/foo/bar.txt
+
+An additional form is supported specifically to make it more convenient to maintain support for existing Workbench download links:
+
+pre. http://collections.example.com/collections/download/uuid_or_pdh/TOKEN/foo/bar.txt
+
+A regular Workbench "download" link is also accepted, but credentials passed via cookie, header, etc. are ignored. Only public data can be served this way:
+
+pre. http://collections.example.com/collections/uuid_or_pdh/foo/bar.txt
+
+Collections can also be accessed (read-only) via "/by_id/X" where X is a UUID or portable data hash.
+
+h3. Authorization mechanisms
+
+A token can be provided in an Authorization header:
+
+<pre>
+Authorization: OAuth2 o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
+</pre>
+
+A base64-encoded token can be provided in a cookie named "api_token":
+
+<pre>
+Cookie: api_token=bzA3ajRweDdSbEpLNEN1TVlwN0MwTERUNEN6UjFKMXFCRTVBdm83ZUNjVWpPVGlreEs=
+</pre>
+
+A token can be provided in a URL-encoded query string:
+
+<pre>
+GET /foo/bar.txt?api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
+</pre>
+
+A suitably encoded token can be provided in a POST body if the request has a content type of application/x-www-form-urlencoded or multipart/form-data:
+
+<pre>
+POST /foo/bar.txt
+Content-Type: application/x-www-form-urlencoded
+[...]
+api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
+</pre>
+
+If a token is provided in a query string or in a POST request, the response is an HTTP 303 redirect to an equivalent GET request, with the token stripped from the query string and added to a cookie instead.
+
+h3. Indexes
+
+Keep-web returns a generic HTML index listing when a directory is requested with the GET method. It does not serve a default file like "index.html". Directory listings are also returned for WebDAV PROPFIND requests.
+
+h3. Range requests
+
+Keep-web supports partial resource reads using the HTTP @Range@ header as specified in "RFC 7233":https://tools.ietf.org/html/rfc7233 .
+
+h3. Compatibility
+
+Client-provided authorization tokens are ignored if the client does not provide a Host header.
+
+In order to use the query string or POST form authorization mechanisms, the client must follow 303 redirects; the client must accept cookies with a 303 response and send those cookies when performing the redirect; and either the client or an intervening proxy must resolve a relative URL ("//host/path") if given in a response Location header.
+
+h3. Intranet mode
+
+Normally, Keep-web accepts requests for multiple collections using the same host name, provided the client's credentials are not being used. This provides insufficient XSS protection in an installation where the "anonymously accessible" data is not truly public, but merely protected by network topology.
+
+In such cases -- for example, a site which is not reachable from the internet, where some data is world-readable from Arvados's perspective but is intended to be available only to users within the local network -- the downstream proxy should be configured to return 401 for all paths beginning with "/c=".
+
+h3. Same-origin URLs
+
+Without the same-origin protection outlined above, a web page stored in collection X could execute JavaScript code that uses the current viewer's credentials to download additional data from collection Y -- data which is accessible to the current viewer, but not to the author of collection X -- from the same origin (@https://collections.example.com/@) and upload it to some other site chosen by the author of collection X.
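
The "multiple origin" URL rules in the keep-webdav text above (replace "+" with "-" in the host name, and add a leading "_/" when the path could be mistaken for a token or collection ID) can be sketched as follows. The function name and its parameters are hypothetical, and the base domain is the placeholder used in the documentation.

```python
def multi_origin_url(uuid_or_pdh, path, base="collections.example.com", sep="--"):
    """Build a 'multiple origin' keep-web URL for a file in a collection.

    sep may be "--" or "."; the text describes the two as equivalent,
    provided the downstream proxy is configured accordingly.
    """
    if sep not in ("--", "."):
        raise ValueError('sep must be "--" or "."')
    # "+" is not valid in a domain name, so a portable data hash uses "-" instead.
    host_id = uuid_or_pdh.replace("+", "-")
    path = path.lstrip("/")
    # A leading "_/" keeps the first path segment from being read as a
    # token ("t="), a collection ID ("c="), or the skipped "_" directory.
    if path.startswith(("t=", "c=", "_/")):
        path = "_/" + path
    return "http://" + host_id + sep + base + "/" + path


print(multi_origin_url("zzzzz-4zz18-znfnqtbbv4spc3w", "foo/bar.txt", sep="."))
print(multi_origin_url("1f4b0bc7583c2a7f9102c395f4ffc5e3+45", "foo/bar.txt"))
print(multi_origin_url("zzzzz-4zz18-znfnqtbbv4spc3w", "t=notes/file.txt"))
```

As the text notes, URLs built from a portable data hash are read-only.
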

-----------------------------------------------------------------------

