From: Brian Warner Date: Thu, 6 Nov 2008 00:30:50 +0000 (-0700) Subject: docs: move webapi/ftp/sftp into a new frontends/ directory X-Git-Url: https://git.rkrishnan.org/%5B/%5D%20/uri/webapi.txt?a=commitdiff_plain;h=fc04afa5ddaa1128d42909f5c7ea1d36ff8b8acb;p=tahoe-lafs%2Ftahoe-lafs.git docs: move webapi/ftp/sftp into a new frontends/ directory --- diff --git a/docs/frontends/ftp.txt b/docs/frontends/ftp.txt new file mode 100644 index 00000000..50774c6e --- /dev/null +++ b/docs/frontends/ftp.txt @@ -0,0 +1,112 @@ += Tahoe FTP Frontend = + +All Tahoe client nodes can run a frontend FTP server, allowing regular FTP +clients to access the virtual filesystem. + +Since Tahoe does not use user accounts or passwords, the FTP server must be +configured with a way to translate USER+PASS into a root directory cap. Two +mechanisms are provided. The first is a simple flat file with one account per +line. The second is an HTTP-based login mechanism, backed by simple PHP +script and a database. The latter form is used by allmydata.com to provide +secure access to customer rootcaps. + +== Configuring an Account File == + +To configure the first form, create a file (probably in +BASEDIR/private/ftp.accounts) in which each non-comment/non-blank line is a +space-separated line of (USERNAME, PASSWORD, ROOTCAP), like so: + + % cat BASEDIR/private/ftp.accounts + # This is a password file, (username, password, rootcap) + alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a + bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja + +Then add the following lines to the BASEDIR/tahoe.cfg file: + + [ftpd] + enabled = true + ftp.port = 8021 + ftp.accounts.file = private/ftp.accounts + +The FTP server will listen on the given port number. The ftp.accounts.file +pathname will be interpreted relative to the node's BASEDIR. + +== Configuring an Account Server == + +Determine the URL of the account server, say https://example.com/login . Then +add the following lines to BASEDIR/tahoe.cfg: + + [ftpd] + enabled = true + ftp.port = 8021 + ftp.accounts.url = https://example.com/login + +== Dependencies == + +The FTP server requires code in Twisted that enables asynchronous closing of +file-upload operations. This code was not in the Twisted-8.1.0 release, and +has not been committed to SVN trunk as of r24943. So it may be necessary to +apply the following patch. The Tahoe node refuse to start the FTP server if +it detects that this patch has not been applied. + +Index: twisted/protocols/ftp.py +=================================================================== +--- twisted/protocols/ftp.py (revision 24956) ++++ twisted/protocols/ftp.py (working copy) +@@ -1049,7 +1049,6 @@ + cons = ASCIIConsumerWrapper(cons) + + d = self.dtpInstance.registerConsumer(cons) +- d.addCallbacks(cbSent, ebSent) + + # Tell them what to doooo + if self.dtpInstance.isConnected: +@@ -1062,6 +1061,8 @@ + def cbOpened(file): + d = file.receive() + d.addCallback(cbConsumer) ++ d.addCallback(lambda ignored: file.close()) ++ d.addCallbacks(cbSent, ebSent) + return d + + def ebOpened(err): +@@ -1434,7 +1435,14 @@ + @rtype: C{Deferred} of C{IConsumer} + """ + ++ def close(): ++ """ ++ Perform any post-write work that needs to be done. This method may ++ only be invoked once on each provider, and will always be invoked ++ after receive(). + ++ @rtype: C{Deferred} of anything: the value is ignored ++ """ + + def _getgroups(uid): + """Return the primary and supplementary groups for the given UID. +@@ -1795,6 +1803,8 @@ + # FileConsumer will close the file object + return defer.succeed(FileConsumer(self.fObj)) + ++ def close(self): ++ return defer.succeed(None) + + + class FTPRealm: +Index: twisted/vfs/adapters/ftp.py +=================================================================== +--- twisted/vfs/adapters/ftp.py (revision 24956) ++++ twisted/vfs/adapters/ftp.py (working copy) +@@ -295,6 +295,11 @@ + """ + return defer.succeed(IConsumer(self.node)) + ++ def close(self): ++ """ ++ Perform post-write actions. ++ """ ++ return defer.succeed(None) + + + class _FileToConsumerAdapter(object): diff --git a/docs/frontends/sftp.txt b/docs/frontends/sftp.txt new file mode 100644 index 00000000..cb085823 --- /dev/null +++ b/docs/frontends/sftp.txt @@ -0,0 +1,74 @@ += Tahoe SFTP Frontend = + +All Tahoe client nodes can run a frontend SFTP server, allowing regular SFTP +clients to access the virtual filesystem. + +Since Tahoe does not use user accounts or passwords, the FTP server must be +configured with a way to translate a username (and either a password or +public key) into a root directory cap. Two mechanisms are provided. The first +is a simple flat file with one account per line. The second is an HTTP-based +login mechanism, backed by simple PHP script and a database. The latter form +is used by allmydata.com to provide secure access to customer rootcaps. + +The SFTP server must also be given a public/private host keypair. + +== Configuring a Keypair == + +First, generate a keypair for your server: + +% cd BASEDIR +% ssh-keygen -f private/ssh_host_rsa_key + +You will then use the following lines in the tahoe.cfg file: + + [sftpd] + sftp.host_pubkey_file = private/ssh_host_rsa_key.pub + sftp.host_privkey_file = private/ssh_host_rsa_key + +== Configuring an Account File == + +To configure the first form, create a file (probably in +BASEDIR/private/sftp.accounts) in which each non-comment/non-blank line is a +space-separated line of (USERNAME, PASSWORD/PUBKEY, ROOTCAP), like so: + +[TODO: the PUBKEY form is not yet supported] + + % cat BASEDIR/private/sftp.accounts + # This is a password file, (username, password/pubkey, rootcap) + alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a + bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja + carol ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAv2xHRVBoXnwxHLzthRD1wOWtyZ08b8n9cMZfJ58CBdBwAYP2NVNXc0XjRvswm5hnnAO+jyWPVNpXJjm9XllzYhODSNtSN+TXuJlUjhzA/T+ZwdgsgSAeHuuMQBoWt4Qc9HV6rHCdAeMhcnyqm6Q0sRAsfA/wfwiIgbvE7+cWpFa2anB6WeAnvK8+dMN0nvnkPE7GNyf/WFR1Ffuh9ifKdRB6yDNp17bQAqA3OWSFjch6fGPhp94y4g2jmTHlEUTyVsilgGqvGOutOVYnmOMnFijugU1Vu33G39GGzXWla6+fXwTk/oiVPiCYD7A7WFKes3nqMg8iVN6a6sxujrhnHQ== warner@fluxx URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja + +Note that if the second word of the line is "ssh-rsa" or "ssh-dss", the rest +of the line is parsed differently, so users cannot have a password equal to +either of these strings. + +Then add the following lines to the BASEDIR/tahoe.cfg file: + + [sftpd] + enabled = true + sftp.port = 8022 + sftp.host_pubkey_file = private/ssh_host_rsa_key.pub + sftp.host_privkey_file = private/ssh_host_rsa_key + sftp.accounts.file = private/sftp.accounts + +The SFTP server will listen on the given port number. The sftp.accounts.file +pathname will be interpreted relative to the node's BASEDIR. + +== Configuring an Account Server == + +Determine the URL of the account server, say https://example.com/login . Then +add the following lines to BASEDIR/tahoe.cfg: + + [sftpd] + enabled = true + sftp.port = 8022 + sftp.host_pubkey_file = private/ssh_host_rsa_key.pub + sftp.host_privkey_file = private/ssh_host_rsa_key + sftp.accounts.url = https://example.com/login + +== Dependencies == + +The Tahoe SFTP server requires the Twisted "Conch" component, which itself +requires the pycrypto package (note that pycrypto is distinct from the +pycryptopp that Tahoe uses). diff --git a/docs/frontends/webapi.txt b/docs/frontends/webapi.txt new file mode 100644 index 00000000..59ac9ed1 --- /dev/null +++ b/docs/frontends/webapi.txt @@ -0,0 +1,1320 @@ + += The Tahoe REST-ful Web API = + +1. Enabling the web-API port +2. Basic Concepts: GET, PUT, DELETE, POST +3. URLs, Machine-Oriented Interfaces +4. Browser Operations: Human-Oriented Interfaces +5. Welcome / Debug / Status pages +6. Static Files in /public_html +7. Safety and security issues -- names vs. URIs +8. Concurrency Issues + + +== Enabling the web-API port == + +Every Tahoe node is capable of running a built-in HTTP server. To enable +this, just write a port number into a file named "webport" in the node's base +directory. For example, writing "8123" into $NODEDIR/webport will cause the +node to run a webserver on port 8123. + +This string is actually a Twisted "strports" specification, meaning you can +get more control over the interface to which the server binds by supplying +additional arguments. For more details, see the documentation on +twisted.application.strports: +http://twistedmatrix.com/documents/current/api/twisted.application.strports.html + +Writing "tcp:8123:interface=127.0.0.1" into $NODEDIR/webport does the same +but binds to the loopback interface, ensuring that only the programs on the +local host can connect. Using +"ssl:8123:privateKey=mykey.pem:certKey=cert.pem" runs an SSL server. + +This webport can be set when the node is created by passing a --webport +option to the 'tahoe create-client' command. By default, the node listens on +port 8123, on the loopback (127.0.0.1) interface. + +== Basic Concepts == + +As described in architecture.txt, each file and directory in a Tahoe virtual +filesystem is referenced by an identifier that combines the designation of +the object with the authority to do something with it (such as read or modify +the contents). This identifier is called a "read-cap" or "write-cap", +depending upon whether it enables read-only or read-write access. These +"caps" are also referred to as URIs. + +The Tahoe web-based API is "REST-ful", meaning it implements the concepts of +"REpresentational State Transfer": the original scheme by which the World +Wide Web was intended to work. Each object (file or directory) is referenced +by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and +DELETE) are used to manipulate these objects. You can think of the URL as a +noun, and the method as a verb. + +In REST, the GET method is used to retrieve information about an object, or +to retrieve some representation of the object itself. When the object is a +file, the basic GET method will simply return the contents of that file. +Other variations (generally implemented by adding query parameters to the +URL) will return information about the object, such as metadata. GET +operations are required to have no side-effects. + +PUT is used to upload new objects into the filesystem, or to replace an +existing object. DELETE it used to delete objects from the filesystem. Both +PUT and DELETE are required to be idempotent: performing the same operation +multiple times must have the same side-effects as only performing it once. + +POST is used for more complicated actions that cannot be expressed as a GET, +PUT, or DELETE. POST operations can be thought of as a method call: sending +some message to the object referenced by the URL. In Tahoe, POST is also used +for operations that must be triggered by an HTML form (including upload and +delete), because otherwise a regular web browser has no way to accomplish +these tasks. + +Tahoe's web API is designed for two different consumers. The first is a +program that needs to manipulate the virtual file system. Such programs are +expected to use the RESTful interface described above. The second is a human +using a standard web browser to work with the filesystem. This user is given +a series of HTML pages with links to download files, and forms that use POST +actions to upload, rename, and delete files. + +== URLs == + +Tahoe uses a variety of read- and write- caps to identify files and +directories. The most common of these is the "immutable file read-cap", which +is used for most uploaded files. These read-caps look like the following: + + URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202 + +The next most common is a "directory write-cap", which provides both read and +write access to a directory, and look like this: + + URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq + +There are also "directory read-caps", which start with "URI:DIR2-RO:", and +give read-only access to a directory. Finally there are also mutable file +read- and write- caps, which start with "URI:SSK", and give access to mutable +files. + +(later versions of Tahoe will make these strings shorter, and will remove the +unfortunate colons, which must be escaped when these caps are embedded in +URLs). + +To refer to any Tahoe object through the web API, you simply need to combine +a prefix (which indicates the HTTP server to use) with the cap (which +indicates which object inside that server to access). Since the default Tahoe +webport is 8123, the most common prefix is one that will use a local node +listening on this port: + + http://127.0.0.1:8123/uri/ + $CAP + +So, to access the directory named above (which happens to be the +publically-writable sample directory on the Tahoe test grid, described at +http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be: + + http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/ + +(note that the colons in the directory-cap are url-encoded into "%3A" +sequences). + +Likewise, to access the file named above, use: + + http://127.0.0.1:8123/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202 + +In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap +or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap +that refers to a file (whether mutable or immutable). So those URLs above can +be abbreviated as: + + http://127.0.0.1:8123/uri/$DIRCAP/ + http://127.0.0.1:8123/uri/$FILECAP + +The operation summaries below will abbreviate these further, by eliding the +server prefix. They will be displayed like this: + + /uri/$DIRCAP/ + /uri/$FILECAP + + +=== Child Lookup === + +Tahoe directories contain named children, just like directories in a regular +local filesystem. These children can be either files or subdirectories. + +If you have a Tahoe URL that refers to a directory, and want to reference a +named child inside it, just append the child name to the URL. For example, if +our sample directory contains a file named "welcome.txt", we can refer to +that file with: + + http://127.0.0.1:8123/uri/$DIRCAP/welcome.txt + +(or http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt) + +Multiple levels of subdirectories can be handled this way: + + http://127.0.0.1:8123/uri/$DIRCAP/tahoe-source/docs/webapi.txt + +In this document, when we need to refer to a URL that references a file using +this child-of-some-directory format, we'll use the following string: + + /uri/$DIRCAP/[SUBDIRS../]FILENAME + +The "[SUBDIRS../]" part means that there are zero or more (optional) +subdirectory names in the middle of the URL. The "FILENAME" at the end means +that this whole URL refers to a file of some sort, rather than to a +directory. + +When we need to refer specifically to a directory in this way, we'll write: + + /uri/$DIRCAP/[SUBDIRS../]SUBDIR + + +Note that all components of pathnames in URLs are required to be UTF-8 +encoded, so "resume.doc" (with an acute accent on both E's) would be accessed +with: + + http://127.0.0.1:8123/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc + +Also note that the filenames inside upload POST forms are interpreted using +whatever character set was provided in the conventional '_charset' field, and +defaults to UTF-8 if not otherwise specified. The JSON representation of each +directory contains native unicode strings. Tahoe directories are specified to +contain unicode filenames, and cannot contain binary strings that are not +representable as such. + +All Tahoe operations that refer to existing files or directories must include +a suitable read- or write- cap in the URL: the webapi server won't add one +for you. If you don't know the cap, you can't access the file. This allows +the security properties of Tahoe caps to be extended across the webapi +interface. + +== Slow Operations, Progress, and Cancelling == + +Certain operations can be expected to take a long time. The "t=deep-check", +described below, will recursively visit every file and directory reachable +from a given starting point, which can take minutes or even hours for +extremely large directory structures. A single long-running HTTP request is a +fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient +with waiting and give up on the connection. + +For this reason, long-running operations have an "operation handle", which +can be used to poll for status/progress messages while the operation +proceeds. This handle can also be used to cancel the operation. These handles +are created by the client, and passed in as a an "ophandle=" query argument +to the POST or PUT request which starts the operation. The following +operations can then be used to retrieve status: + +GET /operations/$HANDLE?output=HTML (with or without t=status) +GET /operations/$HANDLE?output=JSON (same) + + These two retrieve the current status of the given operation. Each operation + presents a different sort of information, but in general the page retrieved + will indicate: + + * whether the operation is complete, or if it is still running + * how much of the operation is complete, and how much is left, if possible + + The HTML form will include a meta-refresh tag, which will cause a regular + web browser to reload the status page about 60 seconds later. This tag will + be removed once the operation has completed. + + There may be more status information available under + /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space. + +POST /operations/$HANDLE?t=cancel + + This terminates the operation, and returns an HTML page explaining what was + cancelled. If the operation handle has already expired (see below), this + POST will return a 404, which indicates that the operation is no longer + running (either it was completed or terminated). The response body will be + the same as a GET /operations/$HANDLE on this operation handle, and the + handle will be expired immediately afterwards. + +The operation handle will eventually expire, to avoid consuming an unbounded +amount of memory. The handle's time-to-live can be reset at any time, by +passing a retain-for= argument (with a count of seconds) to either the +initial POST that starts the operation, or the subsequent GET request which +asks about the operation. For example, if a 'GET +/operations/$HANDLE?output=JSON&retain-for=600' query is performed, the +handle will remain active for 600 seconds (10 minutes) after the GET was +received. + +In addition, if the GET includes a release-after-complete=True argument, and +the operation has completed, the operation handle will be released +immediately. + +If a retain-for= argument is not used, the default handle lifetimes are: + + * handles will remain valid at least until their operation finishes + * uncollected handles for finished operations (i.e. handles for operations + which have finished but for which the GET page has not been accessed since + completion) will remain valid for one hour, or for the total time consumed + by the operation, whichever is greater. + * collected handles (i.e. the GET page has been retrieved at least once + since the operation completed) will remain valid for ten minutes. + + +== Programmatic Operations == + +Now that we know how to build URLs that refer to files and directories in a +Tahoe virtual filesystem, what sorts of operations can we do with those URLs? +This section contains a catalog of GET, PUT, DELETE, and POST operations that +can be performed on these URLs. This set of operations are aimed at programs +that use HTTP to communicate with a Tahoe node. The next section describes +operations that are intended for web browsers. + +=== Reading A File === + +GET /uri/$FILECAP +GET /uri/$DIRCAP/[SUBDIRS../]FILENAME + + This will retrieve the contents of the given file. The HTTP response body + will contain the sequence of bytes that make up the file. + + To view files in a web browser, you may want more control over the + Content-Type and Content-Disposition headers. Please see the next section + "Browser Operations", for details on how to modify these URLs for that + purpose. + +=== Writing/Uploading A File === + +PUT /uri/$FILECAP +PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME + + Upload a file, using the data from the HTTP request body, and add whatever + child links and subdirectories are necessary to make the file available at + the given location. Once this operation succeeds, a GET on the same URL will + retrieve the same contents that were just uploaded. This will create any + necessary intermediate subdirectories. + + To use the /uri/$FILECAP form, $FILECAP be a write-cap for a mutable file. + + In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a + writable mutable file, that files contents will be overwritten in-place. If + it is a read-cap for a mutable file, an error will occur. If it is an + immutable file, the old file will be discarded, and a new one will be put in + its place. + + When creating a new file, if "mutable=true" is in the query arguments, the + operation will create a mutable file instead of an immutable one. + + This returns the file-cap of the resulting file. If a new file was created + by this method, the HTTP response code (as dictated by rfc2616) will be set + to 201 CREATED. If an existing file was replaced or modified, the response + code will be 200 OK. + + Note that the 'curl -T localfile http://127.0.0.1:8123/uri/$DIRCAP/foo.txt' + command can be used to invoke this operation. + +PUT /uri + + This uploads a file, and produces a file-cap for the contents, but does not + attach the file into the virtual drive. No directories will be modified by + this operation. The file-cap is returned as the body of the HTTP response. + + If "mutable=true" is in the query arguments, the operation will create a + mutable file, and return its write-cap in the HTTP respose. The default is + to create an immutable file, returning the read-cap as a response. + +=== Creating A New Directory === + +POST /uri?t=mkdir +PUT /uri?t=mkdir + + Create a new empty directory and return its write-cap as the HTTP response + body. This does not make the newly created directory visible from the + virtual drive. The "PUT" operation is provided for backwards compatibility: + new code should use POST. + +POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir +PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir + + Create new directories as necessary to make sure that the named target + ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional + intermediate directories as necessary. If the named target directory already + exists, this will make no changes to it. + + This will return an error if a blocking file is present at any of the parent + names, preventing the server from creating the necessary parent directory. + + The write-cap of the new directory will be returned as the HTTP response + body. + +POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME + + Create a new empty directory and attach it to the given existing directory. + This will create additional intermediate directories as necessary. + + The URL of this form points to the parent of the bottom-most new directory, + whereas the previous form has a URL that points directly to the bottom-most + new directory. + +=== Get Information About A File Or Directory (as JSON) === + +GET /uri/$FILECAP?t=json +GET /uri/$DIRCAP?t=json +GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json +GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json + + This returns a machine-parseable JSON-encoded description of the given + object. The JSON always contains a list, and the first element of the list + is always a flag that indicates whether the referenced object is a file or a + directory. If it is a file, then the information includes file size and URI, + like this: + + GET /uri/$FILECAP?t=json : + GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json : + + [ "filenode", { "ro_uri": file_uri, + "size": bytes, + "mutable": false, + "metadata": {"ctime": 1202777696.7564139, + "mtime": 1202777696.7564139 + } + } ] + + If it is a directory, then it includes information about the children of + this directory, as a mapping from child name to a set of data about the + child (the same data that would appear in a corresponding GET?t=json of the + child itself). The child entries also include metadata about each child, + including creation- and modification- timestamps. The output looks like + this: + + GET /uri/$DIRCAP?t=json : + GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json : + + [ "dirnode", { "rw_uri": read_write_uri, + "ro_uri": read_only_uri, + "mutable": true, + "children": { + "foo.txt": [ "filenode", { "ro_uri": uri, + "size": bytes, + "metadata": { + "ctime": 1202777696.7564139, + "mtime": 1202777696.7564139 + } + } ], + "subdir": [ "dirnode", { "rw_uri": rwuri, + "ro_uri": rouri, + "metadata": { + "ctime": 1202778102.7589991, + "mtime": 1202778111.2160511, + } + } ] + } } ] + + In the above example, note how 'children' is a dictionary in which the keys + are child names and the values depend upon whether the child is a file or a + directory. The value is mostly the same as the JSON representation of the + child object (except that directories do not recurse -- the "children" + entry of the child is omitted, and the directory view includes the metadata + that is stored on the directory edge). + + Then the rw_uri field will be present in the information about a directory + if and only if you have read-write access to that directory, + + +=== Attaching an existing File or Directory by its read- or write- cap === + +PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri + + This attaches a child object (either a file or directory) to a specified + location in the virtual filesystem. The child object is referenced by its + read- or write- cap, as provided in the HTTP request body. This will create + intermediate directories as necessary. + + This is similar to a UNIX hardlink: by referencing a previously-uploaded + file (or previously-created directory) instead of uploading/creating a new + one, you can create two references to the same object. + + The read- or write- cap of the child is provided in the body of the HTTP + request, and this same cap is returned in the response body. + + The default behavior is to overwrite any existing object at the same + location. To prevent this (and make the operation return an error instead of + overwriting), add a "replace=false" argument, as "?t=uri&replace=false". + With replace=false, this operation will return an HTTP 409 "Conflict" error + if there is already an object at the given location, rather than overwriting + the existing object. Note that "true", "t", and "1" are all synonyms for + "True", and "false", "f", and "0" are synonyms for "False". the parameter is + case-insensitive. + +=== Deleting a File or Directory === + +DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME + + This removes the given name from its parent directory. CHILDNAME is the + name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will + be modified. + + Note that this does not actually delete the file or directory that the name + points to from the tahoe grid -- it only removes the named reference from + this directory. If there are other names in this directory or in other + directories that point to the resource, then it will remain accessible + through those paths. Even if all names pointing to this object are removed + from their parent directories, then someone with possession of its read-cap + can continue to access the object through that cap. + + The object will only become completely unreachable once 1: there are no + reachable directories that reference it, and 2: nobody is holding a read- + or write- cap to the object. (This behavior is very similar to the way + hardlinks and anonymous files work in traditional unix filesystems). + + This operation will not modify more than a single directory. Intermediate + directories which were implicitly created by PUT or POST methods will *not* + be automatically removed by DELETE. + + This method returns the file- or directory- cap of the object that was just + removed. + +== Browser Operations == + +This section describes the HTTP operations that provide support for humans +running a web browser. Most of these operations use HTML forms that use POST +to drive the Tahoe node. + +Note that for all POST operations, the arguments listed can be provided +either as URL query arguments or as form body fields. URL query arguments are +separated from the main URL by "?", and from each other by "&". For example, +"POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually +specified by using elements. For clarity, the +descriptions below display the most significant arguments as URL query args. + +=== Viewing A Directory (as HTML) === + +GET /uri/$DIRCAP/[SUBDIRS../] + + This returns an HTML page, intended to be displayed to a human by a web + browser, which contains HREF links to all files and directories reachable + from this directory. These HREF links do not have a t= argument, meaning + that a human who follows them will get pages also meant for a human. It also + contains forms to upload new files, and to delete files and directories. + Those forms use POST methods to do their job. + +=== Viewing/Downloading a File === + +GET /uri/$FILECAP +GET /uri/$DIRCAP/[SUBDIRS../]FILENAME + + This will retrieve the contents of the given file. The HTTP response body + will contain the sequence of bytes that make up the file. + + If you want the HTTP response to include a useful Content-Type header, + either use the second form (which starts with a $DIRCAP), or add a + "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg". + The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information + to determine a Content-Type (since Tahoe immutable files are merely + sequences of bytes, not typed+named file objects). + + If the URL has both filename= and "save=true" in the query arguments, then + the server to add a "Content-Disposition: attachment" header, along with a + filename= parameter. When a user clicks on such a link, most browsers will + offer to let the user save the file instead of displaying it inline (indeed, + most browsers will refuse to display it inline). "true", "t", "1", and other + case-insensitive equivalents are all treated the same. + + Character-set handling in URLs and HTTP headers is a dubious art[1]. For + maximum compatibility, Tahoe simply copies the bytes from the filename= + argument into the Content-Disposition header's filename= parameter, without + trying to interpret them in any particular way. + + +GET /named/$FILECAP/FILENAME + + This is an alternate download form which makes it easier to get the correct + filename. The Tahoe server will provide the contents of the given file, with + a Content-Type header derived from the given filename. This form is used to + get browsers to use the "Save Link As" feature correctly, and also helps + command-line tools like "wget" and "curl" use the right filename. Note that + this form can *only* be used with file caps; it is an error to use a + directory cap after the /named/ prefix. + +=== Get Information About A File Or Directory (as HTML) === + +GET /uri/$FILECAP?t=info +GET /uri/$DIRCAP/?t=info +GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info +GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info + + This returns a human-oriented HTML page with more detail about the selected + file or directory object. This page contains the following items: + + object size + storage index + JSON representation + raw contents (text/plain) + access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects) + check/verify/repair form + deep-check/deep-size/deep-stats/manifest (for directories) + replace-conents form (for mutable files) + +=== Creating a Directory === + +POST /uri?t=mkdir + + This creates a new directory, but does not attach it to the virtual + filesystem. + + If a "redirect_to_result=true" argument is provided, then the HTTP response + will cause the web browser to be redirected to a /uri/$DIRCAP page that + gives access to the newly-created directory. If you bookmark this page, + you'll be able to get back to the directory again in the future. This is the + recommended way to start working with a Tahoe server: create a new unlinked + directory (using redirect_to_result=true), then bookmark the resulting + /uri/$DIRCAP page. There is a "Create Directory" button on the Welcome page + to invoke this action. + + If "redirect_to_result=true" is not provided (or is given a value of + "false"), then the HTTP response body will simply be the write-cap of the + new directory. + +POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME + + This creates a new directory as a child of the designated SUBDIR. This will + create additional intermediate directories as necessary. + + If a "when_done=URL" argument is provided, the HTTP response will cause the + web browser to redirect to the given URL. This provides a convenient way to + return the browser to the directory that was just modified. Without a + when_done= argument, the HTTP response will simply contain the write-cap of + the directory that was just created. + + +=== Uploading a File === + +POST /uri?t=upload + + This uploads a file, and produces a file-cap for the contents, but does not + attach the file into the virtual drive. No directories will be modified by + this operation. + + The file must be provided as the "file" field of an HTML encoded form body, + produced in response to an HTML form like this: +
+ + + +
+ + If a "when_done=URL" argument is provided, the response body will cause the + browser to redirect to the given URL. If the when_done= URL has the string + "%(uri)s" in it, that string will be replaced by a URL-escaped form of the + newly created file-cap. (Note that without this substitution, there is no + way to access the file that was just uploaded). + + The default (in the absence of when_done=) is to return an HTML page that + describes the results of the upload. This page will contain information + about which storage servers were used for the upload, how long each + operation took, etc. + + If a "mutable=true" argument is provided, the operation will create a + mutable file, and the response body will contain the write-cap instead of + the upload results page. The default is to create an immutable file, + returning the upload results page as a response. + + +POST /uri/$DIRCAP/[SUBDIRS../]?t=upload + + This uploads a file, and attaches it as a new child of the given directory. + The file must be provided as the "file" field of an HTML encoded form body, + produced in response to an HTML form like this: +
+ + + +
+ + A "name=" argument can be provided to specify the new child's name, + otherwise it will be taken from the "filename" field of the upload form + (most web browsers will copy the last component of the original file's + pathname into this field). To avoid confusion, name= is not allowed to + contain a slash. + + If there is already a child with that name, and it is a mutable file, then + its contents are replaced with the data being uploaded. If it is not a + mutable file, the default behavior is to remove the existing child before + creating a new one. To prevent this (and make the operation return an error + instead of overwriting the old child), add a "replace=false" argument, as + "?t=upload&replace=false". With replace=false, this operation will return an + HTTP 409 "Conflict" error if there is already an object at the given + location, rather than overwriting the existing object. Note that "true", + "t", and "1" are all synonyms for "True", and "false", "f", and "0" are + synonyms for "False". the parameter is case-insensitive. + + This will create additional intermediate directories as necessary, although + since it is expected to be triggered by a form that was retrieved by "GET + /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will + already exist. + + If a "mutable=true" argument is provided, any new file that is created will + be a mutable file instead of an immutable one. will give the user a way to set this option. + + If a "when_done=URL" argument is provided, the HTTP response will cause the + web browser to redirect to the given URL. This provides a convenient way to + return the browser to the directory that was just modified. Without a + when_done= argument, the HTTP response will simply contain the file-cap of + the file that was just uploaded (a write-cap for mutable files, or a + read-cap for immutable files). + +POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload + + This also uploads a file and attaches it as a new child of the given + directory. It is a slight variant of the previous operation, as the URL + refers to the target file rather than the parent directory. It is otherwise + identical: this accepts mutable= and when_done= arguments too. + +POST /uri/$FILECAP?t=upload + +=== Attaching An Existing File Or Directory (by URI) === + +POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP + + This attaches a given read- or write- cap "CHILDCAP" to the designated + directory, with a specified child name. This behaves much like the PUT t=uri + operation, and is a lot like a UNIX hardlink. + + This will create additional intermediate directories as necessary, although + since it is expected to be triggered by a form that was retrieved by "GET + /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will + already exist. + +=== Deleting A Child === + +POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME + + This instructs the node to delete a child object (file or subdirectory) from + the given directory. Note that the entire subtree is removed. This is + somewhat like "rm -rf" (from the point of view of the parent), but other + references into the subtree will see that the child subdirectories are not + modified by this operation. Only the link from the given directory to its + child is severed. + +=== Renaming A Child === + +POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW + + This instructs the node to rename a child of the given directory. This is + exactly the same as removing the child, then adding the same child-cap under + the new name. This operation cannot move the child to a different directory. + + This operation will replace any existing child of the new name, making it + behave like the UNIX "mv -f" command. + +=== Other Utilities === + +GET /uri?uri=$CAP + + This causes a redirect to /uri/$CAP, and retains any additional query + arguments (like filename= or save=). This is for the convenience of web + forms which allow the user to paste in a read- or write- cap (obtained + through some out-of-band channel, like IM or email). + + Note that this form merely redirects to the specific file or directory + indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot + traverse to children by appending additional path segments to the URL. + +GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME + + This provides a useful facility to browser-based user interfaces. It + returns a page containing a form targetting the "POST $DIRCAP t=rename" + functionality described above, with the provided $CHILDNAME present in the + 'from_name' field of that form. I.e. this presents a form offering to + rename $CHILDNAME, requesting the new name, and submitting POST rename. + +GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri + + This returns the file- or directory- cap for the specified object. + +GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri + + This returns a read-only file- or directory- cap for the specified object. + If the object is an immutable file, this will return the same value as + t=uri. + +=== Debugging and Testing Features === + +These URLs are less-likely to be helpful to the casual Tahoe user, and are +mainly intended for developers. + +POST $URL?t=check + + This triggers the FileChecker to determine the current "health" of the + given file or directory, by counting how many shares are available. The + page that is returned will display the results. This can be used as a "show + me detailed information about this file" page. + + If a verify=true argument is provided, the node will perform a more + intensive check, downloading and verifying every single bit of every share. + + If an output=JSON argument is provided, the response will be + machine-readable JSON instead of human-oriented HTML. The data is a + dictionary with the following keys: + + storage-index: a base32-encoded string with the objects's storage index, + or an empty string for LIT files + results: a dictionary that describes the state of the file. For LIT files, + this dictionary has only the 'healthy' key, which will always be + True. For distributed files, this dictionary has the following + keys: + count-shares-good: the number of good shares that were found + count-shares-needed: 'k', the number of shares required for recovery + count-shares-expected: 'N', the number of total shares generated + count-good-share-hosts: the number of distinct storage servers with + good shares. If this number is less than + count-shares-good, then some shares are doubled + up, increasing the correlation of failures. This + indicates that one or more shares should be + moved to an otherwise unused server, if one is + available. + count-wrong-shares: for mutable files, the number of shares for + versions other than the 'best' one (highest + sequence number, highest roothash). These are + either old ... + count-recoverable-versions: for mutable files, the number of + recoverable versions of the file. For + a healthy file, this will equal 1. + count-unrecoverable-versions: for mutable files, the number of + unrecoverable versions of the file. + For a healthy file, this will be 0. + count-corrupt-shares: the number of shares with integrity failures + list-corrupt-shares: a list of "share locators", one for each share + that was found to be corrupt. Each share locator + is a list of (serverid, storage_index, sharenum). + needs-rebalancing: (bool) True if there are multiple shares on a single + storage server, indicating a reduction in reliability + that could be resolved by moving shares to new + servers. + servers-responding: list of base32-encoded storage server identifiers, + one for each server which responded to the share + query. + healthy: (bool) True if the file is completely healthy, False otherwise. + Healthy files have at least N good shares. Overlapping shares + (indicated by count-good-share-hosts < count-shares-good) do not + currently cause a file to be marked unhealthy. If there are at + least N good shares, then corrupt shares do not cause the file to + be marked unhealthy, although the corrupt shares will be listed + in the results (list-corrupt-shares) and should be manually + removed to wasting time in subsequent downloads (as the + downloader rediscovers the corruption and uses alternate shares). + sharemap: dict mapping share identifier to list of serverids + (base32-encoded strings). This indicates which servers are + holding which shares. For immutable files, the shareid is + an integer (the share number, from 0 to N-1). For + immutable files, it is a string of the form + 'seq%d-%s-sh%d', containing the sequence number, the + roothash, and the share number. + +POST $URL?t=start-deep-check (must add &ophandle=XYZ) + + This initiates a recursive walk of all files and directories reachable from + the target, performing a check on each one just like t=check. The result + page will contain a summary of the results, including details on any + file/directory that was not fully healthy. + + t=start-deep-check can only be invoked on a directory. An error (400 + BAD_REQUEST) will be signalled if it is invoked on a file. The recursive + walker will deal with loops safely. + + This accepts the same verify= argument as t=check. + + Since this operation can take a long time (perhaps a second per object), + the ophandle= argument is required (see "Slow Operations, Progress, and + Cancelling" above). The response to this POST will be a redirect to the + corresponding /operations/$HANDLE page (with output=HTML or output=JSON to + match the output= argument given to the POST). The deep-check operation + will continue to run in the background, and the /operations page should be + used to find out when the operation is done. + + Detailed checker results for non-healthy files and directories will be + available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will + contain links to these detailed results. + + The HTML /operations/$HANDLE page for incomplete operations will contain a + meta-refresh tag, set to 60 seconds, so that a browser which uses + deep-check will automatically poll until the operation has completed. + + The JSON page (/options/$HANDLE?output=JSON) will contain a + machine-readable JSON dictionary with the following keys: + + finished: a boolean, True if the operation is complete, else False. Some + of the remaining keys may not be present until the operation + is complete. + root-storage-index: a base32-encoded string with the storage index of the + starting point of the deep-check operation + count-objects-checked: count of how many objects were checked. Note that + non-distributed objects (i.e. small immutable LIT + files) are not checked, since for these objects, + the data is contained entirely in the URI. + count-objects-healthy: how many of those objects were completely healthy + count-objects-unhealthy: how many were damaged in some way + count-corrupt-shares: how many shares were found to have corruption, + summed over all objects examined + list-corrupt-shares: a list of "share identifiers", one for each share + that was found to be corrupt. Each share identifier + is a list of (serverid, storage_index, sharenum). + list-unhealthy-files: a list of (pathname, check-results) tuples, for + each file that was not fully healthy. 'pathname' is + a list of strings (which can be joined by "/" + characters to turn it into a single string), + relative to the directory on which deep-check was + invoked. The 'check-results' field is the same as + that returned by t=check&output=JSON, described + above. + stats: a dictionary with the same keys as the t=deep-stats command + (described below) + +POST $URL?t=check&repair=true + + This performs a health check of the given file or directory, and if the + checker determines that the object is not healthy (some shares are missing + or corrupted), it will perform a "repair". During repair, any missing + shares will be regenerated and uploaded to new servers. + + This accepts the same verify=true argument as t=check. When an output=JSON + argument is provided, the machine-readable JSON response will contain the + following keys: + + storage-index: a base32-encoded string with the objects's storage index, + or an empty string for LIT files + repair-attempted: (bool) True if repair was attempted + repair-successful: (bool) True if repair was attempted and the file was + fully healthy afterwards. False if no repair was + attempted, or if a repair attempt failed. + pre-repair-results: a dictionary that describes the state of the file + before any repair was performed. This contains exactly + the same keys as the 'results' value of the t=check + response, described above. + post-repair-results: a dictionary that describes the state of the file + after any repair was performed. If no repair was + performed, post-repair-results and pre-repair-results + will be the same. This contains exactly the same keys + as the 'results' value of the t=check response, + described above. + +POST $URL?t=start-deep-check&repair=true (must add &ophandle=XYZ) + + This triggers a recursive walk of all files and directories, performing a + t=check&repair=true on each one. + + Like t=start-deep-check without the repair= argument, this can only be + invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it + is invoked on a file. The recursive walker will deal with loops safely. + + This accepts the same verify=true argument as t=start-deep-check. It uses + the same ophandle= mechanism as start-deep-check. When an output=JSON + argument is provided, the response will contain the following keys: + + finished: (bool) True if the operation has completed, else False + root-storage-index: a base32-encoded string with the storage index of the + starting point of the deep-check operation + count-objects-checked: count of how many objects were checked + + count-objects-healthy-pre-repair: how many of those objects were completely + healthy, before any repair + count-objects-unhealthy-pre-repair: how many were damaged in some way + count-objects-healthy-post-repair: how many of those objects were completely + healthy, after any repair + count-objects-unhealthy-post-repair: how many were damaged in some way + + count-repairs-attempted: repairs were attempted on this many objects. + count-repairs-successful: how many repairs resulted in healthy objects + count-repairs-unsuccessful: how many repairs resulted did not results in + completely healthy objects + count-corrupt-shares-pre-repair: how many shares were found to have + corruption, summed over all objects + examined, before any repair + count-corrupt-shares-post-repair: how many shares were found to have + corruption, summed over all objects + examined, after any repair + list-corrupt-shares: a list of "share identifiers", one for each share + that was found to be corrupt (before any repair). + Each share identifier is a list of (serverid, + storage_index, sharenum). + list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares + that were successfully repaired are not + included. These are shares that need + manual processing. Since immutable shares + cannot be modified by clients, all corruption + in immutable shares will be listed here. + list-unhealthy-files: a list of (pathname, check-results) tuples, for + each file that was not fully healthy. 'pathname' is + relative to the directory on which deep-check was + invoked. The 'check-results' field is the same as + that returned by t=check&repair=true&output=JSON, + described above. + stats: a dictionary with the same keys as the t=deep-stats command + (described below) + +POST $DIRURL?t=start-manifest (must add &ophandle=XYZ) + + This operation generates a "manfest" of the given directory tree, mostly + for debugging. This is a table of (path, filecap/dircap), for every object + reachable from the starting directory. The path will be slash-joined, and + the filecap/dircap will contain a link to the object in question. This page + gives immediate access to every object in the virtual filesystem subtree. + + This operation uses the same ophandle= mechanism as deep-check. The + corresponding /operations/$HANDLE page has three different forms. The + default is output=HTML. + + If output=text is added to the query args, the results will be a text/plain + list. The first line is special: it is either "finished: yes" or "finished: + no"; if the operation is not finished, you must periodically reload the + page until it completes. The rest of the results are a plaintext list, with + one file/dir per line, slash-separated, with the filecap/dircap separated + by a space. + + If output=JSON is added to the queryargs, then the results will be a + JSON-formatted dictionary with three keys: + + finished (bool): if False then you must reload the page until True + origin_si (str): the storage index of the starting point + manifest: list of (path, cap) tuples, where path is a list of strings. + +POST $DIRURL?t=start-deep-size (must add &ophandle=XYZ) + + This operation generates a number (in bytes) containing the sum of the + filesize of all directories and immutable files reachable from the given + directory. This is a rough lower bound of the total space consumed by this + subtree. It does not include space consumed by mutable files, nor does it + take expansion or encoding overhead into account. Later versions of the + code may improve this estimate upwards. + + The /operations/$HANDLE status output consists of two lines of text: + + finished: yes + size: 1234 + +POST $DIRURL?t=start-deep-stats (must add &ophandle=XYZ) + + This operation performs a recursive walk of all files and directories + reachable from the given directory, and generates a collection of + statistics about those objects. + + The result (obtained from the /operations/$OPHANDLE page) is a + JSON-serialized dictionary with the following keys (note that some of these + keys may be missing until 'finished' is True): + + finished: (bool) True if the operation has finished, else False + count-immutable-files: count of how many CHK files are in the set + count-mutable-files: same, for mutable files (does not include directories) + count-literal-files: same, for LIT files (data contained inside the URI) + count-files: sum of the above three + count-directories: count of directories + size-immutable-files: total bytes for all CHK files in the set, =deep-size + size-mutable-files (TODO): same, for current version of all mutable files + size-literal-files: same, for LIT files + size-directories: size of directories (includes size-literal-files) + size-files-histogram: list of (minsize, maxsize, count) buckets, + with a histogram of filesizes, 5dB/bucket, + for both literal and immutable files + largest-directory: number of children in the largest directory + largest-immutable-file: number of bytes in the largest CHK file + + size-mutable-files is not implemented, because it would require extra + queries to each mutable file to get their size. This may be implemented in + the future. + + Assuming no sharing, the basic space consumed by a single root directory is + the sum of size-immutable-files, size-mutable-files, and size-directories. + The actual disk space used by the shares is larger, because of the + following sources of overhead: + + integrity data + expansion due to erasure coding + share management data (leases) + backend (ext3) minimum block size + +== Other Useful Pages == + +The portion of the web namespace that begins with "/uri" (and "/named") is +dedicated to giving users (both humans and programs) access to the Tahoe +virtual filesystem. The rest of the namespace provides status information +about the state of the Tahoe node. + +GET / (the root page) + +This is the "Welcome Page", and contains a few distinct sections: + + Node information: library versions, local nodeid, services being provided. + + Filesystem Access Forms: create a new directory, view a file/directory by + URI, upload a file (unlinked), download a file by + URI. + + Grid Status: introducer information, helper information, connected storage + servers. + +GET /status/ + + This page lists all active uploads and downloads, and contains a short list + of recent upload/download operations. Each operation has a link to a page + that describes file sizes, servers that were involved, and the time consumed + in each phase of the operation. + + A GET of /status/?t=json will contain a machine-readable subset of the same + data. It returns a JSON-encoded dictionary. The only key defined at this + time is "active", with a value that is a list of operation dictionaries, one + for each active operation. Once an operation is completed, it will no longer + appear in data["active"] . + + Each op-dict contains a "type" key, one of "upload", "download", + "mapupdate", "publish", or "retrieve" (the first two are for immutable + files, while the latter three are for mutable files and directories). + + The "upload" op-dict will contain the following keys: + + type (string): "upload" + storage-index-string (string): a base32-encoded storage index + total-size (int): total size of the file + status (string): current status of the operation + progress-hash (float): 1.0 when the file has been hashed + progress-ciphertext (float): 1.0 when the file has been encrypted. + progress-encode-push (float): 1.0 when the file has been encoded and + pushed to the storage servers. For helper + uploads, the ciphertext value climbs to 1.0 + first, then encoding starts. For unassisted + uploads, ciphertext and encode-push progress + will climb at the same pace. + + The "download" op-dict will contain the following keys: + + type (string): "download" + storage-index-string (string): a base32-encoded storage index + total-size (int): total size of the file + status (string): current status of the operation + progress (float): 1.0 when the file has been fully downloaded + + Front-ends which want to report progress information are advised to simply + average together all the progress-* indicators. A slightly more accurate + value can be found by ignoring the progress-hash value (since the current + implementation hashes synchronously, so clients will probably never see + progress-hash!=1.0). + +GET /provisioning/ + + This page provides a basic tool to predict the likely storage and bandwidth + requirements of a large Tahoe grid. It provides forms to input things like + total number of users, number of files per user, average file size, number + of servers, expansion ratio, hard drive failure rate, etc. It then provides + numbers like how many disks per server will be needed, how many read + operations per second should be expected, and the likely MTBF for files in + the grid. This information is very preliminary, and the model upon which it + is based still needs a lot of work. + +GET /helper_status/ + + If the node is running a helper (i.e. if "$BASEDIR/run_helper" is + non-empty), then this page will provide a list of all the helper operations + currently in progress. If "?t=json" is added to the URL, it will return a + JSON-formatted list of helper statistics, which can then be used to produce + graphs to indicate how busy the helper is. + +GET /statistics/ + + This page provides "node statistics", which are collected from a variety of + sources. + + load_monitor: every second, the node schedules a timer for one second in + the future, then measures how late the subsequent callback + is. The "load_average" is this tardiness, measured in + seconds, averaged over the last minute. It is an indication + of a busy node, one which is doing more work than can be + completed in a timely fashion. The "max_load" value is the + highest value that has been seen in the last 60 seconds. + + cpu_monitor: every minute, the node uses time.clock() to measure how much + CPU time it has used, and it uses this value to produce + 1min/5min/15min moving averages. These values range from 0% + (0.0) to 100% (1.0), and indicate what fraction of the CPU + has been used by the Tahoe node. Not all operating systems + provide meaningful data to time.clock(): they may report 100% + CPU usage at all times. + + uploader: this counts how many immutable files (and bytes) have been + uploaded since the node was started + + downloader: this counts how many immutable files have been downloaded + since the node was started + + publishes: this counts how many mutable files (including directories) have + been modified since the node was started + + retrieves: this counts how many mutable files (including directories) have + been read since the node was started + + There are other statistics that are tracked by the node. The "raw stats" + section shows a formatted dump of all of them. + + By adding "?t=json" to the URL, the node will return a JSON-formatted + dictionary of stats values, which can be used by other tools to produce + graphs of node behavior. The misc/munin/ directory in the source + distribution provides some tools to produce these graphs. + +GET / (introducer status) + + For Introducer nodes, the welcome page displays information about both + clients and servers which are connected to the introducer. Servers make + "service announcements", and these are listed in a table. Clients will + subscribe to hear about service announcements, and these subscriptions are + listed in a separate table. Both tables contain information about what + version of Tahoe is being run by the remote node, their advertised and + outbound IP addresses, their nodeid and nickname, and how long they have + been available. + + By adding "?t=json" to the URL, the node will return a JSON-formatted + dictionary of stats values, which can be used to produce graphs of connected + clients over time. + + +== Static Files in /public_html == + +The webapi server will take any request for a URL that starts with /static +and serve it from a configurable directory which defaults to +$BASEDIR/public_html . This is configured by setting the "[node]web.static" +value in $BASEDIR/tahoe.cfg . If this is left at the default value of +"public_html", then http://localhost:8123/static/subdir/foo.html will be +served with the contents of the file $BASEDIR/public_html/subdir/foo.html . + +This can be useful to serve a javascript application which provides a +prettier front-end to the rest of the Tahoe webapi. + + +== safety and security issues -- names vs. URIs == + +Summary: use explicit file- and dir- caps whenever possible, to reduce the +potential for surprises when the virtual drive is changed while you aren't +looking. + +The vdrive provides a mutable filesystem, but the ways that the filesystem +can change are limited. The only thing that can change is that the mapping +from child names to child objects that each directory contains can be changed +by adding a new child name pointing to an object, removing an existing child +name, or changing an existing child name to point to a different object. + +Obviously if you query tahoe for information about the filesystem and then +act upon the filesystem (such as by getting a listing of the contents of a +directory and then adding a file to the directory), then the filesystem might +have been changed after you queried it and before you acted upon it. +However, if you use the URI instead of the pathname of an object when you act +upon the object, then the only change that can happen is when the object is a +directory then the set of child names it has might be different. If, on the +other hand, you act upon the object using its pathname, then a different +object might be in that place, which can result in more kinds of surprises. + +For example, suppose you are writing code which recursively downloads the +contents of a directory. The first thing your code does is fetch the listing +of the contents of the directory. For each child that it fetched, if that +child is a file then it downloads the file, and if that child is a directory +then it recurses into that directory. Now, if the download and the recurse +actions are performed using the child's name, then the results might be +wrong, because for example a child name that pointed to a sub-directory when +you listed the directory might have been changed to point to a file (in which +case your attempt to recurse into it would result in an error and the file +would be skipped), or a child name that pointed to a file when you listed the +directory might now point to a sub-directory (in which case your attempt to +download the child would result in a file containing HTML text describing the +sub-directory!). + +If your recursive algorithm uses the uri of the child instead of the name of +the child, then those kinds of mistakes just can't happen. Note that both the +child's name and the child's URI are included in the results of listing the +parent directory, so it isn't any harder to use the URI for this purpose. + +In general, use names if you want "whatever object (whether file or +directory) is found by following this name (or sequence of names) when my +request reaches the server". Use URIs if you want "this particular object". + +== Concurrency Issues == + +Tahoe uses both mutable and immutable files. Mutable files can be created +explicitly by doing an upload with ?mutable=true added, or implicitly by +creating a new directory (since a directory is just a special way to +interpret a given mutable file). + +Mutable files suffer from the same consistency-vs-availability tradeoff that +all distributed data storage systems face. It is not possible to +simultaneously achieve perfect consistency and perfect availability in the +face of network partitions (servers being unreachable or faulty). + +Tahoe tries to achieve a reasonable compromise, but there is a basic rule in +place, known as the Prime Coordination Directive: "Don't Do That". What this +means is that if write-access to a mutable file is available to several +parties, then those parties are responsible for coordinating their activities +to avoid multiple simultaneous updates. This could be achieved by having +these parties talk to each other and using some sort of locking mechanism, or +by serializing all changes through a single writer. + +The consequences of performing uncoordinated writes can vary. Some of the +writers may lose their changes, as somebody else wins the race condition. In +many cases the file will be left in an "unhealthy" state, meaning that there +are not as many redundant shares as we would like (reducing the reliability +of the file against server failures). In the worst case, the file can be left +in such an unhealthy state that no version is recoverable, even the old ones. +It is this small possibility of data loss that prompts us to issue the Prime +Coordination Directive. + +Tahoe nodes implement internal serialization to make sure that a single Tahoe +node cannot conflict with itself. For example, it is safe to issue two +directory modification requests to a single tahoe node's webapi server at the +same time, because the Tahoe node will internally delay one of them until +after the other has finished being applied. (This feature was introduced in +Tahoe-1.1; back with Tahoe-1.0 the web client was responsible for serializing +web requests themselves). + +For more details, please see the "Consistency vs Availability" and "The Prime +Coordination Directive" sections of mutable.txt, in the same directory as +this file. + + +[1]: URLs and HTTP and UTF-8, Oh My + + HTTP does not provide a mechanism to specify the character set used to + encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that + the filename= argument shall be a URL-encoded UTF-8 encoded unicode object. + For example, suppose we want to provoke the server into using a filename of + "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this + is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's + repr() function would show). To encode this into a URL, the non-printable + characters must be escaped with the urlencode '%XX' mechansim, giving us + "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET + /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers + provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e. + + The response header will need to indicate a non-ASCII filename. The actual + mechanism to do this is not clear. For ASCII filenames, the response header + would look like: + + Content-Disposition: attachment; filename="english.txt" + + If Tahoe were to enforce the utf-8 convention, it would need to decode the + URL argument into a unicode string, and then encode it back into a sequence + of bytes when creating the response header. One possibility would be to use + unencoded utf-8. Developers suggest that IE7 might accept this: + + #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e" + (note, the last four bytes of that line, not including the newline, are + 0xC3 0xA9 0x65 0x22) + + RFC2231#4 (dated 1997): suggests that the following might work, and some + developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that + it is supported by firefox (but not IE7): + + #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e + + My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that + the filename= parameter is defined to be wrapped in quotes (presumeably to + allow spaces without breaking the parsing of subsequent parameters), which + would give us: + + #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e" + + However this is contrary to the examples in the email thread listed above. + + Developers report that IE7 (when it is configured for UTF-8 URL encoding, + which is not the default in asian countries), will accept: + + #4: Content-Disposition: attachment; filename=fianc%C3%A9e + + However, for maximum compatibility, Tahoe simply copies bytes from the URL + into the response header, rather than enforcing the utf-8 convention. This + means it does not try to decode the filename from the URL argument, nor does + it encode the filename into the response header. diff --git a/docs/ftp.txt b/docs/ftp.txt deleted file mode 100644 index 50774c6e..00000000 --- a/docs/ftp.txt +++ /dev/null @@ -1,112 +0,0 @@ -= Tahoe FTP Frontend = - -All Tahoe client nodes can run a frontend FTP server, allowing regular FTP -clients to access the virtual filesystem. - -Since Tahoe does not use user accounts or passwords, the FTP server must be -configured with a way to translate USER+PASS into a root directory cap. Two -mechanisms are provided. The first is a simple flat file with one account per -line. The second is an HTTP-based login mechanism, backed by simple PHP -script and a database. The latter form is used by allmydata.com to provide -secure access to customer rootcaps. - -== Configuring an Account File == - -To configure the first form, create a file (probably in -BASEDIR/private/ftp.accounts) in which each non-comment/non-blank line is a -space-separated line of (USERNAME, PASSWORD, ROOTCAP), like so: - - % cat BASEDIR/private/ftp.accounts - # This is a password file, (username, password, rootcap) - alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a - bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja - -Then add the following lines to the BASEDIR/tahoe.cfg file: - - [ftpd] - enabled = true - ftp.port = 8021 - ftp.accounts.file = private/ftp.accounts - -The FTP server will listen on the given port number. The ftp.accounts.file -pathname will be interpreted relative to the node's BASEDIR. - -== Configuring an Account Server == - -Determine the URL of the account server, say https://example.com/login . Then -add the following lines to BASEDIR/tahoe.cfg: - - [ftpd] - enabled = true - ftp.port = 8021 - ftp.accounts.url = https://example.com/login - -== Dependencies == - -The FTP server requires code in Twisted that enables asynchronous closing of -file-upload operations. This code was not in the Twisted-8.1.0 release, and -has not been committed to SVN trunk as of r24943. So it may be necessary to -apply the following patch. The Tahoe node refuse to start the FTP server if -it detects that this patch has not been applied. - -Index: twisted/protocols/ftp.py -=================================================================== ---- twisted/protocols/ftp.py (revision 24956) -+++ twisted/protocols/ftp.py (working copy) -@@ -1049,7 +1049,6 @@ - cons = ASCIIConsumerWrapper(cons) - - d = self.dtpInstance.registerConsumer(cons) -- d.addCallbacks(cbSent, ebSent) - - # Tell them what to doooo - if self.dtpInstance.isConnected: -@@ -1062,6 +1061,8 @@ - def cbOpened(file): - d = file.receive() - d.addCallback(cbConsumer) -+ d.addCallback(lambda ignored: file.close()) -+ d.addCallbacks(cbSent, ebSent) - return d - - def ebOpened(err): -@@ -1434,7 +1435,14 @@ - @rtype: C{Deferred} of C{IConsumer} - """ - -+ def close(): -+ """ -+ Perform any post-write work that needs to be done. This method may -+ only be invoked once on each provider, and will always be invoked -+ after receive(). - -+ @rtype: C{Deferred} of anything: the value is ignored -+ """ - - def _getgroups(uid): - """Return the primary and supplementary groups for the given UID. -@@ -1795,6 +1803,8 @@ - # FileConsumer will close the file object - return defer.succeed(FileConsumer(self.fObj)) - -+ def close(self): -+ return defer.succeed(None) - - - class FTPRealm: -Index: twisted/vfs/adapters/ftp.py -=================================================================== ---- twisted/vfs/adapters/ftp.py (revision 24956) -+++ twisted/vfs/adapters/ftp.py (working copy) -@@ -295,6 +295,11 @@ - """ - return defer.succeed(IConsumer(self.node)) - -+ def close(self): -+ """ -+ Perform post-write actions. -+ """ -+ return defer.succeed(None) - - - class _FileToConsumerAdapter(object): diff --git a/docs/sftp.txt b/docs/sftp.txt deleted file mode 100644 index cb085823..00000000 --- a/docs/sftp.txt +++ /dev/null @@ -1,74 +0,0 @@ -= Tahoe SFTP Frontend = - -All Tahoe client nodes can run a frontend SFTP server, allowing regular SFTP -clients to access the virtual filesystem. - -Since Tahoe does not use user accounts or passwords, the FTP server must be -configured with a way to translate a username (and either a password or -public key) into a root directory cap. Two mechanisms are provided. The first -is a simple flat file with one account per line. The second is an HTTP-based -login mechanism, backed by simple PHP script and a database. The latter form -is used by allmydata.com to provide secure access to customer rootcaps. - -The SFTP server must also be given a public/private host keypair. - -== Configuring a Keypair == - -First, generate a keypair for your server: - -% cd BASEDIR -% ssh-keygen -f private/ssh_host_rsa_key - -You will then use the following lines in the tahoe.cfg file: - - [sftpd] - sftp.host_pubkey_file = private/ssh_host_rsa_key.pub - sftp.host_privkey_file = private/ssh_host_rsa_key - -== Configuring an Account File == - -To configure the first form, create a file (probably in -BASEDIR/private/sftp.accounts) in which each non-comment/non-blank line is a -space-separated line of (USERNAME, PASSWORD/PUBKEY, ROOTCAP), like so: - -[TODO: the PUBKEY form is not yet supported] - - % cat BASEDIR/private/sftp.accounts - # This is a password file, (username, password/pubkey, rootcap) - alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a - bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja - carol ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAv2xHRVBoXnwxHLzthRD1wOWtyZ08b8n9cMZfJ58CBdBwAYP2NVNXc0XjRvswm5hnnAO+jyWPVNpXJjm9XllzYhODSNtSN+TXuJlUjhzA/T+ZwdgsgSAeHuuMQBoWt4Qc9HV6rHCdAeMhcnyqm6Q0sRAsfA/wfwiIgbvE7+cWpFa2anB6WeAnvK8+dMN0nvnkPE7GNyf/WFR1Ffuh9ifKdRB6yDNp17bQAqA3OWSFjch6fGPhp94y4g2jmTHlEUTyVsilgGqvGOutOVYnmOMnFijugU1Vu33G39GGzXWla6+fXwTk/oiVPiCYD7A7WFKes3nqMg8iVN6a6sxujrhnHQ== warner@fluxx URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja - -Note that if the second word of the line is "ssh-rsa" or "ssh-dss", the rest -of the line is parsed differently, so users cannot have a password equal to -either of these strings. - -Then add the following lines to the BASEDIR/tahoe.cfg file: - - [sftpd] - enabled = true - sftp.port = 8022 - sftp.host_pubkey_file = private/ssh_host_rsa_key.pub - sftp.host_privkey_file = private/ssh_host_rsa_key - sftp.accounts.file = private/sftp.accounts - -The SFTP server will listen on the given port number. The sftp.accounts.file -pathname will be interpreted relative to the node's BASEDIR. - -== Configuring an Account Server == - -Determine the URL of the account server, say https://example.com/login . Then -add the following lines to BASEDIR/tahoe.cfg: - - [sftpd] - enabled = true - sftp.port = 8022 - sftp.host_pubkey_file = private/ssh_host_rsa_key.pub - sftp.host_privkey_file = private/ssh_host_rsa_key - sftp.accounts.url = https://example.com/login - -== Dependencies == - -The Tahoe SFTP server requires the Twisted "Conch" component, which itself -requires the pycrypto package (note that pycrypto is distinct from the -pycryptopp that Tahoe uses). diff --git a/docs/webapi.txt b/docs/webapi.txt deleted file mode 100644 index 59ac9ed1..00000000 --- a/docs/webapi.txt +++ /dev/null @@ -1,1320 +0,0 @@ - -= The Tahoe REST-ful Web API = - -1. Enabling the web-API port -2. Basic Concepts: GET, PUT, DELETE, POST -3. URLs, Machine-Oriented Interfaces -4. Browser Operations: Human-Oriented Interfaces -5. Welcome / Debug / Status pages -6. Static Files in /public_html -7. Safety and security issues -- names vs. URIs -8. Concurrency Issues - - -== Enabling the web-API port == - -Every Tahoe node is capable of running a built-in HTTP server. To enable -this, just write a port number into a file named "webport" in the node's base -directory. For example, writing "8123" into $NODEDIR/webport will cause the -node to run a webserver on port 8123. - -This string is actually a Twisted "strports" specification, meaning you can -get more control over the interface to which the server binds by supplying -additional arguments. For more details, see the documentation on -twisted.application.strports: -http://twistedmatrix.com/documents/current/api/twisted.application.strports.html - -Writing "tcp:8123:interface=127.0.0.1" into $NODEDIR/webport does the same -but binds to the loopback interface, ensuring that only the programs on the -local host can connect. Using -"ssl:8123:privateKey=mykey.pem:certKey=cert.pem" runs an SSL server. - -This webport can be set when the node is created by passing a --webport -option to the 'tahoe create-client' command. By default, the node listens on -port 8123, on the loopback (127.0.0.1) interface. - -== Basic Concepts == - -As described in architecture.txt, each file and directory in a Tahoe virtual -filesystem is referenced by an identifier that combines the designation of -the object with the authority to do something with it (such as read or modify -the contents). This identifier is called a "read-cap" or "write-cap", -depending upon whether it enables read-only or read-write access. These -"caps" are also referred to as URIs. - -The Tahoe web-based API is "REST-ful", meaning it implements the concepts of -"REpresentational State Transfer": the original scheme by which the World -Wide Web was intended to work. Each object (file or directory) is referenced -by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and -DELETE) are used to manipulate these objects. You can think of the URL as a -noun, and the method as a verb. - -In REST, the GET method is used to retrieve information about an object, or -to retrieve some representation of the object itself. When the object is a -file, the basic GET method will simply return the contents of that file. -Other variations (generally implemented by adding query parameters to the -URL) will return information about the object, such as metadata. GET -operations are required to have no side-effects. - -PUT is used to upload new objects into the filesystem, or to replace an -existing object. DELETE it used to delete objects from the filesystem. Both -PUT and DELETE are required to be idempotent: performing the same operation -multiple times must have the same side-effects as only performing it once. - -POST is used for more complicated actions that cannot be expressed as a GET, -PUT, or DELETE. POST operations can be thought of as a method call: sending -some message to the object referenced by the URL. In Tahoe, POST is also used -for operations that must be triggered by an HTML form (including upload and -delete), because otherwise a regular web browser has no way to accomplish -these tasks. - -Tahoe's web API is designed for two different consumers. The first is a -program that needs to manipulate the virtual file system. Such programs are -expected to use the RESTful interface described above. The second is a human -using a standard web browser to work with the filesystem. This user is given -a series of HTML pages with links to download files, and forms that use POST -actions to upload, rename, and delete files. - -== URLs == - -Tahoe uses a variety of read- and write- caps to identify files and -directories. The most common of these is the "immutable file read-cap", which -is used for most uploaded files. These read-caps look like the following: - - URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202 - -The next most common is a "directory write-cap", which provides both read and -write access to a directory, and look like this: - - URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq - -There are also "directory read-caps", which start with "URI:DIR2-RO:", and -give read-only access to a directory. Finally there are also mutable file -read- and write- caps, which start with "URI:SSK", and give access to mutable -files. - -(later versions of Tahoe will make these strings shorter, and will remove the -unfortunate colons, which must be escaped when these caps are embedded in -URLs). - -To refer to any Tahoe object through the web API, you simply need to combine -a prefix (which indicates the HTTP server to use) with the cap (which -indicates which object inside that server to access). Since the default Tahoe -webport is 8123, the most common prefix is one that will use a local node -listening on this port: - - http://127.0.0.1:8123/uri/ + $CAP - -So, to access the directory named above (which happens to be the -publically-writable sample directory on the Tahoe test grid, described at -http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be: - - http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/ - -(note that the colons in the directory-cap are url-encoded into "%3A" -sequences). - -Likewise, to access the file named above, use: - - http://127.0.0.1:8123/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202 - -In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap -or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap -that refers to a file (whether mutable or immutable). So those URLs above can -be abbreviated as: - - http://127.0.0.1:8123/uri/$DIRCAP/ - http://127.0.0.1:8123/uri/$FILECAP - -The operation summaries below will abbreviate these further, by eliding the -server prefix. They will be displayed like this: - - /uri/$DIRCAP/ - /uri/$FILECAP - - -=== Child Lookup === - -Tahoe directories contain named children, just like directories in a regular -local filesystem. These children can be either files or subdirectories. - -If you have a Tahoe URL that refers to a directory, and want to reference a -named child inside it, just append the child name to the URL. For example, if -our sample directory contains a file named "welcome.txt", we can refer to -that file with: - - http://127.0.0.1:8123/uri/$DIRCAP/welcome.txt - -(or http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt) - -Multiple levels of subdirectories can be handled this way: - - http://127.0.0.1:8123/uri/$DIRCAP/tahoe-source/docs/webapi.txt - -In this document, when we need to refer to a URL that references a file using -this child-of-some-directory format, we'll use the following string: - - /uri/$DIRCAP/[SUBDIRS../]FILENAME - -The "[SUBDIRS../]" part means that there are zero or more (optional) -subdirectory names in the middle of the URL. The "FILENAME" at the end means -that this whole URL refers to a file of some sort, rather than to a -directory. - -When we need to refer specifically to a directory in this way, we'll write: - - /uri/$DIRCAP/[SUBDIRS../]SUBDIR - - -Note that all components of pathnames in URLs are required to be UTF-8 -encoded, so "resume.doc" (with an acute accent on both E's) would be accessed -with: - - http://127.0.0.1:8123/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc - -Also note that the filenames inside upload POST forms are interpreted using -whatever character set was provided in the conventional '_charset' field, and -defaults to UTF-8 if not otherwise specified. The JSON representation of each -directory contains native unicode strings. Tahoe directories are specified to -contain unicode filenames, and cannot contain binary strings that are not -representable as such. - -All Tahoe operations that refer to existing files or directories must include -a suitable read- or write- cap in the URL: the webapi server won't add one -for you. If you don't know the cap, you can't access the file. This allows -the security properties of Tahoe caps to be extended across the webapi -interface. - -== Slow Operations, Progress, and Cancelling == - -Certain operations can be expected to take a long time. The "t=deep-check", -described below, will recursively visit every file and directory reachable -from a given starting point, which can take minutes or even hours for -extremely large directory structures. A single long-running HTTP request is a -fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient -with waiting and give up on the connection. - -For this reason, long-running operations have an "operation handle", which -can be used to poll for status/progress messages while the operation -proceeds. This handle can also be used to cancel the operation. These handles -are created by the client, and passed in as a an "ophandle=" query argument -to the POST or PUT request which starts the operation. The following -operations can then be used to retrieve status: - -GET /operations/$HANDLE?output=HTML (with or without t=status) -GET /operations/$HANDLE?output=JSON (same) - - These two retrieve the current status of the given operation. Each operation - presents a different sort of information, but in general the page retrieved - will indicate: - - * whether the operation is complete, or if it is still running - * how much of the operation is complete, and how much is left, if possible - - The HTML form will include a meta-refresh tag, which will cause a regular - web browser to reload the status page about 60 seconds later. This tag will - be removed once the operation has completed. - - There may be more status information available under - /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space. - -POST /operations/$HANDLE?t=cancel - - This terminates the operation, and returns an HTML page explaining what was - cancelled. If the operation handle has already expired (see below), this - POST will return a 404, which indicates that the operation is no longer - running (either it was completed or terminated). The response body will be - the same as a GET /operations/$HANDLE on this operation handle, and the - handle will be expired immediately afterwards. - -The operation handle will eventually expire, to avoid consuming an unbounded -amount of memory. The handle's time-to-live can be reset at any time, by -passing a retain-for= argument (with a count of seconds) to either the -initial POST that starts the operation, or the subsequent GET request which -asks about the operation. For example, if a 'GET -/operations/$HANDLE?output=JSON&retain-for=600' query is performed, the -handle will remain active for 600 seconds (10 minutes) after the GET was -received. - -In addition, if the GET includes a release-after-complete=True argument, and -the operation has completed, the operation handle will be released -immediately. - -If a retain-for= argument is not used, the default handle lifetimes are: - - * handles will remain valid at least until their operation finishes - * uncollected handles for finished operations (i.e. handles for operations - which have finished but for which the GET page has not been accessed since - completion) will remain valid for one hour, or for the total time consumed - by the operation, whichever is greater. - * collected handles (i.e. the GET page has been retrieved at least once - since the operation completed) will remain valid for ten minutes. - - -== Programmatic Operations == - -Now that we know how to build URLs that refer to files and directories in a -Tahoe virtual filesystem, what sorts of operations can we do with those URLs? -This section contains a catalog of GET, PUT, DELETE, and POST operations that -can be performed on these URLs. This set of operations are aimed at programs -that use HTTP to communicate with a Tahoe node. The next section describes -operations that are intended for web browsers. - -=== Reading A File === - -GET /uri/$FILECAP -GET /uri/$DIRCAP/[SUBDIRS../]FILENAME - - This will retrieve the contents of the given file. The HTTP response body - will contain the sequence of bytes that make up the file. - - To view files in a web browser, you may want more control over the - Content-Type and Content-Disposition headers. Please see the next section - "Browser Operations", for details on how to modify these URLs for that - purpose. - -=== Writing/Uploading A File === - -PUT /uri/$FILECAP -PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME - - Upload a file, using the data from the HTTP request body, and add whatever - child links and subdirectories are necessary to make the file available at - the given location. Once this operation succeeds, a GET on the same URL will - retrieve the same contents that were just uploaded. This will create any - necessary intermediate subdirectories. - - To use the /uri/$FILECAP form, $FILECAP be a write-cap for a mutable file. - - In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a - writable mutable file, that files contents will be overwritten in-place. If - it is a read-cap for a mutable file, an error will occur. If it is an - immutable file, the old file will be discarded, and a new one will be put in - its place. - - When creating a new file, if "mutable=true" is in the query arguments, the - operation will create a mutable file instead of an immutable one. - - This returns the file-cap of the resulting file. If a new file was created - by this method, the HTTP response code (as dictated by rfc2616) will be set - to 201 CREATED. If an existing file was replaced or modified, the response - code will be 200 OK. - - Note that the 'curl -T localfile http://127.0.0.1:8123/uri/$DIRCAP/foo.txt' - command can be used to invoke this operation. - -PUT /uri - - This uploads a file, and produces a file-cap for the contents, but does not - attach the file into the virtual drive. No directories will be modified by - this operation. The file-cap is returned as the body of the HTTP response. - - If "mutable=true" is in the query arguments, the operation will create a - mutable file, and return its write-cap in the HTTP respose. The default is - to create an immutable file, returning the read-cap as a response. - -=== Creating A New Directory === - -POST /uri?t=mkdir -PUT /uri?t=mkdir - - Create a new empty directory and return its write-cap as the HTTP response - body. This does not make the newly created directory visible from the - virtual drive. The "PUT" operation is provided for backwards compatibility: - new code should use POST. - -POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir -PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir - - Create new directories as necessary to make sure that the named target - ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional - intermediate directories as necessary. If the named target directory already - exists, this will make no changes to it. - - This will return an error if a blocking file is present at any of the parent - names, preventing the server from creating the necessary parent directory. - - The write-cap of the new directory will be returned as the HTTP response - body. - -POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME - - Create a new empty directory and attach it to the given existing directory. - This will create additional intermediate directories as necessary. - - The URL of this form points to the parent of the bottom-most new directory, - whereas the previous form has a URL that points directly to the bottom-most - new directory. - -=== Get Information About A File Or Directory (as JSON) === - -GET /uri/$FILECAP?t=json -GET /uri/$DIRCAP?t=json -GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json -GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json - - This returns a machine-parseable JSON-encoded description of the given - object. The JSON always contains a list, and the first element of the list - is always a flag that indicates whether the referenced object is a file or a - directory. If it is a file, then the information includes file size and URI, - like this: - - GET /uri/$FILECAP?t=json : - GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json : - - [ "filenode", { "ro_uri": file_uri, - "size": bytes, - "mutable": false, - "metadata": {"ctime": 1202777696.7564139, - "mtime": 1202777696.7564139 - } - } ] - - If it is a directory, then it includes information about the children of - this directory, as a mapping from child name to a set of data about the - child (the same data that would appear in a corresponding GET?t=json of the - child itself). The child entries also include metadata about each child, - including creation- and modification- timestamps. The output looks like - this: - - GET /uri/$DIRCAP?t=json : - GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json : - - [ "dirnode", { "rw_uri": read_write_uri, - "ro_uri": read_only_uri, - "mutable": true, - "children": { - "foo.txt": [ "filenode", { "ro_uri": uri, - "size": bytes, - "metadata": { - "ctime": 1202777696.7564139, - "mtime": 1202777696.7564139 - } - } ], - "subdir": [ "dirnode", { "rw_uri": rwuri, - "ro_uri": rouri, - "metadata": { - "ctime": 1202778102.7589991, - "mtime": 1202778111.2160511, - } - } ] - } } ] - - In the above example, note how 'children' is a dictionary in which the keys - are child names and the values depend upon whether the child is a file or a - directory. The value is mostly the same as the JSON representation of the - child object (except that directories do not recurse -- the "children" - entry of the child is omitted, and the directory view includes the metadata - that is stored on the directory edge). - - Then the rw_uri field will be present in the information about a directory - if and only if you have read-write access to that directory, - - -=== Attaching an existing File or Directory by its read- or write- cap === - -PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri - - This attaches a child object (either a file or directory) to a specified - location in the virtual filesystem. The child object is referenced by its - read- or write- cap, as provided in the HTTP request body. This will create - intermediate directories as necessary. - - This is similar to a UNIX hardlink: by referencing a previously-uploaded - file (or previously-created directory) instead of uploading/creating a new - one, you can create two references to the same object. - - The read- or write- cap of the child is provided in the body of the HTTP - request, and this same cap is returned in the response body. - - The default behavior is to overwrite any existing object at the same - location. To prevent this (and make the operation return an error instead of - overwriting), add a "replace=false" argument, as "?t=uri&replace=false". - With replace=false, this operation will return an HTTP 409 "Conflict" error - if there is already an object at the given location, rather than overwriting - the existing object. Note that "true", "t", and "1" are all synonyms for - "True", and "false", "f", and "0" are synonyms for "False". the parameter is - case-insensitive. - -=== Deleting a File or Directory === - -DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME - - This removes the given name from its parent directory. CHILDNAME is the - name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will - be modified. - - Note that this does not actually delete the file or directory that the name - points to from the tahoe grid -- it only removes the named reference from - this directory. If there are other names in this directory or in other - directories that point to the resource, then it will remain accessible - through those paths. Even if all names pointing to this object are removed - from their parent directories, then someone with possession of its read-cap - can continue to access the object through that cap. - - The object will only become completely unreachable once 1: there are no - reachable directories that reference it, and 2: nobody is holding a read- - or write- cap to the object. (This behavior is very similar to the way - hardlinks and anonymous files work in traditional unix filesystems). - - This operation will not modify more than a single directory. Intermediate - directories which were implicitly created by PUT or POST methods will *not* - be automatically removed by DELETE. - - This method returns the file- or directory- cap of the object that was just - removed. - -== Browser Operations == - -This section describes the HTTP operations that provide support for humans -running a web browser. Most of these operations use HTML forms that use POST -to drive the Tahoe node. - -Note that for all POST operations, the arguments listed can be provided -either as URL query arguments or as form body fields. URL query arguments are -separated from the main URL by "?", and from each other by "&". For example, -"POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually -specified by using elements. For clarity, the -descriptions below display the most significant arguments as URL query args. - -=== Viewing A Directory (as HTML) === - -GET /uri/$DIRCAP/[SUBDIRS../] - - This returns an HTML page, intended to be displayed to a human by a web - browser, which contains HREF links to all files and directories reachable - from this directory. These HREF links do not have a t= argument, meaning - that a human who follows them will get pages also meant for a human. It also - contains forms to upload new files, and to delete files and directories. - Those forms use POST methods to do their job. - -=== Viewing/Downloading a File === - -GET /uri/$FILECAP -GET /uri/$DIRCAP/[SUBDIRS../]FILENAME - - This will retrieve the contents of the given file. The HTTP response body - will contain the sequence of bytes that make up the file. - - If you want the HTTP response to include a useful Content-Type header, - either use the second form (which starts with a $DIRCAP), or add a - "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg". - The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information - to determine a Content-Type (since Tahoe immutable files are merely - sequences of bytes, not typed+named file objects). - - If the URL has both filename= and "save=true" in the query arguments, then - the server to add a "Content-Disposition: attachment" header, along with a - filename= parameter. When a user clicks on such a link, most browsers will - offer to let the user save the file instead of displaying it inline (indeed, - most browsers will refuse to display it inline). "true", "t", "1", and other - case-insensitive equivalents are all treated the same. - - Character-set handling in URLs and HTTP headers is a dubious art[1]. For - maximum compatibility, Tahoe simply copies the bytes from the filename= - argument into the Content-Disposition header's filename= parameter, without - trying to interpret them in any particular way. - - -GET /named/$FILECAP/FILENAME - - This is an alternate download form which makes it easier to get the correct - filename. The Tahoe server will provide the contents of the given file, with - a Content-Type header derived from the given filename. This form is used to - get browsers to use the "Save Link As" feature correctly, and also helps - command-line tools like "wget" and "curl" use the right filename. Note that - this form can *only* be used with file caps; it is an error to use a - directory cap after the /named/ prefix. - -=== Get Information About A File Or Directory (as HTML) === - -GET /uri/$FILECAP?t=info -GET /uri/$DIRCAP/?t=info -GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info -GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info - - This returns a human-oriented HTML page with more detail about the selected - file or directory object. This page contains the following items: - - object size - storage index - JSON representation - raw contents (text/plain) - access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects) - check/verify/repair form - deep-check/deep-size/deep-stats/manifest (for directories) - replace-conents form (for mutable files) - -=== Creating a Directory === - -POST /uri?t=mkdir - - This creates a new directory, but does not attach it to the virtual - filesystem. - - If a "redirect_to_result=true" argument is provided, then the HTTP response - will cause the web browser to be redirected to a /uri/$DIRCAP page that - gives access to the newly-created directory. If you bookmark this page, - you'll be able to get back to the directory again in the future. This is the - recommended way to start working with a Tahoe server: create a new unlinked - directory (using redirect_to_result=true), then bookmark the resulting - /uri/$DIRCAP page. There is a "Create Directory" button on the Welcome page - to invoke this action. - - If "redirect_to_result=true" is not provided (or is given a value of - "false"), then the HTTP response body will simply be the write-cap of the - new directory. - -POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME - - This creates a new directory as a child of the designated SUBDIR. This will - create additional intermediate directories as necessary. - - If a "when_done=URL" argument is provided, the HTTP response will cause the - web browser to redirect to the given URL. This provides a convenient way to - return the browser to the directory that was just modified. Without a - when_done= argument, the HTTP response will simply contain the write-cap of - the directory that was just created. - - -=== Uploading a File === - -POST /uri?t=upload - - This uploads a file, and produces a file-cap for the contents, but does not - attach the file into the virtual drive. No directories will be modified by - this operation. - - The file must be provided as the "file" field of an HTML encoded form body, - produced in response to an HTML form like this: -
- - - -
- - If a "when_done=URL" argument is provided, the response body will cause the - browser to redirect to the given URL. If the when_done= URL has the string - "%(uri)s" in it, that string will be replaced by a URL-escaped form of the - newly created file-cap. (Note that without this substitution, there is no - way to access the file that was just uploaded). - - The default (in the absence of when_done=) is to return an HTML page that - describes the results of the upload. This page will contain information - about which storage servers were used for the upload, how long each - operation took, etc. - - If a "mutable=true" argument is provided, the operation will create a - mutable file, and the response body will contain the write-cap instead of - the upload results page. The default is to create an immutable file, - returning the upload results page as a response. - - -POST /uri/$DIRCAP/[SUBDIRS../]?t=upload - - This uploads a file, and attaches it as a new child of the given directory. - The file must be provided as the "file" field of an HTML encoded form body, - produced in response to an HTML form like this: -
- - - -
- - A "name=" argument can be provided to specify the new child's name, - otherwise it will be taken from the "filename" field of the upload form - (most web browsers will copy the last component of the original file's - pathname into this field). To avoid confusion, name= is not allowed to - contain a slash. - - If there is already a child with that name, and it is a mutable file, then - its contents are replaced with the data being uploaded. If it is not a - mutable file, the default behavior is to remove the existing child before - creating a new one. To prevent this (and make the operation return an error - instead of overwriting the old child), add a "replace=false" argument, as - "?t=upload&replace=false". With replace=false, this operation will return an - HTTP 409 "Conflict" error if there is already an object at the given - location, rather than overwriting the existing object. Note that "true", - "t", and "1" are all synonyms for "True", and "false", "f", and "0" are - synonyms for "False". the parameter is case-insensitive. - - This will create additional intermediate directories as necessary, although - since it is expected to be triggered by a form that was retrieved by "GET - /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will - already exist. - - If a "mutable=true" argument is provided, any new file that is created will - be a mutable file instead of an immutable one. will give the user a way to set this option. - - If a "when_done=URL" argument is provided, the HTTP response will cause the - web browser to redirect to the given URL. This provides a convenient way to - return the browser to the directory that was just modified. Without a - when_done= argument, the HTTP response will simply contain the file-cap of - the file that was just uploaded (a write-cap for mutable files, or a - read-cap for immutable files). - -POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload - - This also uploads a file and attaches it as a new child of the given - directory. It is a slight variant of the previous operation, as the URL - refers to the target file rather than the parent directory. It is otherwise - identical: this accepts mutable= and when_done= arguments too. - -POST /uri/$FILECAP?t=upload - -=== Attaching An Existing File Or Directory (by URI) === - -POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP - - This attaches a given read- or write- cap "CHILDCAP" to the designated - directory, with a specified child name. This behaves much like the PUT t=uri - operation, and is a lot like a UNIX hardlink. - - This will create additional intermediate directories as necessary, although - since it is expected to be triggered by a form that was retrieved by "GET - /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will - already exist. - -=== Deleting A Child === - -POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME - - This instructs the node to delete a child object (file or subdirectory) from - the given directory. Note that the entire subtree is removed. This is - somewhat like "rm -rf" (from the point of view of the parent), but other - references into the subtree will see that the child subdirectories are not - modified by this operation. Only the link from the given directory to its - child is severed. - -=== Renaming A Child === - -POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW - - This instructs the node to rename a child of the given directory. This is - exactly the same as removing the child, then adding the same child-cap under - the new name. This operation cannot move the child to a different directory. - - This operation will replace any existing child of the new name, making it - behave like the UNIX "mv -f" command. - -=== Other Utilities === - -GET /uri?uri=$CAP - - This causes a redirect to /uri/$CAP, and retains any additional query - arguments (like filename= or save=). This is for the convenience of web - forms which allow the user to paste in a read- or write- cap (obtained - through some out-of-band channel, like IM or email). - - Note that this form merely redirects to the specific file or directory - indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot - traverse to children by appending additional path segments to the URL. - -GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME - - This provides a useful facility to browser-based user interfaces. It - returns a page containing a form targetting the "POST $DIRCAP t=rename" - functionality described above, with the provided $CHILDNAME present in the - 'from_name' field of that form. I.e. this presents a form offering to - rename $CHILDNAME, requesting the new name, and submitting POST rename. - -GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri - - This returns the file- or directory- cap for the specified object. - -GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri - - This returns a read-only file- or directory- cap for the specified object. - If the object is an immutable file, this will return the same value as - t=uri. - -=== Debugging and Testing Features === - -These URLs are less-likely to be helpful to the casual Tahoe user, and are -mainly intended for developers. - -POST $URL?t=check - - This triggers the FileChecker to determine the current "health" of the - given file or directory, by counting how many shares are available. The - page that is returned will display the results. This can be used as a "show - me detailed information about this file" page. - - If a verify=true argument is provided, the node will perform a more - intensive check, downloading and verifying every single bit of every share. - - If an output=JSON argument is provided, the response will be - machine-readable JSON instead of human-oriented HTML. The data is a - dictionary with the following keys: - - storage-index: a base32-encoded string with the objects's storage index, - or an empty string for LIT files - results: a dictionary that describes the state of the file. For LIT files, - this dictionary has only the 'healthy' key, which will always be - True. For distributed files, this dictionary has the following - keys: - count-shares-good: the number of good shares that were found - count-shares-needed: 'k', the number of shares required for recovery - count-shares-expected: 'N', the number of total shares generated - count-good-share-hosts: the number of distinct storage servers with - good shares. If this number is less than - count-shares-good, then some shares are doubled - up, increasing the correlation of failures. This - indicates that one or more shares should be - moved to an otherwise unused server, if one is - available. - count-wrong-shares: for mutable files, the number of shares for - versions other than the 'best' one (highest - sequence number, highest roothash). These are - either old ... - count-recoverable-versions: for mutable files, the number of - recoverable versions of the file. For - a healthy file, this will equal 1. - count-unrecoverable-versions: for mutable files, the number of - unrecoverable versions of the file. - For a healthy file, this will be 0. - count-corrupt-shares: the number of shares with integrity failures - list-corrupt-shares: a list of "share locators", one for each share - that was found to be corrupt. Each share locator - is a list of (serverid, storage_index, sharenum). - needs-rebalancing: (bool) True if there are multiple shares on a single - storage server, indicating a reduction in reliability - that could be resolved by moving shares to new - servers. - servers-responding: list of base32-encoded storage server identifiers, - one for each server which responded to the share - query. - healthy: (bool) True if the file is completely healthy, False otherwise. - Healthy files have at least N good shares. Overlapping shares - (indicated by count-good-share-hosts < count-shares-good) do not - currently cause a file to be marked unhealthy. If there are at - least N good shares, then corrupt shares do not cause the file to - be marked unhealthy, although the corrupt shares will be listed - in the results (list-corrupt-shares) and should be manually - removed to wasting time in subsequent downloads (as the - downloader rediscovers the corruption and uses alternate shares). - sharemap: dict mapping share identifier to list of serverids - (base32-encoded strings). This indicates which servers are - holding which shares. For immutable files, the shareid is - an integer (the share number, from 0 to N-1). For - immutable files, it is a string of the form - 'seq%d-%s-sh%d', containing the sequence number, the - roothash, and the share number. - -POST $URL?t=start-deep-check (must add &ophandle=XYZ) - - This initiates a recursive walk of all files and directories reachable from - the target, performing a check on each one just like t=check. The result - page will contain a summary of the results, including details on any - file/directory that was not fully healthy. - - t=start-deep-check can only be invoked on a directory. An error (400 - BAD_REQUEST) will be signalled if it is invoked on a file. The recursive - walker will deal with loops safely. - - This accepts the same verify= argument as t=check. - - Since this operation can take a long time (perhaps a second per object), - the ophandle= argument is required (see "Slow Operations, Progress, and - Cancelling" above). The response to this POST will be a redirect to the - corresponding /operations/$HANDLE page (with output=HTML or output=JSON to - match the output= argument given to the POST). The deep-check operation - will continue to run in the background, and the /operations page should be - used to find out when the operation is done. - - Detailed checker results for non-healthy files and directories will be - available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will - contain links to these detailed results. - - The HTML /operations/$HANDLE page for incomplete operations will contain a - meta-refresh tag, set to 60 seconds, so that a browser which uses - deep-check will automatically poll until the operation has completed. - - The JSON page (/options/$HANDLE?output=JSON) will contain a - machine-readable JSON dictionary with the following keys: - - finished: a boolean, True if the operation is complete, else False. Some - of the remaining keys may not be present until the operation - is complete. - root-storage-index: a base32-encoded string with the storage index of the - starting point of the deep-check operation - count-objects-checked: count of how many objects were checked. Note that - non-distributed objects (i.e. small immutable LIT - files) are not checked, since for these objects, - the data is contained entirely in the URI. - count-objects-healthy: how many of those objects were completely healthy - count-objects-unhealthy: how many were damaged in some way - count-corrupt-shares: how many shares were found to have corruption, - summed over all objects examined - list-corrupt-shares: a list of "share identifiers", one for each share - that was found to be corrupt. Each share identifier - is a list of (serverid, storage_index, sharenum). - list-unhealthy-files: a list of (pathname, check-results) tuples, for - each file that was not fully healthy. 'pathname' is - a list of strings (which can be joined by "/" - characters to turn it into a single string), - relative to the directory on which deep-check was - invoked. The 'check-results' field is the same as - that returned by t=check&output=JSON, described - above. - stats: a dictionary with the same keys as the t=deep-stats command - (described below) - -POST $URL?t=check&repair=true - - This performs a health check of the given file or directory, and if the - checker determines that the object is not healthy (some shares are missing - or corrupted), it will perform a "repair". During repair, any missing - shares will be regenerated and uploaded to new servers. - - This accepts the same verify=true argument as t=check. When an output=JSON - argument is provided, the machine-readable JSON response will contain the - following keys: - - storage-index: a base32-encoded string with the objects's storage index, - or an empty string for LIT files - repair-attempted: (bool) True if repair was attempted - repair-successful: (bool) True if repair was attempted and the file was - fully healthy afterwards. False if no repair was - attempted, or if a repair attempt failed. - pre-repair-results: a dictionary that describes the state of the file - before any repair was performed. This contains exactly - the same keys as the 'results' value of the t=check - response, described above. - post-repair-results: a dictionary that describes the state of the file - after any repair was performed. If no repair was - performed, post-repair-results and pre-repair-results - will be the same. This contains exactly the same keys - as the 'results' value of the t=check response, - described above. - -POST $URL?t=start-deep-check&repair=true (must add &ophandle=XYZ) - - This triggers a recursive walk of all files and directories, performing a - t=check&repair=true on each one. - - Like t=start-deep-check without the repair= argument, this can only be - invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it - is invoked on a file. The recursive walker will deal with loops safely. - - This accepts the same verify=true argument as t=start-deep-check. It uses - the same ophandle= mechanism as start-deep-check. When an output=JSON - argument is provided, the response will contain the following keys: - - finished: (bool) True if the operation has completed, else False - root-storage-index: a base32-encoded string with the storage index of the - starting point of the deep-check operation - count-objects-checked: count of how many objects were checked - - count-objects-healthy-pre-repair: how many of those objects were completely - healthy, before any repair - count-objects-unhealthy-pre-repair: how many were damaged in some way - count-objects-healthy-post-repair: how many of those objects were completely - healthy, after any repair - count-objects-unhealthy-post-repair: how many were damaged in some way - - count-repairs-attempted: repairs were attempted on this many objects. - count-repairs-successful: how many repairs resulted in healthy objects - count-repairs-unsuccessful: how many repairs resulted did not results in - completely healthy objects - count-corrupt-shares-pre-repair: how many shares were found to have - corruption, summed over all objects - examined, before any repair - count-corrupt-shares-post-repair: how many shares were found to have - corruption, summed over all objects - examined, after any repair - list-corrupt-shares: a list of "share identifiers", one for each share - that was found to be corrupt (before any repair). - Each share identifier is a list of (serverid, - storage_index, sharenum). - list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares - that were successfully repaired are not - included. These are shares that need - manual processing. Since immutable shares - cannot be modified by clients, all corruption - in immutable shares will be listed here. - list-unhealthy-files: a list of (pathname, check-results) tuples, for - each file that was not fully healthy. 'pathname' is - relative to the directory on which deep-check was - invoked. The 'check-results' field is the same as - that returned by t=check&repair=true&output=JSON, - described above. - stats: a dictionary with the same keys as the t=deep-stats command - (described below) - -POST $DIRURL?t=start-manifest (must add &ophandle=XYZ) - - This operation generates a "manfest" of the given directory tree, mostly - for debugging. This is a table of (path, filecap/dircap), for every object - reachable from the starting directory. The path will be slash-joined, and - the filecap/dircap will contain a link to the object in question. This page - gives immediate access to every object in the virtual filesystem subtree. - - This operation uses the same ophandle= mechanism as deep-check. The - corresponding /operations/$HANDLE page has three different forms. The - default is output=HTML. - - If output=text is added to the query args, the results will be a text/plain - list. The first line is special: it is either "finished: yes" or "finished: - no"; if the operation is not finished, you must periodically reload the - page until it completes. The rest of the results are a plaintext list, with - one file/dir per line, slash-separated, with the filecap/dircap separated - by a space. - - If output=JSON is added to the queryargs, then the results will be a - JSON-formatted dictionary with three keys: - - finished (bool): if False then you must reload the page until True - origin_si (str): the storage index of the starting point - manifest: list of (path, cap) tuples, where path is a list of strings. - -POST $DIRURL?t=start-deep-size (must add &ophandle=XYZ) - - This operation generates a number (in bytes) containing the sum of the - filesize of all directories and immutable files reachable from the given - directory. This is a rough lower bound of the total space consumed by this - subtree. It does not include space consumed by mutable files, nor does it - take expansion or encoding overhead into account. Later versions of the - code may improve this estimate upwards. - - The /operations/$HANDLE status output consists of two lines of text: - - finished: yes - size: 1234 - -POST $DIRURL?t=start-deep-stats (must add &ophandle=XYZ) - - This operation performs a recursive walk of all files and directories - reachable from the given directory, and generates a collection of - statistics about those objects. - - The result (obtained from the /operations/$OPHANDLE page) is a - JSON-serialized dictionary with the following keys (note that some of these - keys may be missing until 'finished' is True): - - finished: (bool) True if the operation has finished, else False - count-immutable-files: count of how many CHK files are in the set - count-mutable-files: same, for mutable files (does not include directories) - count-literal-files: same, for LIT files (data contained inside the URI) - count-files: sum of the above three - count-directories: count of directories - size-immutable-files: total bytes for all CHK files in the set, =deep-size - size-mutable-files (TODO): same, for current version of all mutable files - size-literal-files: same, for LIT files - size-directories: size of directories (includes size-literal-files) - size-files-histogram: list of (minsize, maxsize, count) buckets, - with a histogram of filesizes, 5dB/bucket, - for both literal and immutable files - largest-directory: number of children in the largest directory - largest-immutable-file: number of bytes in the largest CHK file - - size-mutable-files is not implemented, because it would require extra - queries to each mutable file to get their size. This may be implemented in - the future. - - Assuming no sharing, the basic space consumed by a single root directory is - the sum of size-immutable-files, size-mutable-files, and size-directories. - The actual disk space used by the shares is larger, because of the - following sources of overhead: - - integrity data - expansion due to erasure coding - share management data (leases) - backend (ext3) minimum block size - -== Other Useful Pages == - -The portion of the web namespace that begins with "/uri" (and "/named") is -dedicated to giving users (both humans and programs) access to the Tahoe -virtual filesystem. The rest of the namespace provides status information -about the state of the Tahoe node. - -GET / (the root page) - -This is the "Welcome Page", and contains a few distinct sections: - - Node information: library versions, local nodeid, services being provided. - - Filesystem Access Forms: create a new directory, view a file/directory by - URI, upload a file (unlinked), download a file by - URI. - - Grid Status: introducer information, helper information, connected storage - servers. - -GET /status/ - - This page lists all active uploads and downloads, and contains a short list - of recent upload/download operations. Each operation has a link to a page - that describes file sizes, servers that were involved, and the time consumed - in each phase of the operation. - - A GET of /status/?t=json will contain a machine-readable subset of the same - data. It returns a JSON-encoded dictionary. The only key defined at this - time is "active", with a value that is a list of operation dictionaries, one - for each active operation. Once an operation is completed, it will no longer - appear in data["active"] . - - Each op-dict contains a "type" key, one of "upload", "download", - "mapupdate", "publish", or "retrieve" (the first two are for immutable - files, while the latter three are for mutable files and directories). - - The "upload" op-dict will contain the following keys: - - type (string): "upload" - storage-index-string (string): a base32-encoded storage index - total-size (int): total size of the file - status (string): current status of the operation - progress-hash (float): 1.0 when the file has been hashed - progress-ciphertext (float): 1.0 when the file has been encrypted. - progress-encode-push (float): 1.0 when the file has been encoded and - pushed to the storage servers. For helper - uploads, the ciphertext value climbs to 1.0 - first, then encoding starts. For unassisted - uploads, ciphertext and encode-push progress - will climb at the same pace. - - The "download" op-dict will contain the following keys: - - type (string): "download" - storage-index-string (string): a base32-encoded storage index - total-size (int): total size of the file - status (string): current status of the operation - progress (float): 1.0 when the file has been fully downloaded - - Front-ends which want to report progress information are advised to simply - average together all the progress-* indicators. A slightly more accurate - value can be found by ignoring the progress-hash value (since the current - implementation hashes synchronously, so clients will probably never see - progress-hash!=1.0). - -GET /provisioning/ - - This page provides a basic tool to predict the likely storage and bandwidth - requirements of a large Tahoe grid. It provides forms to input things like - total number of users, number of files per user, average file size, number - of servers, expansion ratio, hard drive failure rate, etc. It then provides - numbers like how many disks per server will be needed, how many read - operations per second should be expected, and the likely MTBF for files in - the grid. This information is very preliminary, and the model upon which it - is based still needs a lot of work. - -GET /helper_status/ - - If the node is running a helper (i.e. if "$BASEDIR/run_helper" is - non-empty), then this page will provide a list of all the helper operations - currently in progress. If "?t=json" is added to the URL, it will return a - JSON-formatted list of helper statistics, which can then be used to produce - graphs to indicate how busy the helper is. - -GET /statistics/ - - This page provides "node statistics", which are collected from a variety of - sources. - - load_monitor: every second, the node schedules a timer for one second in - the future, then measures how late the subsequent callback - is. The "load_average" is this tardiness, measured in - seconds, averaged over the last minute. It is an indication - of a busy node, one which is doing more work than can be - completed in a timely fashion. The "max_load" value is the - highest value that has been seen in the last 60 seconds. - - cpu_monitor: every minute, the node uses time.clock() to measure how much - CPU time it has used, and it uses this value to produce - 1min/5min/15min moving averages. These values range from 0% - (0.0) to 100% (1.0), and indicate what fraction of the CPU - has been used by the Tahoe node. Not all operating systems - provide meaningful data to time.clock(): they may report 100% - CPU usage at all times. - - uploader: this counts how many immutable files (and bytes) have been - uploaded since the node was started - - downloader: this counts how many immutable files have been downloaded - since the node was started - - publishes: this counts how many mutable files (including directories) have - been modified since the node was started - - retrieves: this counts how many mutable files (including directories) have - been read since the node was started - - There are other statistics that are tracked by the node. The "raw stats" - section shows a formatted dump of all of them. - - By adding "?t=json" to the URL, the node will return a JSON-formatted - dictionary of stats values, which can be used by other tools to produce - graphs of node behavior. The misc/munin/ directory in the source - distribution provides some tools to produce these graphs. - -GET / (introducer status) - - For Introducer nodes, the welcome page displays information about both - clients and servers which are connected to the introducer. Servers make - "service announcements", and these are listed in a table. Clients will - subscribe to hear about service announcements, and these subscriptions are - listed in a separate table. Both tables contain information about what - version of Tahoe is being run by the remote node, their advertised and - outbound IP addresses, their nodeid and nickname, and how long they have - been available. - - By adding "?t=json" to the URL, the node will return a JSON-formatted - dictionary of stats values, which can be used to produce graphs of connected - clients over time. - - -== Static Files in /public_html == - -The webapi server will take any request for a URL that starts with /static -and serve it from a configurable directory which defaults to -$BASEDIR/public_html . This is configured by setting the "[node]web.static" -value in $BASEDIR/tahoe.cfg . If this is left at the default value of -"public_html", then http://localhost:8123/static/subdir/foo.html will be -served with the contents of the file $BASEDIR/public_html/subdir/foo.html . - -This can be useful to serve a javascript application which provides a -prettier front-end to the rest of the Tahoe webapi. - - -== safety and security issues -- names vs. URIs == - -Summary: use explicit file- and dir- caps whenever possible, to reduce the -potential for surprises when the virtual drive is changed while you aren't -looking. - -The vdrive provides a mutable filesystem, but the ways that the filesystem -can change are limited. The only thing that can change is that the mapping -from child names to child objects that each directory contains can be changed -by adding a new child name pointing to an object, removing an existing child -name, or changing an existing child name to point to a different object. - -Obviously if you query tahoe for information about the filesystem and then -act upon the filesystem (such as by getting a listing of the contents of a -directory and then adding a file to the directory), then the filesystem might -have been changed after you queried it and before you acted upon it. -However, if you use the URI instead of the pathname of an object when you act -upon the object, then the only change that can happen is when the object is a -directory then the set of child names it has might be different. If, on the -other hand, you act upon the object using its pathname, then a different -object might be in that place, which can result in more kinds of surprises. - -For example, suppose you are writing code which recursively downloads the -contents of a directory. The first thing your code does is fetch the listing -of the contents of the directory. For each child that it fetched, if that -child is a file then it downloads the file, and if that child is a directory -then it recurses into that directory. Now, if the download and the recurse -actions are performed using the child's name, then the results might be -wrong, because for example a child name that pointed to a sub-directory when -you listed the directory might have been changed to point to a file (in which -case your attempt to recurse into it would result in an error and the file -would be skipped), or a child name that pointed to a file when you listed the -directory might now point to a sub-directory (in which case your attempt to -download the child would result in a file containing HTML text describing the -sub-directory!). - -If your recursive algorithm uses the uri of the child instead of the name of -the child, then those kinds of mistakes just can't happen. Note that both the -child's name and the child's URI are included in the results of listing the -parent directory, so it isn't any harder to use the URI for this purpose. - -In general, use names if you want "whatever object (whether file or -directory) is found by following this name (or sequence of names) when my -request reaches the server". Use URIs if you want "this particular object". - -== Concurrency Issues == - -Tahoe uses both mutable and immutable files. Mutable files can be created -explicitly by doing an upload with ?mutable=true added, or implicitly by -creating a new directory (since a directory is just a special way to -interpret a given mutable file). - -Mutable files suffer from the same consistency-vs-availability tradeoff that -all distributed data storage systems face. It is not possible to -simultaneously achieve perfect consistency and perfect availability in the -face of network partitions (servers being unreachable or faulty). - -Tahoe tries to achieve a reasonable compromise, but there is a basic rule in -place, known as the Prime Coordination Directive: "Don't Do That". What this -means is that if write-access to a mutable file is available to several -parties, then those parties are responsible for coordinating their activities -to avoid multiple simultaneous updates. This could be achieved by having -these parties talk to each other and using some sort of locking mechanism, or -by serializing all changes through a single writer. - -The consequences of performing uncoordinated writes can vary. Some of the -writers may lose their changes, as somebody else wins the race condition. In -many cases the file will be left in an "unhealthy" state, meaning that there -are not as many redundant shares as we would like (reducing the reliability -of the file against server failures). In the worst case, the file can be left -in such an unhealthy state that no version is recoverable, even the old ones. -It is this small possibility of data loss that prompts us to issue the Prime -Coordination Directive. - -Tahoe nodes implement internal serialization to make sure that a single Tahoe -node cannot conflict with itself. For example, it is safe to issue two -directory modification requests to a single tahoe node's webapi server at the -same time, because the Tahoe node will internally delay one of them until -after the other has finished being applied. (This feature was introduced in -Tahoe-1.1; back with Tahoe-1.0 the web client was responsible for serializing -web requests themselves). - -For more details, please see the "Consistency vs Availability" and "The Prime -Coordination Directive" sections of mutable.txt, in the same directory as -this file. - - -[1]: URLs and HTTP and UTF-8, Oh My - - HTTP does not provide a mechanism to specify the character set used to - encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that - the filename= argument shall be a URL-encoded UTF-8 encoded unicode object. - For example, suppose we want to provoke the server into using a filename of - "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this - is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's - repr() function would show). To encode this into a URL, the non-printable - characters must be escaped with the urlencode '%XX' mechansim, giving us - "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET - /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers - provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e. - - The response header will need to indicate a non-ASCII filename. The actual - mechanism to do this is not clear. For ASCII filenames, the response header - would look like: - - Content-Disposition: attachment; filename="english.txt" - - If Tahoe were to enforce the utf-8 convention, it would need to decode the - URL argument into a unicode string, and then encode it back into a sequence - of bytes when creating the response header. One possibility would be to use - unencoded utf-8. Developers suggest that IE7 might accept this: - - #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e" - (note, the last four bytes of that line, not including the newline, are - 0xC3 0xA9 0x65 0x22) - - RFC2231#4 (dated 1997): suggests that the following might work, and some - developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that - it is supported by firefox (but not IE7): - - #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e - - My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that - the filename= parameter is defined to be wrapped in quotes (presumeably to - allow spaces without breaking the parsing of subsequent parameters), which - would give us: - - #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e" - - However this is contrary to the examples in the email thread listed above. - - Developers report that IE7 (when it is configured for UTF-8 URL encoding, - which is not the default in asian countries), will accept: - - #4: Content-Disposition: attachment; filename=fianc%C3%A9e - - However, for maximum compatibility, Tahoe simply copies bytes from the URL - into the response header, rather than enforcing the utf-8 convention. This - means it does not try to decode the filename from the URL argument, nor does - it encode the filename into the response header.