--- /dev/null
+= Tahoe FTP Frontend =
+
+All Tahoe client nodes can run a frontend FTP server, allowing regular FTP
+clients to access the virtual filesystem.
+
+Since Tahoe does not use user accounts or passwords, the FTP server must be
+configured with a way to translate USER+PASS into a root directory cap. Two
+mechanisms are provided. The first is a simple flat file with one account per
+line. The second is an HTTP-based login mechanism, backed by a simple PHP
+script and a database. The latter form is used by allmydata.com to provide
+secure access to customer rootcaps.
+
+== Configuring an Account File ==
+
+To configure the first form, create a file (probably in
+BASEDIR/private/ftp.accounts) in which each non-comment/non-blank line is a
+space-separated line of (USERNAME, PASSWORD, ROOTCAP), like so:
+
+ % cat BASEDIR/private/ftp.accounts
+ # This is a password file, (username, password, rootcap)
+ alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
+ bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
+
+Then add the following lines to the BASEDIR/tahoe.cfg file:
+
+ [ftpd]
+ enabled = true
+ ftp.port = 8021
+ ftp.accounts.file = private/ftp.accounts
+
+The FTP server will listen on the given port number. The ftp.accounts.file
+pathname will be interpreted relative to the node's BASEDIR.
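+
+As an illustration of the file format only (this is not the parser the node
+itself uses), the account file above could be read with a sketch like the
+following. The function name parse_ftp_accounts is hypothetical, and treating
+"#" as the comment marker is an assumption based on the sample file.
+
```python
def parse_ftp_accounts(text):
    """Map each username to its (password, rootcap) pair."""
    accounts = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with "#").
        if not line or line.startswith("#"):
            continue
        # Each remaining line is: USERNAME PASSWORD ROOTCAP
        username, password, rootcap = line.split()
        accounts[username] = (password, rootcap)
    return accounts


sample = """\
# This is a password file, (username, password, rootcap)
alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
"""
accounts = parse_ftp_accounts(sample)
```
+
+Because fields are separated by whitespace, a password in this format cannot
+itself contain a space.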
+
+== Configuring an Account Server ==
+
+Determine the URL of the account server, say https://example.com/login . Then
+add the following lines to BASEDIR/tahoe.cfg:
+
+ [ftpd]
+ enabled = true
+ ftp.port = 8021
+ ftp.accounts.url = https://example.com/login
+
+== Dependencies ==
+
+The FTP server requires code in Twisted that enables asynchronous closing of
+file-upload operations. This code was not in the Twisted-8.1.0 release, and
+has not been committed to SVN trunk as of r24943. So it may be necessary to
+apply the following patch. The Tahoe node refuses to start the FTP server if
+it detects that this patch has not been applied.
+
+Index: twisted/protocols/ftp.py
+===================================================================
+--- twisted/protocols/ftp.py (revision 24956)
++++ twisted/protocols/ftp.py (working copy)
+@@ -1049,7 +1049,6 @@
+ cons = ASCIIConsumerWrapper(cons)
+
+ d = self.dtpInstance.registerConsumer(cons)
+- d.addCallbacks(cbSent, ebSent)
+
+ # Tell them what to doooo
+ if self.dtpInstance.isConnected:
+@@ -1062,6 +1061,8 @@
+ def cbOpened(file):
+ d = file.receive()
+ d.addCallback(cbConsumer)
++ d.addCallback(lambda ignored: file.close())
++ d.addCallbacks(cbSent, ebSent)
+ return d
+
+ def ebOpened(err):
+@@ -1434,7 +1435,14 @@
+ @rtype: C{Deferred} of C{IConsumer}
+ """
+
++ def close():
++ """
++ Perform any post-write work that needs to be done. This method may
++ only be invoked once on each provider, and will always be invoked
++ after receive().
+
++ @rtype: C{Deferred} of anything: the value is ignored
++ """
+
+ def _getgroups(uid):
+ """Return the primary and supplementary groups for the given UID.
+@@ -1795,6 +1803,8 @@
+ # FileConsumer will close the file object
+ return defer.succeed(FileConsumer(self.fObj))
+
++ def close(self):
++ return defer.succeed(None)
+
+
+ class FTPRealm:
+Index: twisted/vfs/adapters/ftp.py
+===================================================================
+--- twisted/vfs/adapters/ftp.py (revision 24956)
++++ twisted/vfs/adapters/ftp.py (working copy)
+@@ -295,6 +295,11 @@
+ """
+ return defer.succeed(IConsumer(self.node))
+
++ def close(self):
++ """
++ Perform post-write actions.
++ """
++ return defer.succeed(None)
+
+
+ class _FileToConsumerAdapter(object):
--- /dev/null
+= Tahoe SFTP Frontend =
+
+All Tahoe client nodes can run a frontend SFTP server, allowing regular SFTP
+clients to access the virtual filesystem.
+
+Since Tahoe does not use user accounts or passwords, the SFTP server must be
+configured with a way to translate a username (and either a password or
+public key) into a root directory cap. Two mechanisms are provided. The first
+is a simple flat file with one account per line. The second is an HTTP-based
+login mechanism, backed by a simple PHP script and a database. The latter form
+is used by allmydata.com to provide secure access to customer rootcaps.
+
+The SFTP server must also be given a public/private host keypair.
+
+== Configuring a Keypair ==
+
+First, generate a keypair for your server:
+
+ % cd BASEDIR
+ % ssh-keygen -f private/ssh_host_rsa_key
+
+You will then use the following lines in the tahoe.cfg file:
+
+ [sftpd]
+ sftp.host_pubkey_file = private/ssh_host_rsa_key.pub
+ sftp.host_privkey_file = private/ssh_host_rsa_key
+
+== Configuring an Account File ==
+
+To configure the first form, create a file (probably in
+BASEDIR/private/sftp.accounts) in which each non-comment/non-blank line is a
+space-separated line of (USERNAME, PASSWORD/PUBKEY, ROOTCAP), like so:
+
+[TODO: the PUBKEY form is not yet supported]
+
+ % cat BASEDIR/private/sftp.accounts
+ # This is a password file, (username, password/pubkey, rootcap)
+ alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
+ bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
+ carol ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAv2xHRVBoXnwxHLzthRD1wOWtyZ08b8n9cMZfJ58CBdBwAYP2NVNXc0XjRvswm5hnnAO+jyWPVNpXJjm9XllzYhODSNtSN+TXuJlUjhzA/T+ZwdgsgSAeHuuMQBoWt4Qc9HV6rHCdAeMhcnyqm6Q0sRAsfA/wfwiIgbvE7+cWpFa2anB6WeAnvK8+dMN0nvnkPE7GNyf/WFR1Ffuh9ifKdRB6yDNp17bQAqA3OWSFjch6fGPhp94y4g2jmTHlEUTyVsilgGqvGOutOVYnmOMnFijugU1Vu33G39GGzXWla6+fXwTk/oiVPiCYD7A7WFKes3nqMg8iVN6a6sxujrhnHQ== warner@fluxx URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
+
+Note that if the second word of the line is "ssh-rsa" or "ssh-dss", the rest
+of the line is parsed differently, so users cannot have a password equal to
+either of these strings.
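+
+The rule above can be sketched as follows. This is a guess at the layout, not
+the node's own parser: the text does not say how the rest of a key line is
+parsed, so the sketch assumes the key data follows the key type, an optional
+comment may follow the key data, and the rootcap is the last field. The
+function name and the shortened caps are hypothetical.
+
```python
KEY_TYPES = ("ssh-rsa", "ssh-dss")


def parse_sftp_account_line(line):
    """Return (username, kind, credential, rootcap), or None if not an account."""
    line = line.strip()
    # Blank lines and comment lines (assumed to start with "#") carry no account.
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    username = fields[0]
    if fields[1] in KEY_TYPES:
        # Assumed layout: USERNAME KEYTYPE KEYDATA [COMMENT] ROOTCAP
        credential = fields[1] + " " + fields[2]
        return (username, "pubkey", credential, fields[-1])
    # Otherwise the layout is: USERNAME PASSWORD ROOTCAP
    return (username, "password", fields[1], fields[2])


alice = parse_sftp_account_line("alice password URI:DIR2:aaaa:bbbb")
carol = parse_sftp_account_line(
    "carol ssh-rsa AAAAB3Nza== warner@fluxx URI:DIR2:cccc:dddd")
```
+
+The check on the second word is what makes "ssh-rsa" and "ssh-dss" unusable
+as passwords.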
+
+Then add the following lines to the BASEDIR/tahoe.cfg file:
+
+ [sftpd]
+ enabled = true
+ sftp.port = 8022
+ sftp.host_pubkey_file = private/ssh_host_rsa_key.pub
+ sftp.host_privkey_file = private/ssh_host_rsa_key
+ sftp.accounts.file = private/sftp.accounts
+
+The SFTP server will listen on the given port number. The sftp.accounts.file
+pathname will be interpreted relative to the node's BASEDIR.
+
+== Configuring an Account Server ==
+
+Determine the URL of the account server, say https://example.com/login . Then
+add the following lines to BASEDIR/tahoe.cfg:
+
+ [sftpd]
+ enabled = true
+ sftp.port = 8022
+ sftp.host_pubkey_file = private/ssh_host_rsa_key.pub
+ sftp.host_privkey_file = private/ssh_host_rsa_key
+ sftp.accounts.url = https://example.com/login
+
+== Dependencies ==
+
+The Tahoe SFTP server requires the Twisted "Conch" component, which itself
+requires the pycrypto package (note that pycrypto is distinct from the
+pycryptopp that Tahoe uses).
--- /dev/null
+
+= The Tahoe REST-ful Web API =
+
+1. Enabling the web-API port
+2. Basic Concepts: GET, PUT, DELETE, POST
+3. URLs, Machine-Oriented Interfaces
+4. Browser Operations: Human-Oriented Interfaces
+5. Welcome / Debug / Status pages
+6. Static Files in /public_html
+7. Safety and security issues -- names vs. URIs
+8. Concurrency Issues
+
+
+== Enabling the web-API port ==
+
+Every Tahoe node is capable of running a built-in HTTP server. To enable
+this, just write a port number into a file named "webport" in the node's base
+directory. For example, writing "8123" into $NODEDIR/webport will cause the
+node to run a webserver on port 8123.
+
+This string is actually a Twisted "strports" specification, meaning you can
+get more control over the interface to which the server binds by supplying
+additional arguments. For more details, see the documentation on
+twisted.application.strports:
+http://twistedmatrix.com/documents/current/api/twisted.application.strports.html
+
+Writing "tcp:8123:interface=127.0.0.1" into $NODEDIR/webport does the same
+but binds to the loopback interface, ensuring that only the programs on the
+local host can connect. Using
+"ssl:8123:privateKey=mykey.pem:certKey=cert.pem" runs an SSL server.
+
+This webport can be set when the node is created by passing a --webport
+option to the 'tahoe create-client' command. By default, the node listens on
+port 8123, on the loopback (127.0.0.1) interface.
+
+== Basic Concepts ==
+
+As described in architecture.txt, each file and directory in a Tahoe virtual
+filesystem is referenced by an identifier that combines the designation of
+the object with the authority to do something with it (such as read or modify
+the contents). This identifier is called a "read-cap" or "write-cap",
+depending upon whether it enables read-only or read-write access. These
+"caps" are also referred to as URIs.
+
+The Tahoe web-based API is "REST-ful", meaning it implements the concepts of
+"REpresentational State Transfer": the original scheme by which the World
+Wide Web was intended to work. Each object (file or directory) is referenced
+by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and
+DELETE) are used to manipulate these objects. You can think of the URL as a
+noun, and the method as a verb.
+
+In REST, the GET method is used to retrieve information about an object, or
+to retrieve some representation of the object itself. When the object is a
+file, the basic GET method will simply return the contents of that file.
+Other variations (generally implemented by adding query parameters to the
+URL) will return information about the object, such as metadata. GET
+operations are required to have no side-effects.
+
+PUT is used to upload new objects into the filesystem, or to replace an
+existing object. DELETE is used to delete objects from the filesystem. Both
+PUT and DELETE are required to be idempotent: performing the same operation
+multiple times must have the same side-effects as only performing it once.
+
+POST is used for more complicated actions that cannot be expressed as a GET,
+PUT, or DELETE. POST operations can be thought of as a method call: sending
+some message to the object referenced by the URL. In Tahoe, POST is also used
+for operations that must be triggered by an HTML form (including upload and
+delete), because otherwise a regular web browser has no way to accomplish
+these tasks.
+
+Tahoe's web API is designed for two different consumers. The first is a
+program that needs to manipulate the virtual file system. Such programs are
+expected to use the RESTful interface described above. The second is a human
+using a standard web browser to work with the filesystem. This user is given
+a series of HTML pages with links to download files, and forms that use POST
+actions to upload, rename, and delete files.
+
+== URLs ==
+
+Tahoe uses a variety of read- and write- caps to identify files and
+directories. The most common of these is the "immutable file read-cap", which
+is used for most uploaded files. These read-caps look like the following:
+
+ URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202
+
+The next most common is a "directory write-cap", which provides both read and
+write access to a directory, and looks like this:
+
+ URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
+
+There are also "directory read-caps", which start with "URI:DIR2-RO:", and
+give read-only access to a directory. Finally there are also mutable file
+read- and write- caps, which start with "URI:SSK", and give access to mutable
+files.
+
+(later versions of Tahoe will make these strings shorter, and will remove the
+unfortunate colons, which must be escaped when these caps are embedded in
+URLs).
+
+To refer to any Tahoe object through the web API, you simply need to combine
+a prefix (which indicates the HTTP server to use) with the cap (which
+indicates which object inside that server to access). Since the default Tahoe
+webport is 8123, the most common prefix is one that will use a local node
+listening on this port:
+
+ http://127.0.0.1:8123/uri/ + $CAP
+
+So, to access the directory named above (which happens to be the
+publicly-writable sample directory on the Tahoe test grid, described at
+http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be:
+
+ http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/
+
+(note that the colons in the directory-cap are url-encoded into "%3A"
+sequences).
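+
+A program can build such a URL with a standard URL-quoting routine. The
+following is a minimal sketch using Python's urllib; the helper name
+cap_to_url is hypothetical, and the prefix assumes a local node on the
+default webport.
+
```python
from urllib.parse import quote


def cap_to_url(cap, prefix="http://127.0.0.1:8123/uri/"):
    # safe="" makes quote() escape every reserved character, including ":".
    return prefix + quote(cap, safe="")


dircap = ("URI:DIR2:djrdkfawoqihigoett4g6auz6a:"
          "jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq")
# A trailing slash marks the URL as referring to a directory.
dir_url = cap_to_url(dircap) + "/"
```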
+
+Likewise, to access the file named above, use:
+
+ http://127.0.0.1:8123/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202
+
+In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap
+or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap
+that refers to a file (whether mutable or immutable). So those URLs above can
+be abbreviated as:
+
+ http://127.0.0.1:8123/uri/$DIRCAP/
+ http://127.0.0.1:8123/uri/$FILECAP
+
+The operation summaries below will abbreviate these further, by eliding the
+server prefix. They will be displayed like this:
+
+ /uri/$DIRCAP/
+ /uri/$FILECAP
+
+
+=== Child Lookup ===
+
+Tahoe directories contain named children, just like directories in a regular
+local filesystem. These children can be either files or subdirectories.
+
+If you have a Tahoe URL that refers to a directory, and want to reference a
+named child inside it, just append the child name to the URL. For example, if
+our sample directory contains a file named "welcome.txt", we can refer to
+that file with:
+
+ http://127.0.0.1:8123/uri/$DIRCAP/welcome.txt
+
+(or http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)
+
+Multiple levels of subdirectories can be handled this way:
+
+ http://127.0.0.1:8123/uri/$DIRCAP/tahoe-source/docs/webapi.txt
+
+In this document, when we need to refer to a URL that references a file using
+this child-of-some-directory format, we'll use the following string:
+
+ /uri/$DIRCAP/[SUBDIRS../]FILENAME
+
+The "[SUBDIRS../]" part means that there are zero or more (optional)
+subdirectory names in the middle of the URL. The "FILENAME" at the end means
+that this whole URL refers to a file of some sort, rather than to a
+directory.
+
+When we need to refer specifically to a directory in this way, we'll write:
+
+ /uri/$DIRCAP/[SUBDIRS../]SUBDIR
+
+
+Note that all components of pathnames in URLs are required to be UTF-8
+encoded, so "resume.doc" (with an acute accent on both E's) would be accessed
+with:
+
+ http://127.0.0.1:8123/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc
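+
+The same encoding can be produced programmatically. This sketch assumes
+Python's urllib, whose quote() encodes text as UTF-8 by default; the helper
+name child_url is hypothetical.
+
```python
from urllib.parse import quote


def child_url(dir_url, *names):
    """Append percent-encoded child names to a directory URL ending in "/"."""
    # Each name is encoded as UTF-8 and then percent-escaped; safe="" leaves
    # no reserved character unescaped inside a name.
    return dir_url + "/".join(quote(name, safe="") for name in names)


# "r\u00e9sum\u00e9.doc" is "resume.doc" with an acute accent on both E's.
url = child_url("http://127.0.0.1:8123/uri/$DIRCAP/", "r\u00e9sum\u00e9.doc")
```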
+
+Also note that the filenames inside upload POST forms are interpreted using
+whatever character set was provided in the conventional '_charset' field,
+which defaults to UTF-8 if not specified. The JSON representation of each
+directory contains native unicode strings. Tahoe directories are specified to
+contain unicode filenames, and cannot contain binary strings that are not
+representable as such.
+
+All Tahoe operations that refer to existing files or directories must include
+a suitable read- or write- cap in the URL: the webapi server won't add one
+for you. If you don't know the cap, you can't access the file. This allows
+the security properties of Tahoe caps to be extended across the webapi
+interface.
+
+== Slow Operations, Progress, and Cancelling ==
+
+Certain operations can be expected to take a long time. The "t=deep-check",
+described below, will recursively visit every file and directory reachable
+from a given starting point, which can take minutes or even hours for
+extremely large directory structures. A single long-running HTTP request is a
+fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient
+with waiting and give up on the connection.
+
+For this reason, long-running operations have an "operation handle", which
+can be used to poll for status/progress messages while the operation
+proceeds. This handle can also be used to cancel the operation. These handles
+are created by the client, and passed in as an "ophandle=" query argument
+to the POST or PUT request which starts the operation. The following
+operations can then be used to retrieve status:
+
+GET /operations/$HANDLE?output=HTML (with or without t=status)
+GET /operations/$HANDLE?output=JSON (same)
+
+ These two retrieve the current status of the given operation. Each operation
+ presents a different sort of information, but in general the page retrieved
+ will indicate:
+
+ * whether the operation is complete, or if it is still running
+ * how much of the operation is complete, and how much is left, if possible
+
+ The HTML form will include a meta-refresh tag, which will cause a regular
+ web browser to reload the status page about 60 seconds later. This tag will
+ be removed once the operation has completed.
+
+ There may be more status information available under
+ /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space.
+
+POST /operations/$HANDLE?t=cancel
+
+ This terminates the operation, and returns an HTML page explaining what was
+ cancelled. If the operation handle has already expired (see below), this
+ POST will return a 404, which indicates that the operation is no longer
+ running (either it was completed or terminated). The response body will be
+ the same as a GET /operations/$HANDLE on this operation handle, and the
+ handle will be expired immediately afterwards.
+
+The operation handle will eventually expire, to avoid consuming an unbounded
+amount of memory. The handle's time-to-live can be reset at any time, by
+passing a retain-for= argument (with a count of seconds) to either the
+initial POST that starts the operation, or the subsequent GET request which
+asks about the operation. For example, if a 'GET
+/operations/$HANDLE?output=JSON&retain-for=600' query is performed, the
+handle will remain active for 600 seconds (10 minutes) after the GET was
+received.
+
+In addition, if the GET includes a release-after-complete=True argument, and
+the operation has completed, the operation handle will be released
+immediately.
+
+If a retain-for= argument is not used, the default handle lifetimes are:
+
+ * handles will remain valid at least until their operation finishes
+ * uncollected handles for finished operations (i.e. handles for operations
+ which have finished but for which the GET page has not been accessed since
+ completion) will remain valid for one hour, or for the total time consumed
+ by the operation, whichever is greater.
+ * collected handles (i.e. the GET page has been retrieved at least once
+ since the operation completed) will remain valid for ten minutes.
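+
+A client-side polling loop for these handles might look like the sketch
+below. It is an illustration under stated assumptions: the helper names are
+hypothetical, and the boolean "finished" field in the JSON status is assumed,
+since the text above only says that the status indicates whether the
+operation is complete.
+
```python
import json
import time
import urllib.request


def status_url(base, handle, retain_for=None):
    """Build the JSON status URL for a client-chosen operation handle."""
    url = "%s/operations/%s?output=JSON" % (base, handle)
    if retain_for is not None:
        # Resets the handle's time-to-live, counted in seconds.
        url += "&retain-for=%d" % retain_for
    return url


def fetch_json(url):
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_operation(base, handle, fetch=fetch_json, interval=5.0,
                       sleep=time.sleep):
    """Poll until the status reports completion, then return that status."""
    while True:
        status = fetch(status_url(base, handle, retain_for=600))
        # Assumption: completion is reported as a true "finished" field.
        if status.get("finished"):
            return status
        sleep(interval)
```
+
+Passing retain-for=600 on every poll keeps the handle alive for ten minutes
+after the most recent GET, so it should not expire between polls.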
+
+
+== Programmatic Operations ==
+
+Now that we know how to build URLs that refer to files and directories in a
+Tahoe virtual filesystem, what sorts of operations can we do with those URLs?
+This section contains a catalog of GET, PUT, DELETE, and POST operations that
+can be performed on these URLs. This set of operations is aimed at programs
+that use HTTP to communicate with a Tahoe node. The next section describes
+operations that are intended for web browsers.
+
+=== Reading A File ===
+
+GET /uri/$FILECAP
+GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
+
+ This will retrieve the contents of the given file. The HTTP response body
+ will contain the sequence of bytes that make up the file.
+
+ To view files in a web browser, you may want more control over the
+ Content-Type and Content-Disposition headers. Please see the next section,
+ "Browser Operations", for details on how to modify these URLs for that
+ purpose.
+
+=== Writing/Uploading A File ===
+
+PUT /uri/$FILECAP
+PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME
+
+ Upload a file, using the data from the HTTP request body, and add whatever
+ child links and subdirectories are necessary to make the file available at
+ the given location. Once this operation succeeds, a GET on the same URL will
+ retrieve the same contents that were just uploaded. This will create any
+ necessary intermediate subdirectories.
+
+ To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable
+ file.
+
+ In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
+ writable mutable file, that file's contents will be overwritten in-place. If
+ it is a read-cap for a mutable file, an error will occur. If it is an
+ immutable file, the old file will be discarded, and a new one will be put in
+ its place.
+
+ When creating a new file, if "mutable=true" is in the query arguments, the
+ operation will create a mutable file instead of an immutable one.
+
+ This returns the file-cap of the resulting file. If a new file was created
+ by this method, the HTTP response code (as dictated by rfc2616) will be set
+ to 201 CREATED. If an existing file was replaced or modified, the response
+ code will be 200 OK.
+
+ Note that the 'curl -T localfile http://127.0.0.1:8123/uri/$DIRCAP/foo.txt'
+ command can be used to invoke this operation.
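+
+ The same PUT can be issued from a program. The following is a minimal
+ sketch using Python's urllib; the helper names are hypothetical, and it
+ assumes that the file-cap arrives as the response body.
+
```python
import urllib.request


def build_put_request(url, data):
    # method="PUT" overrides urllib's default of POST for requests with a body.
    return urllib.request.Request(url, data=data, method="PUT")


def upload(url, data):
    """PUT the bytes; return (status code, response body as text)."""
    with urllib.request.urlopen(build_put_request(url, data)) as response:
        # 201 means a new file was created, 200 that one was replaced.
        return response.status, response.read().decode("utf-8")


# $DIRCAP stands for a real, URL-escaped directory cap.
request = build_put_request("http://127.0.0.1:8123/uri/$DIRCAP/foo.txt",
                            b"hello")
```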
+
+PUT /uri
+
+ This uploads a file, and produces a file-cap for the contents, but does not
+ attach the file into the virtual drive. No directories will be modified by
+ this operation. The file-cap is returned as the body of the HTTP response.
+
+ If "mutable=true" is in the query arguments, the operation will create a
+ mutable file, and return its write-cap in the HTTP response. The default is
+ to create an immutable file, returning the read-cap as a response.
+
+=== Creating A New Directory ===
+
+POST /uri?t=mkdir
+PUT /uri?t=mkdir
+
+ Create a new empty directory and return its write-cap as the HTTP response
+ body. This does not make the newly created directory visible from the
+ virtual drive. The "PUT" operation is provided for backwards compatibility:
+ new code should use POST.
+
+POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
+PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
+
+ Create new directories as necessary to make sure that the named target
+ ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
+ intermediate directories as necessary. If the named target directory already
+ exists, this will make no changes to it.
+
+ This will return an error if a blocking file is present at any of the parent
+ names, preventing the server from creating the necessary parent directory.
+
+ The write-cap of the new directory will be returned as the HTTP response
+ body.
+
+POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME
+
+ Create a new empty directory and attach it to the given existing directory.
+ This will create additional intermediate directories as necessary.
+
+ The URL of this form points to the parent of the bottom-most new directory,
+ whereas the previous form has a URL that points directly to the bottom-most
+ new directory.
+
+=== Get Information About A File Or Directory (as JSON) ===
+
+GET /uri/$FILECAP?t=json
+GET /uri/$DIRCAP?t=json
+GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json
+GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json
+
+ This returns a machine-parseable JSON-encoded description of the given
+ object. The JSON always contains a list, and the first element of the list
+ is always a flag that indicates whether the referenced object is a file or a
+ directory. If it is a file, then the information includes file size and URI,
+ like this:
+
+ GET /uri/$FILECAP?t=json :
+ GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json :
+
+ [ "filenode", { "ro_uri": file_uri,
+ "size": bytes,
+ "mutable": false,
+ "metadata": {"ctime": 1202777696.7564139,
+ "mtime": 1202777696.7564139
+ }
+ } ]
+
+ If it is a directory, then it includes information about the children of
+ this directory, as a mapping from child name to a set of data about the
+ child (the same data that would appear in a corresponding GET?t=json of the
+ child itself). The child entries also include metadata about each child,
+ including creation- and modification- timestamps. The output looks like
+ this:
+
+ GET /uri/$DIRCAP?t=json :
+ GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :
+
+ [ "dirnode", { "rw_uri": read_write_uri,
+ "ro_uri": read_only_uri,
+ "mutable": true,
+ "children": {
+ "foo.txt": [ "filenode", { "ro_uri": uri,
+ "size": bytes,
+ "metadata": {
+ "ctime": 1202777696.7564139,
+ "mtime": 1202777696.7564139
+ }
+ } ],
+ "subdir": [ "dirnode", { "rw_uri": rwuri,
+ "ro_uri": rouri,
+ "metadata": {
+ "ctime": 1202778102.7589991,
+ "mtime": 1202778111.2160511
+ }
+ } ]
+ } } ]
+
+ In the above example, note how 'children' is a dictionary in which the keys
+ are child names and the values depend upon whether the child is a file or a
+ directory. The value is mostly the same as the JSON representation of the
+ child object (except that directories do not recurse -- the "children"
+ entry of the child is omitted, and the directory view includes the metadata
+ that is stored on the directory edge).
+
+ The rw_uri field will be present in the information about a directory
+ if and only if you have read-write access to that directory.
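+
+ A program consuming this output might walk it as in the sketch below. The
+ helper name list_children and the placeholder caps are hypothetical; only
+ the overall [type, info] shape is taken from the examples above.
+
```python
import json


def list_children(body):
    """Return {child name: node type} for a t=json directory response."""
    # The top level is a two-element list: a type flag, then the details.
    node_type, info = json.loads(body)
    if node_type != "dirnode":
        raise ValueError("expected a dirnode, got %r" % node_type)
    return {name: child[0] for name, child in info["children"].items()}


sample = json.dumps(
    ["dirnode", {"ro_uri": "URI:DIR2-RO:example", "mutable": True,
                 "children": {
                     "foo.txt": ["filenode", {"ro_uri": "URI:CHK:example",
                                              "size": 202}],
                     "subdir": ["dirnode", {"ro_uri": "URI:DIR2-RO:example2"}],
                 }}])
children = list_children(sample)
# rw_uri is absent here, which signals read-only access to the directory.
writable = "rw_uri" in json.loads(sample)[1]
```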
+
+
+=== Attaching an existing File or Directory by its read- or write- cap ===
+
+PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
+
+ This attaches a child object (either a file or directory) to a specified
+ location in the virtual filesystem. The child object is referenced by its
+ read- or write- cap, as provided in the HTTP request body. This will create
+ intermediate directories as necessary.
+
+ This is similar to a UNIX hardlink: by referencing a previously-uploaded
+ file (or previously-created directory) instead of uploading/creating a new
+ one, you can create two references to the same object.
+
+ The read- or write- cap of the child is provided in the body of the HTTP
+ request, and this same cap is returned in the response body.
+
+ The default behavior is to overwrite any existing object at the same
+ location. To prevent this (and make the operation return an error instead of
+ overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
+ With replace=false, this operation will return an HTTP 409 "Conflict" error
+ if there is already an object at the given location, rather than overwriting
+ the existing object. Note that "true", "t", and "1" are all synonyms for
+ "True", and "false", "f", and "0" are synonyms for "False". The parameter is
+ case-insensitive.
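+
+ The accepted spellings can be summarized with a small sketch. The function
+ name is hypothetical, and rejecting any other value is an assumption, since
+ the text does not say how unrecognized values are handled.
+
```python
def parse_boolean_arg(value):
    """Interpret a boolean query argument such as replace=false."""
    lowered = value.lower()  # the parameter is case-insensitive
    if lowered in ("true", "t", "1"):
        return True
    if lowered in ("false", "f", "0"):
        return False
    # Assumption: anything else is treated as an error.
    raise ValueError("not a boolean: %r" % value)
```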
+
+=== Deleting a File or Directory ===
+
+DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME
+
+ This removes the given name from its parent directory. CHILDNAME is the
+ name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will
+ be modified.
+
+ Note that this does not actually delete the file or directory that the name
+ points to from the tahoe grid -- it only removes the named reference from
+ this directory. If there are other names in this directory or in other
+ directories that point to the resource, then it will remain accessible
+ through those paths. Even if all names pointing to this object are removed
+ from their parent directories, someone with possession of its read-cap
+ can continue to access the object through that cap.
+
+ The object will only become completely unreachable once 1: there are no
+ reachable directories that reference it, and 2: nobody is holding a read-
+ or write- cap to the object. (This behavior is very similar to the way
+ hardlinks and anonymous files work in traditional unix filesystems).
+
+ This operation will not modify more than a single directory. Intermediate
+ directories which were implicitly created by PUT or POST methods will *not*
+ be automatically removed by DELETE.
+
+ This method returns the file- or directory- cap of the object that was just
+ removed.
+
+== Browser Operations ==
+
+This section describes the HTTP operations that provide support for humans
+running a web browser. Most of these operations use HTML forms that use POST
+to drive the Tahoe node.
+
+Note that for all POST operations, the arguments listed can be provided
+either as URL query arguments or as form body fields. URL query arguments are
+separated from the main URL by "?", and from each other by "&". For example,
+"POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually
+specified by using <input type="hidden"> elements. For clarity, the
+descriptions below display the most significant arguments as URL query args.
+
+=== Viewing A Directory (as HTML) ===
+
+GET /uri/$DIRCAP/[SUBDIRS../]
+
+ This returns an HTML page, intended to be displayed to a human by a web
+ browser, which contains HREF links to all files and directories reachable
+ from this directory. These HREF links do not have a t= argument, meaning
+ that a human who follows them will get pages also meant for a human. It also
+ contains forms to upload new files, and to delete files and directories.
+ Those forms use POST methods to do their job.
+
+=== Viewing/Downloading a File ===
+
+GET /uri/$FILECAP
+GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
+
+ This will retrieve the contents of the given file. The HTTP response body
+ will contain the sequence of bytes that make up the file.
+
+ If you want the HTTP response to include a useful Content-Type header,
+ either use the second form (which starts with a $DIRCAP), or add a
+ "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
+ The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information
+ to determine a Content-Type (since Tahoe immutable files are merely
+ sequences of bytes, not typed+named file objects).
+
+ If the URL has both filename= and "save=true" in the query arguments, then
+ the server will add a "Content-Disposition: attachment" header, along with a
+ filename= parameter. When a user clicks on such a link, most browsers will
+ offer to let the user save the file instead of displaying it inline (indeed,
+ most browsers will refuse to display it inline). "true", "t", "1", and other
+ case-insensitive equivalents are all treated the same.
+
+ Character-set handling in URLs and HTTP headers is a dubious art[1]. For
+ maximum compatibility, Tahoe simply copies the bytes from the filename=
+ argument into the Content-Disposition header's filename= parameter, without
+ trying to interpret them in any particular way.
+
+
+GET /named/$FILECAP/FILENAME
+
+ This is an alternate download form which makes it easier to get the correct
+ filename. The Tahoe server will provide the contents of the given file, with
+ a Content-Type header derived from the given filename. This form is used to
+ get browsers to use the "Save Link As" feature correctly, and also helps
+ command-line tools like "wget" and "curl" use the right filename. Note that
+ this form can *only* be used with file caps; it is an error to use a
+ directory cap after the /named/ prefix.
+
+=== Get Information About A File Or Directory (as HTML) ===
+
+GET /uri/$FILECAP?t=info
+GET /uri/$DIRCAP/?t=info
+GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info
+GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info
+
+ This returns a human-oriented HTML page with more detail about the selected
+ file or directory object. This page contains the following items:
+
+ object size
+ storage index
+ JSON representation
+ raw contents (text/plain)
+ access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
+ check/verify/repair form
+ deep-check/deep-size/deep-stats/manifest (for directories)
+ replace-contents form (for mutable files)
+
+=== Creating a Directory ===
+
+POST /uri?t=mkdir
+
+ This creates a new directory, but does not attach it to the virtual
+ filesystem.
+
+ If a "redirect_to_result=true" argument is provided, then the HTTP response
+ will cause the web browser to be redirected to a /uri/$DIRCAP page that
+ gives access to the newly-created directory. If you bookmark this page,
+ you'll be able to get back to the directory again in the future. This is the
+ recommended way to start working with a Tahoe server: create a new unlinked
+ directory (using redirect_to_result=true), then bookmark the resulting
+ /uri/$DIRCAP page. There is a "Create Directory" button on the Welcome page
+ to invoke this action.
+
+ If "redirect_to_result=true" is not provided (or is given a value of
+ "false"), then the HTTP response body will simply be the write-cap of the
+ new directory.
+
+POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME
+
+ This creates a new directory as a child of the designated SUBDIR. This will
+ create additional intermediate directories as necessary.
+
+ If a "when_done=URL" argument is provided, the HTTP response will cause the
+ web browser to redirect to the given URL. This provides a convenient way to
+ return the browser to the directory that was just modified. Without a
+ when_done= argument, the HTTP response will simply contain the write-cap of
+ the directory that was just created.
+
+
+=== Uploading a File ===
+
+POST /uri?t=upload
+
+ This uploads a file, and produces a file-cap for the contents, but does not
+ attach the file into the virtual drive. No directories will be modified by
+ this operation.
+
+ The file must be provided as the "file" field of an HTML encoded form body,
+ produced in response to an HTML form like this:
+ <form action="/uri" method="POST" enctype="multipart/form-data">
+ <input type="hidden" name="t" value="upload" />
+ <input type="file" name="file" />
+ <input type="submit" value="Upload Unlinked" />
+ </form>
+
+ If a "when_done=URL" argument is provided, the response body will cause the
+ browser to redirect to the given URL. If the when_done= URL has the string
+ "%(uri)s" in it, that string will be replaced by a URL-escaped form of the
+ newly created file-cap. (Note that without this substitution, there is no
+ way to access the file that was just uploaded).
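+
+ The effect of the "%(uri)s" substitution can be sketched as follows. This
+ is only an illustration of the documented behavior, not the node's actual
+ code; the function name fill_when_done and the example cap are made up.

```python
from urllib.parse import quote


def fill_when_done(when_done, filecap):
    """Replace the literal "%(uri)s" marker with the URL-escaped file-cap."""
    # The marker is treated as a plain string here, not as a Python format
    # specifier, so other '%' characters in the URL are left alone.
    return when_done.replace("%(uri)s", quote(filecap, safe=""))


# Placeholder cap, not a real one.
target = fill_when_done("/uri/%(uri)s?t=info", "URI:CHK:example")
```

+ A when_done= URL without the marker passes through unchanged, which is why
+ the new file-cap is then lost to the browser.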
+
+ The default (in the absence of when_done=) is to return an HTML page that
+ describes the results of the upload. This page will contain information
+ about which storage servers were used for the upload, how long each
+ operation took, etc.
+
+ If a "mutable=true" argument is provided, the operation will create a
+ mutable file, and the response body will contain the write-cap instead of
+ the upload results page. The default is to create an immutable file,
+ returning the upload results page as a response.
+
+
+POST /uri/$DIRCAP/[SUBDIRS../]?t=upload
+
+ This uploads a file, and attaches it as a new child of the given directory.
+ The file must be provided as the "file" field of an HTML encoded form body,
+ produced in response to an HTML form like this:
+ <form action="." method="POST" enctype="multipart/form-data">
+ <input type="hidden" name="t" value="upload" />
+ <input type="file" name="file" />
+ <input type="submit" value="Upload" />
+ </form>
+
+ A "name=" argument can be provided to specify the new child's name,
+ otherwise it will be taken from the "filename" field of the upload form
+ (most web browsers will copy the last component of the original file's
+ pathname into this field). To avoid confusion, name= is not allowed to
+ contain a slash.
+
+ If there is already a child with that name, and it is a mutable file, then
+ its contents are replaced with the data being uploaded. If it is not a
+ mutable file, the default behavior is to remove the existing child before
+ creating a new one. To prevent this (and make the operation return an error
+ instead of overwriting the old child), add a "replace=false" argument, as
+ "?t=upload&replace=false". With replace=false, this operation will return an
+ HTTP 409 "Conflict" error if there is already an object at the given
+ location, rather than overwriting the existing object. Note that "true",
+ "t", and "1" are all synonyms for "True", and "false", "f", and "0" are
+ synonyms for "False". The parameter is case-insensitive.
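+
+ A client-side sketch of the boolean synonyms listed above is shown below.
+ The function name parse_bool_arg is hypothetical, and raising an error for
+ unrecognized values is an assumption: this document does not say how the
+ node treats them.

```python
def parse_bool_arg(value):
    """Interpret a boolean query argument such as replace= or mutable=."""
    normalized = value.strip().lower()  # the parameter is case-insensitive
    if normalized in ("true", "t", "1"):
        return True
    if normalized in ("false", "f", "0"):
        return False
    # Assumption: anything else is rejected rather than silently defaulted.
    raise ValueError("unrecognized boolean argument: %r" % (value,))
```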
+
+ This will create additional intermediate directories as necessary, although
+ since it is expected to be triggered by a form that was retrieved by "GET
+ /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
+ already exist.
+
+ If a "mutable=true" argument is provided, any new file that is created will
+ be a mutable file instead of an immutable one. <input type="checkbox"
+ name="mutable" /> will give the user a way to set this option.
+
+ If a "when_done=URL" argument is provided, the HTTP response will cause the
+ web browser to redirect to the given URL. This provides a convenient way to
+ return the browser to the directory that was just modified. Without a
+ when_done= argument, the HTTP response will simply contain the file-cap of
+ the file that was just uploaded (a write-cap for mutable files, or a
+ read-cap for immutable files).
+
+POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload
+
+ This also uploads a file and attaches it as a new child of the given
+ directory. It is a slight variant of the previous operation, as the URL
+ refers to the target file rather than the parent directory. It is otherwise
+ identical: this accepts mutable= and when_done= arguments too.
+
+POST /uri/$FILECAP?t=upload
+
+=== Attaching An Existing File Or Directory (by URI) ===
+
+POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP
+
+ This attaches a given read- or write- cap "CHILDCAP" to the designated
+ directory, with a specified child name. This behaves much like the PUT t=uri
+ operation, and is a lot like a UNIX hardlink.
+
+ This will create additional intermediate directories as necessary, although
+ since it is expected to be triggered by a form that was retrieved by "GET
+ /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
+ already exist.
+
+=== Deleting A Child ===
+
+POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME
+
+ This instructs the node to delete a child object (file or subdirectory) from
+ the given directory. Note that the entire subtree is removed. This is
+ somewhat like "rm -rf" (from the point of view of the parent), but other
+ references into the subtree will see that the child subdirectories are not
+ modified by this operation. Only the link from the given directory to its
+ child is severed.
+
+=== Renaming A Child ===
+
+POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW
+
+ This instructs the node to rename a child of the given directory. This is
+ exactly the same as removing the child, then adding the same child-cap under
+ the new name. This operation cannot move the child to a different directory.
+
+ This operation will replace any existing child of the new name, making it
+ behave like the UNIX "mv -f" command.
+
+=== Other Utilities ===
+
+GET /uri?uri=$CAP
+
+ This causes a redirect to /uri/$CAP, and retains any additional query
+ arguments (like filename= or save=). This is for the convenience of web
+ forms which allow the user to paste in a read- or write- cap (obtained
+ through some out-of-band channel, like IM or email).
+
+ Note that this form merely redirects to the specific file or directory
+ indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
+ traverse to children by appending additional path segments to the URL.
+
+GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME
+
+ This provides a useful facility to browser-based user interfaces. It
+ returns a page containing a form targeting the "POST $DIRCAP t=rename"
+ functionality described above, with the provided $CHILDNAME present in the
+ 'from_name' field of that form. I.e. this presents a form offering to
+ rename $CHILDNAME, requesting the new name, and submitting POST rename.
+
+GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
+
+ This returns the file- or directory- cap for the specified object.
+
+GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri
+
+ This returns a read-only file- or directory- cap for the specified object.
+ If the object is an immutable file, this will return the same value as
+ t=uri.
+
+=== Debugging and Testing Features ===
+
+These URLs are less likely to be helpful to the casual Tahoe user, and are
+mainly intended for developers.
+
+POST $URL?t=check
+
+ This triggers the FileChecker to determine the current "health" of the
+ given file or directory, by counting how many shares are available. The
+ page that is returned will display the results. This can be used as a "show
+ me detailed information about this file" page.
+
+ If a verify=true argument is provided, the node will perform a more
+ intensive check, downloading and verifying every single bit of every share.
+
+ If an output=JSON argument is provided, the response will be
+ machine-readable JSON instead of human-oriented HTML. The data is a
+ dictionary with the following keys:
+
+ storage-index: a base32-encoded string with the object's storage index,
+ or an empty string for LIT files
+ results: a dictionary that describes the state of the file. For LIT files,
+ this dictionary has only the 'healthy' key, which will always be
+ True. For distributed files, this dictionary has the following
+ keys:
+ count-shares-good: the number of good shares that were found
+ count-shares-needed: 'k', the number of shares required for recovery
+ count-shares-expected: 'N', the number of total shares generated
+ count-good-share-hosts: the number of distinct storage servers with
+ good shares. If this number is less than
+ count-shares-good, then some shares are doubled
+ up, increasing the correlation of failures. This
+ indicates that one or more shares should be
+ moved to an otherwise unused server, if one is
+ available.
+ count-wrong-shares: for mutable files, the number of shares for
+ versions other than the 'best' one (highest
+ sequence number, highest roothash). These are
+ either old ...
+ count-recoverable-versions: for mutable files, the number of
+ recoverable versions of the file. For
+ a healthy file, this will equal 1.
+ count-unrecoverable-versions: for mutable files, the number of
+ unrecoverable versions of the file.
+ For a healthy file, this will be 0.
+ count-corrupt-shares: the number of shares with integrity failures
+ list-corrupt-shares: a list of "share locators", one for each share
+ that was found to be corrupt. Each share locator
+ is a list of (serverid, storage_index, sharenum).
+ needs-rebalancing: (bool) True if there are multiple shares on a single
+ storage server, indicating a reduction in reliability
+ that could be resolved by moving shares to new
+ servers.
+ servers-responding: list of base32-encoded storage server identifiers,
+ one for each server which responded to the share
+ query.
+ healthy: (bool) True if the file is completely healthy, False otherwise.
+ Healthy files have at least N good shares. Overlapping shares
+ (indicated by count-good-share-hosts < count-shares-good) do not
+ currently cause a file to be marked unhealthy. If there are at
+ least N good shares, then corrupt shares do not cause the file to
+ be marked unhealthy, although the corrupt shares will be listed
+ in the results (list-corrupt-shares) and should be manually
+ removed to avoid wasting time in subsequent downloads (as the
+ downloader rediscovers the corruption and uses alternate shares).
+ sharemap: dict mapping share identifier to list of serverids
+ (base32-encoded strings). This indicates which servers are
+ holding which shares. For immutable files, the shareid is
+ an integer (the share number, from 0 to N-1). For
+ mutable files, it is a string of the form
+ 'seq%d-%s-sh%d', containing the sequence number, the
+ roothash, and the share number.
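+
+ A program consuming this JSON might condense it as sketched below. The
+ function name summarize_check and the summary keys it returns are
+ hypothetical; only the input keys come from the description above. LIT
+ files are handled by treating the share counters as optional.

```python
def summarize_check(response):
    """Condense a parsed t=check&output=JSON response (a dict)."""
    results = response["results"]
    good = results.get("count-shares-good")
    hosts = results.get("count-good-share-hosts")
    return {
        "healthy": results["healthy"],
        # Fewer distinct hosts than good shares means that some shares are
        # doubled up on the same storage server.
        "doubled_up": good is not None and hosts is not None and hosts < good,
        "corrupt_shares": results.get("list-corrupt-shares", []),
    }
```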
+
+POST $URL?t=start-deep-check (must add &ophandle=XYZ)
+
+ This initiates a recursive walk of all files and directories reachable from
+ the target, performing a check on each one just like t=check. The result
+ page will contain a summary of the results, including details on any
+ file/directory that was not fully healthy.
+
+ t=start-deep-check can only be invoked on a directory. An error (400
+ BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
+ walker will deal with loops safely.
+
+ This accepts the same verify= argument as t=check.
+
+ Since this operation can take a long time (perhaps a second per object),
+ the ophandle= argument is required (see "Slow Operations, Progress, and
+ Cancelling" above). The response to this POST will be a redirect to the
+ corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
+ match the output= argument given to the POST). The deep-check operation
+ will continue to run in the background, and the /operations page should be
+ used to find out when the operation is done.
+
+ Detailed checker results for non-healthy files and directories will be
+ available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will
+ contain links to these detailed results.
+
+ The HTML /operations/$HANDLE page for incomplete operations will contain a
+ meta-refresh tag, set to 60 seconds, so that a browser which uses
+ deep-check will automatically poll until the operation has completed.
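+
+ A non-browser client can poll in the same way by reading the "finished"
+ key of the JSON status. The sketch below is an assumption-laden example:
+ fetch_status is a hypothetical callable supplied by the caller that returns
+ the body of /operations/$HANDLE?output=JSON as text, and the 60-second
+ default merely mirrors the meta-refresh interval.

```python
import json
import time


def wait_for_operation(fetch_status, interval=60.0, sleep=time.sleep):
    """Poll an operation's JSON status until it reports finished."""
    while True:
        status = json.loads(fetch_status())
        if status.get("finished"):
            return status
        # Not done yet: wait before asking again.
        sleep(interval)
```

+ Passing the sleep function as an argument keeps the loop easy to exercise
+ without real delays.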
+
+ The JSON page (/operations/$HANDLE?output=JSON) will contain a
+ machine-readable JSON dictionary with the following keys:
+
+ finished: a boolean, True if the operation is complete, else False. Some
+ of the remaining keys may not be present until the operation
+ is complete.
+ root-storage-index: a base32-encoded string with the storage index of the
+ starting point of the deep-check operation
+ count-objects-checked: count of how many objects were checked. Note that
+ non-distributed objects (i.e. small immutable LIT
+ files) are not checked, since for these objects,
+ the data is contained entirely in the URI.
+ count-objects-healthy: how many of those objects were completely healthy
+ count-objects-unhealthy: how many were damaged in some way
+ count-corrupt-shares: how many shares were found to have corruption,
+ summed over all objects examined
+ list-corrupt-shares: a list of "share identifiers", one for each share
+ that was found to be corrupt. Each share identifier
+ is a list of (serverid, storage_index, sharenum).
+ list-unhealthy-files: a list of (pathname, check-results) tuples, for
+ each file that was not fully healthy. 'pathname' is
+ a list of strings (which can be joined by "/"
+ characters to turn it into a single string),
+ relative to the directory on which deep-check was
+ invoked. The 'check-results' field is the same as
+ that returned by t=check&output=JSON, described
+ above.
+ stats: a dictionary with the same keys as the t=deep-stats command
+ (described below)
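+
+ The list-unhealthy-files entries can be turned into printable paths as
+ sketched here. The function name unhealthy_paths is hypothetical, and
+ returning None for an unfinished operation is a choice made for this
+ example only.

```python
def unhealthy_paths(status):
    """Return slash-joined paths of unhealthy files from a parsed status dict."""
    if not status.get("finished"):
        # Some keys may be absent until the operation is complete.
        return None
    paths = []
    for pathname, _check_results in status.get("list-unhealthy-files", []):
        # 'pathname' is a list of strings relative to the starting directory.
        paths.append("/".join(pathname))
    return paths
```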
+
+POST $URL?t=check&repair=true
+
+ This performs a health check of the given file or directory, and if the
+ checker determines that the object is not healthy (some shares are missing
+ or corrupted), it will perform a "repair". During repair, any missing
+ shares will be regenerated and uploaded to new servers.
+
+ This accepts the same verify=true argument as t=check. When an output=JSON
+ argument is provided, the machine-readable JSON response will contain the
+ following keys:
+
+ storage-index: a base32-encoded string with the object's storage index,
+ or an empty string for LIT files
+ repair-attempted: (bool) True if repair was attempted
+ repair-successful: (bool) True if repair was attempted and the file was
+ fully healthy afterwards. False if no repair was
+ attempted, or if a repair attempt failed.
+ pre-repair-results: a dictionary that describes the state of the file
+ before any repair was performed. This contains exactly
+ the same keys as the 'results' value of the t=check
+ response, described above.
+ post-repair-results: a dictionary that describes the state of the file
+ after any repair was performed. If no repair was
+ performed, post-repair-results and pre-repair-results
+ will be the same. This contains exactly the same keys
+ as the 'results' value of the t=check response,
+ described above.
+
+POST $URL?t=start-deep-check&repair=true (must add &ophandle=XYZ)
+
+ This triggers a recursive walk of all files and directories, performing a
+ t=check&repair=true on each one.
+
+ Like t=start-deep-check without the repair= argument, this can only be
+ invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
+ is invoked on a file. The recursive walker will deal with loops safely.
+
+ This accepts the same verify=true argument as t=start-deep-check. It uses
+ the same ophandle= mechanism as start-deep-check. When an output=JSON
+ argument is provided, the response will contain the following keys:
+
+ finished: (bool) True if the operation has completed, else False
+ root-storage-index: a base32-encoded string with the storage index of the
+ starting point of the deep-check operation
+ count-objects-checked: count of how many objects were checked
+
+ count-objects-healthy-pre-repair: how many of those objects were completely
+ healthy, before any repair
+ count-objects-unhealthy-pre-repair: how many were damaged in some way
+ count-objects-healthy-post-repair: how many of those objects were completely
+ healthy, after any repair
+ count-objects-unhealthy-post-repair: how many were damaged in some way
+
+ count-repairs-attempted: repairs were attempted on this many objects.
+ count-repairs-successful: how many repairs resulted in healthy objects
+ count-repairs-unsuccessful: how many repairs did not result in
+ completely healthy objects
+ count-corrupt-shares-pre-repair: how many shares were found to have
+ corruption, summed over all objects
+ examined, before any repair
+ count-corrupt-shares-post-repair: how many shares were found to have
+ corruption, summed over all objects
+ examined, after any repair
+ list-corrupt-shares: a list of "share identifiers", one for each share
+ that was found to be corrupt (before any repair).
+ Each share identifier is a list of (serverid,
+ storage_index, sharenum).
+ list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares
+ that were successfully repaired are not
+ included. These are shares that need
+ manual processing. Since immutable shares
+ cannot be modified by clients, all corruption
+ in immutable shares will be listed here.
+ list-unhealthy-files: a list of (pathname, check-results) tuples, for
+ each file that was not fully healthy. 'pathname' is
+ relative to the directory on which deep-check was
+ invoked. The 'check-results' field is the same as
+ that returned by t=check&repair=true&output=JSON,
+ described above.
+ stats: a dictionary with the same keys as the t=deep-stats command
+ (described below)
+
+POST $DIRURL?t=start-manifest (must add &ophandle=XYZ)
+
+ This operation generates a "manifest" of the given directory tree, mostly
+ for debugging. This is a table of (path, filecap/dircap), for every object
+ reachable from the starting directory. The path will be slash-joined, and
+ the filecap/dircap will contain a link to the object in question. This page
+ gives immediate access to every object in the virtual filesystem subtree.
+
+ This operation uses the same ophandle= mechanism as deep-check. The
+ corresponding /operations/$HANDLE page has three different forms. The
+ default is output=HTML.
+
+ If output=text is added to the query args, the results will be a text/plain
+ list. The first line is special: it is either "finished: yes" or "finished:
+ no"; if the operation is not finished, you must periodically reload the
+ page until it completes. The rest of the results are a plaintext list, with
+ one file/dir per line, slash-separated, with the filecap/dircap separated
+ by a space.
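+
+ A parser for the output=text form might look like the sketch below. It
+ assumes that each line holds the path first and the cap last, and that
+ caps never contain spaces (so paths with spaces still split correctly);
+ neither detail is spelled out above. The name parse_text_manifest is
+ hypothetical.

```python
def parse_text_manifest(text):
    """Parse the text/plain manifest into (finished, [(path, cap), ...])."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty manifest response")
    # The first line is either "finished: yes" or "finished: no".
    finished = lines[0].strip() == "finished: yes"
    entries = []
    for line in lines[1:]:
        if not line.strip():
            continue
        # Split on the last space, since the path itself may contain spaces.
        path, _sep, cap = line.rpartition(" ")
        entries.append((path, cap))
    return finished, entries
```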
+
+ If output=JSON is added to the query args, then the results will be a
+ JSON-formatted dictionary with three keys:
+
+ finished (bool): if False then you must reload the page until True
+ origin_si (str): the storage index of the starting point
+ manifest: list of (path, cap) tuples, where path is a list of strings.
+
+POST $DIRURL?t=start-deep-size (must add &ophandle=XYZ)
+
+ This operation generates a number (in bytes) containing the sum of the
+ filesize of all directories and immutable files reachable from the given
+ directory. This is a rough lower bound of the total space consumed by this
+ subtree. It does not include space consumed by mutable files, nor does it
+ take expansion or encoding overhead into account. Later versions of the
+ code may improve this estimate upwards.
+
+ The /operations/$HANDLE status output consists of two lines of text:
+
+ finished: yes
+ size: 1234
+
+POST $DIRURL?t=start-deep-stats (must add &ophandle=XYZ)
+
+ This operation performs a recursive walk of all files and directories
+ reachable from the given directory, and generates a collection of
+ statistics about those objects.
+
+ The result (obtained from the /operations/$HANDLE page) is a
+ JSON-serialized dictionary with the following keys (note that some of these
+ keys may be missing until 'finished' is True):
+
+ finished: (bool) True if the operation has finished, else False
+ count-immutable-files: count of how many CHK files are in the set
+ count-mutable-files: same, for mutable files (does not include directories)
+ count-literal-files: same, for LIT files (data contained inside the URI)
+ count-files: sum of the above three
+ count-directories: count of directories
+ size-immutable-files: total bytes for all CHK files in the set, =deep-size
+ size-mutable-files (TODO): same, for current version of all mutable files
+ size-literal-files: same, for LIT files
+ size-directories: size of directories (includes size-literal-files)
+ size-files-histogram: list of (minsize, maxsize, count) buckets,
+ with a histogram of filesizes, 5dB/bucket,
+ for both literal and immutable files
+ largest-directory: number of children in the largest directory
+ largest-immutable-file: number of bytes in the largest CHK file
+
+ size-mutable-files is not implemented, because it would require extra
+ queries to each mutable file to get their size. This may be implemented in
+ the future.
+
+ Assuming no sharing, the basic space consumed by a single root directory is
+ the sum of size-immutable-files, size-mutable-files, and size-directories.
+ The actual disk space used by the shares is larger, because of the
+ following sources of overhead:
+
+ integrity data
+ expansion due to erasure coding
+ share management data (leases)
+ backend (ext3) minimum block size
+
+== Other Useful Pages ==
+
+The portion of the web namespace that begins with "/uri" (and "/named") is
+dedicated to giving users (both humans and programs) access to the Tahoe
+virtual filesystem. The rest of the namespace provides status information
+about the state of the Tahoe node.
+
+GET / (the root page)
+
+This is the "Welcome Page", and contains a few distinct sections:
+
+ Node information: library versions, local nodeid, services being provided.
+
+ Filesystem Access Forms: create a new directory, view a file/directory by
+ URI, upload a file (unlinked), download a file by
+ URI.
+
+ Grid Status: introducer information, helper information, connected storage
+ servers.
+
+GET /status/
+
+ This page lists all active uploads and downloads, and contains a short list
+ of recent upload/download operations. Each operation has a link to a page
+ that describes file sizes, servers that were involved, and the time consumed
+ in each phase of the operation.
+
+ A GET of /status/?t=json will contain a machine-readable subset of the same
+ data. It returns a JSON-encoded dictionary. The only key defined at this
+ time is "active", with a value that is a list of operation dictionaries, one
+ for each active operation. Once an operation is completed, it will no longer
+ appear in data["active"] .
+
+ Each op-dict contains a "type" key, one of "upload", "download",
+ "mapupdate", "publish", or "retrieve" (the first two are for immutable
+ files, while the latter three are for mutable files and directories).
+
+ The "upload" op-dict will contain the following keys:
+
+ type (string): "upload"
+ storage-index-string (string): a base32-encoded storage index
+ total-size (int): total size of the file
+ status (string): current status of the operation
+ progress-hash (float): 1.0 when the file has been hashed
+ progress-ciphertext (float): 1.0 when the file has been encrypted.
+ progress-encode-push (float): 1.0 when the file has been encoded and
+ pushed to the storage servers. For helper
+ uploads, the ciphertext value climbs to 1.0
+ first, then encoding starts. For unassisted
+ uploads, ciphertext and encode-push progress
+ will climb at the same pace.
+
+ The "download" op-dict will contain the following keys:
+
+ type (string): "download"
+ storage-index-string (string): a base32-encoded storage index
+ total-size (int): total size of the file
+ status (string): current status of the operation
+ progress (float): 1.0 when the file has been fully downloaded
+
+ Front-ends which want to report progress information are advised to simply
+ average together all the progress-* indicators. A slightly more accurate
+ value can be found by ignoring the progress-hash value (since the current
+ implementation hashes synchronously, so clients will probably never see
+ progress-hash!=1.0).
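+
+ The averaging advice could be implemented along these lines. The function
+ name overall_progress is hypothetical, and returning None when an op-dict
+ has no progress keys is an assumption, since the keys for "mapupdate",
+ "publish", and "retrieve" operations are not listed here.

```python
def overall_progress(op, ignore_hash=False):
    """Average the progress indicators of one op-dict from /status/?t=json."""
    values = []
    for key, value in op.items():
        # Uploads use several "progress-*" keys; downloads use "progress".
        if key != "progress" and not key.startswith("progress-"):
            continue
        if ignore_hash and key == "progress-hash":
            continue
        values.append(float(value))
    if not values:
        return None
    return sum(values) / len(values)
```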
+
+GET /provisioning/
+
+ This page provides a basic tool to predict the likely storage and bandwidth
+ requirements of a large Tahoe grid. It provides forms to input things like
+ total number of users, number of files per user, average file size, number
+ of servers, expansion ratio, hard drive failure rate, etc. It then provides
+ numbers like how many disks per server will be needed, how many read
+ operations per second should be expected, and the likely MTBF for files in
+ the grid. This information is very preliminary, and the model upon which it
+ is based still needs a lot of work.
+
+GET /helper_status/
+
+ If the node is running a helper (i.e. if "$BASEDIR/run_helper" is
+ non-empty), then this page will provide a list of all the helper operations
+ currently in progress. If "?t=json" is added to the URL, it will return a
+ JSON-formatted list of helper statistics, which can then be used to produce
+ graphs to indicate how busy the helper is.
+
+GET /statistics/
+
+ This page provides "node statistics", which are collected from a variety of
+ sources.
+
+ load_monitor: every second, the node schedules a timer for one second in
+ the future, then measures how late the subsequent callback
+ is. The "load_average" is this tardiness, measured in
+ seconds, averaged over the last minute. It is an indication
+ of a busy node, one which is doing more work than can be
+ completed in a timely fashion. The "max_load" value is the
+ highest value that has been seen in the last 60 seconds.
+
+ cpu_monitor: every minute, the node uses time.clock() to measure how much
+ CPU time it has used, and it uses this value to produce
+ 1min/5min/15min moving averages. These values range from 0%
+ (0.0) to 100% (1.0), and indicate what fraction of the CPU
+ has been used by the Tahoe node. Not all operating systems
+ provide meaningful data to time.clock(): they may report 100%
+ CPU usage at all times.
+
+ uploader: this counts how many immutable files (and bytes) have been
+ uploaded since the node was started
+
+ downloader: this counts how many immutable files have been downloaded
+ since the node was started
+
+ publishes: this counts how many mutable files (including directories) have
+ been modified since the node was started
+
+ retrieves: this counts how many mutable files (including directories) have
+ been read since the node was started
+
+ There are other statistics that are tracked by the node. The "raw stats"
+ section shows a formatted dump of all of them.
+
+ By adding "?t=json" to the URL, the node will return a JSON-formatted
+ dictionary of stats values, which can be used by other tools to produce
+ graphs of node behavior. The misc/munin/ directory in the source
+ distribution provides some tools to produce these graphs.
+
+GET / (introducer status)
+
+ For Introducer nodes, the welcome page displays information about both
+ clients and servers which are connected to the introducer. Servers make
+ "service announcements", and these are listed in a table. Clients will
+ subscribe to hear about service announcements, and these subscriptions are
+ listed in a separate table. Both tables contain information about what
+ version of Tahoe is being run by the remote node, their advertised and
+ outbound IP addresses, their nodeid and nickname, and how long they have
+ been available.
+
+ By adding "?t=json" to the URL, the node will return a JSON-formatted
+ dictionary of stats values, which can be used to produce graphs of connected
+ clients over time.
+
+
+== Static Files in /public_html ==
+
+The webapi server will take any request for a URL that starts with /static
+and serve it from a configurable directory which defaults to
+$BASEDIR/public_html . This is configured by setting the "[node]web.static"
+value in $BASEDIR/tahoe.cfg . If this is left at the default value of
+"public_html", then http://localhost:8123/static/subdir/foo.html will be
+served with the contents of the file $BASEDIR/public_html/subdir/foo.html .
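+
+As an illustration of this mapping (not the node's actual code), the
+translation from URL path to file path can be sketched as below. The function
+name static_file_path is hypothetical, and refusing ".." segments is an
+assumption added for safety in the example.

```python
import posixpath


def static_file_path(url_path, basedir, web_static="public_html"):
    """Map a /static/... URL path to a file under the static directory."""
    prefix = "/static/"
    if not url_path.startswith(prefix):
        return None
    # Drop empty segments so a doubled slash cannot produce an absolute path.
    parts = [p for p in url_path[len(prefix):].split("/") if p]
    if ".." in parts:
        # Assumption: never allow escaping from the static directory.
        return None
    return posixpath.join(basedir, web_static, *parts)
```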
+
+This can be useful to serve a javascript application which provides a
+prettier front-end to the rest of the Tahoe webapi.
+
+
+== Safety and Security Issues -- Names vs. URIs ==
+
+Summary: use explicit file- and dir- caps whenever possible, to reduce the
+potential for surprises when the virtual drive is changed while you aren't
+looking.
+
+The vdrive provides a mutable filesystem, but the ways that the filesystem
+can change are limited. The only thing that can change is that the mapping
+from child names to child objects that each directory contains can be changed
+by adding a new child name pointing to an object, removing an existing child
+name, or changing an existing child name to point to a different object.
+
+Obviously if you query tahoe for information about the filesystem and then
+act upon the filesystem (such as by getting a listing of the contents of a
+directory and then adding a file to the directory), then the filesystem might
+have been changed after you queried it and before you acted upon it.
+However, if you use the URI instead of the pathname of an object when you act
+upon the object, then the only change that can happen is that, if the object
+is a directory, the set of child names it has might be different. If, on the
+other hand, you act upon the object using its pathname, then a different
+object might be in that place, which can result in more kinds of surprises.
+
+For example, suppose you are writing code which recursively downloads the
+contents of a directory. The first thing your code does is fetch the listing
+of the contents of the directory. For each child that it fetched, if that
+child is a file then it downloads the file, and if that child is a directory
+then it recurses into that directory. Now, if the download and the recurse
+actions are performed using the child's name, then the results might be
+wrong, because for example a child name that pointed to a sub-directory when
+you listed the directory might have been changed to point to a file (in which
+case your attempt to recurse into it would result in an error and the file
+would be skipped), or a child name that pointed to a file when you listed the
+directory might now point to a sub-directory (in which case your attempt to
+download the child would result in a file containing HTML text describing the
+sub-directory!).
+
+If your recursive algorithm uses the uri of the child instead of the name of
+the child, then those kinds of mistakes just can't happen. Note that both the
+child's name and the child's URI are included in the results of listing the
+parent directory, so it isn't any harder to use the URI for this purpose.
+
+In general, use names if you want "whatever object (whether file or
+directory) is found by following this name (or sequence of names) when my
+request reaches the server". Use URIs if you want "this particular object".
+
+== Concurrency Issues ==
+
+Tahoe uses both mutable and immutable files. Mutable files can be created
+explicitly by doing an upload with ?mutable=true added, or implicitly by
+creating a new directory (since a directory is just a special way to
+interpret a given mutable file).
+
+Mutable files suffer from the same consistency-vs-availability tradeoff that
+all distributed data storage systems face. It is not possible to
+simultaneously achieve perfect consistency and perfect availability in the
+face of network partitions (servers being unreachable or faulty).
+
+Tahoe tries to achieve a reasonable compromise, but there is a basic rule in
+place, known as the Prime Coordination Directive: "Don't Do That". What this
+means is that if write-access to a mutable file is available to several
+parties, then those parties are responsible for coordinating their activities
+to avoid multiple simultaneous updates. This could be achieved by having
+these parties talk to each other and using some sort of locking mechanism, or
+by serializing all changes through a single writer.
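+
+One way to picture the "single writer" option, within a single process, is
+the Python sketch below. It is an illustration only: SingleWriter and
+apply_change are hypothetical names, and writers on different machines would
+still need some external coordination that this sketch does not provide.
+
```python
import queue
import threading


class SingleWriter:
    """Funnel every modification of one mutable object through one thread."""

    def __init__(self, apply_change):
        self._apply_change = apply_change  # caller-supplied function
        self._changes = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, change):
        """Queue a change (anything except None) and return immediately."""
        self._changes.put(change)

    def close(self):
        """Apply everything queued so far, then stop the writer thread."""
        self._changes.put(None)  # None is the shutdown sentinel
        self._thread.join()

    def _run(self):
        while True:
            change = self._changes.get()
            if change is None:
                break
            # Only this thread ever performs a write, so two updates can
            # never be in flight at the same time.
            self._apply_change(change)
```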
+
+The consequences of performing uncoordinated writes can vary. Some of the
+writers may lose their changes, as somebody else wins the race condition. In
+many cases the file will be left in an "unhealthy" state, meaning that there
+are not as many redundant shares as we would like (reducing the reliability
+of the file against server failures). In the worst case, the file can be left
+in such an unhealthy state that no version is recoverable, even the old ones.
+It is this small possibility of data loss that prompts us to issue the Prime
+Coordination Directive.
+
+Tahoe nodes implement internal serialization to make sure that a single Tahoe
+node cannot conflict with itself. For example, it is safe to issue two
+directory modification requests to a single Tahoe node's webapi server at the
+same time, because the Tahoe node will internally delay one of them until
+after the other has finished being applied. (This feature was introduced in
+Tahoe-1.1; in Tahoe-1.0, web clients were responsible for serializing
+their own requests.)
+
+For more details, please see the "Consistency vs Availability" and "The Prime
+Coordination Directive" sections of mutable.txt, in the same directory as
+this file.
+
+
+[1]: URLs and HTTP and UTF-8, Oh My
+
+ HTTP does not provide a mechanism to specify the character set used to
+ encode non-ASCII names in URLs (rfc2396#2.1). We prefer the convention that
+ the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
+ For example, suppose we want to provoke the server into using a filename of
+ "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
+ is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's
+ repr() function would show). To encode this into a URL, the non-ASCII
+ bytes must be escaped with the urlencode '%XX' mechanism, giving us
+ "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
+ /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
+ provide this: IE7 uses the Latin-1 encoding, which gives "fianc%E9e".
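+
+ The two encodings above can be reproduced with a short Python sketch. It
+ uses only the standard urllib.parse module; the variable names are
+ illustrative and nothing here is taken from Tahoe's own code.
+
```python
from urllib.parse import quote, unquote

name = "fianc\u00e9e"  # F I A N C U+00E9 E

# Preferred convention: encode to UTF-8 first, then %XX-escape the bytes.
utf8_escaped = quote(name.encode("utf-8"))

# What a Latin-1 client would send instead: one byte for U+00E9.
latin1_escaped = quote(name.encode("latin-1"))

# Decoding the UTF-8 form recovers the original unicode string.
round_trip = unquote(utf8_escaped, encoding="utf-8")
```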
+
+ The response header will need to indicate a non-ASCII filename. The actual
+ mechanism to do this is not clear. For ASCII filenames, the response header
+ would look like:
+
+ Content-Disposition: attachment; filename="english.txt"
+
+ If Tahoe were to enforce the UTF-8 convention, it would need to decode the
+ URL argument into a unicode string, and then encode it back into a sequence
+ of bytes when creating the response header. One possibility would be to use
+ unencoded UTF-8. Developers suggest that IE7 might accept this:
+
+ #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
+ (note, the last four bytes of that line, not including the newline, are
+ 0xC3 0xA9 0x65 0x22)
+
+ RFC2231#4 (dated 1997) suggests that the following might work, and some
+ developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that
+ it is supported by Firefox (but not IE7):
+
+ #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
+
+ My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that
+ the filename= parameter is defined to be wrapped in quotes (presumably to
+ allow spaces without breaking the parsing of subsequent parameters), which
+ would give us:
+
+ #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"
+
+ However, this is contrary to the examples in the email thread listed above.
+
+ Developers report that IE7 (when it is configured for UTF-8 URL encoding,
+ which is not the default in Asian countries) will accept:
+
+ #4: Content-Disposition: attachment; filename=fianc%C3%A9e
+
+ However, for maximum compatibility, Tahoe simply copies bytes from the URL
+ into the response header, rather than enforcing the UTF-8 convention. This
+ means it does not try to decode the filename from the URL argument, nor does
+ it encode the filename into the response header.
+++ /dev/null
-= Tahoe FTP Frontend =
-
-All Tahoe client nodes can run a frontend FTP server, allowing regular FTP
-clients to access the virtual filesystem.
-
-Since Tahoe does not use user accounts or passwords, the FTP server must be
-configured with a way to translate USER+PASS into a root directory cap. Two
-mechanisms are provided. The first is a simple flat file with one account per
-line. The second is an HTTP-based login mechanism, backed by simple PHP
-script and a database. The latter form is used by allmydata.com to provide
-secure access to customer rootcaps.
-
-== Configuring an Account File ==
-
-To configure the first form, create a file (probably in
-BASEDIR/private/ftp.accounts) in which each non-comment/non-blank line is a
-space-separated line of (USERNAME, PASSWORD, ROOTCAP), like so:
-
- % cat BASEDIR/private/ftp.accounts
- # This is a password file, (username, password, rootcap)
- alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
- bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
-
-Then add the following lines to the BASEDIR/tahoe.cfg file:
-
- [ftpd]
- enabled = true
- ftp.port = 8021
- ftp.accounts.file = private/ftp.accounts
-
-The FTP server will listen on the given port number. The ftp.accounts.file
-pathname will be interpreted relative to the node's BASEDIR.
-
-== Configuring an Account Server ==
-
-Determine the URL of the account server, say https://example.com/login . Then
-add the following lines to BASEDIR/tahoe.cfg:
-
- [ftpd]
- enabled = true
- ftp.port = 8021
- ftp.accounts.url = https://example.com/login
-
-== Dependencies ==
-
-The FTP server requires code in Twisted that enables asynchronous closing of
-file-upload operations. This code was not in the Twisted-8.1.0 release, and
-has not been committed to SVN trunk as of r24943. So it may be necessary to
-apply the following patch. The Tahoe node refuses to start the FTP server if
-it detects that this patch has not been applied.
-
-Index: twisted/protocols/ftp.py
-===================================================================
---- twisted/protocols/ftp.py (revision 24956)
-+++ twisted/protocols/ftp.py (working copy)
-@@ -1049,7 +1049,6 @@
- cons = ASCIIConsumerWrapper(cons)
-
- d = self.dtpInstance.registerConsumer(cons)
-- d.addCallbacks(cbSent, ebSent)
-
- # Tell them what to doooo
- if self.dtpInstance.isConnected:
-@@ -1062,6 +1061,8 @@
- def cbOpened(file):
- d = file.receive()
- d.addCallback(cbConsumer)
-+ d.addCallback(lambda ignored: file.close())
-+ d.addCallbacks(cbSent, ebSent)
- return d
-
- def ebOpened(err):
-@@ -1434,7 +1435,14 @@
- @rtype: C{Deferred} of C{IConsumer}
- """
-
-+ def close():
-+ """
-+ Perform any post-write work that needs to be done. This method may
-+ only be invoked once on each provider, and will always be invoked
-+ after receive().
-
-+ @rtype: C{Deferred} of anything: the value is ignored
-+ """
-
- def _getgroups(uid):
- """Return the primary and supplementary groups for the given UID.
-@@ -1795,6 +1803,8 @@
- # FileConsumer will close the file object
- return defer.succeed(FileConsumer(self.fObj))
-
-+ def close(self):
-+ return defer.succeed(None)
-
-
- class FTPRealm:
-Index: twisted/vfs/adapters/ftp.py
-===================================================================
---- twisted/vfs/adapters/ftp.py (revision 24956)
-+++ twisted/vfs/adapters/ftp.py (working copy)
-@@ -295,6 +295,11 @@
- """
- return defer.succeed(IConsumer(self.node))
-
-+ def close(self):
-+ """
-+ Perform post-write actions.
-+ """
-+ return defer.succeed(None)
-
-
- class _FileToConsumerAdapter(object):
+++ /dev/null
-= Tahoe SFTP Frontend =
-
-All Tahoe client nodes can run a frontend SFTP server, allowing regular SFTP
-clients to access the virtual filesystem.
-
-Since Tahoe does not use user accounts or passwords, the SFTP server must be
-configured with a way to translate a username (and either a password or
-public key) into a root directory cap. Two mechanisms are provided. The first
-is a simple flat file with one account per line. The second is an HTTP-based
-login mechanism, backed by simple PHP script and a database. The latter form
-is used by allmydata.com to provide secure access to customer rootcaps.
-
-The SFTP server must also be given a public/private host keypair.
-
-== Configuring a Keypair ==
-
-First, generate a keypair for your server:
-
-% cd BASEDIR
-% ssh-keygen -f private/ssh_host_rsa_key
-
-You will then use the following lines in the tahoe.cfg file:
-
- [sftpd]
- sftp.host_pubkey_file = private/ssh_host_rsa_key.pub
- sftp.host_privkey_file = private/ssh_host_rsa_key
-
-== Configuring an Account File ==
-
-To configure the first form, create a file (probably in
-BASEDIR/private/sftp.accounts) in which each non-comment/non-blank line is a
-space-separated line of (USERNAME, PASSWORD/PUBKEY, ROOTCAP), like so:
-
-[TODO: the PUBKEY form is not yet supported]
-
- % cat BASEDIR/private/sftp.accounts
- # This is a password file, (username, password/pubkey, rootcap)
- alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
- bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
- carol ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAv2xHRVBoXnwxHLzthRD1wOWtyZ08b8n9cMZfJ58CBdBwAYP2NVNXc0XjRvswm5hnnAO+jyWPVNpXJjm9XllzYhODSNtSN+TXuJlUjhzA/T+ZwdgsgSAeHuuMQBoWt4Qc9HV6rHCdAeMhcnyqm6Q0sRAsfA/wfwiIgbvE7+cWpFa2anB6WeAnvK8+dMN0nvnkPE7GNyf/WFR1Ffuh9ifKdRB6yDNp17bQAqA3OWSFjch6fGPhp94y4g2jmTHlEUTyVsilgGqvGOutOVYnmOMnFijugU1Vu33G39GGzXWla6+fXwTk/oiVPiCYD7A7WFKes3nqMg8iVN6a6sxujrhnHQ== warner@fluxx URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
-
-Note that if the second word of the line is "ssh-rsa" or "ssh-dss", the rest
-of the line is parsed differently, so users cannot have a password equal to
-either of these strings.
-
-Then add the following lines to the BASEDIR/tahoe.cfg file:
-
- [sftpd]
- enabled = true
- sftp.port = 8022
- sftp.host_pubkey_file = private/ssh_host_rsa_key.pub
- sftp.host_privkey_file = private/ssh_host_rsa_key
- sftp.accounts.file = private/sftp.accounts
-
-The SFTP server will listen on the given port number. The sftp.accounts.file
-pathname will be interpreted relative to the node's BASEDIR.
-
-== Configuring an Account Server ==
-
-Determine the URL of the account server, say https://example.com/login . Then
-add the following lines to BASEDIR/tahoe.cfg:
-
- [sftpd]
- enabled = true
- sftp.port = 8022
- sftp.host_pubkey_file = private/ssh_host_rsa_key.pub
- sftp.host_privkey_file = private/ssh_host_rsa_key
- sftp.accounts.url = https://example.com/login
-
-== Dependencies ==
-
-The Tahoe SFTP server requires the Twisted "Conch" component, which itself
-requires the pycrypto package (note that pycrypto is distinct from the
-pycryptopp that Tahoe uses).
+++ /dev/null
-
-= The Tahoe REST-ful Web API =
-
-1. Enabling the web-API port
-2. Basic Concepts: GET, PUT, DELETE, POST
-3. URLs, Machine-Oriented Interfaces
-4. Browser Operations: Human-Oriented Interfaces
-5. Welcome / Debug / Status pages
-6. Static Files in /public_html
-7. Safety and security issues -- names vs. URIs
-8. Concurrency Issues
-
-
-== Enabling the web-API port ==
-
-Every Tahoe node is capable of running a built-in HTTP server. To enable
-this, just write a port number into a file named "webport" in the node's base
-directory. For example, writing "8123" into $NODEDIR/webport will cause the
-node to run a webserver on port 8123.
-
-This string is actually a Twisted "strports" specification, meaning you can
-get more control over the interface to which the server binds by supplying
-additional arguments. For more details, see the documentation on
-twisted.application.strports:
-http://twistedmatrix.com/documents/current/api/twisted.application.strports.html
-
-Writing "tcp:8123:interface=127.0.0.1" into $NODEDIR/webport does the same
-but binds to the loopback interface, ensuring that only the programs on the
-local host can connect. Using
-"ssl:8123:privateKey=mykey.pem:certKey=cert.pem" runs an SSL server.
-
-This webport can be set when the node is created by passing a --webport
-option to the 'tahoe create-client' command. By default, the node listens on
-port 8123, on the loopback (127.0.0.1) interface.
-
-== Basic Concepts ==
-
-As described in architecture.txt, each file and directory in a Tahoe virtual
-filesystem is referenced by an identifier that combines the designation of
-the object with the authority to do something with it (such as read or modify
-the contents). This identifier is called a "read-cap" or "write-cap",
-depending upon whether it enables read-only or read-write access. These
-"caps" are also referred to as URIs.
-
-The Tahoe web-based API is "REST-ful", meaning it implements the concepts of
-"REpresentational State Transfer": the original scheme by which the World
-Wide Web was intended to work. Each object (file or directory) is referenced
-by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and
-DELETE) are used to manipulate these objects. You can think of the URL as a
-noun, and the method as a verb.
-
-In REST, the GET method is used to retrieve information about an object, or
-to retrieve some representation of the object itself. When the object is a
-file, the basic GET method will simply return the contents of that file.
-Other variations (generally implemented by adding query parameters to the
-URL) will return information about the object, such as metadata. GET
-operations are required to have no side-effects.
-
-PUT is used to upload new objects into the filesystem, or to replace an
-existing object. DELETE is used to delete objects from the filesystem. Both
-PUT and DELETE are required to be idempotent: performing the same operation
-multiple times must have the same side-effects as only performing it once.
-
-POST is used for more complicated actions that cannot be expressed as a GET,
-PUT, or DELETE. POST operations can be thought of as a method call: sending
-some message to the object referenced by the URL. In Tahoe, POST is also used
-for operations that must be triggered by an HTML form (including upload and
-delete), because otherwise a regular web browser has no way to accomplish
-these tasks.
-
-Tahoe's web API is designed for two different consumers. The first is a
-program that needs to manipulate the virtual file system. Such programs are
-expected to use the RESTful interface described above. The second is a human
-using a standard web browser to work with the filesystem. This user is given
-a series of HTML pages with links to download files, and forms that use POST
-actions to upload, rename, and delete files.
-
-== URLs ==
-
-Tahoe uses a variety of read- and write- caps to identify files and
-directories. The most common of these is the "immutable file read-cap", which
-is used for most uploaded files. These read-caps look like the following:
-
- URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202
-
-The next most common is a "directory write-cap", which provides both read and
-write access to a directory, and looks like this:
-
- URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
-
-There are also "directory read-caps", which start with "URI:DIR2-RO:", and
-give read-only access to a directory. Finally there are also mutable file
-read- and write- caps, which start with "URI:SSK", and give access to mutable
-files.
-
-(later versions of Tahoe will make these strings shorter, and will remove the
-unfortunate colons, which must be escaped when these caps are embedded in
-URLs).
-
-To refer to any Tahoe object through the web API, you simply need to combine
-a prefix (which indicates the HTTP server to use) with the cap (which
-indicates which object inside that server to access). Since the default Tahoe
-webport is 8123, the most common prefix is one that will use a local node
-listening on this port:
-
- http://127.0.0.1:8123/uri/ + $CAP
-
-So, to access the directory named above (which happens to be the
-publicly-writable sample directory on the Tahoe test grid, described at
-http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be:
-
- http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/
-
-(note that the colons in the directory-cap are url-encoded into "%3A"
-sequences).
-
-Likewise, to access the file named above, use:
-
- http://127.0.0.1:8123/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202
-
-In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap
-or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap
-that refers to a file (whether mutable or immutable). So those URLs above can
-be abbreviated as:
-
- http://127.0.0.1:8123/uri/$DIRCAP/
- http://127.0.0.1:8123/uri/$FILECAP
-
-The operation summaries below will abbreviate these further, by eliding the
-server prefix. They will be displayed like this:
-
- /uri/$DIRCAP/
- /uri/$FILECAP
-
-
-=== Child Lookup ===
-
-Tahoe directories contain named children, just like directories in a regular
-local filesystem. These children can be either files or subdirectories.
-
-If you have a Tahoe URL that refers to a directory, and want to reference a
-named child inside it, just append the child name to the URL. For example, if
-our sample directory contains a file named "welcome.txt", we can refer to
-that file with:
-
- http://127.0.0.1:8123/uri/$DIRCAP/welcome.txt
-
-(or http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)
-
-Multiple levels of subdirectories can be handled this way:
-
- http://127.0.0.1:8123/uri/$DIRCAP/tahoe-source/docs/webapi.txt
-
-In this document, when we need to refer to a URL that references a file using
-this child-of-some-directory format, we'll use the following string:
-
- /uri/$DIRCAP/[SUBDIRS../]FILENAME
-
-The "[SUBDIRS../]" part means that there are zero or more (optional)
-subdirectory names in the middle of the URL. The "FILENAME" at the end means
-that this whole URL refers to a file of some sort, rather than to a
-directory.
-
-When we need to refer specifically to a directory in this way, we'll write:
-
- /uri/$DIRCAP/[SUBDIRS../]SUBDIR
-
-
-Note that all components of pathnames in URLs are required to be UTF-8
-encoded, so "resume.doc" (with an acute accent on both E's) would be accessed
-with:
-
- http://127.0.0.1:8123/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc
-
-Also note that the filenames inside upload POST forms are interpreted using
-whatever character set was provided in the conventional '_charset' field, and
-defaults to UTF-8 if not otherwise specified. The JSON representation of each
-directory contains native unicode strings. Tahoe directories are specified to
-contain unicode filenames, and cannot contain binary strings that are not
-representable as such.
-
-All Tahoe operations that refer to existing files or directories must include
-a suitable read- or write- cap in the URL: the webapi server won't add one
-for you. If you don't know the cap, you can't access the file. This allows
-the security properties of Tahoe caps to be extended across the webapi
-interface.
-
-== Slow Operations, Progress, and Cancelling ==
-
-Certain operations can be expected to take a long time. The "t=deep-check",
-described below, will recursively visit every file and directory reachable
-from a given starting point, which can take minutes or even hours for
-extremely large directory structures. A single long-running HTTP request is a
-fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient
-with waiting and give up on the connection.
-
-For this reason, long-running operations have an "operation handle", which
-can be used to poll for status/progress messages while the operation
-proceeds. This handle can also be used to cancel the operation. These handles
-are created by the client, and passed in as an "ophandle=" query argument
-to the POST or PUT request which starts the operation. The following
-operations can then be used to retrieve status:
-
-GET /operations/$HANDLE?output=HTML (with or without t=status)
-GET /operations/$HANDLE?output=JSON (same)
-
- These two retrieve the current status of the given operation. Each operation
- presents a different sort of information, but in general the page retrieved
- will indicate:
-
- * whether the operation is complete, or if it is still running
- * how much of the operation is complete, and how much is left, if possible
-
- The HTML form will include a meta-refresh tag, which will cause a regular
- web browser to reload the status page about 60 seconds later. This tag will
- be removed once the operation has completed.
-
- There may be more status information available under
- /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space.
-
-POST /operations/$HANDLE?t=cancel
-
- This terminates the operation, and returns an HTML page explaining what was
- cancelled. If the operation handle has already expired (see below), this
- POST will return a 404, which indicates that the operation is no longer
- running (either it was completed or terminated). The response body will be
- the same as a GET /operations/$HANDLE on this operation handle, and the
- handle will be expired immediately afterwards.
-
-The operation handle will eventually expire, to avoid consuming an unbounded
-amount of memory. The handle's time-to-live can be reset at any time, by
-passing a retain-for= argument (with a count of seconds) to either the
-initial POST that starts the operation, or the subsequent GET request which
-asks about the operation. For example, if a 'GET
-/operations/$HANDLE?output=JSON&retain-for=600' query is performed, the
-handle will remain active for 600 seconds (10 minutes) after the GET was
-received.
-
-In addition, if the GET includes a release-after-complete=True argument, and
-the operation has completed, the operation handle will be released
-immediately.
-
-If a retain-for= argument is not used, the default handle lifetimes are:
-
- * handles will remain valid at least until their operation finishes
- * uncollected handles for finished operations (i.e. handles for operations
- which have finished but for which the GET page has not been accessed since
- completion) will remain valid for one hour, or for the total time consumed
- by the operation, whichever is greater.
- * collected handles (i.e. the GET page has been retrieved at least once
- since the operation completed) will remain valid for ten minutes.
-
-
-== Programmatic Operations ==
-
-Now that we know how to build URLs that refer to files and directories in a
-Tahoe virtual filesystem, what sorts of operations can we do with those URLs?
-This section contains a catalog of GET, PUT, DELETE, and POST operations that
-can be performed on these URLs. This set of operations is aimed at programs
-that use HTTP to communicate with a Tahoe node. The next section describes
-operations that are intended for web browsers.
-
-=== Reading A File ===
-
-GET /uri/$FILECAP
-GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
-
- This will retrieve the contents of the given file. The HTTP response body
- will contain the sequence of bytes that make up the file.
-
- To view files in a web browser, you may want more control over the
- Content-Type and Content-Disposition headers. Please see the next section
- "Browser Operations", for details on how to modify these URLs for that
- purpose.
-
-=== Writing/Uploading A File ===
-
-PUT /uri/$FILECAP
-PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME
-
- Upload a file, using the data from the HTTP request body, and add whatever
- child links and subdirectories are necessary to make the file available at
- the given location. Once this operation succeeds, a GET on the same URL will
- retrieve the same contents that were just uploaded. This will create any
- necessary intermediate subdirectories.
-
- To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
-
- In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
- writable mutable file, that file's contents will be overwritten in-place. If
- it is a read-cap for a mutable file, an error will occur. If it is an
- immutable file, the old file will be discarded, and a new one will be put in
- its place.
-
- When creating a new file, if "mutable=true" is in the query arguments, the
- operation will create a mutable file instead of an immutable one.
-
- This returns the file-cap of the resulting file. If a new file was created
- by this method, the HTTP response code (as dictated by rfc2616) will be set
- to 201 CREATED. If an existing file was replaced or modified, the response
- code will be 200 OK.
-
- Note that the 'curl -T localfile http://127.0.0.1:8123/uri/$DIRCAP/foo.txt'
- command can be used to invoke this operation.
-
-PUT /uri
-
- This uploads a file, and produces a file-cap for the contents, but does not
- attach the file into the virtual drive. No directories will be modified by
- this operation. The file-cap is returned as the body of the HTTP response.
-
- If "mutable=true" is in the query arguments, the operation will create a
- mutable file, and return its write-cap in the HTTP response. The default is
- to create an immutable file, returning the read-cap as a response.
-
-=== Creating A New Directory ===
-
-POST /uri?t=mkdir
-PUT /uri?t=mkdir
-
- Create a new empty directory and return its write-cap as the HTTP response
- body. This does not make the newly created directory visible from the
- virtual drive. The "PUT" operation is provided for backwards compatibility:
- new code should use POST.
-
-POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
-PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
-
- Create new directories as necessary to make sure that the named target
- ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
- intermediate directories as necessary. If the named target directory already
- exists, this will make no changes to it.
-
- This will return an error if a blocking file is present at any of the parent
- names, preventing the server from creating the necessary parent directory.
-
- The write-cap of the new directory will be returned as the HTTP response
- body.
-
-POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME
-
- Create a new empty directory and attach it to the given existing directory.
- This will create additional intermediate directories as necessary.
-
- The URL of this form points to the parent of the bottom-most new directory,
- whereas the previous form has a URL that points directly to the bottom-most
- new directory.
-
-=== Get Information About A File Or Directory (as JSON) ===
-
-GET /uri/$FILECAP?t=json
-GET /uri/$DIRCAP?t=json
-GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json
-GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json
-
- This returns a machine-parseable JSON-encoded description of the given
- object. The JSON always contains a list, and the first element of the list
- is always a flag that indicates whether the referenced object is a file or a
- directory. If it is a file, then the information includes file size and URI,
- like this:
-
- GET /uri/$FILECAP?t=json :
- GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json :
-
- [ "filenode", { "ro_uri": file_uri,
- "size": bytes,
- "mutable": false,
- "metadata": {"ctime": 1202777696.7564139,
- "mtime": 1202777696.7564139
- }
- } ]
-
- If it is a directory, then it includes information about the children of
- this directory, as a mapping from child name to a set of data about the
- child (the same data that would appear in a corresponding GET?t=json of the
- child itself). The child entries also include metadata about each child,
- including creation- and modification- timestamps. The output looks like
- this:
-
- GET /uri/$DIRCAP?t=json :
- GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :
-
- [ "dirnode", { "rw_uri": read_write_uri,
- "ro_uri": read_only_uri,
- "mutable": true,
- "children": {
- "foo.txt": [ "filenode", { "ro_uri": uri,
- "size": bytes,
- "metadata": {
- "ctime": 1202777696.7564139,
- "mtime": 1202777696.7564139
- }
- } ],
- "subdir": [ "dirnode", { "rw_uri": rwuri,
- "ro_uri": rouri,
- "metadata": {
- "ctime": 1202778102.7589991,
- "mtime": 1202778111.2160511,
- }
- } ]
- } } ]
-
- In the above example, note how 'children' is a dictionary in which the keys
- are child names and the values depend upon whether the child is a file or a
- directory. The value is mostly the same as the JSON representation of the
- child object (except that directories do not recurse -- the "children"
- entry of the child is omitted, and the directory view includes the metadata
- that is stored on the directory edge).
-
- The rw_uri field will be present in the information about a directory
- if and only if you have read-write access to that directory.
-
-
-=== Attaching an existing File or Directory by its read- or write- cap ===
-
-PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
-
- This attaches a child object (either a file or directory) to a specified
- location in the virtual filesystem. The child object is referenced by its
- read- or write- cap, as provided in the HTTP request body. This will create
- intermediate directories as necessary.
-
- This is similar to a UNIX hardlink: by referencing a previously-uploaded
- file (or previously-created directory) instead of uploading/creating a new
- one, you can create two references to the same object.
-
- The read- or write- cap of the child is provided in the body of the HTTP
- request, and this same cap is returned in the response body.
-
- The default behavior is to overwrite any existing object at the same
- location. To prevent this (and make the operation return an error instead of
- overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
- With replace=false, this operation will return an HTTP 409 "Conflict" error
- if there is already an object at the given location, rather than overwriting
- the existing object. Note that "true", "t", and "1" are all synonyms for
- "True", and "false", "f", and "0" are synonyms for "False". The parameter is
- case-insensitive.
-
-=== Deleting a File or Directory ===
-
-DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME
-
- This removes the given name from its parent directory. CHILDNAME is the
- name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will
- be modified.
-
- Note that this does not actually delete the file or directory that the name
- points to from the tahoe grid -- it only removes the named reference from
- this directory. If there are other names in this directory or in other
- directories that point to the resource, then it will remain accessible
- through those paths. Even if all names pointing to this object are removed
- from their parent directories, someone with possession of its read-cap can
- continue to access the object through that cap.
-
- The object will only become completely unreachable once 1: there are no
- reachable directories that reference it, and 2: nobody is holding a read-
- or write- cap to the object. (This behavior is very similar to the way
- hardlinks and anonymous files work in traditional UNIX filesystems.)
-
- This operation will not modify more than a single directory. Intermediate
- directories which were implicitly created by PUT or POST methods will *not*
- be automatically removed by DELETE.
-
- This method returns the file- or directory- cap of the object that was just
- removed.
-
-== Browser Operations ==
-
-This section describes the HTTP operations that provide support for humans
-running a web browser. Most of these operations use HTML forms that use POST
-to drive the Tahoe node.
-
-Note that for all POST operations, the arguments listed can be provided
-either as URL query arguments or as form body fields. URL query arguments are
-separated from the main URL by "?", and from each other by "&". For example,
-"POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually
-specified by using <input type="hidden"> elements. For clarity, the
-descriptions below display the most significant arguments as URL query args.
-
-=== Viewing A Directory (as HTML) ===
-
-GET /uri/$DIRCAP/[SUBDIRS../]
-
- This returns an HTML page, intended to be displayed to a human by a web
- browser, which contains HREF links to all files and directories reachable
- from this directory. These HREF links do not have a t= argument, meaning
- that a human who follows them will get pages also meant for a human. It also
- contains forms to upload new files, and to delete files and directories.
- Those forms use POST methods to do their job.
-
-=== Viewing/Downloading a File ===
-
-GET /uri/$FILECAP
-GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
-
- This will retrieve the contents of the given file. The HTTP response body
- will contain the sequence of bytes that make up the file.
-
- If you want the HTTP response to include a useful Content-Type header,
- either use the second form (which starts with a $DIRCAP), or add a
- "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
- The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information
- to determine a Content-Type (since Tahoe immutable files are merely
- sequences of bytes, not typed+named file objects).
-
- If the URL has both filename= and "save=true" in the query arguments, then
- the server will add a "Content-Disposition: attachment" header, along with a
- filename= parameter. When a user clicks on such a link, most browsers will
- offer to let the user save the file instead of displaying it inline (indeed,
- most browsers will refuse to display it inline). "true", "t", "1", and other
- case-insensitive equivalents are all treated the same.
-
- Character-set handling in URLs and HTTP headers is a dubious art[1]. For
- maximum compatibility, Tahoe simply copies the bytes from the filename=
- argument into the Content-Disposition header's filename= parameter, without
- trying to interpret them in any particular way.
-
-
-GET /named/$FILECAP/FILENAME
-
- This is an alternate download form which makes it easier to get the correct
- filename. The Tahoe server will provide the contents of the given file, with
- a Content-Type header derived from the given filename. This form is used to
- get browsers to use the "Save Link As" feature correctly, and also helps
- command-line tools like "wget" and "curl" use the right filename. Note that
- this form can *only* be used with file caps; it is an error to use a
- directory cap after the /named/ prefix.
-
-=== Get Information About A File Or Directory (as HTML) ===
-
-GET /uri/$FILECAP?t=info
-GET /uri/$DIRCAP/?t=info
-GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info
-GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info
-
- This returns a human-oriented HTML page with more detail about the selected
- file or directory object. This page contains the following items:
-
- object size
- storage index
- JSON representation
- raw contents (text/plain)
- access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
- check/verify/repair form
- deep-check/deep-size/deep-stats/manifest (for directories)
- replace-contents form (for mutable files)
-
-=== Creating a Directory ===
-
-POST /uri?t=mkdir
-
- This creates a new directory, but does not attach it to the virtual
- filesystem.
-
- If a "redirect_to_result=true" argument is provided, then the HTTP response
- will cause the web browser to be redirected to a /uri/$DIRCAP page that
- gives access to the newly-created directory. If you bookmark this page,
- you'll be able to get back to the directory again in the future. This is the
- recommended way to start working with a Tahoe server: create a new unlinked
- directory (using redirect_to_result=true), then bookmark the resulting
- /uri/$DIRCAP page. There is a "Create Directory" button on the Welcome page
- to invoke this action.
-
- If "redirect_to_result=true" is not provided (or is given a value of
- "false"), then the HTTP response body will simply be the write-cap of the
- new directory.
-
-POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME
-
- This creates a new directory as a child of the designated SUBDIR. This will
- create additional intermediate directories as necessary.
-
- If a "when_done=URL" argument is provided, the HTTP response will cause the
- web browser to redirect to the given URL. This provides a convenient way to
- return the browser to the directory that was just modified. Without a
- when_done= argument, the HTTP response will simply contain the write-cap of
- the directory that was just created.
-
-
-=== Uploading a File ===
-
-POST /uri?t=upload
-
- This uploads a file, and produces a file-cap for the contents, but does not
- attach the file into the virtual drive. No directories will be modified by
- this operation.
-
- The file must be provided as the "file" field of an HTML encoded form body,
- produced in response to an HTML form like this:
- <form action="/uri" method="POST" enctype="multipart/form-data">
- <input type="hidden" name="t" value="upload" />
- <input type="file" name="file" />
- <input type="submit" value="Upload Unlinked" />
- </form>
-
- If a "when_done=URL" argument is provided, the response body will cause the
- browser to redirect to the given URL. If the when_done= URL has the string
- "%(uri)s" in it, that string will be replaced by a URL-escaped form of the
- newly created file-cap. (Note that without this substitution, there is no
- way to access the file that was just uploaded).
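-
- The "%(uri)s" substitution can be pictured with the following sketch. It
- assumes that "URL-escaped" means percent-encoding every reserved character,
- which may differ from what the node really does. The function name
- fill_when_done and the cap "URI:CHK:abc" are made-up placeholders.
-
```python
from urllib.parse import quote


def fill_when_done(when_done, filecap):
    """Replace the literal marker %(uri)s with a URL-escaped file-cap.

    Hypothetical helper: a plain string replacement is used so that other
    percent signs in the when_done URL are left untouched.
    """
    return when_done.replace("%(uri)s", quote(filecap, safe=""))


if __name__ == "__main__":
    # "URI:CHK:abc" is a placeholder, not a real cap.
    print(fill_when_done("/done?cap=%(uri)s", "URI:CHK:abc"))
    # prints /done?cap=URI%3ACHK%3Aabc
```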
-
- The default (in the absence of when_done=) is to return an HTML page that
- describes the results of the upload. This page will contain information
- about which storage servers were used for the upload, how long each
- operation took, etc.
-
- If a "mutable=true" argument is provided, the operation will create a
- mutable file, and the response body will contain the write-cap instead of
- the upload results page. The default is to create an immutable file,
- returning the upload results page as a response.
-
-
-POST /uri/$DIRCAP/[SUBDIRS../]?t=upload
-
- This uploads a file, and attaches it as a new child of the given directory.
- The file must be provided as the "file" field of an HTML encoded form body,
- produced in response to an HTML form like this:
- <form action="." method="POST" enctype="multipart/form-data">
- <input type="hidden" name="t" value="upload" />
- <input type="file" name="file" />
- <input type="submit" value="Upload" />
- </form>
-
- A "name=" argument can be provided to specify the new child's name,
- otherwise it will be taken from the "filename" field of the upload form
- (most web browsers will copy the last component of the original file's
- pathname into this field). To avoid confusion, name= is not allowed to
- contain a slash.
-
- If there is already a child with that name, and it is a mutable file, then
- its contents are replaced with the data being uploaded. If it is not a
- mutable file, the default behavior is to remove the existing child before
- creating a new one. To prevent this (and make the operation return an error
- instead of overwriting the old child), add a "replace=false" argument, as
- "?t=upload&replace=false". With replace=false, this operation will return an
- HTTP 409 "Conflict" error if there is already an object at the given
- location, rather than overwriting the existing object. Note that "true",
- "t", and "1" are all synonyms for "True", and "false", "f", and "0" are
- synonyms for "False". The parameter is case-insensitive.
-
- This will create additional intermediate directories as necessary, although
- since it is expected to be triggered by a form that was retrieved by "GET
- /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
- already exist.
-
- If a "mutable=true" argument is provided, any new file that is created will
- be a mutable file instead of an immutable one. <input type="checkbox"
- name="mutable" /> will give the user a way to set this option.
-
- If a "when_done=URL" argument is provided, the HTTP response will cause the
- web browser to redirect to the given URL. This provides a convenient way to
- return the browser to the directory that was just modified. Without a
- when_done= argument, the HTTP response will simply contain the file-cap of
- the file that was just uploaded (a write-cap for mutable files, or a
- read-cap for immutable files).
-
-POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload
-
- This also uploads a file and attaches it as a new child of the given
- directory. It is a slight variant of the previous operation, as the URL
- refers to the target file rather than the parent directory. It is otherwise
- identical: this accepts mutable= and when_done= arguments too.
-
-POST /uri/$FILECAP?t=upload
-
-=== Attaching An Existing File Or Directory (by URI) ===
-
-POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP
-
- This attaches a given read- or write- cap "CHILDCAP" to the designated
- directory, with a specified child name. This behaves much like the PUT t=uri
- operation, and is a lot like a UNIX hardlink.
-
- This will create additional intermediate directories as necessary, although
- since it is expected to be triggered by a form that was retrieved by "GET
- /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
- already exist.
-
-=== Deleting A Child ===
-
-POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME
-
- This instructs the node to delete a child object (file or subdirectory) from
- the given directory. Note that the entire subtree is removed. This is
- somewhat like "rm -rf" (from the point of view of the parent), but other
- references into the subtree will see that the child subdirectories are not
- modified by this operation. Only the link from the given directory to its
- child is severed.
-
-=== Renaming A Child ===
-
-POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW
-
- This instructs the node to rename a child of the given directory. This is
- exactly the same as removing the child, then adding the same child-cap under
- the new name. This operation cannot move the child to a different directory.
-
- This operation will replace any existing child of the new name, making it
- behave like the UNIX "mv -f" command.
-
-=== Other Utilities ===
-
-GET /uri?uri=$CAP
-
- This causes a redirect to /uri/$CAP, and retains any additional query
- arguments (like filename= or save=). This is for the convenience of web
- forms which allow the user to paste in a read- or write- cap (obtained
- through some out-of-band channel, like IM or email).
-
- Note that this form merely redirects to the specific file or directory
- indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
- traverse to children by appending additional path segments to the URL.
-
-GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME
-
- This provides a useful facility to browser-based user interfaces. It
- returns a page containing a form targeting the "POST $DIRCAP t=rename"
- functionality described above, with the provided $CHILDNAME present in the
- 'from_name' field of that form. I.e. this presents a form offering to
- rename $CHILDNAME, requesting the new name, and submitting POST rename.
-
-GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
-
- This returns the file- or directory- cap for the specified object.
-
-GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri
-
- This returns a read-only file- or directory- cap for the specified object.
- If the object is an immutable file, this will return the same value as
- t=uri.
-
-=== Debugging and Testing Features ===
-
-These URLs are less likely to be helpful to the casual Tahoe user, and are
-mainly intended for developers.
-
-POST $URL?t=check
-
- This triggers the FileChecker to determine the current "health" of the
- given file or directory, by counting how many shares are available. The
- page that is returned will display the results. This can be used as a "show
- me detailed information about this file" page.
-
- If a verify=true argument is provided, the node will perform a more
- intensive check, downloading and verifying every single bit of every share.
-
- If an output=JSON argument is provided, the response will be
- machine-readable JSON instead of human-oriented HTML. The data is a
- dictionary with the following keys:
-
- storage-index: a base32-encoded string with the object's storage index,
- or an empty string for LIT files
- results: a dictionary that describes the state of the file. For LIT files,
- this dictionary has only the 'healthy' key, which will always be
- True. For distributed files, this dictionary has the following
- keys:
- count-shares-good: the number of good shares that were found
- count-shares-needed: 'k', the number of shares required for recovery
- count-shares-expected: 'N', the number of total shares generated
- count-good-share-hosts: the number of distinct storage servers with
- good shares. If this number is less than
- count-shares-good, then some shares are doubled
- up, increasing the correlation of failures. This
- indicates that one or more shares should be
- moved to an otherwise unused server, if one is
- available.
- count-wrong-shares: for mutable files, the number of shares for
- versions other than the 'best' one (highest
- sequence number, highest roothash). These are
- either old ...
- count-recoverable-versions: for mutable files, the number of
- recoverable versions of the file. For
- a healthy file, this will equal 1.
- count-unrecoverable-versions: for mutable files, the number of
- unrecoverable versions of the file.
- For a healthy file, this will be 0.
- count-corrupt-shares: the number of shares with integrity failures
- list-corrupt-shares: a list of "share locators", one for each share
- that was found to be corrupt. Each share locator
- is a list of (serverid, storage_index, sharenum).
- needs-rebalancing: (bool) True if there are multiple shares on a single
- storage server, indicating a reduction in reliability
- that could be resolved by moving shares to new
- servers.
- servers-responding: list of base32-encoded storage server identifiers,
- one for each server which responded to the share
- query.
- healthy: (bool) True if the file is completely healthy, False otherwise.
- Healthy files have at least N good shares. Overlapping shares
- (indicated by count-good-share-hosts < count-shares-good) do not
- currently cause a file to be marked unhealthy. If there are at
- least N good shares, then corrupt shares do not cause the file to
- be marked unhealthy, although the corrupt shares will be listed
- in the results (list-corrupt-shares) and should be manually
- removed to avoid wasting time in subsequent downloads (as the
- downloader rediscovers the corruption and uses alternate shares).
- sharemap: dict mapping share identifier to list of serverids
- (base32-encoded strings). This indicates which servers are
- holding which shares. For immutable files, the shareid is
- an integer (the share number, from 0 to N-1). For
- mutable files, it is a string of the form
- 'seq%d-%s-sh%d', containing the sequence number, the
- roothash, and the share number.
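-
- A client might interpret the 'results' dictionary along the following
- lines. This is a sketch based only on the key descriptions above: the
- function name summarize_check and the wording of the notes are invented for
- illustration, and the sample values are not real checker output.
-
```python
def summarize_check(results):
    """Return a list of short notes about a t=check 'results' dictionary.

    Hypothetical helper. LIT files carry only the 'healthy' key, so every
    other key is looked up defensively.
    """
    notes = []
    if not results.get("healthy", False):
        notes.append("unhealthy")
    good = results.get("count-shares-good")
    hosts = results.get("count-good-share-hosts")
    # Fewer distinct hosts than good shares means some shares are doubled up.
    if good is not None and hosts is not None and hosts < good:
        notes.append("shares doubled up")
    if results.get("count-corrupt-shares", 0) > 0:
        notes.append("corrupt shares present")
    return notes


if __name__ == "__main__":
    sample = {  # made-up values for illustration
        "healthy": True,
        "count-shares-good": 10,
        "count-good-share-hosts": 7,
        "count-corrupt-shares": 1,
    }
    print(summarize_check(sample))
    # prints ['shares doubled up', 'corrupt shares present']
```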
-
-POST $URL?t=start-deep-check (must add &ophandle=XYZ)
-
- This initiates a recursive walk of all files and directories reachable from
- the target, performing a check on each one just like t=check. The result
- page will contain a summary of the results, including details on any
- file/directory that was not fully healthy.
-
- t=start-deep-check can only be invoked on a directory. An error (400
- BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
- walker will deal with loops safely.
-
- This accepts the same verify= argument as t=check.
-
- Since this operation can take a long time (perhaps a second per object),
- the ophandle= argument is required (see "Slow Operations, Progress, and
- Cancelling" above). The response to this POST will be a redirect to the
- corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
- match the output= argument given to the POST). The deep-check operation
- will continue to run in the background, and the /operations page should be
- used to find out when the operation is done.
-
- Detailed checker results for non-healthy files and directories will be
- available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will
- contain links to these detailed results.
-
- The HTML /operations/$HANDLE page for incomplete operations will contain a
- meta-refresh tag, set to 60 seconds, so that a browser which uses
- deep-check will automatically poll until the operation has completed.
-
- The JSON page (/operations/$HANDLE?output=JSON) will contain a
- machine-readable JSON dictionary with the following keys:
-
- finished: a boolean, True if the operation is complete, else False. Some
- of the remaining keys may not be present until the operation
- is complete.
- root-storage-index: a base32-encoded string with the storage index of the
- starting point of the deep-check operation
- count-objects-checked: count of how many objects were checked. Note that
- non-distributed objects (i.e. small immutable LIT
- files) are not checked, since for these objects,
- the data is contained entirely in the URI.
- count-objects-healthy: how many of those objects were completely healthy
- count-objects-unhealthy: how many were damaged in some way
- count-corrupt-shares: how many shares were found to have corruption,
- summed over all objects examined
- list-corrupt-shares: a list of "share identifiers", one for each share
- that was found to be corrupt. Each share identifier
- is a list of (serverid, storage_index, sharenum).
- list-unhealthy-files: a list of (pathname, check-results) tuples, for
- each file that was not fully healthy. 'pathname' is
- a list of strings (which can be joined by "/"
- characters to turn it into a single string),
- relative to the directory on which deep-check was
- invoked. The 'check-results' field is the same as
- that returned by t=check&output=JSON, described
- above.
- stats: a dictionary with the same keys as the t=deep-stats command
- (described below)
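-
- As an illustration of the 'pathname' convention, the sketch below turns the
- list-unhealthy-files entries of a deep-check status document into
- slash-joined strings. The function name unhealthy_paths is hypothetical,
- the sample JSON is invented, and returning None for an unfinished operation
- is a choice made for this example.
-
```python
import json


def unhealthy_paths(status_json):
    """Return slash-joined paths of unhealthy files, or None if not finished.

    Hypothetical helper operating on the JSON text of a deep-check
    /operations/$HANDLE page.
    """
    data = json.loads(status_json)
    if not data.get("finished"):
        return None
    return ["/".join(path)
            for path, _check_results in data.get("list-unhealthy-files", [])]


if __name__ == "__main__":
    # Invented sample document with a single unhealthy file.
    sample = json.dumps({
        "finished": True,
        "list-unhealthy-files": [[["docs", "a.txt"], {"healthy": False}]],
    })
    print(unhealthy_paths(sample))  # prints ['docs/a.txt']
```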
-
-POST $URL?t=check&repair=true
-
- This performs a health check of the given file or directory, and if the
- checker determines that the object is not healthy (some shares are missing
- or corrupted), it will perform a "repair". During repair, any missing
- shares will be regenerated and uploaded to new servers.
-
- This accepts the same verify=true argument as t=check. When an output=JSON
- argument is provided, the machine-readable JSON response will contain the
- following keys:
-
- storage-index: a base32-encoded string with the object's storage index,
- or an empty string for LIT files
- repair-attempted: (bool) True if repair was attempted
- repair-successful: (bool) True if repair was attempted and the file was
- fully healthy afterwards. False if no repair was
- attempted, or if a repair attempt failed.
- pre-repair-results: a dictionary that describes the state of the file
- before any repair was performed. This contains exactly
- the same keys as the 'results' value of the t=check
- response, described above.
- post-repair-results: a dictionary that describes the state of the file
- after any repair was performed. If no repair was
- performed, post-repair-results and pre-repair-results
- will be the same. This contains exactly the same keys
- as the 'results' value of the t=check response,
- described above.
-
-POST $URL?t=start-deep-check&repair=true (must add &ophandle=XYZ)
-
- This triggers a recursive walk of all files and directories, performing a
- t=check&repair=true on each one.
-
- Like t=start-deep-check without the repair= argument, this can only be
- invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
- is invoked on a file. The recursive walker will deal with loops safely.
-
- This accepts the same verify=true argument as t=start-deep-check. It uses
- the same ophandle= mechanism as start-deep-check. When an output=JSON
- argument is provided, the response will contain the following keys:
-
- finished: (bool) True if the operation has completed, else False
- root-storage-index: a base32-encoded string with the storage index of the
- starting point of the deep-check operation
- count-objects-checked: count of how many objects were checked
-
- count-objects-healthy-pre-repair: how many of those objects were completely
- healthy, before any repair
- count-objects-unhealthy-pre-repair: how many were damaged in some way
- count-objects-healthy-post-repair: how many of those objects were completely
- healthy, after any repair
- count-objects-unhealthy-post-repair: how many were damaged in some way
-
- count-repairs-attempted: repairs were attempted on this many objects.
- count-repairs-successful: how many repairs resulted in healthy objects
- count-repairs-unsuccessful: how many repairs did not result in
- completely healthy objects
- count-corrupt-shares-pre-repair: how many shares were found to have
- corruption, summed over all objects
- examined, before any repair
- count-corrupt-shares-post-repair: how many shares were found to have
- corruption, summed over all objects
- examined, after any repair
- list-corrupt-shares: a list of "share identifiers", one for each share
- that was found to be corrupt (before any repair).
- Each share identifier is a list of (serverid,
- storage_index, sharenum).
- list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares
- that were successfully repaired are not
- included. These are shares that need
- manual processing. Since immutable shares
- cannot be modified by clients, all corruption
- in immutable shares will be listed here.
- list-unhealthy-files: a list of (pathname, check-results) tuples, for
- each file that was not fully healthy. 'pathname' is
- relative to the directory on which deep-check was
- invoked. The 'check-results' field is the same as
- that returned by t=check&repair=true&output=JSON,
- described above.
- stats: a dictionary with the same keys as the t=deep-stats command
- (described below)
-
-POST $DIRURL?t=start-manifest (must add &ophandle=XYZ)
-
- This operation generates a "manifest" of the given directory tree, mostly
- for debugging. This is a table of (path, filecap/dircap), for every object
- reachable from the starting directory. The path will be slash-joined, and
- the filecap/dircap will contain a link to the object in question. This page
- gives immediate access to every object in the virtual filesystem subtree.
-
- This operation uses the same ophandle= mechanism as deep-check. The
- corresponding /operations/$HANDLE page has three different forms. The
- default is output=HTML.
-
- If output=text is added to the query args, the results will be a text/plain
- list. The first line is special: it is either "finished: yes" or "finished:
- no"; if the operation is not finished, you must periodically reload the
- page until it completes. The rest of the results are a plaintext list, with
- one file/dir per line, slash-separated, with the filecap/dircap separated
- by a space.
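-
- The output=text layout described above could be parsed roughly as follows.
- This sketch assumes that a cap never contains a space (so the last space on
- a line separates the path from the cap) and that a line with no space holds
- only a cap with an empty path. The function name parse_text_manifest and
- the sample caps are placeholders.
-
```python
def parse_text_manifest(body):
    """Parse an output=text manifest into (finished, [(path, cap), ...]).

    Hypothetical helper. The first line must be "finished: yes" or
    "finished: no"; every other non-blank line is "PATH CAP".
    """
    lines = body.splitlines()
    if not lines or lines[0].strip() not in ("finished: yes", "finished: no"):
        raise ValueError("missing 'finished:' header line")
    finished = lines[0].strip() == "finished: yes"
    entries = []
    for line in lines[1:]:
        if not line.strip():
            continue
        # Split on the LAST space, since the path itself may contain spaces.
        path, _sep, cap = line.rpartition(" ")
        entries.append((path, cap))
    return finished, entries


if __name__ == "__main__":
    # Placeholder caps, not real ones.
    text = "finished: yes\ndocs/a b.txt URI:CHK:xyz\nURI:DIR2:root\n"
    print(parse_text_manifest(text))
    # prints (True, [('docs/a b.txt', 'URI:CHK:xyz'), ('', 'URI:DIR2:root')])
```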
-
- If output=JSON is added to the query args, then the results will be a
- JSON-formatted dictionary with three keys:
-
- finished (bool): if False then you must reload the page until True
- origin_si (str): the storage index of the starting point
- manifest: list of (path, cap) tuples, where path is a list of strings.
-
-POST $DIRURL?t=start-deep-size (must add &ophandle=XYZ)
-
- This operation generates a number (in bytes) containing the sum of the
- filesize of all directories and immutable files reachable from the given
- directory. This is a rough lower bound of the total space consumed by this
- subtree. It does not include space consumed by mutable files, nor does it
- take expansion or encoding overhead into account. Later versions of the
- code may improve this estimate upwards.
-
- The /operations/$HANDLE status output consists of two lines of text:
-
- finished: yes
- size: 1234
-
-POST $DIRURL?t=start-deep-stats (must add &ophandle=XYZ)
-
- This operation performs a recursive walk of all files and directories
- reachable from the given directory, and generates a collection of
- statistics about those objects.
-
- The result (obtained from the /operations/$HANDLE page) is a
- JSON-serialized dictionary with the following keys (note that some of these
- keys may be missing until 'finished' is True):
-
- finished: (bool) True if the operation has finished, else False
- count-immutable-files: count of how many CHK files are in the set
- count-mutable-files: same, for mutable files (does not include directories)
- count-literal-files: same, for LIT files (data contained inside the URI)
- count-files: sum of the above three
- count-directories: count of directories
- size-immutable-files: total bytes for all CHK files in the set, =deep-size
- size-mutable-files (TODO): same, for current version of all mutable files
- size-literal-files: same, for LIT files
- size-directories: size of directories (includes size-literal-files)
- size-files-histogram: list of (minsize, maxsize, count) buckets,
- with a histogram of filesizes, 5dB/bucket,
- for both literal and immutable files
- largest-directory: number of children in the largest directory
- largest-immutable-file: number of bytes in the largest CHK file
-
- size-mutable-files is not implemented, because it would require extra
- queries to each mutable file to get their size. This may be implemented in
- the future.
-
- Assuming no sharing, the basic space consumed by a single root directory is
- the sum of size-immutable-files, size-mutable-files, and size-directories.
- The actual disk space used by the shares is larger, because of the
- following sources of overhead:
-
- integrity data
- expansion due to erasure coding
- share management data (leases)
- backend (ext3) minimum block size
-
-== Other Useful Pages ==
-
-The portion of the web namespace that begins with "/uri" (and "/named") is
-dedicated to giving users (both humans and programs) access to the Tahoe
-virtual filesystem. The rest of the namespace provides status information
-about the state of the Tahoe node.
-
-GET / (the root page)
-
-This is the "Welcome Page", and contains a few distinct sections:
-
- Node information: library versions, local nodeid, services being provided.
-
- Filesystem Access Forms: create a new directory, view a file/directory by
- URI, upload a file (unlinked), download a file by
- URI.
-
- Grid Status: introducer information, helper information, connected storage
- servers.
-
-GET /status/
-
- This page lists all active uploads and downloads, and contains a short list
- of recent upload/download operations. Each operation has a link to a page
- that describes file sizes, servers that were involved, and the time consumed
- in each phase of the operation.
-
- A GET of /status/?t=json will return a machine-readable subset of the same
- data. It returns a JSON-encoded dictionary. The only key defined at this
- time is "active", with a value that is a list of operation dictionaries, one
- for each active operation. Once an operation is completed, it will no longer
- appear in data["active"] .
-
- Each op-dict contains a "type" key, one of "upload", "download",
- "mapupdate", "publish", or "retrieve" (the first two are for immutable
- files, while the latter three are for mutable files and directories).
-
- The "upload" op-dict will contain the following keys:
-
- type (string): "upload"
- storage-index-string (string): a base32-encoded storage index
- total-size (int): total size of the file
- status (string): current status of the operation
- progress-hash (float): 1.0 when the file has been hashed
- progress-ciphertext (float): 1.0 when the file has been encrypted.
- progress-encode-push (float): 1.0 when the file has been encoded and
- pushed to the storage servers. For helper
- uploads, the ciphertext value climbs to 1.0
- first, then encoding starts. For unassisted
- uploads, ciphertext and encode-push progress
- will climb at the same pace.
-
- The "download" op-dict will contain the following keys:
-
- type (string): "download"
- storage-index-string (string): a base32-encoded storage index
- total-size (int): total size of the file
- status (string): current status of the operation
- progress (float): 1.0 when the file has been fully downloaded
-
- Front-ends which want to report progress information are advised to simply
- average together all the progress-* indicators. A slightly more accurate
- value can be found by ignoring the progress-hash value (since the current
- implementation hashes synchronously, so clients will probably never see
- progress-hash!=1.0).
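-
- The averaging advice above might look like this for a front-end written in
- Python. The function name overall_progress and its ignore_hash flag are
- hypothetical, and the sample op-dict values are invented.
-
```python
def overall_progress(op, ignore_hash=False):
    """Average the progress indicators of one op-dict from /status/?t=json.

    Hypothetical helper. Upload op-dicts carry several "progress-*" keys,
    download op-dicts a single "progress" key. Returns None if the op-dict
    has no progress indicator at all.
    """
    values = []
    for key, value in op.items():
        if key != "progress" and not key.startswith("progress-"):
            continue
        if ignore_hash and key == "progress-hash":
            continue  # hashing is synchronous, so this is normally 1.0
        values.append(value)
    if not values:
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    upload = {  # invented values
        "type": "upload",
        "progress-hash": 1.0,
        "progress-ciphertext": 0.5,
        "progress-encode-push": 0.5,
    }
    print(overall_progress(upload, ignore_hash=True))  # prints 0.5
```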
-
-GET /provisioning/
-
- This page provides a basic tool to predict the likely storage and bandwidth
- requirements of a large Tahoe grid. It provides forms to input things like
- total number of users, number of files per user, average file size, number
- of servers, expansion ratio, hard drive failure rate, etc. It then provides
- numbers like how many disks per server will be needed, how many read
- operations per second should be expected, and the likely MTBF for files in
- the grid. This information is very preliminary, and the model upon which it
- is based still needs a lot of work.
-
-GET /helper_status/
-
- If the node is running a helper (i.e. if "$BASEDIR/run_helper" is
- non-empty), then this page will provide a list of all the helper operations
- currently in progress. If "?t=json" is added to the URL, it will return a
- JSON-formatted list of helper statistics, which can then be used to produce
- graphs to indicate how busy the helper is.
-
-GET /statistics/
-
- This page provides "node statistics", which are collected from a variety of
- sources.
-
- load_monitor: every second, the node schedules a timer for one second in
- the future, then measures how late the subsequent callback
- is. The "load_average" is this tardiness, measured in
- seconds, averaged over the last minute. It is an indication
- of a busy node, one which is doing more work than can be
- completed in a timely fashion. The "max_load" value is the
- highest value that has been seen in the last 60 seconds.
-
- cpu_monitor: every minute, the node uses time.clock() to measure how much
- CPU time it has used, and it uses this value to produce
- 1min/5min/15min moving averages. These values range from 0%
- (0.0) to 100% (1.0), and indicate what fraction of the CPU
- has been used by the Tahoe node. Not all operating systems
- provide meaningful data to time.clock(): they may report 100%
- CPU usage at all times.
-
- uploader: this counts how many immutable files (and bytes) have been
- uploaded since the node was started
-
- downloader: this counts how many immutable files have been downloaded
- since the node was started
-
- publishes: this counts how many mutable files (including directories) have
- been modified since the node was started
-
- retrieves: this counts how many mutable files (including directories) have
- been read since the node was started
-
- There are other statistics that are tracked by the node. The "raw stats"
- section shows a formatted dump of all of them.
-
- By adding "?t=json" to the URL, the node will return a JSON-formatted
- dictionary of stats values, which can be used by other tools to produce
- graphs of node behavior. The misc/munin/ directory in the source
- distribution provides some tools to produce these graphs.
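
The load_monitor description above can be illustrated with a small sketch. It is not the node's actual implementation: the class name `LoadMonitorSketch` and its methods are hypothetical, and the timer is replaced by explicit (scheduled, actual) timestamps so the arithmetic is easy to follow.

```python
from collections import deque


class LoadMonitorSketch:
    """Track callback tardiness over a sliding window of samples."""

    def __init__(self, window=60):
        # One sample per second, so 60 samples cover the last minute.
        self.samples = deque(maxlen=window)

    def record(self, scheduled_time, actual_time):
        # Tardiness is how late the callback fired, in seconds.
        self.samples.append(max(0.0, actual_time - scheduled_time))

    def stats(self):
        if not self.samples:
            return {"load_average": 0.0, "max_load": 0.0}
        return {
            "load_average": sum(self.samples) / len(self.samples),
            "max_load": max(self.samples),
        }
```

A node that keeps up with its work records tardiness values near zero, so both numbers stay small; a busy node fires its callbacks late and both numbers rise.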
-
-GET / (introducer status)
-
- For Introducer nodes, the welcome page displays information about both
- clients and servers which are connected to the introducer. Servers make
- "service announcements", and these are listed in a table. Clients will
- subscribe to hear about service announcements, and these subscriptions are
- listed in a separate table. Both tables contain information about what
- version of Tahoe is being run by the remote node, their advertised and
- outbound IP addresses, their nodeid and nickname, and how long they have
- been available.
-
- By adding "?t=json" to the URL, the node will return a JSON-formatted
- dictionary of stats values, which can be used to produce graphs of connected
- clients over time.
-
-
-== Static Files in /public_html ==
-
-The webapi server will take any request for a URL that starts with /static
-and serve it from a configurable directory which defaults to
-$BASEDIR/public_html . This is configured by setting the "[node]web.static"
-value in $BASEDIR/tahoe.cfg . If this is left at the default value of
-"public_html", then http://localhost:8123/static/subdir/foo.html will be
-served with the contents of the file $BASEDIR/public_html/subdir/foo.html .
-
-This can be useful to serve a javascript application which provides a
-prettier front-end to the rest of the Tahoe webapi.
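
The URL-to-file mapping described above can be sketched as a pure function. This is only an illustration: `static_file_path` is a hypothetical helper, and the rejection of ".." components is an added assumption, not something this document says the server does.

```python
import posixpath


def static_file_path(url_path, basedir, web_static="public_html"):
    """Map a /static/... URL path onto a file under BASEDIR/<web_static>.

    Returns None when the URL is not under /static/ or when it would
    escape the static directory (an assumed safety check).
    """
    prefix = "/static/"
    if not url_path.startswith(prefix):
        return None
    relative = posixpath.normpath(url_path[len(prefix):])
    if relative == ".." or relative.startswith("../") or relative.startswith("/"):
        return None
    return posixpath.join(basedir, web_static, relative)
```

With a hypothetical BASEDIR of /home/user/.tahoe, the path /static/subdir/foo.html maps to /home/user/.tahoe/public_html/subdir/foo.html.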
-
-
-== Safety and Security Issues -- Names vs. URIs ==
-
-Summary: use explicit file- and dir- caps whenever possible, to reduce the
-potential for surprises when the virtual drive is changed while you aren't
-looking.
-
-The vdrive provides a mutable filesystem, but the ways that the filesystem
-can change are limited. The only thing that can change is the mapping from
-child names to child objects that each directory contains: a new child name
-can be added pointing to an object, an existing child name can be removed, or
-an existing child name can be changed to point to a different object.
-
-Obviously if you query Tahoe for information about the filesystem and then
-act upon the filesystem (such as by getting a listing of the contents of a
-directory and then adding a file to the directory), then the filesystem might
-have been changed after you queried it and before you acted upon it.
-However, if you use the URI instead of the pathname of an object when you act
-upon the object, then the only change that can happen is that, if the object
-is a directory, the set of child names it has might be different. If, on the
-other hand, you act upon the object using its pathname, then a different
-object might be in that place, which can result in more kinds of surprises.
-
-For example, suppose you are writing code which recursively downloads the
-contents of a directory. The first thing your code does is fetch the listing
-of the contents of the directory. For each child that it fetched, if that
-child is a file then it downloads the file, and if that child is a directory
-then it recurses into that directory. Now, if the download and the recurse
-actions are performed using the child's name, then the results might be
-wrong, because for example a child name that pointed to a sub-directory when
-you listed the directory might have been changed to point to a file (in which
-case your attempt to recurse into it would result in an error and the file
-would be skipped), or a child name that pointed to a file when you listed the
-directory might now point to a sub-directory (in which case your attempt to
-download the child would result in a file containing HTML text describing the
-sub-directory!).
-
-If your recursive algorithm uses the URI of the child instead of the name of
-the child, then those kinds of mistakes just can't happen. Note that both the
-child's name and the child's URI are included in the results of listing the
-parent directory, so it isn't any harder to use the URI for this purpose.
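
A URI-based recursive download might look like the sketch below. It assumes two caller-supplied functions, `list_directory` and `fetch_file`, which stand in for the real webapi requests and are hypothetical, as are the made-up URI strings and the "dirnode"/"filenode" kind labels used by the in-memory stand-in for a grid. The sketch does no cycle detection, so a directory that contains itself would recurse forever.

```python
def download_tree(dir_uri, list_directory, fetch_file):
    """Return {relative_path: bytes} for everything reachable from dir_uri.

    list_directory(uri) -> {child_name: (kind, child_uri)}
    fetch_file(uri)     -> bytes

    Every follow-up request uses the child's URI taken from the listing,
    never its name, so a concurrent rename cannot swap the object.
    """
    results = {}
    for name, (kind, child_uri) in sorted(list_directory(dir_uri).items()):
        if kind == "dirnode":
            subtree = download_tree(child_uri, list_directory, fetch_file)
            for sub_path, data in subtree.items():
                results[name + "/" + sub_path] = data
        else:
            results[name] = fetch_file(child_uri)
    return results


# In-memory stand-in for a grid, keyed by made-up URI strings.
DIRS = {
    "URI:DIR2:root": {
        "a.txt": ("filenode", "URI:CHK:a"),
        "sub": ("dirnode", "URI:DIR2:sub"),
    },
    "URI:DIR2:sub": {"b.txt": ("filenode", "URI:CHK:b")},
}
FILES = {"URI:CHK:a": b"alpha", "URI:CHK:b": b"beta"}

tree = download_tree("URI:DIR2:root", DIRS.__getitem__, FILES.__getitem__)
```

Names appear only in the returned paths; the decision to download or recurse is made from the kind and URI captured in the single listing of each directory.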
-
-In general, use names if you want "whatever object (whether file or
-directory) is found by following this name (or sequence of names) when my
-request reaches the server". Use URIs if you want "this particular object".
-
-== Concurrency Issues ==
-
-Tahoe uses both mutable and immutable files. Mutable files can be created
-explicitly by doing an upload with ?mutable=true added, or implicitly by
-creating a new directory (since a directory is just a special way to
-interpret a given mutable file).
-
-Mutable files suffer from the same consistency-vs-availability tradeoff that
-all distributed data storage systems face. It is not possible to
-simultaneously achieve perfect consistency and perfect availability in the
-face of network partitions (servers being unreachable or faulty).
-
-Tahoe tries to achieve a reasonable compromise, but there is a basic rule in
-place, known as the Prime Coordination Directive: "Don't Do That". What this
-means is that if write-access to a mutable file is available to several
-parties, then those parties are responsible for coordinating their activities
-to avoid multiple simultaneous updates. This could be achieved by having
-these parties talk to each other and using some sort of locking mechanism, or
-by serializing all changes through a single writer.
-
-The consequences of performing uncoordinated writes can vary. Some of the
-writers may lose their changes, as somebody else wins the race condition. In
-many cases the file will be left in an "unhealthy" state, meaning that there
-are not as many redundant shares as we would like (reducing the reliability
-of the file against server failures). In the worst case, the file can be left
-in such an unhealthy state that no version is recoverable, even the old ones.
-It is this small possibility of data loss that prompts us to issue the Prime
-Coordination Directive.
-
-Tahoe nodes implement internal serialization to make sure that a single Tahoe
-node cannot conflict with itself. For example, it is safe to issue two
-directory modification requests to a single Tahoe node's webapi server at the
-same time, because the Tahoe node will internally delay one of them until
-after the other has finished being applied. (This feature was introduced in
-Tahoe-1.1; back with Tahoe-1.0, web clients were responsible for serializing
-their web requests themselves.)
-
-For more details, please see the "Consistency vs Availability" and "The Prime
-Coordination Directive" sections of mutable.txt, in the same directory as
-this file.
-
-
-[1]: URLs and HTTP and UTF-8, Oh My
-
- HTTP does not provide a mechanism to specify the character set used to
- encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that
- the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
- For example, suppose we want to provoke the server into using a filename of
- "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
- is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's
- repr() function would show). To encode this into a URL, the non-ASCII
- bytes must be escaped with the urlencode '%XX' mechanism, giving us
- "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
- /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
- provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.
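
The two encodings discussed above can be reproduced with the Python standard library. This sketch uses Python 3's urllib.parse.quote; the variable names are illustrative only.

```python
from urllib.parse import quote

# "fiancee" with an e-acute (U+00E9) as the sixth character.
name = "fianc\u00e9e"

# The UTF-8 convention preferred in the text.
utf8_bytes = name.encode("utf-8")
utf8_quoted = quote(name, safe="", encoding="utf-8")

# The Latin-1 encoding that the text attributes to IE7.
latin1_quoted = quote(name, safe="", encoding="latin-1")

print(utf8_bytes)     # b'fianc\xc3\xa9e'
print(utf8_quoted)    # fianc%C3%A9e
print(latin1_quoted)  # fianc%E9e
```

A server cannot tell from the request alone which of the two encodings the browser used, which is the ambiguity this note describes.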
-
- The response header will need to indicate a non-ASCII filename. The actual
- mechanism to do this is not clear. For ASCII filenames, the response header
- would look like:
-
- Content-Disposition: attachment; filename="english.txt"
-
- If Tahoe were to enforce the UTF-8 convention, it would need to decode the
- URL argument into a unicode string, and then encode it back into a sequence
- of bytes when creating the response header. One possibility would be to use
- unencoded UTF-8. Developers suggest that IE7 might accept this:
-
- #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
- (note, the last four bytes of that line, not including the newline, are
- 0xC3 0xA9 0x65 0x22)
-
- RFC2231#4 (dated 1997) suggests that the following might work, and some
- developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that
- it is supported by firefox (but not IE7):
-
- #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
-
- My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that
- the filename= parameter is defined to be wrapped in quotes (presumably to
- allow spaces without breaking the parsing of subsequent parameters), which
- would give us:
-
- #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"
-
- However this is contrary to the examples in the email thread listed above.
-
- Developers report that IE7 (when it is configured for UTF-8 URL encoding,
- which is not the default in Asian countries) will accept:
-
- #4: Content-Disposition: attachment; filename=fianc%C3%A9e
-
- However, for maximum compatibility, Tahoe simply copies bytes from the URL
- into the response header, rather than enforcing the UTF-8 convention. This
- means it does not try to decode the filename from the URL argument, nor does
- it encode the filename into the response header.