= The Tahoe REST-ful Web API =
1. Enabling the web-API port
2. Basic Concepts: GET, PUT, DELETE, POST
3. URLs, Machine-Oriented Interfaces
4. Browser Operations: Human-Oriented Interfaces
5. Welcome / Debug / Status pages
6. Static Files in /public_html
7. Safety and security issues -- names vs. URIs
14 == Enabling the web-API port ==
16 Every Tahoe node is capable of running a built-in HTTP server. To enable
17 this, just write a port number into a file named "webport" in the node's base
18 directory. For example, writing "8123" into $NODEDIR/webport will cause the
19 node to run a webserver on port 8123.
21 This string is actually a Twisted "strports" specification, meaning you can
22 get more control over the interface to which the server binds by supplying
23 additional arguments. For more details, see the documentation on
24 twisted.application.strports:
25 http://twistedmatrix.com/documents/current/api/twisted.application.strports.html
27 Writing "tcp:8123:interface=127.0.0.1" into $NODEDIR/webport does the same
28 but binds to the loopback interface, ensuring that only the programs on the
29 local host can connect. Using
30 "ssl:8123:privateKey=mykey.pem:certKey=cert.pem" runs an SSL server.
32 This webport can be set when the node is created by passing a --webport
33 option to the 'tahoe create-client' command. By default, the node listens on
34 port 8123, on the loopback (127.0.0.1) interface.
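The webport file can also be written by a script. The following is a minimal sketch: `write_webport` is a hypothetical helper name (not part of Tahoe), and the node base directory is whatever you pass in. The strports strings are the ones quoted above.

```python
import os
import tempfile


def write_webport(nodedir, spec="tcp:8123:interface=127.0.0.1"):
    """Write a strports specification into NODEDIR/webport.

    Hypothetical helper for illustration; the default spec binds the
    web server to the loopback interface on port 8123.
    """
    path = os.path.join(nodedir, "webport")
    with open(path, "w") as f:
        f.write(spec)
    return path


if __name__ == "__main__":
    # Demonstrate against a throwaway directory rather than a real node.
    with tempfile.TemporaryDirectory() as nodedir:
        with open(write_webport(nodedir)) as f:
            print(f.read())
```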
38 As described in architecture.txt, each file and directory in a Tahoe virtual
39 filesystem is referenced by an identifier that combines the designation of
40 the object with the authority to do something with it (such as read or modify
41 the contents). This identifier is called a "read-cap" or "write-cap",
42 depending upon whether it enables read-only or read-write access. These
43 "caps" are also referred to as URIs.
45 The Tahoe web-based API is "REST-ful", meaning it implements the concepts of
46 "REpresentational State Transfer": the original scheme by which the World
47 Wide Web was intended to work. Each object (file or directory) is referenced
48 by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and
49 DELETE) are used to manipulate these objects. You can think of the URL as a
50 noun, and the method as a verb.
52 In REST, the GET method is used to retrieve information about an object, or
53 to retrieve some representation of the object itself. When the object is a
54 file, the basic GET method will simply return the contents of that file.
55 Other variations (generally implemented by adding query parameters to the
56 URL) will return information about the object, such as metadata. GET
57 operations are required to have no side-effects.
59 PUT is used to upload new objects into the filesystem, or to replace an
existing object. DELETE is used to delete objects from the filesystem. Both
61 PUT and DELETE are required to be idempotent: performing the same operation
62 multiple times must have the same side-effects as only performing it once.
64 POST is used for more complicated actions that cannot be expressed as a GET,
65 PUT, or DELETE. POST operations can be thought of as a method call: sending
66 some message to the object referenced by the URL. In Tahoe, POST is also used
67 for operations that must be triggered by an HTML form (including upload and
delete), because otherwise a regular web browser has no way to accomplish
these tasks.
71 Tahoe's web API is designed for two different consumers. The first is a
72 program that needs to manipulate the virtual file system. Such programs are
73 expected to use the RESTful interface described above. The second is a human
74 using a standard web browser to work with the filesystem. This user is given
75 a series of HTML pages with links to download files, and forms that use POST
76 actions to upload, rename, and delete files.
80 Tahoe uses a variety of read- and write- caps to identify files and
81 directories. The most common of these is the "immutable file read-cap", which
82 is used for most uploaded files. These read-caps look like the following:
84 URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202
86 The next most common is a "directory write-cap", which provides both read and
write access to a directory, and looks like this:
89 URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
91 There are also "directory read-caps", which start with "URI:DIR2-RO:", and
92 give read-only access to a directory. Finally there are also mutable file
read- and write- caps, which start with "URI:SSK", and give access to mutable
files.
(Later versions of Tahoe will make these strings shorter, and will remove the
unfortunate colons, which must be escaped when these caps are embedded in
URLs.)
100 To refer to any Tahoe object through the web API, you simply need to combine
101 a prefix (which indicates the HTTP server to use) with the cap (which
102 indicates which object inside that server to access). Since the default Tahoe
103 webport is 8123, the most common prefix is one that will use a local node
104 listening on this port:
106 http://127.0.0.1:8123/uri/ + $CAP
108 So, to access the directory named above (which happens to be the
publicly-writable sample directory on the Tahoe test grid, described at
110 http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be:
112 http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/
(Note that the colons in the directory-cap are URL-encoded into "%3A"
sequences.)
117 Likewise, to access the file named above, use:
119 http://127.0.0.1:8123/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202
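The escaping shown in these URLs can be reproduced with Python's standard library. This is a small sketch; `cap_url` is a name chosen here for illustration, and the prefix assumes the default local node described above.

```python
from urllib.parse import quote

PREFIX = "http://127.0.0.1:8123/uri/"  # default local node, as above


def cap_url(cap, prefix=PREFIX):
    """Return the web-API URL for a cap, escaping every colon as %3A."""
    # safe="" makes quote() escape ":" and "/" as well as other
    # reserved characters.
    return prefix + quote(cap, safe="")


if __name__ == "__main__":
    print(cap_url("URI:CHK:ime6pvkaxuetdfah2p2f35pe54:"
                  "4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202"))
```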
121 In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap
122 or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap
that refers to a file (whether mutable or immutable). So those URLs above can
be abbreviated as:
126 http://127.0.0.1:8123/uri/$DIRCAP/
127 http://127.0.0.1:8123/uri/$FILECAP
129 The operation summaries below will abbreviate these further, by eliding the
130 server prefix. They will be displayed like this:
138 Tahoe directories contain named children, just like directories in a regular
139 local filesystem. These children can be either files or subdirectories.
141 If you have a Tahoe URL that refers to a directory, and want to reference a
142 named child inside it, just append the child name to the URL. For example, if
our sample directory contains a file named "welcome.txt", we can refer to
that file with:
146 http://127.0.0.1:8123/uri/$DIRCAP/welcome.txt
148 (or http://127.0.0.1:8123/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)
150 Multiple levels of subdirectories can be handled this way:
152 http://127.0.0.1:8123/uri/$DIRCAP/tahoe-source/docs/webapi.txt
154 In this document, when we need to refer to a URL that references a file using
155 this child-of-some-directory format, we'll use the following string:
157 /uri/$DIRCAP/[SUBDIRS../]FILENAME
159 The "[SUBDIRS../]" part means that there are zero or more (optional)
160 subdirectory names in the middle of the URL. The "FILENAME" at the end means
that this whole URL refers to a file of some sort, rather than to a
directory.
164 When we need to refer specifically to a directory in this way, we'll write:
166 /uri/$DIRCAP/[SUBDIRS../]SUBDIR
169 Note that all components of pathnames in URLs are required to be UTF-8
encoded, so "resume.doc" (with an acute accent on both E's) would be accessed
with:
173 http://127.0.0.1:8123/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc
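A client can build such child URLs by escaping each path component separately. The sketch below uses only the standard library; `child_url` is a hypothetical helper name, and `quote()` encodes non-ASCII characters as UTF-8 by default.

```python
from urllib.parse import quote

PREFIX = "http://127.0.0.1:8123/uri/"  # default local node, as above


def child_url(dircap, *names, prefix=PREFIX):
    """Build /uri/$DIRCAP/[SUBDIRS../]NAME with UTF-8 percent-escaping."""
    # Each component is escaped on its own, so a "/" inside a name can
    # never be mistaken for a path separator.
    parts = [quote(dircap, safe="")] + [quote(n, safe="") for n in names]
    return prefix + "/".join(parts)


if __name__ == "__main__":
    cap = ("URI:DIR2:djrdkfawoqihigoett4g6auz6a:"
           "jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq")
    print(child_url(cap, "r\u00e9sum\u00e9.doc"))
    print(child_url(cap, "tahoe-source", "docs", "webapi.txt"))
```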
Also note that the filenames inside upload POST forms are interpreted using
whatever character set was provided in the conventional '_charset' field,
defaulting to UTF-8 if not otherwise specified. The JSON representation of each
178 directory contains native unicode strings. Tahoe directories are specified to
179 contain unicode filenames, and cannot contain binary strings that are not
180 representable as such.
182 All Tahoe operations that refer to existing files or directories must include
183 a suitable read- or write- cap in the URL: the webapi server won't add one
184 for you. If you don't know the cap, you can't access the file. This allows
the security properties of Tahoe caps to be extended across the webapi
interface.
188 == Slow Operations, Progress, and Cancelling ==
190 Certain operations can be expected to take a long time. The "t=deep-check",
191 described below, will recursively visit every file and directory reachable
192 from a given starting point, which can take minutes or even hours for
193 extremely large directory structures. A single long-running HTTP request is a
194 fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient
195 with waiting and give up on the connection.
197 For this reason, long-running operations have an "operation handle", which
198 can be used to poll for status/progress messages while the operation
199 proceeds. This handle can also be used to cancel the operation. These handles
are created by the client, and passed in as an "ophandle=" query argument
201 to the POST or PUT request which starts the operation. The following
202 operations can then be used to retrieve status:
204 GET /operations/$HANDLE?output=HTML (with or without t=status)
205 GET /operations/$HANDLE?output=JSON (same)
207 These two retrieve the current status of the given operation. Each operation
presents a different sort of information, but in general the page retrieved
will indicate:
211 * whether the operation is complete, or if it is still running
212 * how much of the operation is complete, and how much is left, if possible
214 The HTML form will include a meta-refresh tag, which will cause a regular
215 web browser to reload the status page about 60 seconds later. This tag will
216 be removed once the operation has completed.
218 There may be more status information available under
219 /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space.
221 POST /operations/$HANDLE?t=cancel
223 This terminates the operation, and returns an HTML page explaining what was
224 cancelled. If the operation handle has already expired (see below), this
225 POST will return a 404, which indicates that the operation is no longer
226 running (either it was completed or terminated). The response body will be
227 the same as a GET /operations/$HANDLE on this operation handle, and the
228 handle will be expired immediately afterwards.
230 The operation handle will eventually expire, to avoid consuming an unbounded
231 amount of memory. The handle's time-to-live can be reset at any time, by
232 passing a retain-for= argument (with a count of seconds) to either the
233 initial POST that starts the operation, or the subsequent GET request which
234 asks about the operation. For example, if a 'GET
235 /operations/$HANDLE?output=JSON&retain-for=600' query is performed, the
handle will remain active for 600 seconds (10 minutes) after the GET was
received.
239 In addition, if the GET includes a release-after-complete=True argument, and
the operation has completed, the operation handle will be released
immediately.
243 If a retain-for= argument is not used, the default handle lifetimes are:
245 * handles will remain valid at least until their operation finishes
246 * uncollected handles for finished operations (i.e. handles for operations
247 which have finished but for which the GET page has not been accessed since
248 completion) will remain valid for one hour, or for the total time consumed
249 by the operation, whichever is greater.
250 * collected handles (i.e. the GET page has been retrieved at least once
251 since the operation completed) will remain valid for ten minutes.
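A client might poll an operation handle along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, and the name of the JSON key that reports completion ("finished" here) is an assumption, since this document only says that the status indicates whether the operation is complete. The fetch function is injectable so the loop can be exercised without a running node.

```python
import json
import time
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE = "http://127.0.0.1:8123"  # default local node (assumed to be running)


def status_url(handle, retain_for=None, base=BASE):
    """Build GET /operations/$HANDLE?output=JSON[&retain-for=SECONDS]."""
    args = {"output": "JSON"}
    if retain_for is not None:
        args["retain-for"] = str(retain_for)
    return "%s/operations/%s?%s" % (base, quote(handle, safe=""), urlencode(args))


def fetch_json(url):
    """Fetch a URL from the node and decode the JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for(handle, fetch=fetch_json, interval=5.0, done_key="finished"):
    """Poll until the status reports completion, then return it.

    done_key is an ASSUMED key name; adjust it to what your node returns.
    Each poll resets the handle's time-to-live to ten minutes.
    """
    while True:
        status = fetch(status_url(handle, retain_for=600))
        if status.get(done_key):
            return status
        time.sleep(interval)
```

Because the handle is chosen by the client, the same string passed as "ophandle=" when starting the operation is the one passed to `wait_for`.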
254 == Programmatic Operations ==
256 Now that we know how to build URLs that refer to files and directories in a
257 Tahoe virtual filesystem, what sorts of operations can we do with those URLs?
258 This section contains a catalog of GET, PUT, DELETE, and POST operations that
can be performed on these URLs. This set of operations is aimed at programs
260 that use HTTP to communicate with a Tahoe node. The next section describes
261 operations that are intended for web browsers.
263 === Reading A File ===
266 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
268 This will retrieve the contents of the given file. The HTTP response body
269 will contain the sequence of bytes that make up the file.
271 To view files in a web browser, you may want more control over the
Content-Type and Content-Disposition headers. Please see the next section,
"Browser Operations", for details on how to modify these URLs for that
purpose.
276 === Writing/Uploading A File ===
279 PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME
281 Upload a file, using the data from the HTTP request body, and add whatever
282 child links and subdirectories are necessary to make the file available at
283 the given location. Once this operation succeeds, a GET on the same URL will
284 retrieve the same contents that were just uploaded. This will create any
285 necessary intermediate subdirectories.
To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
289 In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
writable mutable file, that file's contents will be overwritten in-place. If
it is a read-cap for a mutable file, an error will occur. If it is an
immutable file, the old file will be discarded, and a new one will be put in
its place.
295 When creating a new file, if "mutable=true" is in the query arguments, the
296 operation will create a mutable file instead of an immutable one.
298 This returns the file-cap of the resulting file. If a new file was created
by this method, the HTTP response code (as dictated by RFC 2616) will be set
300 to 201 CREATED. If an existing file was replaced or modified, the response
303 Note that the 'curl -T localfile http://127.0.0.1:8123/uri/$DIRCAP/foo.txt'
304 command can be used to invoke this operation.
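The same upload can be driven from a program. The following sketch uses Python's urllib; the function names are hypothetical, and it assumes a node is listening on the default port when `upload` is actually called.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE = "http://127.0.0.1:8123"  # default local node (assumed to be running)


def put_request(dircap, path, data, mutable=False, base=BASE):
    """Build a PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME request.

    path is a list of names, e.g. ["docs", "foo.txt"]; data is bytes.
    """
    parts = [quote(dircap, safe="")] + [quote(p, safe="") for p in path]
    url = base + "/uri/" + "/".join(parts)
    if mutable:
        url += "?mutable=true"  # only matters when a new file is created
    return Request(url, data=data, method="PUT")


def upload(request):
    """Send the request; return (HTTP status, file-cap from the body).

    The status is 201 when a new file was created.
    """
    with urlopen(request) as resp:
        return resp.status, resp.read().decode("ascii").strip()
```

For example, `upload(put_request(dircap, ["foo.txt"], b"hello"))` corresponds to the curl command above, with `dircap` holding a directory write-cap.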
308 This uploads a file, and produces a file-cap for the contents, but does not
309 attach the file into the virtual drive. No directories will be modified by
310 this operation. The file-cap is returned as the body of the HTTP response.
312 If "mutable=true" is in the query arguments, the operation will create a
mutable file, and return its write-cap in the HTTP response. The default is
314 to create an immutable file, returning the read-cap as a response.
316 === Creating A New Directory ===
321 Create a new empty directory and return its write-cap as the HTTP response
322 body. This does not make the newly created directory visible from the
323 virtual drive. The "PUT" operation is provided for backwards compatibility:
324 new code should use POST.
326 POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
327 PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
329 Create new directories as necessary to make sure that the named target
330 ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
331 intermediate directories as necessary. If the named target directory already
332 exists, this will make no changes to it.
334 This will return an error if a blocking file is present at any of the parent
335 names, preventing the server from creating the necessary parent directory.
The write-cap of the new directory will be returned as the HTTP response
body.
340 POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME
342 Create a new empty directory and attach it to the given existing directory.
343 This will create additional intermediate directories as necessary.
The URL of this form points to the parent of the bottom-most new directory,
whereas the previous form has a URL that points directly to the bottom-most
new directory.
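The difference between the two attached-mkdir forms is easiest to see side by side. This sketch only builds the URLs; `mkdir_urls` is a hypothetical helper, and both URLs would be sent with POST.

```python
from urllib.parse import quote, urlencode

BASE = "http://127.0.0.1:8123"  # default local node, as above


def mkdir_urls(dircap, subdirs, name, base=BASE):
    """Return (direct, via_parent) URLs that create the same directory.

    direct:     /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir
    via_parent: /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME
    """
    root = base + "/uri/" + quote(dircap, safe="")
    parent = "".join("/" + quote(s, safe="") for s in subdirs)
    direct = root + parent + "/" + quote(name, safe="") + "?t=mkdir"
    query = urlencode({"t": "mkdir", "name": name}, quote_via=quote)
    via_parent = root + parent + "/?" + query
    return direct, via_parent
```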
349 === Get Information About A File Or Directory (as JSON) ===
351 GET /uri/$FILECAP?t=json
352 GET /uri/$DIRCAP?t=json
353 GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json
354 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json
356 This returns a machine-parseable JSON-encoded description of the given
357 object. The JSON always contains a list, and the first element of the list
358 is always a flag that indicates whether the referenced object is a file or a
directory. If it is a file, then the information includes file size and URI,
like this:
362 GET /uri/$FILECAP?t=json :
363 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json :
365 [ "filenode", { "ro_uri": file_uri,
368 "metadata": {"ctime": 1202777696.7564139,
369 "mtime": 1202777696.7564139
373 If it is a directory, then it includes information about the children of
374 this directory, as a mapping from child name to a set of data about the
375 child (the same data that would appear in a corresponding GET?t=json of the
376 child itself). The child entries also include metadata about each child,
including creation- and modification- timestamps. The output looks like
this:
380 GET /uri/$DIRCAP?t=json :
381 GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :
383 [ "dirnode", { "rw_uri": read_write_uri,
384 "ro_uri": read_only_uri,
387 "foo.txt": [ "filenode", { "ro_uri": uri,
390 "ctime": 1202777696.7564139,
391 "mtime": 1202777696.7564139
394 "subdir": [ "dirnode", { "rw_uri": rwuri,
397 "ctime": 1202778102.7589991,
398 "mtime": 1202778111.2160511,
403 In the above example, note how 'children' is a dictionary in which the keys
404 are child names and the values depend upon whether the child is a file or a
405 directory. The value is mostly the same as the JSON representation of the
406 child object (except that directories do not recurse -- the "children"
407 entry of the child is omitted, and the directory view includes the metadata
408 that is stored on the directory edge).
The rw_uri field will be present in the information about a directory
if and only if you have read-write access to that directory.
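A program consuming the t=json output might walk it as sketched below. The sample document follows the dirnode layout described above, but every cap in it is a made-up placeholder, and `summarize` is a hypothetical helper name.

```python
import json

# Placeholder data in the documented shape; the caps are not real.
SAMPLE = """
["dirnode", {"rw_uri": "URI:DIR2:rw-placeholder",
             "ro_uri": "URI:DIR2-RO:ro-placeholder",
             "children": {
               "foo.txt": ["filenode", {"ro_uri": "URI:CHK:placeholder",
                                        "metadata": {"ctime": 1202777696.7564139,
                                                     "mtime": 1202777696.7564139}}],
               "subdir": ["dirnode", {"rw_uri": "URI:DIR2:child-placeholder",
                                      "metadata": {"ctime": 1202778102.7589991,
                                                   "mtime": 1202778111.2160511}}]
             }}]
"""


def summarize(text):
    """Return (node type, writable?, {child name: child node type})."""
    nodetype, info = json.loads(text)
    # rw_uri is present only when the caller has read-write access.
    writable = "rw_uri" in info
    children = {name: child[0]
                for name, child in info.get("children", {}).items()}
    return nodetype, writable, children


if __name__ == "__main__":
    print(summarize(SAMPLE))
```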
414 === Attaching an existing File or Directory by its read- or write- cap ===
416 PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
418 This attaches a child object (either a file or directory) to a specified
419 location in the virtual filesystem. The child object is referenced by its
420 read- or write- cap, as provided in the HTTP request body. This will create
421 intermediate directories as necessary.
423 This is similar to a UNIX hardlink: by referencing a previously-uploaded
424 file (or previously-created directory) instead of uploading/creating a new
425 one, you can create two references to the same object.
427 The read- or write- cap of the child is provided in the body of the HTTP
428 request, and this same cap is returned in the response body.
430 The default behavior is to overwrite any existing object at the same
431 location. To prevent this (and make the operation return an error instead of
432 overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
433 With replace=false, this operation will return an HTTP 409 "Conflict" error
434 if there is already an object at the given location, rather than overwriting
the existing object. Note that "true", "t", and "1" are all synonyms for
"True", and "false", "f", and "0" are synonyms for "False". The parameter is
case-insensitive.
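The following sketch shows how a client might attach an existing cap without clobbering an existing child, treating the 409 "Conflict" response as "name already taken". The helper names are hypothetical, and `attach` assumes a node is running on the default port.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE = "http://127.0.0.1:8123"  # default local node (assumed to be running)


def attach_request(dircap, childname, childcap, replace=True, base=BASE):
    """Build PUT /uri/$DIRCAP/CHILDNAME?t=uri with the cap as the body."""
    url = "%s/uri/%s/%s?t=uri" % (
        base, quote(dircap, safe=""), quote(childname, safe=""))
    if not replace:
        url += "&replace=false"
    return Request(url, data=childcap.encode("ascii"), method="PUT")


def attach(request):
    """Send the request; return the echoed cap, or None on 409 Conflict."""
    try:
        with urlopen(request) as resp:
            return resp.read().decode("ascii").strip()
    except HTTPError as e:
        if e.code == 409:
            return None  # an object already exists at that location
        raise
```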
439 === Deleting a File or Directory ===
441 DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME
443 This removes the given name from its parent directory. CHILDNAME is the
name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will
be modified.
447 Note that this does not actually delete the file or directory that the name
448 points to from the tahoe grid -- it only removes the named reference from
449 this directory. If there are other names in this directory or in other
450 directories that point to the resource, then it will remain accessible
451 through those paths. Even if all names pointing to this object are removed
from their parent directories, someone with possession of its read-cap
453 can continue to access the object through that cap.
455 The object will only become completely unreachable once 1: there are no
456 reachable directories that reference it, and 2: nobody is holding a read-
457 or write- cap to the object. (This behavior is very similar to the way
458 hardlinks and anonymous files work in traditional unix filesystems).
460 This operation will not modify more than a single directory. Intermediate
461 directories which were implicitly created by PUT or POST methods will *not*
462 be automatically removed by DELETE.
This method returns the file- or directory- cap of the object that was just
removed.
467 == Browser Operations ==
469 This section describes the HTTP operations that provide support for humans
470 running a web browser. Most of these operations use HTML forms that use POST
471 to drive the Tahoe node.
473 Note that for all POST operations, the arguments listed can be provided
474 either as URL query arguments or as form body fields. URL query arguments are
475 separated from the main URL by "?", and from each other by "&". For example,
476 "POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually
477 specified by using <input type="hidden"> elements. For clarity, the
478 descriptions below display the most significant arguments as URL query args.
480 === Viewing A Directory (as HTML) ===
482 GET /uri/$DIRCAP/[SUBDIRS../]
484 This returns an HTML page, intended to be displayed to a human by a web
485 browser, which contains HREF links to all files and directories reachable
486 from this directory. These HREF links do not have a t= argument, meaning
487 that a human who follows them will get pages also meant for a human. It also
488 contains forms to upload new files, and to delete files and directories.
489 Those forms use POST methods to do their job.
491 === Viewing/Downloading a File ===
494 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME
496 This will retrieve the contents of the given file. The HTTP response body
497 will contain the sequence of bytes that make up the file.
499 If you want the HTTP response to include a useful Content-Type header,
500 either use the second form (which starts with a $DIRCAP), or add a
501 "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
502 The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information
503 to determine a Content-Type (since Tahoe immutable files are merely
504 sequences of bytes, not typed+named file objects).
506 If the URL has both filename= and "save=true" in the query arguments, then
the server will add a "Content-Disposition: attachment" header, along with a
508 filename= parameter. When a user clicks on such a link, most browsers will
509 offer to let the user save the file instead of displaying it inline (indeed,
510 most browsers will refuse to display it inline). "true", "t", "1", and other
511 case-insensitive equivalents are all treated the same.
513 Character-set handling in URLs and HTTP headers is a dubious art[1]. For
514 maximum compatibility, Tahoe simply copies the bytes from the filename=
515 argument into the Content-Disposition header's filename= parameter, without
516 trying to interpret them in any particular way.
519 GET /named/$FILECAP/FILENAME
521 This is an alternate download form which makes it easier to get the correct
522 filename. The Tahoe server will provide the contents of the given file, with
523 a Content-Type header derived from the given filename. This form is used to
524 get browsers to use the "Save Link As" feature correctly, and also helps
525 command-line tools like "wget" and "curl" use the right filename. Note that
526 this form can *only* be used with file caps; it is an error to use a
527 directory cap after the /named/ prefix.
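The download URL variants described in this section can be generated as sketched below. Both helper names are hypothetical; the code only builds URLs and does not contact a node.

```python
from urllib.parse import quote, urlencode

BASE = "http://127.0.0.1:8123"  # default local node, as above


def download_url(filecap, filename=None, save=False, base=BASE):
    """Build GET /uri/$FILECAP[?filename=NAME[&save=true]]."""
    url = base + "/uri/" + quote(filecap, safe="")
    args = {}
    if filename is not None:
        args["filename"] = filename
        if save:
            # save=true is only described in combination with filename=
            args["save"] = "true"
    if args:
        url += "?" + urlencode(args, quote_via=quote)
    return url


def named_url(filecap, filename, base=BASE):
    """Build GET /named/$FILECAP/FILENAME (file caps only)."""
    return "%s/named/%s/%s" % (
        base, quote(filecap, safe=""), quote(filename, safe=""))
```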
529 === Get Information About A File Or Directory (as HTML) ===
531 GET /uri/$FILECAP?t=info
532 GET /uri/$DIRCAP/?t=info
533 GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info
534 GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info
536 This returns a human-oriented HTML page with more detail about the selected
537 file or directory object. This page contains the following items:
542 raw contents (text/plain)
543 access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
544 check/verify/repair form
545 deep-check/deep-size/deep-stats/manifest (for directories)
replace-contents form (for mutable files)
548 === Creating a Directory ===
This creates a new directory, but does not attach it to the virtual
filesystem.
555 If a "redirect_to_result=true" argument is provided, then the HTTP response
556 will cause the web browser to be redirected to a /uri/$DIRCAP page that
557 gives access to the newly-created directory. If you bookmark this page,
558 you'll be able to get back to the directory again in the future. This is the
559 recommended way to start working with a Tahoe server: create a new unlinked
560 directory (using redirect_to_result=true), then bookmark the resulting
561 /uri/$DIRCAP page. There is a "Create Directory" button on the Welcome page
562 to invoke this action.
564 If "redirect_to_result=true" is not provided (or is given a value of
"false"), then the HTTP response body will simply be the write-cap of the
new directory.
568 POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME
570 This creates a new directory as a child of the designated SUBDIR. This will
571 create additional intermediate directories as necessary.
573 If a "when_done=URL" argument is provided, the HTTP response will cause the
574 web browser to redirect to the given URL. This provides a convenient way to
575 return the browser to the directory that was just modified. Without a
576 when_done= argument, the HTTP response will simply contain the write-cap of
577 the directory that was just created.
580 === Uploading a File ===
584 This uploads a file, and produces a file-cap for the contents, but does not
attach the file into the virtual drive. No directories will be modified by
this operation.
588 The file must be provided as the "file" field of an HTML encoded form body,
589 produced in response to an HTML form like this:
 <form action="/uri" method="POST" enctype="multipart/form-data">
  <input type="hidden" name="t" value="upload" />
  <input type="file" name="file" />
  <input type="submit" value="Upload Unlinked" />
 </form>
596 If a "when_done=URL" argument is provided, the response body will cause the
597 browser to redirect to the given URL. If the when_done= URL has the string
598 "%(uri)s" in it, that string will be replaced by a URL-escaped form of the
599 newly created file-cap. (Note that without this substitution, there is no
600 way to access the file that was just uploaded).
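The "%(uri)s" substitution can be pictured with the short sketch below. It illustrates the behaviour described above and is not the node's actual code: the function name is hypothetical, the example when_done value is invented, and the exact set of characters the node escapes is an assumption.

```python
from urllib.parse import quote


def expand_when_done(when_done, filecap):
    """Replace the literal "%(uri)s" with the URL-escaped file-cap."""
    # A plain string replace is used, so other "%" sequences in the
    # when_done URL are left untouched.
    return when_done.replace("%(uri)s", quote(filecap, safe=""))


if __name__ == "__main__":
    # Hypothetical when_done value that sends the browser to the new file.
    print(expand_when_done("/uri/%(uri)s?t=info", "URI:CHK:a:b:3:10:202"))
```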
602 The default (in the absence of when_done=) is to return an HTML page that
603 describes the results of the upload. This page will contain information
604 about which storage servers were used for the upload, how long each
607 If a "mutable=true" argument is provided, the operation will create a
608 mutable file, and the response body will contain the write-cap instead of
609 the upload results page. The default is to create an immutable file,
610 returning the upload results page as a response.
613 POST /uri/$DIRCAP/[SUBDIRS../]?t=upload
615 This uploads a file, and attaches it as a new child of the given directory.
616 The file must be provided as the "file" field of an HTML encoded form body,
617 produced in response to an HTML form like this:
 <form action="." method="POST" enctype="multipart/form-data">
  <input type="hidden" name="t" value="upload" />
  <input type="file" name="file" />
  <input type="submit" value="Upload" />
 </form>
624 A "name=" argument can be provided to specify the new child's name,
625 otherwise it will be taken from the "filename" field of the upload form
626 (most web browsers will copy the last component of the original file's
627 pathname into this field). To avoid confusion, name= is not allowed to
630 If there is already a child with that name, and it is a mutable file, then
631 its contents are replaced with the data being uploaded. If it is not a
632 mutable file, the default behavior is to remove the existing child before
633 creating a new one. To prevent this (and make the operation return an error
634 instead of overwriting the old child), add a "replace=false" argument, as
635 "?t=upload&replace=false". With replace=false, this operation will return an
636 HTTP 409 "Conflict" error if there is already an object at the given
637 location, rather than overwriting the existing object. Note that "true",
638 "t", and "1" are all synonyms for "True", and "false", "f", and "0" are
synonyms for "False". The parameter is case-insensitive.
641 This will create additional intermediate directories as necessary, although
642 since it is expected to be triggered by a form that was retrieved by "GET
/uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
already exist.
646 If a "mutable=true" argument is provided, any new file that is created will
647 be a mutable file instead of an immutable one. <input type="checkbox"
648 name="mutable" /> will give the user a way to set this option.
650 If a "when_done=URL" argument is provided, the HTTP response will cause the
651 web browser to redirect to the given URL. This provides a convenient way to
652 return the browser to the directory that was just modified. Without a
653 when_done= argument, the HTTP response will simply contain the file-cap of
654 the file that was just uploaded (a write-cap for mutable files, or a
655 read-cap for immutable files).
657 POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload
659 This also uploads a file and attaches it as a new child of the given
660 directory. It is a slight variant of the previous operation, as the URL
661 refers to the target file rather than the parent directory. It is otherwise
662 identical: this accepts mutable= and when_done= arguments too.
664 POST /uri/$FILECAP?t=upload
666 === Attaching An Existing File Or Directory (by URI) ===
668 POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP
670 This attaches a given read- or write- cap "CHILDCAP" to the designated
671 directory, with a specified child name. This behaves much like the PUT t=uri
672 operation, and is a lot like a UNIX hardlink.
674 This will create additional intermediate directories as necessary, although
675 since it is expected to be triggered by a form that was retrieved by "GET
/uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
already exist.
679 === Deleting A Child ===
681 POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME
683 This instructs the node to delete a child object (file or subdirectory) from
684 the given directory. Note that the entire subtree is removed. This is
685 somewhat like "rm -rf" (from the point of view of the parent), but other
686 references into the subtree will see that the child subdirectories are not
modified by this operation. Only the link from the given directory to its
child is severed.
690 === Renaming A Child ===
692 POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW
694 This instructs the node to rename a child of the given directory. This is
695 exactly the same as removing the child, then adding the same child-cap under
696 the new name. This operation cannot move the child to a different directory.
698 This operation will replace any existing child of the new name, making it
699 behave like the UNIX "mv -f" command.
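Since POST arguments may be supplied as form body fields instead of query arguments, a rename can be issued the way a browser form would send it. In this sketch `rename_request` is a hypothetical helper; it builds the request without sending it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://127.0.0.1:8123"  # default local node, as above


def rename_request(dircap, old, new, base=BASE):
    """Build POST /uri/$DIRCAP/ with t=rename sent as form body fields."""
    url = "%s/uri/%s/" % (base, quote(dircap, safe=""))
    # urlencode() percent-escapes non-ASCII names as UTF-8, so the
    # encoded body is always plain ASCII.
    body = urlencode({"t": "rename", "from_name": old, "to_name": new})
    return Request(url, data=body.encode("ascii"), method="POST")
```

Passing the result to `urllib.request.urlopen` would send it with the default form content type, assuming a node is running at BASE.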
701 === Other Utilities ===
705 This causes a redirect to /uri/$CAP, and retains any additional query
706 arguments (like filename= or save=). This is for the convenience of web
707 forms which allow the user to paste in a read- or write- cap (obtained
708 through some out-of-band channel, like IM or email).
710 Note that this form merely redirects to the specific file or directory
711 indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
712 traverse to children by appending additional path segments to the URL.
714 GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME
716 This provides a useful facility to browser-based user interfaces. It
717 returns a page containing a form targeting the "POST $DIRCAP t=rename"
718 functionality described above, with the provided $CHILDNAME present in the
719 'from_name' field of that form. I.e. this presents a form offering to
720 rename $CHILDNAME, requesting the new name, and submitting POST rename.
722 GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri
724 This returns the file- or directory- cap for the specified object.
726 GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri
728 This returns a read-only file- or directory- cap for the specified object.
729 If the object is an immutable file, this will return the same value as t=uri.
732 === Debugging and Testing Features ===
734 These URLs are less likely to be helpful to the casual Tahoe user, and are
735 mainly intended for developers.
739 This triggers the FileChecker to determine the current "health" of the
740 given file or directory, by counting how many shares are available. The
741 page that is returned will display the results. This can be used as a "show
742 me detailed information about this file" page.
744 If a verify=true argument is provided, the node will perform a more
745 intensive check, downloading and verifying every single bit of every share.
747 If an output=JSON argument is provided, the response will be
748 machine-readable JSON instead of human-oriented HTML. The data is a
749 dictionary with the following keys:
751 storage-index: a base32-encoded string with the object's storage index,
752 or an empty string for LIT files
753 summary: a string, with a one-line summary of the stats of the file
754 results: a dictionary that describes the state of the file. For LIT files,
755 this dictionary has only the 'healthy' key, which will always be
756 True. For distributed files, this dictionary has the following keys:
758 count-shares-good: the number of good shares that were found
759 count-shares-needed: 'k', the number of shares required for recovery
760 count-shares-expected: 'N', the number of total shares generated
761 count-good-share-hosts: the number of distinct storage servers with
762 good shares. If this number is less than
763 count-shares-good, then some shares are doubled
764 up, increasing the correlation of failures. This
765 indicates that one or more shares should be
766 moved to an otherwise unused server, if one is available.
768 count-wrong-shares: for mutable files, the number of shares for
769 versions other than the 'best' one (highest
770 sequence number, highest roothash). These are
772 count-recoverable-versions: for mutable files, the number of
773 recoverable versions of the file. For
774 a healthy file, this will equal 1.
775 count-unrecoverable-versions: for mutable files, the number of
776 unrecoverable versions of the file.
777 For a healthy file, this will be 0.
778 count-corrupt-shares: the number of shares with integrity failures
779 list-corrupt-shares: a list of "share locators", one for each share
780 that was found to be corrupt. Each share locator
781 is a list of (serverid, storage_index, sharenum).
782 needs-rebalancing: (bool) True if there are multiple shares on a single
783 storage server, indicating a reduction in reliability
784 that could be resolved by moving shares to new servers.
786 servers-responding: list of base32-encoded storage server identifiers,
787 one for each server which responded to the share query.
789 healthy: (bool) True if the file is completely healthy, False otherwise.
790 Healthy files have at least N good shares. Overlapping shares
791 (indicated by count-good-share-hosts < count-shares-good) do not
792 currently cause a file to be marked unhealthy. If there are at
793 least N good shares, then corrupt shares do not cause the file to
794 be marked unhealthy, although the corrupt shares will be listed
795 in the results (list-corrupt-shares) and should be manually
796 removed to avoid wasting time in subsequent downloads (as the
797 downloader rediscovers the corruption and uses alternate shares).
798 sharemap: dict mapping share identifier to list of serverids
799 (base32-encoded strings). This indicates which servers are
800 holding which shares. For immutable files, the shareid is
801 an integer (the share number, from 0 to N-1). For
802 mutable files, it is a string of the form
803 'seq%d-%s-sh%d', containing the sequence number, the
804 roothash, and the share number.
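
A client that consumes the output=JSON form of t=check might condense it as
shown below. This is a hedged sketch based only on the keys described above:
the helper names parse_shareid and summarize_check, and the shape of the
summary they return, are assumptions made for illustration.

```python
import re

# Mutable-file share identifiers have the form 'seq%d-%s-sh%d'.
_MUTABLE_SHAREID = re.compile(r"^seq(\d+)-(.+)-sh(\d+)$")


def parse_shareid(shareid):
    """Return (seqnum, roothash, sharenum).

    seqnum and roothash are None for immutable files, whose share
    identifier is just the share number.
    """
    text = str(shareid)          # JSON object keys always arrive as strings
    if text.isdigit():
        return (None, None, int(text))
    match = _MUTABLE_SHAREID.match(text)
    if match is None:
        raise ValueError("unrecognized share identifier: %r" % (shareid,))
    return (int(match.group(1)), match.group(2), int(match.group(3)))


def summarize_check(data):
    """Condense a decoded t=check&output=JSON dictionary."""
    results = data["results"]
    if set(results) == {"healthy"}:
        # LIT files carry only the 'healthy' key, which is always True.
        return {"kind": "LIT", "healthy": True, "doubled_up": False, "corrupt": 0}
    return {
        "kind": "distributed",
        "healthy": results["healthy"],
        # Fewer hosts than good shares means some shares share a server.
        "doubled_up": results["count-good-share-hosts"] < results["count-shares-good"],
        "corrupt": len(results["list-corrupt-shares"]),
    }
```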
806 POST $URL?t=start-deep-check (must add &ophandle=XYZ)
808 This initiates a recursive walk of all files and directories reachable from
809 the target, performing a check on each one just like t=check. The result
810 page will contain a summary of the results, including details on any
811 file/directory that was not fully healthy.
813 t=start-deep-check can only be invoked on a directory. An error (400
814 BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
815 walker will deal with loops safely.
817 This accepts the same verify= argument as t=check.
819 Since this operation can take a long time (perhaps a second per object),
820 the ophandle= argument is required (see "Slow Operations, Progress, and
821 Cancelling" above). The response to this POST will be a redirect to the
822 corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
823 match the output= argument given to the POST). The deep-check operation
824 will continue to run in the background, and the /operations page should be
825 used to find out when the operation is done.
827 Detailed checker results for non-healthy files and directories will be
828 available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will
829 contain links to these detailed results.
831 The HTML /operations/$HANDLE page for incomplete operations will contain a
832 meta-refresh tag, set to 60 seconds, so that a browser which uses
833 deep-check will automatically poll until the operation has completed.
835 The JSON page (/operations/$HANDLE?output=JSON) will contain a
836 machine-readable JSON dictionary with the following keys:
838 finished: a boolean, True if the operation is complete, else False. Some
839 of the remaining keys may not be present until the operation is complete.
841 root-storage-index: a base32-encoded string with the storage index of the
842 starting point of the deep-check operation
843 count-objects-checked: count of how many objects were checked. Note that
844 non-distributed objects (i.e. small immutable LIT
845 files) are not checked, since for these objects,
846 the data is contained entirely in the URI.
847 count-objects-healthy: how many of those objects were completely healthy
848 count-objects-unhealthy: how many were damaged in some way
849 count-corrupt-shares: how many shares were found to have corruption,
850 summed over all objects examined
851 list-corrupt-shares: a list of "share identifiers", one for each share
852 that was found to be corrupt. Each share identifier
853 is a list of (serverid, storage_index, sharenum).
854 list-unhealthy-files: a list of (pathname, check-results) tuples, for
855 each file that was not fully healthy. 'pathname' is
856 a list of strings (which can be joined by "/"
857 characters to turn it into a single string),
858 relative to the directory on which deep-check was
859 invoked. The 'check-results' field is the same as
860 that returned by t=check&output=JSON, described above.
862 stats: a dictionary with the same keys as the t=deep-stats command
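
A program that starts a deep-check can poll the JSON form of the operations
page until 'finished' becomes true. The sketch below assumes a local node at
a caller-supplied base URL; the names fetch_json and wait_for_operation, the
max_polls limit and the injectable fetch/sleep arguments are illustrative
choices, not part of the webapi. The 60 second default mirrors the
meta-refresh interval of the HTML page.

```python
import json
import time
from urllib.parse import quote
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_operation(base, handle, fetch=fetch_json, interval=60.0,
                       sleep=time.sleep, max_polls=None):
    """Poll /operations/$HANDLE?output=JSON until 'finished' is true."""
    url = "%s/operations/%s?output=JSON" % (base.rstrip("/"), quote(handle, safe=""))
    polls = 0
    while True:
        status = fetch(url)
        polls += 1
        if status.get("finished"):
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("operation %r still running after %d polls"
                               % (handle, polls))
        sleep(interval)   # the operation keeps running in the background
```

Passing fetch and sleep as arguments keeps the loop testable without a
running node.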
865 POST $URL?t=check&repair=true
867 This performs a health check of the given file or directory, and if the
868 checker determines that the object is not healthy (some shares are missing
869 or corrupted), it will perform a "repair". During repair, any missing
870 shares will be regenerated and uploaded to new servers.
872 This accepts the same verify=true argument as t=check. When an output=JSON
873 argument is provided, the machine-readable JSON response will contain the following keys:
876 storage-index: a base32-encoded string with the object's storage index,
877 or an empty string for LIT files
878 repair-attempted: (bool) True if repair was attempted
879 repair-successful: (bool) True if repair was attempted and the file was
880 fully healthy afterwards. False if no repair was
881 attempted, or if a repair attempt failed.
882 pre-repair-results: a dictionary that describes the state of the file
883 before any repair was performed. This contains exactly
884 the same keys as the 'results' value of the t=check
885 response, described above.
886 post-repair-results: a dictionary that describes the state of the file
887 after any repair was performed. If no repair was
888 performed, post-repair-results and pre-repair-results
889 will be the same. This contains exactly the same keys
890 as the 'results' value of the t=check response, described above.
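
The repair-attempted and repair-successful flags combine into three possible
outcomes. A small, hypothetical helper (repair_outcome is not a Tahoe
function) makes the cases explicit:

```python
def repair_outcome(data):
    """Classify a decoded t=check&repair=true&output=JSON dictionary.

    Returns (outcome, healthy_after), where healthy_after is taken from
    the 'healthy' key of post-repair-results.
    """
    healthy_after = data["post-repair-results"]["healthy"]
    if not data["repair-attempted"]:
        # No repair attempted: pre- and post-repair results are the same.
        return ("no repair attempted", healthy_after)
    if data["repair-successful"]:
        return ("repaired", healthy_after)
    return ("repair failed", healthy_after)
```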
893 POST $URL?t=start-deep-check&repair=true (must add &ophandle=XYZ)
895 This triggers a recursive walk of all files and directories, performing a
896 t=check&repair=true on each one.
898 Like t=start-deep-check without the repair= argument, this can only be
899 invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
900 is invoked on a file. The recursive walker will deal with loops safely.
902 This accepts the same verify=true argument as t=start-deep-check. It uses
903 the same ophandle= mechanism as start-deep-check. When an output=JSON
904 argument is provided, the response will contain the following keys:
906 finished: (bool) True if the operation has completed, else False
907 root-storage-index: a base32-encoded string with the storage index of the
908 starting point of the deep-check operation
909 count-objects-checked: count of how many objects were checked
911 count-objects-healthy-pre-repair: how many of those objects were completely
912 healthy, before any repair
913 count-objects-unhealthy-pre-repair: how many were damaged in some way
914 count-objects-healthy-post-repair: how many of those objects were completely
915 healthy, after any repair
916 count-objects-unhealthy-post-repair: how many were damaged in some way
918 count-repairs-attempted: repairs were attempted on this many objects.
919 count-repairs-successful: how many repairs resulted in healthy objects
920 count-repairs-unsuccessful: how many repairs did not result in
921 completely healthy objects
922 count-corrupt-shares-pre-repair: how many shares were found to have
923 corruption, summed over all objects
924 examined, before any repair
925 count-corrupt-shares-post-repair: how many shares were found to have
926 corruption, summed over all objects
927 examined, after any repair
928 list-corrupt-shares: a list of "share identifiers", one for each share
929 that was found to be corrupt (before any repair).
930 Each share identifier is a list of (serverid,
931 storage_index, sharenum).
932 list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares
933 that were successfully repaired are not
934 included. These are shares that need
935 manual processing. Since immutable shares
936 cannot be modified by clients, all corruption
937 in immutable shares will be listed here.
938 list-unhealthy-files: a list of (pathname, check-results) tuples, for
939 each file that was not fully healthy. 'pathname' is
940 relative to the directory on which deep-check was
941 invoked. The 'check-results' field is the same as
942 that returned by t=check&repair=true&output=JSON, described above.
944 stats: a dictionary with the same keys as the t=deep-stats command
947 POST $DIRURL?t=start-manifest (must add &ophandle=XYZ)
949 This operation generates a "manifest" of the given directory tree, mostly
950 for debugging. This is a table of (path, filecap/dircap), for every object
951 reachable from the starting directory. The path will be slash-joined, and
952 the filecap/dircap will contain a link to the object in question. This page
953 gives immediate access to every object in the virtual filesystem subtree.
955 This operation uses the same ophandle= mechanism as deep-check. The
956 corresponding /operations/$HANDLE page has three different forms. The
957 default is output=HTML.
959 If output=text is added to the query args, the results will be a text/plain
960 list. The first line is special: it is either "finished: yes" or "finished:
961 no"; if the operation is not finished, you must periodically reload the
962 page until it completes. The rest of the results are a plaintext list, with
963 one file/dir per line, slash-separated, with the filecap/dircap separated
966 If output=JSON is added to the query args, then the results will be a
967 JSON-formatted dictionary with five keys:
969 finished (bool): if False then you must reload the page until True
970 origin_si (base32 str): the storage index of the starting point
971 manifest: list of (path, cap) tuples, where path is a list of strings.
972 storage-index: list of (base32) storage index strings
973 stats: a dictionary with the same keys as the t=deep-stats command
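
Since each 'path' in the JSON manifest is a list of strings, a consumer will
usually join it with slashes before display. A minimal sketch follows; the
function name manifest_entries and the choice to raise an error on an
unfinished manifest are assumptions made for illustration.

```python
def manifest_entries(data):
    """Turn a decoded output=JSON manifest into (joined_path, cap) pairs."""
    if not data.get("finished"):
        # The caller is expected to reload the page until 'finished' is true.
        raise RuntimeError("manifest is not finished yet")
    return [("/".join(path), cap) for path, cap in data["manifest"]]
```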
976 POST $DIRURL?t=start-deep-size (must add &ophandle=XYZ)
978 This operation generates a number (in bytes) containing the sum of the
979 filesize of all directories and immutable files reachable from the given
980 directory. This is a rough lower bound of the total space consumed by this
981 subtree. It does not include space consumed by mutable files, nor does it
982 take expansion or encoding overhead into account. Later versions of the
983 code may improve this estimate upwards.
985 The /operations/$HANDLE status output consists of two lines of text:
990 POST $DIRURL?t=start-deep-stats (must add &ophandle=XYZ)
992 This operation performs a recursive walk of all files and directories
993 reachable from the given directory, and generates a collection of
994 statistics about those objects.
996 The result (obtained from the /operations/$OPHANDLE page) is a
997 JSON-serialized dictionary with the following keys (note that some of these
998 keys may be missing until 'finished' is True):
1000 finished: (bool) True if the operation has finished, else False
1001 count-immutable-files: count of how many CHK files are in the set
1002 count-mutable-files: same, for mutable files (does not include directories)
1003 count-literal-files: same, for LIT files (data contained inside the URI)
1004 count-files: sum of the above three
1005 count-directories: count of directories
1006 size-immutable-files: total bytes for all CHK files in the set, =deep-size
1007 size-mutable-files (TODO): same, for current version of all mutable files
1008 size-literal-files: same, for LIT files
1009 size-directories: size of directories (includes size-literal-files)
1010 size-files-histogram: list of (minsize, maxsize, count) buckets,
1011 with a histogram of filesizes, 5dB/bucket,
1012 for both literal and immutable files
1013 largest-directory: number of children in the largest directory
1014 largest-immutable-file: number of bytes in the largest CHK file
1016 size-mutable-files is not implemented, because it would require extra
1017 queries to each mutable file to get their size. This may be implemented in the future.
1020 Assuming no sharing, the basic space consumed by a single root directory is
1021 the sum of size-immutable-files, size-mutable-files, and size-directories.
1022 The actual disk space used by the shares is larger, because of the
1023 following sources of overhead:
1026 expansion due to erasure coding
1027 share management data (leases)
1028 backend (ext3) minimum block size
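
The arithmetic above can be sketched as follows. The function name
estimate_space and its k and n parameters are illustrative. The n/k factor is
an assumption: it models only the expansion due to erasure coding (n shares,
each roughly 1/k of the data) and ignores the other sources of overhead
listed above. A missing size-mutable-files key is treated as zero because
that value is not implemented.

```python
def estimate_space(stats, k, n):
    """Return (basic_bytes, bytes_after_erasure_coding) from deep-stats."""
    basic = (stats.get("size-immutable-files", 0)
             + stats.get("size-mutable-files", 0)   # not implemented: absent
             + stats.get("size-directories", 0))
    # Rough model: n shares are stored, each about 1/k the size of the data.
    expanded = basic * n / k
    return basic, expanded
```

For example, 3600 basic bytes encoded with k=3 and n=10 would occupy about
12000 bytes across all shares, before the remaining overheads.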
1030 == Other Useful Pages ==
1032 The portion of the web namespace that begins with "/uri" (and "/named") is
1033 dedicated to giving users (both humans and programs) access to the Tahoe
1034 virtual filesystem. The rest of the namespace provides status information
1035 about the state of the Tahoe node.
1037 GET / (the root page)
1039 This is the "Welcome Page", and contains a few distinct sections:
1041 Node information: library versions, local nodeid, services being provided.
1043 Filesystem Access Forms: create a new directory, view a file/directory by
1044 URI, upload a file (unlinked), download a file by
1047 Grid Status: introducer information, helper information, connected storage servers.
1052 This page lists all active uploads and downloads, and contains a short list
1053 of recent upload/download operations. Each operation has a link to a page
1054 that describes file sizes, servers that were involved, and the time consumed
1055 in each phase of the operation.
1057 A GET of /status/?t=json will contain a machine-readable subset of the same
1058 data. It returns a JSON-encoded dictionary. The only key defined at this
1059 time is "active", with a value that is a list of operation dictionaries, one
1060 for each active operation. Once an operation is completed, it will no longer
1061 appear in data["active"] .
1063 Each op-dict contains a "type" key, one of "upload", "download",
1064 "mapupdate", "publish", or "retrieve" (the first two are for immutable
1065 files, while the latter three are for mutable files and directories).
1067 The "upload" op-dict will contain the following keys:
1069 type (string): "upload"
1070 storage-index-string (string): a base32-encoded storage index
1071 total-size (int): total size of the file
1072 status (string): current status of the operation
1073 progress-hash (float): 1.0 when the file has been hashed
1074 progress-ciphertext (float): 1.0 when the file has been encrypted.
1075 progress-encode-push (float): 1.0 when the file has been encoded and
1076 pushed to the storage servers. For helper
1077 uploads, the ciphertext value climbs to 1.0
1078 first, then encoding starts. For unassisted
1079 uploads, ciphertext and encode-push progress
1080 will climb at the same pace.
1082 The "download" op-dict will contain the following keys:
1084 type (string): "download"
1085 storage-index-string (string): a base32-encoded storage index
1086 total-size (int): total size of the file
1087 status (string): current status of the operation
1088 progress (float): 1.0 when the file has been fully downloaded
1090 Front-ends which want to report progress information are advised to simply
1091 average together all the progress-* indicators. A slightly more accurate
1092 value can be found by ignoring the progress-hash value (since the current
1093 implementation hashes synchronously, so clients will probably never see
1094 progress-hash!=1.0).
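
The averaging advice above can be sketched like this. The helper name
overall_progress is hypothetical; it works on one op-dict taken from
data["active"] and handles both the progress-* keys of uploads and the single
progress key of downloads.

```python
def overall_progress(op, ignore_hash=False):
    """Average the progress indicators of one op-dict, or return None."""
    keys = [k for k in op if k == "progress" or k.startswith("progress-")]
    if ignore_hash:
        # Hashing is synchronous, so progress-hash is normally already 1.0.
        keys = [k for k in keys if k != "progress-hash"]
    if not keys:
        return None
    return sum(op[k] for k in keys) / len(keys)
```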
1098 This page provides a basic tool to predict the likely storage and bandwidth
1099 requirements of a large Tahoe grid. It provides forms to input things like
1100 total number of users, number of files per user, average file size, number
1101 of servers, expansion ratio, hard drive failure rate, etc. It then provides
1102 numbers like how many disks per server will be needed, how many read
1103 operations per second should be expected, and the likely MTBF for files in
1104 the grid. This information is very preliminary, and the model upon which it
1105 is based still needs a lot of work.
1109 If the node is running a helper (i.e. if "$BASEDIR/run_helper" is
1110 non-empty), then this page will provide a list of all the helper operations
1111 currently in progress. If "?t=json" is added to the URL, it will return a
1112 JSON-formatted list of helper statistics, which can then be used to produce
1113 graphs to indicate how busy the helper is.
1117 This page provides "node statistics", which are collected from a variety of sources:
1120 load_monitor: every second, the node schedules a timer for one second in
1121 the future, then measures how late the subsequent callback
1122 is. The "load_average" is this tardiness, measured in
1123 seconds, averaged over the last minute. It is an indication
1124 of a busy node, one which is doing more work than can be
1125 completed in a timely fashion. The "max_load" value is the
1126 highest value that has been seen in the last 60 seconds.
1128 cpu_monitor: every minute, the node uses time.clock() to measure how much
1129 CPU time it has used, and it uses this value to produce
1130 1min/5min/15min moving averages. These values range from 0%
1131 (0.0) to 100% (1.0), and indicate what fraction of the CPU
1132 has been used by the Tahoe node. Not all operating systems
1133 provide meaningful data to time.clock(): they may report 100%
1134 CPU usage at all times.
1136 uploader: this counts how many immutable files (and bytes) have been
1137 uploaded since the node was started
1139 downloader: this counts how many immutable files have been downloaded
1140 since the node was started
1142 publishes: this counts how many mutable files (including directories) have
1143 been modified since the node was started
1145 retrieves: this counts how many mutable files (including directories) have
1146 been read since the node was started
1148 There are other statistics that are tracked by the node. The "raw stats"
1149 section shows a formatted dump of all of them.
1151 By adding "?t=json" to the URL, the node will return a JSON-formatted
1152 dictionary of stats values, which can be used by other tools to produce
1153 graphs of node behavior. The misc/munin/ directory in the source
1154 distribution provides some tools to produce these graphs.
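
To make the load_monitor description concrete, the following sketch shows one
way the tardiness samples could be averaged. It is not the node's actual
code: the class name LoadMonitor, its methods, and the use of a fixed-length
deque of 60 samples are all assumptions made for illustration.

```python
from collections import deque


class LoadMonitor:
    """Keep the last `window` tardiness samples (one per second)."""

    def __init__(self, window=60):
        self.samples = deque(maxlen=window)   # old samples fall off the front

    def record(self, scheduled_at, fired_at):
        """Store how late a timer fired, in seconds (never negative)."""
        self.samples.append(max(0.0, fired_at - scheduled_at))

    def load_average(self):
        """Mean tardiness over the window, or 0.0 with no samples."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def max_load(self):
        """Largest tardiness seen within the window."""
        return max(self.samples) if self.samples else 0.0
```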
1156 GET / (introducer status)
1158 For Introducer nodes, the welcome page displays information about both
1159 clients and servers which are connected to the introducer. Servers make
1160 "service announcements", and these are listed in a table. Clients will
1161 subscribe to hear about service announcements, and these subscriptions are
1162 listed in a separate table. Both tables contain information about what
1163 version of Tahoe is being run by the remote node, their advertised and
1164 outbound IP addresses, their nodeid and nickname, and how long they have
1167 By adding "?t=json" to the URL, the node will return a JSON-formatted
1168 dictionary of stats values, which can be used to produce graphs of connected
1169 clients over time. This dictionary has the following keys:
1171 ["subscription_summary"] : a dictionary mapping service name (like
1172 "storage") to an integer with the number of
1173 clients that have subscribed to hear about that service
1175 ["announcement_summary"] : a dictionary mapping service name to an integer
1176 with the number of servers which are announcing that service
1178 ["announcement_distinct_hosts"] : a dictionary mapping service name to an
1179 integer which represents the number of
1180 distinct hosts that are providing that
1181 service. If two servers have announced
1182 FURLs which use the same hostnames (but
1183 different ports and tubids), they are
1184 considered to be on the same host.
1187 == Static Files in /public_html ==
1189 The webapi server will take any request for a URL that starts with /static
1190 and serve it from a configurable directory which defaults to
1191 $BASEDIR/public_html . This is configured by setting the "[node]web.static"
1192 value in $BASEDIR/tahoe.cfg . If this is left at the default value of
1193 "public_html", then http://localhost:8123/static/subdir/foo.html will be
1194 served with the contents of the file $BASEDIR/public_html/subdir/foo.html .
1196 This can be useful to serve a javascript application which provides a
1197 prettier front-end to the rest of the Tahoe webapi.
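
The mapping from /static URLs to files can be illustrated with a short
sketch. The function name static_file_for is hypothetical, and the rejection
of ".." segments is a precaution added for this example; it is not a
statement about how the webapi server itself behaves.

```python
from pathlib import PurePosixPath


def static_file_for(url_path, basedir, web_static="public_html"):
    """Map a /static/... URL path to a file under $BASEDIR/<web_static>."""
    prefix = "/static/"
    if not url_path.startswith(prefix):
        return None                      # not a static-file request
    parts = PurePosixPath(url_path[len(prefix):]).parts
    if ".." in parts or (parts and parts[0] == "/"):
        # Example-only safeguard against escaping the static directory.
        raise ValueError("path escapes the static directory: %r" % url_path)
    return str(PurePosixPath(basedir, web_static, *parts))
```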
1200 == Safety and security issues -- names vs. URIs ==
1202 Summary: use explicit file- and dir- caps whenever possible, to reduce the
1203 potential for surprises when the virtual drive is changed while you aren't looking.
1206 The vdrive provides a mutable filesystem, but the ways that the filesystem
1207 can change are limited. The only thing that can change is the mapping from
1208 child names to child objects that each directory contains: a new child name
1209 can be added pointing to an object, an existing child name can be removed,
1210 or an existing child name can be changed to point to a different object.
1212 If you query Tahoe for information about the filesystem and then act upon
1213 the filesystem (such as by getting a listing of the contents of a directory
1214 and then adding a file to the directory), then the filesystem might have
1215 been changed after you queried it and before you acted upon it. However, if
1216 you use the URI instead of the pathname of an object when you act upon the
1217 object, then the only change that can happen is that, when the object is a
1218 directory, the set of child names it has might be different. If, on the
1219 other hand, you act upon the object using its pathname, then a different
1220 object might be in that place, which can result in more kinds of surprises.
1222 For example, suppose you are writing code which recursively downloads the
1223 contents of a directory. The first thing your code does is fetch the listing
1224 of the contents of the directory. For each child that it fetched, if that
1225 child is a file then it downloads the file, and if that child is a directory
1226 then it recurses into that directory. Now, if the download and the recurse
1227 actions are performed using the child's name, then the results might be
1228 wrong, because for example a child name that pointed to a sub-directory when
1229 you listed the directory might have been changed to point to a file (in which
1230 case your attempt to recurse into it would result in an error and the file
1231 would be skipped), or a child name that pointed to a file when you listed the
1232 directory might now point to a sub-directory (in which case your attempt to
1233 download the child would result in a file containing HTML text describing the sub-directory).
1236 If your recursive algorithm uses the URI of the child instead of the name of
1237 the child, then those kinds of mistakes just can't happen. Note that both the
1238 child's name and the child's URI are included in the results of listing the
1239 parent directory, so it isn't any harder to use the URI for this purpose.
1241 In general, use names if you want "whatever object (whether file or
1242 directory) is found by following this name (or sequence of names) when my
1243 request reaches the server". Use URIs if you want "this particular object".
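
The recursive download described above can be sketched so that every step
acts on the child's URI rather than its name. This is an illustration under
stated assumptions: list_dir and visit_file are caller-supplied callables
(for example, wrappers around HTTP requests), and the "dir"/"file" kind
strings are hypothetical labels, not the webapi's own type names.

```python
def walk_by_cap(dircap, list_dir, visit_file, path=(), seen=None):
    """Recursively visit files, following caps instead of names.

    list_dir(cap) must return (name, kind, cap) tuples for one directory,
    with kind being "dir" or "file". visit_file(path, cap) is called once
    per file, where path is a tuple of names used only for labelling.
    """
    seen = set() if seen is None else seen
    if dircap in seen:
        return                      # directory loops are walked only once
    seen.add(dircap)
    for name, kind, cap in list_dir(dircap):
        child_path = path + (name,)
        if kind == "dir":
            # Recurse using the cap captured in the listing, not the name.
            walk_by_cap(cap, list_dir, visit_file, child_path, seen)
        else:
            visit_file(child_path, cap)
```

Because the cap recorded at listing time identifies one particular object, a
concurrent rename or replacement of the child name cannot redirect the walk
to a different object.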
1245 == Concurrency Issues ==
1247 Tahoe uses both mutable and immutable files. Mutable files can be created
1248 explicitly by doing an upload with ?mutable=true added, or implicitly by
1249 creating a new directory (since a directory is just a special way to
1250 interpret a given mutable file).
1252 Mutable files suffer from the same consistency-vs-availability tradeoff that
1253 all distributed data storage systems face. It is not possible to
1254 simultaneously achieve perfect consistency and perfect availability in the
1255 face of network partitions (servers being unreachable or faulty).
1257 Tahoe tries to achieve a reasonable compromise, but there is a basic rule in
1258 place, known as the Prime Coordination Directive: "Don't Do That". What this
1259 means is that if write-access to a mutable file is available to several
1260 parties, then those parties are responsible for coordinating their activities
1261 to avoid multiple simultaneous updates. This could be achieved by having
1262 these parties talk to each other and using some sort of locking mechanism, or
1263 by serializing all changes through a single writer.
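
Within a single program, "serializing all changes through a single writer"
can be as simple as a queue drained by one worker thread. The sketch below is
only an illustration of that idea (the class name SingleWriter is
hypothetical); it does nothing to coordinate separate programs or separate
machines, which still need their own agreement.

```python
import queue
import threading


class SingleWriter:
    """Funnel every modification through one worker thread, in FIFO order."""

    def __init__(self):
        self.errors = []                      # exceptions raised by operations
        self._ops = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            op = self._ops.get()
            if op is None:                    # sentinel: stop the worker
                return
            try:
                op()                          # e.g. one webapi write request
            except Exception as exc:
                self.errors.append(exc)       # keep going after a failure

    def submit(self, op):
        """Queue a zero-argument callable that performs one update."""
        self._ops.put(op)

    def close(self):
        """Wait for all queued updates to run, then stop the worker."""
        self._ops.put(None)
        self._worker.join()
```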
1265 The consequences of performing uncoordinated writes can vary. Some of the
1266 writers may lose their changes, as somebody else wins the race condition. In
1267 many cases the file will be left in an "unhealthy" state, meaning that there
1268 are not as many redundant shares as we would like (reducing the reliability
1269 of the file against server failures). In the worst case, the file can be left
1270 in such an unhealthy state that no version is recoverable, even the old ones.
1271 It is this small possibility of data loss that prompts us to issue the Prime
1272 Coordination Directive.
1274 Tahoe nodes implement internal serialization to make sure that a single Tahoe
1275 node cannot conflict with itself. For example, it is safe to issue two
1276 directory modification requests to a single Tahoe node's webapi server at the
1277 same time, because the Tahoe node will internally delay one of them until
1278 after the other has finished being applied. (This feature was introduced in
1279 Tahoe-1.1; in Tahoe-1.0, web clients were responsible for serializing their
1280 own requests.)
1282 For more details, please see the "Consistency vs Availability" and "The Prime
1283 Coordination Directive" sections of mutable.txt, in the same directory as this file.
1287 [1]: URLs and HTTP and UTF-8, Oh My
1289 HTTP does not provide a mechanism to specify the character set used to
1290 encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that
1291 the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
1292 For example, suppose we want to provoke the server into using a filename of
1293 "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
1294 is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's
1295 repr() function would show). To encode this into a URL, the non-printable
1296 characters must be escaped with the urlencode '%XX' mechanism, giving us
1297 "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
1298 /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
1299 provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.
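
The two encodings mentioned above can be reproduced with Python's standard
library. This is a small illustration of the convention, not Tahoe code; the
variable names are arbitrary.

```python
from urllib.parse import quote

name = "fianc\u00e9e"                      # F I A N C U+00E9 E

utf8_bytes = name.encode("utf-8")          # the preferred convention
utf8_form = quote(name, safe="")           # UTF-8 is quote()'s default encoding
latin1_form = quote(name, safe="", encoding="latin-1")   # what IE7 sends
```

Here utf8_form is "fianc%C3%A9e" and latin1_form is "fianc%E9e", matching the
values given in the text.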
1301 The response header will need to indicate a non-ASCII filename. The actual
1302 mechanism to do this is not clear. For ASCII filenames, the response header would look like:
1305 Content-Disposition: attachment; filename="english.txt"
1307 If Tahoe were to enforce the utf-8 convention, it would need to decode the
1308 URL argument into a unicode string, and then encode it back into a sequence
1309 of bytes when creating the response header. One possibility would be to use
1310 unencoded utf-8. Developers suggest that IE7 might accept this:
1312 #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
1313 (note, the last four bytes of that line, not including the newline, are
1314 0xC3 0xA9 0x65 0x22)
1316 RFC2231#4 (dated 1997) suggests that the following might work, and some
1317 developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that
1318 it is supported by firefox (but not IE7):
1320 #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
1322 My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that
1323 the filename= parameter is defined to be wrapped in quotes (presumably to
1324 allow spaces without breaking the parsing of subsequent parameters), which would give us:
1327 #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"
1329 However this is contrary to the examples in the email thread listed above.
1331 Developers report that IE7 (when it is configured for UTF-8 URL encoding,
1332 which is not the default in Asian countries), will accept:
1334 #4: Content-Disposition: attachment; filename=fianc%C3%A9e
1336 However, for maximum compatibility, Tahoe simply copies bytes from the URL
1337 into the response header, rather than enforcing the utf-8 convention. This
1338 means it does not try to decode the filename from the URL argument, nor does
1339 it encode the filename into the response header.
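
For comparison, the plain ASCII header and the RFC2231-style candidate (#2
above) could be generated as follows. These helpers are hypothetical and show
what enforcing the utf-8 convention might look like; as stated above, Tahoe
itself does not do this and simply copies bytes from the URL.

```python
from urllib.parse import quote


def content_disposition_ascii(filename):
    """Quoted filename= form, valid only for plain ASCII names."""
    if not filename.isascii() or '"' in filename:
        raise ValueError("not representable in the plain quoted form")
    return 'attachment; filename="%s"' % filename


def content_disposition_rfc2231(filename):
    """filename*= form: charset, empty language tag, percent-encoded UTF-8."""
    return "attachment; filename*=utf-8''" + quote(filename, safe="")
```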