From: Zooko O'Whielacronx Date: Thu, 23 Aug 2007 20:03:26 +0000 (-0700) Subject: new improved webapi.txt X-Git-Url: https://git.rkrishnan.org/components/%22news.html/architecture.txt?a=commitdiff_plain;h=2b77a70920963abdcffe0ef0da471cf7bc246d4d;p=tahoe-lafs%2Ftahoe-lafs.git new improved webapi.txt As per ticket #118, this refactors the explanation of URIs and paths and changes the JSON metadata schema. http://allmydata.org/trac/tahoe/ticket/118 --- diff --git a/docs/webapi.txt b/docs/webapi.txt index 6e7d6c6d..9776138b 100644 --- a/docs/webapi.txt +++ b/docs/webapi.txt @@ -1,4 +1,16 @@ -== connecting to the tahoe node == +This document has six sections: + +1. the basic API for how to programmatically control your tahoe node +2. convenience methods +3. safety and security issues +4. features for controlling your tahoe node from a standard web browser +5. debugging and testing features +6. XML-RPC (coming soon) + + +1. the basic API for how to programmatically control your tahoe node + +a. connecting to the tahoe node Writing "8011" into $NODEDIR/webport causes the node to run a webserver on port 8011. Writing "tcp:8011:interface=127.0.0.1" into $NODEDIR/webport does @@ -7,34 +19,28 @@ on the local host can connect. Using "ssl:8011:privateKey=mykey.pem:certKey=cert.pem" would run an SSL server. See twisted.application.strports for more details. -In this release, anyone who can connect to this port will be able to use the -vdrive. Authentication will be added in a near-future release, probably by -having the node generate an unguessable prefix which should be inserted -before the 'vdrive' segment in the URLS described below, and writing this -nonce to a read-by-owner-only file in $NODEDIR. Please see ticket #98 for -details. +If $NODEDIR/webpassword exists, it will be used (somehow) to require HTTP +Digest Authentication for all webserver connections. XXX specify how - -== vdrive == +b. file names The node provides some small number of "virtual drives". 
In the 0.5 release, this number is two: the first is the global shared vdrive, the second is the private non-shared vdrive. We will call these "global" and -"private" for now. +"private". For the purpose of this document, let us assume that the vdrives currently contain the following directories and files: - global/ - global/Documents/ - global/Documents/notes.txt - - private/ - private/Pictures/ - private/Pictures/tractors.jpg - private/Pictures/family/ - private/Pictures/family/bobby.jpg +global/ +global/Documents/ +global/Documents/notes.txt +private/ +private/Pictures/ +private/Pictures/tractors.jpg +private/Pictures/family/ +private/Pictures/family/bobby.jpg Within the webserver, there is a tree of resources. The top-level "vdrive" resource gives access to files and directories in all of the user's virtual @@ -50,239 +56,176 @@ In addition, each directory has a corresponding URL. The Pictures URL is: http://localhost:8011/vdrive/private/Pictures -Now, what can we do with these URLs? By varying the HTTP method -(GET/PUT/POST/DELETE) and by appending a type-indicating query argument, we -control how what we want to do with the data and how it should be presented. - - -=== Manipulating files and directories by name === - -In the following examples "$URL" is a shorthand for a URL like the ones -described above, with "vdrive/" as the top level, followed by a -slash-separated sequence of directory names, ending with the name of a file -or a directory. "$NEWURL" is a shorthand for a URL pointing to a location in -the vdrive where currently nothing exists. - - GET $URL - - If the given place in the vdrive contains a file, then this simply - retrieves the contents of the file. The Content-Type is set according to - the vdrive's metadata (if available) or by using the usual - filename-extension-magic built into most webservers. The file's contents - are provided in the body of the HTTP response. 
- - If the given place contains a directory, then this returns an HTML page, - intended to be used by humans, which contains HREF links to all files and - directories reachable from this dirnode. These HREF links do not have a t= - argument, meaning that a human who follows them will get pages also meant - for a human. It also contains forms to upload new files, and to delete - files and directories. These forms use POST methods to do their job. - - You can add the "save=true" argument, which adds a 'Content-Disposition: - attachment' header to prompt most web browsers to save the file to disk - rather than attempting to display it. - - GET $URL?t=json - - This returns machine-parseable information about the named file or - directory in the HTTP response body. This information contains a flag that - indicates whether the thing is a file or a directory. - - If it is a file, then the information includes file size, metadata (like - Content-Type), and URIs, like this: - - [ 'filenode', { 'mutable': bool, 'uri': file_uri, 'size': bytes } ] - - If it is a directory, then it includes a flag to indicate whether this is a - read-write dirnode or a read-only dirnode, and information about the - children of this directory, as a mapping from child name to a set of - metadata about the child (the same data that would appear in a - corresponding GET?t=json of the child itself). Like this: - - [ 'dirnode', { 'mutable': bool, 'uri': uri, 'children': children } ] - - where 'children' is a dictionary in which the keys are child names - and the values depend upon whether the child is a file or a directory: - - 'foo.txt': [ 'filenode', { 'mutable': bool, 'uri': uri, 'size': bytes } ] - 'subdir': [ 'dirnode', { 'mutable': bool, 'uri': uri } ] - - note that the value is the same as the JSON representation of the - corresponding FILEURL or DIRURL (except that directories do not recurse -- - the "children" entry of the child is omitted). +c. 
URIs - Before writing code that uses these results, please see the important note - below about TOCTTOU bugs. +A separate top-level namespace ("uri/" instead of "vdrive/") is used to +access files and directories directly by URI, rather than by going through +the vdrive. - GET $URL?t=uri +For example, this identifies a file or directory: - This returns the URI of the given file or directory in the HTTP response - body. If you have read-write access to that resource then this returns a - URI which provides read-write access. If you have read-only access to that - resource then this returns a URI which provides read-only access. +http://localhost:8011/uri/$URI - GET $URL?t=readonly-uri +And this identifies a file or directory named "tractors.jpg" in a +subdirectory "Pictures" of the identified directory: - This returns the URI providing read-only access to the given file or - directory (whether or not you have read-only or read-write access). - (Currently all files are immutable so everyone has read-only access to all - files.) +http://localhost:8011/uri/$URI/Pictures/tractors.jpg - PUT $URL?t=uri +In the following examples, "$URL" is a shorthand for a URL like the ones +above, either with "vdrive/" as the top level and a sequence of +slash-separated pathnames following, or with "uri/" as the top level, +followed by a URI, optionally followed by a sequence of slash-separated +pathnames. - This attaches a child (either a file or a directory) to the vdrive at the - given location. The URI of the child is provided in the body of the HTTP - request. This can be used to attach a shared directory to the - vdrive. Intermediate directories are created on-demand just like with the - regular PUT command. +Now, what can we do with these URLs? By varying the HTTP method +(GET/PUT/POST/DELETE) and by appending a type-indicating query argument, we +control what we want to do with the data and how it should be presented.
- If there was already a child at the given name, this command will replace - the old child with the new one, and will return an HTTP 200 (OK) response - code. If there was not already a child there, it will return 201 (Created). - If you add an "replace=false" query argument, the command will return a 409 - (Conflict) error rather than replacing an existing child. +d. examining files or directories - DELETE $URL + GET $URL?t=json - This deletes the given file or directory from the vdrive. If it is a - directory then this deletes all of its chilren. Note that this *does not* - delete any parent directories, so a sequence of 'PUT $NEWURL' and 'DELETE - $NEWURL' does not necessarily return the vdrive to its original state (it - may leave some intermediate directories). + This returns machine-parseable information about the indicated file or + directory in the HTTP response body. This information contains a flag that + indicates whether the thing is a file or a directory. + If it is a file, then the information includes file size and URI, like + this: -=== Manipulating files by name === + [ 'filenode', { 'ro_uri': file_uri, + 'size': bytes } ] -In these examples, $NEWURL is specifically defined to point to a location in -the vdrive where currently nothing exists, and will be used to refer to a -file rather than a directory. + If it is a directory, then it includes information about the children of + this directory, as a mapping from child name to a set of metadata about the + child (the same data that would appear in a corresponding GET?t=json of the + child itself). Like this: - PUT $NEWURL + [ 'dirnode', { 'rw_uri': read_write_uri, + 'ro_uri': read_only_uri, + 'children': children } ] - This uploads a file to the given place in the vdrive. It will create - intermediate directories as necessary. The file's contents are taken from - the body of the HTTP request. 
For convenience, the HTTP response contains - the URI that results from uploading the file, although the node is not - obligated to do anything with the URI. According to the HTTP/1.1 - specification (rfc2616), this should return a 200 (OK) code when modifying - an existing file, and a 201 (Created) code when creating a new file. + In the above example, 'children' is a dictionary in which the keys are + child names and the values depend upon whether the child is a file or a + directory: - If there was already a child at the given name, this command will replace - the old child with the new one, and will return an HTTP 200 (OK) response - code. If there was not already a child there, it will return 201 (Created). - If you add an "replace=false" query argument, the command will return a 409 - (Conflict) error rather than replacing an existing child. + 'foo.txt': [ 'filenode', { 'ro_uri': uri, 'size': bytes } ] + 'subdir': [ 'dirnode', { 'rw_uri': rwuri, 'ro_uri': rouri } ] - To use this, run 'curl -T localfile http://localhost:8011/vdrive/global/newfile' + Note that the value is the same as the JSON representation of the child + object (except that directories do not recurse -- the "children" entry of + the child is omitted). + The rw_uri field will be present in the information about a directory + if and only if you have read-write access to that directory. -=== Manipulating directories by name === +e. downloading a file -In this section, $URL and $NEWURL specifically refer to directories, rather -than files. + GET $URL - PUT $NEWURL?t=mkdir + If the indicated object is a file, then this simply retrieves the contents + of the file. The file's contents are provided in the body of the HTTP + response. - Create a new empty directory at the given path. The HTTP response contains - the URI of the given directory, although the client is not obligated to do - anything with it.
+ If the indicated object is a directory, then this returns an HTML page, + intended to be used by humans, which contains HREF links to all files and + directories reachable from this directory. These HREF links do not have a + t= argument, meaning that a human who follows them will get pages also + meant for a human. It also contains forms to upload new files, and to + delete files and directories. These forms use POST methods to do their job. - If there was already a child at the given name, this command will replace - the old child with the new one, and will return an HTTP 200 (OK) response - code. If there was not already a child there, it will return 201 (Created). - If you add an "replace=false" query argument, the command will return a 409 - (Conflict) error rather than replacing an existing child. + You can add the "save=true" argument, which adds a 'Content-Disposition: + attachment' header to prompt most web browsers to save the file to disk + rather than attempting to display it. - GET $URL?t=rename-form&name=$CHILDNAME + A filename (from which a MIME type can be derived) can be specified using a + 'filename=' query argument. This is especially useful if the $URL does not + end with the name of the file (because it instead ends with the identifier + of the file). This filename is also the one used if the 'save=true' + argument is set. For example: - This provides a useful facility to browser-based user interfaces. It - returns a page containing a form targetting the "POST $URL t=rename" - functionality described below, with the provided $CHILDNAME present in the - 'from_name' field of that form. I.e. this presents a form offering to - rename $CHILDNAME, requesting the new name, and submitting POST rename. - Note that this can be used to rename both files and directories, but the - GET request itself is always directed to the directory containing the - object to be renamed. + GET http://localhost:8011/uri/$TRACTORS_URI?filename=tractors.jpg +f.
uploading a file -== URIs == + PUT http://localhost:8011/uri -A separate top-level resource namespace ("uri/" instead of "vdrive/") is used -to get access to files and directories that are indexed directly by URI, -rather than by going through the vdrive. The resource thus referenced is used -the same way as if it were accessed through the vdrive (including accessing a -directory's children with "$URI/childname"). + Upload a file, returning its URI as the HTTP response body. This does not + make the file visible from the virtual drive -- to do that, see section + 1.h. below, or the convenience method in section 2.a.. -For example, this identifies a file or directory: +g. creating a new directory -http://localhost:8011/uri/$URI + PUT http://localhost:8011/uri?t=mkdir -And this identifies a file or directory "foo" in a subdirectory "somedir" of -the identified directory: + Create a new empty directory and return its URI as the HTTP response body. + This does not make the newly created directory visible from the virtual + drive, but you can use section 1.h. to attach it, or the convenience method + in section 2.XXX. -http://localhost:8011/uri/$URI/somedir/foo +h. attaching a file or directory as the child of an extant directory -In the following examples, "$URI_URL" is a shorthand for a URL like the one -above, with "uri/" as the top level, followed by a URI. + PUT $URL?t=uri -Note that since tahoe URIs may contain slashes (in particular, dirnode URIs -contain a FURL, which resembles a regular HTTP URL and starts with pb://), -when URIs are used in this form, they must be specially quoted. All slashes -in the URI must be replaced by '!' characters. The intent is to remove this -unpleasant requirement in a future release: please see ticket #102 for -details. 
+ This attaches a child (either a file or a directory) to the given directory. + $URL is required to indicate a directory as the second-to-last element and + the desired filename as the last element, for example: - GET $URI_URL - GET $URI_URL?t=json - GET $URI_URL?t=uri - GET $URI_URL?t=readonly-uri + PUT http://localhost:8011/uri/$URI_OF_SOME_DIR/Pictures/tractors.jpg + PUT http://localhost:8011/uri/$URI_OF_SOME_DIR/tractors.jpg + PUT http://localhost:8011/vdrive/private/Pictures/tractors.jpg - These each behave the same way that their name-based URL equivalent does, - described in the "files and directories" section above. The difference is - that which file or directory you access does not depend on the contents of - parent directories as it does with the name-based URLs, since a URI - uniquely identifies an object regardless of its location. + The URI of the child is provided in the body of the HTTP request. - Since files accessed directly this way do not have a filename (from which a - MIME-type can be derived), one can be specified using a 'filename=' query - argument. This filename is also the one used if the 'save=true' argument is - set. For example: + There is an optional "?overwrite=" param whose value can be "true", "t", + "1", "false", "f", or "0" (case-insensitive), and which defaults to "true". + If the indicated directory already contains the given child name, then if + overwrite is true then the value of that name is changed to be the new URI. + If overwrite is false then an error is returned. XXX specify the error - GET http://localhost:8011/uri/$TRACTORS_URI?filename=tractors.jpg + This can be used to attach a shared directory (a directory that other + people can read or write) to the vdrive. Intermediate directories, if any, + are created on-demand. - If the URI represents a directory, you can append additional path segments - to $URI_URL to access children of that directory.
For example, if we first - obtained the URI of the "private/Pictures" directory by doing: +i. removing a name from a directory - GET http://localhost:8011/vdrive/private/Pictures?t=uri -> PICTURES_URI + DELETE $URL - then we could download "private/Pictures/family/bobby.jpg" by fetching: + This removes the given name from the given directory. $URL is required to + indicate a directory as the second-to-last element and the name to remove + from that directory as the last element, just as in section 1.h. - GET http://localhost:8011/uri/$PICTURES_URI/family/bobby.jpg + Note that this does not actually delete the resource that the name points + to from the tahoe grid -- it only removes this name in this directory. If + there are other names in this directory or in other directories that point + to the resource, then it will remain accessible through those paths. Even + if all names pointing to this resource are removed from their parent + directories, then if someone is in possession of the URI of this resource + they can continue to access the resource through the URI. Only if a person + is not in possession of the URI, and they do not have access to any + directories which contain names pointing to this resource, are they + prevented from accessing the resource. - Note that since the $URI_URL already contains the URI, the only use for the - "?t=readonly-uri" command is if the thing identified is a directory and you - have read-write access to it and you want to get a URI which provides - read-only access to it. "?t=uri" is completely redundant but included for - completeness. +2. convenience methods - GET http://localhost:8011/uri?uri=$URI +a. uploading a file and attaching it to the vdrive - This causes a redirect to /uri/$URI, and retains any additional query - arguments (like filename= or save=). This is for the convenience of web - forms which allow the user to paste in a URI (obtained through some - out-of-band channel, like IM or email).
+ PUT $URL - Note that this form merely redirects to the specific node indicated by the - URI: unlike the GET /uri/$URI form, you cannot traverse to children by - appending additional path segments to the URL. + Upload a file and link it into the vdrive at the location specified by + $URL. The last item in the $URL must be a filename, and the second-to-last + item must identify a directory. - The $URI provided as a query argument is allowed to contain slashes. The - redirection provided will escape the slashes with exclamation points, as - described above. + It will create intermediate directories as necessary. The file's contents + are taken from the body of the HTTP request. For convenience, the HTTP + response contains the URI that results from uploading the file, although + the client is not obligated to do anything with the URI. According to the + HTTP/1.1 specification (rfc2616), this should return a 200 (OK) code when + modifying an existing file, and a 201 (Created) code when creating a new + file. + To use this, run 'curl -T localfile http://localhost:8011/vdrive/global/newfile' -== names versus identifiers == +3. safety and security issues -- names vs. URIs The vdrive provides a mutable filesystem, but the ways that the filesystem can change are limited. The only thing that can change is that the mapping @@ -307,14 +250,14 @@ child is a file then it downloads the file, and if that child is a directory then it recurses into that directory.
Now, if the download and the recurse actions are performed using the child's name, then the results might be wrong, because for example a child name that pointed to a sub-directory when -you listed the directory might have been changed to point to a file, in which +you listed the directory might have been changed to point to a file (in which case your attempt to recurse into it would result in an error and the file -would be skipped, or a child name that pointed to a file when you listed the -directory might now point to a sub-directory, in which case your attempt to +would be skipped), or a child name that pointed to a file when you listed the +directory might now point to a sub-directory (in which case your attempt to download the child would result in a file containing HTML text describing the -sub-directory! +sub-directory!). -If your recursive algorithm uses the URI of the child instead of the name of +If your recursive algorithm uses the uri of the child instead of the name of the child, then those kinds of mistakes just can't happen. Note that both the child's name and the child's URI are included in the results of listing the parent directory, so it isn't harder to use the URI for this purpose. @@ -323,13 +266,37 @@ In general, use names if you want "whatever object (whether file or directory) is found by following this name (or sequence of names) when my request reaches the server". Use URIs if you want "this particular object". -== POST forms == +4. features for controlling your tahoe node from a standard web browser + +a. uri redirect - POST $URL + GET http://localhost:8011/uri?uri=$URI + + This causes a redirect to /uri/$URI, and retains any additional query + arguments (like filename= or save=). This is for the convenience of web + forms which allow the user to paste in a URI (obtained through some + out-of-band channel, like IM or email). 
+ + Note that this form merely redirects to the specific file or directory + indicated by the URI: unlike the GET /uri/$URI form, you cannot traverse to + children by appending additional path segments to the URL. + +b. web page offering rename + + GET $URL?t=rename-form&name=$CHILDNAME + + This provides a useful facility to browser-based user interfaces. It + returns a page containing a form targetting the "POST $URL t=rename" + functionality described below, with the provided $CHILDNAME present in the + 'from_name' field of that form. I.e. this presents a form offering to + rename $CHILDNAME, requesting the new name, and submitting POST rename. + +c. POST forms + + POST $URL t=upload name=childname (optional) file=newfile - This instructs the node to upload a file into the given directory. We need this because forms are the only way for a web browser to upload a file (browsers do not know how to do PUT or DELETE). The file's contents and the @@ -337,73 +304,43 @@ request reaches the server". Use URIs if you want "this particular object". used to upload a single file at a time. To avoid confusion, name= is not allowed to contain a slash (a 400 Bad Request error will result). - If there was already a child at the given name, this command will replace - the old child with the new one. But if you add a "replace=false" argument, - the command will refuse to replace the child, signalling an error instead. - - POST $URL + POST $URL t=mkdir name=childname This instructs the node to create a new empty directory. The name of the - new child directory will be included in the form's arguments. Existing - children are replaced unless a "replace=false" argument is provided. + new child directory will be included in the form's arguments. - POST $URL + POST $URL t=uri name=childname uri=newuri This instructs the node to attach a child that is referenced by URI (just - like the PUT $URL?t=uri method). The name and URI of the new child will be - included in the form's arguments. 
Existing children are replaced unless a - "replace=false" argument is provided. + like the PUT $URL?t=uri method). The name and URI of the new child + will be included in the form's arguments. - POST $URL + POST $URL t=delete name=childname This instructs the node to delete a file from the given directory. The name of the child to be deleted will be included in the form's arguments. - POST $URL + POST $URL t=rename from_name=oldchildname to_name=newchildname This instructs the node to rename a child within the given directory. The child specified by 'from_name' is removed, and reattached as a child named - for 'to_name'. An existing child at 'to_name' is replaced unless a - "replace=false" argument is provided, making the default behavior similar - to the unix 'mv -f' command. - - -== XMLRPC == - - http://localhost:8011/xmlrpc - - This resource provides an XMLRPC server on which all of the previous - operations can be expressed as function calls taking a "pathname" argument. - This is provided for applications that want to think of everything in terms - of XMLRPC. - - listdir(vdrivename, path) -> dict of (childname -> (stuff)) - put(vdrivename, path, contents) -> URI - get(vdrivename, path) -> contents - mkdir(vdrivename, path) -> URI - put_localfile(vdrivename, path, localfilename) -> URI - get_localfile(vdrivename, path, localfilename) - put_localdir(vdrivename, path, localdirname) # recursive - get_localdir(vdrivename, path, localdirname) # recursive - put_uri(vdrivename, path, URI) - - etc.. - + for 'to_name'. This is unconditional and will replace any child already + present under 'to_name', akin to 'mv -f' in unix parlance. -== Testing/Debugging Commands == +5. 
debugging and testing features - GET $URL?t=download&localfile=$LOCALPATH - GET $URL?t=download&localdir=$LOCALPATH +GET $URL?t=download&localfile=$LOCALPATH +GET $URL?t=download&localdir=$LOCALPATH The localfile= form instructs the node to download the given file and write it into the local filesystem at $LOCALPATH. The localdir= form instructs @@ -423,8 +360,8 @@ request reaches the server". Use URIs if you want "this particular object". Therefore this form is only enabled if you create a file named 'webport_allow_localfile' in the node's base directory. - PUT $NEWURL?t=upload&localfile=$LOCALPATH - PUT $NEWURL?t=upload&localdir=$LOCALPATH +PUT $NEWURL?t=upload&localfile=$LOCALPATH +PUT $NEWURL?t=upload&localdir=$LOCALPATH This uploads a file or directory from the node's local filesystem to the vdrive. As with "GET $URL?t=download&localfile=$LOCALPATH", this request @@ -456,6 +393,29 @@ request reaches the server". Use URIs if you want "this particular object". enabled if you create a file named 'webport_allow_localfile' in the node's base directory. - GET $URL?t=manifest +GET $URL?t=manifest Return an HTML-formatted manifest of the given directory, for debugging. + +6. XMLRPC (coming soon) + + http://localhost:8011/xmlrpc + + This resource provides an XMLRPC server on which all of the previous + operations can be expressed as function calls taking a "pathname" argument. + This is provided for applications that want to think of everything in terms + of XMLRPC. + + listdir(vdrivename, path) -> dict of (childname -> (stuff)) + put(vdrivename, path, contents) -> URI + get(vdrivename, path) -> contents + mkdir(vdrivename, path) -> URI + put_localfile(vdrivename, path, localfilename) -> URI + get_localfile(vdrivename, path, localfilename) + put_localdir(vdrivename, path, localdirname) # recursive + get_localdir(vdrivename, path, localdirname) # recursive + put_uri(vdrivename, path, URI) + + etc.. + +
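Illustration (not part of the patch): the new webapi.txt describes two URL namespaces ("vdrive/" by name, "uri/" by URI) and a "GET $URL?t=json" response of the form [ nodetype, info ]. The following is a minimal Python sketch of how a client might build those URLs and read the children of a directory by URI rather than by name, as section 3 recommends. It rests on several assumptions: the node listens on http://localhost:8011 as in the examples, the response body is standard JSON (the document writes it with single quotes), the URI can be placed into the path verbatim (the document does not say how it must be escaped), and the helper names vdrive_url, uri_url, parse_node and child_uris are hypothetical, not part of tahoe.

```python
import json
from urllib.parse import quote

# Assumed node address, taken from the examples in webapi.txt.
BASE = "http://localhost:8011"


def vdrive_url(vdrive, *names):
    """Build a name-based URL such as .../vdrive/private/Pictures."""
    segments = [quote(s, safe="") for s in (vdrive, *names)]
    return BASE + "/vdrive/" + "/".join(segments)


def uri_url(uri, *names):
    """Build a URI-based URL such as .../uri/$URI/Pictures/tractors.jpg.

    Assumption: the URI is inserted as-is; only child names are quoted.
    """
    segments = [uri] + [quote(s, safe="") for s in names]
    return BASE + "/uri/" + "/".join(segments)


def parse_node(body):
    """Split a t=json response body into (nodetype, info)."""
    kind, info = json.loads(body)
    if kind not in ("filenode", "dirnode"):
        raise ValueError("unexpected node type: %r" % (kind,))
    return kind, info


def child_uris(dir_info):
    """Map each child name to (nodetype, URI).

    rw_uri is preferred when present, since the document says it appears
    only when the caller has read-write access to that directory.
    """
    result = {}
    for name, (kind, info) in dir_info.get("children", {}).items():
        result[name] = (kind, info.get("rw_uri", info.get("ro_uri")))
    return result
```

A recursive walker would call child_uris() once per directory and then fetch each child through uri_url(child_uri) instead of through its name, so that a name being re-pointed between the listing and the fetch cannot change which object is retrieved.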