From: Zooko O'Whielacronx
Date: Fri, 10 Aug 2007 19:04:30 +0000 (-0700)
Subject: webapi.txt: further refactoring and add a section explaining TOCTTOU bugs and how...
X-Git-Url: https://git.rkrishnan.org/components/%22news.html/architecture.txt?a=commitdiff_plain;h=887240e7a35ba522c812aca7e13ff51e57e9fecd;p=tahoe-lafs%2Ftahoe-lafs.git

webapi.txt: further refactoring and add a section explaining TOCTTOU bugs and how to avoid them by using URIs
---

diff --git a/docs/webapi.txt b/docs/webapi.txt
index 538287a0..eb70cc13 100644
--- a/docs/webapi.txt
+++ b/docs/webapi.txt
@@ -48,12 +48,14 @@ Now, what can we do with these URLs? By varying the HTTP "method"
 (GET/PUT/POST/DELETE) and by appending a type-indicating query argument, we
 control how what we want to do with the data and how it should be presented.
 
-In the following examples "$URL" is a shorthand for a URL like the ones
-described above. "$NEWURL" is a shorthand for a URL pointing to a location
-in the vdrive where currently nothing exists.
+=== files and directories by name ===
 
 
-=== files or directories ===
+In the following examples "$URL" is a shorthand for a URL like the ones
+described above, with "vdrive/" as the top level, followed by a
+slash-separated sequence of file or directory names. "$NEWURL" is a
+shorthand for a URL pointing to a location in the vdrive where currently
+nothing exists.
 
   GET $URL
 
@@ -70,6 +72,10 @@ in the vdrive where currently nothing exists.
    for a human. It also contains forms to upload new files, and to delete
    files and directories. These forms use POST methods to do their job.
 
+   You can add the "save=true" argument, which adds a 'Content-Disposition:
+   attachment' header to prompt most web browsers to save the file to disk
+   rather than attempting to display it.
+
   GET $URL?t=json
 
    This returns machine-parseable information about the named file or
@@ -99,6 +105,9 @@ in the vdrive where currently nothing exists.
    corresponding FILEURL or DIRURL (except that dirnodes do not recurse -- the
    "children" entry of the child is omitted).
 
+   Before writing code that uses these results, please see the important note
+   below about TOCTTOU bugs.
+
   DELETE $URL
 
    This deletes the given file or directory from the vdrive. If it is a
@@ -150,8 +159,7 @@ in the vdrive where currently nothing exists.
 
    curl -T /dev/null 'http://localhost:8011/vdrive/global/newdir?t=upload&localdir=/home/user/directory-to-upload'
 
-
-=== just for files ===
+=== files by name ===
 
   GET $URL?t=file
 
@@ -173,7 +181,7 @@ in the vdrive where currently nothing exists.
 
    To use this, run 'curl -T localfile http://localhost:8011/vdrive/global/newfile'
 
-=== just for directories ===
+=== directories by name ===
 
   GET $URL?t=manifest
 
@@ -194,6 +202,115 @@ in the vdrive where currently nothing exists.
    rename $CHILDNAME, requesting the new name, and submitting POST rename.
 
 
+== URIs ==
+
+A separate top-level resource namespace ("uri" instead of "vdrive") is used
+to get access to files and dirnodes that are indexed directly by URI, rather
+than by going through the vdrive. The resource thus referenced is used the
+same way as if it were accessed through the vdrive, (including accessing a
+directory's children with "$URI/childname").
+
+For example, this identifies a file or directory:
+
+http://localhost:8011/uri/$URI
+
+And this identifies a file or directory in a subdirectory of the identified
+directory:
+
+http://localhost:8011/uri/$URI/subdir/foo
+
+In the following examples, "$URI_URL" is a shorthand for a URL like the one
+above, with "uri/" as the top level, followed by a URI.
+
+Note that since tahoe URIs may contain slashes (in particular, dirnode URIs
+contain a FURL, which resembles a regular HTTP URL and starts with pb://),
+when URIs are used in this form, they must be specially quoted. All slashes
+in the URI must be replaced by '!' characters. XXX consider changing the
+allmydata.org uri format to relieve the user of this requirement.
+
+  GET $URI_URL
+
+   This behaves the same way a "GET $URL", described in the "files and
+   directories" section above, but which file or directory you get does not
+   depend on the contents of parent directories as it does with the name-based
+   URLs, since a URI uniquely identifies an object regardless of its location.
+
+   If the URI identifies a file, then this retrieves the contents of the
+   file. Since files accessed this way do not have a filename (from which a
+   MIME-type can be derived), one can be specified using a 'filename=' query
+   argument. This filename is also the one used if the 'save=true' argument is
+   set.
+
+  PUT $URL?t=uri
+
+   This attaches a child (either a file or a directory) to the vdrive at the
+   given location. The URI is provided in the body of the HTTP request. This
+   can be used to attach a shared directory to the vdrive. Intermediate
+   directories are created on-demand just like with the regular PUT command.
+
+  GET http://localhost:8011/uri?uri=$URI
+
+   This causes a redirect to /uri/$URI, and retains any additional query
+   arguments (like filename= or save=). This is for the convenience of web
+   forms which allow the user to paste in a URI (obtained through some
+   out-of-band channel, like IM or email).
+
+   Note that this form only redirects to the specific node indicated by the
+   URI: unlike the GET /uri/$URI form, you cannot traverse to child nodes by
+   appending additional path segments to the URL.
+
+   The $URI provided as a query argument is allowed to contain slashes. The
+   redirection provided will escape the slashes with exclamation points, as
+   described above.
+
+
+== TOCTTOU bugs ==
+
+Note that since directories are mutable you can get surprises if you query
+the vdrive, e.g. "GET $URL?t=json", examine the resulting JSON-encoded
+information, and then fetch from or update the vdrive using a name-based URL.
+This is because the actual state of the vdrive could have changed after you
+did the "GET $URL?t=json" query and before you did the subsequent fetch or
+update.
+
+For example, suppose you query to find out that "vdrive/private/somedir/foo"
+is a file which has a certain number of bytes, and then you issue a "GET
+vdrive/private/somedir/foo" to fetch the file. The file that you get might
+have a different number of bytes than the one that you chose to fetch,
+because the "foo" entry in the "somedir" directory may have been changed to
+point to a different file between your query and your fetch, or because the
+"somedir" entry in the private vdrive might have been changed to point to a
+different directory.
+
+Potentially more damaging, suppose that the "foo" entry was changed to point
+to a directory instead of a file. Then instead of receiving the expected
+file, you receive a file containing an HTML page describing the directory
+contents!
+
+These are examples of TOCTTOU bugs ( http://en.wikipedia.org/wiki/TOCTTOU ).
+
+A good way to avoid these bugs is to issue your second request, not with a
+URL based on the sequence of names that lead to the object, but instead with
+the URI of the object. For example, in the case that you query a directory
+listing (with "GET vdrive/private/somedir?t=json"), find a file named "foo"
+therein that you want to download, and then download the file, if you
+download it with its URI ("GET uri/$URI") instead of its URL ("GET
+vdrive/private/somedir/foo") then you will get the file that was in the
+"somedir/" directory under the name "foo" when you queried that directory,
+even if the "somedir/" directory has since been changed so that its "foo"
+child now points to a different file or to a directory.
+
+In general, use names if you want "whatever object (whether file or
+directory) is found by following this sequence of names when my request
+reaches the server". Use URIs if you want "this particular object".
+
+If you are basing your decision to fetch from or update the vdrive on
+filesystem information that was returned by an earlier query, then you
+usually intend to fetch or update the particular object that was in that
+location when you queried it, rather than whatever object is going to be in
+that location when your request reaches the server.
+
+
 == POST forms ==
 
   POST $URL
@@ -242,64 +359,6 @@ in the vdrive where currently nothing exists.
    present under 'to_name', akin to 'mv -f' in unix parlance.
 
 
-== URIs ==
-
-  http://localhost:8011/uri/$URI
-
-   A separate top-level resource namespace ("uri" instead of "vdrive") is used
-   to get access to files and dirnodes that are indexed directly by URI,
-   rather than by going through the vdrive. The resource thus referenced is
-   used the same way as if it were accessed through the vdrive, including
-   child-resource-traversal behavior. For example, if the URI corresponds to a
-   file, then
-
-    GET http://localhost:8011/uri/$URI
-
-   would retrieve the contents of the file. Since files accessed this way do
-   not have a naturally-occurring filename (from which a MIME-type can be
-   derived), one can be specified using a 'filename=' query argument. This
-   filename is also the one used if the 'save=true' argument is set, which
-   adds a 'Content-Disposition: attachment' header to prompt most web browsers
-   to save the file to disk rather than attempting to display it:
-
-    GET http://localhost:8011/uri/$URI?filename=foo.jpg
-    GET http://localhost:8011/uri/$URI?filename=foo.jpg&save=true
-
-   If the URI corresponds to a directory, then:
-
-    PUT http://localhost:8011/uri/$URI/subdir/newfile?localfile=$FILENAME
-
-   would upload a file (with contents taken from the local filesystem) to a
-   new file in a subdirectory of the referenced dirnode.
-
-   Note that since tahoe URIs may contain slashes (in particular, dirnode URIs
-   contain a FURL, which resembles a regular HTTP URL and starts with pb://),
-   when URIs are used in this form, they must be specially quoted. All slashes
-   in the URI must be replaced by '!' characters.
-
-  PUT $URL?t=uri
-
-   This attaches a child (either a file or a directory) to the vdrive at the
-   given location. The URI is provided in the body of the HTTP request. This
-   can be used to attach a shared directory to the vdrive. Intermediate
-   directories are created on-demand just like with the regular PUT command.
-
-  GET http://localhost:8011/uri?uri=$URI
-
-   This causes a redirect to /uri/$URI, and retains any additional query
-   arguments (like filename= or save=). This is for the convenience of web
-   forms which allow the user to paste in a URI (obtained through some
-   out-of-band channel, like IM or email).
-
-   Note that this form only redirects to the specific node indicated by the
-   URI: unlike the GET /uri/$URI form, you cannot traverse to child nodes by
-   appending additional path segments to the URL.
-
-   The $URI provided as a query argument is allowed to contain slashes. The
-   redirection provided will escape the slashes with exclamation points, as
-   described above.
-
-
 == XMLRPC ==
 
   http://localhost:8011/xmlrpc
@@ -320,3 +379,4 @@ in the vdrive where currently nothing exists.
   put_uri(vdrivename, path, URI)
 
   etc..
+
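
Note on the patch (illustration only, not part of the commit): the new
"URIs" and "TOCTTOU bugs" sections say that a client should make its
follow-up request against "uri/$URI" rather than a name-based URL, and that
slashes in the URI must be replaced by '!' in that form. A minimal Python
sketch of building such a URL is given below. The helper names
(quote_tahoe_uri, uri_url) are hypothetical and not part of tahoe, the URI
strings in any call are placeholders, and the sketch only performs the
slash replacement described in the patch; any further percent-encoding a
real URI might need is not addressed here.

```python
from urllib.parse import quote, urlencode

# Node address used in the webapi.txt examples; adjust for a real node.
BASE_URL = "http://localhost:8011"


def quote_tahoe_uri(uri):
    """Replace every slash with '!', as webapi.txt requires under /uri/."""
    return uri.replace("/", "!")


def uri_url(uri, children=(), filename=None, save=False, base=BASE_URL):
    """Build a URL that names an object by URI instead of by path.

    children: optional child names to traverse below a directory URI.
    filename, save: the 'filename=' and 'save=true' query arguments.
    (Hypothetical helper for illustration; not a tahoe API.)
    """
    url = base + "/uri/" + quote_tahoe_uri(uri)
    for name in children:
        # Child names are ordinary path segments, so percent-encode them.
        url += "/" + quote(name, safe="")
    query = {}
    if filename is not None:
        query["filename"] = filename
    if save:
        query["save"] = "true"
    if query:
        url += "?" + urlencode(query)
    return url
```

A client that has just read a directory listing would pass the URI it found
for "foo" to uri_url() and fetch the result, instead of re-requesting
"vdrive/private/somedir/foo"; that way the second request refers to the
object seen in the listing even if the directory has changed in between.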