--- /dev/null
+======================
+The Tahoe CLI commands
+======================
+
+1. `Overview`_
+2. `CLI Command Overview`_
+3. `Node Management`_
+4. `Filesystem Manipulation`_
+
+ 1. `Starting Directories`_
+ 2. `Command Syntax Summary`_
+ 3. `Command Examples`_
+
+5. `Storage Grid Maintenance`_
+6. `Debugging`_
+
+
+Overview
+========
+
+Tahoe provides a single executable named "``tahoe``", which can be used to
+create and manage client/server nodes, manipulate the filesystem, and perform
+several debugging/maintenance tasks.
+
+This executable lives in the source tree at "``bin/tahoe``". Once you've done a
+build (by running "make"), ``bin/tahoe`` can be run in-place: if it discovers
+that it is being run from within a Tahoe source tree, it will modify sys.path
+as necessary to use all the source code and dependent libraries contained in
+that tree.
+
+If you've installed Tahoe (using "``make install``", or by installing a binary
+package), then the tahoe executable will be available somewhere else, perhaps
+in ``/usr/bin/tahoe``. In this case, it will use your platform's normal
+PYTHONPATH search paths to find the tahoe code and other libraries.
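+
+For example, either copy can be asked to report its version; the paths shown
+here are illustrative::
+
+  % bin/tahoe --version       # run in-place from a source tree
+  % /usr/bin/tahoe --version  # run an installed copy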
+
+
+CLI Command Overview
+====================
+
+The "``tahoe``" tool provides access to three categories of commands.
+
+* node management: create a client/server node, start/stop/restart it
+* filesystem manipulation: list files, upload, download, delete, rename
+* debugging: unpack cap-strings, examine share files
+
+To get a list of all commands, just run "``tahoe``" with no additional
+arguments. "``tahoe --help``" might also provide something useful.
+
+Running "``tahoe --version``" will display a list of version strings, starting
+with the "allmydata" module (which contains the majority of the Tahoe
+functionality) and including versions for a number of dependent libraries,
+like Twisted, Foolscap, pycryptopp, and zfec.
+
+
+Node Management
+===============
+
+"``tahoe create-node [NODEDIR]``" is the basic make-a-new-node command. It
+creates a new directory and populates it with files that will allow the
+"``tahoe start``" command to use it later on. This command creates nodes that
+have client functionality (upload/download files), web API services
+(controlled by the 'webport' file), and storage services (unless
+"--no-storage" is specified).
+
+NODEDIR defaults to ~/.tahoe/ , and newly-created nodes default to
+publishing a web server on port 3456 (limited to the loopback interface, at
+127.0.0.1, to restrict access to other programs on the same host). All of the
+other "``tahoe``" subcommands use corresponding defaults.
+
+"``tahoe create-client [NODEDIR]``" creates a node with no storage service.
+That is, it behaves like "``tahoe create-node --no-storage [NODEDIR]``".
+(This is a change from versions prior to 1.6.0.)
+
+"``tahoe create-introducer [NODEDIR]``" is used to create the Introducer node.
+This node provides introduction services and nothing else. When started, this
+node will produce an introducer.furl, which should be published to all
+clients.
+
+"``tahoe create-key-generator [NODEDIR]``" is used to create a special
+"key-generation" service, which allows a client to offload their RSA key
+generation to a separate process. Since RSA key generation takes several
+seconds, and must be done each time a directory is created, moving it to a
+separate process allows the first process (perhaps a busy wapi server) to
+continue servicing other requests. The key generator exports a FURL that can
+be copied into a node to enable this functionality.
+
+"``tahoe run [NODEDIR]``" will start a previously-created node in the foreground.
+
+"``tahoe start [NODEDIR]``" will launch a previously-created node. It will launch
+the node into the background, using the standard Twisted "twistd"
+daemon-launching tool. On some platforms (including Windows) this command is
+unable to run a daemon in the background; in that case it behaves in the
+same way as "``tahoe run``".
+
+"``tahoe stop [NODEDIR]``" will shut down a running node.
+
+"``tahoe restart [NODEDIR]``" will stop and then restart a running node. This is
+most often used by developers who have just modified the code and want to
+start using their changes.
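+
+A minimal sketch of a typical node lifecycle, using the default NODEDIR and
+assuming your grid's introducer.furl has been installed into the new node's
+configuration, might look like this::
+
+  % tahoe create-client
+  % tahoe start
+  % tahoe restart    # e.g. after changing the node's configuration
+  % tahoe stop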
+
+
+Filesystem Manipulation
+=======================
+
+These commands let you examine a Tahoe filesystem, providing basic
+list/upload/download/delete/rename/mkdir functionality. They can be used as
+primitives by other scripts. Most of these commands are fairly thin wrappers
+around wapi calls.
+
+By default, all filesystem-manipulation commands look in ~/.tahoe/ to figure
+out which Tahoe node they should use. When the CLI command uses wapi calls,
+it will use ~/.tahoe/node.url for this purpose: a running Tahoe node that
+provides a wapi port will write its URL into this file. If you want to use
+a node on some other host, just create ~/.tahoe/ and copy that node's wapi
+URL into this file, and the CLI commands will contact that node instead of a
+local one.
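+
+For example, a minimal sketch of pointing the CLI at a node running on
+another host (the hostname is illustrative; 3456 is the default web port)::
+
+  % mkdir -p ~/.tahoe
+  % echo http://other-host.example.com:3456/ > ~/.tahoe/node.url
+  % tahoe ls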
+
+These commands also use a table of "aliases" to figure out which directory
+they ought to use as a starting point. This is explained in more detail below.
+
+As of Tahoe v1.7, passing non-ASCII characters to the CLI should work,
+except on Windows. The command-line arguments are assumed to use the
+character encoding specified by the current locale.
+
+Starting Directories
+--------------------
+
+As described in architecture.txt, the Tahoe distributed filesystem consists
+of a collection of directories and files, each of which has a "read-cap" or a
+"write-cap" (also known as a URI). Each directory is simply a table that maps
+a name to a child file or directory, and this table is turned into a string
+and stored in a mutable file. The whole set of directory and file "nodes" are
+connected together into a directed graph.
+
+To use this collection of files and directories, you need to choose a
+starting point: some specific directory that we will refer to as a
+"starting directory". For a given starting directory, the "``ls
+[STARTING_DIR]:``" command would list the contents of this directory,
+the "``ls [STARTING_DIR]:dir1``" command would look inside this directory
+for a child named "dir1" and list its contents, "``ls
+[STARTING_DIR]:dir1/subdir2``" would look two levels deep, etc.
+
+Note that there is no real global "root" directory, but instead each
+starting directory provides a different, possibly overlapping
+perspective on the graph of files and directories.
+
+Each tahoe node remembers a list of starting points, named "aliases",
+in a file named ~/.tahoe/private/aliases . These aliases are short UTF-8
+encoded strings that stand in for a directory read- or write- cap. If
+you use the command line "``ls``" without any "[STARTING_DIR]:" argument,
+then it will use the default alias, which is "tahoe", so "``tahoe
+ls``" has the same effect as "``tahoe ls tahoe:``". The same goes for the
+other commands that can reasonably use a default alias: get, put,
+mkdir, mv, and rm.
+
+For backwards compatibility with Tahoe-1.0, if the "tahoe:" alias is not
+found in ~/.tahoe/private/aliases, the CLI will use the contents of
+~/.tahoe/private/root_dir.cap instead. Tahoe-1.0 had only a single starting
+point, and stored it in this root_dir.cap file, so Tahoe-1.1 will use it if
+necessary. However, once you've set a "tahoe:" alias with "``tahoe add-alias``",
+that will override anything in the old root_dir.cap file.
+
+The Tahoe CLI commands use the same filename syntax as scp and rsync
+-- an optional "alias:" prefix, followed by the pathname or filename.
+Some commands (like "tahoe cp") use the lack of an alias to mean that
+you want to refer to a local file, instead of something from the tahoe
+virtual filesystem. [TODO] Another way to indicate this is to start
+the pathname with a dot, slash, or tilde.
+
+When you're dealing with a single starting directory, the "tahoe:" alias is
+all you need. But when you want to refer to something that isn't yet
+attached to the graph rooted at that starting directory, you need to
+refer to it by its capability. The way to do that is either to use its
+capability directly as an argument on the command line, or to add an
+alias to it, with the "tahoe add-alias" command. Once you've added an
+alias, you can use that alias as an argument to commands.
+
+The best way to get started with Tahoe is to create a node, start it, then
+use the following command to create a new directory and set it as your
+"tahoe:" alias::
+
+ tahoe create-alias tahoe
+
+After that you can use "``tahoe ls tahoe:``" and
+"``tahoe cp local.txt tahoe:``", and both will refer to the directory that
+you've just created.
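+
+A minimal first session might therefore look like this (the local filename
+is illustrative)::
+
+  % tahoe create-alias tahoe
+  % tahoe cp local.txt tahoe:
+  % tahoe ls tahoe: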
+
+SECURITY NOTE: For users of shared systems
+``````````````````````````````````````````
+
+Another way to achieve the same effect as the above "tahoe create-alias"
+command is::
+
+ tahoe add-alias tahoe `tahoe mkdir`
+
+However, command-line arguments are visible to other users (through the
+'ps' command, or the Windows Process Explorer tool), so if you are using a
+tahoe node on a shared host, your login neighbors will be able to see (and
+capture) any directory caps that you set up with the "``tahoe add-alias``"
+command.
+
+The "``tahoe create-alias``" command avoids this problem by creating a new
+directory and putting the cap into your aliases file for you. Alternatively,
+you can edit the NODEDIR/private/aliases file directly, by adding a line like
+this::
+
+ fun: URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa
+
+Entering the dircap through an editor bypasses the command line entirely, so
+other users will not be able to see it. Once you've added the
+alias, no other secrets are passed through the command line, so this
+vulnerability becomes less significant: they can still see your filenames and
+other arguments you type there, but not the caps that Tahoe uses to permit
+access to your files and directories.
+
+
+Command Syntax Summary
+----------------------
+
+tahoe add-alias alias cap
+
+tahoe create-alias alias
+
+tahoe list-aliases
+
+tahoe mkdir
+
+tahoe mkdir [alias:]path
+
+tahoe ls [alias:][path]
+
+tahoe webopen [alias:][path]
+
+tahoe put [--mutable] [localfrom:-]
+
+tahoe put [--mutable] [localfrom:-] [alias:]to
+
+tahoe put [--mutable] [localfrom:-] [alias:]subdir/to
+
+tahoe put [--mutable] [localfrom:-] dircap:to
+
+tahoe put [--mutable] [localfrom:-] dircap:./subdir/to
+
+tahoe put [localfrom:-] mutable-file-writecap
+
+tahoe get [alias:]from [localto:-]
+
+tahoe cp [-r] [alias:]frompath [alias:]topath
+
+tahoe rm [alias:]what
+
+tahoe mv [alias:]from [alias:]to
+
+tahoe ln [alias:]from [alias:]to
+
+tahoe backup localfrom [alias:]to
+
+Command Examples
+----------------
+
+``tahoe mkdir``
+
+ This creates a new empty unlinked directory, and prints its write-cap to
+ stdout. The new directory is not attached to anything else.
+
+``tahoe add-alias fun DIRCAP``
+
+ An example would be::
+
+ tahoe add-alias fun URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa
+
+ This creates an alias "fun:" and configures it to use the given directory
+ cap. Once this is done, "tahoe ls fun:" will list the contents of this
+ directory. Use "tahoe add-alias tahoe DIRCAP" to set the contents of the
+ default "tahoe:" alias.
+
+``tahoe create-alias fun``
+
+ This combines "``tahoe mkdir``" and "``tahoe add-alias``" into a single step.
+
+``tahoe list-aliases``
+
+ This displays a table of all configured aliases.
+
+``tahoe mkdir subdir``
+
+``tahoe mkdir /subdir``
+
+ These both create a new empty directory and attach it to your root with the
+ name "subdir".
+
+``tahoe ls``
+
+``tahoe ls /``
+
+``tahoe ls tahoe:``
+
+``tahoe ls tahoe:/``
+
+ All four list the root directory of your personal virtual filesystem.
+
+``tahoe ls subdir``
+
+ This lists a subdirectory of your filesystem.
+
+``tahoe webopen``
+
+``tahoe webopen tahoe:``
+
+``tahoe webopen tahoe:subdir/``
+
+``tahoe webopen subdir/``
+
+ These use the python 'webbrowser' module to cause a local web browser to
+ open the web page for the given directory. This page offers interfaces to
+ add, download, rename, and delete files in the directory. If no alias or
+ path is given, it opens "tahoe:", the root directory of the default alias.
+
+``tahoe put file.txt``
+
+``tahoe put ./file.txt``
+
+``tahoe put /tmp/file.txt``
+
+``tahoe put ~/file.txt``
+
+ These upload the local file into the grid, and print the new read-cap to
+ stdout. The uploaded file is not attached to any directory. All one-argument
+ forms of "``tahoe put``" perform an unlinked upload.
+
+``tahoe put -``
+
+``tahoe put``
+
+ These also perform an unlinked upload, but the data to be uploaded is taken
+ from stdin.
+
+``tahoe put file.txt uploaded.txt``
+
+``tahoe put file.txt tahoe:uploaded.txt``
+
+ These upload the local file and add it to your root with the name
+ "uploaded.txt"
+
+``tahoe put file.txt subdir/foo.txt``
+
+``tahoe put - subdir/foo.txt``
+
+``tahoe put file.txt tahoe:subdir/foo.txt``
+
+``tahoe put file.txt DIRCAP:./foo.txt``
+
+``tahoe put file.txt DIRCAP:./subdir/foo.txt``
+
+ These upload the named file and attach it to a subdirectory of the given
+ root directory, under the name "foo.txt". Note that to use a directory
+ write-cap instead of an alias, you must use ":./" as a separator, rather
+ than ":", to help the CLI parser figure out where the dircap ends. When the
+ source file is named "-", the contents are taken from stdin.
+
+``tahoe put file.txt --mutable``
+
+ Create a new mutable file, fill it with the contents of file.txt, and print
+ the new write-cap to stdout.
+
+``tahoe put file.txt MUTABLE-FILE-WRITECAP``
+
+ Replace the contents of the given mutable file with the contents of file.txt,
+ and print the same write-cap to stdout.
+
+``tahoe cp file.txt tahoe:uploaded.txt``
+
+``tahoe cp file.txt tahoe:``
+
+``tahoe cp file.txt tahoe:/``
+
+``tahoe cp ./file.txt tahoe:``
+
+ These upload the local file and add it to your root with the name
+ "uploaded.txt".
+
+``tahoe cp tahoe:uploaded.txt downloaded.txt``
+
+``tahoe cp tahoe:uploaded.txt ./downloaded.txt``
+
+``tahoe cp tahoe:uploaded.txt /tmp/downloaded.txt``
+
+``tahoe cp tahoe:uploaded.txt ~/downloaded.txt``
+
+ These download the named file from your tahoe root, and put the result on
+ your local filesystem.
+
+``tahoe cp tahoe:uploaded.txt fun:stuff.txt``
+
+ This copies a file from your tahoe root to a different virtual directory,
+ set up earlier with "tahoe add-alias fun DIRCAP".
+
+``tahoe rm uploaded.txt``
+
+``tahoe rm tahoe:uploaded.txt``
+
+ These delete a file from your tahoe root.
+
+``tahoe mv uploaded.txt renamed.txt``
+
+``tahoe mv tahoe:uploaded.txt tahoe:renamed.txt``
+
+ These rename a file within your tahoe root directory.
+
+``tahoe mv uploaded.txt fun:``
+
+``tahoe mv tahoe:uploaded.txt fun:``
+
+``tahoe mv tahoe:uploaded.txt fun:uploaded.txt``
+
+ These move a file from your tahoe root directory to the virtual directory
+ set up earlier with "tahoe add-alias fun DIRCAP".
+
+``tahoe backup ~ work:backups``
+
+ This command performs a full versioned backup of every file and directory
+ underneath your "~" home directory, placing an immutable timestamped
+ snapshot in e.g. work:backups/Archives/2009-02-06_04:00:05Z/ (note that the
+ timestamp is in UTC, hence the "Z" suffix), and a link to the latest
+ snapshot in work:backups/Latest/ . This command uses a small SQLite database
+ known as the "backupdb", stored in ~/.tahoe/private/backupdb.sqlite, to
+ remember which local files have already been backed up, and avoids
+ re-uploading them; it compares timestamps and filesizes to decide whether a
+ file has changed. It also re-uses existing directories
+ which have identical contents. This lets it run faster and reduces the
+ number of directories created.
+
+ If you reconfigure your client node to switch to a different grid, you
+ should delete the stale backupdb.sqlite file, to force "tahoe backup" to
+ upload all files to the new grid.
+
+``tahoe backup --exclude=*~ ~ work:backups``
+
+ Same as above, but the backup process will ignore any filename that ends
+ with '~'. '--exclude' accepts standard Unix shell-style wildcards; see
+ http://docs.python.org/library/fnmatch.html for a more detailed
+ reference. You may give multiple '--exclude' options. Note that each
+ pattern is matched against every level of the directory tree, so it is
+ not possible to specify absolute-path exclusions.
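+
+ For example, to skip both editor backup files and compiled Python files
+ (the patterns are chosen for illustration)::
+
+   tahoe backup --exclude=*~ --exclude=*.pyc ~ work:backups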
+
+``tahoe backup --exclude-from=/path/to/filename ~ work:backups``
+
+ '--exclude-from' is similar to '--exclude', but reads exclusion
+ patterns from '/path/to/filename', one per line.
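+
+ For example (the file contents shown are illustrative)::
+
+   % cat ~/backup-excludes
+   *~
+   *.pyc
+   % tahoe backup --exclude-from=$HOME/backup-excludes ~ work:backups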
+
+``tahoe backup --exclude-vcs ~ work:backups``
+
+ This command will ignore any known file or directory that's used by
+ version control systems to store metadata. The excluded names are:
+
+ * CVS
+ * RCS
+ * SCCS
+ * .git
+ * .gitignore
+ * .cvsignore
+ * .svn
+ * .arch-ids
+ * {arch}
+ * =RELEASE-ID
+ * =meta-update
+ * =update
+ * .bzr
+ * .bzrignore
+ * .bzrtags
+ * .hg
+ * .hgignore
+ * _darcs
+
+Storage Grid Maintenance
+========================
+
+``tahoe manifest tahoe:``
+
+``tahoe manifest --storage-index tahoe:``
+
+``tahoe manifest --verify-cap tahoe:``
+
+``tahoe manifest --repair-cap tahoe:``
+
+``tahoe manifest --raw tahoe:``
+
+ This performs a recursive walk of the given directory, visiting every file
+ and directory that can be reached from that point. It then emits one line to
+ stdout for each object it encounters.
+
+ The default behavior is to print the access cap string (like URI:CHK:.. or
+ URI:DIR2:..), followed by a space, followed by the full path name.
+
+ If --storage-index is added, each line will instead contain the object's
+ storage index. This (string) value is useful to determine which share files
+ (on the server) are associated with this directory tree. The --verify-cap
+ and --repair-cap options are similar, but emit a verify-cap and repair-cap,
+ respectively. If --raw is provided instead, the output will be a
+ JSON-encoded dictionary that includes keys for pathnames, storage index
+ strings, and cap strings. The last line of the --raw output will be a JSON
+ encoded deep-stats dictionary.
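+
+ For example, here is a sketch of feeding the storage-index output to the
+ "``tahoe debug find-shares``" command described below, to locate matching
+ share files on a local storage server (the node directory is illustrative)::
+
+   for si in `tahoe manifest --storage-index tahoe:`; do
+       tahoe debug find-shares $si /var/tahoe/storage-node
+   done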
+
+``tahoe stats tahoe:``
+
+ This performs a recursive walk of the given directory, visiting every file
+ and directory that can be reached from that point. It gathers statistics on
+ the sizes of the objects it encounters, and prints a summary to stdout.
+
+
+Debugging
+=========
+
+For a list of all debugging commands, use "tahoe debug".
+
+"``tahoe debug find-shares STORAGEINDEX NODEDIRS..``" will look through one or
+more storage nodes for the share files that are providing storage for the
+given storage index.
+
+"``tahoe debug catalog-shares NODEDIRS..``" will look through one or more
+storage nodes and locate every single share they contain. It produces a report
+on stdout with one line per share, describing what kind of share it is, the
+storage index, the size of the file it is used for, etc. It may be useful to
+concatenate these reports from all storage hosts and use the result to look
+for anomalies.
+
+"``tahoe debug dump-share SHAREFILE``" will take the name of a single share file
+(as found by "tahoe debug find-shares") and print a summary of its contents to
+stdout. This includes a list of leases, summaries of the hash tree, and
+information from the UEB (URI Extension Block). For mutable file shares, it
+will describe which version (seqnum and root-hash) is being stored in this
+share.
+
+"``tahoe debug dump-cap CAP``" will take a URI (a file read-cap, or a directory
+read- or write- cap) and unpack it into separate pieces. The most useful
+aspect of this command is to reveal the storage index for any given URI. This
+can be used to locate the share files that are holding the encoded+encrypted
+data for this file.
+
+"``tahoe debug repl``" will launch an interactive python interpreter in which
+the Tahoe packages and modules are available on sys.path (e.g. by using 'import
+allmydata'). This is most useful from a source tree: it simply sets the
+PYTHONPATH correctly and runs the 'python' executable.
+
+"``tahoe debug corrupt-share SHAREFILE``" will flip a bit in the given
+sharefile. This can be used to test the client-side verification/repair code.
+Obviously, this command should not be used during normal operation.
--- /dev/null
+=================================
+Tahoe-LAFS FTP and SFTP Frontends
+=================================
+
+1. `FTP/SFTP Background`_
+2. `Tahoe-LAFS Support`_
+3. `Creating an Account File`_
+4. `Configuring FTP Access`_
+5. `Configuring SFTP Access`_
+6. `Dependencies`_
+7. `Immutable and mutable files`_
+8. `Known Issues`_
+
+
+FTP/SFTP Background
+===================
+
+FTP is the venerable internet file-transfer protocol, first developed in
+1971. The FTP server usually listens on port 21. A separate connection is
+used for the actual data transfers, either in the same direction as the
+initial client-to-server connection (for PORT mode), or in the reverse
+direction (for PASV mode). Connections are unencrypted, so passwords, file
+names, and file contents are visible to eavesdroppers.
+
+SFTP is the modern replacement, developed as part of the SSH "secure shell"
+protocol, and runs as a subchannel of the regular SSH connection. The SSH
+server usually listens on port 22. All connections are encrypted.
+
+Both FTP and SFTP were developed assuming a UNIX-like server, with accounts
+and passwords, octal file modes (user/group/other, read/write/execute), and
+ctime/mtime timestamps.
+
+Tahoe-LAFS Support
+==================
+
+All Tahoe-LAFS client nodes can run a frontend FTP server, allowing regular FTP
+clients (like /usr/bin/ftp, ncftp, and countless others) to access the
+virtual filesystem. They can also run an SFTP server, so SFTP clients (like
+/usr/bin/sftp, the sshfs FUSE plugin, and others) can too. These frontends
+sit at the same level as the webapi interface.
+
+Since Tahoe-LAFS does not use user accounts or passwords, the FTP/SFTP servers
+must be configured with a way to first authenticate a user (confirm that a
+prospective client has a legitimate claim to whatever authorities we might
+grant a particular user), and second to decide what root directory cap should
+be granted to the authenticated username. A username and password are used
+for this purpose. (The SFTP protocol is also capable of using client
+RSA or DSA public keys, but this is not currently implemented.)
+
+Tahoe-LAFS provides two mechanisms to perform this user-to-rootcap mapping. The
+first is a simple flat file with one account per line. The second is an
+HTTP-based login mechanism, backed by a simple PHP script and a database. The
+latter form is used by allmydata.com to provide secure access to customer
+rootcaps.
+
+Creating an Account File
+========================
+
+To use the first form, create a file (probably in
+BASEDIR/private/ftp.accounts) in which each non-comment/non-blank line is a
+space-separated line of (USERNAME, PASSWORD, ROOTCAP), like so::
+
+ % cat BASEDIR/private/ftp.accounts
+ # This is a password line, (username, password, rootcap)
+ alice password URI:DIR2:ioej8xmzrwilg772gzj4fhdg7a:wtiizszzz2rgmczv4wl6bqvbv33ag4kvbr6prz3u6w3geixa6m6a
+ bob sekrit URI:DIR2:6bdmeitystckbl9yqlw7g56f4e:serp5ioqxnh34mlbmzwvkp3odehsyrr7eytt5f64we3k9hhcrcja
+
+Future versions of Tahoe-LAFS may support using client public keys for SFTP.
+The words "ssh-rsa" and "ssh-dsa" after the username are reserved to specify
+the public key format, so users cannot have a password equal to either of
+these strings.
+
+Now add an 'accounts.file' directive to your tahoe.cfg file, as described
+in the next sections.
+
+Configuring FTP Access
+======================
+
+To enable the FTP server with an accounts file, add the following lines to
+the BASEDIR/tahoe.cfg file::
+
+ [ftpd]
+ enabled = true
+ port = tcp:8021:interface=127.0.0.1
+ accounts.file = private/ftp.accounts
+
+The FTP server will listen on the given port number and on the loopback
+interface only. The "accounts.file" pathname will be interpreted
+relative to the node's BASEDIR.
+
+To enable the FTP server with an account server instead, provide the URL of
+that server in an "accounts.url" directive::
+
+ [ftpd]
+ enabled = true
+ port = tcp:8021:interface=127.0.0.1
+ accounts.url = https://example.com/login
+
+You can provide both accounts.file and accounts.url, although it probably
+isn't very useful except for testing.
+
+FTP provides no security, and so your password or caps could be eavesdropped
+if you connect to the FTP server remotely. The examples above include
+":interface=127.0.0.1" in the "port" option, which causes the server to only
+accept connections from localhost.
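+
+With the configuration above, a standard FTP client on the same host can then
+log in with an account from the accounts file, for example using the "alice"
+account from the earlier illustration::
+
+  % ftp 127.0.0.1 8021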
+
+Configuring SFTP Access
+=======================
+
+The Tahoe-LAFS SFTP server requires a host keypair, just like the regular SSH
+server. It is important to give each server a distinct keypair, to prevent
+one server from masquerading as a different one. The first time a client
+program talks to a given server, it will store the host key it receives, and
+will complain if a subsequent connection uses a different key. This reduces
+the opportunity for man-in-the-middle attacks to just the first connection.
+
+Exercise caution when connecting to the SFTP server remotely. The AES
+implementation used by the SFTP code does not have defenses against timing
+attacks. The code for encrypting the SFTP connection was not written by the
+Tahoe-LAFS team, and we have not reviewed it as carefully as we have reviewed
+the code for encrypting files and directories in Tahoe-LAFS itself. If you
+can connect to the SFTP server (which is provided by the Tahoe-LAFS gateway)
+only from a client on the same host, then you would be safe from any problem
+with the SFTP connection security. The examples given below enforce this
+policy by including ":interface=127.0.0.1" in the "port" option, which
+causes the server to only accept connections from localhost.
+
+You will use directives in the tahoe.cfg file to tell the SFTP code where to
+find these keys. To create one, use the ``ssh-keygen`` tool (which comes with
+the standard openssh client distribution)::
+
+ % cd BASEDIR
+ % ssh-keygen -f private/ssh_host_rsa_key
+
+The server private key file must not have a passphrase.
+
+Then, to enable the SFTP server with an accounts file, add the following
+lines to the BASEDIR/tahoe.cfg file::
+
+ [sftpd]
+ enabled = true
+ port = tcp:8022:interface=127.0.0.1
+ host_pubkey_file = private/ssh_host_rsa_key.pub
+ host_privkey_file = private/ssh_host_rsa_key
+ accounts.file = private/ftp.accounts
+
+The SFTP server will listen on the given port number and on the loopback
+interface only. The "accounts.file" pathname will be interpreted
+relative to the node's BASEDIR.
+
+Or, to use an account server instead, do this::
+
+ [sftpd]
+ enabled = true
+ port = tcp:8022:interface=127.0.0.1
+ host_pubkey_file = private/ssh_host_rsa_key.pub
+ host_privkey_file = private/ssh_host_rsa_key
+ accounts.url = https://example.com/login
+
+You can provide both accounts.file and accounts.url, although it probably
+isn't very useful except for testing.
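+
+With either configuration, an SFTP client on the same host can connect with
+an account from the accounts file, for example::
+
+  % sftp -P 8022 alice@127.0.0.1
+
+or the virtual filesystem can be mounted with sshfs (the mount point is
+illustrative)::
+
+  % mkdir ~/tahoe-mount
+  % sshfs -p 8022 alice@127.0.0.1: ~/tahoe-mount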
+
+For further information on SFTP compatibility and known issues with various
+clients and with the sshfs filesystem, see
+http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend .
+
+Dependencies
+============
+
+The Tahoe-LAFS SFTP server requires the Twisted "Conch" component (a "conch" is
+a twisted shell, get it?). Many Linux distributions package the Conch code
+separately: debian puts it in the "python-twisted-conch" package. Conch
+requires the "pycrypto" package, which is a Python+C implementation of many
+cryptographic functions (the debian package is named "python-crypto").
+
+Note that "pycrypto" is different than the "pycryptopp" package that Tahoe-LAFS
+uses (which is a Python wrapper around the C++ -based Crypto++ library, a
+library that is frequently installed as /usr/lib/libcryptopp.a, to avoid
+problems with non-alphanumerics in filenames).
+
+The FTP server requires code in Twisted that enables asynchronous closing of
+file-upload operations. This code was landed to Twisted's SVN trunk in r28453
+on 23-Feb-2010, slightly too late for the Twisted-10.0 release, but it should
+be present in the next release after that. To use Tahoe-LAFS's FTP server with
+Twisted-10.0 or earlier, you will need to apply the patch attached to
+http://twistedmatrix.com/trac/ticket/3462 . The Tahoe-LAFS node will refuse to
+start the FTP server unless it detects the necessary support code in Twisted.
+This patch is not needed for SFTP.
+
+Immutable and Mutable Files
+===========================
+
+All files created via SFTP (and FTP) are immutable files. However, files
+can only be created in writeable directories, which allows the directory
+entry to be relinked to a different file. Normally, when the path of an
+immutable file is opened for writing by SFTP, the directory entry is
+relinked to another file with the newly written contents when the file
+handle is closed. The old file is still present on the grid, and any other
+caps to it will remain valid. (See docs/garbage-collection.txt for how to
+reclaim the space used by files that are no longer needed.)
+
+The 'no-write' metadata field of a directory entry can override this
+behaviour. If the 'no-write' field holds a true value, then a permission
+error will occur when trying to write to the file, even if it is in a
+writeable directory. This does not prevent the directory entry from being
+unlinked or replaced.
+
+When using sshfs, the 'no-write' field can be set by clearing the 'w'
+bits in the Unix permissions, for example using the command
+'chmod 444 path/to/file'. Note that this does not mean that arbitrary
+combinations of Unix permissions are supported. If the 'w' bits are
+cleared on a link to a mutable file or directory, that link will become
+read-only.
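+
+For example, with the filesystem mounted via sshfs as sketched earlier, and
+an illustrative path::
+
+  % chmod 444 ~/tahoe-mount/path/to/file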
+
+If SFTP is used to write to an existing mutable file, it will publish a
+new version when the file handle is closed.
+
+Known Issues
+============
+
+Mutable files are not supported by the FTP frontend (`ticket #680
+<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/680>`_). Currently, a directory
+containing mutable files cannot even be listed over FTP.
+
+The FTP frontend sometimes fails to report errors, for example if an upload
+fails because it does not meet the "servers of happiness" threshold (`ticket #1081
+<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1081>`_). Upload errors also may not
+be reported when writing files using SFTP via sshfs (`ticket #1059
+<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1059>`_).
+
+Non-ASCII filenames are not supported by FTP (`ticket #682
+<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/682>`_). They can be used
+with SFTP only if the client encodes filenames as UTF-8 (`ticket #1089
+<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1089>`_).
+
+The gateway node may incur a memory leak when accessing many files via SFTP
+(`ticket #1045 <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1045>`_).
+
+For other known issues in SFTP, see
+<http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend>.
-entry to be relinked to a different file. Normally, when the path of an
-immutable file is opened for writing by SFTP, the directory entry is
-relinked to another file with the newly written contents when the file
-handle is closed. The old file is still present on the grid, and any other
-caps to it will remain valid. (See docs/garbage-collection.txt for how to
-reclaim the space used by files that are no longer needed.)
-
-The 'no-write' metadata field of a directory entry can override this
-behaviour. If the 'no-write' field holds a true value, then a permission
-error will occur when trying to write to the file, even if it is in a
-writeable directory. This does not prevent the directory entry from being
-unlinked or replaced.
-
-When using sshfs, the 'no-write' field can be set by clearing the 'w'
-bits in the Unix permissions, for example using the command
-'chmod 444 path/to/file'. Note that this does not mean that arbitrary
-combinations of Unix permissions are supported. If the 'w' bits are
-cleared on a link to a mutable file or directory, that link will become
-read-only.
-
-If SFTP is used to write to an existing mutable file, it will publish a
-new version when the file handle is closed.
-
-Known Issues
-============
-
-Mutable files are not supported by the FTP frontend (`ticket #680
-<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/680>`_). Currently, a directory
-containing mutable files cannot even be listed over FTP.
-
-The FTP frontend sometimes fails to report errors, for example if an upload
-fails because it does not meet the "servers of happiness" threshold (`ticket #1081
-<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1081>`_). Upload errors also may not
-be reported when writing files using SFTP via sshfs (`ticket #1059
-<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1059>`_).
-
-Non-ASCII filenames are not supported by FTP (`ticket #682
-<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/682>`_). They can be used
-with SFTP only if the client encodes filenames as UTF-8 (`ticket #1089
-<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1089>`_).
-
-The gateway node may incur a memory leak when accessing many files via SFTP
-(`ticket #1045 <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1045>`_).
-
-For other known issues in SFTP, see
-<http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend>.
--- /dev/null
+===============
+Download status
+===============
+
+
+Introduction
+============
+
+The WUI will display the "status" of uploads and downloads.
+
+The Welcome Page has a link entitled "Recent Uploads and Downloads"
+which goes to this URL:
+
+http://$GATEWAY/status
+
+Each entry in the list of recent operations has a "status" link which
+will take you to a page describing that operation.
+
+For immutable downloads, the page has a lot of information, and this
+document explains what it all means. It was written by Brian
+Warner, who wrote the v1.8.0 downloader code and the code which
+generates this status report about the v1.8.0 downloader's
+behavior. Brian posted it to the trac:
+http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1169#comment:1
+
+Then Zooko lightly edited it while copying it into the docs/
+directory.
+
+What's involved in a download?
+==============================
+
+Downloads are triggered by read() calls, each with a starting offset (defaults
+to 0) and a length (defaults to the whole file). A regular webapi GET request
+will result in a whole-file read() call.
+
+Each read() call turns into an ordered sequence of get_segment() calls. A
+whole-file read will fetch all segments, in order, but partial reads or
+multiple simultaneous reads will result in random-access of segments. Segment
+reads always return ciphertext: the layer above that (in read()) is responsible
+for decryption.
+
+Before we can satisfy any segment reads, we need to find some shares. ("DYHB"
+is an abbreviation for "Do You Have Block", and is the message we send to
+storage servers to ask them if they have any shares for us. The name is
+historical, from Mojo Nation/Mnet/Mountain View, but nicely distinctive.
+Tahoe-LAFS's actual message name is remote_get_buckets().) Responses come
+back eventually, or don't.
+
+Once we get enough positive DYHB responses, we have enough shares to start
+downloading. We send "block requests" for various pieces of the share.
+Responses come back eventually, or don't.
+
+When we get enough block-request responses for a given segment, we can decode
+the data and satisfy the segment read.
+
+When the segment read completes, some or all of the segment data is used to
+satisfy the read() call (if the read call started or ended in the middle of a
+segment, we'll only use part of the data, otherwise we'll use all of it).
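+
+To make the read()-to-segment mapping concrete, here is a minimal sketch (in
+Python, not taken from the Tahoe-LAFS source; the segment size is just an
+illustrative parameter) of which segments a given read() has to fetch::
+
+ def segments_for_read(offset, length, segment_size):
+     """Return the ordered list of segment numbers a read() must fetch."""
+     if length <= 0:
+         return []
+     first = offset // segment_size
+     last = (offset + length - 1) // segment_size
+     return list(range(first, last + 1))
+
+ # A whole-file read of a 1 MiB file with 128 KiB segments touches all eight
+ # segments; a partial read touches only the segments it overlaps.
+ print(segments_for_read(0, 1024*1024, 128*1024))    # [0, 1, 2, 3, 4, 5, 6, 7]
+ print(segments_for_read(200000, 50000, 128*1024))   # [1]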
+
+Data on the download-status page
+================================
+
+DYHB Requests
+-------------
+
+This shows every Do-You-Have-Block query sent to storage servers and their
+results. Each line shows the following:
+
+* the serverid to which the request was sent
+* the time at which the request was sent. Note that all timestamps are
+ relative to the start of the first read() call and indicated with a "+" sign
+* the time at which the response was received (if ever)
+* the share numbers that the server has, if any
+* the elapsed time taken by the request
+
+Also, each line is colored according to the serverid. This color is also used
+in the "Requests" section below.
+
+Read Events
+-----------
+
+This shows all the FileNode read() calls and their overall results. Each line
+shows:
+
+* the range of the file that was requested (as [OFFSET:+LENGTH]). A whole-file
+ GET will start at 0 and read the entire file.
+* the time at which the read() was made
+* the time at which the request finished, either because the last byte of data
+ was returned to the read() caller, or because they cancelled the read by
+ calling stopProducing (i.e. closing the HTTP connection)
+* the number of bytes returned to the caller so far
+* the time spent on the read, so far
+* the total time spent in AES decryption
+* total time spent paused by the client (pauseProducing), generally because the
+ HTTP connection filled up, which most streaming media players will do to
+ limit how much data they have to buffer
+* effective speed of the read(), not including paused time
+
+Segment Events
+--------------
+
+This shows each get_segment() call and its resolution. This table is not well
+organized, and my post-1.8.0 work will clean it up a lot. In its present form,
+it records "request" and "delivery" events separately, indicated by the "type"
+column.
+
+Each request shows the segment number being requested and the time at which the
+get_segment() call was made.
+
+Each delivery shows:
+
+* segment number
+* range of file data (as [OFFSET:+SIZE]) delivered
+* elapsed time spent doing ZFEC decoding
+* overall elapsed time fetching the segment
+* effective speed of the segment fetch
+
+Requests
+--------
+
+This shows every block-request sent to the storage servers. Each line shows:
+
+* the server to which the request was sent
+* which share number it is referencing
+* the portion of the share data being requested (as [OFFSET:+SIZE])
+* the time the request was sent
+* the time the response was received (if ever)
+* the amount of data that was received (which might be less than SIZE if we
+ tried to read off the end of the share)
+* the elapsed time for the request (RTT=Round-Trip-Time)
+
+Also note that each Request line is colored according to the serverid it was
+sent to. And all timestamps are shown relative to the start of the first
+read() call: for example, the first DYHB message was sent at +0.001393s, about
+1.4 milliseconds after the read() call started everything off.
--- /dev/null
+==========================
+The Tahoe REST-ful Web API
+==========================
+
+1. `Enabling the web-API port`_
+2. `Basic Concepts: GET, PUT, DELETE, POST`_
+3. `URLs`_
+
+ 1. `Child Lookup`_
+
+4. `Slow Operations, Progress, and Cancelling`_
+5. `Programmatic Operations`_
+
+ 1. `Reading a file`_
+ 2. `Writing/Uploading a File`_
+ 3. `Creating a New Directory`_
+ 4. `Get Information About A File Or Directory (as JSON)`_
+ 5. `Attaching an existing File or Directory by its read- or write-cap`_
+ 6. `Adding multiple files or directories to a parent directory at once`_
+ 7. `Deleting a File or Directory`_
+
+6. `Browser Operations: Human-Oriented Interfaces`_
+
+ 1. `Viewing A Directory (as HTML)`_
+ 2. `Viewing/Downloading a File`_
+ 3. `Get Information About A File Or Directory (as HTML)`_
+ 4. `Creating a Directory`_
+ 5. `Uploading a File`_
+ 6. `Attaching An Existing File Or Directory (by URI)`_
+ 7. `Deleting A Child`_
+ 8. `Renaming A Child`_
+ 9. `Other Utilities`_
+ 10. `Debugging and Testing Features`_
+
+7. `Other Useful Pages`_
+8. `Static Files in /public_html`_
+9. `Safety and security issues -- names vs. URIs`_
+10. `Concurrency Issues`_
+
+Enabling the web-API port
+=========================
+
+Every Tahoe node is capable of running a built-in HTTP server. To enable
+this, just write a port number into the "[node]web.port" line of your node's
+tahoe.cfg file. For example, writing "web.port = 3456" into the "[node]"
+section of $NODEDIR/tahoe.cfg will cause the node to run a webserver on port
+3456.
+
+This string is actually a Twisted "strports" specification, meaning you can
+get more control over the interface to which the server binds by supplying
+additional arguments. For more details, see the documentation on
+`twisted.application.strports
+<http://twistedmatrix.com/documents/current/api/twisted.application.strports.html>`_.
+
+Writing "tcp:3456:interface=127.0.0.1" into the web.port line does the same
+but binds to the loopback interface, ensuring that only the programs on the
+local host can connect. Using "ssl:3456:privateKey=mykey.pem:certKey=cert.pem"
+runs an SSL server.
+
+This webport can be set when the node is created by passing a --webport
+option to the 'tahoe create-node' command. By default, the node listens on
+port 3456, on the loopback (127.0.0.1) interface.
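+
+For example, combining the options above, the web-server part of a node's
+tahoe.cfg might look like this (a sketch showing only the web.port line; a
+real file will contain other settings as well)::
+
+ [node]
+ web.port = tcp:3456:interface=127.0.0.1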
+
+Basic Concepts: GET, PUT, DELETE, POST
+======================================
+
+As described in `architecture.rst`_, each file and directory in a Tahoe virtual
+filesystem is referenced by an identifier that combines the designation of
+the object with the authority to do something with it (such as read or modify
+the contents). This identifier is called a "read-cap" or "write-cap",
+depending upon whether it enables read-only or read-write access. These
+"caps" are also referred to as URIs.
+
+.. _architecture.rst: http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/architecture.rst
+
+The Tahoe web-based API is "REST-ful", meaning it implements the concepts of
+"REpresentational State Transfer": the original scheme by which the World
+Wide Web was intended to work. Each object (file or directory) is referenced
+by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and
+DELETE) are used to manipulate these objects. You can think of the URL as a
+noun, and the method as a verb.
+
+In REST, the GET method is used to retrieve information about an object, or
+to retrieve some representation of the object itself. When the object is a
+file, the basic GET method will simply return the contents of that file.
+Other variations (generally implemented by adding query parameters to the
+URL) will return information about the object, such as metadata. GET
+operations are required to have no side-effects.
+
+PUT is used to upload new objects into the filesystem, or to replace an
+existing object. DELETE is used to delete objects from the filesystem. Both
+PUT and DELETE are required to be idempotent: performing the same operation
+multiple times must have the same side-effects as only performing it once.
+
+POST is used for more complicated actions that cannot be expressed as a GET,
+PUT, or DELETE. POST operations can be thought of as a method call: sending
+some message to the object referenced by the URL. In Tahoe, POST is also used
+for operations that must be triggered by an HTML form (including upload and
+delete), because otherwise a regular web browser has no way to accomplish
+these tasks. In general, everything that can be done with a PUT or DELETE can
+also be done with a POST.
+
+Tahoe's web API is designed for two different kinds of consumer. The first is
+a program that needs to manipulate the virtual file system. Such programs are
+expected to use the RESTful interface described above. The second is a human
+using a standard web browser to work with the filesystem. This user is given
+a series of HTML pages with links to download files, and forms that use POST
+actions to upload, rename, and delete files.
+
+When an error occurs, the HTTP response code will be set to an appropriate
+400-series code (like 404 Not Found for an unknown childname, or 400 Bad Request
+when the parameters to a webapi operation are invalid), and the HTTP response
+body will usually contain a few lines of explanation as to the cause of the
+error and possible responses. Unusual exceptions may result in a 500 Internal
+Server Error as a catch-all, with a default response body containing
+a Nevow-generated HTML-ized representation of the Python exception stack trace
+that caused the problem. CLI programs which want to copy the response body to
+stderr should provide an "Accept: text/plain" header to their requests to get
+a plain text stack trace instead. If the Accept header contains ``*/*``, or
+``text/*``, or ``text/html`` (or if there is no Accept header), HTML tracebacks will
+be generated.
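+
+For programmatic clients, the Accept header is worth setting explicitly. The
+following minimal Python sketch (the $FILECAP placeholder stands for a real
+cap) asks for plain-text error bodies and prints whatever explanation the
+gateway returns::
+
+ import urllib.request, urllib.error
+
+ req = urllib.request.Request(
+     "http://127.0.0.1:3456/uri/$FILECAP",      # substitute a real cap here
+     headers={"Accept": "text/plain"})          # ask for plain-text tracebacks
+ try:
+     contents = urllib.request.urlopen(req).read()
+ except urllib.error.HTTPError as e:
+     # 4xx/5xx responses carry a short explanation (or stack trace) in the body
+     print(e.code, e.read().decode("utf-8", "replace"))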
+
+URLs
+====
+
+Tahoe uses a variety of read- and write- caps to identify files and
+directories. The most common of these is the "immutable file read-cap", which
+is used for most uploaded files. These read-caps look like the following::
+
+ URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202
+
+The next most common is a "directory write-cap", which provides both read and
+write access to a directory, and looks like this::
+
+ URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
+
+There are also "directory read-caps", which start with "URI:DIR2-RO:", and
+give read-only access to a directory. Finally there are also mutable file
+read- and write- caps, which start with "URI:SSK", and give access to mutable
+files.
+
+(Later versions of Tahoe will make these strings shorter, and will remove the
+unfortunate colons, which must be escaped when these caps are embedded in
+URLs.)
+
+To refer to any Tahoe object through the web API, you simply need to combine
+a prefix (which indicates the HTTP server to use) with the cap (which
+indicates which object inside that server to access). Since the default Tahoe
+webport is 3456, the most common prefix is one that will use a local node
+listening on this port::
+
+ http://127.0.0.1:3456/uri/ + $CAP
+
+So, to access the directory named above (which happens to be the
+publicly-writeable sample directory on the Tahoe test grid, described at
+http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be::
+
+ http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/
+
+(note that the colons in the directory-cap are url-encoded into "%3A"
+sequences).
+
+Likewise, to access the file named above, use::
+
+ http://127.0.0.1:3456/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202
+
+In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap
+or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap
+that refers to a file (whether mutable or immutable). So those URLs above can
+be abbreviated as::
+
+ http://127.0.0.1:3456/uri/$DIRCAP/
+ http://127.0.0.1:3456/uri/$FILECAP
+
+The operation summaries below will abbreviate these further, by eliding the
+server prefix. They will be displayed like this::
+
+ /uri/$DIRCAP/
+ /uri/$FILECAP
+
+
+Child Lookup
+------------
+
+Tahoe directories contain named child entries, just like directories in a regular
+local filesystem. These child entries, called "dirnodes", consist of a name,
+metadata, a write slot, and a read slot. The write and read slots normally contain
+a write-cap and read-cap referring to the same object, which can be either a file
+or a subdirectory. The write slot may be empty (actually, both may be empty,
+but that is unusual).
+
+If you have a Tahoe URL that refers to a directory, and want to reference a
+named child inside it, just append the child name to the URL. For example, if
+our sample directory contains a file named "welcome.txt", we can refer to
+that file with::
+
+ http://127.0.0.1:3456/uri/$DIRCAP/welcome.txt
+
+(or http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)
+
+Multiple levels of subdirectories can be handled this way::
+
+ http://127.0.0.1:3456/uri/$DIRCAP/tahoe-source/docs/webapi.txt
+
+In this document, when we need to refer to a URL that references a file using
+this child-of-some-directory format, we'll use the following string::
+
+ /uri/$DIRCAP/[SUBDIRS../]FILENAME
+
+The "[SUBDIRS../]" part means that there are zero or more (optional)
+subdirectory names in the middle of the URL. The "FILENAME" at the end means
+that this whole URL refers to a file of some sort, rather than to a
+directory.
+
+When we need to refer specifically to a directory in this way, we'll write::
+
+ /uri/$DIRCAP/[SUBDIRS../]SUBDIR
+
+
+Note that all components of pathnames in URLs are required to be UTF-8
+encoded, so "résumé.doc" (with an acute accent on both E's) would be accessed
+with::
+
+ http://127.0.0.1:3456/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc
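+
+A short Python sketch of building such a URL (reusing the sample directory
+write-cap from above; the filename is hypothetical) looks like this::
+
+ from urllib.parse import quote
+
+ dircap = "URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq"
+ filename = "r\u00e9sum\u00e9.doc"
+
+ # quote() UTF-8-encodes the text and percent-escapes it; the colons in the
+ # cap are escaped too, since the cap is embedded directly in the URL path.
+ url = ("http://127.0.0.1:3456/uri/"
+        + quote(dircap, safe="") + "/"
+        + quote(filename, safe=""))
+ print(url)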
+
+Also note that the filenames inside upload POST forms are interpreted using
+whatever character set was provided in the conventional '_charset' field, and
+defaults to UTF-8 if not otherwise specified. The JSON representation of each
+directory contains native unicode strings. Tahoe directories are specified to
+contain unicode filenames, and cannot contain binary strings that are not
+representable as such.
+
+All Tahoe operations that refer to existing files or directories must include
+a suitable read- or write- cap in the URL: the webapi server won't add one
+for you. If you don't know the cap, you can't access the file. This allows
+the security properties of Tahoe caps to be extended across the webapi
+interface.
+
+Slow Operations, Progress, and Cancelling
+=========================================
+
+Certain operations can be expected to take a long time. The "t=deep-check" operation,
+described below, will recursively visit every file and directory reachable
+from a given starting point, which can take minutes or even hours for
+extremely large directory structures. A single long-running HTTP request is a
+fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient
+with waiting and give up on the connection.
+
+For this reason, long-running operations have an "operation handle", which
+can be used to poll for status/progress messages while the operation
+proceeds. This handle can also be used to cancel the operation. These handles
+are created by the client, and passed in as an "ophandle=" query argument
+to the POST or PUT request which starts the operation. The following
+operations can then be used to retrieve status:
+
+``GET /operations/$HANDLE?output=HTML (with or without t=status)``
+
+``GET /operations/$HANDLE?output=JSON (same)``
+
+ These two retrieve the current status of the given operation. Each operation
+ presents a different sort of information, but in general the page retrieved
+ will indicate:
+
+ * whether the operation is complete, or if it is still running
+ * how much of the operation is complete, and how much is left, if possible
+
+ Note that the final status output can be quite large: a deep-manifest of a
+ directory structure with 300k directories and 200k unique files is about
+ 275MB of JSON, and might take two minutes to generate. For this reason, the
+ full status is not provided until the operation has completed.
+
+ The HTML form will include a meta-refresh tag, which will cause a regular
+ web browser to reload the status page about 60 seconds later. This tag will
+ be removed once the operation has completed.
+
+ There may be more status information available under
+ /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space.
+
+``POST /operations/$HANDLE?t=cancel``
+
+ This terminates the operation, and returns an HTML page explaining what was
+ cancelled. If the operation handle has already expired (see below), this
+ POST will return a 404, which indicates that the operation is no longer
+ running (either it was completed or terminated). The response body will be
+ the same as a GET /operations/$HANDLE on this operation handle, and the
+ handle will be expired immediately afterwards.
+
+The operation handle will eventually expire, to avoid consuming an unbounded
+amount of memory. The handle's time-to-live can be reset at any time, by
+passing a retain-for= argument (with a count of seconds) to either the
+initial POST that starts the operation, or the subsequent GET request which
+asks about the operation. For example, if a 'GET
+/operations/$HANDLE?output=JSON&retain-for=600' query is performed, the
+handle will remain active for 600 seconds (10 minutes) after the GET was
+received.
+
+In addition, if the GET includes a release-after-complete=True argument, and
+the operation has completed, the operation handle will be released
+immediately.
+
+If a retain-for= argument is not used, the default handle lifetimes are:
+
+ * handles will remain valid at least until their operation finishes
+ * uncollected handles for finished operations (i.e. handles for
+ operations that have finished but for which the GET page has not been
+ accessed since completion) will remain valid for four days, or for
+ the total time consumed by the operation, whichever is greater.
+ * collected handles (i.e. the GET page has been retrieved at least once
+ since the operation completed) will remain valid for one day.
+
+Many "slow" operations can begin to use unacceptable amounts of memory when
+operating on large directory structures. The memory usage increases when the
+ophandle is polled, as the results must be copied into a JSON string, sent
+over the wire, then parsed by a client. So, as an alternative, many "slow"
+operations have streaming equivalents. These equivalents do not use operation
+handles. Instead, they emit line-oriented status results immediately. Client
+code can cancel the operation by simply closing the HTTP connection.
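+
+Putting the polling pieces of this section together, a client that started a
+slow operation with a handle of its own choosing might wait for it as in the
+following minimal Python sketch (the boolean "finished" field is an assumption
+made for illustration; the exact JSON keys depend on the operation)::
+
+ import json, time, urllib.request
+
+ BASE = "http://127.0.0.1:3456"
+ HANDLE = "my-deep-check-1"   # chosen by the client and passed as ophandle=
+                              # to the POST/PUT that started the operation
+
+ while True:
+     url = BASE + "/operations/" + HANDLE + "?output=JSON&retain-for=600"
+     status = json.loads(urllib.request.urlopen(url).read().decode("utf-8"))
+     if status.get("finished"):      # assumed key, see note above
+         break
+     time.sleep(10)
+ print(json.dumps(status, indent=2))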
+
+Programmatic Operations
+=======================
+
+Now that we know how to build URLs that refer to files and directories in a
+Tahoe virtual filesystem, what sorts of operations can we do with those URLs?
+This section contains a catalog of GET, PUT, DELETE, and POST operations that
+can be performed on these URLs. This set of operations are aimed at programs
+that use HTTP to communicate with a Tahoe node. A later section describes
+operations that are intended for web browsers.
+
+Reading A File
+--------------
+
+``GET /uri/$FILECAP``
+
+``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME``
+
+ This will retrieve the contents of the given file. The HTTP response body
+ will contain the sequence of bytes that make up the file.
+
+ To view files in a web browser, you may want more control over the
+ Content-Type and Content-Disposition headers. Please see the next section
+ "Browser Operations", for details on how to modify these URLs for that
+ purpose.
+
+Writing/Uploading A File
+------------------------
+
+``PUT /uri/$FILECAP``
+
+``PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME``
+
+ Upload a file, using the data from the HTTP request body, and add whatever
+ child links and subdirectories are necessary to make the file available at
+ the given location. Once this operation succeeds, a GET on the same URL will
+ retrieve the same contents that were just uploaded. This will create any
+ necessary intermediate subdirectories.
+
+ To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
+
+ In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
+ writeable mutable file, that file's contents will be overwritten in-place. If
+ it is a read-cap for a mutable file, an error will occur. If it is an
+ immutable file, the old file will be discarded, and a new one will be put in
+ its place.
+
+ When creating a new file, if "mutable=true" is in the query arguments, the
+ operation will create a mutable file instead of an immutable one.
+
+ This returns the file-cap of the resulting file. If a new file was created
+ by this method, the HTTP response code (as dictated by rfc2616) will be set
+ to 201 CREATED. If an existing file was replaced or modified, the response
+ code will be 200 OK.
+
+ Note that the 'curl -T localfile http://127.0.0.1:3456/uri/$DIRCAP/foo.txt'
+ command can be used to invoke this operation.
+
+``PUT /uri``
+
+ This uploads a file, and produces a file-cap for the contents, but does not
+ attach the file into the filesystem. No directories will be modified by
+ this operation. The file-cap is returned as the body of the HTTP response.
+
+ If "mutable=true" is in the query arguments, the operation will create a
+ mutable file, and return its write-cap in the HTTP response. The default is
+ to create an immutable file, returning the read-cap as a response.
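+
+ Here is a minimal Python sketch of the unlinked-upload round trip (assuming a
+ gateway on the default webport)::
+
+   import urllib.request
+   from urllib.parse import quote
+
+   BASE = "http://127.0.0.1:3456"
+
+   # Upload: PUT the raw bytes to /uri; the response body is the new file-cap.
+   req = urllib.request.Request(BASE + "/uri", data=b"hello, grid\n", method="PUT")
+   filecap = urllib.request.urlopen(req).read().decode("ascii").strip()
+
+   # Download the same contents again through the gateway.
+   back = urllib.request.urlopen(BASE + "/uri/" + quote(filecap, safe="")).read()
+   assert back == b"hello, grid\n"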
+
+Creating A New Directory
+------------------------
+
+``POST /uri?t=mkdir``
+
+``PUT /uri?t=mkdir``
+
+ Create a new empty directory and return its write-cap as the HTTP response
+ body. This does not make the newly created directory visible from the
+ filesystem. The "PUT" operation is provided for backwards compatibility:
+ new code should use POST.
+
+``POST /uri?t=mkdir-with-children``
+
+ Create a new directory, populated with a set of child nodes, and return its
+ write-cap as the HTTP response body. The new directory is not attached to
+ any other directory: the returned write-cap is the only reference to it.
+
+ Initial children are provided as the body of the POST form (this is more
+ efficient than doing separate mkdir and set_children operations). If the
+ body is empty, the new directory will be empty. If not empty, the body will
+ be interpreted as a UTF-8 JSON-encoded dictionary of children with which the
+ new directory should be populated, using the same format as would be
+ returned in the 'children' value of the t=json GET request, described below.
+ Each dictionary key should be a child name, and each value should be a list
+ of [TYPE, PROPDICT], where PROPDICT contains "rw_uri", "ro_uri", and
+ "metadata" keys (all others are ignored). For example, the POST request body
+ could be::
+
+ {
+   "Fran\u00e7ais": [ "filenode", {
+       "ro_uri": "URI:CHK:...",
+       "size": bytes,
+       "metadata": {
+         "ctime": 1202777696.7564139,
+         "mtime": 1202777696.7564139,
+         "tahoe": {
+           "linkcrtime": 1202777696.7564139,
+           "linkmotime": 1202777696.7564139
+         } } } ],
+   "subdir": [ "dirnode", {
+       "rw_uri": "URI:DIR2:...",
+       "ro_uri": "URI:DIR2-RO:...",
+       "metadata": {
+         "ctime": 1202778102.7589991,
+         "mtime": 1202778111.2160511,
+         "tahoe": {
+           "linkcrtime": 1202777696.7564139,
+           "linkmotime": 1202777696.7564139
+         } } } ]
+ }
+
+ For forward-compatibility, a mutable directory can also contain caps in
+ a format that is unknown to the webapi server. When such caps are retrieved
+ from a mutable directory in a "ro_uri" field, they will be prefixed with
+ the string "ro.", indicating that they must not be decoded without
+ checking that they are read-only. The "ro." prefix must not be stripped
+ off without performing this check. (Future versions of the webapi server
+ will perform it where necessary.)
+
+ If both the "rw_uri" and "ro_uri" fields are present in a given PROPDICT,
+ and the webapi server recognizes the rw_uri as a write cap, then it will
+ reset the ro_uri to the corresponding read cap and discard the original
+ contents of ro_uri (in order to ensure that the two caps correspond to the
+ same object and that the ro_uri is in fact read-only). However this may not
+ happen for caps in a format unknown to the webapi server. Therefore, when
+ writing a directory the webapi client should ensure that the contents
+ of "rw_uri" and "ro_uri" for a given PROPDICT are a consistent
+ (write cap, read cap) pair if possible. If the webapi client only has
+ one cap and does not know whether it is a write cap or read cap, then
+ it is acceptable to set "rw_uri" to that cap and omit "ro_uri". The
+ client must not put a write cap into a "ro_uri" field.
+
+ The metadata may have a "no-write" field. If this is set to true in the
+ metadata of a link, it will not be possible to open that link for writing
+ via the SFTP frontend; see `FTP-and-SFTP.rst`_ for details.
+ Also, if the "no-write" field is set to true in the metadata of a link to
+ a mutable child, it will cause the link to be diminished to read-only.
+
+ .. _FTP-and-SFTP.rst: http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/frontends/FTP-and-SFTP.rst
+
+ Note that the webapi-using client application must not provide the
+ "Content-Type: multipart/form-data" header that usually accompanies HTML
+ form submissions, since the body is not formatted this way. Doing so will
+ cause a server error as the lower-level code misparses the request body.
+
+ Child file names should each be expressed as a unicode string, then used as
+ keys of the dictionary. The dictionary should then be converted into JSON,
+ and the resulting string encoded into UTF-8. This UTF-8 bytestring should
+ then be used as the POST body.
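+
+ Here is a minimal Python sketch of that recipe (the caps and child names are
+ placeholders; the explicit Content-Type is a precaution so that the JSON body
+ is not interpreted as form fields)::
+
+   import json
+   import urllib.request
+
+   children = {
+       # each value is [TYPE, PROPDICT]; the caps below are placeholders
+       "README.txt": ["filenode", {"ro_uri": "URI:CHK:..."}],
+       "subdir": ["dirnode", {"rw_uri": "URI:DIR2:...",
+                              "ro_uri": "URI:DIR2-RO:..."}],
+   }
+   body = json.dumps(children).encode("utf-8")
+   req = urllib.request.Request(
+       "http://127.0.0.1:3456/uri?t=mkdir-with-children",
+       data=body, method="POST",
+       headers={"Content-Type": "application/json"})   # anything but form encodings
+   new_dircap = urllib.request.urlopen(req).read().decode("ascii").strip()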
+
+``POST /uri?t=mkdir-immutable``
+
+ Like t=mkdir-with-children above, but the new directory will be
+ deep-immutable. This means that the directory itself is immutable, and that
+ it can only contain objects that are treated as being deep-immutable, like
+ immutable files, literal files, and deep-immutable directories.
+
+ For forward-compatibility, a deep-immutable directory can also contain caps
+ in a format that is unknown to the webapi server. When such caps are retrieved
+ from a deep-immutable directory in a "ro_uri" field, they will be prefixed
+ with the string "imm.", indicating that they must not be decoded without
+ checking that they are immutable. The "imm." prefix must not be stripped
+ off without performing this check. (Future versions of the webapi server
+ will perform it where necessary.)
+
+ The cap for each child may be given either in the "rw_uri" or "ro_uri"
+ field of the PROPDICT (not both). If a cap is given in the "rw_uri" field,
+ then the webapi server will check that it is an immutable read-cap of a
+ *known* format, and give an error if it is not. If a cap is given in the
+ "ro_uri" field, then the webapi server will still check whether known
+ caps are immutable, but for unknown caps it will simply assume that the
+ cap can be stored, as described above. Note that an attacker would be
+ able to store any cap in an immutable directory, so this check when
+ creating the directory is only to help non-malicious clients to avoid
+ accidentally giving away more authority than intended.
+
+ A non-empty request body is mandatory, since after the directory is created,
+ it will not be possible to add more children to it.
+
+``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir``
+
+``PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir``
+
+ Create new directories as necessary to make sure that the named target
+ ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
+ intermediate mutable directories as necessary. If the named target directory
+ already exists, this will make no changes to it.
+
+ If the final directory is created, it will be empty.
+
+ This operation will return an error if a blocking file is present at any of
+ the parent names, preventing the server from creating the necessary parent
+ directory; or if it would require changing an immutable directory.
+
+ The write-cap of the new directory will be returned as the HTTP response
+ body.
+
+``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-with-children``
+
+ Like /uri?t=mkdir-with-children, but the final directory is created as a
+ child of an existing mutable directory. This will create additional
+ intermediate mutable directories as necessary. If the final directory is
+ created, it will be populated with initial children from the POST request
+ body, as described above.
+
+ This operation will return an error if a blocking file is present at any of
+ the parent names, preventing the server from creating the necessary parent
+ directory; or if it would require changing an immutable directory; or if
+ the immediate parent directory already has a child named SUBDIR.
+
+``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-immutable``
+
+ Like /uri?t=mkdir-immutable, but the final directory is created as a child
+ of an existing mutable directory. The final directory will be deep-immutable,
+ and will be populated with the children specified as a JSON dictionary in
+ the POST request body.
+
+ In Tahoe 1.6 this operation creates intermediate mutable directories if
+ necessary, but that behaviour should not be relied on; see ticket #920.
+
+ This operation will return an error if the parent directory is immutable,
+ or already has a child named SUBDIR.
+
+``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME``
+
+ Create a new empty mutable directory and attach it to the given existing
+ directory. This will create additional intermediate directories as necessary.
+
+ This operation will return an error if a blocking file is present at any of
+ the parent names, preventing the server from creating the necessary parent
+ directory, or if it would require changing any immutable directory.
+
+ The URL of this operation points to the parent of the bottommost new directory,
+ whereas the /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir operation above has a URL
+ that points directly to the bottommost new directory.
+
+``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME``
+
+ Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME, but the new directory will
+ be populated with initial children via the POST request body. This command
+ will create additional intermediate mutable directories as necessary.
+
+ This operation will return an error if a blocking file is present at any of
+ the parent names, preventing the server from creating the necessary parent
+ directory; or if it would require changing an immutable directory; or if
+ the immediate parent directory already has a child named NAME.
+
+ Note that the name= argument must be passed as a queryarg, because the POST
+ request body is used for the initial children JSON.
+
+``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-immutable&name=NAME``
+
+ Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME, but the
+ final directory will be deep-immutable. The children are specified as a
+ JSON dictionary in the POST request body. Again, the name= argument must be
+ passed as a queryarg.
+
+ In Tahoe 1.6 this operation creates intermediate mutable directories if
+ necessary, but that behaviour should not be relied on; see ticket #920.
+
+ This operation will return an error if the parent directory is immutable,
+ or already has a child named NAME.
+
+Get Information About A File Or Directory (as JSON)
+---------------------------------------------------
+
+``GET /uri/$FILECAP?t=json``
+
+``GET /uri/$DIRCAP?t=json``
+
+``GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json``
+
+``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json``
+
+ This returns a machine-parseable JSON-encoded description of the given
+ object. The JSON always contains a list, and the first element of the list is
+ always a flag that indicates whether the referenced object is a file or a
+ directory. If it is a capability to a file, then the information includes
+ file size and URI, like this::
+
+ GET /uri/$FILECAP?t=json :
+
+ [ "filenode", {
+   "ro_uri": file_uri,
+   "verify_uri": verify_uri,
+   "size": bytes,
+   "mutable": false
+ } ]
+
+ If it is a capability to a directory followed by a path from that directory
+ to a file, then the information also includes metadata from the link to the
+ file in the parent directory, like this::
+
+ GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json
+
+ [ "filenode", {
+   "ro_uri": file_uri,
+   "verify_uri": verify_uri,
+   "size": bytes,
+   "mutable": false,
+   "metadata": {
+     "ctime": 1202777696.7564139,
+     "mtime": 1202777696.7564139,
+     "tahoe": {
+       "linkcrtime": 1202777696.7564139,
+       "linkmotime": 1202777696.7564139
+     } } } ]
+
+ If it is a directory, then it includes information about the children of
+ this directory, as a mapping from child name to a set of data about the
+ child (the same data that would appear in a corresponding GET?t=json of the
+ child itself). The child entries also include metadata about each child,
+ including link-creation- and link-change- timestamps. The output looks like
+ this::
+
+ GET /uri/$DIRCAP?t=json :
+ GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :
+
+ [ "dirnode", {
+   "rw_uri": read_write_uri,
+   "ro_uri": read_only_uri,
+   "verify_uri": verify_uri,
+   "mutable": true,
+   "children": {
+     "foo.txt": [ "filenode", {
+         "ro_uri": uri,
+         "size": bytes,
+         "metadata": {
+           "ctime": 1202777696.7564139,
+           "mtime": 1202777696.7564139,
+           "tahoe": {
+             "linkcrtime": 1202777696.7564139,
+             "linkmotime": 1202777696.7564139
+           } } } ],
+     "subdir": [ "dirnode", {
+         "rw_uri": rwuri,
+         "ro_uri": rouri,
+         "metadata": {
+           "ctime": 1202778102.7589991,
+           "mtime": 1202778111.2160511,
+           "tahoe": {
+             "linkcrtime": 1202777696.7564139,
+             "linkmotime": 1202777696.7564139
+           } } } ]
+   } } ]
+
+ In the above example, note how 'children' is a dictionary in which the keys
+ are child names and the values depend upon whether the child is a file or a
+ directory. The value is mostly the same as the JSON representation of the
+ child object (except that directories do not recurse -- the "children"
+ entry of the child is omitted, and the directory view includes the metadata
+ that is stored on the directory edge).
+
+ The rw_uri field will be present in the information about a directory
+ if and only if you have read-write access to that directory. The verify_uri
+ field will be present if and only if the object has a verify-cap
+ (non-distributed LIT files do not have verify-caps).
+
+ If the cap is of an unknown format, then the file size and verify_uri will
+ not be available::
+
+ GET /uri/$UNKNOWNCAP?t=json :
+
+ [ "unknown", {
+   "ro_uri": unknown_read_uri
+ } ]
+
+ GET /uri/$DIRCAP/[SUBDIRS../]UNKNOWNCHILDNAME?t=json :
+
+ [ "unknown", {
+   "rw_uri": unknown_write_uri,
+   "ro_uri": unknown_read_uri,
+   "mutable": true,
+   "metadata": {
+     "ctime": 1202777696.7564139,
+     "mtime": 1202777696.7564139,
+     "tahoe": {
+       "linkcrtime": 1202777696.7564139,
+       "linkmotime": 1202777696.7564139
+     } } } ]
+
+ As in the case of file nodes, the metadata will only be present when the
+ capability is to a directory followed by a path. The "mutable" field is also
+ not always present; when it is absent, the mutability of the object is not
+ known.
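+
+ A small Python sketch of fetching and dispatching on this JSON (the dircap is
+ a placeholder) might look like this::
+
+   import json
+   import urllib.request
+   from urllib.parse import quote
+
+   dircap = "URI:DIR2:..."    # placeholder; use a real directory cap
+   url = "http://127.0.0.1:3456/uri/" + quote(dircap, safe="") + "?t=json"
+   nodetype, info = json.loads(urllib.request.urlopen(url).read().decode("utf-8"))
+
+   if nodetype == "dirnode":
+       for name, (childtype, childinfo) in sorted(info["children"].items()):
+           print(name, childtype, childinfo.get("ro_uri"))
+   elif nodetype == "filenode":
+       print("file of %d bytes" % info["size"])
+   else:
+       print("cap in an unknown format")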
+
+About the metadata
+``````````````````
+
+The value of the 'tahoe':'linkmotime' key is updated whenever a link to a
+child is set. The value of the 'tahoe':'linkcrtime' key is updated whenever
+a link to a child is created -- i.e. when there was not previously a link
+under that name.
+
+Note however, that if the edge in the Tahoe filesystem points to a mutable
+file and the contents of that mutable file are changed, then the
+'tahoe':'linkmotime' value on that edge will *not* be updated, since the
+edge itself wasn't updated -- only the mutable file was.
+
+The timestamps are represented as a number of seconds since the UNIX epoch
+(1970-01-01 00:00:00 UTC), with leap seconds not being counted in the long
+term.
+
+In Tahoe earlier than v1.4.0, 'mtime' and 'ctime' keys were populated
+instead of the 'tahoe':'linkmotime' and 'tahoe':'linkcrtime' keys. Starting
+in Tahoe v1.4.0, the 'linkmotime'/'linkcrtime' keys in the 'tahoe' sub-dict
+are populated. However, prior to Tahoe v1.7beta, a bug caused the 'tahoe'
+sub-dict to be deleted by webapi requests in which new metadata is
+specified, and not to be added to existing child links that lack it.
+
+From Tahoe v1.7.0 onward, the 'mtime' and 'ctime' fields are no longer
+populated or updated (see ticket #924), except by "tahoe backup" as
+explained below. For backward compatibility, when an existing link is
+updated and 'tahoe':'linkcrtime' is not present in the previous metadata
+but 'ctime' is, the old value of 'ctime' is used as the new value of
+'tahoe':'linkcrtime'.
+
+The reason we added the new fields in Tahoe v1.4.0 is that there is a
+"set_children" API (described below) which you can use to overwrite the
+values of the 'mtime'/'ctime' pair, and this API is used by the
+"tahoe backup" command (in Tahoe v1.3.0 and later) to set the 'mtime' and
+'ctime' values when backing up files from a local filesystem into the
+Tahoe filesystem. As of Tahoe v1.4.0, the set_children API cannot be used
+to set anything under the 'tahoe' key of the metadata dict -- if you
+include 'tahoe' keys in your 'metadata' arguments then it will silently
+ignore those keys.
+
+Therefore, if the 'tahoe' sub-dict is present, you can rely on the
+'linkcrtime' and 'linkmotime' values therein to have the semantics described
+above. (This is assuming that only official Tahoe clients have been used to
+write those links, and that their system clocks were set to what you expected
+-- there is nothing preventing someone from editing their Tahoe client or
+writing their own Tahoe client which would overwrite those values however
+they like, and there is nothing to constrain their system clock from taking
+any value.)
+
+When an edge is created or updated by "tahoe backup", the 'mtime' and
+'ctime' keys on that edge are set as follows:
+
+* 'mtime' is set to the timestamp read from the local filesystem for the
+ "mtime" of the local file in question, which means the last time the
+ contents of that file were changed.
+
+* On Windows, 'ctime' is set to the creation timestamp for the file
+ read from the local filesystem. On other platforms, 'ctime' is set to
+ the UNIX "ctime" of the local file, which means the last time that
+ either the contents or the metadata of the local file was changed.
+
+There are several ways that the 'ctime' field could be confusing:
+
+1. You might be confused about whether it reflects the time of the creation
+ of a link in the Tahoe filesystem (by a version of Tahoe < v1.7.0) or a
+ timestamp copied in by "tahoe backup" from a local filesystem.
+
+2. You might be confused about whether it is a copy of the file creation
+ time (if "tahoe backup" was run on a Windows system) or of the last
+ contents-or-metadata change (if "tahoe backup" was run on a different
+ operating system).
+
+3. You might be confused by the fact that changing the contents of a
+ mutable file in Tahoe doesn't have any effect on any links pointing at
+ that file in any directories, although "tahoe backup" sets the link
+ 'ctime'/'mtime' to reflect timestamps about the local file corresponding
+ to the Tahoe file to which the link points.
+
+4. Also, quite apart from Tahoe, you might be confused about the meaning
+ of the "ctime" in UNIX local filesystems, which people sometimes think
+ means file creation time, but which actually means, in UNIX local
+ filesystems, the most recent time that the file contents or the file
+ metadata (such as owner, permission bits, extended attributes, etc.)
+ has changed. Note that although "ctime" does not mean file creation time
+ in UNIX, links created by a version of Tahoe prior to v1.7.0, and never
+ written by "tahoe backup", will have 'ctime' set to the link creation
+ time.
+
+
+Attaching an existing File or Directory by its read- or write-cap
+-----------------------------------------------------------------
+
+``PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri``
+
+ This attaches a child object (either a file or directory) to a specified
+ location in the virtual filesystem. The child object is referenced by its
+ read- or write- cap, as provided in the HTTP request body. This will create
+ intermediate directories as necessary.
+
+ This is similar to a UNIX hardlink: by referencing a previously-uploaded file
+ (or previously-created directory) instead of uploading/creating a new one,
+ you can create two references to the same object.
+
+ The read- or write- cap of the child is provided in the body of the HTTP
+ request, and this same cap is returned in the response body.
+
+ The default behavior is to overwrite any existing object at the same
+ location. To prevent this (and make the operation return an error instead
+ of overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
+ With replace=false, this operation will return an HTTP 409 "Conflict" error
+ if there is already an object at the given location, rather than
+ overwriting the existing object. To allow the operation to overwrite a
+ file, but return an error when trying to overwrite a directory, use
+ "replace=only-files" (this behavior is closer to the traditional UNIX "mv"
+ command). Note that "true", "t", and "1" are all synonyms for "True", and
+ "false", "f", and "0" are synonyms for "False", and the parameter is
+ case-insensitive.
+
+ Note that this operation does not take its child cap in the form of
+ separate "rw_uri" and "ro_uri" fields. Therefore, it cannot accept a
+ child cap in a format unknown to the webapi server, unless its URI
+ starts with "ro." or "imm.". This restriction is necessary because the
+ server is not able to attenuate an unknown write cap to a read cap.
+ Unknown URIs starting with "ro." or "imm.", on the other hand, are
+ assumed to represent read caps. The client should not prefix a write
+ cap with "ro." or "imm." and pass it to this operation, since that
+ would result in granting the cap's write authority to holders of the
+ directory read cap.
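+
+ A minimal Python sketch of attaching a previously-obtained cap under a new
+ name (the caps and the child name are placeholders)::
+
+   import urllib.error
+   import urllib.request
+   from urllib.parse import quote
+
+   dircap = "URI:DIR2:..."    # placeholder write-cap of the parent directory
+   childcap = "URI:CHK:..."   # placeholder cap of the object being attached
+
+   url = ("http://127.0.0.1:3456/uri/" + quote(dircap, safe="")
+          + "/copy-of-readme.txt?t=uri&replace=false")
+   req = urllib.request.Request(url, data=childcap.encode("ascii"), method="PUT")
+   try:
+       urllib.request.urlopen(req).read()   # echoes the attached cap back
+   except urllib.error.HTTPError as e:
+       if e.code == 409:
+           print("something already lives at that name")
+       else:
+           raise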
+
+Adding multiple files or directories to a parent directory at once
+------------------------------------------------------------------
+
+``POST /uri/$DIRCAP/[SUBDIRS..]?t=set_children``
+
+``POST /uri/$DIRCAP/[SUBDIRS..]?t=set-children`` (Tahoe >= v1.6)
+
+ This command adds multiple children to a directory in a single operation.
+ It reads the request body and interprets it as a JSON-encoded description
+ of the child names and read/write-caps that should be added.
+
+ The body should be a JSON-encoded dictionary, in the same format as the
+ "children" value returned by the "GET /uri/$DIRCAP?t=json" operation
+ described above. In this format, each key is a child name, and the
+ corresponding value is a tuple of (type, childinfo). "type" is ignored, and
+ "childinfo" is a dictionary that contains "rw_uri", "ro_uri", and
+ "metadata" keys. You can take the output of "GET /uri/$DIRCAP1?t=json" and
+ use it as the input to "POST /uri/$DIRCAP2?t=set_children" to make DIR2
+ look very much like DIR1 (except for any existing children of DIR2 that
+ were not overwritten, and any existing "tahoe" metadata keys as described
+ below).
+
+ When the set_children request contains a child name that already exists in
+ the target directory, this command defaults to overwriting that child with
+ the new value (both child cap and metadata, but if the JSON data does not
+ contain a "metadata" key, the old child's metadata is preserved). The
+ command takes a boolean "overwrite=" query argument to control this
+ behavior. If you use "?t=set_children&overwrite=false", then an attempt to
+ replace an existing child will instead cause an error.
+
+ Any "tahoe" key in the new child's "metadata" value is ignored. Any
+ existing "tahoe" metadata is preserved. The metadata["tahoe"] value is
+ reserved for metadata generated by the tahoe node itself. The only two keys
+ currently placed here are "linkcrtime" and "linkmotime". For details, see
+ the section above entitled "Get Information About A File Or Directory (as
+ JSON)", in the "About the metadata" subsection.
+
+ Note that this command was introduced with the name "set_children", which
+ uses an underscore rather than a hyphen as other multi-word command names
+ do. The variant with a hyphen is now accepted, but clients that desire
+ backward compatibility should continue to use "set_children".
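+
+ As a rough sketch (placeholder caps, node assumed at the default
+ 127.0.0.1:3456 webapi port), a set_children request might be issued from
+ Python like this::
+
+  import json
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+  dircap = "URI:DIR2:example..."   # placeholder: write-cap of the target directory
+
+  # each value is (type, childinfo); the "type" element is ignored by the server
+  children = {
+      "file1.txt": ["filenode", {"ro_uri": "URI:CHK:example1..."}],
+      "subdir":    ["dirnode",  {"rw_uri": "URI:DIR2:example2..."}],
+  }
+  url = "%s/uri/%s?t=set_children" % (base, dircap)
+  req = urllib.request.Request(url, data=json.dumps(children).encode("utf-8"),
+                               method="POST")
+  urllib.request.urlopen(req).read()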
+
+
+Deleting a File or Directory
+----------------------------
+
+``DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME``
+
+ This removes the given name from its parent directory. CHILDNAME is the
+ name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will
+ be modified.
+
+ Note that this does not actually delete the file or directory that the name
+ points to from the tahoe grid -- it only removes the named reference from
+ this directory. If there are other names in this directory or in other
+ directories that point to the resource, then it will remain accessible
+ through those paths. Even if all names pointing to this object are removed
+ from their parent directories, someone with possession of its read-cap can
+ continue to access the object through that cap.
+
+ The object will only become completely unreachable once 1: there are no
+ reachable directories that reference it, and 2: nobody is holding a read-
+ or write- cap to the object. (This behavior is very similar to the way
+ hardlinks and anonymous files work in traditional UNIX filesystems).
+
+ This operation will not modify more than a single directory. Intermediate
+ directories which were implicitly created by PUT or POST methods will *not*
+ be automatically removed by DELETE.
+
+ This method returns the file- or directory- cap of the object that was just
+ removed.
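+
+ For example (placeholder cap, default node URL), unlinking a child from
+ Python might look like this; the response body is the cap of the object
+ that was just unlinked::
+
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+  dircap = "URI:DIR2:example..."   # placeholder: write-cap of the parent directory
+
+  req = urllib.request.Request("%s/uri/%s/old-name" % (base, dircap),
+                               method="DELETE")
+  with urllib.request.urlopen(req) as resp:
+      print("unlinked object's cap:", resp.read().decode("ascii"))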
+
+Browser Operations: Human-oriented interfaces
+=============================================
+
+This section describes the HTTP operations that provide support for humans
+running a web browser. Most of these operations use HTML forms that use POST
+to drive the Tahoe node. This section is intended for HTML authors who want
+to write web pages that contain forms and buttons which manipulate the Tahoe
+filesystem.
+
+Note that for all POST operations, the arguments listed can be provided
+either as URL query arguments or as form body fields. URL query arguments are
+separated from the main URL by "?", and from each other by "&". For example,
+"POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually
+specified by using <input type="hidden"> elements. For clarity, the
+descriptions below display the most significant arguments as URL query args.
+
+Viewing A Directory (as HTML)
+-----------------------------
+
+``GET /uri/$DIRCAP/[SUBDIRS../]``
+
+ This returns an HTML page, intended to be displayed to a human by a web
+ browser, which contains HREF links to all files and directories reachable
+ from this directory. These HREF links do not have a t= argument, meaning
+ that a human who follows them will get pages also meant for a human. It also
+ contains forms to upload new files, and to delete files and directories.
+ Those forms use POST methods to do their job.
+
+Viewing/Downloading a File
+--------------------------
+
+``GET /uri/$FILECAP``
+
+``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME``
+
+ This will retrieve the contents of the given file. The HTTP response body
+ will contain the sequence of bytes that make up the file.
+
+ If you want the HTTP response to include a useful Content-Type header,
+ either use the second form (which starts with a $DIRCAP), or add a
+ "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
+ The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information
+ to determine a Content-Type (since Tahoe immutable files are merely
+ sequences of bytes, not typed+named file objects).
+
+ If the URL has both filename= and "save=true" in the query arguments, then
+ the server will add a "Content-Disposition: attachment" header, along with a
+ filename= parameter. When a user clicks on such a link, most browsers will
+ offer to let the user save the file instead of displaying it inline (indeed,
+ most browsers will refuse to display it inline). "true", "t", "1", and other
+ case-insensitive equivalents are all treated the same.
+
+ Character-set handling in URLs and HTTP headers is a dubious art [1]_. For
+ maximum compatibility, Tahoe simply copies the bytes from the filename=
+ argument into the Content-Disposition header's filename= parameter, without
+ trying to interpret them in any particular way.
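+
+ As a small illustrative sketch (placeholder read-cap, default node URL), a
+ download request using the filename= and save= arguments described above
+ might be constructed like this::
+
+  import urllib.parse
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+  filecap = "URI:CHK:example..."   # placeholder: read-cap of the file
+
+  # filename= lets the node pick a Content-Type; save=true asks for a
+  # "Content-Disposition: attachment" header
+  query = urllib.parse.urlencode({"filename": "photo.jpg", "save": "true"})
+  with urllib.request.urlopen("%s/uri/%s?%s" % (base, filecap, query)) as resp:
+      data = resp.read()
+      print(resp.headers.get("Content-Type"), len(data), "bytes")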
+
+
+``GET /named/$FILECAP/FILENAME``
+
+ This is an alternate download form which makes it easier to get the correct
+ filename. The Tahoe server will provide the contents of the given file, with
+ a Content-Type header derived from the given filename. This form is used to
+ get browsers to use the "Save Link As" feature correctly, and also helps
+ command-line tools like "wget" and "curl" use the right filename. Note that
+ this form can *only* be used with file caps; it is an error to use a
+ directory cap after the /named/ prefix.
+
+Get Information About A File Or Directory (as HTML)
+---------------------------------------------------
+
+``GET /uri/$FILECAP?t=info``
+
+``GET /uri/$DIRCAP/?t=info``
+
+``GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info``
+
+``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info``
+
+ This returns a human-oriented HTML page with more detail about the selected
+ file or directory object. This page contains the following items:
+
+ * object size
+ * storage index
+ * JSON representation
+ * raw contents (text/plain)
+ * access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
+ * check/verify/repair form
+ * deep-check/deep-size/deep-stats/manifest (for directories)
+ * replace-contents form (for mutable files)
+
+Creating a Directory
+--------------------
+
+``POST /uri?t=mkdir``
+
+ This creates a new empty directory, but does not attach it to the virtual
+ filesystem.
+
+ If a "redirect_to_result=true" argument is provided, then the HTTP response
+ will cause the web browser to be redirected to a /uri/$DIRCAP page that
+ gives access to the newly-created directory. If you bookmark this page,
+ you'll be able to get back to the directory again in the future. This is the
+ recommended way to start working with a Tahoe server: create a new unlinked
+ directory (using redirect_to_result=true), then bookmark the resulting
+ /uri/$DIRCAP page. There is a "create directory" button on the Welcome page
+ to invoke this action.
+
+ If "redirect_to_result=true" is not provided (or is given a value of
+ "false"), then the HTTP response body will simply be the write-cap of the
+ new directory.
+
+``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME``
+
+ This creates a new empty directory as a child of the designated SUBDIR. This
+ will create additional intermediate directories as necessary.
+
+ If a "when_done=URL" argument is provided, the HTTP response will cause the
+ web browser to redirect to the given URL. This provides a convenient way to
+ return the browser to the directory that was just modified. Without a
+ when_done= argument, the HTTP response will simply contain the write-cap of
+ the directory that was just created.
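+
+ For example (placeholder parent cap, default node URL), creating a child
+ directory named "projects" from Python might look like this::
+
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+  dircap = "URI:DIR2:example..."   # placeholder: write-cap of the parent directory
+
+  url = "%s/uri/%s?t=mkdir&name=projects" % (base, dircap)
+  req = urllib.request.Request(url, data=b"", method="POST")
+  with urllib.request.urlopen(req) as resp:
+      print("write-cap of the new directory:", resp.read().decode("ascii"))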
+
+
+Uploading a File
+----------------
+
+``POST /uri?t=upload``
+
+ This uploads a file, and produces a file-cap for the contents, but does not
+ attach the file into the filesystem. No directories will be modified by
+ this operation.
+
+ The file must be provided as the "file" field of an HTML encoded form body,
+ produced in response to an HTML form like this::
+
+ <form action="/uri" method="POST" enctype="multipart/form-data">
+ <input type="hidden" name="t" value="upload" />
+ <input type="file" name="file" />
+ <input type="submit" value="Upload Unlinked" />
+ </form>
+
+ If a "when_done=URL" argument is provided, the response body will cause the
+ browser to redirect to the given URL. If the when_done= URL has the string
+ "%(uri)s" in it, that string will be replaced by a URL-escaped form of the
+ newly created file-cap. (Note that without this substitution, there is no
+ way to access the file that was just uploaded).
+
+ The default (in the absence of when_done=) is to return an HTML page that
+ describes the results of the upload. This page will contain information
+ about which storage servers were used for the upload, how long each
+ operation took, etc.
+
+ If a "mutable=true" argument is provided, the operation will create a
+ mutable file, and the response body will contain the write-cap instead of
+ the upload results page. The default is to create an immutable file,
+ returning the upload results page as a response.
+
+
+``POST /uri/$DIRCAP/[SUBDIRS../]?t=upload``
+
+ This uploads a file, and attaches it as a new child of the given directory,
+ which must be mutable. The file must be provided as the "file" field of an
+ HTML-encoded form body, produced in response to an HTML form like this::
+
+ <form action="." method="POST" enctype="multipart/form-data">
+ <input type="hidden" name="t" value="upload" />
+ <input type="file" name="file" />
+ <input type="submit" value="Upload" />
+ </form>
+
+ A "name=" argument can be provided to specify the new child's name,
+ otherwise it will be taken from the "filename" field of the upload form
+ (most web browsers will copy the last component of the original file's
+ pathname into this field). To avoid confusion, name= is not allowed to
+ contain a slash.
+
+ If there is already a child with that name, and it is a mutable file, then
+ its contents are replaced with the data being uploaded. If it is not a
+ mutable file, the default behavior is to remove the existing child before
+ creating a new one. To prevent this (and make the operation return an error
+ instead of overwriting the old child), add a "replace=false" argument, as
+ "?t=upload&replace=false". With replace=false, this operation will return an
+ HTTP 409 "Conflict" error if there is already an object at the given
+ location, rather than overwriting the existing object. Note that "true",
+ "t", and "1" are all synonyms for "True", and "false", "f", and "0" are
+ synonyms for "False"; the parameter is case-insensitive.
+
+ This will create additional intermediate directories as necessary, although
+ since it is expected to be triggered by a form that was retrieved by "GET
+ /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
+ already exist.
+
+ If a "mutable=true" argument is provided, any new file that is created will
+ be a mutable file instead of an immutable one. <input type="checkbox"
+ name="mutable" /> will give the user a way to set this option.
+
+ If a "when_done=URL" argument is provided, the HTTP response will cause the
+ web browser to redirect to the given URL. This provides a convenient way to
+ return the browser to the directory that was just modified. Without a
+ when_done= argument, the HTTP response will simply contain the file-cap of
+ the file that was just uploaded (a write-cap for mutable files, or a
+ read-cap for immutable files).
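+
+ As a non-normative sketch, the same form can be driven programmatically;
+ this example assumes the third-party "requests" library is available, and
+ uses a placeholder directory cap, a placeholder local filename, and the
+ default node URL::
+
+  import requests   # third-party HTTP library, assumed to be installed
+
+  base = "http://127.0.0.1:3456"
+  dircap = "URI:DIR2:example..."   # placeholder: write-cap of the target directory
+
+  with open("report.pdf", "rb") as f:   # placeholder local file
+      resp = requests.post("%s/uri/%s" % (base, dircap),
+                           data={"t": "upload", "name": "report.pdf"},
+                           files={"file": ("report.pdf", f)})
+  resp.raise_for_status()
+  print("cap of the uploaded file:", resp.text)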
+
+``POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload``
+
+ This also uploads a file and attaches it as a new child of the given
+ directory, which must be mutable. It is a slight variant of the previous
+ operation, as the URL refers to the target file rather than the parent
+ directory. It is otherwise identical: this accepts mutable= and when_done=
+ arguments too.
+
+``POST /uri/$FILECAP?t=upload``
+
+ This modifies the contents of an existing mutable file in-place. An error is
+ signalled if $FILECAP does not refer to a mutable file. It behaves just like
+ the "PUT /uri/$FILECAP" form, but uses a POST for the benefit of HTML forms
+ in a web browser.
+
+Attaching An Existing File Or Directory (by URI)
+------------------------------------------------
+
+``POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP``
+
+ This attaches a given read- or write- cap "CHILDCAP" to the designated
+ directory, with a specified child name. This behaves much like the PUT t=uri
+ operation, and is a lot like a UNIX hardlink. It is subject to the same
+ restrictions as that operation on the use of cap formats unknown to the
+ webapi server.
+
+ This will create additional intermediate directories as necessary, although
+ since it is expected to be triggered by a form that was retrieved by "GET
+ /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
+ already exist.
+
+ This accepts the same replace= argument as POST t=upload.
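+
+ Since POST arguments may also be sent as form body fields, a minimal
+ sketch of this operation (placeholder caps and child name, default node
+ URL) could be::
+
+  import urllib.parse
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+  dircap = "URI:DIR2:example..."    # placeholder: write-cap of the parent directory
+  childcap = "URI:CHK:example..."   # placeholder: cap of an existing object
+
+  body = urllib.parse.urlencode({"t": "uri",
+                                 "name": "linked-copy",
+                                 "uri": childcap,
+                                 "replace": "false"}).encode("ascii")
+  req = urllib.request.Request("%s/uri/%s" % (base, dircap), data=body,
+                               method="POST")
+  urllib.request.urlopen(req).read()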
+
+Deleting A Child
+----------------
+
+``POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME``
+
+ This instructs the node to remove a child object (file or subdirectory) from
+ the given directory, which must be mutable. Note that the entire subtree is
+ unlinked from the parent. Unlike deleting a subdirectory in a UNIX local
+ filesystem, the subtree need not be empty; if it isn't, then other references
+ into the subtree will see that the child subdirectories are not modified by
+ this operation. Only the link from the given directory to its child is severed.
+
+Renaming A Child
+----------------
+
+``POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW``
+
+ This instructs the node to rename a child of the given directory, which must
+ be mutable. This has a similar effect to removing the child, then adding the
+ same child-cap under the new name, except that it preserves metadata. This
+ operation cannot move the child to a different directory.
+
+ This operation will replace any existing child of the new name, making it
+ behave like the UNIX "``mv -f``" command.
+
+Other Utilities
+---------------
+
+``GET /uri?uri=$CAP``
+
+ This causes a redirect to /uri/$CAP, and retains any additional query
+ arguments (like filename= or save=). This is for the convenience of web
+ forms which allow the user to paste in a read- or write- cap (obtained
+ through some out-of-band channel, like IM or email).
+
+ Note that this form merely redirects to the specific file or directory
+ indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
+ traverse to children by appending additional path segments to the URL.
+
+``GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME``
+
+ This provides a useful facility to browser-based user interfaces. It
+ returns a page containing a form targeting the "POST $DIRCAP t=rename"
+ functionality described above, with the provided $CHILDNAME present in the
+ 'from_name' field of that form. I.e. this presents a form offering to
+ rename $CHILDNAME, requesting the new name, and submitting POST rename.
+
+``GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri``
+
+ This returns the file- or directory- cap for the specified object.
+
+``GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri``
+
+ This returns a read-only file- or directory- cap for the specified object.
+ If the object is an immutable file, this will return the same value as
+ t=uri.
+
+Debugging and Testing Features
+------------------------------
+
+These URLs are less likely to be helpful to the casual Tahoe user, and are
+mainly intended for developers.
+
+``POST $URL?t=check``
+
+ This triggers the FileChecker to determine the current "health" of the
+ given file or directory, by counting how many shares are available. The
+ page that is returned will display the results. This can be used as a "show
+ me detailed information about this file" page.
+
+ If a verify=true argument is provided, the node will perform a more
+ intensive check, downloading and verifying every single bit of every share.
+
+ If an add-lease=true argument is provided, the node will also add (or
+ renew) a lease to every share it encounters. Each lease will keep the share
+ alive for a certain period of time (one month by default). Once the last
+ lease expires or is explicitly cancelled, the storage server is allowed to
+ delete the share.
+
+ If an output=JSON argument is provided, the response will be
+ machine-readable JSON instead of human-oriented HTML. The data is a
+ dictionary with the following keys::
+
+  storage-index: a base32-encoded string with the object's storage index,
+ or an empty string for LIT files
+ summary: a string, with a one-line summary of the stats of the file
+ results: a dictionary that describes the state of the file. For LIT files,
+ this dictionary has only the 'healthy' key, which will always be
+ True. For distributed files, this dictionary has the following
+ keys:
+ count-shares-good: the number of good shares that were found
+ count-shares-needed: 'k', the number of shares required for recovery
+ count-shares-expected: 'N', the number of total shares generated
+ count-good-share-hosts: this was intended to be the number of distinct
+ storage servers with good shares. It is currently
+ (as of Tahoe-LAFS v1.8.0) computed incorrectly;
+ see ticket #1115.
+ count-wrong-shares: for mutable files, the number of shares for
+ versions other than the 'best' one (highest
+ sequence number, highest roothash). These are
+ either old ...
+ count-recoverable-versions: for mutable files, the number of
+ recoverable versions of the file. For
+ a healthy file, this will equal 1.
+ count-unrecoverable-versions: for mutable files, the number of
+ unrecoverable versions of the file.
+ For a healthy file, this will be 0.
+ count-corrupt-shares: the number of shares with integrity failures
+ list-corrupt-shares: a list of "share locators", one for each share
+ that was found to be corrupt. Each share locator
+ is a list of (serverid, storage_index, sharenum).
+ needs-rebalancing: (bool) True if there are multiple shares on a single
+ storage server, indicating a reduction in reliability
+ that could be resolved by moving shares to new
+ servers.
+ servers-responding: list of base32-encoded storage server identifiers,
+ one for each server which responded to the share
+ query.
+ healthy: (bool) True if the file is completely healthy, False otherwise.
+ Healthy files have at least N good shares. Overlapping shares
+ do not currently cause a file to be marked unhealthy. If there
+ are at least N good shares, then corrupt shares do not cause the
+ file to be marked unhealthy, although the corrupt shares will be
+ listed in the results (list-corrupt-shares) and should be manually
+            removed to avoid wasting time in subsequent downloads (as the
+ downloader rediscovers the corruption and uses alternate shares).
+ Future compatibility: the meaning of this field may change to
+ reflect whether the servers-of-happiness criterion is met
+ (see ticket #614).
+ sharemap: dict mapping share identifier to list of serverids
+ (base32-encoded strings). This indicates which servers are
+ holding which shares. For immutable files, the shareid is
+ an integer (the share number, from 0 to N-1). For
+             mutable files, it is a string of the form
+ 'seq%d-%s-sh%d', containing the sequence number, the
+ roothash, and the share number.
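+
+ As an informal sketch (placeholder cap, default node URL), a checker run
+ might be driven and inspected from Python like this::
+
+  import json
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+  cap = "URI:CHK:example..."   # placeholder: cap of the file or directory to check
+
+  url = "%s/uri/%s?t=check&output=JSON" % (base, cap)
+  req = urllib.request.Request(url, data=b"", method="POST")
+  with urllib.request.urlopen(req) as resp:
+      report = json.loads(resp.read().decode("utf-8"))
+
+  print(report["summary"])
+  results = report["results"]
+  if not results["healthy"]:
+      # these counters are only present for distributed (non-LIT) files
+      print("found %d of %d expected shares"
+            % (results["count-shares-good"], results["count-shares-expected"]))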
+
+``POST $URL?t=start-deep-check`` (must add &ophandle=XYZ)
+
+ This initiates a recursive walk of all files and directories reachable from
+ the target, performing a check on each one just like t=check. The result
+ page will contain a summary of the results, including details on any
+ file/directory that was not fully healthy.
+
+ t=start-deep-check can only be invoked on a directory. An error (400
+ BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
+ walker will deal with loops safely.
+
+ This accepts the same verify= and add-lease= arguments as t=check.
+
+ Since this operation can take a long time (perhaps a second per object),
+ the ophandle= argument is required (see "Slow Operations, Progress, and
+ Cancelling" above). The response to this POST will be a redirect to the
+ corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
+ match the output= argument given to the POST). The deep-check operation
+ will continue to run in the background, and the /operations page should be
+ used to find out when the operation is done.
+
+ Detailed check results for non-healthy files and directories will be
+ available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will
+ contain links to these detailed results.
+
+ The HTML /operations/$HANDLE page for incomplete operations will contain a
+ meta-refresh tag, set to 60 seconds, so that a browser which uses
+ deep-check will automatically poll until the operation has completed.
+
+ The JSON page (/operations/$HANDLE?output=JSON) will contain a
+ machine-readable JSON dictionary with the following keys::
+
+ finished: a boolean, True if the operation is complete, else False. Some
+ of the remaining keys may not be present until the operation
+ is complete.
+ root-storage-index: a base32-encoded string with the storage index of the
+ starting point of the deep-check operation
+ count-objects-checked: count of how many objects were checked. Note that
+ non-distributed objects (i.e. small immutable LIT
+ files) are not checked, since for these objects,
+ the data is contained entirely in the URI.
+ count-objects-healthy: how many of those objects were completely healthy
+ count-objects-unhealthy: how many were damaged in some way
+ count-corrupt-shares: how many shares were found to have corruption,
+ summed over all objects examined
+ list-corrupt-shares: a list of "share identifiers", one for each share
+ that was found to be corrupt. Each share identifier
+ is a list of (serverid, storage_index, sharenum).
+ list-unhealthy-files: a list of (pathname, check-results) tuples, for
+ each file that was not fully healthy. 'pathname' is
+ a list of strings (which can be joined by "/"
+ characters to turn it into a single string),
+ relative to the directory on which deep-check was
+ invoked. The 'check-results' field is the same as
+ that returned by t=check&output=JSON, described
+ above.
+ stats: a dictionary with the same keys as the t=start-deep-stats command
+ (described below)
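+
+ As a rough sketch (placeholder cap, default node URL, and an arbitrary
+ handle string chosen by the client as described in "Slow Operations,
+ Progress, and Cancelling"), starting a deep-check and polling for its
+ completion might look like this::
+
+  import json
+  import time
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+  dircap = "URI:DIR2:example..."   # placeholder: cap of the directory to walk
+  handle = "deepcheck-123"         # placeholder: unique, hard-to-guess handle
+
+  url = "%s/uri/%s?t=start-deep-check&ophandle=%s&output=JSON" % (base, dircap,
+                                                                  handle)
+  urllib.request.urlopen(urllib.request.Request(url, data=b"",
+                                                method="POST")).read()
+
+  status_url = "%s/operations/%s?output=JSON" % (base, handle)
+  while True:
+      with urllib.request.urlopen(status_url) as resp:
+          status = json.loads(resp.read().decode("utf-8"))
+      if status["finished"]:
+          break
+      time.sleep(15)   # the walk runs in the background; poll until done
+
+  print("%d of %d checked objects were healthy"
+        % (status["count-objects-healthy"], status["count-objects-checked"]))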
+
+``POST $URL?t=stream-deep-check``
+
+ This initiates a recursive walk of all files and directories reachable from
+ the target, performing a check on each one just like t=check. For each
+ unique object (duplicates are skipped), a single line of JSON is emitted to
+ the HTTP response channel (or an error indication, see below). When the walk
+ is complete, a final line of JSON is emitted which contains the accumulated
+ file-size/count "deep-stats" data.
+
+ This command takes the same arguments as t=start-deep-check.
+
+ A CLI tool can split the response stream on newlines into "response units",
+ and parse each response unit as JSON. Each such parsed unit will be a
+ dictionary, and will contain at least the "type" key: a string, one of
+ "file", "directory", or "stats".
+
+ For all units that have a type of "file" or "directory", the dictionary will
+ contain the following keys::
+
+ "path": a list of strings, with the path that is traversed to reach the
+ object
+ "cap": a write-cap URI for the file or directory, if available, else a
+ read-cap URI
+ "verifycap": a verify-cap URI for the file or directory
+  "repaircap": a URI for the weakest cap that can still be used to repair
+ the object
+ "storage-index": a base32 storage index for the object
+ "check-results": a copy of the dictionary which would be returned by
+ t=check&output=json, with three top-level keys:
+ "storage-index", "summary", and "results", and a variety
+ of counts and sharemaps in the "results" value.
+
+ Note that non-distributed files (i.e. LIT files) will have values of None
+ for verifycap, repaircap, and storage-index, since these files can neither
+ be verified nor repaired, and are not stored on the storage servers.
+ Likewise the check-results dictionary will be limited: an empty string for
+ storage-index, and a results dictionary with only the "healthy" key.
+
+ The last unit in the stream will have a type of "stats", and will contain
+ the keys described in the "start-deep-stats" operation, below.
+
+ If any errors occur during the traversal (specifically if a directory is
+ unrecoverable, such that further traversal is not possible), an error
+ indication is written to the response body, instead of the usual line of
+ JSON. This error indication line will begin with the string "ERROR:" (in all
+ caps), and contain a summary of the error on the rest of the line. The
+ remaining lines of the response body will be a python exception. The client
+ application should look for the ERROR: and stop processing JSON as soon as
+ it is seen. Note that neither a file being unrecoverable nor a directory
+ merely being unhealthy will cause traversal to stop. The line just before
+ the ERROR: will describe the directory that was untraversable, since the
+ unit is emitted to the HTTP response body before the child is traversed.
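+
+ A minimal Python sketch of such a stream consumer (placeholder cap,
+ default node URL) might look like this::
+
+  import json
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+  dircap = "URI:DIR2:example..."   # placeholder: cap of the directory to walk
+
+  url = "%s/uri/%s?t=stream-deep-check" % (base, dircap)
+  req = urllib.request.Request(url, data=b"", method="POST")
+  with urllib.request.urlopen(req) as resp:
+      for line in resp:
+          line = line.decode("utf-8").rstrip("\n")
+          if line.startswith("ERROR:"):
+              # an unrecoverable directory stopped the traversal
+              print(line)
+              break
+          unit = json.loads(line)
+          if unit["type"] in ("file", "directory"):
+              path = "/".join(unit["path"]) or "<root>"
+              healthy = unit["check-results"]["results"]["healthy"]
+              print(path, "healthy" if healthy else "UNHEALTHY")
+          elif unit["type"] == "stats":
+              pass   # final unit: the accumulated deep-stats counters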
+
+
+``POST $URL?t=check&repair=true``
+
+ This performs a health check of the given file or directory, and if the
+ checker determines that the object is not healthy (some shares are missing
+ or corrupted), it will perform a "repair". During repair, any missing
+ shares will be regenerated and uploaded to new servers.
+
+ This accepts the same verify=true and add-lease= arguments as t=check. When
+ an output=JSON argument is provided, the machine-readable JSON response
+ will contain the following keys::
+
+  storage-index: a base32-encoded string with the object's storage index,
+ or an empty string for LIT files
+ repair-attempted: (bool) True if repair was attempted
+ repair-successful: (bool) True if repair was attempted and the file was
+ fully healthy afterwards. False if no repair was
+ attempted, or if a repair attempt failed.
+ pre-repair-results: a dictionary that describes the state of the file
+ before any repair was performed. This contains exactly
+ the same keys as the 'results' value of the t=check
+ response, described above.
+ post-repair-results: a dictionary that describes the state of the file
+ after any repair was performed. If no repair was
+ performed, post-repair-results and pre-repair-results
+ will be the same. This contains exactly the same keys
+ as the 'results' value of the t=check response,
+ described above.
+
+``POST $URL?t=start-deep-check&repair=true`` (must add &ophandle=XYZ)
+
+ This triggers a recursive walk of all files and directories, performing a
+ t=check&repair=true on each one.
+
+ Like t=start-deep-check without the repair= argument, this can only be
+ invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
+ is invoked on a file. The recursive walker will deal with loops safely.
+
+ This accepts the same verify= and add-lease= arguments as
+ t=start-deep-check. It uses the same ophandle= mechanism as
+ start-deep-check. When an output=JSON argument is provided, the response
+ will contain the following keys::
+
+ finished: (bool) True if the operation has completed, else False
+ root-storage-index: a base32-encoded string with the storage index of the
+ starting point of the deep-check operation
+ count-objects-checked: count of how many objects were checked
+
+ count-objects-healthy-pre-repair: how many of those objects were completely
+ healthy, before any repair
+ count-objects-unhealthy-pre-repair: how many were damaged in some way
+ count-objects-healthy-post-repair: how many of those objects were completely
+ healthy, after any repair
+ count-objects-unhealthy-post-repair: how many were damaged in some way
+
+ count-repairs-attempted: repairs were attempted on this many objects.
+ count-repairs-successful: how many repairs resulted in healthy objects
+  count-repairs-unsuccessful: how many repairs did not result in
+                              completely healthy objects
+ count-corrupt-shares-pre-repair: how many shares were found to have
+ corruption, summed over all objects
+ examined, before any repair
+ count-corrupt-shares-post-repair: how many shares were found to have
+ corruption, summed over all objects
+ examined, after any repair
+ list-corrupt-shares: a list of "share identifiers", one for each share
+ that was found to be corrupt (before any repair).
+ Each share identifier is a list of (serverid,
+ storage_index, sharenum).
+ list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares
+ that were successfully repaired are not
+ included. These are shares that need
+ manual processing. Since immutable shares
+ cannot be modified by clients, all corruption
+ in immutable shares will be listed here.
+ list-unhealthy-files: a list of (pathname, check-results) tuples, for
+ each file that was not fully healthy. 'pathname' is
+ relative to the directory on which deep-check was
+ invoked. The 'check-results' field is the same as
+ that returned by t=check&repair=true&output=JSON,
+ described above.
+ stats: a dictionary with the same keys as the t=start-deep-stats command
+ (described below)
+
+``POST $URL?t=stream-deep-check&repair=true``
+
+ This triggers a recursive walk of all files and directories, performing a
+ t=check&repair=true on each one. For each unique object (duplicates are
+ skipped), a single line of JSON is emitted to the HTTP response channel (or
+ an error indication). When the walk is complete, a final line of JSON is
+ emitted which contains the accumulated file-size/count "deep-stats" data.
+
+ This emits the same data as t=stream-deep-check (without the repair=true),
+ except that the "check-results" field is replaced with a
+ "check-and-repair-results" field, which contains the keys returned by
+ t=check&repair=true&output=json (i.e. repair-attempted, repair-successful,
+ pre-repair-results, and post-repair-results). The output does not contain
+ the summary dictionary that is provided by t=start-deep-check&repair=true
+ (the one with count-objects-checked and list-unhealthy-files), since the
+ receiving client is expected to calculate those values itself from the
+ stream of per-object check-and-repair-results.
+
+ Note that the "ERROR:" indication will only be emitted if traversal stops,
+ which will only occur if an unrecoverable directory is encountered. If a
+ file or directory repair fails, the traversal will continue, and the repair
+ failure will be indicated in the JSON data (in the "repair-successful" key).
+
+``POST $DIRURL?t=start-manifest`` (must add &ophandle=XYZ)
+
+ This operation generates a "manifest" of the given directory tree, mostly
+ for debugging. This is a table of (path, filecap/dircap), for every object
+ reachable from the starting directory. The path will be slash-joined, and
+ the filecap/dircap will contain a link to the object in question. This page
+ gives immediate access to every object in the virtual filesystem subtree.
+
+ This operation uses the same ophandle= mechanism as deep-check. The
+ corresponding /operations/$HANDLE page has three different forms. The
+ default is output=HTML.
+
+ If output=text is added to the query args, the results will be a text/plain
+ list. The first line is special: it is either "finished: yes" or "finished:
+ no"; if the operation is not finished, you must periodically reload the
+ page until it completes. The rest of the results are a plaintext list, with
+ one file/dir per line, slash-separated, with the filecap/dircap separated
+ by a space.
+
+ If output=JSON is added to the query args, then the results will be a
+ JSON-formatted dictionary with six keys. Note that because large directory
+ structures can result in very large JSON results, the full results will not
+ be available until the operation is complete (i.e. until output["finished"]
+ is True)::
+
+ finished (bool): if False then you must reload the page until True
+ origin_si (base32 str): the storage index of the starting point
+ manifest: list of (path, cap) tuples, where path is a list of strings.
+ verifycaps: list of (printable) verify cap strings
+ storage-index: list of (base32) storage index strings
+ stats: a dictionary with the same keys as the t=start-deep-stats command
+ (described below)
+
+``POST $DIRURL?t=start-deep-size`` (must add &ophandle=XYZ)
+
+ This operation generates a number (in bytes) containing the sum of the
+ filesize of all directories and immutable files reachable from the given
+ directory. This is a rough lower bound of the total space consumed by this
+ subtree. It does not include space consumed by mutable files, nor does it
+ take expansion or encoding overhead into account. Later versions of the
+ code may improve this estimate upwards.
+
+ The /operations/$HANDLE status output consists of two lines of text::
+
+ finished: yes
+ size: 1234
+
+``POST $DIRURL?t=start-deep-stats`` (must add &ophandle=XYZ)
+
+ This operation performs a recursive walk of all files and directories
+ reachable from the given directory, and generates a collection of
+ statistics about those objects.
+
+ The result (obtained from the /operations/$OPHANDLE page) is a
+ JSON-serialized dictionary with the following keys (note that some of these
+ keys may be missing until 'finished' is True)::
+
+ finished: (bool) True if the operation has finished, else False
+ count-immutable-files: count of how many CHK files are in the set
+ count-mutable-files: same, for mutable files (does not include directories)
+ count-literal-files: same, for LIT files (data contained inside the URI)
+ count-files: sum of the above three
+ count-directories: count of directories
+ count-unknown: count of unrecognized objects (perhaps from the future)
+ size-immutable-files: total bytes for all CHK files in the set, =deep-size
+ size-mutable-files (TODO): same, for current version of all mutable files
+ size-literal-files: same, for LIT files
+ size-directories: size of directories (includes size-literal-files)
+ size-files-histogram: list of (minsize, maxsize, count) buckets,
+ with a histogram of filesizes, 5dB/bucket,
+ for both literal and immutable files
+ largest-directory: number of children in the largest directory
+ largest-immutable-file: number of bytes in the largest CHK file
+
+ size-mutable-files is not implemented, because it would require extra
+ queries to each mutable file to get their size. This may be implemented in
+ the future.
+
+ Assuming no sharing, the basic space consumed by a single root directory is
+ the sum of size-immutable-files, size-mutable-files, and size-directories.
+ The actual disk space used by the shares is larger, because of the
+ following sources of overhead::
+
+ integrity data
+ expansion due to erasure coding
+ share management data (leases)
+ backend (ext3) minimum block size
+
+``POST $URL?t=stream-manifest``
+
+ This operation performs a recursive walk of all files and directories
+ reachable from the given starting point. For each such unique object
+ (duplicates are skipped), a single line of JSON is emitted to the HTTP
+ response channel (or an error indication, see below). When the walk is
+ complete, a final line of JSON is emitted which contains the accumulated
+ file-size/count "deep-stats" data.
+
+ A CLI tool can split the response stream on newlines into "response units",
+ and parse each response unit as JSON. Each such parsed unit will be a
+ dictionary, and will contain at least the "type" key: a string, one of
+ "file", "directory", or "stats".
+
+ For all units that have a type of "file" or "directory", the dictionary will
+ contain the following keys::
+
+ "path": a list of strings, with the path that is traversed to reach the
+ object
+ "cap": a write-cap URI for the file or directory, if available, else a
+ read-cap URI
+ "verifycap": a verify-cap URI for the file or directory
+  "repaircap": a URI for the weakest cap that can still be used to repair
+ the object
+ "storage-index": a base32 storage index for the object
+
+ Note that non-distributed files (i.e. LIT files) will have values of None
+ for verifycap, repaircap, and storage-index, since these files can neither
+ be verified nor repaired, and are not stored on the storage servers.
+
+ The last unit in the stream will have a type of "stats", and will contain
+ the keys described in the "start-deep-stats" operation, above.
+
+ If any errors occur during the traversal (specifically if a directory is
+ unrecoverable, such that further traversal is not possible), an error
+ indication is written to the response body, instead of the usual line of
+ JSON. This error indication line will begin with the string "ERROR:" (in all
+ caps), and contain a summary of the error on the rest of the line. The
+ remaining lines of the response body will be a python exception. The client
+ application should look for the ERROR: and stop processing JSON as soon as
+ it is seen. The line just before the ERROR: will describe the directory that
+ was untraversable, since the manifest entry is emitted to the HTTP response
+ body before the child is traversed.
+
+Other Useful Pages
+==================
+
+The portion of the web namespace that begins with "/uri" (and "/named") is
+dedicated to giving users (both humans and programs) access to the Tahoe
+virtual filesystem. The rest of the namespace provides status information
+about the state of the Tahoe node.
+
+``GET /`` (the root page)
+
+This is the "Welcome Page", and contains a few distinct sections::
+
+ Node information: library versions, local nodeid, services being provided.
+
+ Filesystem Access Forms: create a new directory, view a file/directory by
+ URI, upload a file (unlinked), download a file by
+ URI.
+
+ Grid Status: introducer information, helper information, connected storage
+ servers.
+
+``GET /status/``
+
+ This page lists all active uploads and downloads, and contains a short list
+ of recent upload/download operations. Each operation has a link to a page
+ that describes file sizes, servers that were involved, and the time consumed
+ in each phase of the operation.
+
+ A GET of /status/?t=json will contain a machine-readable subset of the same
+ data. It returns a JSON-encoded dictionary. The only key defined at this
+ time is "active", with a value that is a list of operation dictionaries, one
+ for each active operation. Once an operation is completed, it will no longer
+ appear in data["active"] .
+
+ Each op-dict contains a "type" key, one of "upload", "download",
+ "mapupdate", "publish", or "retrieve" (the first two are for immutable
+ files, while the latter three are for mutable files and directories).
+
+ The "upload" op-dict will contain the following keys::
+
+ type (string): "upload"
+ storage-index-string (string): a base32-encoded storage index
+ total-size (int): total size of the file
+ status (string): current status of the operation
+ progress-hash (float): 1.0 when the file has been hashed
+ progress-ciphertext (float): 1.0 when the file has been encrypted.
+ progress-encode-push (float): 1.0 when the file has been encoded and
+ pushed to the storage servers. For helper
+ uploads, the ciphertext value climbs to 1.0
+ first, then encoding starts. For unassisted
+ uploads, ciphertext and encode-push progress
+ will climb at the same pace.
+
+ The "download" op-dict will contain the following keys::
+
+ type (string): "download"
+ storage-index-string (string): a base32-encoded storage index
+ total-size (int): total size of the file
+ status (string): current status of the operation
+ progress (float): 1.0 when the file has been fully downloaded
+
+ Front-ends which want to report progress information are advised to simply
+ average together all the progress-* indicators. A slightly more accurate
+ value can be found by ignoring the progress-hash value (since the current
+ implementation hashes synchronously, so clients will probably never see
+ progress-hash!=1.0).
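+
+ For example, a status poller (assuming the default node URL) might compute
+ a rough per-operation progress figure like this::
+
+  import json
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+
+  with urllib.request.urlopen("%s/status/?t=json" % base) as resp:
+      status = json.loads(resp.read().decode("utf-8"))
+
+  for op in status["active"]:
+      if op["type"] == "upload":
+          # ignore progress-hash, as suggested above
+          progress = (op["progress-ciphertext"] + op["progress-encode-push"]) / 2
+      elif op["type"] == "download":
+          progress = op["progress"]
+      else:
+          continue   # op-dicts for mutable-file operations are not covered here
+      print("%s %s: %d%%" % (op["type"], op["storage-index-string"],
+                             progress * 100))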
+
+``GET /provisioning/``
+
+ This page provides a basic tool to predict the likely storage and bandwidth
+ requirements of a large Tahoe grid. It provides forms to input things like
+ total number of users, number of files per user, average file size, number
+ of servers, expansion ratio, hard drive failure rate, etc. It then provides
+ numbers like how many disks per server will be needed, how many read
+ operations per second should be expected, and the likely MTBF for files in
+ the grid. This information is very preliminary, and the model upon which it
+ is based still needs a lot of work.
+
+``GET /helper_status/``
+
+ If the node is running a helper (i.e. if [helper]enabled is set to True in
+ tahoe.cfg), then this page will provide a list of all the helper operations
+ currently in progress. If "?t=json" is added to the URL, it will return a
+ JSON-formatted list of helper statistics, which can then be used to produce
+ graphs to indicate how busy the helper is.
+
+``GET /statistics/``
+
+ This page provides "node statistics", which are collected from a variety of
+ sources::
+
+ load_monitor: every second, the node schedules a timer for one second in
+ the future, then measures how late the subsequent callback
+ is. The "load_average" is this tardiness, measured in
+ seconds, averaged over the last minute. It is an indication
+ of a busy node, one which is doing more work than can be
+ completed in a timely fashion. The "max_load" value is the
+ highest value that has been seen in the last 60 seconds.
+
+ cpu_monitor: every minute, the node uses time.clock() to measure how much
+ CPU time it has used, and it uses this value to produce
+ 1min/5min/15min moving averages. These values range from 0%
+ (0.0) to 100% (1.0), and indicate what fraction of the CPU
+ has been used by the Tahoe node. Not all operating systems
+ provide meaningful data to time.clock(): they may report 100%
+ CPU usage at all times.
+
+ uploader: this counts how many immutable files (and bytes) have been
+ uploaded since the node was started
+
+ downloader: this counts how many immutable files have been downloaded
+ since the node was started
+
+ publishes: this counts how many mutable files (including directories) have
+ been modified since the node was started
+
+ retrieves: this counts how many mutable files (including directories) have
+ been read since the node was started
+
+ There are other statistics that are tracked by the node. The "raw stats"
+ section shows a formatted dump of all of them.
+
+ By adding "?t=json" to the URL, the node will return a JSON-formatted
+ dictionary of stats values, which can be used by other tools to produce
+ graphs of node behavior. The misc/munin/ directory in the source
+ distribution provides some tools to produce these graphs.
+
+``GET /`` (introducer status)
+
+ For Introducer nodes, the welcome page displays information about both
+ clients and servers which are connected to the introducer. Servers make
+ "service announcements", and these are listed in a table. Clients will
+ subscribe to hear about service announcements, and these subscriptions are
+ listed in a separate table. Both tables contain information about what
+ version of Tahoe is being run by the remote node, their advertised and
+ outbound IP addresses, their nodeid and nickname, and how long they have
+ been available.
+
+ By adding "?t=json" to the URL, the node will return a JSON-formatted
+ dictionary of stats values, which can be used to produce graphs of connected
+ clients over time. This dictionary has the following keys::
+
+ ["subscription_summary"] : a dictionary mapping service name (like
+ "storage") to an integer with the number of
+ clients that have subscribed to hear about that
+ service
+ ["announcement_summary"] : a dictionary mapping service name to an integer
+ with the number of servers which are announcing
+ that service
+ ["announcement_distinct_hosts"] : a dictionary mapping service name to an
+ integer which represents the number of
+ distinct hosts that are providing that
+ service. If two servers have announced
+ FURLs which use the same hostnames (but
+ different ports and tubids), they are
+ considered to be on the same host.
+
+
+Static Files in /public_html
+============================
+
+The webapi server will take any request for a URL that starts with /static
+and serve it from a configurable directory which defaults to
+$BASEDIR/public_html . This is configured by setting the "[node]web.static"
+value in $BASEDIR/tahoe.cfg . If this is left at the default value of
+"public_html", then http://localhost:3456/static/subdir/foo.html will be
+served with the contents of the file $BASEDIR/public_html/subdir/foo.html .
+
+This can be useful to serve a javascript application which provides a
+prettier front-end to the rest of the Tahoe webapi.
+
+
+Safety and security issues -- names vs. URIs
+============================================
+
+Summary: use explicit file- and dir- caps whenever possible, to reduce the
+potential for surprises when the filesystem structure is changed.
+
+Tahoe provides a mutable filesystem, but the ways that the filesystem can
+change are limited. The only thing that can change is that the mapping from
+child names to child objects that each directory contains can be changed by
+adding a new child name pointing to an object, removing an existing child name,
+or changing an existing child name to point to a different object.
+
+Obviously if you query Tahoe for information about the filesystem and then act
+to change the filesystem (such as by getting a listing of the contents of a
+directory and then adding a file to the directory), then the filesystem might
+have been changed after you queried it and before you acted upon it. However,
+if you use the URI instead of the pathname of an object when you act upon the
+object, then the only change that can happen is if the object is a directory
+then the set of child names it has might be different. If, on the other hand,
+you act upon the object using its pathname, then a different object might be in
+that place, which can result in more kinds of surprises.
+
+For example, suppose you are writing code which recursively downloads the
+contents of a directory. The first thing your code does is fetch the listing
+of the contents of the directory. For each child that it fetched, if that
+child is a file then it downloads the file, and if that child is a directory
+then it recurses into that directory. Now, if the download and the recurse
+actions are performed using the child's name, then the results might be
+wrong, because for example a child name that pointed to a sub-directory when
+you listed the directory might have been changed to point to a file (in which
+case your attempt to recurse into it would result in an error and the file
+would be skipped), or a child name that pointed to a file when you listed the
+directory might now point to a sub-directory (in which case your attempt to
+download the child would result in a file containing HTML text describing the
+sub-directory!).
+
+If your recursive algorithm uses the URI of the child instead of the name of
+the child, then those kinds of mistakes just can't happen. Note that both the
+child's name and the child's URI are included in the results of listing the
+parent directory, so it isn't any harder to use the URI for this purpose.
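+
+As a non-normative sketch of this approach (placeholder starting cap, default
+node URL, and the usual "dirnode"/"filenode" type tags from the JSON directory
+listing), a URI-based recursive download might look like this::
+
+  import json
+  import urllib.request
+
+  base = "http://127.0.0.1:3456"
+
+  def fetch_tree(dircap, prefix=""):
+      # list the directory once, then act only on the caps from that listing,
+      # never on the child names, so later renames cannot change what we fetch
+      with urllib.request.urlopen("%s/uri/%s?t=json" % (base, dircap)) as resp:
+          nodetype, info = json.loads(resp.read().decode("utf-8"))
+      for name, (childtype, childinfo) in info["children"].items():
+          cap = childinfo.get("ro_uri") or childinfo.get("rw_uri")
+          if childtype == "dirnode":
+              fetch_tree(cap, prefix + name + "/")
+          else:
+              with urllib.request.urlopen("%s/uri/%s" % (base, cap)) as f:
+                  print(prefix + name, len(f.read()), "bytes")
+
+  fetch_tree("URI:DIR2-RO:example...")   # placeholder: directory read-cap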
+
+The read and write caps in a given directory node are separate URIs, and
+can't be assumed to point to the same object even if they were retrieved in
+the same operation (although the webapi server attempts to ensure this
+in most cases). If you need to rely on that property, you should explicitly
+verify it. More generally, you should not make assumptions about the
+internal consistency of the contents of mutable directories. As a result
+of the signatures on mutable object versions, it is guaranteed that a given
+version was written in a single update, but -- as in the case of a file --
+the contents may have been chosen by a malicious writer in a way that is
+designed to confuse applications that rely on their consistency.
+
+In general, use names if you want "whatever object (whether file or
+directory) is found by following this name (or sequence of names) when my
+request reaches the server". Use URIs if you want "this particular object".
+
+Concurrency Issues
+==================
+
+Tahoe uses both mutable and immutable files. Mutable files can be created
+explicitly by doing an upload with ?mutable=true added, or implicitly by
+creating a new directory (since a directory is just a special way to
+interpret a given mutable file).
+
+Mutable files suffer from the same consistency-vs-availability tradeoff that
+all distributed data storage systems face. It is not possible to
+simultaneously achieve perfect consistency and perfect availability in the
+face of network partitions (servers being unreachable or faulty).
+
+Tahoe tries to achieve a reasonable compromise, but there is a basic rule in
+place, known as the Prime Coordination Directive: "Don't Do That". What this
+means is that if write-access to a mutable file is available to several
+parties, then those parties are responsible for coordinating their activities
+to avoid multiple simultaneous updates. This could be achieved by having
+these parties talk to each other and using some sort of locking mechanism, or
+by serializing all changes through a single writer.
+
+The consequences of performing uncoordinated writes can vary. Some of the
+writers may lose their changes, as somebody else wins the race condition. In
+many cases the file will be left in an "unhealthy" state, meaning that there
+are not as many redundant shares as we would like (reducing the reliability
+of the file against server failures). In the worst case, the file can be left
+in such an unhealthy state that no version is recoverable, even the old ones.
+It is this small possibility of data loss that prompts us to issue the Prime
+Coordination Directive.
+
+Tahoe nodes implement internal serialization to make sure that a single Tahoe
+node cannot conflict with itself. For example, it is safe to issue two
+directory modification requests to a single tahoe node's webapi server at the
+same time, because the Tahoe node will internally delay one of them until
+after the other has finished being applied. (This feature was introduced in
+Tahoe-1.1; back with Tahoe-1.0 the web client was responsible for serializing
+web requests themselves).
+
+For more details, please see the "Consistency vs Availability" and "The Prime
+Coordination Directive" sections of mutable.txt, in the same directory as
+this file.
+
+
+.. [1] URLs and HTTP and UTF-8, Oh My
+
+ HTTP does not provide a mechanism to specify the character set used to
+ encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that
+ the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
+ For example, suppose we want to provoke the server into using a filename of
+ "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
+ is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's
+ repr() function would show). To encode this into a URL, the non-printable
+  characters must be escaped with the urlencode '%XX' mechanism, giving us
+ "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
+ /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
+ provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.
+
+ The response header will need to indicate a non-ASCII filename. The actual
+ mechanism to do this is not clear. For ASCII filenames, the response header
+ would look like::
+
+ Content-Disposition: attachment; filename="english.txt"
+
+ If Tahoe were to enforce the utf-8 convention, it would need to decode the
+ URL argument into a unicode string, and then encode it back into a sequence
+ of bytes when creating the response header. One possibility would be to use
+ unencoded utf-8. Developers suggest that IE7 might accept this::
+
+ #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
+ (note, the last four bytes of that line, not including the newline, are
+ 0xC3 0xA9 0x65 0x22)
+
+ RFC2231#4 (dated 1997): suggests that the following might work, and some
+ developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that
+ it is supported by firefox (but not IE7)::
+
+ #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
+
+ My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that
+  the filename= parameter is defined to be wrapped in quotes (presumably to
+ allow spaces without breaking the parsing of subsequent parameters), which
+ would give us::
+
+ #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"
+
+ However this is contrary to the examples in the email thread listed above.
+
+ Developers report that IE7 (when it is configured for UTF-8 URL encoding,
+  which is not the default in Asian countries), will accept::
+
+ #4: Content-Disposition: attachment; filename=fianc%C3%A9e
+
+ However, for maximum compatibility, Tahoe simply copies bytes from the URL
+ into the response header, rather than enforcing the utf-8 convention. This
+ means it does not try to decode the filename from the URL argument, nor does
+ it encode the filename into the response header.
+++ /dev/null
-==========================
-The Tahoe REST-ful Web API
-==========================
-
-1. `Enabling the web-API port`_
-2. `Basic Concepts: GET, PUT, DELETE, POST`_
-3. `URLs`_
-
- 1. `Child Lookup`_
-
-4. `Slow Operations, Progress, and Cancelling`_
-5. `Programmatic Operations`_
-
- 1. `Reading a file`_
- 2. `Writing/Uploading a File`_
- 3. `Creating a New Directory`_
- 4. `Get Information About A File Or Directory (as JSON)`_
- 5. `Attaching an existing File or Directory by its read- or write-cap`_
- 6. `Adding multiple files or directories to a parent directory at once`_
- 7. `Deleting a File or Directory`_
-
-6. `Browser Operations: Human-Oriented Interfaces`_
-
- 1. `Viewing A Directory (as HTML)`_
- 2. `Viewing/Downloading a File`_
- 3. `Get Information About A File Or Directory (as HTML)`_
- 4. `Creating a Directory`_
- 5. `Uploading a File`_
- 6. `Attaching An Existing File Or Directory (by URI)`_
- 7. `Deleting A Child`_
- 8. `Renaming A Child`_
- 9. `Other Utilities`_
- 10. `Debugging and Testing Features`_
-
-7. `Other Useful Pages`_
-8. `Static Files in /public_html`_
-9. `Safety and security issues -- names vs. URIs`_
-10. `Concurrency Issues`_
-
-Enabling the web-API port
-=========================
-
-Every Tahoe node is capable of running a built-in HTTP server. To enable
-this, just write a port number into the "[node]web.port" line of your node's
-tahoe.cfg file. For example, writing "web.port = 3456" into the "[node]"
-section of $NODEDIR/tahoe.cfg will cause the node to run a webserver on port
-3456.
-
-This string is actually a Twisted "strports" specification, meaning you can
-get more control over the interface to which the server binds by supplying
-additional arguments. For more details, see the documentation on
-`twisted.application.strports
-<http://twistedmatrix.com/documents/current/api/twisted.application.strports.html>`_.
-
-Writing "tcp:3456:interface=127.0.0.1" into the web.port line does the same
-but binds to the loopback interface, ensuring that only the programs on the
-local host can connect. Using "ssl:3456:privateKey=mykey.pem:certKey=cert.pem"
-runs an SSL server.
-
-This webport can be set when the node is created by passing a --webport
-option to the 'tahoe create-node' command. By default, the node listens on
-port 3456, on the loopback (127.0.0.1) interface.
-
-Basic Concepts: GET, PUT, DELETE, POST
-======================================
-
-As described in `architecture.rst`_, each file and directory in a Tahoe virtual
-filesystem is referenced by an identifier that combines the designation of
-the object with the authority to do something with it (such as read or modify
-the contents). This identifier is called a "read-cap" or "write-cap",
-depending upon whether it enables read-only or read-write access. These
-"caps" are also referred to as URIs.
-
-.. _architecture.rst: http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/architecture.rst
-
-The Tahoe web-based API is "REST-ful", meaning it implements the concepts of
-"REpresentational State Transfer": the original scheme by which the World
-Wide Web was intended to work. Each object (file or directory) is referenced
-by a URL that includes the read- or write- cap. HTTP methods (GET, PUT, and
-DELETE) are used to manipulate these objects. You can think of the URL as a
-noun, and the method as a verb.
-
-In REST, the GET method is used to retrieve information about an object, or
-to retrieve some representation of the object itself. When the object is a
-file, the basic GET method will simply return the contents of that file.
-Other variations (generally implemented by adding query parameters to the
-URL) will return information about the object, such as metadata. GET
-operations are required to have no side-effects.
-
-PUT is used to upload new objects into the filesystem, or to replace an
-existing object. DELETE is used to delete objects from the filesystem. Both
-PUT and DELETE are required to be idempotent: performing the same operation
-multiple times must have the same side-effects as only performing it once.
-
-POST is used for more complicated actions that cannot be expressed as a GET,
-PUT, or DELETE. POST operations can be thought of as a method call: sending
-some message to the object referenced by the URL. In Tahoe, POST is also used
-for operations that must be triggered by an HTML form (including upload and
-delete), because otherwise a regular web browser has no way to accomplish
-these tasks. In general, everything that can be done with a PUT or DELETE can
-also be done with a POST.
-
-Tahoe's web API is designed for two different kinds of consumer. The first is
-a program that needs to manipulate the virtual file system. Such programs are
-expected to use the RESTful interface described above. The second is a human
-using a standard web browser to work with the filesystem. This user is given
-a series of HTML pages with links to download files, and forms that use POST
-actions to upload, rename, and delete files.
-
-When an error occurs, the HTTP response code will be set to an appropriate
-400-series code (like 404 Not Found for an unknown childname, or 400 Bad Request
-when the parameters to a webapi operation are invalid), and the HTTP response
-body will usually contain a few lines of explanation as to the cause of the
-error and possible responses. Unusual exceptions may result in a 500 Internal
-Server Error as a catch-all, with a default response body containing
-a Nevow-generated HTML-ized representation of the Python exception stack trace
-that caused the problem. CLI programs which want to copy the response body to
-stderr should provide an "Accept: text/plain" header to their requests to get
-a plain text stack trace instead. If the Accept header contains ``*/*``, or
-``text/*``, or text/html (or if there is no Accept header), HTML tracebacks will
-be generated.
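-
-For example, a command-line client that wants a plain-text traceback can set
-the Accept header explicitly. This is a minimal sketch using the Python
-standard library; the URL is a placeholder for any request that fails::
-
-  import urllib.request, urllib.error
-
-  url = "http://127.0.0.1:3456/uri/NO-SUCH-CAP"   # placeholder
-  req = urllib.request.Request(url, headers={"Accept": "text/plain"})
-  try:
-      urllib.request.urlopen(req)
-  except urllib.error.HTTPError as e:
-      # the response body is a plain-text explanation or stack trace
-      print(e.code, e.read().decode("utf-8", "replace"))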
-
-URLs
-====
-
-Tahoe uses a variety of read- and write- caps to identify files and
-directories. The most common of these is the "immutable file read-cap", which
-is used for most uploaded files. These read-caps look like the following::
-
- URI:CHK:ime6pvkaxuetdfah2p2f35pe54:4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a:3:10:202
-
-The next most common is a "directory write-cap", which provides both read and
-write access to a directory, and looks like this::
-
- URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq
-
-There are also "directory read-caps", which start with "URI:DIR2-RO:", and
-give read-only access to a directory. Finally there are also mutable file
-read- and write- caps, which start with "URI:SSK", and give access to mutable
-files.
-
-(Later versions of Tahoe will make these strings shorter, and will remove the
-unfortunate colons, which must be escaped when these caps are embedded in
-URLs.)
-
-To refer to any Tahoe object through the web API, you simply need to combine
-a prefix (which indicates the HTTP server to use) with the cap (which
-indicates which object inside that server to access). Since the default Tahoe
-webport is 3456, the most common prefix is one that will use a local node
-listening on this port::
-
- http://127.0.0.1:3456/uri/ + $CAP
-
-So, to access the directory named above (which happens to be the
-publicly-writeable sample directory on the Tahoe test grid, described at
-http://allmydata.org/trac/tahoe/wiki/TestGrid), the URL would be::
-
- http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/
-
-(note that the colons in the directory-cap are url-encoded into "%3A"
-sequences).
-
-Likewise, to access the file named above, use::
-
- http://127.0.0.1:3456/uri/URI%3ACHK%3Aime6pvkaxuetdfah2p2f35pe54%3A4btz54xk3tew6nd4y2ojpxj4m6wxjqqlwnztgre6gnjgtucd5r4a%3A3%3A10%3A202
-
-In the rest of this document, we'll use "$DIRCAP" as shorthand for a read-cap
-or write-cap that refers to a directory, and "$FILECAP" to abbreviate a cap
-that refers to a file (whether mutable or immutable). So those URLs above can
-be abbreviated as::
-
- http://127.0.0.1:3456/uri/$DIRCAP/
- http://127.0.0.1:3456/uri/$FILECAP
-
-The operation summaries below will abbreviate these further, by eliding the
-server prefix. They will be displayed like this::
-
- /uri/$DIRCAP/
- /uri/$FILECAP
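-
-When constructing such URLs programmatically, remember to URL-quote the cap
-(in particular its colons), as shown above. A minimal sketch in Python, using
-the sample directory write-cap::
-
-  import urllib.parse
-
-  cap = "URI:DIR2:djrdkfawoqihigoett4g6auz6a:jx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq"
-  url = "http://127.0.0.1:3456/uri/" + urllib.parse.quote(cap, safe="") + "/"
-  # url is now http://127.0.0.1:3456/uri/URI%3ADIR2%3A.../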
-
-
-Child Lookup
-------------
-
-Tahoe directories contain named child entries, just like directories in a regular
-local filesystem. These child entries, called "dirnodes", consist of a name,
-metadata, a write slot, and a read slot. The write and read slots normally contain
-a write-cap and read-cap referring to the same object, which can be either a file
-or a subdirectory. The write slot may be empty (actually, both may be empty,
-but that is unusual).
-
-If you have a Tahoe URL that refers to a directory, and want to reference a
-named child inside it, just append the child name to the URL. For example, if
-our sample directory contains a file named "welcome.txt", we can refer to
-that file with::
-
- http://127.0.0.1:3456/uri/$DIRCAP/welcome.txt
-
-(or http://127.0.0.1:3456/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/welcome.txt)
-
-Multiple levels of subdirectories can be handled this way::
-
- http://127.0.0.1:3456/uri/$DIRCAP/tahoe-source/docs/webapi.txt
-
-In this document, when we need to refer to a URL that references a file using
-this child-of-some-directory format, we'll use the following string::
-
- /uri/$DIRCAP/[SUBDIRS../]FILENAME
-
-The "[SUBDIRS../]" part means that there are zero or more (optional)
-subdirectory names in the middle of the URL. The "FILENAME" at the end means
-that this whole URL refers to a file of some sort, rather than to a
-directory.
-
-When we need to refer specifically to a directory in this way, we'll write::
-
- /uri/$DIRCAP/[SUBDIRS../]SUBDIR
-
-
-Note that all components of pathnames in URLs are required to be UTF-8
-encoded, so "resume.doc" (with an acute accent on both E's) would be accessed
-with::
-
- http://127.0.0.1:3456/uri/$DIRCAP/r%C3%A9sum%C3%A9.doc
-
-Also note that the filenames inside upload POST forms are interpreted using
-whatever character set was provided in the conventional '_charset' field,
-which defaults to UTF-8 if not otherwise specified. The JSON representation of each
-directory contains native unicode strings. Tahoe directories are specified to
-contain unicode filenames, and cannot contain binary strings that are not
-representable as such.
-
-All Tahoe operations that refer to existing files or directories must include
-a suitable read- or write- cap in the URL: the webapi server won't add one
-for you. If you don't know the cap, you can't access the file. This allows
-the security properties of Tahoe caps to be extended across the webapi
-interface.
-
-Slow Operations, Progress, and Cancelling
-=========================================
-
-Certain operations can be expected to take a long time. The "t=deep-check",
-described below, will recursively visit every file and directory reachable
-from a given starting point, which can take minutes or even hours for
-extremely large directory structures. A single long-running HTTP request is a
-fragile thing: proxies, NAT boxes, browsers, and users may all grow impatient
-with waiting and give up on the connection.
-
-For this reason, long-running operations have an "operation handle", which
-can be used to poll for status/progress messages while the operation
-proceeds. This handle can also be used to cancel the operation. These handles
-are created by the client, and passed in as an "ophandle=" query argument
-to the POST or PUT request which starts the operation. The following
-operations can then be used to retrieve status:
-
-``GET /operations/$HANDLE?output=HTML (with or without t=status)``
-
-``GET /operations/$HANDLE?output=JSON (same)``
-
- These two retrieve the current status of the given operation. Each operation
- presents a different sort of information, but in general the page retrieved
- will indicate:
-
- * whether the operation is complete, or if it is still running
- * how much of the operation is complete, and how much is left, if possible
-
- Note that the final status output can be quite large: a deep-manifest of a
- directory structure with 300k directories and 200k unique files is about
- 275MB of JSON, and might take two minutes to generate. For this reason, the
- full status is not provided until the operation has completed.
-
- The HTML form will include a meta-refresh tag, which will cause a regular
- web browser to reload the status page about 60 seconds later. This tag will
- be removed once the operation has completed.
-
- There may be more status information available under
- /operations/$HANDLE/$ETC : i.e., the handle forms the root of a URL space.
-
-``POST /operations/$HANDLE?t=cancel``
-
- This terminates the operation, and returns an HTML page explaining what was
- cancelled. If the operation handle has already expired (see below), this
- POST will return a 404, which indicates that the operation is no longer
- running (either it was completed or terminated). The response body will be
- the same as a GET /operations/$HANDLE on this operation handle, and the
- handle will be expired immediately afterwards.
-
-The operation handle will eventually expire, to avoid consuming an unbounded
-amount of memory. The handle's time-to-live can be reset at any time, by
-passing a retain-for= argument (with a count of seconds) to either the
-initial POST that starts the operation, or the subsequent GET request which
-asks about the operation. For example, if a 'GET
-/operations/$HANDLE?output=JSON&retain-for=600' query is performed, the
-handle will remain active for 600 seconds (10 minutes) after the GET was
-received.
-
-In addition, if the GET includes a release-after-complete=True argument, and
-the operation has completed, the operation handle will be released
-immediately.
-
-If a retain-for= argument is not used, the default handle lifetimes are:
-
- * handles will remain valid at least until their operation finishes
- * uncollected handles for finished operations (i.e. handles for
- operations that have finished but for which the GET page has not been
- accessed since completion) will remain valid for four days, or for
- the total time consumed by the operation, whichever is greater.
- * collected handles (i.e. the GET page has been retrieved at least once
- since the operation completed) will remain valid for one day.
-
-Many "slow" operations can begin to use unacceptable amounts of memory when
-operating on large directory structures. The memory usage increases when the
-ophandle is polled, as the results must be copied into a JSON string, sent
-over the wire, then parsed by a client. So, as an alternative, many "slow"
-operations have streaming equivalents. These equivalents do not use operation
-handles. Instead, they emit line-oriented status results immediately. Client
-code can cancel the operation by simply closing the HTTP connection.
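-
-As an illustration, a client that starts a deep-check (described under
-"Debugging and Testing Features" below) might poll its operation handle
-roughly as follows. This is only a sketch: the dircap URL is a placeholder,
-and it relies on the "finished" key of the JSON status dictionary described
-later for t=start-deep-check::
-
-  import json, time, urllib.request
-
-  BASE = "http://127.0.0.1:3456"
-  DIR_URL = BASE + "/uri/$DIRCAP"        # placeholder: URL-quoted dircap
-  handle = "deep-check-001"              # any client-chosen string
-
-  # start the long-running operation, naming our handle
-  req = urllib.request.Request(DIR_URL + "?t=start-deep-check&ophandle=" + handle,
-                               data=b"", method="POST")
-  urllib.request.urlopen(req)
-
-  # poll the handle until the operation reports completion
-  while True:
-      with urllib.request.urlopen(BASE + "/operations/" + handle
-                                  + "?output=JSON") as resp:
-          status = json.load(resp)
-      if status.get("finished"):
-          break
-      time.sleep(10)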
-
-Programmatic Operations
-=======================
-
-Now that we know how to build URLs that refer to files and directories in a
-Tahoe virtual filesystem, what sorts of operations can we do with those URLs?
-This section contains a catalog of GET, PUT, DELETE, and POST operations that
-can be performed on these URLs. This set of operations are aimed at programs
-that use HTTP to communicate with a Tahoe node. A later section describes
-operations that are intended for web browsers.
-
-Reading A File
---------------
-
-``GET /uri/$FILECAP``
-
-``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME``
-
- This will retrieve the contents of the given file. The HTTP response body
- will contain the sequence of bytes that make up the file.
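-
- For example, a program might save a file's contents to disk with something
- like this sketch (the file cap is a placeholder)::
-
-    import urllib.request
-
-    url = "http://127.0.0.1:3456/uri/$FILECAP"   # placeholder: URL-quoted cap
-    with urllib.request.urlopen(url) as resp, open("local-copy", "wb") as f:
-        f.write(resp.read())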
-
- To view files in a web browser, you may want more control over the
- Content-Type and Content-Disposition headers. Please see the next section
- "Browser Operations", for details on how to modify these URLs for that
- purpose.
-
-Writing/Uploading A File
-------------------------
-
-``PUT /uri/$FILECAP``
-
-``PUT /uri/$DIRCAP/[SUBDIRS../]FILENAME``
-
- Upload a file, using the data from the HTTP request body, and add whatever
- child links and subdirectories are necessary to make the file available at
- the given location. Once this operation succeeds, a GET on the same URL will
- retrieve the same contents that were just uploaded. This will create any
- necessary intermediate subdirectories.
-
- To use the /uri/$FILECAP form, $FILECAP must be a write-cap for a mutable file.
-
- In the /uri/$DIRCAP/[SUBDIRS../]FILENAME form, if the target file is a
- writeable mutable file, that file's contents will be overwritten in-place. If
- it is a read-cap for a mutable file, an error will occur. If it is an
- immutable file, the old file will be discarded, and a new one will be put in
- its place.
-
- When creating a new file, if "mutable=true" is in the query arguments, the
- operation will create a mutable file instead of an immutable one.
-
- This returns the file-cap of the resulting file. If a new file was created
- by this method, the HTTP response code (as dictated by rfc2616) will be set
- to 201 CREATED. If an existing file was replaced or modified, the response
- code will be 200 OK.
-
- Note that the 'curl -T localfile http://127.0.0.1:3456/uri/$DIRCAP/foo.txt'
- command can be used to invoke this operation.
-
-``PUT /uri``
-
- This uploads a file, and produces a file-cap for the contents, but does not
- attach the file into the filesystem. No directories will be modified by
- this operation. The file-cap is returned as the body of the HTTP response.
-
- If "mutable=true" is in the query arguments, the operation will create a
- mutable file, and return its write-cap in the HTTP response. The default is
- to create an immutable file, returning the read-cap as a response.
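-
- A minimal sketch of such an unlinked upload from Python; the response body
- is the new file-cap::
-
-    import urllib.request
-
-    data = open("localfile", "rb").read()
-    req = urllib.request.Request("http://127.0.0.1:3456/uri",
-                                 data=data, method="PUT")
-    with urllib.request.urlopen(req) as resp:
-        filecap = resp.read().decode("ascii").strip()
-    # filecap is now something like URI:CHK:...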
-
-Creating A New Directory
-------------------------
-
-``POST /uri?t=mkdir``
-
-``PUT /uri?t=mkdir``
-
- Create a new empty directory and return its write-cap as the HTTP response
- body. This does not make the newly created directory visible from the
- filesystem. The "PUT" operation is provided for backwards compatibility:
- new code should use POST.
-
-``POST /uri?t=mkdir-with-children``
-
- Create a new directory, populated with a set of child nodes, and return its
- write-cap as the HTTP response body. The new directory is not attached to
- any other directory: the returned write-cap is the only reference to it.
-
- Initial children are provided as the body of the POST form (this is more
- efficient than doing separate mkdir and set_children operations). If the
- body is empty, the new directory will be empty. If not empty, the body will
- be interpreted as a UTF-8 JSON-encoded dictionary of children with which the
- new directory should be populated, using the same format as would be
- returned in the 'children' value of the t=json GET request, described below.
- Each dictionary key should be a child name, and each value should be a list
- of [TYPE, PROPDICT], where PROPDICT contains "rw_uri", "ro_uri", and
- "metadata" keys (all others are ignored). For example, the PUT request body
- could be::
-
- {
- "Fran\u00e7ais": [ "filenode", {
- "ro_uri": "URI:CHK:...",
- "size": bytes,
- "metadata": {
- "ctime": 1202777696.7564139,
- "mtime": 1202777696.7564139,
- "tahoe": {
- "linkcrtime": 1202777696.7564139,
- "linkmotime": 1202777696.7564139
- } } } ],
- "subdir": [ "dirnode", {
- "rw_uri": "URI:DIR2:...",
- "ro_uri": "URI:DIR2-RO:...",
- "metadata": {
- "ctime": 1202778102.7589991,
- "mtime": 1202778111.2160511,
- "tahoe": {
- "linkcrtime": 1202777696.7564139,
- "linkmotime": 1202777696.7564139
- } } } ]
- }
-
- For forward-compatibility, a mutable directory can also contain caps in
- a format that is unknown to the webapi server. When such caps are retrieved
- from a mutable directory in a "ro_uri" field, they will be prefixed with
- the string "ro.", indicating that they must not be decoded without
- checking that they are read-only. The "ro." prefix must not be stripped
- off without performing this check. (Future versions of the webapi server
- will perform it where necessary.)
-
- If both the "rw_uri" and "ro_uri" fields are present in a given PROPDICT,
- and the webapi server recognizes the rw_uri as a write cap, then it will
- reset the ro_uri to the corresponding read cap and discard the original
- contents of ro_uri (in order to ensure that the two caps correspond to the
- same object and that the ro_uri is in fact read-only). However this may not
- happen for caps in a format unknown to the webapi server. Therefore, when
- writing a directory the webapi client should ensure that the contents
- of "rw_uri" and "ro_uri" for a given PROPDICT are a consistent
- (write cap, read cap) pair if possible. If the webapi client only has
- one cap and does not know whether it is a write cap or read cap, then
- it is acceptable to set "rw_uri" to that cap and omit "ro_uri". The
- client must not put a write cap into a "ro_uri" field.
-
- The metadata may have a "no-write" field. If this is set to true in the
- metadata of a link, it will not be possible to open that link for writing
- via the SFTP frontend; see `FTP-and-SFTP.rst`_ for details.
- Also, if the "no-write" field is set to true in the metadata of a link to
- a mutable child, it will cause the link to be diminished to read-only.
-
- .. _FTP-and-SFTP.rst: http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/frontends/FTP-and-SFTP.rst
-
- Note that the webapi-using client application must not provide the
- "Content-Type: multipart/form-data" header that usually accompanies HTML
- form submissions, since the body is not formatted this way. Doing so will
- cause a server error as the lower-level code misparses the request body.
-
- Child file names should each be expressed as a unicode string, then used as
- keys of the dictionary. The dictionary should then be converted into JSON,
- and the resulting string encoded into UTF-8. This UTF-8 bytestring should
- then be used as the POST body.
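-
- Putting those rules together, a webapi client might build and submit the
- request body roughly like this (a sketch only; the child cap shown is a
- placeholder)::
-
-    import json, urllib.request
-
-    children = {
-        "Fran\u00e7ais": ["filenode", {"ro_uri": "URI:CHK:...",  # placeholder
-                                       "metadata": {}}],
-    }
-    body = json.dumps(children).encode("utf-8")
-    # note: do NOT send a multipart/form-data Content-Type header
-    req = urllib.request.Request("http://127.0.0.1:3456/uri?t=mkdir-with-children",
-                                 data=body, method="POST")
-    with urllib.request.urlopen(req) as resp:
-        dircap = resp.read().decode("ascii").strip()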
-
-``POST /uri?t=mkdir-immutable``
-
- Like t=mkdir-with-children above, but the new directory will be
- deep-immutable. This means that the directory itself is immutable, and that
- it can only contain objects that are treated as being deep-immutable, like
- immutable files, literal files, and deep-immutable directories.
-
- For forward-compatibility, a deep-immutable directory can also contain caps
- in a format that is unknown to the webapi server. When such caps are retrieved
- from a deep-immutable directory in a "ro_uri" field, they will be prefixed
- with the string "imm.", indicating that they must not be decoded without
- checking that they are immutable. The "imm." prefix must not be stripped
- off without performing this check. (Future versions of the webapi server
- will perform it where necessary.)
-
- The cap for each child may be given either in the "rw_uri" or "ro_uri"
- field of the PROPDICT (not both). If a cap is given in the "rw_uri" field,
- then the webapi server will check that it is an immutable read-cap of a
- *known* format, and give an error if it is not. If a cap is given in the
- "ro_uri" field, then the webapi server will still check whether known
- caps are immutable, but for unknown caps it will simply assume that the
- cap can be stored, as described above. Note that an attacker would be
- able to store any cap in an immutable directory, so this check when
- creating the directory is only to help non-malicious clients to avoid
- accidentally giving away more authority than intended.
-
- A non-empty request body is mandatory, since after the directory is created,
- it will not be possible to add more children to it.
-
-``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir``
-
-``PUT /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir``
-
- Create new directories as necessary to make sure that the named target
- ($DIRCAP/SUBDIRS../SUBDIR) is a directory. This will create additional
- intermediate mutable directories as necessary. If the named target directory
- already exists, this will make no changes to it.
-
- If the final directory is created, it will be empty.
-
- This operation will return an error if a blocking file is present at any of
- the parent names, preventing the server from creating the necessary parent
- directory; or if it would require changing an immutable directory.
-
- The write-cap of the new directory will be returned as the HTTP response
- body.
-
-``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-with-children``
-
- Like /uri?t=mkdir-with-children, but the final directory is created as a
- child of an existing mutable directory. This will create additional
- intermediate mutable directories as necessary. If the final directory is
- created, it will be populated with initial children from the POST request
- body, as described above.
-
- This operation will return an error if a blocking file is present at any of
- the parent names, preventing the server from creating the necessary parent
- directory; or if it would require changing an immutable directory; or if
- the immediate parent directory already has a child named SUBDIR.
-
-``POST /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir-immutable``
-
- Like /uri?t=mkdir-immutable, but the final directory is created as a child
- of an existing mutable directory. The final directory will be deep-immutable,
- and will be populated with the children specified as a JSON dictionary in
- the POST request body.
-
- In Tahoe 1.6 this operation creates intermediate mutable directories if
- necessary, but that behaviour should not be relied on; see ticket #920.
-
- This operation will return an error if the parent directory is immutable,
- or already has a child named SUBDIR.
-
-``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME``
-
- Create a new empty mutable directory and attach it to the given existing
- directory. This will create additional intermediate directories as necessary.
-
- This operation will return an error if a blocking file is present at any of
- the parent names, preventing the server from creating the necessary parent
- directory, or if it would require changing any immutable directory.
-
- The URL of this operation points to the parent of the bottommost new directory,
- whereas the /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=mkdir operation above has a URL
- that points directly to the bottommost new directory.
-
-``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME``
-
- Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=NAME, but the new directory will
- be populated with initial children via the POST request body. This command
- will create additional intermediate mutable directories as necessary.
-
- This operation will return an error if a blocking file is present at any of
- the parent names, preventing the server from creating the necessary parent
- directory; or if it would require changing an immutable directory; or if
- the immediate parent directory already has a child named NAME.
-
- Note that the name= argument must be passed as a queryarg, because the POST
- request body is used for the initial children JSON.
-
-``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-immutable&name=NAME``
-
- Like /uri/$DIRCAP/[SUBDIRS../]?t=mkdir-with-children&name=NAME, but the
- final directory will be deep-immutable. The children are specified as a
- JSON dictionary in the POST request body. Again, the name= argument must be
- passed as a queryarg.
-
- In Tahoe 1.6 this operation creates intermediate mutable directories if
- necessary, but that behaviour should not be relied on; see ticket #920.
-
- This operation will return an error if the parent directory is immutable,
- or already has a child named NAME.
-
-Get Information About A File Or Directory (as JSON)
----------------------------------------------------
-
-``GET /uri/$FILECAP?t=json``
-
-``GET /uri/$DIRCAP?t=json``
-
-``GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json``
-
-``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json``
-
- This returns a machine-parseable JSON-encoded description of the given
- object. The JSON always contains a list, and the first element of the list is
- always a flag that indicates whether the referenced object is a file or a
- directory. If it is a capability to a file, then the information includes
- file size and URI, like this::
-
- GET /uri/$FILECAP?t=json :
-
- [ "filenode", {
- "ro_uri": file_uri,
- "verify_uri": verify_uri,
- "size": bytes,
- "mutable": false
- } ]
-
- If it is a capability to a directory followed by a path from that directory
- to a file, then the information also includes metadata from the link to the
- file in the parent directory, like this::
-
- GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=json
-
- [ "filenode", {
- "ro_uri": file_uri,
- "verify_uri": verify_uri,
- "size": bytes,
- "mutable": false,
- "metadata": {
- "ctime": 1202777696.7564139,
- "mtime": 1202777696.7564139,
- "tahoe": {
- "linkcrtime": 1202777696.7564139,
- "linkmotime": 1202777696.7564139
- } } } ]
-
- If it is a directory, then it includes information about the children of
- this directory, as a mapping from child name to a set of data about the
- child (the same data that would appear in a corresponding GET?t=json of the
- child itself). The child entries also include metadata about each child,
- including link-creation- and link-change- timestamps. The output looks like
- this::
-
- GET /uri/$DIRCAP?t=json :
- GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR?t=json :
-
- [ "dirnode", {
- "rw_uri": read_write_uri,
- "ro_uri": read_only_uri,
- "verify_uri": verify_uri,
- "mutable": true,
- "children": {
- "foo.txt": [ "filenode", {
- "ro_uri": uri,
- "size": bytes,
- "metadata": {
- "ctime": 1202777696.7564139,
- "mtime": 1202777696.7564139,
- "tahoe": {
- "linkcrtime": 1202777696.7564139,
- "linkmotime": 1202777696.7564139
- } } } ],
- "subdir": [ "dirnode", {
- "rw_uri": rwuri,
- "ro_uri": rouri,
- "metadata": {
- "ctime": 1202778102.7589991,
- "mtime": 1202778111.2160511,
- "tahoe": {
- "linkcrtime": 1202777696.7564139,
- "linkmotime": 1202777696.7564139
- } } } ]
- } } ]
-
- In the above example, note how 'children' is a dictionary in which the keys
- are child names and the values depend upon whether the child is a file or a
- directory. The value is mostly the same as the JSON representation of the
- child object (except that directories do not recurse -- the "children"
- entry of the child is omitted, and the directory view includes the metadata
- that is stored on the directory edge).
-
- The rw_uri field will be present in the information about a directory
- if and only if you have read-write access to that directory. The verify_uri
- field will be present if and only if the object has a verify-cap
- (non-distributed LIT files do not have verify-caps).
-
- If the cap is of an unknown format, then the file size and verify_uri will
- not be available::
-
- GET /uri/$UNKNOWNCAP?t=json :
-
- [ "unknown", {
- "ro_uri": unknown_read_uri
- } ]
-
- GET /uri/$DIRCAP/[SUBDIRS../]UNKNOWNCHILDNAME?t=json :
-
- [ "unknown", {
- "rw_uri": unknown_write_uri,
- "ro_uri": unknown_read_uri,
- "mutable": true,
- "metadata": {
- "ctime": 1202777696.7564139,
- "mtime": 1202777696.7564139,
- "tahoe": {
- "linkcrtime": 1202777696.7564139,
- "linkmotime": 1202777696.7564139
- } } } ]
-
- As in the case of file nodes, the metadata will only be present when the
- capability is to a directory followed by a path. The "mutable" field is also
- not always present; when it is absent, the mutability of the object is not
- known.
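-
- As a usage sketch, a program can list a directory by parsing this JSON (the
- dircap URL is a placeholder)::
-
-    import json, urllib.request
-
-    url = "http://127.0.0.1:3456/uri/$DIRCAP?t=json"   # placeholder
-    with urllib.request.urlopen(url) as resp:
-        nodetype, info = json.load(resp)
-    assert nodetype == "dirnode"
-    for name, (childtype, childinfo) in sorted(info["children"].items()):
-        print(childtype, name, childinfo.get("ro_uri"))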
-
-About the metadata
-``````````````````
-
-The value of the 'tahoe':'linkmotime' key is updated whenever a link to a
-child is set. The value of the 'tahoe':'linkcrtime' key is updated whenever
-a link to a child is created -- i.e. when there was not previously a link
-under that name.
-
-Note however, that if the edge in the Tahoe filesystem points to a mutable
-file and the contents of that mutable file is changed, then the
-'tahoe':'linkmotime' value on that edge will *not* be updated, since the
-edge itself wasn't updated -- only the mutable file was.
-
-The timestamps are represented as a number of seconds since the UNIX epoch
-(1970-01-01 00:00:00 UTC), with leap seconds not being counted in the long
-term.
-
-In Tahoe earlier than v1.4.0, 'mtime' and 'ctime' keys were populated
-instead of the 'tahoe':'linkmotime' and 'tahoe':'linkcrtime' keys. Starting
-in Tahoe v1.4.0, the 'linkmotime'/'linkcrtime' keys in the 'tahoe' sub-dict
-are populated. However, prior to Tahoe v1.7beta, a bug caused the 'tahoe'
-sub-dict to be deleted by webapi requests in which new metadata is
-specified, and not to be added to existing child links that lack it.
-
-From Tahoe v1.7.0 onward, the 'mtime' and 'ctime' fields are no longer
-populated or updated (see ticket #924), except by "tahoe backup" as
-explained below. For backward compatibility, when an existing link is
-updated and 'tahoe':'linkcrtime' is not present in the previous metadata
-but 'ctime' is, the old value of 'ctime' is used as the new value of
-'tahoe':'linkcrtime'.
-
-The reason we added the new fields in Tahoe v1.4.0 is that there is a
-"set_children" API (described below) which you can use to overwrite the
-values of the 'mtime'/'ctime' pair, and this API is used by the
-"tahoe backup" command (in Tahoe v1.3.0 and later) to set the 'mtime' and
-'ctime' values when backing up files from a local filesystem into the
-Tahoe filesystem. As of Tahoe v1.4.0, the set_children API cannot be used
-to set anything under the 'tahoe' key of the metadata dict -- if you
-include 'tahoe' keys in your 'metadata' arguments then it will silently
-ignore those keys.
-
-Therefore, if the 'tahoe' sub-dict is present, you can rely on the
-'linkcrtime' and 'linkmotime' values therein to have the semantics described
-above. (This is assuming that only official Tahoe clients have been used to
-write those links, and that their system clocks were set to what you expected
--- there is nothing preventing someone from editing their Tahoe client or
-writing their own Tahoe client which would overwrite those values however
-they like, and there is nothing to constrain their system clock from taking
-any value.)
-
-When an edge is created or updated by "tahoe backup", the 'mtime' and
-'ctime' keys on that edge are set as follows:
-
-* 'mtime' is set to the timestamp read from the local filesystem for the
- "mtime" of the local file in question, which means the last time the
- contents of that file were changed.
-
-* On Windows, 'ctime' is set to the creation timestamp for the file
- read from the local filesystem. On other platforms, 'ctime' is set to
- the UNIX "ctime" of the local file, which means the last time that
- either the contents or the metadata of the local file was changed.
-
-There are several ways that the 'ctime' field could be confusing:
-
-1. You might be confused about whether it reflects the time of the creation
- of a link in the Tahoe filesystem (by a version of Tahoe < v1.7.0) or a
- timestamp copied in by "tahoe backup" from a local filesystem.
-
-2. You might be confused about whether it is a copy of the file creation
- time (if "tahoe backup" was run on a Windows system) or of the last
- contents-or-metadata change (if "tahoe backup" was run on a different
- operating system).
-
-3. You might be confused by the fact that changing the contents of a
- mutable file in Tahoe doesn't have any effect on any links pointing at
- that file in any directories, although "tahoe backup" sets the link
- 'ctime'/'mtime' to reflect timestamps about the local file corresponding
- to the Tahoe file to which the link points.
-
-4. Also, quite apart from Tahoe, you might be confused about the meaning
- of the "ctime" in UNIX local filesystems, which people sometimes think
- means file creation time, but which actually means, in UNIX local
- filesystems, the most recent time that the file contents or the file
- metadata (such as owner, permission bits, extended attributes, etc.)
- has changed. Note that although "ctime" does not mean file creation time
- in UNIX, links created by a version of Tahoe prior to v1.7.0, and never
- written by "tahoe backup", will have 'ctime' set to the link creation
- time.
-
-
-Attaching an existing File or Directory by its read- or write-cap
------------------------------------------------------------------
-
-``PUT /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri``
-
- This attaches a child object (either a file or directory) to a specified
- location in the virtual filesystem. The child object is referenced by its
- read- or write- cap, as provided in the HTTP request body. This will create
- intermediate directories as necessary.
-
- This is similar to a UNIX hardlink: by referencing a previously-uploaded file
- (or previously-created directory) instead of uploading/creating a new one,
- you can create two references to the same object.
-
- The read- or write- cap of the child is provided in the body of the HTTP
- request, and this same cap is returned in the response body.
-
- The default behavior is to overwrite any existing object at the same
- location. To prevent this (and make the operation return an error instead
- of overwriting), add a "replace=false" argument, as "?t=uri&replace=false".
- With replace=false, this operation will return an HTTP 409 "Conflict" error
- if there is already an object at the given location, rather than
- overwriting the existing object. To allow the operation to overwrite a
- file, but return an error when trying to overwrite a directory, use
- "replace=only-files" (this behavior is closer to the traditional UNIX "mv"
- command). Note that "true", "t", and "1" are all synonyms for "True", and
- "false", "f", and "0" are synonyms for "False", and the parameter is
- case-insensitive.
-
- Note that this operation does not take its child cap in the form of
- separate "rw_uri" and "ro_uri" fields. Therefore, it cannot accept a
- child cap in a format unknown to the webapi server, unless its URI
- starts with "ro." or "imm.". This restriction is necessary because the
- server is not able to attenuate an unknown write cap to a read cap.
- Unknown URIs starting with "ro." or "imm.", on the other hand, are
- assumed to represent read caps. The client should not prefix a write
- cap with "ro." or "imm." and pass it to this operation, since that
- would result in granting the cap's write authority to holders of the
- directory read cap.
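-
- A sketch of attaching a known cap under a new name while refusing to
- overwrite an existing child (the cap and dircap are placeholders)::
-
-    import urllib.request, urllib.error
-
-    childcap = b"URI:CHK:..."                            # placeholder cap
-    url = ("http://127.0.0.1:3456/uri/$DIRCAP/new-link"  # placeholder dircap
-           "?t=uri&replace=false")
-    req = urllib.request.Request(url, data=childcap, method="PUT")
-    try:
-        urllib.request.urlopen(req)
-    except urllib.error.HTTPError as e:
-        if e.code == 409:
-            print("a child named 'new-link' already exists")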
-
-Adding multiple files or directories to a parent directory at once
-------------------------------------------------------------------
-
-``POST /uri/$DIRCAP/[SUBDIRS..]?t=set_children``
-
-``POST /uri/$DIRCAP/[SUBDIRS..]?t=set-children`` (Tahoe >= v1.6)
-
- This command adds multiple children to a directory in a single operation.
- It reads the request body and interprets it as a JSON-encoded description
- of the child names and read/write-caps that should be added.
-
- The body should be a JSON-encoded dictionary, in the same format as the
- "children" value returned by the "GET /uri/$DIRCAP?t=json" operation
- described above. In this format, each key is a child name, and the
- corresponding value is a tuple of (type, childinfo). "type" is ignored, and
- "childinfo" is a dictionary that contains "rw_uri", "ro_uri", and
- "metadata" keys. You can take the output of "GET /uri/$DIRCAP1?t=json" and
- use it as the input to "POST /uri/$DIRCAP2?t=set_children" to make DIR2
- look very much like DIR1 (except for any existing children of DIR2 that
- were not overwritten, and any existing "tahoe" metadata keys as described
- below).
-
- When the set_children request contains a child name that already exists in
- the target directory, this command defaults to overwriting that child with
- the new value (both child cap and metadata, but if the JSON data does not
- contain a "metadata" key, the old child's metadata is preserved). The
- command takes a boolean "overwrite=" query argument to control this
- behavior. If you use "?t=set_children&overwrite=false", then an attempt to
- replace an existing child will instead cause an error.
-
- Any "tahoe" key in the new child's "metadata" value is ignored. Any
- existing "tahoe" metadata is preserved. The metadata["tahoe"] value is
- reserved for metadata generated by the tahoe node itself. The only two keys
- currently placed here are "linkcrtime" and "linkmotime". For details, see
- the section above entitled "Get Information About A File Or Directory (as
- JSON)", in the "About the metadata" subsection.
-
- Note that this command was introduced with the name "set_children", which
- uses an underscore rather than a hyphen as other multi-word command names
- do. The variant with a hyphen is now accepted, but clients that desire
- backward compatibility should continue to use "set_children".
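-
- For example, the following sketch copies the children of one directory into
- another, as described above (both dircaps are placeholders)::
-
-    import json, urllib.request
-
-    BASE = "http://127.0.0.1:3456/uri/"
-    with urllib.request.urlopen(BASE + "$DIRCAP1?t=json") as resp:
-        children = json.load(resp)[1]["children"]
-    body = json.dumps(children).encode("utf-8")
-    req = urllib.request.Request(BASE + "$DIRCAP2?t=set_children",
-                                 data=body, method="POST")
-    urllib.request.urlopen(req)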
-
-
-Deleting a File or Directory
-----------------------------
-
-``DELETE /uri/$DIRCAP/[SUBDIRS../]CHILDNAME``
-
- This removes the given name from its parent directory. CHILDNAME is the
- name to be removed, and $DIRCAP/SUBDIRS.. indicates the directory that will
- be modified.
-
- Note that this does not actually delete the file or directory that the name
- points to from the tahoe grid -- it only removes the named reference from
- this directory. If there are other names in this directory or in other
- directories that point to the resource, then it will remain accessible
- through those paths. Even if all names pointing to this object are removed
- from their parent directories, someone with possession of its read-cap
- can continue to access the object through that cap.
-
- The object will only become completely unreachable once 1: there are no
- reachable directories that reference it, and 2: nobody is holding a read-
- or write- cap to the object. (This behavior is very similar to the way
- hardlinks and anonymous files work in traditional UNIX filesystems).
-
- This operation will not modify more than a single directory. Intermediate
- directories which were implicitly created by PUT or POST methods will *not*
- be automatically removed by DELETE.
-
- This method returns the file- or directory- cap of the object that was just
- removed.
-
-Browser Operations: Human-oriented interfaces
-=============================================
-
-This section describes the HTTP operations that provide support for humans
-running a web browser. Most of these operations use HTML forms that use POST
-to drive the Tahoe node. This section is intended for HTML authors who want
-to write web pages that contain forms and buttons which manipulate the Tahoe
-filesystem.
-
-Note that for all POST operations, the arguments listed can be provided
-either as URL query arguments or as form body fields. URL query arguments are
-separated from the main URL by "?", and from each other by "&". For example,
-"POST /uri/$DIRCAP?t=upload&mutable=true". Form body fields are usually
-specified by using <input type="hidden"> elements. For clarity, the
-descriptions below display the most significant arguments as URL query args.
-
-Viewing A Directory (as HTML)
------------------------------
-
-``GET /uri/$DIRCAP/[SUBDIRS../]``
-
- This returns an HTML page, intended to be displayed to a human by a web
- browser, which contains HREF links to all files and directories reachable
- from this directory. These HREF links do not have a t= argument, meaning
- that a human who follows them will get pages also meant for a human. It also
- contains forms to upload new files, and to delete files and directories.
- Those forms use POST methods to do their job.
-
-Viewing/Downloading a File
---------------------------
-
-``GET /uri/$FILECAP``
-
-``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME``
-
- This will retrieve the contents of the given file. The HTTP response body
- will contain the sequence of bytes that make up the file.
-
- If you want the HTTP response to include a useful Content-Type header,
- either use the second form (which starts with a $DIRCAP), or add a
- "filename=foo" query argument, like "GET /uri/$FILECAP?filename=foo.jpg".
- The bare "GET /uri/$FILECAP" does not give the Tahoe node enough information
- to determine a Content-Type (since Tahoe immutable files are merely
- sequences of bytes, not typed+named file objects).
-
- If the URL has both filename= and "save=true" in the query arguments, then
- the server will add a "Content-Disposition: attachment" header, along with a
- filename= parameter. When a user clicks on such a link, most browsers will
- offer to let the user save the file instead of displaying it inline (indeed,
- most browsers will refuse to display it inline). "true", "t", "1", and other
- case-insensitive equivalents are all treated the same.
-
- Character-set handling in URLs and HTTP headers is a dubious art [1]_. For
- maximum compatibility, Tahoe simply copies the bytes from the filename=
- argument into the Content-Disposition header's filename= parameter, without
- trying to interpret them in any particular way.
-
-
-``GET /named/$FILECAP/FILENAME``
-
- This is an alternate download form which makes it easier to get the correct
- filename. The Tahoe server will provide the contents of the given file, with
- a Content-Type header derived from the given filename. This form is used to
- get browsers to use the "Save Link As" feature correctly, and also helps
- command-line tools like "wget" and "curl" use the right filename. Note that
- this form can *only* be used with file caps; it is an error to use a
- directory cap after the /named/ prefix.
-
-Get Information About A File Or Directory (as HTML)
----------------------------------------------------
-
-``GET /uri/$FILECAP?t=info``
-
-``GET /uri/$DIRCAP/?t=info``
-
-``GET /uri/$DIRCAP/[SUBDIRS../]SUBDIR/?t=info``
-
-``GET /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=info``
-
- This returns a human-oriented HTML page with more detail about the selected
- file or directory object. This page contains the following items:
-
- * object size
- * storage index
- * JSON representation
- * raw contents (text/plain)
- * access caps (URIs): verify-cap, read-cap, write-cap (for mutable objects)
- * check/verify/repair form
- * deep-check/deep-size/deep-stats/manifest (for directories)
- * replace-contents form (for mutable files)
-
-Creating a Directory
---------------------
-
-``POST /uri?t=mkdir``
-
- This creates a new empty directory, but does not attach it to the virtual
- filesystem.
-
- If a "redirect_to_result=true" argument is provided, then the HTTP response
- will cause the web browser to be redirected to a /uri/$DIRCAP page that
- gives access to the newly-created directory. If you bookmark this page,
- you'll be able to get back to the directory again in the future. This is the
- recommended way to start working with a Tahoe server: create a new unlinked
- directory (using redirect_to_result=true), then bookmark the resulting
- /uri/$DIRCAP page. There is a "create directory" button on the Welcome page
- to invoke this action.
-
- If "redirect_to_result=true" is not provided (or is given a value of
- "false"), then the HTTP response body will simply be the write-cap of the
- new directory.
-
-``POST /uri/$DIRCAP/[SUBDIRS../]?t=mkdir&name=CHILDNAME``
-
- This creates a new empty directory as a child of the designated SUBDIR. This
- will create additional intermediate directories as necessary.
-
- If a "when_done=URL" argument is provided, the HTTP response will cause the
- web browser to redirect to the given URL. This provides a convenient way to
- return the browser to the directory that was just modified. Without a
- when_done= argument, the HTTP response will simply contain the write-cap of
- the directory that was just created.
-
-
-Uploading a File
-----------------
-
-``POST /uri?t=upload``
-
- This uploads a file, and produces a file-cap for the contents, but does not
- attach the file into the filesystem. No directories will be modified by
- this operation.
-
- The file must be provided as the "file" field of an HTML encoded form body,
- produced in response to an HTML form like this::
-
- <form action="/uri" method="POST" enctype="multipart/form-data">
- <input type="hidden" name="t" value="upload" />
- <input type="file" name="file" />
- <input type="submit" value="Upload Unlinked" />
- </form>
-
- If a "when_done=URL" argument is provided, the response body will cause the
- browser to redirect to the given URL. If the when_done= URL has the string
- "%(uri)s" in it, that string will be replaced by a URL-escaped form of the
- newly created file-cap. (Note that without this substitution, there is no
- way to access the file that was just uploaded).
-
- The default (in the absence of when_done=) is to return an HTML page that
- describes the results of the upload. This page will contain information
- about which storage servers were used for the upload, how long each
- operation took, etc.
-
- If a "mutable=true" argument is provided, the operation will create a
- mutable file, and the response body will contain the write-cap instead of
- the upload results page. The default is to create an immutable file,
- returning the upload results page as a response.
-
-
-``POST /uri/$DIRCAP/[SUBDIRS../]?t=upload``
-
- This uploads a file, and attaches it as a new child of the given directory,
- which must be mutable. The file must be provided as the "file" field of an
- HTML-encoded form body, produced in response to an HTML form like this::
-
- <form action="." method="POST" enctype="multipart/form-data">
- <input type="hidden" name="t" value="upload" />
- <input type="file" name="file" />
- <input type="submit" value="Upload" />
- </form>
-
- A "name=" argument can be provided to specify the new child's name,
- otherwise it will be taken from the "filename" field of the upload form
- (most web browsers will copy the last component of the original file's
- pathname into this field). To avoid confusion, name= is not allowed to
- contain a slash.
-
- If there is already a child with that name, and it is a mutable file, then
- its contents are replaced with the data being uploaded. If it is not a
- mutable file, the default behavior is to remove the existing child before
- creating a new one. To prevent this (and make the operation return an error
- instead of overwriting the old child), add a "replace=false" argument, as
- "?t=upload&replace=false". With replace=false, this operation will return an
- HTTP 409 "Conflict" error if there is already an object at the given
- location, rather than overwriting the existing object. Note that "true",
- "t", and "1" are all synonyms for "True", and "false", "f", and "0" are
- synonyms for "False", and the parameter is case-insensitive.
-
- This will create additional intermediate directories as necessary, although
- since it is expected to be triggered by a form that was retrieved by "GET
- /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
- already exist.
-
- If a "mutable=true" argument is provided, any new file that is created will
- be a mutable file instead of an immutable one. <input type="checkbox"
- name="mutable" /> will give the user a way to set this option.
-
- If a "when_done=URL" argument is provided, the HTTP response will cause the
- web browser to redirect to the given URL. This provides a convenient way to
- return the browser to the directory that was just modified. Without a
- when_done= argument, the HTTP response will simply contain the file-cap of
- the file that was just uploaded (a write-cap for mutable files, or a
- read-cap for immutable files).
-
-``POST /uri/$DIRCAP/[SUBDIRS../]FILENAME?t=upload``
-
- This also uploads a file and attaches it as a new child of the given
- directory, which must be mutable. It is a slight variant of the previous
- operation, as the URL refers to the target file rather than the parent
- directory. It is otherwise identical: this accepts mutable= and when_done=
- arguments too.
-
-``POST /uri/$FILECAP?t=upload``
-
- This modifies the contents of an existing mutable file in-place. An error is
- signalled if $FILECAP does not refer to a mutable file. It behaves just like
- the "PUT /uri/$FILECAP" form, but uses a POST for the benefit of HTML forms
- in a web browser.
-
-Attaching An Existing File Or Directory (by URI)
-------------------------------------------------
-
-``POST /uri/$DIRCAP/[SUBDIRS../]?t=uri&name=CHILDNAME&uri=CHILDCAP``
-
- This attaches a given read- or write- cap "CHILDCAP" to the designated
- directory, with a specified child name. This behaves much like the PUT t=uri
- operation, and is a lot like a UNIX hardlink. It is subject to the same
- restrictions as that operation on the use of cap formats unknown to the
- webapi server.
-
- This will create additional intermediate directories as necessary, although
- since it is expected to be triggered by a form that was retrieved by "GET
- /uri/$DIRCAP/[SUBDIRS../]", it is likely that the parent directory will
- already exist.
-
- This accepts the same replace= argument as POST t=upload.
-
-Deleting A Child
-----------------
-
-``POST /uri/$DIRCAP/[SUBDIRS../]?t=delete&name=CHILDNAME``
-
- This instructs the node to remove a child object (file or subdirectory) from
- the given directory, which must be mutable. Note that the entire subtree is
- unlinked from the parent. Unlike deleting a subdirectory in a UNIX local
- filesystem, the subtree need not be empty; if it isn't, then other references
- into the subtree will see that the child subdirectories are not modified by
- this operation. Only the link from the given directory to its child is severed.
-
-Renaming A Child
-----------------
-
-``POST /uri/$DIRCAP/[SUBDIRS../]?t=rename&from_name=OLD&to_name=NEW``
-
- This instructs the node to rename a child of the given directory, which must
- be mutable. This has a similar effect to removing the child, then adding the
- same child-cap under the new name, except that it preserves metadata. This
- operation cannot move the child to a different directory.
-
- This operation will replace any existing child of the new name, making it
- behave like the UNIX "``mv -f``" command.
-
-Other Utilities
----------------
-
-``GET /uri?uri=$CAP``
-
- This causes a redirect to /uri/$CAP, and retains any additional query
- arguments (like filename= or save=). This is for the convenience of web
- forms which allow the user to paste in a read- or write- cap (obtained
- through some out-of-band channel, like IM or email).
-
- Note that this form merely redirects to the specific file or directory
- indicated by the $CAP: unlike the GET /uri/$DIRCAP form, you cannot
- traverse to children by appending additional path segments to the URL.
-
-``GET /uri/$DIRCAP/[SUBDIRS../]?t=rename-form&name=$CHILDNAME``
-
- This provides a useful facility to browser-based user interfaces. It
- returns a page containing a form targeting the "POST $DIRCAP t=rename"
- functionality described above, with the provided $CHILDNAME present in the
- 'from_name' field of that form. I.e. this presents a form offering to
- rename $CHILDNAME, requesting the new name, and submitting POST rename.
-
-``GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=uri``
-
- This returns the file- or directory- cap for the specified object.
-
-``GET /uri/$DIRCAP/[SUBDIRS../]CHILDNAME?t=readonly-uri``
-
- This returns a read-only file- or directory- cap for the specified object.
- If the object is an immutable file, this will return the same value as
- t=uri.
-
-Debugging and Testing Features
-------------------------------
-
-These URLs are less likely to be helpful to the casual Tahoe user, and are
-mainly intended for developers.
-
-``POST $URL?t=check``
-
- This triggers the FileChecker to determine the current "health" of the
- given file or directory, by counting how many shares are available. The
- page that is returned will display the results. This can be used as a "show
- me detailed information about this file" page.
-
- If a verify=true argument is provided, the node will perform a more
- intensive check, downloading and verifying every single bit of every share.
-
- If an add-lease=true argument is provided, the node will also add (or
- renew) a lease to every share it encounters. Each lease will keep the share
- alive for a certain period of time (one month by default). Once the last
- lease expires or is explicitly cancelled, the storage server is allowed to
- delete the share.
-
- If an output=JSON argument is provided, the response will be
- machine-readable JSON instead of human-oriented HTML. The data is a
- dictionary with the following keys::
-
-  storage-index: a base32-encoded string with the object's storage index,
- or an empty string for LIT files
- summary: a string, with a one-line summary of the stats of the file
- results: a dictionary that describes the state of the file. For LIT files,
- this dictionary has only the 'healthy' key, which will always be
- True. For distributed files, this dictionary has the following
- keys:
- count-shares-good: the number of good shares that were found
- count-shares-needed: 'k', the number of shares required for recovery
- count-shares-expected: 'N', the number of total shares generated
- count-good-share-hosts: this was intended to be the number of distinct
- storage servers with good shares. It is currently
- (as of Tahoe-LAFS v1.8.0) computed incorrectly;
- see ticket #1115.
- count-wrong-shares: for mutable files, the number of shares for
- versions other than the 'best' one (highest
- sequence number, highest roothash). These are
- either old ...
- count-recoverable-versions: for mutable files, the number of
- recoverable versions of the file. For
- a healthy file, this will equal 1.
- count-unrecoverable-versions: for mutable files, the number of
- unrecoverable versions of the file.
- For a healthy file, this will be 0.
- count-corrupt-shares: the number of shares with integrity failures
- list-corrupt-shares: a list of "share locators", one for each share
- that was found to be corrupt. Each share locator
- is a list of (serverid, storage_index, sharenum).
- needs-rebalancing: (bool) True if there are multiple shares on a single
- storage server, indicating a reduction in reliability
- that could be resolved by moving shares to new
- servers.
- servers-responding: list of base32-encoded storage server identifiers,
- one for each server which responded to the share
- query.
- healthy: (bool) True if the file is completely healthy, False otherwise.
- Healthy files have at least N good shares. Overlapping shares
- do not currently cause a file to be marked unhealthy. If there
- are at least N good shares, then corrupt shares do not cause the
- file to be marked unhealthy, although the corrupt shares will be
- listed in the results (list-corrupt-shares) and should be manually
-             removed to avoid wasting time in subsequent downloads (as the
- downloader rediscovers the corruption and uses alternate shares).
- Future compatibility: the meaning of this field may change to
- reflect whether the servers-of-happiness criterion is met
- (see ticket #614).
- sharemap: dict mapping share identifier to list of serverids
- (base32-encoded strings). This indicates which servers are
- holding which shares. For immutable files, the shareid is
- an integer (the share number, from 0 to N-1). For
-              mutable files, it is a string of the form
- 'seq%d-%s-sh%d', containing the sequence number, the
- roothash, and the share number.
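-
- A sketch of a programmatic health check built on this JSON output (the
- target URL is a placeholder)::
-
-    import json, urllib.request
-
-    url = "http://127.0.0.1:3456/uri/$FILECAP?t=check&output=JSON"  # placeholder
-    req = urllib.request.Request(url, data=b"", method="POST")
-    with urllib.request.urlopen(req) as resp:
-        report = json.load(resp)
-    results = report["results"]
-    print("healthy:", results["healthy"])
-    if "count-shares-good" in results:   # key is absent for LIT files
-        print("good shares:", results["count-shares-good"])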
-
-``POST $URL?t=start-deep-check`` (must add &ophandle=XYZ)
-
- This initiates a recursive walk of all files and directories reachable from
- the target, performing a check on each one just like t=check. The result
- page will contain a summary of the results, including details on any
- file/directory that was not fully healthy.
-
- t=start-deep-check can only be invoked on a directory. An error (400
- BAD_REQUEST) will be signalled if it is invoked on a file. The recursive
- walker will deal with loops safely.
-
- This accepts the same verify= and add-lease= arguments as t=check.
-
- Since this operation can take a long time (perhaps a second per object),
- the ophandle= argument is required (see "Slow Operations, Progress, and
- Cancelling" above). The response to this POST will be a redirect to the
- corresponding /operations/$HANDLE page (with output=HTML or output=JSON to
- match the output= argument given to the POST). The deep-check operation
- will continue to run in the background, and the /operations page should be
- used to find out when the operation is done.
-
- Detailed check results for non-healthy files and directories will be
- available under /operations/$HANDLE/$STORAGEINDEX, and the HTML status will
- contain links to these detailed results.
-
- The HTML /operations/$HANDLE page for incomplete operations will contain a
- meta-refresh tag, set to 60 seconds, so that a browser which uses
- deep-check will automatically poll until the operation has completed.
-
- The JSON page (/operations/$HANDLE?output=JSON) will contain a
- machine-readable JSON dictionary with the following keys::
-
- finished: a boolean, True if the operation is complete, else False. Some
- of the remaining keys may not be present until the operation
- is complete.
- root-storage-index: a base32-encoded string with the storage index of the
- starting point of the deep-check operation
- count-objects-checked: count of how many objects were checked. Note that
- non-distributed objects (i.e. small immutable LIT
- files) are not checked, since for these objects,
- the data is contained entirely in the URI.
- count-objects-healthy: how many of those objects were completely healthy
- count-objects-unhealthy: how many were damaged in some way
- count-corrupt-shares: how many shares were found to have corruption,
- summed over all objects examined
- list-corrupt-shares: a list of "share identifiers", one for each share
- that was found to be corrupt. Each share identifier
- is a list of (serverid, storage_index, sharenum).
- list-unhealthy-files: a list of (pathname, check-results) tuples, for
- each file that was not fully healthy. 'pathname' is
- a list of strings (which can be joined by "/"
- characters to turn it into a single string),
- relative to the directory on which deep-check was
- invoked. The 'check-results' field is the same as
- that returned by t=check&output=JSON, described
- above.
- stats: a dictionary with the same keys as the t=start-deep-stats command
- (described below)
-
-``POST $URL?t=stream-deep-check``
-
- This initiates a recursive walk of all files and directories reachable from
- the target, performing a check on each one just like t=check. For each
- unique object (duplicates are skipped), a single line of JSON is emitted to
- the HTTP response channel (or an error indication, see below). When the walk
- is complete, a final line of JSON is emitted which contains the accumulated
- file-size/count "deep-stats" data.
-
- This command takes the same arguments as t=start-deep-check.
-
- A CLI tool can split the response stream on newlines into "response units",
- and parse each response unit as JSON. Each such parsed unit will be a
- dictionary, and will contain at least the "type" key: a string, one of
- "file", "directory", or "stats".
-
- For all units that have a type of "file" or "directory", the dictionary will
- contain the following keys::
-
- "path": a list of strings, with the path that is traversed to reach the
- object
- "cap": a write-cap URI for the file or directory, if available, else a
- read-cap URI
- "verifycap": a verify-cap URI for the file or directory
- "repaircap": an URI for the weakest cap that can still be used to repair
- the object
- "storage-index": a base32 storage index for the object
- "check-results": a copy of the dictionary which would be returned by
- t=check&output=json, with three top-level keys:
- "storage-index", "summary", and "results", and a variety
- of counts and sharemaps in the "results" value.
-
- Note that non-distributed files (i.e. LIT files) will have values of None
- for verifycap, repaircap, and storage-index, since these files can neither
- be verified nor repaired, and are not stored on the storage servers.
- Likewise the check-results dictionary will be limited: an empty string for
- storage-index, and a results dictionary with only the "healthy" key.
-
- The last unit in the stream will have a type of "stats", and will contain
- the keys described in the "start-deep-stats" operation, below.
-
- If any errors occur during the traversal (specifically if a directory is
- unrecoverable, such that further traversal is not possible), an error
- indication is written to the response body, instead of the usual line of
- JSON. This error indication line will begin with the string "ERROR:" (in all
- caps), and contain a summary of the error on the rest of the line. The
- remaining lines of the response body will be a python exception. The client
- application should look for the ERROR: and stop processing JSON as soon as
- it is seen. Note that neither a file being unrecoverable nor a directory
- merely being unhealthy will cause traversal to stop. The line just before
- the ERROR: will describe the directory that was untraversable, since the
- unit is emitted to the HTTP response body before the child is traversed.
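-
- A consumer of this stream might therefore process it one line at a time,
- stopping at the first error indication, roughly as sketched here (the
- function and variable names are illustrative)::
-
-   import json
-
-   def parse_deep_check_stream(lines):
-       # 'lines' is an iterable of decoded text lines from the response body.
-       units = []
-       for line in lines:
-           line = line.strip()
-           if not line:
-               continue
-           if line.startswith("ERROR:"):
-               # The rest of the body is a Python exception; stop parsing JSON.
-               raise RuntimeError(line)
-           unit = json.loads(line)
-           if unit["type"] == "stats":
-               return units, unit        # final accumulated deep-stats
-           units.append((unit["path"], unit["check-results"]))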
-
-
-``POST $URL?t=check&repair=true``
-
- This performs a health check of the given file or directory, and if the
- checker determines that the object is not healthy (some shares are missing
- or corrupted), it will perform a "repair". During repair, any missing
- shares will be regenerated and uploaded to new servers.
-
- This accepts the same verify=true and add-lease= arguments as t=check. When
- an output=JSON argument is provided, the machine-readable JSON response
- will contain the following keys::
-
-  storage-index: a base32-encoded string with the object's storage index,
- or an empty string for LIT files
- repair-attempted: (bool) True if repair was attempted
- repair-successful: (bool) True if repair was attempted and the file was
- fully healthy afterwards. False if no repair was
- attempted, or if a repair attempt failed.
- pre-repair-results: a dictionary that describes the state of the file
- before any repair was performed. This contains exactly
- the same keys as the 'results' value of the t=check
- response, described above.
- post-repair-results: a dictionary that describes the state of the file
- after any repair was performed. If no repair was
- performed, post-repair-results and pre-repair-results
- will be the same. This contains exactly the same keys
- as the 'results' value of the t=check response,
- described above.
-
-``POST $URL?t=start-deep-check&repair=true`` (must add &ophandle=XYZ)
-
- This triggers a recursive walk of all files and directories, performing a
- t=check&repair=true on each one.
-
- Like t=start-deep-check without the repair= argument, this can only be
- invoked on a directory. An error (400 BAD_REQUEST) will be signalled if it
- is invoked on a file. The recursive walker will deal with loops safely.
-
- This accepts the same verify= and add-lease= arguments as
- t=start-deep-check. It uses the same ophandle= mechanism as
- start-deep-check. When an output=JSON argument is provided, the response
- will contain the following keys::
-
- finished: (bool) True if the operation has completed, else False
- root-storage-index: a base32-encoded string with the storage index of the
- starting point of the deep-check operation
- count-objects-checked: count of how many objects were checked
-
- count-objects-healthy-pre-repair: how many of those objects were completely
- healthy, before any repair
- count-objects-unhealthy-pre-repair: how many were damaged in some way
- count-objects-healthy-post-repair: how many of those objects were completely
- healthy, after any repair
- count-objects-unhealthy-post-repair: how many were damaged in some way
-
- count-repairs-attempted: repairs were attempted on this many objects.
- count-repairs-successful: how many repairs resulted in healthy objects
- count-repairs-unsuccessful: how many repairs did not result in
-                             completely healthy objects
- count-corrupt-shares-pre-repair: how many shares were found to have
- corruption, summed over all objects
- examined, before any repair
- count-corrupt-shares-post-repair: how many shares were found to have
- corruption, summed over all objects
- examined, after any repair
- list-corrupt-shares: a list of "share identifiers", one for each share
- that was found to be corrupt (before any repair).
- Each share identifier is a list of (serverid,
- storage_index, sharenum).
- list-remaining-corrupt-shares: like list-corrupt-shares, but mutable shares
- that were successfully repaired are not
- included. These are shares that need
- manual processing. Since immutable shares
- cannot be modified by clients, all corruption
- in immutable shares will be listed here.
- list-unhealthy-files: a list of (pathname, check-results) tuples, for
- each file that was not fully healthy. 'pathname' is
- relative to the directory on which deep-check was
- invoked. The 'check-results' field is the same as
- that returned by t=check&repair=true&output=JSON,
- described above.
- stats: a dictionary with the same keys as the t=start-deep-stats command
- (described below)
-
-``POST $URL?t=stream-deep-check&repair=true``
-
- This triggers a recursive walk of all files and directories, performing a
- t=check&repair=true on each one. For each unique object (duplicates are
- skipped), a single line of JSON is emitted to the HTTP response channel (or
- an error indication). When the walk is complete, a final line of JSON is
- emitted which contains the accumulated file-size/count "deep-stats" data.
-
- This emits the same data as t=stream-deep-check (without the repair=true),
- except that the "check-results" field is replaced with a
- "check-and-repair-results" field, which contains the keys returned by
- t=check&repair=true&output=json (i.e. repair-attempted, repair-successful,
- pre-repair-results, and post-repair-results). The output does not contain
- the summary dictionary that is provided by t=start-deep-check&repair=true
- (the one with count-objects-checked and list-unhealthy-files), since the
- receiving client is expected to calculate those values itself from the
- stream of per-object check-and-repair-results.
-
- Note that the "ERROR:" indication will only be emitted if traversal stops,
- which will only occur if an unrecoverable directory is encountered. If a
- file or directory repair fails, the traversal will continue, and the repair
- failure will be indicated in the JSON data (in the "repair-successful" key).
-
-``POST $DIRURL?t=start-manifest`` (must add &ophandle=XYZ)
-
- This operation generates a "manfest" of the given directory tree, mostly
- for debugging. This is a table of (path, filecap/dircap), for every object
- reachable from the starting directory. The path will be slash-joined, and
- the filecap/dircap will contain a link to the object in question. This page
- gives immediate access to every object in the virtual filesystem subtree.
-
- This operation uses the same ophandle= mechanism as deep-check. The
- corresponding /operations/$HANDLE page has three different forms. The
- default is output=HTML.
-
- If output=text is added to the query args, the results will be a text/plain
- list. The first line is special: it is either "finished: yes" or "finished:
- no"; if the operation is not finished, you must periodically reload the
- page until it completes. The rest of the results are a plaintext list, with
- one file/dir per line, slash-separated, with the filecap/dircap separated
- by a space.
-
- If output=JSON is added to the query args, then the results will be a
- JSON-formatted dictionary with six keys. Note that because large directory
- structures can result in very large JSON results, the full results will not
- be available until the operation is complete (i.e. until output["finished"]
- is True)::
-
- finished (bool): if False then you must reload the page until True
- origin_si (base32 str): the storage index of the starting point
- manifest: list of (path, cap) tuples, where path is a list of strings.
- verifycaps: list of (printable) verify cap strings
- storage-index: list of (base32) storage index strings
- stats: a dictionary with the same keys as the t=start-deep-stats command
- (described below)
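-
- Since the full JSON results only appear once the operation is done, a
- client will typically poll the operations page, for example as in this
- sketch (the helper name and polling interval are arbitrary)::
-
-   import json
-   import time
-   import urllib.request
-
-   def wait_for_operation(node_url, ophandle, interval=10):
-       # Poll /operations/$HANDLE?output=JSON until output["finished"] is True.
-       url = "%s/operations/%s?output=JSON" % (node_url, ophandle)
-       while True:
-           with urllib.request.urlopen(url) as resp:
-               results = json.loads(resp.read())
-           if results["finished"]:
-               return results    # includes manifest, verifycaps, stats, ...
-           time.sleep(interval)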
-
-``POST $DIRURL?t=start-deep-size`` (must add &ophandle=XYZ)
-
- This operation generates a number (in bytes) containing the sum of the
- filesize of all directories and immutable files reachable from the given
- directory. This is a rough lower bound of the total space consumed by this
- subtree. It does not include space consumed by mutable files, nor does it
- take expansion or encoding overhead into account. Later versions of the
- code may improve this estimate upwards.
-
- The /operations/$HANDLE status output consists of two lines of text::
-
- finished: yes
- size: 1234
-
-``POST $DIRURL?t=start-deep-stats`` (must add &ophandle=XYZ)
-
- This operation performs a recursive walk of all files and directories
- reachable from the given directory, and generates a collection of
- statistics about those objects.
-
- The result (obtained from the /operations/$OPHANDLE page) is a
- JSON-serialized dictionary with the following keys (note that some of these
- keys may be missing until 'finished' is True)::
-
- finished: (bool) True if the operation has finished, else False
- count-immutable-files: count of how many CHK files are in the set
- count-mutable-files: same, for mutable files (does not include directories)
- count-literal-files: same, for LIT files (data contained inside the URI)
- count-files: sum of the above three
- count-directories: count of directories
- count-unknown: count of unrecognized objects (perhaps from the future)
- size-immutable-files: total bytes for all CHK files in the set, =deep-size
- size-mutable-files (TODO): same, for current version of all mutable files
- size-literal-files: same, for LIT files
- size-directories: size of directories (includes size-literal-files)
- size-files-histogram: list of (minsize, maxsize, count) buckets,
- with a histogram of filesizes, 5dB/bucket,
- for both literal and immutable files
- largest-directory: number of children in the largest directory
- largest-immutable-file: number of bytes in the largest CHK file
-
- size-mutable-files is not implemented, because it would require extra
- queries to each mutable file to get their size. This may be implemented in
- the future.
-
- Assuming no sharing, the basic space consumed by a single root directory is
- the sum of size-immutable-files, size-mutable-files, and size-directories.
- The actual disk space used by the shares is larger, because of the
- following sources of overhead::
-
- integrity data
- expansion due to erasure coding
- share management data (leases)
- backend (ext3) minimum block size
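-
- As a small illustration, the rough estimate described above can be computed
- from the JSON results like this (size-mutable-files is not yet reported, so
- this sketch treats it as zero when absent)::
-
-   def basic_space_consumed(stats):
-       # Sum of the three size-* keys; the overhead sources listed above
-       # (integrity data, erasure-coding expansion, leases, block size)
-       # are not included.
-       return (stats["size-immutable-files"]
-               + stats.get("size-mutable-files", 0)
-               + stats["size-directories"])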
-
-``POST $URL?t=stream-manifest``
-
- This operation performs a recursive walk of all files and directories
- reachable from the given starting point. For each such unique object
- (duplicates are skipped), a single line of JSON is emitted to the HTTP
- response channel (or an error indication, see below). When the walk is
- complete, a final line of JSON is emitted which contains the accumulated
- file-size/count "deep-stats" data.
-
- A CLI tool can split the response stream on newlines into "response units",
- and parse each response unit as JSON. Each such parsed unit will be a
- dictionary, and will contain at least the "type" key: a string, one of
- "file", "directory", or "stats".
-
- For all units that have a type of "file" or "directory", the dictionary will
- contain the following keys::
-
- "path": a list of strings, with the path that is traversed to reach the
- object
- "cap": a write-cap URI for the file or directory, if available, else a
- read-cap URI
- "verifycap": a verify-cap URI for the file or directory
- "repaircap": an URI for the weakest cap that can still be used to repair
- the object
- "storage-index": a base32 storage index for the object
-
- Note that non-distributed files (i.e. LIT files) will have values of None
- for verifycap, repaircap, and storage-index, since these files can neither
- be verified nor repaired, and are not stored on the storage servers.
-
- The last unit in the stream will have a type of "stats", and will contain
- the keys described in the "start-deep-stats" operation, above.
-
- If any errors occur during the traversal (specifically if a directory is
- unrecoverable, such that further traversal is not possible), an error
- indication is written to the response body, instead of the usual line of
- JSON. This error indication line will begin with the string "ERROR:" (in all
- caps), and contain a summary of the error on the rest of the line. The
- remaining lines of the response body will be a python exception. The client
- application should look for the ERROR: and stop processing JSON as soon as
- it is seen. The line just before the ERROR: will describe the directory that
- was untraversable, since the manifest entry is emitted to the HTTP response
- body before the child is traversed.
-
-Other Useful Pages
-==================
-
-The portion of the web namespace that begins with "/uri" (and "/named") is
-dedicated to giving users (both humans and programs) access to the Tahoe
-virtual filesystem. The rest of the namespace provides status information
-about the state of the Tahoe node.
-
-``GET /`` (the root page)
-
-This is the "Welcome Page", and contains a few distinct sections::
-
- Node information: library versions, local nodeid, services being provided.
-
- Filesystem Access Forms: create a new directory, view a file/directory by
- URI, upload a file (unlinked), download a file by
- URI.
-
- Grid Status: introducer information, helper information, connected storage
- servers.
-
-``GET /status/``
-
- This page lists all active uploads and downloads, and contains a short list
- of recent upload/download operations. Each operation has a link to a page
- that describes file sizes, servers that were involved, and the time consumed
- in each phase of the operation.
-
- A GET of /status/?t=json will contain a machine-readable subset of the same
- data. It returns a JSON-encoded dictionary. The only key defined at this
- time is "active", with a value that is a list of operation dictionaries, one
- for each active operation. Once an operation is completed, it will no longer
- appear in data["active"] .
-
- Each op-dict contains a "type" key, one of "upload", "download",
- "mapupdate", "publish", or "retrieve" (the first two are for immutable
- files, while the latter three are for mutable files and directories).
-
- The "upload" op-dict will contain the following keys::
-
- type (string): "upload"
- storage-index-string (string): a base32-encoded storage index
- total-size (int): total size of the file
- status (string): current status of the operation
- progress-hash (float): 1.0 when the file has been hashed
- progress-ciphertext (float): 1.0 when the file has been encrypted.
- progress-encode-push (float): 1.0 when the file has been encoded and
- pushed to the storage servers. For helper
- uploads, the ciphertext value climbs to 1.0
- first, then encoding starts. For unassisted
- uploads, ciphertext and encode-push progress
- will climb at the same pace.
-
- The "download" op-dict will contain the following keys::
-
- type (string): "download"
- storage-index-string (string): a base32-encoded storage index
- total-size (int): total size of the file
- status (string): current status of the operation
- progress (float): 1.0 when the file has been fully downloaded
-
- Front-ends which want to report progress information are advised to simply
- average together all the progress-* indicators. A slightly more accurate
- value can be found by ignoring the progress-hash value (since the current
- implementation hashes synchronously, so clients will probably never see
- progress-hash!=1.0).
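-
- For example, a front-end might reduce an op-dict to a single progress value
- like this (a sketch following the advice above)::
-
-   def overall_progress(op):
-       if op["type"] == "upload":
-           # Ignore progress-hash, since it effectively jumps straight to 1.0.
-           return (op["progress-ciphertext"] + op["progress-encode-push"]) / 2.0
-       if op["type"] == "download":
-           return op["progress"]
-       return None   # mapupdate/publish/retrieve: no progress-* keys documented here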
-
-``GET /provisioning/``
-
- This page provides a basic tool to predict the likely storage and bandwidth
- requirements of a large Tahoe grid. It provides forms to input things like
- total number of users, number of files per user, average file size, number
- of servers, expansion ratio, hard drive failure rate, etc. It then provides
- numbers like how many disks per server will be needed, how many read
- operations per second should be expected, and the likely MTBF for files in
- the grid. This information is very preliminary, and the model upon which it
- is based still needs a lot of work.
-
-``GET /helper_status/``
-
- If the node is running a helper (i.e. if [helper]enabled is set to True in
- tahoe.cfg), then this page will provide a list of all the helper operations
- currently in progress. If "?t=json" is added to the URL, it will return a
- JSON-formatted list of helper statistics, which can then be used to produce
- graphs to indicate how busy the helper is.
-
-``GET /statistics/``
-
- This page provides "node statistics", which are collected from a variety of
- sources::
-
- load_monitor: every second, the node schedules a timer for one second in
- the future, then measures how late the subsequent callback
- is. The "load_average" is this tardiness, measured in
- seconds, averaged over the last minute. It is an indication
- of a busy node, one which is doing more work than can be
- completed in a timely fashion. The "max_load" value is the
- highest value that has been seen in the last 60 seconds.
-
- cpu_monitor: every minute, the node uses time.clock() to measure how much
- CPU time it has used, and it uses this value to produce
- 1min/5min/15min moving averages. These values range from 0%
- (0.0) to 100% (1.0), and indicate what fraction of the CPU
- has been used by the Tahoe node. Not all operating systems
- provide meaningful data to time.clock(): they may report 100%
- CPU usage at all times.
-
- uploader: this counts how many immutable files (and bytes) have been
- uploaded since the node was started
-
- downloader: this counts how many immutable files have been downloaded
- since the node was started
-
- publishes: this counts how many mutable files (including directories) have
- been modified since the node was started
-
- retrieves: this counts how many mutable files (including directories) have
- been read since the node was started
-
- There are other statistics that are tracked by the node. The "raw stats"
- section shows a formatted dump of all of them.
-
- By adding "?t=json" to the URL, the node will return a JSON-formatted
- dictionary of stats values, which can be used by other tools to produce
- graphs of node behavior. The misc/munin/ directory in the source
- distribution provides some tools to produce these graphs.
-
-``GET /`` (introducer status)
-
- For Introducer nodes, the welcome page displays information about both
- clients and servers which are connected to the introducer. Servers make
- "service announcements", and these are listed in a table. Clients will
- subscribe to hear about service announcements, and these subscriptions are
- listed in a separate table. Both tables contain information about what
- version of Tahoe is being run by the remote node, their advertised and
- outbound IP addresses, their nodeid and nickname, and how long they have
- been available.
-
- By adding "?t=json" to the URL, the node will return a JSON-formatted
- dictionary of stats values, which can be used to produce graphs of connected
- clients over time. This dictionary has the following keys::
-
- ["subscription_summary"] : a dictionary mapping service name (like
- "storage") to an integer with the number of
- clients that have subscribed to hear about that
- service
- ["announcement_summary"] : a dictionary mapping service name to an integer
- with the number of servers which are announcing
- that service
- ["announcement_distinct_hosts"] : a dictionary mapping service name to an
- integer which represents the number of
- distinct hosts that are providing that
- service. If two servers have announced
- FURLs which use the same hostnames (but
- different ports and tubids), they are
- considered to be on the same host.
-
-
-Static Files in /public_html
-============================
-
-The webapi server will take any request for a URL that starts with /static
-and serve it from a configurable directory which defaults to
-$BASEDIR/public_html . This is configured by setting the "[node]web.static"
-value in $BASEDIR/tahoe.cfg . If this is left at the default value of
-"public_html", then http://localhost:3456/static/subdir/foo.html will be
-served with the contents of the file $BASEDIR/public_html/subdir/foo.html .
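-
- For example, the relevant tahoe.cfg stanza (shown here with the default
- value) looks like::
-
-   [node]
-   web.static = public_html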
-
-This can be useful to serve a javascript application which provides a
-prettier front-end to the rest of the Tahoe webapi.
-
-
-Safety and security issues -- names vs. URIs
-============================================
-
-Summary: use explicit file- and dir- caps whenever possible, to reduce the
-potential for surprises when the filesystem structure is changed.
-
-Tahoe provides a mutable filesystem, but the ways that the filesystem can
-change are limited. The only thing that can change is that the mapping from
-child names to child objects that each directory contains can be changed by
-adding a new child name pointing to an object, removing an existing child name,
-or changing an existing child name to point to a different object.
-
-Obviously if you query Tahoe for information about the filesystem and then act
-to change the filesystem (such as by getting a listing of the contents of a
-directory and then adding a file to the directory), then the filesystem might
-have been changed after you queried it and before you acted upon it. However,
-if you use the URI instead of the pathname of an object when you act upon the
-object, then the only change that can happen is if the object is a directory
-then the set of child names it has might be different. If, on the other hand,
-you act upon the object using its pathname, then a different object might be in
-that place, which can result in more kinds of surprises.
-
-For example, suppose you are writing code which recursively downloads the
-contents of a directory. The first thing your code does is fetch the listing
-of the contents of the directory. For each child that it fetched, if that
-child is a file then it downloads the file, and if that child is a directory
-then it recurses into that directory. Now, if the download and the recurse
-actions are performed using the child's name, then the results might be
-wrong, because for example a child name that pointed to a sub-directory when
-you listed the directory might have been changed to point to a file (in which
-case your attempt to recurse into it would result in an error and the file
-would be skipped), or a child name that pointed to a file when you listed the
-directory might now point to a sub-directory (in which case your attempt to
-download the child would result in a file containing HTML text describing the
-sub-directory!).
-
-If your recursive algorithm uses the uri of the child instead of the name of
-the child, then those kinds of mistakes just can't happen. Note that both the
-child's name and the child's URI are included in the results of listing the
-parent directory, so it isn't any harder to use the URI for this purpose.
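-
- A recursive downloader that follows this advice might look roughly like the
- following sketch (assuming the t=json directory-listing format described
- earlier in this document; the helper names are illustrative)::
-
-   import json
-   import urllib.request
-
-   def walk_by_uri(node_url, dircap, visit_file):
-       # List the directory once, then act on each child via its URI rather
-       # than its name, so later renames cannot change what we fetch.
-       with urllib.request.urlopen("%s/uri/%s?t=json" % (node_url, dircap)) as resp:
-           nodetype, info = json.loads(resp.read())
-       for name, (childtype, child) in info["children"].items():
-           childcap = child.get("ro_uri") or child.get("rw_uri")
-           if childtype == "dirnode":
-               walk_by_uri(node_url, childcap, visit_file)
-           else:
-               visit_file(name, "%s/uri/%s" % (node_url, childcap))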
-
-The read and write caps in a given directory node are separate URIs, and
-can't be assumed to point to the same object even if they were retrieved in
-the same operation (although the webapi server attempts to ensure this
-in most cases). If you need to rely on that property, you should explicitly
-verify it. More generally, you should not make assumptions about the
-internal consistency of the contents of mutable directories. As a result
-of the signatures on mutable object versions, it is guaranteed that a given
-version was written in a single update, but -- as in the case of a file --
-the contents may have been chosen by a malicious writer in a way that is
-designed to confuse applications that rely on their consistency.
-
-In general, use names if you want "whatever object (whether file or
-directory) is found by following this name (or sequence of names) when my
-request reaches the server". Use URIs if you want "this particular object".
-
-Concurrency Issues
-==================
-
-Tahoe uses both mutable and immutable files. Mutable files can be created
-explicitly by doing an upload with ?mutable=true added, or implicitly by
-creating a new directory (since a directory is just a special way to
-interpret a given mutable file).
-
-Mutable files suffer from the same consistency-vs-availability tradeoff that
-all distributed data storage systems face. It is not possible to
-simultaneously achieve perfect consistency and perfect availability in the
-face of network partitions (servers being unreachable or faulty).
-
-Tahoe tries to achieve a reasonable compromise, but there is a basic rule in
-place, known as the Prime Coordination Directive: "Don't Do That". What this
-means is that if write-access to a mutable file is available to several
-parties, then those parties are responsible for coordinating their activities
-to avoid multiple simultaneous updates. This could be achieved by having
-these parties talk to each other and using some sort of locking mechanism, or
-by serializing all changes through a single writer.
-
-The consequences of performing uncoordinated writes can vary. Some of the
-writers may lose their changes, as somebody else wins the race condition. In
-many cases the file will be left in an "unhealthy" state, meaning that there
-are not as many redundant shares as we would like (reducing the reliability
-of the file against server failures). In the worst case, the file can be left
-in such an unhealthy state that no version is recoverable, even the old ones.
-It is this small possibility of data loss that prompts us to issue the Prime
-Coordination Directive.
-
-Tahoe nodes implement internal serialization to make sure that a single Tahoe
-node cannot conflict with itself. For example, it is safe to issue two
-directory modification requests to a single tahoe node's webapi server at the
-same time, because the Tahoe node will internally delay one of them until
-after the other has finished being applied. (This feature was introduced in
-Tahoe-1.1; back with Tahoe-1.0 the web client was responsible for serializing
-web requests themselves).
-
-For more details, please see the "Consistency vs Availability" and "The Prime
-Coordination Directive" sections of mutable.txt, in the same directory as
-this file.
-
-
-.. [1] URLs and HTTP and UTF-8, Oh My
-
- HTTP does not provide a mechanism to specify the character set used to
- encode non-ascii names in URLs (rfc2396#2.1). We prefer the convention that
- the filename= argument shall be a URL-encoded UTF-8 encoded unicode object.
- For example, suppose we want to provoke the server into using a filename of
- "f i a n c e-acute e" (i.e. F I A N C U+00E9 E). The UTF-8 encoding of this
- is 0x66 0x69 0x61 0x6e 0x63 0xc3 0xa9 0x65 (or "fianc\xC3\xA9e", as python's
- repr() function would show). To encode this into a URL, the non-printable
- characters must be escaped with the urlencode '%XX' mechanism, giving us
- "fianc%C3%A9e". Thus, the first line of the HTTP request will be "GET
- /uri/CAP...?save=true&filename=fianc%C3%A9e HTTP/1.1". Not all browsers
- provide this: IE7 uses the Latin-1 encoding, which is fianc%E9e.
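-
- In Python, this encoding can be produced with the standard library; a
- small illustration of the convention described above::
-
-   import urllib.parse
-
-   name = "fianc\u00e9e"                     # F I A N C U+00E9 E
-   encoded = urllib.parse.quote(name.encode("utf-8"))
-   assert encoded == "fianc%C3%A9e"
-   request_path = "/uri/CAP...?save=true&filename=" + encoded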
-
- The response header will need to indicate a non-ASCII filename. The actual
- mechanism to do this is not clear. For ASCII filenames, the response header
- would look like::
-
- Content-Disposition: attachment; filename="english.txt"
-
- If Tahoe were to enforce the utf-8 convention, it would need to decode the
- URL argument into a unicode string, and then encode it back into a sequence
- of bytes when creating the response header. One possibility would be to use
- unencoded utf-8. Developers suggest that IE7 might accept this::
-
- #1: Content-Disposition: attachment; filename="fianc\xC3\xA9e"
- (note, the last four bytes of that line, not including the newline, are
- 0xC3 0xA9 0x65 0x22)
-
- RFC2231#4 (dated 1997): suggests that the following might work, and some
- developers (http://markmail.org/message/dsjyokgl7hv64ig3) have reported that
- it is supported by firefox (but not IE7)::
-
- #2: Content-Disposition: attachment; filename*=utf-8''fianc%C3%A9e
-
- My reading of RFC2616#19.5.1 (which defines Content-Disposition) says that
- the filename= parameter is defined to be wrapped in quotes (presumably to
- allow spaces without breaking the parsing of subsequent parameters), which
- would give us::
-
- #3: Content-Disposition: attachment; filename*=utf-8''"fianc%C3%A9e"
-
- However this is contrary to the examples in the email thread listed above.
-
- Developers report that IE7 (when it is configured for UTF-8 URL encoding,
- which is not the default in asian countries), will accept::
-
- #4: Content-Disposition: attachment; filename=fianc%C3%A9e
-
- However, for maximum compatibility, Tahoe simply copies bytes from the URL
- into the response header, rather than enforcing the utf-8 convention. This
- means it does not try to decode the filename from the URL argument, nor does
- it encode the filename into the response header.
--- /dev/null
+===================
+URI Extension Block
+===================
+
+This block is a serialized dictionary with string keys and string values
+(some of which represent numbers, some of which are SHA-256 hashes). All
+buckets hold an identical copy. The hash of the serialized data is kept in
+the URI.
+
+The download process must obtain a valid copy of this data before any
+decoding can take place. The download process must also obtain other data
+before incremental validation can be performed. Full-file validation (for
+clients who do not wish to do incremental validation) can be performed solely
+with the data from this block.
+
+At the moment, this data block contains the following keys (and an estimate
+on their sizes)::
+
+ size 5
+ segment_size 7
+ num_segments 2
+ needed_shares 2
+ total_shares 3
+
+ codec_name 3
+ codec_params 5+1+2+1+3=12
+ tail_codec_params 12
+
+ share_root_hash 32 (binary) or 52 (base32-encoded) each
+ plaintext_hash
+ plaintext_root_hash
+ crypttext_hash
+ crypttext_root_hash
+
+Some pieces are needed elsewhere (size should be visible without pulling the
+block, the Tahoe3 algorithm needs total_shares to find the right peers, all
+peer selection algorithms need needed_shares to ask a minimal set of peers).
+Some pieces are arguably redundant but are convenient to have present
+(test_encode.py makes use of num_segments).
+
+The rule for this data block is that it should be a constant size for all
+files, regardless of file size. Therefore hash trees (which have a size that
+depends linearly upon the number of segments) are stored elsewhere in the
+bucket, with only the hash tree root stored in this data block.
+
+This block will be serialized as follows::
+
+  assert that all keys match ^[a-zA-Z_\-]+$
+ sort all the keys lexicographically
+ for k in keys:
+ write("%s:" % k)
+ write(netstring(data[k]))
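+
+A concrete (unofficial) Python rendering of this procedure is sketched below;
+the netstring helper shown is the conventional length-prefixed encoding, and
+the function names are illustrative::
+
+  import re
+
+  def netstring(b):
+      # conventional netstring framing: <decimal length>:<bytes>,
+      return b"%d:%s," % (len(b), b)
+
+  def serialize_uri_extension(data):
+      # 'data' maps str keys to bytes values, as described above
+      assert all(re.match(r"^[a-zA-Z_\-]+$", k) for k in data)
+      return b"".join(k.encode("ascii") + b":" + netstring(data[k])
+                      for k in sorted(data))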
+
+
+Serialized size::
+
+ dense binary (but decimal) packing: 160+46=206
+ including 'key:' (185) and netstring (6*3+7*4=46) on values: 231
+ including 'key:%d\n' (185+13=198) and printable values (46+5*52=306)=504
+
+We'll go with the 231-sized block, and provide a tool to dump it as text if
+we really want one.
+++ /dev/null
-===================
-URI Extension Block
-===================
-
-This block is a serialized dictionary with string keys and string values
-(some of which represent numbers, some of which are SHA-256 hashes). All
-buckets hold an identical copy. The hash of the serialized data is kept in
-the URI.
-
-The download process must obtain a valid copy of this data before any
-decoding can take place. The download process must also obtain other data
-before incremental validation can be performed. Full-file validation (for
-clients who do not wish to do incremental validation) can be performed solely
-with the data from this block.
-
-At the moment, this data block contains the following keys (and an estimate
-on their sizes)::
-
- size 5
- segment_size 7
- num_segments 2
- needed_shares 2
- total_shares 3
-
- codec_name 3
- codec_params 5+1+2+1+3=12
- tail_codec_params 12
-
- share_root_hash 32 (binary) or 52 (base32-encoded) each
- plaintext_hash
- plaintext_root_hash
- crypttext_hash
- crypttext_root_hash
-
-Some pieces are needed elsewhere (size should be visible without pulling the
-block, the Tahoe3 algorithm needs total_shares to find the right peers, all
-peer selection algorithms need needed_shares to ask a minimal set of peers).
-Some pieces are arguably redundant but are convenient to have present
-(test_encode.py makes use of num_segments).
-
-The rule for this data block is that it should be a constant size for all
-files, regardless of file size. Therefore hash trees (which have a size that
-depends linearly upon the number of segments) are stored elsewhere in the
-bucket, with only the hash tree root stored in this data block.
-
-This block will be serialized as follows::
-
- assert that all keys match ^[a-zA-z_\-]+$
- sort all the keys lexicographically
- for k in keys:
- write("%s:" % k)
- write(netstring(data[k]))
-
-
-Serialized size::
-
- dense binary (but decimal) packing: 160+46=206
- including 'key:' (185) and netstring (6*3+7*4=46) on values: 231
- including 'key:%d\n' (185+13=198) and printable values (46+5*52=306)=504
-
-We'll go with the 231-sized block, and provide a tool to dump it as text if
-we really want one.
--- /dev/null
+==========================
+Tahoe-LAFS Directory Nodes
+==========================
+
+As explained in the architecture docs, Tahoe-LAFS can be roughly viewed as
+a collection of three layers. The lowest layer is the key-value store: it
+provides operations that accept files and upload them to the grid, creating
+a URI in the process which securely references the file's contents.
+The middle layer is the filesystem, creating a structure of directories and
+filenames resembling the traditional unix/windows filesystems. The top layer
+is the application layer, which uses the lower layers to provide useful
+services to users, like a backup application, or a way to share files with
+friends.
+
+This document examines the middle layer, the "filesystem".
+
+1. `Key-value Store Primitives`_
+2. `Filesystem goals`_
+3. `Dirnode goals`_
+4. `Dirnode secret values`_
+5. `Dirnode storage format`_
+6. `Dirnode sizes, mutable-file initial read sizes`_
+7. `Design Goals, redux`_
+
+ 1. `Confidentiality leaks in the storage servers`_
+ 2. `Integrity failures in the storage servers`_
+ 3. `Improving the efficiency of dirnodes`_
+ 4. `Dirnode expiration and leases`_
+
+8. `Starting Points: root dirnodes`_
+9. `Mounting and Sharing Directories`_
+10. `Revocation`_
+
+Key-value Store Primitives
+==========================
+
+In the lowest layer (key-value store), there are two operations that reference
+immutable data (which we refer to as "CHK URIs" or "CHK read-capabilities" or
+"CHK read-caps"). One puts data into the grid (but only if it doesn't exist
+already), the other retrieves it::
+
+ chk_uri = put(data)
+ data = get(chk_uri)
+
+We also have three operations which reference mutable data (which we refer to
+as "mutable slots", or "mutable write-caps and read-caps", or sometimes "SSK
+slots"). One creates a slot with some initial contents, a second replaces the
+contents of a pre-existing slot, and the third retrieves the contents::
+
+ mutable_uri = create(initial_data)
+ replace(mutable_uri, new_data)
+ data = get(mutable_uri)
+
+Filesystem Goals
+================
+
+The main goal for the middle (filesystem) layer is to give users a way to
+organize the data that they have uploaded into the grid. The traditional way
+to do this in computer filesystems is to put this data into files, give those
+files names, and collect these names into directories.
+
+Each directory is a set of name-entry pairs, each of which maps a "child name"
+to a directory entry pointing to an object of some kind. Those child objects
+might be files, or they might be other directories. Each directory entry also
+contains metadata.
+
+The directory structure is therefore a directed graph of nodes, in which each
+node might be a directory node or a file node. All file nodes are terminal
+nodes.
+
+Dirnode Goals
+=============
+
+What properties might be desirable for these directory nodes? In no
+particular order:
+
+1. functional. Code which does not work doesn't count.
+2. easy to document, explain, and understand
+3. confidential: it should not be possible for others to see the contents of
+ a directory
+4. integrity: it should not be possible for others to modify the contents
+ of a directory
+5. available: directories should survive host failure, just like files do
+6. efficient: in storage, communication bandwidth, number of round-trips
+7. easy to delegate individual directories in a flexible way
+8. updateness: everybody looking at a directory should see the same contents
+9. monotonicity: everybody looking at a directory should see the same
+ sequence of updates
+
+Some of these goals are mutually exclusive. For example, availability and
+consistency are opposing, so it is not possible to achieve #5 and #8 at the
+same time. Moreover, it takes a more complex architecture to get close to the
+available-and-consistent ideal, so #2/#6 is in opposition to #5/#8.
+
+Tahoe-LAFS v0.7.0 introduced distributed mutable files, which use public-key
+cryptography for integrity, and erasure coding for availability. These
+achieve roughly the same properties as immutable CHK files, but their
+contents can be replaced without changing their identity. Dirnodes are then
+just a special way of interpreting the contents of a specific mutable file.
+Earlier releases used a "vdrive server": this server was abolished in the
+v0.7.0 release.
+
+For details of how mutable files work, please see "mutable.txt" in this
+directory.
+
+For releases since v0.7.0, we achieve most of our desired properties. The
+integrity and availability of dirnodes is equivalent to that of regular
+(immutable) files, with the exception that there are more simultaneous-update
+failure modes for mutable slots. Delegation is quite strong: you can give
+read-write or read-only access to any subtree, and the data format used for
+dirnodes is such that read-only access is transitive: i.e. if you grant Bob
+read-only access to a parent directory, then Bob will get read-only access
+(and *not* read-write access) to its children.
+
+Relative to the previous "vdrive-server" based scheme, the current
+distributed dirnode approach gives better availability, but cannot guarantee
+updateness quite as well, and requires far more network traffic for each
+retrieval and update. Mutable files are somewhat less available than
+immutable files, simply because of the increased number of combinations
+(shares of an immutable file are either present or not, whereas there are
+multiple versions of each mutable file, and you might have some shares of
+version 1 and other shares of version 2). In extreme cases of simultaneous
+update, mutable files might suffer from non-monotonicity.
+
+
+Dirnode secret values
+=====================
+
+As mentioned before, dirnodes are simply a special way to interpret the
+contents of a mutable file, so the secret keys and capability strings
+described in "mutable.txt" are all the same. Each dirnode contains an RSA
+public/private keypair, and the holder of the "write capability" will be able
+to retrieve the private key (as well as the AES encryption key used for the
+data itself). The holder of the "read capability" will be able to obtain the
+public key and the AES data key, but not the RSA private key needed to modify
+the data.
+
+The "write capability" for a dirnode grants read-write access to its
+contents. This is expressed in concrete form as the "dirnode write cap": a
+printable string which contains the necessary secrets to grant this access.
+Likewise, the "read capability" grants read-only access to a dirnode, and can
+be represented by a "dirnode read cap" string.
+
+For example,
+URI:DIR2:swdi8ge1s7qko45d3ckkyw1aac%3Aar8r5j99a4mezdojejmsfp4fj1zeky9gjigyrid4urxdimego68o
+is a write-capability URI, while
+URI:DIR2-RO:buxjqykt637u61nnmjg7s8zkny:ar8r5j99a4mezdojejmsfp4fj1zeky9gjigyrid4urxdimego68o
+is a read-capability URI, both for the same dirnode.
+
+
+Dirnode storage format
+======================
+
+Each dirnode is stored in a single mutable file, distributed in the Tahoe-LAFS
+grid. The contents of this file are a serialized list of netstrings, one per
+child. Each child is a list of four netstrings: (name, rocap, rwcap,
+metadata). (Remember that the contents of the mutable file are encrypted by
+the read-cap, so this section describes the plaintext contents of the mutable
+file, *after* it has been decrypted by the read-cap.)
+
+The name is simply a UTF-8-encoded child name. The 'rocap' is a read-only
+capability URI to that child, either an immutable (CHK) file, a mutable file,
+or a directory. It is also possible to store 'unknown' URIs that are not
+recognized by the current version of Tahoe-LAFS. The 'rwcap' is a read-write
+capability URI for that child, encrypted with the dirnode's write-cap: this
+enables the "transitive readonlyness" property, described further below. The
+'metadata' is a JSON-encoded dictionary of type,value metadata pairs. Some
+metadata keys are pre-defined, the rest are left up to the application.
+
+Each rwcap is stored as IV + ciphertext + MAC. The IV is a 16-byte random
+value. The ciphertext is obtained by using AES in CTR mode on the rwcap URI
+string, using a key that is formed from a tagged hash of the IV and the
+dirnode's writekey. The MAC is written only for compatibility with older
+Tahoe-LAFS versions and is no longer verified.
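+
+As a very rough sketch (not the real implementation: the tag string, the key
+derivation details, and the MAC placeholder below are assumptions made only
+for illustration), the encryption step has roughly this shape::
+
+  import os
+  import hashlib
+  from cryptography.hazmat.backends import default_backend
+  from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
+
+  def encrypt_rwcap(rwcap, writekey):
+      iv = os.urandom(16)
+      # The real key is a tagged hash of the IV and writekey; the exact tag
+      # lives in the Tahoe-LAFS source, so a placeholder is used here.
+      key = hashlib.sha256(b"illustrative-tag" + iv + writekey).digest()
+      enc = Cipher(algorithms.AES(key), modes.CTR(iv),
+                   backend=default_backend()).encryptor()
+      ciphertext = enc.update(rwcap.encode("ascii")) + enc.finalize()
+      mac = b"\x00" * 32   # legacy field: written for compatibility, not verified
+      return iv + ciphertext + mac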
+
+If Bob has read-only access to the 'bar' directory, and he adds it as a child
+to the 'foo' directory, then he will put the read-only cap for 'bar' in both
+the rwcap and rocap slots (encrypting the rwcap contents as described above).
+If he has full read-write access to 'bar', then he will put the read-write
+cap in the 'rwcap' slot, and the read-only cap in the 'rocap' slot. Since
+other users who have read-only access to 'foo' will be unable to decrypt its
+rwcap slot, this limits those users to read-only access to 'bar' as well,
+thus providing the transitive readonlyness that we desire.
+
+Dirnode sizes, mutable-file initial read sizes
+==============================================
+
+How big are dirnodes? When reading dirnode data out of mutable files, how
+large should our initial read be? If we guess exactly, we can read a dirnode
+in a single round-trip, and update one in two RTT. If we guess too high,
+we'll waste some amount of bandwidth. If we guess low, we need to make a
+second pass to get the data (or the encrypted privkey, for writes), which
+will cost us at least another RTT.
+
+Assuming child names are between 10 and 99 characters long, how long are the
+various pieces of a dirnode?
+
+::
+
+ netstring(name) ~= 4+len(name)
+ chk-cap = 97 (for 4-char filesizes)
+ dir-rw-cap = 88
+ dir-ro-cap = 91
+ netstring(cap) = 4+len(cap)
+ encrypted(cap) = 16+cap+32
+ JSON({}) = 2
+ JSON({ctime=float,mtime=float,'tahoe':{linkcrtime=float,linkmotime=float}}): 137
+ netstring(metadata) = 4+137 = 141
+
+so a CHK entry is::
+
+ 5+ 4+len(name) + 4+97 + 5+16+97+32 + 4+137
+
+And a 15-byte filename gives a 416-byte entry. When the entry points at a
+subdirectory instead of a file, the entry is a little bit smaller. So an
+empty directory uses 0 bytes, a directory with one child uses about 416
+bytes, a directory with two children uses about 832, etc.
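+
+The arithmetic can be checked mechanically with a throwaway sketch like::
+
+  def chk_entry_size(name_len, metadata_len=137):
+      # outer netstring overhead, then netstring(name), netstring(chk-cap),
+      # netstring(encrypted rwcap), netstring(metadata), per the estimate above
+      return (5 + (4 + name_len) + (4 + 97)
+              + (5 + 16 + 97 + 32) + (4 + metadata_len))
+
+  assert chk_entry_size(15) == 416   # matches the figure quoted above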
+
+When the dirnode data is encoded using our default 3-of-10, that means we
+get 139ish bytes of data in each share per child.
+
+The pubkey, signature, and hashes form the first 935ish bytes of the
+container, then comes our data, then about 1216 bytes of encprivkey. So if we
+read the first::
+
+ 1kB: we get 65bytes of dirnode data : only empty directories
+ 2kB: 1065bytes: about 8
+ 3kB: 2065bytes: about 15 entries, or 6 entries plus the encprivkey
+ 4kB: 3065bytes: about 22 entries, or about 13 plus the encprivkey
+
+So we've written the code to do an initial read of 4kB from each share when
+we read the mutable file, which should give good performance (one RTT) for
+small directories.
+
+
+Design Goals, redux
+===================
+
+How well does this design meet the goals?
+
+1. functional: YES: the code works and has extensive unit tests
+2. documentable: YES: this document is the existence proof
+3. confidential: YES: see below
+4. integrity: MOSTLY: a coalition of storage servers can rollback individual
+ mutable files, but not a single one. No server can
+ substitute fake data as genuine.
+5. availability: YES: as long as 'k' storage servers are present and have
+ the same version of the mutable file, the dirnode will
+ be available.
+6. efficient: MOSTLY:
+ network: single dirnode lookup is very efficient, since clients can
+ fetch specific keys rather than being required to get or set
+ the entire dirnode each time. Traversing many directories
+ takes a lot of roundtrips, and these can't be collapsed with
+ promise-pipelining because the intermediate values must only
+ be visible to the client. Modifying many dirnodes at once
+ (e.g. importing a large pre-existing directory tree) is pretty
+ slow, since each graph edge must be created independently.
+ storage: each child has a separate IV, which makes them larger than
+ if all children were aggregated into a single encrypted string
+7. delegation: VERY: each dirnode is a completely independent object,
+ to which clients can be granted separate read-write or
+ read-only access
+8. updateness: VERY: with only a single point of access, and no caching,
+ each client operation starts by fetching the current
+ value, so there are no opportunities for staleness
+9. monotonicity: VERY: the single point of access also protects against
+ retrograde motion
+
+
+
+Confidentiality leaks in the storage servers
+--------------------------------------------
+
+Dirnodes (and the mutable files upon which they are based) are very private
+against other clients: traffic between the client and the storage servers is
+protected by the Foolscap SSL connection, so they can observe very little.
+Storage index values are hashes of secrets and thus unguessable, and they are
+not made public, so other clients cannot snoop through encrypted dirnodes
+that they have not been told about.
+
+Storage servers can observe access patterns and see ciphertext, but they
+cannot see the plaintext (of child names, metadata, or URIs). If an attacker
+operates a significant number of storage servers, they can infer the shape of
+the directory structure by assuming that directories are usually accessed
+from root to leaf in rapid succession. Since filenames are usually much
+shorter than read-caps and write-caps, the attacker can use the length of the
+ciphertext to guess the number of children of each node, and might be able to
+guess the length of the child names (or at least their sum). From this, the
+attacker may be able to build up a graph with the same shape as the plaintext
+filesystem, but with unlabeled edges and unknown file contents.
+
+
+Integrity failures in the storage servers
+-----------------------------------------
+
+The mutable file's integrity mechanism (RSA signature on the hash of the file
+contents) prevents the storage server from modifying the dirnode's contents
+without detection. Therefore the storage servers can make the dirnode
+unavailable, but not corrupt it.
+
+A sufficient number of colluding storage servers can perform a rollback
+attack: replace all shares of the whole mutable file with an earlier version.
+To prevent this, when retrieving the contents of a mutable file, the
+client queries more servers than necessary and uses the highest available
+version number. This ensures that one or two misbehaving storage servers
+cannot cause this rollback on their own.
+
+
+Improving the efficiency of dirnodes
+------------------------------------
+
+The current mutable-file-based dirnode scheme suffers from certain
+inefficiencies. A very large directory (with thousands or millions of
+children) will take a significant time to extract any single entry, because
+the whole file must be downloaded first, then parsed and searched to find the
+desired child entry. Likewise, modifying a single child will require the
+whole file to be re-uploaded.
+
+The current design assumes (and in some cases, requires) that dirnodes remain
+small. The mutable files on which dirnodes are based are currently using
+"SDMF" ("Small Distributed Mutable File") design rules, which state that the
+size of the data shall remain below one megabyte. More advanced forms of
+mutable files (MDMF and LDMF) are in the design phase to allow efficient
+manipulation of larger mutable files. This would reduce the work needed to
+modify a single entry in a large directory.
+
+Judicious caching may help improve the reading-large-directory case. Some
+form of mutable index at the beginning of the dirnode might help as well. The
+MDMF design rules allow for efficient random-access reads from the middle of
+the file, which would give the index something useful to point at.
+
+The current SDMF design generates a new RSA public/private keypair for each
+directory. This takes considerable time and CPU effort, generally one or two
+seconds per directory. We have designed (but not yet built) a DSA-based
+mutable file scheme which will use shared parameters to reduce the
+directory-creation effort to a bare minimum (picking a random number instead
+of generating two random primes).
+
+When a backup program is run for the first time, it needs to copy a large
+amount of data from a pre-existing filesystem into reliable storage. This
+means that a large and complex directory structure needs to be duplicated in
+the dirnode layer. With the one-object-per-dirnode approach described here,
+this requires as many operations as there are edges in the imported
+filesystem graph.
+
+Another approach would be to aggregate multiple directories into a single
+storage object. This object would contain a serialized graph rather than a
+single name-to-child dictionary. Most directory operations would fetch the
+whole block of data (and presumably cache it for a while to avoid lots of
+re-fetches), and modification operations would need to replace the whole
+thing at once. This "realm" approach would have the added benefit of
+combining more data into a single encrypted bundle (perhaps hiding the shape
+of the graph from a determined attacker), and would reduce round-trips when
+performing deep directory traversals (assuming the realm was already cached).
+It would also prevent fine-grained rollback attacks from working: a coalition
+of storage servers could change the entire realm to look like an earlier
+state, but it could not independently roll back individual directories.
+
+The drawbacks of this aggregation would be that small accesses (adding a
+single child, looking up a single child) would require pulling or pushing a
+lot of unrelated data, increasing network overhead (and necessitating
+test-and-set semantics for the modification side, which increases the chances
+that a user operation will fail, making it more challenging to provide
+promises of atomicity to the user).
+
+It would also make it much more difficult to enable the delegation
+("sharing") of specific directories. Since each aggregate "realm" provides
+all-or-nothing access control, the act of delegating any directory from the
+middle of the realm would require the realm first be split into the upper
+piece that isn't being shared and the lower piece that is. This splitting
+would have to be done in response to what is essentially a read operation,
+which is not traditionally supposed to be a high-effort action. On the other
+hand, it may be possible to aggregate the ciphertext, but use distinct
+encryption keys for each component directory, to get the benefits of both
+schemes at once.
+
+
+Dirnode expiration and leases
+-----------------------------
+
+Dirnodes are created any time a client wishes to add a new directory. How
+long do they live? What's to keep them from sticking around forever, taking
+up space that nobody can reach any longer?
+
+Mutable files are created with limited-time "leases", which keep the shares
+alive until the last lease has expired or been cancelled. Clients which know
+and care about specific dirnodes can ask to keep them alive for a while, by
+renewing a lease on them (with a typical period of one month). Clients are
+expected to assist in the deletion of dirnodes by canceling their leases as
+soon as they are done with them. This means that when a client deletes a
+directory, it should also cancel its lease on that directory. When the lease
+count on a given share goes to zero, the storage server can delete the
+related storage. Multiple clients may all have leases on the same dirnode:
+the server may delete the shares only after all of the leases have gone away.
+
+We expect that clients will periodically create a "manifest": a list of
+so-called "refresh capabilities" for all of the dirnodes and files that they
+can reach. They will give this manifest to the "repairer", which is a service
+that keeps files (and dirnodes) alive on behalf of clients who cannot take on
+this responsibility for themselves. These refresh capabilities include the
+storage index, but do *not* include the readkeys or writekeys, so the
+repairer does not get to read the files or directories that it is helping to
+keep alive.
+
+After each change to the user's vdrive, the client creates a manifest and
+looks for differences from their previous version. Anything which was removed
+prompts the client to send out lease-cancellation messages, allowing the data
+to be deleted.
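+
+The bookkeeping involved is essentially a set difference over refresh
+capabilities. The following sketch is illustrative only: the manifest is
+modeled as a plain set of refresh-cap strings, and ``cancel_lease`` is a
+hypothetical callable supplied by the client, not part of any actual Tahoe
+API::
+
+ def update_leases(old_manifest, new_manifest, cancel_lease):
+     # Anything reachable before but not now has become garbage: cancel its
+     # leases so the storage servers may reclaim the space.
+     for refresh_cap in old_manifest - new_manifest:
+         cancel_lease(refresh_cap)
+     # The new manifest becomes the baseline for the next comparison.
+     return new_manifest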
+
+
+Starting Points: root dirnodes
+==============================
+
+Any client can record the URI of a directory node in some external form (say,
+in a local file) and use it as the starting point of later traversal. Each
+Tahoe-LAFS user is expected to create a new (unattached) dirnode when they first
+start using the grid, and record its URI for later use.
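+
+For example, a client could create its root dirnode through the node's web
+API and record the resulting write-cap in a local file. This is only a
+sketch: it assumes a node whose web API listens on the default
+127.0.0.1:3456 and uses the ``POST /uri?t=mkdir`` operation to create an
+unlinked directory; the file location is arbitrary::
+
+ import os, urllib.request
+
+ def create_root_dirnode(path="~/.tahoe/private/my-rootcap"):
+     # Ask the local node to create a new, unattached dirnode. The response
+     # body is the new directory's write-cap (a URI:DIR2: string).
+     req = urllib.request.Request("http://127.0.0.1:3456/uri?t=mkdir",
+                                  data=b"", method="POST")
+     with urllib.request.urlopen(req) as resp:
+         rootcap = resp.read().decode("ascii").strip()
+     # Record the cap locally so later traversals can start from it.
+     with open(os.path.expanduser(path), "w") as f:
+         f.write(rootcap + "\n")
+     return rootcap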
+
+Mounting and Sharing Directories
+================================
+
+The biggest benefit of this dirnode approach is that sharing individual
+directories is almost trivial. Alice creates a subdirectory that she wants to
+use to share files with Bob. This subdirectory is attached to Alice's
+filesystem at "~alice/share-with-bob". She asks her filesystem for the
+read-write directory URI for that new directory, and emails it to Bob. When
+Bob receives the URI, he asks his own local vdrive to attach the given URI,
+perhaps at a place named "~bob/shared-with-alice". Every time either party
+writes a file into this directory, the other will be able to read it. If
+Alice prefers, she can give a read-only URI to Bob instead, and then Bob will
+be able to read files but not change the contents of the directory. Neither
+Alice nor Bob will get access to any files above the mounted directory: there
+are no 'parent directory' pointers. If Alice creates a nested set of
+directories, "~alice/share-with-bob/subdir2", and gives a read-only URI to
+share-with-bob to Bob, then Bob will be unable to write to either
+share-with-bob/ or subdir2/.
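+
+In web-API terms, Bob's "attach" step is just linking the received cap under
+a name in one of his own directories. The sketch below is an illustration
+only: it assumes Bob's node exposes its web API on the default
+127.0.0.1:3456 and that the ``PUT ...?t=uri`` form (attach a child by its
+cap, supplied in the request body) is used::
+
+ import urllib.parse, urllib.request
+
+ def attach_shared_dir(rootcap, childname, received_cap,
+                       wapi="http://127.0.0.1:3456"):
+     # Link the cap that Alice sent under rootcap/childname. Handing Bob's
+     # node only the read-only cap yields read-only access, as described
+     # above.
+     url = "%s/uri/%s/%s?t=uri" % (wapi,
+                                   urllib.parse.quote(rootcap, safe=""),
+                                   urllib.parse.quote(childname))
+     req = urllib.request.Request(url, data=received_cap.encode("ascii"),
+                                  method="PUT")
+     with urllib.request.urlopen(req) as resp:
+         return resp.read()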
+
+A suitable UI needs to be created to allow users to easily perform this
+sharing action: dragging a folder from their vdrive to an IM or email user
+icon, for example. The UI will need to give the sending user an opportunity to
+indicate whether they want to grant read-write or read-only access to the
+recipient. The recipient then needs an interface to drag the new folder into
+their vdrive and give it a home.
+
+Revocation
+==========
+
+When Alice decides that she no longer wants Bob to be able to access the
+shared directory, what should she do? Suppose she's shared this folder with
+both Bob and Carol, and now she wants Carol to retain access to it but Bob to
+be shut out. Ideally Carol should not have to do anything: her access should
+continue unabated.
+
+The current plan is to have her client create a deep copy of the folder in
+question, delegate access to the new folder to the remaining members of the
+group (Carol), and ask the lucky survivors to replace their old reference with
+the new one. Bob may still have access to the old folder, but he is now the
+only one who cares: everyone else has moved on, and he will no longer be able
+to see their new changes. In a strict sense, this is the strongest form of
+revocation that can be accomplished: there is no point trying to force Bob to
+forget about the files that he read a moment before being kicked out. In
+addition it must be noted that anyone who can access the directory can proxy
+for Bob, reading files to him and accepting changes whenever he wants.
+Preventing delegation between communication parties is just as pointless as
+asking Bob to forget previously accessed files. However, there may be value
+in configuring the UI to ask Carol not to share files with Bob, or in
+removing all files from Bob's view at the same time his access is revoked.
+
--- /dev/null
+=============
+File Encoding
+=============
+
+When the client wishes to upload an immutable file, the first step is to
+decide upon an encryption key. There are two methods: convergent or random.
+The goal of the convergent-key method is to make sure that multiple uploads
+of the same file will result in only one copy on the grid, whereas the
+random-key method does not provide this "convergence" feature.
+
+The convergent-key method computes the SHA-256d hash of a single-purpose tag,
+the encoding parameters, a "convergence secret", and the contents of the
+file. It uses a portion of the resulting hash as the AES encryption key.
+There are security concerns with this convergence approach (the
+"partial-information guessing attack"; please see ticket #365 for some
+references), so Tahoe uses a separate (randomly-generated) "convergence
+secret" for each node, stored in NODEDIR/private/convergence . The encoding
+parameters (k, N, and the segment size) are included in the hash to make sure
+that two different encodings of the same file will get different keys. This
+method requires an extra IO pass over the file, to compute this key, and
+encryption cannot be started until the pass is complete. This means that the
+convergent-key method will require at least two total passes over the file.
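+
+As a rough illustration of the convergent-key computation: the sketch below
+hashes a tag, the encoding parameters, the convergence secret, and the file
+contents with SHA-256d, then truncates the result to an AES key. The tag
+string, field layout, and 16-byte key length are placeholders, not the exact
+values used by Tahoe::
+
+ import hashlib
+
+ def sha256d(data):
+     return hashlib.sha256(hashlib.sha256(data).digest()).digest()
+
+ def netstring(s):
+     return b"%d:%s," % (len(s), s)
+
+ def convergent_key(convergence_secret, k, n, segsize, file_bytes):
+     tag = netstring(b"example_convergent_encryption_key")  # placeholder tag
+     params = netstring(b"%d,%d,%d" % (k, n, segsize))
+     h = sha256d(tag + params + netstring(convergence_secret) + file_bytes)
+     return h[:16]   # AES key (a real implementation streams the file data)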
+
+The random-key method simply chooses a random encryption key. Convergence is
+disabled; however, this method does not require a separate IO pass, so upload
+can be done with a single pass. This mode makes it easier to perform
+streaming upload.
+
+Regardless of which method is used to generate the key, the plaintext file is
+encrypted (using AES in CTR mode) to produce a ciphertext. This ciphertext is
+then erasure-coded and uploaded to the servers. Two hashes of the ciphertext
+are generated as the encryption proceeds: a flat hash of the whole
+ciphertext, and a Merkle tree. These are used to verify the correctness of
+the erasure decoding step, and can be used by a "verifier" process to make
+sure the file is intact without requiring the decryption key.
+
+The encryption key is hashed (with SHA-256d and a single-purpose tag) to
+produce the "Storage Index". This Storage Index (or SI) is used to identify
+the shares produced by the method described below. The grid can be thought of
+as a large table that maps Storage Index to a ciphertext. Since the
+ciphertext is stored as erasure-coded shares, it can also be thought of as a
+table that maps SI to shares.
+
+Anybody who knows a Storage Index can retrieve the associated ciphertext:
+ciphertexts are not secret.
+
+.. image:: file-encoding1.svg
+
+The ciphertext file is then broken up into segments. The last segment is
+likely to be shorter than the rest. Each segment is erasure-coded into a
+number of "blocks". This takes place one segment at a time. (In fact,
+encryption and erasure-coding take place at the same time, once per plaintext
+segment). Larger segment sizes result in less overhead overall, but increase
+both the memory footprint and the "alacrity" (the number of bytes we have to
+receive before we can deliver validated plaintext to the user). The current
+default segment size is 128KiB.
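+
+The segmentation step itself is straightforward: split the ciphertext into
+fixed-size pieces and erasure-code each piece into N blocks, any k of which
+suffice to rebuild it. In the sketch below, ``erasure_encode`` is a stand-in
+for the real zfec-based encoder, not its actual API::
+
+ SEGMENT_SIZE = 128 * 1024   # the 128KiB default mentioned above
+
+ def segments(ciphertext, segment_size=SEGMENT_SIZE):
+     # Yield fixed-size segments; the last one is probably shorter.
+     for start in range(0, len(ciphertext), segment_size):
+         yield ciphertext[start:start + segment_size]
+
+ def encode_file(ciphertext, k, n, erasure_encode):
+     # erasure_encode(segment, k, n) -> list of n blocks (hypothetical).
+     # shares[i] collects block i of every segment, in order.
+     shares = [[] for _ in range(n)]
+     for segment in segments(ciphertext):
+         for i, block in enumerate(erasure_encode(segment, k, n)):
+             shares[i].append(block)
+     return shares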
+
+One block from each segment is sent to each shareholder (aka leaseholder,
+aka landlord, aka storage node, aka peer). The "share" held by each remote
+shareholder is nominally just a collection of these blocks. The file will
+be recoverable when a certain number of shares have been retrieved.
+
+.. image:: file-encoding2.svg
+
+The blocks are hashed as they are generated and transmitted. These
+block hashes are put into a Merkle hash tree. When the last share has been
+created, the Merkle tree is completed and delivered to the peer. Later, when
+we retrieve these blocks, the peer will send many of the Merkle hash tree
+nodes ahead of time, so we can validate each block independently.
+
+The root of this block hash tree is called the "block root hash" and is
+used in the next step.
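+
+A minimal sketch of such a tree follows. The padding rule (duplicating the
+last hash of an odd-length layer) and the lack of leaf/interior tags are
+simplifications; they do not match Tahoe's exact tree layout::
+
+ import hashlib
+
+ def sha256d(data):
+     return hashlib.sha256(hashlib.sha256(data).digest()).digest()
+
+ def merkle_root(block_hashes):
+     # Combine per-block hashes pairwise until one root remains: this is
+     # the "block root hash" for the share.
+     layer = list(block_hashes)
+     if not layer:
+         raise ValueError("need at least one block hash")
+     while len(layer) > 1:
+         if len(layer) % 2:
+             layer.append(layer[-1])
+         layer = [sha256d(layer[i] + layer[i + 1])
+                  for i in range(0, len(layer), 2)]
+     return layer[0]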
+
+.. image:: file-encoding3.svg
+
+There is a higher-level Merkle tree called the "share hash tree". Its leaves
+are the block root hashes from each share. The root of this tree is called
+the "share root hash" and is included in the "URI Extension Block", aka UEB.
+The ciphertext hash and Merkle tree are also put here, along with the
+original file size, and the encoding parameters. The UEB contains all the
+non-secret values that could be put in the URI, but would have made the URI
+too big. So instead, the UEB is stored with the share, and the hash of the
+UEB is put in the URI.
+
+The URI then contains the secret encryption key and the UEB hash. It also
+contains the basic encoding parameters (k and N) and the file size, to make
+download more efficient (by knowing the number of required shares ahead of
+time, sufficient download queries can be generated in parallel).
+
+The URI (also known as the immutable-file read-cap, since possessing it
+grants the holder the capability to read the file's plaintext) is then
+represented as a (relatively) short printable string like so::
+
+ URI:CHK:auxet66ynq55naiy2ay7cgrshm:6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000
+
+.. image:: file-encoding4.svg
+
+During download, when a peer begins to transmit a share, it first transmits
+all of the parts of the share hash tree that are necessary to validate its
+block root hash. Then it transmits the portions of the block hash tree
+that are necessary to validate the first block. Then it transmits the
+first block. It then continues this loop: transmitting any portions of the
+block hash tree to validate block#N, then sending block#N.
+
+.. image:: file-encoding5.svg
+
+So the "share" that is sent to the remote peer actually consists of three
+pieces, sent in a specific order as they become available, and retrieved
+during download in a different order according to when they are needed.
+
+The first piece is the blocks themselves, one per segment. The last
+block will likely be shorter than the rest, because the last segment is
+probably shorter than the rest. The second piece is the block hash tree,
+consisting of a total of two SHA-256d hashes per block. The third piece is a
+hash chain from the share hash tree, consisting of log2(numshares) hashes.
+
+During upload, all blocks are sent first, followed by the block hash
+tree, followed by the share hash chain. During download, the share hash chain
+is delivered first, followed by the block root hash. The client then uses
+the hash chain to validate the block root hash. Then the peer delivers
+enough of the block hash tree to validate the first block, followed by
+the first block itself. The block hash chain is used to validate the
+block, then it is passed (along with the first block from several other
+peers) into decoding, to produce the first segment of crypttext, which is
+then decrypted to produce the first segment of plaintext, which is finally
+delivered to the user.
+
+.. image:: file-encoding6.svg
+
+Hashes
+======
+
+All hashes use SHA-256d, as defined in Practical Cryptography (by Ferguson
+and Schneier). All hashes use a single-purpose tag, e.g. the hash that
+converts an encryption key into a storage index is defined as follows::
+
+ SI = SHA256d(netstring("allmydata_immutable_key_to_storage_index_v1") + key)
+
+When two separate values need to be combined together in a hash, we wrap each
+in a netstring.
+
+Using SHA-256d (instead of plain SHA-256) guards against length-extension
+attacks. Using the tag protects our Merkle trees against attacks in which the
+hash of a leaf is confused with a hash of two children (allowing an attacker
+to generate corrupted data that nevertheless appears to be valid), and is
+simply good "cryptograhic hygiene". The `"Chosen Protocol Attack" by Kelsey,
+Schneier, and Wagner <http://www.schneier.com/paper-chosen-protocol.html>`_ is
+relevant. Putting the tag in a netstring guards against attacks that seek to
+confuse the end of the tag with the beginning of the subsequent value.
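+
+A direct transcription of the storage-index formula above might look like
+the following; the truncation of the result to 16 bytes is an assumption
+about the storage-index length, not something the formula itself states::
+
+ import hashlib
+
+ def sha256d(data):
+     return hashlib.sha256(hashlib.sha256(data).digest()).digest()
+
+ def netstring(s):
+     return b"%d:%s," % (len(s), s)
+
+ def storage_index(key):
+     # SI = SHA256d(netstring(tag) + key), truncated (assumed) to 16 bytes.
+     tag = netstring(b"allmydata_immutable_key_to_storage_index_v1")
+     return sha256d(tag + key)[:16]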
+
--- /dev/null
+=============
+Mutable Files
+=============
+
+This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.
+
+1. `Consistency vs. Availability`_
+2. `The Prime Coordination Directive: "Don't Do That"`_
+3. `Small Distributed Mutable Files`_
+
+ 1. `SDMF slots overview`_
+ 2. `Server Storage Protocol`_
+ 3. `Code Details`_
+ 4. `SMDF Slot Format`_
+ 5. `Recovery`_
+
+4. `Medium Distributed Mutable Files`_
+5. `Large Distributed Mutable Files`_
+6. `TODO`_
+
+Mutable File Slots are places with a stable identifier that can hold data
+that changes over time. In contrast to CHK slots, for which the
+URI/identifier is derived from the contents themselves, the Mutable File Slot
+URI remains fixed for the life of the slot, regardless of what data is placed
+inside it.
+
+Each mutable slot is referenced by two different URIs. The "read-write" URI
+grants read-write access to its holder, allowing them to put whatever
+contents they like into the slot. The "read-only" URI is less powerful, only
+granting read access, and not enabling modification of the data. The
+read-write URI can be turned into the read-only URI, but not the other way
+around.
+
+The data in these slots is distributed over a number of servers, using the
+same erasure coding that CHK files use, with 3-of-10 being a typical choice
+of encoding parameters. The data is encrypted and signed in such a way that
+only the holders of the read-write URI will be able to set the contents of
+the slot, and only the holders of the read-only URI will be able to read
+those contents. Holders of either URI will be able to validate the contents
+as being written by someone with the read-write URI. The servers who hold the
+shares cannot read or modify them: the worst they can do is deny service (by
+deleting or corrupting the shares), or attempt a rollback attack (which can
+only succeed with the cooperation of at least k servers).
+
+Consistency vs. Availability
+============================
+
+There is an age-old battle between consistency and availability. Epic papers
+have been written, elaborate proofs have been established, and generations of
+theorists have learned that you cannot simultaneously achieve guaranteed
+consistency with guaranteed availability. In addition, the closer to 0 you
+get on either axis, the higher the cost and complexity of the design.
+
+Tahoe's design goals are to largely favor design simplicity, and then to
+slightly favor read availability, over the other criteria.
+
+As we develop more sophisticated mutable slots, the API may expose multiple
+read versions to the application layer. The tahoe philosophy is to defer most
+consistency recovery logic to the higher layers. Some applications have
+effective ways to merge multiple versions, so inconsistency is not
+necessarily a problem (i.e. directory nodes can usually merge multiple "add
+child" operations).
+
+The Prime Coordination Directive: "Don't Do That"
+=================================================
+
+The current rule for applications which run on top of Tahoe is "do not
+perform simultaneous uncoordinated writes". That means you need some
+non-Tahoe mechanism to make sure that two parties are not trying to modify
+the same mutable slot at the same time. For example:
+
+* don't give the read-write URI to anyone else. Dirnodes in a private
+ directory generally satisfy this case, as long as you don't use two
+ clients on the same account at the same time
+* if you give a read-write URI to someone else, stop using it yourself. An
+ inbox would be a good example of this.
+* if you give a read-write URI to someone else, call them on the phone
+ before you write into it
+* build an automated mechanism to have your agents coordinate writes.
+ For example, we expect a future release to include a FURL for a
+ "coordination server" in the dirnodes. The rule can be that you must
+ contact the coordination server and obtain a lock/lease on the file
+ before you're allowed to modify it.
+
+If you do not follow this rule, Bad Things will happen. The worst-case Bad
+Thing is that the entire file will be lost. A less-bad Bad Thing is that one
+or more of the simultaneous writers will lose their changes. An observer of
+the file may not see monotonically-increasing changes to the file, i.e. they
+may see version 1, then version 2, then 3, then 2 again.
+
+Tahoe takes some amount of care to reduce the badness of these Bad Things.
+One way you can help nudge it from the "lose your file" case into the "lose
+some changes" case is to reduce the number of competing versions: multiple
+versions of the file that different parties are trying to establish as the
+one true current contents. Each simultaneous writer counts as a "competing
+version", as does the previous version of the file. If the count "S" of these
+competing versions is larger than N/k, then the file runs the risk of being
+lost completely (with the typical 3-of-10 encoding, N/k is about 3.3, so four
+or more competing versions put the file at risk). [TODO] If at least one of
+the writers remains running after the collision is detected, it will attempt
+to recover, but if S>(N/k) and all writers crash after writing a few shares,
+the file will be lost.
+
+Note that Tahoe uses serialization internally to make sure that a single
+Tahoe node will not perform simultaneous modifications to a mutable file. It
+accomplishes this by using a weakref cache of the MutableFileNode (so that
+there will never be two distinct MutableFileNodes for the same file), and by
+forcing all mutable file operations to obtain a per-node lock before they
+run. The Prime Coordination Directive therefore applies to inter-node
+conflicts, not intra-node ones.
+
+
+Small Distributed Mutable Files
+===============================
+
+SDMF slots are suitable for small (<1MB) files that are edited by rewriting
+the entire file. The three operations are:
+
+ * allocate (with initial contents)
+ * set (with new contents)
+ * get (old contents)
+
+The first use of SDMF slots will be to hold directories (dirnodes), which map
+encrypted child names to rw-URI/ro-URI pairs.
+
+SDMF slots overview
+-------------------
+
+Each SDMF slot is created with a public/private key pair. The public key is
+known as the "verification key", while the private key is called the
+"signature key". The private key is hashed and truncated to 16 bytes to form
+the "write key" (an AES symmetric key). The write key is then hashed and
+truncated to form the "read key". The read key is hashed and truncated to
+form the 16-byte "storage index" (a unique string used as an index to locate
+stored data).
+
+The public key is hashed by itself to form the "verification key hash".
+
+The write key is hashed a different way to form the "write enabler master".
+For each storage server on which a share is kept, the write enabler master is
+concatenated with the server's nodeid and hashed, and the result is called
+the "write enabler" for that particular server. Note that multiple shares of
+the same slot stored on the same server will all get the same write enabler,
+i.e. the write enabler is associated with the "bucket", rather than the
+individual shares.
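+
+The derivation chain can be written down in a few lines. Only its shape
+(signature key -> write key -> read key -> storage index, plus a per-server
+write enabler) follows the text above; the tag strings, truncation lengths,
+and the way the private key is serialized before hashing are placeholders::
+
+ import hashlib
+
+ def tagged_hash(tag, data, truncate_to=32):
+     # SHA-256d over a netstring-wrapped tag followed by the data.
+     netstring = b"%d:%s," % (len(tag), tag)
+     d = hashlib.sha256(hashlib.sha256(netstring + data).digest()).digest()
+     return d[:truncate_to]
+
+ def derive_sdmf_keys(signature_key_bytes, server_nodeids):
+     write_key = tagged_hash(b"example_writekey", signature_key_bytes, 16)
+     read_key = tagged_hash(b"example_readkey", write_key, 16)
+     storage_index = tagged_hash(b"example_storage_index", read_key, 16)
+     we_master = tagged_hash(b"example_we_master", write_key)
+     write_enablers = {nodeid: tagged_hash(b"example_write_enabler",
+                                           we_master + nodeid)
+                       for nodeid in server_nodeids}
+     return write_key, read_key, storage_index, write_enablers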
+
+The private key is encrypted (using AES in counter mode) by the write key,
+and the resulting crypttext is stored on the servers, so it will be
+retrievable by anyone who knows the write key. The write key is not used to
+encrypt anything else, and the private key never changes, so we do not need
+an IV for this purpose.
+
+The actual data is encrypted (using AES in counter mode) with a key derived
+by concatenating the readkey with the IV, then hashing the result and
+truncating to 16 bytes. The IV is randomly generated each time the slot is
+updated, and stored next to the encrypted data.
+
+The read-write URI consists of the write key and the verification key hash.
+The read-only URI contains the read key and the verification key hash. The
+verify-only URI contains the storage index and the verification key hash.
+
+::
+
+ URI:SSK-RW:b2a(writekey):b2a(verification_key_hash)
+ URI:SSK-RO:b2a(readkey):b2a(verification_key_hash)
+ URI:SSK-Verify:b2a(storage_index):b2a(verification_key_hash)
+
+Note that this allows the read-only and verify-only URIs to be derived from
+the read-write URI without actually retrieving the public keys. Also note
+that it means the read-write agent must validate both the private key and the
+public key when they are first fetched. All users validate the public key in
+exactly the same way.
+
+The SDMF slot is allocated by sending a request to the storage server with a
+desired size, the storage index, and the write enabler for that server's
+nodeid. If granted, the write enabler is stashed inside the slot's backing
+store file. All further write requests must be accompanied by the write
+enabler or they will not be honored. The storage server does not share the
+write enabler with anyone else.
+
+The SDMF slot structure will be described in more detail below. The important
+pieces are:
+
+* a sequence number
+* a root hash "R"
+* the encoding parameters (including k, N, file size, segment size)
+* a signed copy of [seqnum,R,encoding_params], using the signature key
+* the verification key (not encrypted)
+* the share hash chain (part of a Merkle tree over the share hashes)
+* the block hash tree (Merkle tree over blocks of share data)
+* the share data itself (erasure-coding of read-key-encrypted file data)
+* the signature key, encrypted with the write key
+
+The access pattern for read is:
+
+* hash read-key to get storage index
+* use storage index to locate 'k' shares with identical 'R' values
+
+ * either get one share, read 'k' from it, then read k-1 shares
+ * or read, say, 5 shares, discover k, either get more or be finished
+ * or copy k into the URIs
+
+* read verification key
+* hash verification key, compare against verification key hash
+* read seqnum, R, encoding parameters, signature
+* verify signature against verification key
+* read share data, compute block-hash Merkle tree and root "r"
+* read share hash chain (leading from "r" to "R")
+* validate share hash chain up to the root "R"
+* submit share data to erasure decoding
+* decrypt decoded data with read-key
+* submit plaintext to application
+
+The access pattern for write is:
+
+* hash write-key to get read-key, hash read-key to get storage index
+* use the storage index to locate at least one share
+* read verification key and encrypted signature key
+* decrypt signature key using write-key
+* hash signature key, compare against write-key
+* hash verification key, compare against verification key hash
+* encrypt plaintext from application with read-key
+
+ * application can encrypt some data with the write-key to make it only
+ available to writers (use this for transitive read-onlyness of dirnodes)
+
+* erasure-code crypttext to form shares
+* split shares into blocks
+* compute Merkle tree of blocks, giving root "r" for each share
+* compute Merkle tree of shares, find root "R" for the file as a whole
+* create share data structures, one per server:
+
+ * use seqnum which is one higher than the old version
+ * share hash chain has log(N) hashes, different for each server
+ * signed data is the same for each server
+
+* now we have N shares and need homes for them
+* walk through peers
+
+ * if share is not already present, allocate-and-set
+ * otherwise, try to modify existing share:
+
+   * send testv_and_writev operation to each one
+   * testv says to accept share if their(seqnum+R) <= our(seqnum+R); see the
+     sketch below this list
+   * count how many servers wind up with which versions (histogram over R)
+   * keep going until N servers have the same version, or we run out of servers
+
+ * if any servers wound up with a different version, report error to
+ application
+ * if we ran out of servers, initiate recovery process (described below)
+
+Server Storage Protocol
+-----------------------
+
+The storage servers will provide a mutable slot container which is oblivious
+to the details of the data being contained inside it. Each storage index
+refers to a "bucket", and each bucket has one or more shares inside it. (In a
+well-provisioned network, each bucket will have only one share). The bucket
+is stored as a directory, using the base32-encoded storage index as the
+directory name. Each share is stored in a single file, using the share number
+as the filename.
+
+The container holds space for a container magic number (for versioning), the
+write enabler, the nodeid which accepted the write enabler (used for share
+migration, described below), a small number of lease structures, the embedded
+data itself, and expansion space for additional lease structures::
+
+ #   offset    size    name
+ 1   0         32      magic verstr "tahoe mutable container v1" plus binary
+ 2   32        20      write enabler's nodeid
+ 3   52        32      write enabler
+ 4   84        8       data size (actual share data present) (a)
+ 5   92        8       offset of (8) count of extra leases (after data)
+ 6   100       368     four leases, 92 bytes each
+                         0    4   ownerid (0 means "no lease here")
+                         4    4   expiration timestamp
+                         8   32   renewal token
+                        40   32   cancel token
+                        72   20   nodeid which accepted the tokens
+ 7   468       (a)     data
+ 8   ??        4       count of extra leases
+ 9   ??        n*92    extra leases
+
+The "extra leases" field must be copied and rewritten each time the size of
+the enclosed data changes. The hope is that most buckets will have four or
+fewer leases and this extra copying will not usually be necessary.
+
+The (4) "data size" field contains the actual number of bytes of data present
+in field (7), such that a client request to read beyond 468+(a) (i.e. past
+the end of the data) will result in an error. This allows the client to (one
+day) read relative to the end of the file. The container size (that is,
+(8)-(7)) might be larger, especially if extra space was pre-allocated in
+anticipation of filling the container with a lot of data.
+
+The offset in (5) points at the *count* of extra leases, at (8). The actual
+leases (at (9)) begin 4 bytes later. If the container size changes, both (8)
+and (9) must be relocated by copying.
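+
+The fixed-size header and lease region described above can be written down
+as a struct layout. The following is only a sketch of a parser for fields
+(1) through (6); the format strings simply restate the sizes from the
+table::
+
+ import struct
+
+ # magic, write enabler's nodeid, write enabler, data size, extra-lease offset
+ HEADER = ">32s20s32sQQ"
+ # ownerid, expiration timestamp, renewal token, cancel token, accepting nodeid
+ LEASE = ">LL32s32s20s"
+ HEADER_SIZE = struct.calcsize(HEADER)          # 100
+ LEASE_SIZE = struct.calcsize(LEASE)            # 92
+ DATA_OFFSET = HEADER_SIZE + 4 * LEASE_SIZE     # 468, field (7) in the table
+
+ def parse_container_header(f):
+     f.seek(0)
+     (magic, we_nodeid, write_enabler,
+      data_size, extra_lease_offset) = struct.unpack(HEADER, f.read(HEADER_SIZE))
+     leases = [struct.unpack(LEASE, f.read(LEASE_SIZE)) for _ in range(4)]
+     return (magic, we_nodeid, write_enabler,
+             data_size, extra_lease_offset, leases)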
+
+The server will honor any write commands that provide the write enabler and do
+not exceed the server-wide storage size limitations. Read and write commands
+MUST be restricted to the 'data' portion of the container: the implementation
+of those commands MUST perform correct bounds-checking to make sure other
+portions of the container are inaccessible to the clients.
+
+The two methods provided by the storage server on these "MutableSlot" share
+objects are:
+
+* readv(ListOf(offset=int, length=int))
+
+ * returns a list of bytestrings, of the various requested lengths
+ * offset < 0 is interpreted relative to the end of the data
+ * spans which hit the end of the data will return truncated data
+
+* testv_and_writev(write_enabler, test_vector, write_vector)
+
+ * this is a test-and-set operation which performs the given tests and only
+ applies the desired writes if all tests succeed. This is used to detect
+ simultaneous writers, and to reduce the chance that an update will lose
+ data recently written by some other party (written after the last time
+ this slot was read).
+ * test_vector=ListOf(TupleOf(offset, length, opcode, specimen))
+ * the opcode is a string, from the set [gt, ge, eq, le, lt, ne]
+ * each element of the test vector is read from the slot's data and
+ compared against the specimen using the desired (in)equality. If all
+ tests evaluate True, the write is performed
+ * write_vector=ListOf(TupleOf(offset, newdata))
+
+ * offset < 0 is not yet defined, it probably means relative to the
+ end of the data, which probably means append, but we haven't nailed
+ it down quite yet
+ * write vectors are executed in order, which specifies the results of
+ overlapping writes
+
+ * return value:
+
+ * error: OutOfSpace
+ * error: something else (io error, out of memory, whatever)
+ * (True, old_test_data): the write was accepted (test_vector passed)
+ * (False, old_test_data): the write was rejected (test_vector failed)
+
+ * both 'accepted' and 'rejected' return the old data that was used
+ for the test_vector comparison. This can be used by the client
+ to detect write collisions, including collisions for which the
+ desired behavior was to overwrite the old version.
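+
+As an illustration of how a writer might use this operation, the sketch
+below builds a test vector which only accepts the write if the stored
+header prefix is still exactly what this writer last read. The offsets,
+the calling convention, and the exception are illustrative assumptions, not
+part of the protocol definition::
+
+ def conditional_update(slot, write_enabler, expected_prefix, new_share):
+     # the share header (version byte, seqnum, "R") sits at the start of the
+     # slot's data; accept the write only if it is unchanged ("eq" opcode)
+     testv = [(0, len(expected_prefix), "eq", expected_prefix)]
+     writev = [(0, new_share)]
+     accepted, old_test_data = slot.testv_and_writev(write_enabler,
+                                                     testv, writev)
+     if not accepted:
+         # someone else wrote first; old_test_data shows what they stored
+         raise UncoordinatedWriteError(old_test_data)
+     return old_test_data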
+
+In addition, the storage server provides several methods to access these
+share objects:
+
+* allocate_mutable_slot(storage_index, sharenums=SetOf(int))
+
+ * returns DictOf(int, MutableSlot)
+
+* get_mutable_slot(storage_index)
+
+ * returns DictOf(int, MutableSlot)
+ * or raises KeyError
+
+We intend to add an interface which allows small slots to allocate-and-write
+in a single call, as well as do update or read in a single call. The goal is
+to allow a reasonably-sized dirnode to be created (or updated, or read) in
+just one round trip (to all N shareholders in parallel).
+
+migrating shares
+````````````````
+
+If a share must be migrated from one server to another, two values become
+invalid: the write enabler (since it was computed for the old server), and
+the lease renew/cancel tokens.
+
+Suppose that a slot was first created on nodeA, and was thus initialized with
+WE(nodeA) (= H(WEM+nodeA)). Later, for provisioning reasons, the share is
+moved from nodeA to nodeB.
+
+Readers may still be able to find the share in its new home, depending upon
+how many servers are present in the grid, where the new nodeid lands in the
+permuted index for this particular storage index, and how many servers the
+reading client is willing to contact.
+
+When a client attempts to write to this migrated share, it will get a "bad
+write enabler" error, since the WE it computes for nodeB will not match the
+WE(nodeA) that was embedded in the share. When this occurs, the "bad write
+enabler" message must include the old nodeid (e.g. nodeA) that was in the
+share.
+
+The client then computes H(nodeB+H(WEM+nodeA)), which is the same as
+H(nodeB+WE(nodeA)). The client sends this along with the new WE(nodeB), which
+is H(WEM+nodeB). Note that the client only sends WE(nodeB) to nodeB, never to
+anyone else. Also note that the client does not send a value to nodeB that
+would allow the node to impersonate the client to a third node: everything
+sent to nodeB will include something specific to nodeB in it.
+
+The server locally computes H(nodeB+WE(nodeA)), using its own node id and the
+old write enabler from the share. It compares this against the value supplied
+by the client. If they match, this serves as proof that the client was able
+to compute the old write enabler. The server then accepts the client's new
+WE(nodeB) and writes it into the container.
+
+This WE-fixup process requires an extra round trip, and requires the error
+message to include the old nodeid, but does not require any public key
+operations on either client or server.
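+
+A sketch of the two hash computations involved in this exchange is shown
+below. The H() helper stands in for Tahoe's tagged hashes (the real tag
+strings differ), and the exception name is made up; only the structure of
+the comparison follows the description above::
+
+ import hashlib
+
+ def H(*pieces):
+     return hashlib.sha256(b"".join(pieces)).digest()
+
+ # client side: prove knowledge of the old write enabler, to nodeB only
+ def make_we_update(write_enabler_master, nodeA, nodeB):
+     old_we = H(write_enabler_master, nodeA)    # WE(nodeA) = H(WEM+nodeA)
+     proof = H(nodeB, old_we)                   # H(nodeB+WE(nodeA))
+     new_we = H(write_enabler_master, nodeB)    # WE(nodeB)
+     return proof, new_we
+
+ # server side (nodeB): accept the new WE only if the proof checks out
+ def accept_we_update(stored_old_we, my_nodeid, proof, new_we):
+     if H(my_nodeid, stored_old_we) != proof:
+         raise BadWriteEnablerProof()
+     return new_we                              # stash into the container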
+
+Migrating the leases will require a similar protocol. This protocol will be
+defined concretely at a later date.
+
+Code Details
+------------
+
+The MutableFileNode class is used to manipulate mutable files (as opposed to
+ImmutableFileNodes). These are initially generated with
+client.create_mutable_file(), and later recreated from URIs with
+client.create_node_from_uri(). Instances of this class will contain a URI and
+a reference to the client (for peer selection and connection).
+
+NOTE: this section is out of date. Please see src/allmydata/interfaces.py
+(the section on IMutableFilesystemNode) for more accurate information.
+
+The methods of MutableFileNode are:
+
+* download_to_data() -> [deferred] newdata, NotEnoughSharesError
+
+ * if there are multiple retrievable versions in the grid,
+ download_to_data() returns the first version it can reconstruct, and
+ silently ignores the others. In the future, a more advanced API will
+ signal and provide access to the multiple heads.
+
+* update(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError
+* overwrite(newdata) -> OK, UncoordinatedWriteError, NotEnoughSharesError
+
+download_to_data() causes a new retrieval to occur, pulling the current
+contents from the grid and returning them to the caller. At the same time,
+this call caches information about the current version of the file. This
+information will be used in a subsequent call to update(), and if another
+change has occurred between the two, this information will be out of date,
+triggering the UncoordinatedWriteError.
+
+update() is therefore intended to be used just after a download_to_data(), in
+the following pattern::
+
+ d = mfn.download_to_data()
+ d.addCallback(apply_delta)
+ d.addCallback(mfn.update)
+
+If the update() call raises UCW, then the application can simply return an
+error to the user ("you violated the Prime Coordination Directive"), and they
+can try again later. Alternatively, the application can attempt to retry on
+its own. To accomplish this, the app needs to pause, download the new
+(post-collision and post-recovery) form of the file, reapply their delta,
+then submit the update request again. A randomized pause is necessary to
+reduce the chances of colliding a second time with another client that is
+doing exactly the same thing::
+
+ import random
+
+ d = mfn.download_to_data()
+ d.addCallback(apply_delta)
+ d.addCallback(mfn.update)
+ def _retry(f):
+     f.trap(UncoordinatedWriteError)
+     # 'pause' is assumed to be an application-supplied helper returning a
+     # Deferred that fires after the given number of seconds
+     d1 = pause(random.uniform(5, 20))
+     d1.addCallback(lambda res: mfn.download_to_data())
+     d1.addCallback(apply_delta)
+     d1.addCallback(mfn.update)
+     return d1
+ d.addErrback(_retry)
+
+Enthusiastic applications can retry multiple times, using a randomized
+exponential backoff between each. A particularly enthusiastic application can
+retry forever, but such apps are encouraged to provide a means to the user of
+giving up after a while.
+
+UCW does not mean that the update was not applied, so it is also a good idea
+to skip the retry-update step if the delta was already applied::
+
+ d = mfn.download_to_data()
+ d.addCallback(apply_delta)
+ d.addCallback(mfn.update)
+ def _retry(f):
+     f.trap(UncoordinatedWriteError)
+     d1 = pause(random.uniform(5, 20))
+     d1.addCallback(lambda res: mfn.download_to_data())
+     def _maybe_apply_delta(contents):
+         new_contents = apply_delta(contents)
+         if new_contents != contents:
+             return mfn.update(new_contents)
+     d1.addCallback(_maybe_apply_delta)
+     return d1
+ d.addErrback(_retry)
+
+update() is the right interface to use for delta-application situations, like
+directory nodes (in which apply_delta might be adding or removing child
+entries from a serialized table).
+
+Note that any uncoordinated write has the potential to lose data. We must do
+more analysis to be sure, but it appears that two clients who write to the
+same mutable file at the same time (even if both eventually retry) will, with
+high probability, result in one client observing UCW and the other silently
+losing their changes. It is also possible for both clients to observe UCW.
+The moral of the story is that the Prime Coordination Directive is there for
+a reason, and that recovery/UCW/retry is not a substitute for write
+coordination.
+
+overwrite() tells the client to ignore this cached version information, and
+to unconditionally replace the mutable file's contents with the new data.
+This should not be used in delta application, but rather in situations where
+you want to replace the file's contents with completely unrelated ones. When
+raw files are uploaded into a mutable slot through the tahoe webapi (using
+POST and the ?mutable=true argument), they are put in place with overwrite().
+
+The peer-selection and data-structure manipulation (and signing/verification)
+steps will be implemented in a separate class in allmydata/mutable.py .
+
+SMDF Slot Format
+----------------
+
+This SDMF data lives inside a server-side MutableSlot container. The server
+is oblivious to this format.
+
+This data is tightly packed. In particular, the share data is defined to run
+all the way to the beginning of the encrypted private key (the encprivkey
+offset is used both to terminate the share data and to begin the encprivkey).
+
+::
+
+  #   offset    size     name
+  1   0         1        version byte, \x00 for this format
+  2   1         8        sequence number. 2^64-1 must be handled specially, TBD
+  3   9         32       "R" (root of share hash Merkle tree)
+  4   41        16       IV (share data is AES(H(readkey+IV)) )
+  5   57        18       encoding parameters:
+       57        1         k
+       58        1         N
+       59        8         segment size
+       67        8         data length (of original plaintext)
+  6   75        32       offset table:
+       75        4         (8) signature
+       79        4         (9) share hash chain
+       83        4         (10) block hash tree
+       87        4         (11) share data
+       91        8         (12) encrypted private key
+       99        8         (13) EOF
+  7   107       436ish   verification key (2048 RSA key)
+  8   543ish    256ish   signature=RSAenc(sigkey, H(version+seqnum+r+IV+encparm))
+  9   799ish    (a)      share hash chain, encoded as:
+                           "".join([pack(">H32s", shnum, hash)
+                                    for (shnum,hash) in needed_hashes])
+ 10   (927ish)  (b)      block hash tree, encoded as:
+                           "".join([pack(">32s",hash) for hash in block_hash_tree])
+ 11   (935ish)  LEN      share data (no gap between this and encprivkey)
+ 12   ??        1216ish  encrypted private key= AESenc(write-key, RSA-key)
+ 13   ??        --       EOF
+
+ (a) The share hash chain contains ceil(log2(N)) hashes, each 32 bytes long.
+     This is the set of hashes necessary to validate this share's leaf in the
+     share Merkle tree. For N=10, this is 4 hashes, i.e. 128 bytes.
+ (b) The block hash tree contains ceil(length/segsize) hashes, each 32 bytes
+     long. This is the set of hashes necessary to validate any given block of
+     share data up to the per-share root "r". Each "r" is a leaf of the share
+     hash tree (with root "R"), from which a minimal subset of hashes is put
+     in the share hash chain in (8).
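+
+A sketch of parsing the fixed-size prefix and offset table of this format
+follows; the struct format strings restate the sizes from the table, and
+the function name is arbitrary::
+
+ import struct
+
+ # fields 1-5: version, seqnum, R, IV, k, N, segment size, data length
+ PREFIX = ">B Q 32s 16s B B Q Q"
+ # field 6: offsets of signature, share hash chain, block hash tree, and
+ # share data (4 bytes each), then encrypted private key and EOF (8 bytes each)
+ OFFSETS = ">L L L L Q Q"
+
+ def parse_share_prefix(share):
+     prefix_size = struct.calcsize(PREFIX)               # 75
+     (version, seqnum, root_hash, IV,
+      k, N, segsize, datalen) = struct.unpack(PREFIX, share[:prefix_size])
+     fields = struct.unpack(OFFSETS, share[prefix_size:prefix_size + 32])
+     offsets = dict(zip(["signature", "share_hash_chain", "block_hash_tree",
+                         "share_data", "enc_privkey", "EOF"], fields))
+     return (version, seqnum, root_hash, IV, k, N, segsize, datalen, offsets)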
+
+Recovery
+--------
+
+The first line of defense against damage caused by colliding writes is the
+Prime Coordination Directive: "Don't Do That".
+
+The second line of defense is to keep "S" (the number of competing versions)
+lower than N/k. If this holds true, at least one competing version will have
+k shares and thus be recoverable. Note that server unavailability counts
+against us here: the old version stored on the unavailable server must be
+included in the value of S.
+
+The third line of defense is our use of testv_and_writev() (described below),
+which increases the convergence of simultaneous writes: one of the writers
+will be favored (the one with the highest "R"), and that version is more
+likely to be accepted than the others. This defense is least effective in the
+pathological situation where S simultaneous writers are active, the one with
+the lowest "R" writes to N-k+1 of the shares and then dies, then the one with
+the next-lowest "R" writes to N-2k+1 of the shares and dies, etc, until the
+one with the highest "R" writes to k-1 shares and dies. Any other sequencing
+will allow the highest "R" to write to at least k shares and establish a new
+revision.
+
+The fourth line of defense is the fact that each client keeps writing until
+at least one version has N shares. This uses additional servers, if
+necessary, to make sure that either the client's version or some
+newer/overriding version is highly available.
+
+The fifth line of defense is the recovery algorithm, which seeks to make sure
+that at least *one* version is highly available, even if that version is
+somebody else's.
+
+The write-shares-to-peers algorithm is as follows:
+
+* permute peers according to storage index
+* walk through peers, trying to assign one share per peer
+* for each peer:
+
+ * send testv_and_writev, using "old(seqnum+R) <= our(seqnum+R)" as the test
+
+ * this means that we will overwrite any old versions, and we will
+ overwrite simultaneous writers of the same version if our R is higher.
+ We will not overwrite writers using a higher seqnum.
+
+ * record the version that each share winds up with. If the write was
+ accepted, this is our own version. If it was rejected, read the
+ old_test_data to find out what version was retained.
+ * if old_test_data indicates the seqnum was equal to or greater than our
+ own, mark the "Simultaneous Writes Detected" flag, which will eventually
+ result in an error being reported to the writer (in their close() call).
+ * build a histogram of "R" values
+ * repeat until the histogram indicates that some version (possibly ours)
+ has N shares. Use new servers if necessary.
+ * If we run out of servers:
+
+ * if there are at least shares-of-happiness of any one version, we're
+ happy, so return. (the close() might still get an error)
+ * not happy, need to reinforce something, goto RECOVERY
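+
+A sketch of the bookkeeping in the loop above: record which version each
+responding server ended up holding, and stop once some version (possibly
+ours) is held by N servers. The data structures are illustrative only::
+
+ from collections import defaultdict
+
+ def tally_versions(results, N):
+     # results maps serverid -> (seqnum, R) of the share that server now
+     # holds, whether that is our own version or somebody else's
+     histogram = defaultdict(set)
+     for serverid, version in results.items():
+         histogram[version].add(serverid)
+     settled = [v for (v, servers) in histogram.items() if len(servers) >= N]
+     return histogram, settled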
+
+Recovery:
+
+* read all shares, count the versions, identify the recoverable ones,
+ discard the unrecoverable ones.
+* sort versions: locate max(seqnums), put all versions with that seqnum
+ in the list, sort by number of outstanding shares. Then put our own
+ version. (TODO: put versions with seqnum <max but >us ahead of us?).
+* for each version:
+
+ * attempt to recover that version
+ * if not possible, remove it from the list, go to next one
+ * if recovered, start at beginning of peer list, push that version,
+ continue until N shares are placed
+ * if pushing our own version, bump up the seqnum to one higher than
+ the max seqnum we saw
+ * if we run out of servers:
+
+ * schedule retry and exponential backoff to repeat RECOVERY
+
+ * admit defeat after some period? Presumably the client will be shut down
+ eventually; maybe keep trying (once per hour?) until then.
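+
+The version ordering described above amounts to a simple sort: all
+recoverable versions carrying the highest seqnum, ordered by how many shares
+of each are already out on the grid, with our own version appended at the
+end. The tuple layout here is an illustrative assumption::
+
+ def order_candidate_versions(recoverable, our_version):
+     # each entry is (seqnum, R, number_of_outstanding_shares)
+     max_seqnum = max(seqnum for (seqnum, R, count) in recoverable)
+     front = [v for v in recoverable
+              if v[0] == max_seqnum and v[:2] != our_version[:2]]
+     front.sort(key=lambda v: v[2], reverse=True)  # most shares placed first
+     return front + [our_version]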
+
+
+Medium Distributed Mutable Files
+================================
+
+These are just like the SDMF case, but:
+
+* we actually take advantage of the Merkle hash tree over the blocks, by
+ reading a single segment of data at a time (and its necessary hashes), to
+ reduce the read-time alacrity
+* we allow arbitrary writes to the file (i.e. seek() is provided, and
+ O_TRUNC is no longer required)
+* we write more code on the client side (in the MutableFileNode class), to
+ first read each segment that a write must modify. This looks exactly like
+ the way a normal filesystem uses a block device, or how a CPU must perform
+ a cache-line fill before modifying a single word.
+* we might implement some sort of copy-based atomic update server call,
+ to allow multiple writev() calls to appear atomic to any readers.
+
+MDMF slots provide fairly efficient in-place edits of very large files (a few
+GB). Appending data is also fairly efficient, although each time a power of 2
+boundary is crossed, the entire file must effectively be re-uploaded (because
+the size of the block hash tree changes), so if the filesize is known in
+advance, that space ought to be pre-allocated (by leaving extra space between
+the block hash tree and the actual data).
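+
+To see why crossing a power-of-two boundary is expensive, note that the
+block hash tree has one leaf per segment, and (assuming the usual padding of
+a binary Merkle tree out to a power-of-two number of leaves) its size only
+changes when the segment count crosses such a boundary. A small sketch of
+that size calculation::
+
+ import math
+
+ def block_hash_tree_size(filesize, segsize, hashsize=32):
+     segments = max(1, int(math.ceil(filesize / float(segsize))))
+     leaves = 1 << (segments - 1).bit_length()   # round up to a power of 2
+     return (2 * leaves - 1) * hashsize          # full binary tree
+
+ # the tree only grows when the segment count crosses a power-of-two
+ # boundary, forcing the data that follows it to be relocated:
+ # block_hash_tree_size(16*2**20, 128*1024) == block_hash_tree_size(10*2**20, 128*1024)
+ # block_hash_tree_size(17*2**20, 128*1024) >  block_hash_tree_size(16*2**20, 128*1024)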
+
+MDMF1 uses the Merkle tree to enable low-alacrity random-access reads. MDMF2
+adds cache-line reads to allow random-access writes.
+
+Large Distributed Mutable Files
+===============================
+
+LDMF slots use a fundamentally different way to store the file, inspired by
+Mercurial's "revlog" format. They enable very efficient insert/remove/replace
+editing of arbitrary spans. Multiple versions of the file can be retained, in
+a revision graph that can have multiple heads. Each revision can be
+referenced by a cryptographic identifier. There are two forms of the URI, one
+that means "most recent version", and a longer one that points to a specific
+revision.
+
+Metadata can be attached to the revisions, like timestamps, to enable rolling
+back an entire tree to a specific point in history.
+
+LDMF1 provides deltas but tries to avoid dealing with multiple heads. LDMF2
+provides explicit support for revision identifiers and branching.
+
+TODO
+====
+
+improve allocate-and-write or get-writer-buckets API to allow one-call (or
+maybe two-call) updates. The challenge is in figuring out which shares are on
+which machines. First cut will have lots of round trips.
+
+(eventually) define behavior when seqnum wraps. At the very least make sure
+it can't cause a security problem. "the slot is worn out" is acceptable.
+
+(eventually) define share-migration lease update protocol. Including the
+nodeid which accepted the lease is useful; we can use the same protocol as we
+do for updating the write enabler. However, we need to know which lease to
+update. Maybe send back a list of all old nodeids that we find, then try all
+of them when we accept the update?
+
+We now do this in a specially-formatted IndexError exception::
+
+ "UNABLE to renew non-existent lease. I have leases accepted by " +
+ "nodeids: '12345','abcde','44221' ."
+
+confirm that a repairer can regenerate shares without the private key. Hmm,
+without the write-enabler they won't be able to write those shares to the
+servers, although they could add immutable new shares to new servers.
--- /dev/null
+==============================
+Specification Document Outline
+==============================
+
+While we do not yet have a clear set of specification documents for Tahoe
+(explaining the file formats, so that others can write interoperable
+implementations), this document is intended to lay out an outline for what
+these specs ought to contain. Think of this as the ISO 7-Layer Model for
+Tahoe.
+
+We currently imagine 4 documents.
+
+1. `#1: Share Format, Encoding Algorithm`_
+2. `#2: Share Exchange Protocol`_
+3. `#3: Server Selection Algorithm, filecap format`_
+4. `#4: Directory Format`_
+
+#1: Share Format, Encoding Algorithm
+====================================
+
+This document will describe the way that files are encrypted and encoded into
+shares. It will include a specification of the share format, and explain both
+the encoding and decoding algorithms. It will cover both mutable and
+immutable files.
+
+The immutable encoding algorithm, as described by this document, will start
+with a plaintext series of bytes, encoding parameters "k" and "N", and either
+an encryption key or a mechanism for deterministically deriving the key from
+the plaintext (the CHK specification). The algorithm will end with a set of N
+shares, and a set of values that must be included in the filecap to provide
+confidentiality (the encryption key) and integrity (the UEB hash).
+
+The immutable decoding algorithm will start with the filecap values (key and
+UEB hash) and "k" shares. It will explain how to validate the shares against
+the integrity information, how to reverse the erasure-coding, and how to
+decrypt the resulting ciphertext. It will result in the original plaintext
+bytes (or some subrange thereof).
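+
+One way to pin down exactly what this document must specify is to think of
+the immutable encode/decode algorithms as a pair of function signatures.
+The names below are placeholders for exposition, not a proposed API::
+
+ def encode_immutable(plaintext, k, N, encryption_key=None):
+     """Return (shares, cap_values), where cap_values carries the encryption
+     key (confidentiality) and the UEB hash (integrity). If encryption_key
+     is None, derive it deterministically from the plaintext (CHK)."""
+
+ def decode_immutable(encryption_key, ueb_hash, shares):
+     """Given the filecap values and at least k shares, validate the shares
+     against ueb_hash, reverse the erasure coding, decrypt, and return the
+     original plaintext bytes (or some subrange thereof)."""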
+
+The sections on mutable files will contain similar information.
+
+This document is *not* responsible for explaining the filecap format, since
+full filecaps may need to contain additional information as described in
+document #3. Likewise it is not responsible for explaining where to put the
+generated shares or where to find them again later.
+
+It is also not responsible for explaining the access control mechanisms
+surrounding share upload, download, or modification ("Accounting" is the
+business of controlling share upload to conserve space, and mutable file
+shares require some sort of access control to prevent non-writecap holders
+from destroying shares). We don't yet have a document dedicated to explaining
+these, but let's call it "Access Control" for now.
+
+
+#2: Share Exchange Protocol
+===========================
+
+This document explains the wire-protocol used to upload, download, and modify
+shares on the various storage servers.
+
+Given the N shares created by the algorithm described in document #1, and a
+set of servers who are willing to accept those shares, the protocols in this
+document will be sufficient to get the shares onto the servers. Likewise,
+given a set of servers who hold at least k shares, these protocols will be
+enough to retrieve the shares necessary to begin the decoding process
+described in document #1. The notion of a "storage index" is used to
+reference a particular share: the storage index is generated by the encoding
+process described in document #1.
+
+This document does *not* describe how to identify or choose those servers;
+rather, it explains what to do once they have been selected (by the mechanisms
+in document #3).
+
+This document also explains the protocols that a client uses to ask a server
+whether or not it is willing to accept an uploaded share, and whether it has
+a share available for download. These protocols will be used by the
+mechanisms in document #3 to help decide where the shares should be placed.
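+
+To make that scope concrete, the operations this document must pin down can
+be sketched as an abstract interface. This is a hypothetical shape, for
+illustration only; the Foolscap-based protocol that Tahoe actually uses
+defines its own remote interface::
+
+ class StorageProtocolSketch(object):
+     """Hypothetical outline of the operations document #2 must define."""
+
+     def will_accept_share(self, storage_index, sharenum, size):
+         """Ask a server whether it is willing to accept an uploaded share."""
+
+     def put_share(self, storage_index, sharenum, data):
+         """Upload one share to a server that agreed to hold it."""
+
+     def has_shares(self, storage_index):
+         """Ask a server which share numbers (if any) it holds."""
+
+     def get_share(self, storage_index, sharenum):
+         """Download one share, to feed the decoding process of document #1."""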
+
+Where cryptographic mechanisms are necessary to implement access-control
+policy, this document will explain those mechanisms.
+
+In the future, Tahoe will be able to use multiple protocols to speak to
+storage servers. There will be alternative forms of this document, one for
+each protocol. The first one to be written will describe the Foolscap-based
+protocol that Tahoe currently uses, but we anticipate that a subsequent one
+will describe a more HTTP-based protocol.
+
+#3: Server Selection Algorithm, filecap format
+==============================================
+
+This document has two interrelated purposes. With a deeper understanding of
+the issues, we may be able to separate these more cleanly in the future.
+
+The first purpose is to explain the server selection algorithm. Given a set
+of N shares, where should those shares be uploaded? Given some information
+stored about a previously-uploaded file, how should a downloader locate and
+recover at least k shares? Given a previously-uploaded mutable file, how
+should a modifier locate all (or most of) the shares with a reasonable amount
+of work?
+
+This question implies many things, all of which should be explained in this
+document:
+
+* the notion of a "grid", nominally a set of servers who could potentially
+ hold shares, which might change over time
+* a way to configure which grid should be used
+* a way to discover which servers are a part of that grid
+* a way to decide which servers are reliable enough to be worth sending
+ shares to
+* an algorithm to handle servers which refuse shares
+* a way for a downloader to locate which servers have shares
+* a way to choose which shares should be used for download
+
+The server-selection algorithm has several obviously competing goals:
+
+* minimize the amount of work that must be done during upload
+* minimize the total storage resources used
+* avoid "hot spots", balance load among multiple servers
+* maximize the chance that enough shares will be downloadable later, by
+ uploading lots of shares, and by placing them on reliable servers
+* minimize the work that the future downloader must do
+* tolerate temporary server failures, permanent server departure, and new
+ server insertions
+* minimize the amount of information that must be added to the filecap
+
+The server-selection algorithm is defined in some context: some set of
+expectations about the servers or grid with which it is expected to operate.
+Different algorithms are appropriate for different situations, so there will
+be multiple variants of this document.
+
+The first version of this document will describe the algorithm that the
+current (1.3.0) release uses, which is heavily weighted towards the two main
+use case scenarios for which Tahoe has been designed: the small, stable
+friendnet, and the allmydata.com managed grid. In both cases, we assume that
+the storage servers are online most of the time, they are uniformly highly
+reliable, and that the set of servers does not change very rapidly. The
+server-selection algorithm for this environment uses a permuted server list
+to achieve load-balancing, uses all servers identically, and derives the
+permutation key from the storage index to avoid adding a new field to the
+filecap.
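+
+A minimal sketch of this permuted-list selection (assuming only that every
+server has a stable identifier, and glossing over the real protocol's
+handling of full or unreachable servers) might look like::
+
+ import hashlib
+
+ def permuted_servers(storage_index, server_ids):
+     # permute the server list by hashing each server id together with the
+     # storage index; every client computes the same order for a given file,
+     # and different files land on different parts of the ring
+     return sorted(server_ids,
+                   key=lambda sid: hashlib.sha256(storage_index + sid).digest())
+
+ def place_shares(storage_index, server_ids, N):
+     # hand one share to each server at the front of the permuted order,
+     # wrapping around if there are fewer servers than shares
+     order = permuted_servers(storage_index, server_ids)
+     return {shnum: order[shnum % len(order)] for shnum in range(N)}
+
+A downloader holding the same storage index recomputes the same ordering and
+queries servers from the front of the list, which is why no extra location
+information needs to be added to the filecap for this algorithm.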
+
+An alternative algorithm could give clients more precise control over share
+placement: for example, a user might wish to make sure that k+1 shares are
+located in each datacenter (to allow downloads to take place using only local
+bandwidth). Such an algorithm could skip the permuted list and use other
+mechanisms to accomplish load-balancing (or ignore the issue altogether). It
+could add additional information to the filecap (like a list of which servers
+received the shares) in lieu of performing a search at download time, perhaps
+at the expense of allowing a repairer to move shares to a new server after
+the initial upload. It might make up for this by storing "location hints"
+next to each share, to indicate where other shares are likely to be found,
+and obligating the repairer to update these hints.
+
+The second purpose of this document is to explain the format of the file
+capability string (or "filecap" for short). There are multiple kinds of
+capabilities (read-write, read-only, verify-only, repaircap, lease-renewal
+cap, traverse-only, etc). There are multiple ways to represent the filecap
+(compressed binary, human-readable, clickable-HTTP-URL, "tahoe:" URL, etc),
+but they must all contain enough information to reliably retrieve a file
+(given some context, of course). It must at least contain the confidentiality
+and integrity information from document #1 (i.e. the encryption key and the
+UEB hash). It must also contain whatever additional information the
+upload-time server-selection algorithm generated that will be required by the
+downloader.
+
+For some server-selection algorithms, the additional information will be
+minimal. For example, the 1.3.0 release uses the hash of the encryption key
+as a storage index, and uses the storage index to permute the server list,
+and uses an Introducer to learn the current list of servers. This allows a
+"close-enough" list of servers to be compressed into a filecap field that is
+already required anyways (the encryption key). It also adds k and N to the
+filecap, to speed up the downloader's search (the downloader knows how many
+shares it needs, so it can send out multiple queries in parallel).
+
+But other server-selection algorithms might require more information. Each
+variant of this document will explain how to encode that additional
+information into the filecap, and how to extract and use that information at
+download time.
+
+These two purposes are interrelated. A filecap that is interpreted in the
+context of the allmydata.com commercial grid, which uses tahoe-1.3.0, implies
+a specific peer-selection algorithm, a specific Introducer, and therefore a
+fairly-specific set of servers to query for shares. A filecap which is meant
+to be interpreted on a different sort of grid would need different
+information.
+
+Some filecap formats can be designed to contain more information (and depend
+less upon context), such as the way an HTTP URL implies the existence of a
+single global DNS system. Ideally a tahoe filecap should be able to specify
+which "grid" it lives in, with enough information to allow a compatible
+implementation of Tahoe to locate that grid and retrieve the file (regardless
+of which server-selection algorithm was used for upload).
+
+This more-universal format might come at the expense of reliability, however.
+Tahoe-1.3.0 filecaps do not contain hostnames, because the failure of DNS or
+an individual host might then impact file availability (the Introducer, by
+contrast, does contain DNS names or IP addresses).
+
+#4: Directory Format
+====================
+
+Tahoe directories are a special way of interpreting and managing the contents
+of a file (either mutable or immutable). These "dirnode" files are basically
+serialized tables that map child name to filecap/dircap. This document
+describes the format of these files.
+
+Tahoe-1.3.0 directories are "transitively readonly", which is accomplished by
+applying an additional layer of encryption to the list of child writecaps.
+The key for this encryption is derived from the containing file's writecap.
+This document must explain how to derive this key and apply it to the
+appropriate portion of the table.
+
+Future versions of the directory format are expected to contain
+"deep-traversal caps", which allow verification/repair of files without
+exposing their plaintext to the repair agent. This document will be
+responsible for explaining traversal caps too.
+
+Future versions of the directory format will probably contain an index and
+more advanced data structures (for efficiency and fast lookups), instead of a
+simple flat list of (childname, childcap). This document will also need to
+describe metadata formats, including what access-control policies are defined
+for the metadata.
--- /dev/null
+====================
+Servers of Happiness
+====================
+
+When you upload a file to a Tahoe-LAFS grid, you expect that it will
+stay there for a while, and that it will do so even if a few of the
+peers on the grid stop working, or if something else goes wrong. An
+upload health metric helps to make sure that this actually happens.
+An upload health metric is a test that looks at a file on a Tahoe-LAFS
+grid and says whether or not that file is healthy; that is, whether it
+is distributed on the grid in such a way as to ensure that it will
+probably survive in good enough shape to be recoverable, even if a few
+things go wrong between the time of the test and the time that it is
+recovered. Our current upload health metric for immutable files is called
+'servers-of-happiness'; its predecessor was called 'shares-of-happiness'.
+
+shares-of-happiness used the number of encoded shares generated by a
+file upload to say whether or not it was healthy. If there were more
+shares than a user-configurable threshold, the file was reported to be
+healthy; otherwise, it was reported to be unhealthy. In normal
+situations, the upload process would distribute shares fairly evenly
+over the peers in the grid, and in that case shares-of-happiness
+worked fine. However, because it only considered the number of shares,
+and not where they were on the grid, it could not detect situations
+where a file was unhealthy because most or all of the shares generated
+from the file were stored on one or two peers.
+
+servers-of-happiness addresses this by extending the share-focused
+upload health metric to also consider the location of the shares on
+the grid. servers-of-happiness looks at the mapping of peers to the shares
+that they hold, and compares the cardinality of the largest happy subset of
+those peers to a user-configurable threshold. A happy subset of peers has
+the property that any k (where k is as in k-of-n encoding) peers within
+the subset can reconstruct the source file. This definition of file
+health provides a stronger assurance of file availability over time;
+with 3-of-10 encoding, and happy=7, a healthy file is still guaranteed
+to be available even if 4 peers fail.
+
+Measuring Servers of Happiness
+==============================
+
+We calculate servers-of-happiness by computing a matching on a
+bipartite graph that is related to the layout of shares on the grid.
+One set of vertices is the peers on the grid, and the other set is the
+shares. An edge connects a peer and a share if the peer will (or does, for
+existing shares) hold the share. The size of the maximum matching on this
+graph is the servers-of-happiness value of the upload; as explained below,
+it is a lower bound on (but not always equal to) the size of the largest
+happy peer set that exists for the upload.
+
+First, note that a bipartite matching of size n corresponds to a happy
+subset of size n. This is because a bipartite matching of size n implies
+that there are n peers such that each peer holds a share that no other
+peer holds. Then any k of those peers collectively hold k distinct
+shares, and can restore the file.
+
+A bipartite matching of size n is not necessary for a happy subset of
+size n, however (which is why the maximum matching is only a lower bound on
+the size of the largest happy subset of peers that exists for the upload).
+For example, consider a file with k = 3, and suppose that each peer holds
+three distinct shares (enough to reconstruct the file on its own). Then,
+since any peer from the original upload can restore the file, if there
+are 10 peers holding shares, and the happiness threshold is 7, the
+upload should be declared happy, because there is a happy subset of size
+10, and 10 > 7. However, since a maximum matching on the bipartite graph
+related to this layout has only 3 edges, Tahoe-LAFS declares the upload
+unhealthy. Though such a layout is not actually unhealthy, it is
+inefficient; with k = 3 and 10 peers each holding k shares, it corresponds
+to an expansion factor of 10x. Layouts that are declared healthy by the
+bipartite graph matching approach have the property that they correspond
+to uploads that are either already relatively efficient in their
+utilization of space, or can be made to be so by deleting shares; and
+that place all of the shares that they generate, enabling redistribution
+of shares later without having to re-encode the file. Also, it is
+computationally reasonable to compute a maximum matching in a bipartite
+graph, and there are well-studied algorithms to do that.
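+
+A small sketch of that calculation, using a plain augmenting-path matching
+(the real uploader has its own implementation), might look like the
+following, where ``sharemap`` maps each peer to the set of share numbers it
+will hold::
+
+ def servers_of_happiness(sharemap):
+     # sharemap: {peerid: set of share numbers that peer will hold}
+     # returns the size of a maximum matching between peers and shares
+     share_to_peer = {}
+
+     def augment(peer, seen):
+         for share in sharemap[peer]:
+             if share in seen:
+                 continue
+             seen.add(share)
+             # claim this share if it is free, or if its current owner can
+             # be re-routed to some other share it also holds
+             if share not in share_to_peer or augment(share_to_peer[share], seen):
+                 share_to_peer[share] = peer
+                 return True
+         return False
+
+     return sum(1 for peer in sharemap if augment(peer, set()))
+
+ # all ten shares on a single peer: happiness is only 1
+ assert servers_of_happiness({"A": set(range(10))}) == 1
+ # one distinct share on each of four peers: happiness is 4
+ assert servers_of_happiness({p: {i} for i, p in enumerate("ABCD")}) == 4
+
+An upload is then declared healthy when this value meets or exceeds the
+user-configurable happiness threshold.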
+
+Issues
+======
+
+The uploader is good at detecting unhealthy upload layouts, but it
+doesn't always know how to make an unhealthy upload into a healthy
+upload if it is possible to do so; it attempts to redistribute shares to
+achieve happiness, but only in certain circumstances. The redistribution
+algorithm isn't optimal, either, so even in these cases it will not
+always find a happy layout if one can be arrived at through
+redistribution. We are investigating improvements to address these
+issues.
+
+We don't use servers-of-happiness for mutable files yet; this fix will
+likely come in Tahoe-LAFS version 1.8.
--- /dev/null
+==========
+Tahoe URIs
+==========
+
+1. `File URIs`_
+
+ 1. `CHK URIs`_
+ 2. `LIT URIs`_
+ 3. `Mutable File URIs`_
+
+2. `Directory URIs`_
+3. `Internal Usage of URIs`_
+
+Each file and directory in a Tahoe filesystem is described by a "URI". There
+are different kinds of URIs for different kinds of objects, and there are
+different kinds of URIs to provide different kinds of access to those
+objects. Each URI is a string representation of a "capability" or "cap", and
+there are read-caps, write-caps, verify-caps, and others.
+
+Each URI provides both ``location`` and ``identification`` properties.
+``location`` means that holding the URI is sufficient to locate the data it
+represents (this means it contains a storage index or a lookup key, whatever
+is necessary to find the place or places where the data is being kept).
+``identification`` means that the URI also serves to validate the data: an
+attacker who wants to trick you into using the wrong data will be
+limited in their abilities by the identification properties of the URI.
+
+Some URIs are subsets of others. In particular, if you know a URI which
+allows you to modify some object, you can produce a weaker read-only URI and
+give it to someone else, and they will be able to read that object but not
+modify it. Directories, for example, have a read-cap which is derived from
+the write-cap: anyone with read/write access to the directory can produce a
+limited URI that grants read-only access, but not the other way around.
+
+src/allmydata/uri.py is the main place where URIs are processed. It is
+the authoritative definition point for all the URI types described
+herein.
+
+File URIs
+=========
+
+The lowest layer of the Tahoe architecture (the "grid") is responsible for
+mapping URIs to data. This is basically a distributed hash table, in which
+the URI is the key, and some sequence of bytes is the value.
+
+There are two kinds of entries in this table: immutable and mutable. For
+immutable entries, the URI represents a fixed chunk of data. The URI itself
+is derived from the data when it is uploaded into the grid, and can be used
+to locate and download that data from the grid at some time in the future.
+
+For mutable entries, the URI identifies a "slot" or "container", which can be
+filled with different pieces of data at different times.
+
+It is important to note that the "files" described by these URIs are just a
+bunch of bytes, and that **no** filenames or other metadata is retained at
+this layer. The vdrive layer (which sits above the grid layer) is entirely
+responsible for directories and filenames and the like.
+
+CHK URIs
+--------
+
+CHK (Content Hash Keyed) files are immutable sequences of bytes. They are
+uploaded in a distributed fashion using a "storage index" (for the "location"
+property), and encrypted using a "read key". A secure hash of the data is
+computed to help validate the data afterwards (providing the "identification"
+property). All of these pieces, plus information about the file's size and
+the number of shares into which it has been distributed, are put into the
+"CHK" uri. The storage index is derived by hashing the read key (using a
+tagged SHA-256d hash, then truncated to 128 bits), so it does not need to be
+physically present in the URI.
+
+The current format for CHK URIs is the concatenation of the following
+strings::
+
+ URI:CHK:(key):(hash):(needed-shares):(total-shares):(size)
+
+Where (key) is the base32 encoding of the 16-byte AES read key, (hash) is the
+base32 encoding of the SHA-256 hash of the URI Extension Block,
+(needed-shares) is an ascii decimal representation of the number of shares
+required to reconstruct this file, (total-shares) is the same representation
+of the total number of shares created, and (size) is an ascii decimal
+representation of the size of the data represented by this URI. All base32
+encodings are expressed in lower-case, with the trailing '=' signs removed.
+
+For example, the following is a CHK URI, generated from the contents of the
+architecture.txt document that lives next to this one in the source tree::
+
+ URI:CHK:ihrbeov7lbvoduupd4qblysj7a:bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733
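+
+A short parser for this format (a sketch that checks only the overall shape
+of the string, not the base32 fields themselves) can split the example above
+back into its components::
+
+ def parse_chk_uri(uri):
+     prefix, kind, key, ueb_hash, k, n, size = uri.split(":")
+     assert (prefix, kind) == ("URI", "CHK")
+     return {"key": key, "ueb_hash": ueb_hash,
+             "needed_shares": int(k), "total_shares": int(n),
+             "size": int(size)}
+
+ chk = parse_chk_uri("URI:CHK:ihrbeov7lbvoduupd4qblysj7a:"
+                     "bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733")
+ # chk["needed_shares"] == 3, chk["total_shares"] == 10, chk["size"] == 28733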
+
+Historical note: The name "CHK" is somewhat inaccurate and continues to be
+used for historical reasons. "Content Hash Key" means that the encryption key
+is derived by hashing the contents, which gives the useful property that
+encoding the same file twice will result in the same URI. However, this is an
+optional step: by passing a different flag to the appropriate API call, Tahoe
+will generate a random encryption key instead of hashing the file: this gives
+the useful property that the URI or storage index does not reveal anything
+about the file's contents (except filesize), which improves privacy. The
+URI:CHK: prefix really indicates that an immutable file is in use, without
+saying anything about how the key was derived.
+
+LIT URIs
+--------
+
+LITeral files are also an immutable sequence of bytes, but they are so short
+that the data is stored inside the URI itself. These are used for files of 55
+bytes or shorter, which is the point at which the LIT URI is the same length
+as a CHK URI would be.
+
+LIT URIs do not require an upload or download phase, as their data is stored
+directly in the URI.
+
+The format of a LIT URI is simply a fixed prefix concatenated with the base32
+encoding of the file's data::
+
+ URI:LIT:bjuw4y3movsgkidbnrwg26lemf2gcl3xmvrc6kropbuhi3lmbi
+
+The LIT URI for an empty file is "URI:LIT:", and the LIT URI for a 5-byte
+file that contains the string "hello" is "URI:LIT:nbswy3dp".
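+
+The encoding above is consistent with the standard base32 alphabet,
+lower-cased and with padding stripped, so the "hello" example can be
+reproduced with nothing but the standard library (a sketch, not Tahoe's own
+base32 helper)::
+
+ from base64 import b32encode
+
+ def lit_uri(data):
+     # an empty file yields "URI:LIT:", matching the example above
+     return "URI:LIT:" + b32encode(data).decode().lower().rstrip("=")
+
+ assert lit_uri(b"") == "URI:LIT:"
+ assert lit_uri(b"hello") == "URI:LIT:nbswy3dp"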
+
+Mutable File URIs
+-----------------
+
+The other kind of DHT entry is the "mutable slot", in which the URI names a
+container into which data can be placed, and later retrieved, without
+changing the identity of the container.
+
+These slots have write-caps (which allow read/write access), read-caps (which
+only allow read-access), and verify-caps (which allow a file checker/repairer
+to confirm that the contents exist, but do not let it decrypt the
+contents).
+
+Mutable slots use public key technology to provide data integrity, and put a
+hash of the public key in the URI. As a result, the data validation is
+limited to confirming that the data retrieved matches *some* data that was
+uploaded in the past, but not *which* version of that data.
+
+The format of the write-cap for mutable files is::
+
+ URI:SSK:(writekey):(fingerprint)
+
+Where (writekey) is the base32 encoding of the 16-byte AES encryption key
+that is used to encrypt the RSA private key, and (fingerprint) is the base32
+encoded 32-byte SHA-256 hash of the RSA public key. For more details about
+the way these keys are used, please see docs/mutable.txt .
+
+The format for mutable read-caps is::
+
+ URI:SSK-RO:(readkey):(fingerprint)
+
+The read-cap is just like the write-cap except it contains the other AES
+encryption key: the one used for encrypting the mutable file's contents. This
+second key is derived by hashing the writekey, which allows the holder of a
+write-cap to produce a read-cap, but not the other way around. The
+fingerprint is the same in both caps.
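+
+The diminish operation therefore needs nothing but the write-cap and a hash.
+A sketch of its shape, with a hypothetical ``readkey_hash`` standing in for
+Tahoe's real tagged-hash derivation (see docs/mutable.txt for the actual
+construction), might look like::
+
+ import hashlib
+ from base64 import b32decode, b32encode
+
+ def b32(data):
+     return b32encode(data).decode().lower().rstrip("=")
+
+ def readkey_hash(writekey):
+     # hypothetical stand-in for the real tagged readkey derivation
+     return hashlib.sha256(b"sketch-ssk-readkey:" + writekey).digest()[:16]
+
+ def diminish_writecap(writecap):
+     # URI:SSK:(writekey):(fingerprint) -> URI:SSK-RO:(readkey):(fingerprint)
+     _, _, writekey_b32, fingerprint = writecap.split(":")
+     writekey = b32decode(writekey_b32.upper() + "======")  # 16 bytes -> 26 chars + 6 pad
+     return "URI:SSK-RO:%s:%s" % (b32(readkey_hash(writekey)), fingerprint)
+
+Going the other way is not possible: recovering the writekey from the
+readkey would require inverting the hash.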
+
+Historical note: the "SSK" prefix is a perhaps-inaccurate reference to
+"Sub-Space Keys" from the Freenet project, which uses a vaguely similar
+structure to provide mutable file access.
+
+Directory URIs
+==============
+
+The grid layer provides a mapping from URI to data. To turn this into a graph
+of directories and files, the "vdrive" layer (which sits on top of the grid
+layer) needs to keep track of "directory nodes", or "dirnodes" for short.
+docs/dirnodes.txt describes how these work.
+
+Dirnodes are contained inside mutable files, and are thus simply a particular
+way to interpret the contents of these files. As a result, a directory
+write-cap looks a lot like a mutable-file write-cap::
+
+ URI:DIR2:(writekey):(fingerprint)
+
+Likewise directory read-caps (which provide read-only access to the
+directory) look much like mutable-file read-caps::
+
+ URI:DIR2-RO:(readkey):(fingerprint)
+
+Historical note: the "DIR2" prefix is used because the non-distributed
+dirnodes in earlier Tahoe releases had already claimed the "DIR" prefix.
+
+Internal Usage of URIs
+======================
+
+The classes in source:src/allmydata/uri.py are used to pack and unpack these
+various kinds of URIs. Three Interfaces are defined (IURI, IFileURI, and
+IDirnodeURI) which are implemented by these classes, and string-to-URI-class
+conversion routines have been registered as adapters, so that code which
+wants to extract e.g. the size of a CHK or LIT uri can do::
+
+ print IFileURI(uri).get_size()
+
+If the URI does not represent a CHK or LIT uri (for example, if it was for a
+directory instead), the adaptation will fail, raising a TypeError inside the
+IFileURI() call.
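+
+For example (assuming a Tahoe source tree on the Python path, and relying
+only on the behaviour described above), code that only cares about
+file-shaped objects can let the adaptation do the type checking::
+
+ from allmydata import uri
+ from allmydata.interfaces import IFileURI
+
+ u = uri.from_string("URI:CHK:ihrbeov7lbvoduupd4qblysj7a:"
+                     "bg5agsdt62jb34hxvxmdsbza6do64f4fg5anxxod2buttbo6udzq:3:10:28733")
+ try:
+     print(IFileURI(u).get_size())   # 28733 for the example CHK URI
+ except TypeError:
+     print("not a file URI (perhaps a directory cap instead)")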
+
+Several utility methods are provided on these objects. The most important is
+``to_string()``, which returns the string form of the URI. Therefore
+``IURI(uri).to_string == uri`` is true for any valid URI. See the IURI class
+in source:src/allmydata/interfaces.py for more details.
+