uploaded recently.
This database lives in ``~/.tahoe/private/backupdb.sqlite``, and is a SQLite
-single-file database. It is used by the "tahoe backup" command. In the
-future, it will also be used by "tahoe mirror", and by "tahoe cp" when the
-``--use-backupdb`` option is included.
+single-file database. It is used by the "``tahoe backup``" command. In the
+future, it may optionally be used by other commands such as "``tahoe cp``".
The purpose of this database is twofold: to manage the file-to-cap
translation (the "upload" step) and the directory-to-cap translation (the
The overall goal of optimizing backup is to reduce the work required when the
source disk has not changed (much) since the last backup. In the ideal case,
-running "tahoe backup" twice in a row, with no intervening changes to the
+running "``tahoe backup``" twice in a row, with no intervening changes to the
disk, will not require any network traffic. Minimal changes to the source
disk should result in minimal traffic.
subsequent backup operation may use more effort (network bandwidth, CPU
cycles, and disk IO) than it would have without the backupdb.
-The database uses sqlite3, which is included as part of the standard python
-library with python2.5 and later. For python2.4, Tahoe will try to install the
+The database uses sqlite3, which is included as part of the standard Python
+library with Python 2.5 and later. For Python 2.4, Tahoe will try to install the
"pysqlite" package at build-time, but this will succeed only if sqlite3 with
development headers is already installed. On Debian and Debian derivatives
you can install the "python-pysqlite2" package (which, despite the name,
-actually provides sqlite3 rather than sqlite2), but on old distributions such
+actually provides sqlite3 rather than sqlite2). On old distributions such
as Debian etch (4.0 "oldstable") or Ubuntu Edgy (6.10) the "python-pysqlite2"
package won't work, but the "sqlite3-dev" package will.
Upload Operation
================
-The upload process starts with a pathname (like ~/.emacs) and wants to end up
-with a file-cap (like URI:CHK:...).
+The upload process starts with a pathname (like ``~/.emacs``) and wants to end up
+with a file-cap (like ``URI:CHK:...``).
The first step is to convert the path to an absolute form
-(/home/warner/.emacs) and do a lookup in the local_files table. If the path
+(``/home/warner/.emacs``) and do a lookup in the local_files table. If the path
is not present in this table, the file must be uploaded. The upload process
is:
Relying upon timestamps is a compromise between efficiency and safety: a file
which is modified without changing the timestamp or size will be treated as
-unmodified, and the "tahoe backup" command will not copy the new contents
-into the grid. The ``--no-timestamps`` can be used to disable this
+unmodified, and the "``tahoe backup``" command will not copy the new contents
+into the grid. The ``--no-timestamps`` option can be used to disable this
optimization, forcing every byte of the file to be hashed and encoded.
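The size-and-timestamp check described above can be sketched in Python with the
standard-library ``sqlite3`` module. Note that the ``local_files`` table layout
used here is an illustrative assumption for this sketch, not the actual backupdb
schema:

```python
import sqlite3

def is_unmodified(conn, path, size, mtime):
    """Return True if (size, mtime) matches the recorded entry for path.

    NOTE: 'local_files' and its columns are illustrative assumptions,
    not the real backupdb schema.
    """
    row = conn.execute(
        "SELECT size, mtime FROM local_files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        return False  # never backed up: must upload
    return (row[0], row[1]) == (size, mtime)

# demo against an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE local_files"
             " (path TEXT PRIMARY KEY, size INTEGER, mtime REAL)")
conn.execute("INSERT INTO local_files VALUES (?, ?, ?)",
             ("/home/warner/.emacs", 1234, 1.5))
```

A file whose size or mtime differs from the recorded pair is treated as modified
and re-uploaded; this is exactly the optimization that ``--no-timestamps``
disables.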
Directory Operations
directory node with the same contents. The contents are hashed, and the hash
is queried in the 'directories' table. If found, the last-checked timestamp
is used to perform the same random-early-check algorithm described for files
-above, but no new upload is performed. Since "tahoe backup" creates immutable
+above, but no new upload is performed. Since "``tahoe backup``" creates immutable
directories, it is perfectly safe to re-use a directory from a previous
backup.
-If not found, the webapi "mkdir-immutable" operation is used to create a new
+If not found, the web-API "mkdir-immutable" operation is used to create a new
directory, and an entry is stored in the table.
The comparison operation ignores timestamps and metadata, and pays attention
solely to the file names and contents.
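The idea can be sketched as follows. The netstring-style encoding below is only
an illustrative assumption; Tahoe's actual serialization of directory contents
differs, but the key property (hash depends only on sorted names and caps) is
the same:

```python
import hashlib

def dir_contents_hash(children):
    """Hash a directory's contents: sorted (name, cap) pairs only.

    Metadata and timestamps are deliberately excluded, so identical
    contents always produce an identical hash. The encoding here is
    an illustrative assumption, not Tahoe's actual serialization.
    """
    h = hashlib.sha256()
    for name, cap in sorted(children.items()):
        entry = "%d:%s,%d:%s;" % (len(name), name, len(cap), cap)
        h.update(entry.encode("utf-8"))
    return h.hexdigest()
```

Because only names and caps contribute to the hash, an unchanged subdirectory
hashes identically no matter where it appears, which is what makes re-use of
previously created immutable directories safe.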
-By using a directory-contents hash, the "tahoe backup" command is able to
+By using a directory-contents hash, the "``tahoe backup``" command is able to
re-use directories from other places in the backed up data, or from old
backups. This means that renaming a directory and moving a subdirectory to a
new parent both count as "minor changes" and will result in minimal Tahoe
The best case is a null backup, in which nothing has changed. This will
result in minimal network bandwidth: one directory read and two modifies. The
-Archives/ directory must be read to locate the latest backup, and must be
-modified to add a new snapshot, and the Latest/ directory will be updated to
+``Archives/`` directory must be read to locate the latest backup, and must be
+modified to add a new snapshot, and the ``Latest/`` directory will be updated to
point to that same snapshot.
tub.port = 8098
tub.location = external-firewall.example.com:7912
- * Run a node behind a Tor proxy (perhaps via torsocks), in client-only
+ * Run a node behind a Tor proxy (perhaps via ``torsocks``), in client-only
mode (i.e. we can make outbound connections, but other nodes will not
be able to connect to us). The literal '``unreachable.example.org``' will
not resolve, but will serve as a reminder to human observers that this
a "log gatherer", which will be granted access to the logport. This can
be used by centralized storage grids to gather operational logs in a
single place. Note that when an old-style ``BASEDIR/log_gatherer.furl`` file
- exists (see 'Backwards Compatibility Files', below), both are used. (For
+ exists (see `Backwards Compatibility Files`_, below), both are used. (For
most other items, the separate config file overrides the entry in
``tahoe.cfg``.)
each connection to another node, if nothing has been heard for a while,
we will drop the connection. The duration of silence that passes before
dropping the connection will be between DT-2*KT and 2*DT+2*KT (please see
- ticket #521 for more details). If we are sending a large amount of data
+ ticket `#521`_ for more details). If we are sending a large amount of data
to the other end (which takes more than DT-2*KT to deliver), we might
incorrectly drop the connection. The default behavior (when this value is
not provided) is to disable the disconnect timer.
- See ticket #521 for a discussion of how to pick these timeout values.
+ See ticket `#521`_ for a discussion of how to pick these timeout values.
Using 30 minutes means we'll disconnect after 22 to 68 minutes of
inactivity. Receiving data will reset this timeout, however if we have
more than 22min of data in the outbound queue (such as 800kB in two
contact us, our ping might be delayed, so we may disconnect them by
accident.
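The 22-to-68-minute window quoted above follows directly from the
DT-2*KT to 2*DT+2*KT formula, assuming a keepalive timer of 4 minutes. That KT
value is an assumption chosen to reproduce the quoted numbers; check your
node's actual keepalive setting before relying on it:

```python
# DT = disconnect timer, KT = keepalive timer, both in minutes.
# KT = 4 is an assumption that reproduces the 22-68 minute window.
DT, KT = 30, 4

shortest_silence = DT - 2 * KT       # earliest possible disconnect
longest_silence = 2 * DT + 2 * KT    # latest possible disconnect
print(shortest_silence, longest_silence)
```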
+ .. _`#521`: http://tahoe-lafs.org/trac/tahoe-lafs/ticket/521
+
``ssh.port = (strports string, optional)``
``ssh.authorized_keys_file = (filename, optional)``
``tempdir = (string, optional)``
- This specifies a temporary directory for the webapi server to use, for
- holding large files while they are being uploaded. If a webapi client
+ This specifies a temporary directory for the web-API server to use, for
+ holding large files while they are being uploaded. If a web-API client
attempts to upload a 10GB file, this tempdir will need to have at least
10GB available for the upload to complete.
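For example, a node that expects large uploads might point the web-API server
at a roomier filesystem. The path below is purely illustrative:

```cfg
[node]
tempdir = /mnt/bigdisk/tahoe-tmp
```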
The Introducer node maintains some different state than regular client nodes.
-``BASEDIR/introducer.furl`` : This is generated the first time the introducer
-node is started, and used again on subsequent runs, to give the introduction
-service a persistent long-term identity. This file should be published and
-copied into new client nodes before they are started for the first time.
+``BASEDIR/introducer.furl``
+ This is generated the first time the introducer node is started, and used
+ again on subsequent runs, to give the introduction service a persistent
+ long-term identity. This file should be published and copied into new client
+ nodes before they are started for the first time.
Other Files in BASEDIR
ssh.port = 8022
ssh.authorized_keys_file = ~/.ssh/authorized_keys
+
[client]
introducer.furl = pb://ok45ssoklj4y7eok5c3xkmj@tahoe.example:44801/ii3uumo
helper.furl = pb://ggti5ssoklj4y7eok5c3xkmj@helper.tahoe.example:7054/kk8lhr
+
[storage]
enabled = True
readonly_storage = True
sizelimit = 10000000000
+
[helper]
run_helper = True
Overview
========
-One convenient way to install Tahoe-LAFS is with debian packages.
+One convenient way to install Tahoe-LAFS is with Debian packages.
This document attempts to explain how to complete a desert island build for
people in a hurry. It also attempts to explain more about our Debian packaging
for those willing to read beyond the simple pragmatic packaging exercises.
==============================================
There are only four supporting packages that are currently not available from
-the debian apt repositories in Debian Lenny::
+the Debian apt repositories in Debian Lenny::
python-foolscap python-zfec argparse zbase32
sudo dpkg -i ../allmydata-tahoe_1.6.1-r4262_all.deb
You should now have a functional desert island build of Tahoe with all of the
-supported libraries as .deb packages. You'll need to edit the Debian specific
-/etc/defaults/allmydata-tahoe file to get Tahoe started. Data is by default
-stored in /var/lib/tahoelafsd/ and Tahoe runs as the 'tahoelafsd' user.
+supported libraries as .deb packages. You'll need to edit the Debian-specific
+``/etc/defaults/allmydata-tahoe`` file to get Tahoe started. Data is by default
+stored in ``/var/lib/tahoelafsd/`` and Tahoe runs as the 'tahoelafsd' user.
Building Debian Packages
========================
-The Tahoe source tree comes with limited support for building debian packages
+The Tahoe source tree comes with limited support for building Debian packages
on a variety of Debian and Ubuntu platforms. For each supported platform,
there is a "deb-PLATFORM-head" target in the Makefile that will produce a
-debian package from a darcs checkout, using a version number that is derived
+Debian package from a darcs checkout, using a version number that is derived
from the most recent darcs tag, plus the total number of revisions present in
the tree (e.g. "1.1-r2678").
-To create debian packages from a Tahoe tree, you will need some additional
+To create Debian packages from a Tahoe tree, you will need some additional
tools installed. The canonical list of these packages is in the
-"Build-Depends" clause of misc/sid/debian/control , and includes::
+"Build-Depends" clause of ``misc/sid/debian/control``, and includes::
build-essential
debhelper
python-twisted-core
In addition, to use the "deb-$PLATFORM-head" target, you will also need the
-"debchange" utility from the "devscripts" package, and the "fakeroot" package.
+"``debchange``" utility from the "devscripts" package, and the "fakeroot" package.
Some recent platforms can be handled by using the targets for the previous
release, for example if there is no "deb-hardy-head" target, try building
"deb-gutsy-head" and see if the resulting package will work.
-Note that we haven't tried to build source packages (.orig.tar.gz + dsc) yet,
+Note that we haven't tried to build source packages (``.orig.tar.gz`` + dsc) yet,
and there are no such source packages in our APT repository.
Using Pre-Built Debian Packages
The ``tahoe-lafs.org`` APT repository also includes Debian packages of support
libraries, like Foolscap, zfec, pycryptopp, and everything else you need that
-isn't already in debian.
+isn't already in Debian.
Building From Source on Debian Systems
======================================
Many of Tahoe's build dependencies can be satisfied by first installing
-certain debian packages: simplejson is one of these. Some debian/ubuntu
-platforms do not provide the necessary .egg-info metadata with their
+certain Debian packages: simplejson is one of these. Some Debian/Ubuntu
+platforms do not provide the necessary ``.egg-info`` metadata with their
packages, so the Tahoe build process may not believe they are present. Some
-Tahoe dependencies are not present in most debian systems (such as foolscap
+Tahoe dependencies are not present in most Debian systems (such as foolscap
and zfec): debs for these are made available in the APT repository described
above.
environment).
We have observed occasional problems with this acquisition process. In some
-cases, setuptools will only be half-aware of an installed debian package,
+cases, setuptools will only be half-aware of an installed Debian package,
just enough to interfere with the automatic download+build of the dependency.
-For example, on some platforms, if Nevow-0.9.26 is installed via a debian
+For example, on some platforms, if Nevow-0.9.26 is installed via a Debian
package, setuptools will believe that it must download Nevow anyways, but it
will insist upon downloading that specific 0.9.26 version. Since the current
release of Nevow is 0.9.31, and 0.9.26 is no longer available for download,
If "dir_index" is present in the "features:" line, then you're all set. If
not, you'll need to use tune2fs and e2fsck to enable and build the index. See
-<http://wiki.dovecot.org/MailboxFormat/Maildir> for some hints.
+`<http://wiki.dovecot.org/MailboxFormat/Maildir>`_ for some hints.
"key-generation" service, which allows a client to offload their RSA key
generation to a separate process. Since RSA key generation takes several
seconds, and must be done each time a directory is created, moving it to a
-separate process allows the first process (perhaps a busy webapi server) to
+separate process allows the first process (perhaps a busy web-API server) to
continue servicing other requests. The key generator exports a FURL that can
be copied into a node to enable this functionality.
"``tahoe stop [NODEDIR]``" will shut down a running node.
-"``tahoe restart [NODEDIR]``" will stop and then restart a running node. This is
-most often used by developers who have just modified the code and want to
+"``tahoe restart [NODEDIR]``" will stop and then restart a running node. This
+is most often used by developers who have just modified the code and want to
start using their changes.
These commands let you examine a Tahoe-LAFS filesystem, providing basic
list/upload/download/delete/rename/mkdir functionality. They can be used as
primitives by other scripts. Most of these commands are fairly thin wrappers
-around webapi calls, which are described in `<webapi.rst>`_.
+around web-API calls, which are described in `<webapi.rst>`_.
-By default, all filesystem-manipulation commands look in ``~/.tahoe/`` to figure
-out which Tahoe-LAFS node they should use. When the CLI command makes webapi
-calls, it will use ``~/.tahoe/node.url`` for this purpose: a running Tahoe-LAFS
-node that provides a webapi port will write its URL into this file. If you want
-to use a node on some other host, just create ``~/.tahoe/`` and copy that node's
-webapi URL into this file, and the CLI commands will contact that node instead
-of a local one.
+By default, all filesystem-manipulation commands look in ``~/.tahoe/`` to
+figure out which Tahoe-LAFS node they should use. When the CLI command makes
+web-API calls, it will use ``~/.tahoe/node.url`` for this purpose: a running
+Tahoe-LAFS node that provides a web-API port will write its URL into this
+file. If you want to use a node on some other host, just create ``~/.tahoe/``
+and copy that node's web-API URL into this file, and the CLI commands will
+contact that node instead of a local one.
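A script that wants to talk to the same node as the CLI can read the same file.
This sketch assumes only the file layout described above; the helper name is
ours, not part of the Tahoe CLI:

```python
import os

def read_node_url(basedir=None):
    """Return the web-API root URL of the node in `basedir`.

    Reads BASEDIR/node.url, the file described above. `read_node_url`
    itself is a hypothetical helper, not part of the Tahoe CLI.
    """
    if basedir is None:
        basedir = os.path.expanduser("~/.tahoe")
    with open(os.path.join(basedir, "node.url")) as f:
        return f.read().strip()
```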
These commands also use a table of "aliases" to figure out which directory
they ought to use as a starting point. This is explained in more detail below.
* ``[SUBDIRS/]FILENAME`` for a path relative to the default ``tahoe:`` alias;
* ``ALIAS:[SUBDIRS/]FILENAME`` for a path relative to another alias;
-* ``DIRCAP/[SUBDIRS/]FILENAME`` or ``DIRCAP:./[SUBDIRS/]FILENAME`` for a path relative to a directory cap.
+* ``DIRCAP/[SUBDIRS/]FILENAME`` or ``DIRCAP:./[SUBDIRS/]FILENAME`` for a
+ path relative to a directory cap.
Command Examples
clients (like /usr/bin/ftp, ncftp, and countless others) to access the
virtual filesystem. They can also run an SFTP server, so SFTP clients (like
/usr/bin/sftp, the sshfs FUSE plugin, and others) can too. These frontends
-sit at the same level as the webapi interface.
+sit at the same level as the web-API interface.
Since Tahoe-LAFS does not use user accounts or passwords, the FTP/SFTP servers
must be configured with a way to first authenticate a user (confirm that a
==============================
Downloads are triggered by read() calls, each with a starting offset (defaults
-to 0) and a length (defaults to the whole file). A regular webapi GET request
+to 0) and a length (defaults to the whole file). A regular web-API GET request
will result in a whole-file read() call.
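Any read(), whole-file or partial, must be mapped onto segment boundaries
before segments can be fetched. A sketch of that mapping, assuming an
illustrative 128 KiB segment size (the real size comes from the file's
encoding parameters):

```python
SEGMENT_SIZE = 128 * 1024  # assumed for illustration, not a fixed constant

def segments_for_read(offset, length):
    """Return the ordered list of segment numbers a read() touches."""
    if length <= 0:
        return []
    first = offset // SEGMENT_SIZE
    last = (offset + length - 1) // SEGMENT_SIZE
    return list(range(first, last + 1))
```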
Each read() call turns into an ordered sequence of get_segment() calls. A
When an error occurs, the HTTP response code will be set to an appropriate
400-series code (like 404 Not Found for an unknown childname, or 400 Bad Request
-when the parameters to a webapi operation are invalid), and the HTTP response
+when the parameters to a web-API operation are invalid), and the HTTP response
body will usually contain a few lines of explanation as to the cause of the
error and possible responses. Unusual exceptions may result in a 500 Internal
Server Error as a catch-all, with a default response body containing
representable as such.
All Tahoe operations that refer to existing files or directories must include
-a suitable read- or write- cap in the URL: the webapi server won't add one
+a suitable read- or write- cap in the URL: the web-API server won't add one
for you. If you don't know the cap, you can't access the file. This allows
-the security properties of Tahoe caps to be extended across the webapi
+the security properties of Tahoe caps to be extended across the web-API
interface.
Slow Operations, Progress, and Cancelling
}
For forward-compatibility, a mutable directory can also contain caps in
- a format that is unknown to the webapi server. When such caps are retrieved
+ a format that is unknown to the web-API server. When such caps are retrieved
from a mutable directory in a "ro_uri" field, they will be prefixed with
the string "ro.", indicating that they must not be decoded without
checking that they are read-only. The "ro." prefix must not be stripped
- off without performing this check. (Future versions of the webapi server
+ off without performing this check. (Future versions of the web-API server
will perform it where necessary.)
If both the "rw_uri" and "ro_uri" fields are present in a given PROPDICT,
- and the webapi server recognizes the rw_uri as a write cap, then it will
+ and the web-API server recognizes the rw_uri as a write cap, then it will
reset the ro_uri to the corresponding read cap and discard the original
contents of ro_uri (in order to ensure that the two caps correspond to the
same object and that the ro_uri is in fact read-only). However this may not
- happen for caps in a format unknown to the webapi server. Therefore, when
- writing a directory the webapi client should ensure that the contents
+ happen for caps in a format unknown to the web-API server. Therefore, when
+ writing a directory the web-API client should ensure that the contents
of "rw_uri" and "ro_uri" for a given PROPDICT are a consistent
- (write cap, read cap) pair if possible. If the webapi client only has
+ (write cap, read cap) pair if possible. If the web-API client only has
one cap and does not know whether it is a write cap or read cap, then
it is acceptable to set "rw_uri" to that cap and omit "ro_uri". The
client must not put a write cap into a "ro_uri" field.
Also, if the "no-write" field is set to true in the metadata of a link to
a mutable child, it will cause the link to be diminished to read-only.
- Note that the webapi-using client application must not provide the
+ Note that the web-API-using client application must not provide the
"Content-Type: multipart/form-data" header that usually accompanies HTML
form submissions, since the body is not formatted this way. Doing so will
cause a server error as the lower-level code misparses the request body.
immutable files, literal files, and deep-immutable directories.
For forward-compatibility, a deep-immutable directory can also contain caps
- in a format that is unknown to the webapi server. When such caps are retrieved
+ in a format that is unknown to the web-API server. When such caps are retrieved
from a deep-immutable directory in a "ro_uri" field, they will be prefixed
with the string "imm.", indicating that they must not be decoded without
checking that they are immutable. The "imm." prefix must not be stripped
- off without performing this check. (Future versions of the webapi server
+ off without performing this check. (Future versions of the web-API server
will perform it where necessary.)
The cap for each child may be given either in the "rw_uri" or "ro_uri"
field of the PROPDICT (not both). If a cap is given in the "rw_uri" field,
- then the webapi server will check that it is an immutable read-cap of a
+ then the web-API server will check that it is an immutable read-cap of a
*known* format, and give an error if it is not. If a cap is given in the
- "ro_uri" field, then the webapi server will still check whether known
+ "ro_uri" field, then the web-API server will still check whether known
caps are immutable, but for unknown caps it will simply assume that the
cap can be stored, as described above. Note that an attacker would be
able to store any cap in an immutable directory, so this check when
instead of the 'tahoe':'linkmotime' and 'tahoe':'linkcrtime' keys. Starting
in Tahoe v1.4.0, the 'linkmotime'/'linkcrtime' keys in the 'tahoe' sub-dict
are populated. However, prior to Tahoe v1.7beta, a bug caused the 'tahoe'
-sub-dict to be deleted by webapi requests in which new metadata is
+sub-dict to be deleted by web-API requests in which new metadata is
specified, and not to be added to existing child links that lack it.
From Tahoe v1.7.0 onward, the 'mtime' and 'ctime' fields are no longer
Note that this operation does not take its child cap in the form of
separate "rw_uri" and "ro_uri" fields. Therefore, it cannot accept a
- child cap in a format unknown to the webapi server, unless its URI
+ child cap in a format unknown to the web-API server, unless its URI
starts with "ro." or "imm.". This restriction is necessary because the
server is not able to attenuate an unknown write cap to a read cap.
Unknown URIs starting with "ro." or "imm.", on the other hand, are
directory, with a specified child name. This behaves much like the PUT t=uri
operation, and is a lot like a UNIX hardlink. It is subject to the same
restrictions as that operation on the use of cap formats unknown to the
- webapi server.
+ web-API server.
This will create additional intermediate directories as necessary, although
since it is expected to be triggered by a form that was retrieved by "GET
Static Files in /public_html
============================
-The webapi server will take any request for a URL that starts with /static
+The web-API server will take any request for a URL that starts with /static
and serve it from a configurable directory which defaults to
$BASEDIR/public_html . This is configured by setting the "[node]web.static"
value in $BASEDIR/tahoe.cfg . If this is left at the default value of
served with the contents of the file $BASEDIR/public_html/subdir/foo.html .
This can be useful to serve a javascript application which provides a
-prettier front-end to the rest of the Tahoe webapi.
+prettier front-end to the rest of the Tahoe web-API.
Safety and security issues -- names vs. URIs
The read and write caps in a given directory node are separate URIs, and
can't be assumed to point to the same object even if they were retrieved in
-the same operation (although the webapi server attempts to ensure this
+the same operation (although the web-API server attempts to ensure this
in most cases). If you need to rely on that property, you should explicitly
verify it. More generally, you should not make assumptions about the
internal consistency of the contents of mutable directories. As a result
Tahoe nodes implement internal serialization to make sure that a single Tahoe
node cannot conflict with itself. For example, it is safe to issue two
-directory modification requests to a single tahoe node's webapi server at the
+directory modification requests to a single tahoe node's web-API server at the
same time, because the Tahoe node will internally delay one of them until
after the other has finished being applied. (This feature was introduced in
Tahoe-1.1; back with Tahoe-1.0 the web client was responsible for serializing
There are several tradeoffs to be considered when choosing the renewal timer
and the lease duration, and there is no single optimal pair of values. See
-the "lease-tradeoffs.svg" diagram to get an idea for the tradeoffs involved.
+the `<lease-tradeoffs.svg>`_ diagram to get an idea for the tradeoffs involved.
If lease renewal occurs quickly and with 100% reliability, then any renewal
time that is shorter than the lease duration will suffice, but a larger ratio
of duration-over-renewal-time will be more robust in the face of occasional
If all of the files and directories which you care about are reachable from a
single starting point (usually referred to as a "rootcap"), and you store
-that rootcap as an alias (via "tahoe create-alias"), then the simplest way to
-renew these leases is with the following CLI command::
+that rootcap as an alias (via "``tahoe create-alias``" for example), then the
+simplest way to renew these leases is with the following CLI command::
tahoe deep-check --add-lease ALIAS:
This will recursively walk every directory under the given alias and renew
the leases on all files and directories. (You may want to add a ``--repair``
-flag to perform repair at the same time). Simply run this command once a week
+flag to perform repair at the same time.) Simply run this command once a week
(or whatever other renewal period your grid recommends) and make sure it
completes successfully. As a side effect, a manifest of all unique files and
directories will be emitted to stdout, as well as a summary of file sizes and
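One way to automate the weekly run is a crontab entry along these lines; the
schedule, alias name, and log path are all illustrative:

```
# run every Monday at 03:00; adjust to your grid's recommended renewal period
0 3 * * 1  tahoe deep-check --add-lease tahoe: >>/var/log/tahoe-lease.log 2>&1
```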
Expiration must be explicitly enabled on each storage server, since the
default behavior is to never expire shares. Expiration is enabled by adding
-config keys to the "[storage]" section of the tahoe.cfg file (as described
+config keys to the ``[storage]`` section of the ``tahoe.cfg`` file (as described
below) and restarting the server node.
Each lease has two parameters: a create/renew timestamp and a duration. The
seconds after the $create_renew timestamp. (In a future release of Tahoe, the
client will get to request a specific duration, and the server will accept or
reject the request depending upon its local configuration, so that servers
-can achieve better control over their storage obligations).
+can achieve better control over their storage obligations.)
The lease-expiration code has two modes of operation. The first is age-based:
leases are expired when their age is greater than their duration. This is the
collected in a timely fashion.
Since there is not yet a way for clients to request a lease duration of other
-than 31 days, there is a tahoe.cfg setting to override the duration of all
+than 31 days, there is a ``tahoe.cfg`` setting to override the duration of all
leases. If, for example, this alternative duration is set to 60 days, then
clients could safely renew their leases with an add-lease operation perhaps
once every 50 days: even though nominally their leases would expire 31 days
expired whatever it is going to expire, the second and subsequent passes are
not going to find any new leases to remove.
-The tahoe.cfg file uses the following keys to control lease expiration::
+The ``tahoe.cfg`` file uses the following keys to control lease expiration:
- [storage]
+``[storage]``
- expire.enabled = (boolean, optional)
+``expire.enabled = (boolean, optional)``
- If this is True, the storage server will delete shares on which all
+ If this is ``True``, the storage server will delete shares on which all
leases have expired. Other controls dictate when leases are considered to
- have expired. The default is False.
+ have expired. The default is ``False``.
- expire.mode = (string, "age" or "cutoff-date", required if expiration enabled)
+``expire.mode = (string, "age" or "cutoff-date", required if expiration enabled)``
If this string is "age", the age-based expiration scheme is used, and the
- "expire.override_lease_duration" setting can be provided to influence the
+ ``expire.override_lease_duration`` setting can be provided to influence the
lease ages. If it is "cutoff-date", the absolute-date-cutoff mode is
- used, and the "expire.cutoff_date" setting must be provided to specify
+ used, and the ``expire.cutoff_date`` setting must be provided to specify
the cutoff date. The mode setting currently has no default: you must
provide a value.
this release it was deemed safer to require an explicit mode
specification.
- expire.override_lease_duration = (duration string, optional)
+``expire.override_lease_duration = (duration string, optional)``
When age-based expiration is in use, a lease will be expired if its
- "lease.create_renew" timestamp plus its "lease.duration" time is
+ ``lease.create_renew`` timestamp plus its ``lease.duration`` time is
earlier/older than the current time. This key, if present, overrides the
- duration value for all leases, changing the algorithm from:
+ duration value for all leases, changing the algorithm from::
if (lease.create_renew_timestamp + lease.duration) < now:
expire_lease()
- to:
+ to::
if (lease.create_renew_timestamp + override_lease_duration) < now:
expire_lease()
The value of this setting is a "duration string", which is a number of
days, months, or years, followed by a units suffix, and optionally
- separated by a space, such as one of the following:
+ separated by a space, such as one of the following::
7days
31day
31days" had been passed.
This key is only valid when age-based expiration is in use (i.e. when
- "expire.mode = age" is used). It will be rejected if cutoff-date
+ ``expire.mode = age`` is used). It will be rejected if cutoff-date
expiration is in use.
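A duration-string parser consistent with the examples above might look like
this. The accepted unit set and the month/year multipliers are assumptions for
illustration, not Tahoe's actual parsing rules:

```python
import re

# Multipliers in seconds; 31-day months and 365-day years are assumptions.
_UNITS = {"day": 86400, "month": 31 * 86400, "year": 365 * 86400}

def parse_duration(s):
    """Parse strings like '7days', '31day', or '2 months' into seconds."""
    m = re.match(r"^\s*(\d+)\s*(day|month|year)s?\s*$", s)
    if m is None:
        raise ValueError("unparseable duration: %r" % (s,))
    return int(m.group(1)) * _UNITS[m.group(2)]
```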
- expire.cutoff_date = (date string, required if mode=cutoff-date)
+``expire.cutoff_date = (date string, required if mode=cutoff-date)``
When cutoff-date expiration is in use, a lease will be expired if its
create/renew timestamp is older than the cutoff date. This string will be
- a date in the following format:
+ a date in the following format::
2009-01-16 (January 16th, 2009)
2008-02-02
primarily for use by programmers and grid operators who want to find out what
went wrong.
-The foolscap logging system is documented here:
+The foolscap logging system is documented at
+`<http://foolscap.lothar.com/docs/logging.html>`_.
- http://foolscap.lothar.com/docs/logging.html
-
-The foolscap distribution includes a utility named "flogtool" (usually at
-/usr/bin/flogtool) which is used to get access to many foolscap logging
-features.
+The foolscap distribution includes a utility named "``flogtool``" (usually
+at ``/usr/bin/flogtool`` on Unix) which is used to get access to many
+foolscap logging features.
Realtime Logging
================
When you are working on Tahoe code, and want to see what the node is doing,
-the easiest tool to use is "flogtool tail". This connects to the tahoe node
-and subscribes to hear about all log events. These events are then displayed
-to stdout, and optionally saved to a file.
+the easiest tool to use is "``flogtool tail``". This connects to the Tahoe
+node and subscribes to hear about all log events. These events are then
+displayed to stdout, and optionally saved to a file.
-"flogtool tail" connects to the "logport", for which the FURL is stored in
-BASEDIR/private/logport.furl . The following command will connect to this
-port and start emitting log information:
+"``flogtool tail``" connects to the "logport", for which the FURL is stored
+in ``BASEDIR/private/logport.furl``. The following command will connect to
+this port and start emitting log information::
flogtool tail BASEDIR/private/logport.furl
-The "--save-to FILENAME" option will save all received events to a file,
-where then can be examined later with "flogtool dump" or "flogtool
-web-viewer". The --catch-up flag will ask the node to dump all stored events
-before subscribing to new ones (without --catch-up, you will only hear about
-events that occur after the tool has connected and subscribed).
+The ``--save-to FILENAME`` option will save all received events to a file,
+where they can be examined later with "``flogtool dump``" or
+"``flogtool web-viewer``". The ``--catch-up`` option will ask the node to
+dump all stored events before subscribing to new ones (without ``--catch-up``,
+you will only hear about events that occur after the tool has connected and
+subscribed).
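For instance, the options above can be combined in one invocation (a sketch:
it requires foolscap installed and a running node, and the output filename is
hypothetical)::

```shell
# Dump stored history, then keep following new events, saving everything
flogtool tail --catch-up --save-to events.flog BASEDIR/private/logport.furl
```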
Incidents
=========
Foolscap keeps a short list of recent events in memory. When something goes
wrong, it writes all the history it has (and everything that gets logged in
the next few seconds) into a file called an "incident". These files go into
-BASEDIR/logs/incidents/ , in a file named
-"incident-TIMESTAMP-UNIQUE.flog.bz2". The default definition of "something
-goes wrong" is the generation of a log event at the log.WEIRD level or
-higher, but other criteria could be implemented.
+``BASEDIR/logs/incidents/``, in a file named
+"``incident-TIMESTAMP-UNIQUE.flog.bz2``". The default definition of
+"something goes wrong" is the generation of a log event at the ``log.WEIRD``
+level or higher, but other criteria could be implemented.
The typical "incident report" we've seen in a large Tahoe grid is about 40kB
compressed, representing about 1800 recent events.
-These "flogfiles" have a similar format to the files saved by "flogtool tail
---save-to". They are simply lists of log events, with a small header to
-indicate which event triggered the incident.
+These "flogfiles" have a similar format to the files saved by
+"``flogtool tail --save-to``". They are simply lists of log events, with a
+small header to indicate which event triggered the incident.
-The "flogtool dump FLOGFILE" command will take one of these .flog.bz2 files
-and print their contents to stdout, one line per event. The raw event
-dictionaries can be dumped by using "flogtool dump --verbose FLOGFILE".
+The "``flogtool dump FLOGFILE``" command will take one of these ``.flog.bz2``
+files and print their contents to stdout, one line per event. The raw event
+dictionaries can be dumped by using "``flogtool dump --verbose FLOGFILE``".
-The "flogtool web-viewer" command can be used to examine the flogfile in a
-web browser. It runs a small HTTP server and emits the URL on stdout. This
-view provides more structure than the output of "flogtool dump": the
-parent/child relationships of log events is displayed in a nested format.
-"flogtool web-viewer" is still fairly immature.
+The "``flogtool web-viewer``" command can be used to examine the flogfile
+in a web browser. It runs a small HTTP server and emits the URL on stdout.
+This view provides more structure than the output of "``flogtool dump``":
+the parent/child relationships of log events are displayed in a nested format.
+"``flogtool web-viewer``" is still fairly immature.
Working with flogfiles
======================
-The "flogtool filter" command can be used to take a large flogfile (perhaps
-one created by the log-gatherer, see below) and copy a subset of events into
-a second file. This smaller flogfile may be easier to work with than the
-original. The arguments to "flogtool filter" specify filtering criteria: a
-predicate that each event must match to be copied into the target file.
---before and --after are used to exclude events outside a given window of
-time. --above will retain events above a certain severity level. --from
-retains events send by a specific tubid. --strip-facility removes events that
-were emitted with a given facility (like foolscap.negotiation or
-tahoe.upload).
+The "``flogtool filter``" command can be used to take a large flogfile
+(perhaps one created by the log-gatherer, see below) and copy a subset of
+events into a second file. This smaller flogfile may be easier to work with
+than the original. The arguments to "``flogtool filter``" specify filtering
+criteria: a predicate that each event must match to be copied into the
+target file. ``--before`` and ``--after`` are used to exclude events outside
+a given window of time. ``--above`` will retain events above a certain
+severity level. ``--from`` retains events sent by a specific tubid.
+``--strip-facility`` removes events that were emitted with a given facility
+(like ``foolscap.negotiation`` or ``tahoe.upload``).
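A sketch of such an invocation (the option names come from the text above;
the input/output filename order is an assumption, so check
``flogtool filter --help`` before relying on it)::

```shell
# Keep only WEIRD-or-worse events, dropping foolscap negotiation chatter
# (filenames are hypothetical)
flogtool filter --above WEIRD --strip-facility foolscap.negotiation \
    big-gathered.flog.bz2 small.flog
```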
Gatherers
=========
In a deployed Tahoe grid, it is useful to get log information automatically
transferred to a central log-gatherer host. This offloads the (admittedly
modest) storage requirements to a different host and provides access to
-logfiles from multiple nodes (webapi/storage/helper) nodes in a single place.
+logfiles from multiple nodes (web-API, storage, or helper) in a single place.
There are two kinds of gatherers. Both produce a FURL which needs to be
-placed in the NODEDIR/log_gatherer.furl file (one FURL per line) of the nodes
-that are to publish their logs to the gatherer. When the Tahoe node starts,
-it will connect to the configured gatherers and offer its logport: the
-gatherer will then use the logport to subscribe to hear about events.
+placed in the ``NODEDIR/log_gatherer.furl`` file (one FURL per line) of
+each node that is to publish its logs to the gatherer. When the Tahoe node
+starts, it will connect to the configured gatherers and offer its logport:
+the gatherer will then use the logport to subscribe to hear about events.
The gatherer will write to files in its working directory, which can then be
-examined with tools like "flogtool dump" as described above.
+examined with tools like "``flogtool dump``" as described above.
Incident Gatherer
-----------------
recognize when the same problem is happening multiple times.
A collection of classification functions that are useful for Tahoe nodes are
-provided in misc/incident-gatherer/support_classifiers.py . There is roughly
-one category for each log.WEIRD-or-higher level event in the Tahoe source
-code.
+provided in ``misc/incident-gatherer/support_classifiers.py``. There is
+roughly one category for each ``log.WEIRD``-or-higher level event in the
+Tahoe source code.
-The incident gatherer is created with the "flogtool create-incident-gatherer
-WORKDIR" command, and started with "tahoe start". The generated
-"gatherer.tac" file should be modified to add classifier functions.
+The incident gatherer is created with the "``flogtool create-incident-gatherer
+WORKDIR``" command, and started with "``tahoe start``". The generated
+"``gatherer.tac``" file should be modified to add classifier functions.
The incident gatherer writes incident names (which are simply the relative
-pathname of the incident-\*.flog.bz2 file) into classified/CATEGORY. For
-example, the classified/mutable-retrieve-uncoordinated-write-error file
-contains a list of all incidents which were triggered by an uncoordinated
+pathname of the ``incident-*.flog.bz2`` file) into ``classified/CATEGORY``.
+For example, the ``classified/mutable-retrieve-uncoordinated-write-error``
+file contains a list of all incidents which were triggered by an uncoordinated
write that was detected during mutable file retrieval (caused when somebody
changed the contents of the mutable file in between the node's mapupdate step
-and the retrieve step). The classified/unknown file contains a list of all
+and the retrieve step). The ``classified/unknown`` file contains a list of all
incidents that did not match any of the classification functions.
At startup, the incident gatherer will automatically reclassify any incident
-report which is not mentioned in any of the classified/* files. So the usual
-workflow is to examine the incidents in classified/unknown, add a new
-classification function, delete classified/unknown, then bound the gatherer
-with "tahoe restart WORKDIR". The incidents which can be classified with the
-new functions will be added to their own classified/FOO lists, and the
-remaining ones will be put in classified/unknown, where the process can be
-repeated until all events are classifiable.
+report which is not mentioned in any of the ``classified/*`` files. So the
+usual workflow is to examine the incidents in ``classified/unknown``, add a
+new classification function, delete ``classified/unknown``, then bounce the
+gatherer with "``tahoe restart WORKDIR``". The incidents which can be
+classified with the new functions will be added to their own ``classified/FOO``
+lists, and the remaining ones will be put in ``classified/unknown``, where
+the process can be repeated until all events are classifiable.
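Sketched as shell steps (the paths follow the description above)::

```shell
cat WORKDIR/classified/unknown    # inspect the unclassified incidents
# ... add a new classifier function to WORKDIR's gatherer.tac ...
rm WORKDIR/classified/unknown     # force reclassification at startup
tahoe restart WORKDIR             # newly classifiable incidents move out
```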
The incident gatherer is still fairly immature: future versions will have a
web interface and an RSS feed, so operations personnel can track problems in
the storage grid.
-In our experience, each Incident takes about two seconds to transfer from the
-node which generated it to the gatherer. The gatherer will automatically
+In our experience, each incident takes about two seconds to transfer from
+the node that generated it to the gatherer. The gatherer will automatically
catch up to any incidents which occurred while it was offline.
Log Gatherer
events into a large flogfile that is rotated (closed, compressed, and
replaced with a new one) on a periodic basis. Each flogfile is named
according to the range of time it represents, with names like
-"from-2008-08-26-132256--to-2008-08-26-162256.flog.bz2". The flogfiles
+"``from-2008-08-26-132256--to-2008-08-26-162256.flog.bz2``". The flogfiles
contain events from many different sources, making it easier to correlate
things that happened on multiple machines (such as comparing a client node
making a request with the storage servers that respond to that request).
-The Log Gatherer is created with the "flogtool create-gatherer WORKDIR"
-command, and started with "tahoe start". The log_gatherer.furl it creates
-then needs to be copied into the BASEDIR/log_gatherer.furl file of all nodes
-which should be sending it log events.
+The Log Gatherer is created with the "``flogtool create-gatherer WORKDIR``"
+command, and started with "``tahoe start``". The ``log_gatherer.furl`` it
+creates then needs to be copied into the ``BASEDIR/log_gatherer.furl`` file
+of all nodes that should be sending it log events.
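Sketched as commands (this assumes ``tahoe start`` accepts the gatherer's
working directory as an argument, as ``tahoe restart WORKDIR`` does for the
incident gatherer; adjust paths to taste)::

```shell
flogtool create-gatherer WORKDIR
tahoe start WORKDIR
# publish the gatherer's FURL to each node that should send it events,
# then restart those nodes so they pick it up
cp WORKDIR/log_gatherer.furl BASEDIR/log_gatherer.furl
```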
-The "flogtool filter" command, described above, is useful to cut down the
+The "``flogtool filter``" command, described above, is useful to cut down the
potentially large flogfiles into a more narrowly focussed form.
-Busy nodes, particularly wapi nodes which are performing recursive
+Busy nodes, particularly web-API nodes which are performing recursive
deep-size/deep-stats/deep-check operations, can produce a lot of log events.
To avoid overwhelming the node (and using an unbounded amount of memory for
the outbound TCP queue), publishing nodes will start dropping log events when
Local twistd.log files
======================
-[TODO: not yet true, requires foolscap-0.3.1 and a change to allmydata.node]
+[TODO: not yet true, requires foolscap-0.3.1 and a change to ``allmydata.node``]
In addition to the foolscap-based event logs, certain high-level events will
be recorded directly in human-readable text form, in the
-BASEDIR/logs/twistd.log file (and its rotated old versions: twistd.log.1,
-twistd.log.2, etc). This form does not contain as much information as the
+``BASEDIR/logs/twistd.log`` file (and its rotated old versions: ``twistd.log.1``,
+``twistd.log.2``, etc). This form does not contain as much information as the
flogfiles available through the means described previously, but it is
immediately available to the curious developer, and is retained until the
``twistd.log.NN`` files are explicitly deleted.
-Only events at the log.OPERATIONAL level or higher are bridged to twistd.log
-(i.e. not the log.NOISY debugging events). In addition, foolscap internal
-events (like connection negotiation messages) are not bridged to twistd.log .
+Only events at the ``log.OPERATIONAL`` level or higher are bridged to
+``twistd.log`` (i.e. not the ``log.NOISY`` debugging events). In addition,
+foolscap internal events (like connection negotiation messages) are not
+bridged to ``twistd.log``.
Adding log messages
===================
new log events. For details, please see the Foolscap logging documentation,
but a few notes are worth stating here:
-* use a facility prefix of "tahoe.", like "tahoe.mutable.publish"
+* use a facility prefix of "``tahoe.``", like "``tahoe.mutable.publish``"
-* assign each severe (log.WEIRD or higher) event a unique message
- identifier, as the umid= argument to the log.msg() call. The
- misc/coding_tools/make_umid script may be useful for this purpose. This will make it
- easier to write a classification function for these messages.
+* assign each severe (``log.WEIRD`` or higher) event a unique message
+ identifier, as the ``umid=`` argument to the ``log.msg()`` call. The
+ ``misc/coding_tools/make_umid`` script may be useful for this purpose.
+ This will make it easier to write a classification function for these
+ messages.
-* use the parent= argument whenever the event is causally/temporally
+* use the ``parent=`` argument whenever the event is causally/temporally
clustered with its parent. For example, a download process that involves
three sequential hash fetches could announce the send and receipt of those
- hash-fetch messages with a parent= argument that ties them to the overall
- download process. However, each new wapi download request should be
- unparented.
+ hash-fetch messages with a ``parent=`` argument that ties them to the
+ overall download process. However, each new web-API download request
+ should be unparented.
-* use the format= argument in preference to the message= argument. E.g.
- use log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k) instead of
- log.msg("got %d shares, need %d" % (n,k)). This will allow later tools to
- analyze the event without needing to scrape/reconstruct the structured
- data out of the formatted string.
+* use the ``format=`` argument in preference to the ``message=`` argument.
+ E.g. use ``log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k)``
+ instead of ``log.msg("got %d shares, need %d" % (n,k))``. This will allow
+ later tools to analyze the event without needing to scrape/reconstruct
+ the structured data out of the formatted string.
* Pass extra information as extra keyword arguments, even if they aren't
- included in the format= string. This information will be displayed in the
- "flogtool dump --verbose" output, as well as being available to other
- tools. The umid= argument should be passed this way.
+ included in the ``format=`` string. This information will be displayed in
+ the "``flogtool dump --verbose``" output, as well as being available to
+ other tools. The ``umid=`` argument should be passed this way.
-* use log.err for the catch-all addErrback that gets attached to the end of
- any given Deferred chain. When used in conjunction with LOGTOTWISTED=1,
- log.err() will tell Twisted about the error-nature of the log message,
- causing Trial to flunk the test (with an "ERROR" indication that prints a
- copy of the Failure, including a traceback). Don't use log.err for events
- that are BAD but handled (like hash failures: since these are often
- deliberately provoked by test code, they should not cause test failures):
- use log.msg(level=BAD) for those instead.
+* use ``log.err`` for the catch-all ``addErrback`` that gets attached to
+ the end of any given Deferred chain. When used in conjunction with
+ ``LOGTOTWISTED=1``, ``log.err()`` will tell Twisted about the error-nature
+ of the log message, causing Trial to flunk the test (with an "ERROR"
+ indication that prints a copy of the Failure, including a traceback).
+ Don't use ``log.err`` for events that are ``BAD`` but handled (like hash
+ failures: since these are often deliberately provoked by test code, they
+ should not cause test failures): use ``log.msg(level=BAD)`` for those
+ instead.
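To see why the structured ``format=`` form pays off, here is a plain-Python
sketch (the ``msg()`` recorder below is hypothetical, not foolscap's actual
implementation) of what a downstream tool can do with such an event:

```python
# Events keep their structured fields, and the human-readable text can
# still be rendered on demand from the format string.
events = []

def msg(format=None, **kwargs):
    # store raw fields instead of a flattened string
    events.append(dict(format=format, **kwargs))

msg(format="got %(n)d shares, need %(k)d", n=7, k=3, umid="xyzzy")

event = events[0]
# downstream tools can read the structured data directly ...
assert event["n"] == 7
# ... and still reconstruct the rendered message
rendered = event["format"] % event
```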
Log Messages During Unit Tests
==============================
If a test is failing and you aren't sure why, start by enabling
-FLOGTOTWISTED=1 like this:
+``FLOGTOTWISTED=1`` like this::
make test FLOGTOTWISTED=1
-With FLOGTOTWISTED=1, sufficiently-important log events will be written into
-_trial_temp/test.log, which may give you more ideas about why the test is
-failing. Note, however, that _trial_temp/log.out will not receive messages
-below the level=OPERATIONAL threshold, due to this issue:
-<http://foolscap.lothar.com/trac/ticket/154>
+With ``FLOGTOTWISTED=1``, sufficiently-important log events will be written
+into ``_trial_temp/test.log``, which may give you more ideas about why the
+test is failing. Note, however, that ``_trial_temp/log.out`` will not receive
+messages below the ``level=OPERATIONAL`` threshold, due to this issue:
+`<http://foolscap.lothar.com/trac/ticket/154>`_
If that isn't enough, look at the detailed foolscap logging messages instead,
-by running the tests like this:
+by running the tests like this::
make test FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1
The first environment variable will cause foolscap log events to be written
-to ./flog.out.bz2 (instead of merely being recorded in the circular buffers
+to ``./flog.out.bz2`` (instead of merely being recorded in the circular buffers
for the use of remote subscribers or incident reports). The second will cause
all log events to be written out, not just the higher-severity ones. The
third will cause twisted log events (like the markers that indicate when each
easier to correlate log events with unit tests.
Enabling this form of logging appears to roughly double the runtime of the
-unit tests. The flog.out.bz2 file is approximately 2MB.
+unit tests. The ``flog.out.bz2`` file is approximately 2MB.
-You can then use "flogtool dump" or "flogtool web-viewer" on the resulting
-flog.out file.
+You can then use "``flogtool dump``" or "``flogtool web-viewer``" on the
+resulting ``flog.out.bz2`` file.
-("flogtool tail" and the log-gatherer are not useful during unit tests, since
-there is no single Tub to which all the log messages are published).
+("``flogtool tail``" and the log-gatherer are not useful during unit tests,
+since there is no single Tub to which all the log messages are published).
It is possible for setting these environment variables to cause spurious test
failures in tests with race condition bugs. All known instances of this have
sodipodi:role="line"
id="tspan2822"
x="469.52924"
- y="342.69528">Tahoe-LAFS WAPI</tspan></text>
+ y="342.69528">Tahoe-LAFS web-API</tspan></text>
</g>
</svg>
others just like filecaps and dircaps: knowledge of the authority string is
both necessary and complete to wield the authority it represents.
-webapi requests will include the authority necessary to complete the
+Web-API requests will include the authority necessary to complete the
operation. When used by a CLI tool, the authority is likely to come from
~/.tahoe/private/authority (i.e. it is ambient to the user who has access to
that node, just like aliases provide similar access to a specific "root
directory"). When used by the browser-oriented WUI, the authority will [TODO]
somehow be retained on each page in a way that minimizes the risk of CSRF
attacks and allows safe sharing (cut-and-paste of a URL without sharing the
-storage authority too). The client node receiving the webapi request will
+storage authority too). The client node receiving the web-API request will
extract the authority string from the request and use it to build the storage
server messages that it sends to fulfill that request.
file. Attenuations (see below) should be used to limit the delegated
authority in these cases.
-In the programmatic webapi interface (colloquially known as the "WAPI"), any
-operation that consumes storage will accept a storage-authority= query
-argument, the value of which will be the printable form of an authority
-string. This includes all PUT operations, POST t=upload and t=mkdir, and
-anything which creates a new file, creates a directory (perhaps an
-intermediate one), or modifies a mutable file.
+In the programmatic web-API, any operation that consumes storage will accept
+a ``storage-authority=`` query argument, the value of which will be the
+printable form of an authority string. This includes all PUT operations,
+POST ``t=upload`` and ``t=mkdir``, and anything which creates a new file,
+creates a directory (perhaps an intermediate one), or modifies a mutable
+file.
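Under this (still-proposed) design, such a request might look like the
following sketch, where the cap, path, and authority string are all
placeholders::

```
PUT /uri/$DIRCAP/newfile.txt?storage-authority=$AUTHORITY_STRING HTTP/1.1
```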
Alternatively, the authority string can also be passed through an HTTP
header. A single "X-Tahoe-Storage-Authority:" header can be used with the
number for each account. This could be used by e.g. clients in a commercial
grid to report overall-space-used to the end user.
-There will be webapi URLs available for all of these reports.
+There will be web-API URLs available for all of these reports.
TODO: storage servers might also have a mechanism to apply space-usage limits
to specific account ids directly, rather than requiring that these be
section is organized to follow the storage authority, starting from the point
of grant. The discussion will thus begin at the storage server (where the
authority is first created), work back to the client (which receives the
-authority as a webapi argument), then follow the authority back to the
+authority as a web-API argument), then follow the authority back to the
servers as it is used to enable specific storage operations. It will then
detail the accounting tables that the storage server is obligated to
maintain, and describe the interfaces through which these tables are accessed
<p>The <a href="http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend">SftpFrontend</a> page
on the wiki has more information about using SFTP with Tahoe-LAFS.</p>
- <h3>The WAPI</h3>
+ <h3>The Web-API</h3>
<p>Want to program your Tahoe-LAFS node to do your bidding? Easy! See <a
href="frontends/webapi.rst">webapi.rst</a>.</p>
to unconditionally replace the mutable file's contents with the new data.
This should not be used in delta application, but rather in situations where
you want to replace the file's contents with completely unrelated ones. When
-raw files are uploaded into a mutable slot through the tahoe webapi (using
-POST and the ?mutable=true argument), they are put in place with overwrite().
+raw files are uploaded into a mutable slot through the Tahoe-LAFS web-API
+(using POST and the ``?mutable=true`` argument), they are put in place with
+``overwrite()``.
The peer-selection and data-structure manipulation (and signing/verification)
steps will be implemented in a separate class in allmydata/mutable.py .