From: Zooko O'Whielacronx Date: Fri, 15 Oct 2010 05:29:13 +0000 (-0700) Subject: docs: convert all .txt docs to .rst thanks to Ravi Pinjala X-Git-Tag: trac-4800~48 X-Git-Url: https://git.rkrishnan.org/%5B/frontends//%22%22?a=commitdiff_plain;h=8143183e3973378637584ff51e486974fd7fc0f9;p=tahoe-lafs%2Ftahoe-lafs.git docs: convert all .txt docs to .rst thanks to Ravi Pinjala fixes #1225 --- diff --git a/CREDITS b/CREDITS index 34ce6ed5..6c6674e5 100644 --- a/CREDITS +++ b/CREDITS @@ -122,3 +122,7 @@ D: fix layout issue and server version numbers in WUI N: Jacob Lyles E: jacob.lyles@gmail.com D: fixed bug in WUI with Python 2.5 and a system clock set far in the past + +N: Ravi Pinjala +E: ravi@p-static.net +D: converted docs from .txt to .rst diff --git a/NEWS b/NEWS index dbbfb8a8..42f81cbb 100644 --- a/NEWS +++ b/NEWS @@ -1,5 +1,11 @@ User visible changes in Tahoe-LAFS. -*- outline; coding: utf-8 -*- +* Release 1.8.1 (coming) + +** Documentation + + - All .txt documents have been converted to .rst format (#1225) + * Release 1.8.0 (2010-09-23) ** New Features diff --git a/docs/architecture.rst b/docs/architecture.rst new file mode 100644 index 00000000..cea67cad --- /dev/null +++ b/docs/architecture.rst @@ -0,0 +1,559 @@ +======================= +Tahoe-LAFS Architecture +======================= + +1. `Overview`_ +2. `The Key-Value Store`_ +3. `File Encoding`_ +4. `Capabilities`_ +5. `Server Selection`_ +6. `Swarming Download, Trickling Upload`_ +7. `The Filesystem Layer`_ +8. `Leases, Refreshing, Garbage Collection`_ +9. `File Repairer`_ +10. `Security`_ +11. `Reliability`_ + + +Overview +======== + +(See the docs/specifications directory for more details.) + +There are three layers: the key-value store, the filesystem, and the +application. + +The lowest layer is the key-value store. The keys are "capabilities" -- short +ascii strings -- and the values are sequences of data bytes. This data is +encrypted and distributed across a number of nodes, such that it will survive +the loss of most of the nodes. There are no hard limits on the size of the +values, but there may be performance issues with extremely large values (just +due to the limitation of network bandwidth). In practice, values as small as +a few bytes and as large as tens of gigabytes are in common use. + +The middle layer is the decentralized filesystem: a directed graph in which +the intermediate nodes are directories and the leaf nodes are files. The leaf +nodes contain only the data -- they contain no metadata other than the length +in bytes. The edges leading to leaf nodes have metadata attached to them +about the file they point to. Therefore, the same file may be associated with +different metadata if it is referred to through different edges. + +The top layer consists of the applications using the filesystem. +Allmydata.com uses it for a backup service: the application periodically +copies files from the local disk onto the decentralized filesystem. We later +provide read-only access to those files, allowing users to recover them. +There are several other applications built on top of the Tahoe-LAFS +filesystem (see the `RelatedProjects +`_ page of the +wiki for a list). + + +The Key-Value Store +=================== + +The key-value store is implemented by a grid of Tahoe-LAFS storage servers -- +user-space processes. Tahoe-LAFS storage clients communicate with the storage +servers over TCP. + +Storage servers hold data in the form of "shares". Shares are encoded pieces +of files. There are a configurable number of shares for each file, 10 by +default. Normally, each share is stored on a separate server, but in some +cases a single server can hold multiple shares of a file. + +Nodes learn about each other through an "introducer". Each server connects to +the introducer at startup and announces its presence. Each client connects to +the introducer at startup, and receives a list of all servers from it. Each +client then connects to every server, creating a "bi-clique" topology. In the +current release, nodes behind NAT boxes will connect to all nodes that they +can open connections to, but they cannot open connections to other nodes +behind NAT boxes. Therefore, the more nodes behind NAT boxes, the less the +topology resembles the intended bi-clique topology. + +The introducer is a Single Point of Failure ("SPoF"), in that clients who +never connect to the introducer will be unable to connect to any storage +servers, but once a client has been introduced to everybody, it does not need +the introducer again until it is restarted. The danger of a SPoF is further +reduced in two ways. First, the introducer is defined by a hostname and a +private key, which are easy to move to a new host in case the original one +suffers an unrecoverable hardware problem. Second, even if the private key is +lost, clients can be reconfigured to use a new introducer. + +For future releases, we have plans to decentralize introduction, allowing any +server to tell a new client about all the others. + + +File Encoding +============= + +When a client stores a file on the grid, it first encrypts the file. It then +breaks the encrypted file into small segments, in order to reduce the memory +footprint, and to decrease the lag between initiating a download and +receiving the first part of the file; for example the lag between hitting +"play" and a movie actually starting. + +The client then erasure-codes each segment, producing blocks of which only a +subset are needed to reconstruct the segment (3 out of 10, with the default +settings). + +It sends one block from each segment to a given server. The set of blocks on +a given server constitutes a "share". Therefore a subset f the shares (3 out +of 10, by default) are needed to reconstruct the file. + +A hash of the encryption key is used to form the "storage index", which is +used for both server selection (described below) and to index shares within +the Storage Servers on the selected nodes. + +The client computes secure hashes of the ciphertext and of the shares. It +uses Merkle Trees so that it is possible to verify the correctness of a +subset of the data without requiring all of the data. For example, this +allows you to verify the correctness of the first segment of a movie file and +then begin playing the movie file in your movie viewer before the entire +movie file has been downloaded. + +These hashes are stored in a small datastructure named the Capability +Extension Block which is stored on the storage servers alongside each share. + +The capability contains the encryption key, the hash of the Capability +Extension Block, and any encoding parameters necessary to perform the +eventual decoding process. For convenience, it also contains the size of the +file being stored. + +To download, the client that wishes to turn a capability into a sequence of +bytes will obtain the blocks from storage servers, use erasure-decoding to +turn them into segments of ciphertext, use the decryption key to convert that +into plaintext, then emit the plaintext bytes to the output target. + + +Capabilities +============ + +Capabilities to immutable files represent a specific set of bytes. Think of +it like a hash function: you feed in a bunch of bytes, and you get out a +capability, which is deterministically derived from the input data: changing +even one bit of the input data will result in a completely different +capability. + +Read-only capabilities to mutable files represent the ability to get a set of +bytes representing some version of the file, most likely the latest version. +Each read-only capability is unique. In fact, each mutable file has a unique +public/private key pair created when the mutable file is created, and the +read-only capability to that file includes a secure hash of the public key. + +Read-write capabilities to mutable files represent the ability to read the +file (just like a read-only capability) and also to write a new version of +the file, overwriting any extant version. Read-write capabilities are unique +-- each one includes the secure hash of the private key associated with that +mutable file. + +The capability provides both "location" and "identification": you can use it +to retrieve a set of bytes, and then you can use it to validate ("identify") +that these potential bytes are indeed the ones that you were looking for. + +The "key-value store" layer doesn't include human-meaningful names. +Capabilities sit on the "global+secure" edge of `Zooko's Triangle`_. They are +self-authenticating, meaning that nobody can trick you into accepting a file +that doesn't match the capability you used to refer to that file. The +filesystem layer (described below) adds human-meaningful names atop the +key-value layer. + +.. _`Zooko's Triangle`: http://en.wikipedia.org/wiki/Zooko%27s_triangle + + +Server Selection +================ + +When a file is uploaded, the encoded shares are sent to some servers. But to +which ones? The "server selection" algorithm is used to make this choice. + +The storage index is used to consistently-permute the set of all servers nodes +(by sorting them by ``HASH(storage_index+nodeid)``). Each file gets a different +permutation, which (on average) will evenly distribute shares among the grid +and avoid hotspots. Each server has announced its available space when it +connected to the introducer, and we use that available space information to +remove any servers that cannot hold an encoded share for our file. Then we ask +some of the servers thus removed if they are already holding any encoded shares +for our file; we use this information later. (We ask any servers which are in +the first 2*N elements of the permuted list.) + +We then use the permuted list of servers to ask each server, in turn, if it +will hold a share for us (a share that was not reported as being already +present when we talked to the full servers earlier, and that we have not +already planned to upload to a different server). We plan to send a share to a +server by sending an 'allocate_buckets() query' to the server with the number +of that share. Some will say yes they can hold that share, others (those who +have become full since they announced their available space) will say no; when +a server refuses our request, we take that share to the next server on the +list. In the response to allocate_buckets() the server will also inform us of +any shares of that file that it already has. We keep going until we run out of +shares that need to be stored. At the end of the process, we'll have a table +that maps each share number to a server, and then we can begin the encode and +push phase, using the table to decide where each share should be sent. + +Most of the time, this will result in one share per server, which gives us +maximum reliability. If there are fewer writable servers than there are +unstored shares, we'll be forced to loop around, eventually giving multiple +shares to a single server. + +If we have to loop through the node list a second time, we accelerate the query +process, by asking each node to hold multiple shares on the second pass. In +most cases, this means we'll never send more than two queries to any given +node. + +If a server is unreachable, or has an error, or refuses to accept any of our +shares, we remove it from the permuted list, so we won't query it again for +this file. If a server already has shares for the file we're uploading, we add +that information to the share-to-server table. This lets us do less work for +files which have been uploaded once before, while making sure we still wind up +with as many shares as we desire. + +Before a file upload is called successful, it has to pass an upload health +check. For immutable files, we check to see that a condition called +'servers-of-happiness' is satisfied. When satisfied, 'servers-of-happiness' +assures us that enough pieces of the file are distributed across enough +servers on the grid to ensure that the availability of the file will not be +affected if a few of those servers later fail. For mutable files and +directories, we check to see that all of the encoded shares generated during +the upload process were successfully placed on the grid. This is a weaker +check than 'servers-of-happiness'; it does not consider any information about +how the encoded shares are placed on the grid, and cannot detect situations in +which all or a majority of the encoded shares generated during the upload +process reside on only one storage server. We hope to extend +'servers-of-happiness' to mutable files in a future release of Tahoe-LAFS. If, +at the end of the upload process, the appropriate upload health check fails, +the upload is considered a failure. + +The current defaults use k=3, servers_of_happiness=7, and N=10. N=10 means that +we'll try to place 10 shares. k=3 means that we need any three shares to +recover the file. servers_of_happiness=7 means that we'll consider an immutable +file upload to be successful if we can place shares on enough servers that +there are 7 different servers, the correct functioning of any k of which +guarantee the availability of the immutable file. + +N=10 and k=3 means there is a 3.3x expansion factor. On a small grid, you +should set N about equal to the number of storage servers in your grid; on a +large grid, you might set it to something smaller to avoid the overhead of +contacting every server to place a file. In either case, you should then set k +such that N/k reflects your desired availability goals. The best value for +servers_of_happiness will depend on how you use Tahoe-LAFS. In a friendnet with +a variable number of servers, it might make sense to set it to the smallest +number of servers that you expect to have online and accepting shares at any +given time. In a stable environment without much server churn, it may make +sense to set servers_of_happiness = N. + +When downloading a file, the current version just asks all known servers for +any shares they might have. Once it has received enough responses that it +knows where to find the needed k shares, it downloads at least the first +segment from those servers. This means that it tends to download shares from +the fastest servers. If some servers had more than one share, it will continue +sending "Do You Have Block" requests to other servers, so that it can download +subsequent segments from distinct servers (sorted by their DYHB round-trip +times), if possible. + + *future work* + + A future release will use the server selection algorithm to reduce the + number of queries that must be sent out. + + Other peer-node selection algorithms are possible. One earlier version + (known as "Tahoe 3") used the permutation to place the nodes around a large + ring, distributed the shares evenly around the same ring, then walked + clockwise from 0 with a basket. Each time it encountered a share, it put it + in the basket, each time it encountered a server, give it as many shares + from the basket as they'd accept. This reduced the number of queries + (usually to 1) for small grids (where N is larger than the number of + nodes), but resulted in extremely non-uniform share distribution, which + significantly hurt reliability (sometimes the permutation resulted in most + of the shares being dumped on a single node). + + Another algorithm (known as "denver airport" [#naming]_) uses the permuted hash to + decide on an approximate target for each share, then sends lease requests + via Chord routing. The request includes the contact information of the + uploading node, and asks that the node which eventually accepts the lease + should contact the uploader directly. The shares are then transferred over + direct connections rather than through multiple Chord hops. Download uses + the same approach. This allows nodes to avoid maintaining a large number of + long-term connections, at the expense of complexity and latency. + +.. [#naming] all of these names are derived from the location where they were + concocted, in this case in a car ride from Boulder to DEN. To be + precise, "Tahoe 1" was an unworkable scheme in which everyone who holds + shares for a given file would form a sort of cabal which kept track of + all the others, "Tahoe 2" is the first-100-nodes in the permuted hash + described in this document, and "Tahoe 3" (or perhaps "Potrero hill 1") + was the abandoned ring-with-many-hands approach. + + +Swarming Download, Trickling Upload +=================================== + +Because the shares being downloaded are distributed across a large number of +nodes, the download process will pull from many of them at the same time. The +current encoding parameters require 3 shares to be retrieved for each +segment, which means that up to 3 nodes will be used simultaneously. For +larger networks, 8-of-22 encoding could be used, meaning 8 nodes can be used +simultaneously. This allows the download process to use the sum of the +available nodes' upload bandwidths, resulting in downloads that take full +advantage of the common 8x disparity between download and upload bandwith on +modern ADSL lines. + +On the other hand, uploads are hampered by the need to upload encoded shares +that are larger than the original data (3.3x larger with the current default +encoding parameters), through the slow end of the asymmetric connection. This +means that on a typical 8x ADSL line, uploading a file will take about 32 +times longer than downloading it again later. + +Smaller expansion ratios can reduce this upload penalty, at the expense of +reliability (see RELIABILITY, below). By using an "upload helper", this +penalty is eliminated: the client does a 1x upload of encrypted data to the +helper, then the helper performs encoding and pushes the shares to the +storage servers. This is an improvement if the helper has significantly +higher upload bandwidth than the client, so it makes the most sense for a +commercially-run grid for which all of the storage servers are in a colo +facility with high interconnect bandwidth. In this case, the helper is placed +in the same facility, so the helper-to-storage-server bandwidth is huge. + +See "helper.txt" for details about the upload helper. + + +The Filesystem Layer +==================== + +The "filesystem" layer is responsible for mapping human-meaningful pathnames +(directories and filenames) to pieces of data. The actual bytes inside these +files are referenced by capability, but the filesystem layer is where the +directory names, file names, and metadata are kept. + +The filesystem layer is a graph of directories. Each directory contains a +table of named children. These children are either other directories or +files. All children are referenced by their capability. + +A directory has two forms of capability: read-write caps and read-only caps. +The table of children inside the directory has a read-write and read-only +capability for each child. If you have a read-only capability for a given +directory, you will not be able to access the read-write capability of its +children. This results in "transitively read-only" directory access. + +By having two different capabilities, you can choose which you want to share +with someone else. If you create a new directory and share the read-write +capability for it with a friend, then you will both be able to modify its +contents. If instead you give them the read-only capability, then they will +*not* be able to modify the contents. Any capability that you receive can be +linked in to any directory that you can modify, so very powerful +shared+published directory structures can be built from these components. + +This structure enable individual users to have their own personal space, with +links to spaces that are shared with specific other users, and other spaces +that are globally visible. + + +Leases, Refreshing, Garbage Collection +====================================== + +When a file or directory in the virtual filesystem is no longer referenced, +the space that its shares occupied on each storage server can be freed, +making room for other shares. Tahoe-LAFS uses a garbage collection ("GC") +mechanism to implement this space-reclamation process. Each share has one or +more "leases", which are managed by clients who want the file/directory to be +retained. The storage server accepts each share for a pre-defined period of +time, and is allowed to delete the share if all of the leases are cancelled +or allowed to expire. + +Garbage collection is not enabled by default: storage servers will not delete +shares without being explicitly configured to do so. When GC is enabled, +clients are responsible for renewing their leases on a periodic basis at +least frequently enough to prevent any of the leases from expiring before the +next renewal pass. + +See docs/garbage-collection.txt for further information, and how to configure +garbage collection. + + +File Repairer +============= + +Shares may go away because the storage server hosting them has suffered a +failure: either temporary downtime (affecting availability of the file), or a +permanent data loss (affecting the preservation of the file). Hard drives +crash, power supplies explode, coffee spills, and asteroids strike. The goal +of a robust distributed filesystem is to survive these setbacks. + +To work against this slow, continual loss of shares, a File Checker is used +to periodically count the number of shares still available for any given +file. A more extensive form of checking known as the File Verifier can +download the ciphertext of the target file and perform integrity checks +(using strong hashes) to make sure the data is stil intact. When the file is +found to have decayed below some threshold, the File Repairer can be used to +regenerate and re-upload the missing shares. These processes are conceptually +distinct (the repairer is only run if the checker/verifier decides it is +necessary), but in practice they will be closely related, and may run in the +same process. + +The repairer process does not get the full capability of the file to be +maintained: it merely gets the "repairer capability" subset, which does not +include the decryption key. The File Verifier uses that data to find out +which nodes ought to hold shares for this file, and to see if those nodes are +still around and willing to provide the data. If the file is not healthy +enough, the File Repairer is invoked to download the ciphertext, regenerate +any missing shares, and upload them to new nodes. The goal of the File +Repairer is to finish up with a full set of "N" shares. + +There are a number of engineering issues to be resolved here. The bandwidth, +disk IO, and CPU time consumed by the verification/repair process must be +balanced against the robustness that it provides to the grid. The nodes +involved in repair will have very different access patterns than normal +nodes, such that these processes may need to be run on hosts with more memory +or network connectivity than usual. The frequency of repair will directly +affect the resources consumed. In some cases, verification of multiple files +can be performed at the same time, and repair of files can be delegated off +to other nodes. + + *future work* + + Currently there are two modes of checking on the health of your file: + "Checker" simply asks storage servers which shares they have and does + nothing to try to verify that they aren't lying. "Verifier" downloads and + cryptographically verifies every bit of every share of the file from every + server, which costs a lot of network and CPU. A future improvement would be + to make a random-sampling verifier which downloads and cryptographically + verifies only a few randomly-chosen blocks from each server. This would + require much less network and CPU but it could make it extremely unlikely + that any sort of corruption -- even malicious corruption intended to evade + detection -- would evade detection. This would be an instance of a + cryptographic notion called "Proof of Retrievability". Note that to implement + this requires no change to the server or to the cryptographic data structure + -- with the current data structure and the current protocol it is up to the + client which blocks they choose to download, so this would be solely a change + in client behavior. + + +Security +======== + +The design goal for this project is that an attacker may be able to deny +service (i.e. prevent you from recovering a file that was uploaded earlier) +but can accomplish none of the following three attacks: + +1) violate confidentiality: the attacker gets to view data to which you have + not granted them access +2) violate integrity: the attacker convinces you that the wrong data is + actually the data you were intending to retrieve +3) violate unforgeability: the attacker gets to modify a mutable file or + directory (either the pathnames or the file contents) to which you have + not given them write permission + +Integrity (the promise that the downloaded data will match the uploaded data) +is provided by the hashes embedded in the capability (for immutable files) or +the digital signature (for mutable files). Confidentiality (the promise that +the data is only readable by people with the capability) is provided by the +encryption key embedded in the capability (for both immutable and mutable +files). Data availability (the hope that data which has been uploaded in the +past will be downloadable in the future) is provided by the grid, which +distributes failures in a way that reduces the correlation between individual +node failure and overall file recovery failure, and by the erasure-coding +technique used to generate shares. + +Many of these security properties depend upon the usual cryptographic +assumptions: the resistance of AES and RSA to attack, the resistance of +SHA-256 to collision attacks and pre-image attacks, and upon the proximity of +2^-128 and 2^-256 to zero. A break in AES would allow a confidentiality +violation, a collision break in SHA-256 would allow a consistency violation, +and a break in RSA would allow a mutability violation. + +There is no attempt made to provide anonymity, neither of the origin of a +piece of data nor the identity of the subsequent downloaders. In general, +anyone who already knows the contents of a file will be in a strong position +to determine who else is uploading or downloading it. Also, it is quite easy +for a sufficiently large coalition of nodes to correlate the set of nodes who +are all uploading or downloading the same file, even if the attacker does not +know the contents of the file in question. + +Also note that the file size and (when convergence is being used) a keyed +hash of the plaintext are not protected. Many people can determine the size +of the file you are accessing, and if they already know the contents of a +given file, they will be able to determine that you are uploading or +downloading the same one. + +The capability-based security model is used throughout this project. +Directory operations are expressed in terms of distinct read- and write- +capabilities. Knowing the read-capability of a file is equivalent to the +ability to read the corresponding data. The capability to validate the +correctness of a file is strictly weaker than the read-capability (possession +of read-capability automatically grants you possession of +validate-capability, but not vice versa). These capabilities may be expressly +delegated (irrevocably) by simply transferring the relevant secrets. + +The application layer can provide whatever access model is desired, built on +top of this capability access model. The first big user of this system so far +is allmydata.com. The allmydata.com access model currently works like a +normal web site, using username and password to give a user access to her +"virtual drive". In addition, allmydata.com users can share individual files +(using a file sharing interface built on top of the immutable file read +capabilities). + + +Reliability +=========== + +File encoding and peer-node selection parameters can be adjusted to achieve +different goals. Each choice results in a number of properties; there are +many tradeoffs. + +First, some terms: the erasure-coding algorithm is described as K-out-of-N +(for this release, the default values are K=3 and N=10). Each grid will have +some number of nodes; this number will rise and fall over time as nodes join, +drop out, come back, and leave forever. Files are of various sizes, some are +popular, others are unpopular. Nodes have various capacities, variable +upload/download bandwidths, and network latency. Most of the mathematical +models that look at node failure assume some average (and independent) +probability 'P' of a given node being available: this can be high (servers +tend to be online and available >90% of the time) or low (laptops tend to be +turned on for an hour then disappear for several days). Files are encoded in +segments of a given maximum size, which affects memory usage. + +The ratio of N/K is the "expansion factor". Higher expansion factors improve +reliability very quickly (the binomial distribution curve is very sharp), but +consumes much more grid capacity. When P=50%, the absolute value of K affects +the granularity of the binomial curve (1-out-of-2 is much worse than +50-out-of-100), but high values asymptotically approach a constant (i.e. +500-of-1000 is not much better than 50-of-100). When P is high and the +expansion factor is held at a constant, higher values of K and N give much +better reliability (for P=99%, 50-out-of-100 is much much better than +5-of-10, roughly 10^50 times better), because there are more shares that can +be lost without losing the file. + +Likewise, the total number of nodes in the network affects the same +granularity: having only one node means a single point of failure, no matter +how many copies of the file you make. Independent nodes (with uncorrelated +failures) are necessary to hit the mathematical ideals: if you have 100 nodes +but they are all in the same office building, then a single power failure +will take out all of them at once. The "Sybil Attack" is where a single +attacker convinces you that they are actually multiple servers, so that you +think you are using a large number of independent nodes, but in fact you have +a single point of failure (where the attacker turns off all their machines at +once). Large grids, with lots of truly independent nodes, will enable the use +of lower expansion factors to achieve the same reliability, but will increase +overhead because each node needs to know something about every other, and the +rate at which nodes come and go will be higher (requiring network maintenance +traffic). Also, the File Repairer work will increase with larger grids, +although then the job can be distributed out to more nodes. + +Higher values of N increase overhead: more shares means more Merkle hashes +that must be included with the data, and more nodes to contact to retrieve +the shares. Smaller segment sizes reduce memory usage (since each segment +must be held in memory while erasure coding runs) and improves "alacrity" +(since downloading can validate a smaller piece of data faster, delivering it +to the target sooner), but also increase overhead (because more blocks means +more Merkle hashes to validate them). + +In general, small private grids should work well, but the participants will +have to decide between storage overhead and reliability. Large stable grids +will be able to reduce the expansion factor down to a bare minimum while +still retaining high reliability, but large unstable grids (where nodes are +coming and going very quickly) may require more repair/verification bandwidth +than actual upload/download traffic. + +Tahoe-LAFS nodes that run a webserver have a page dedicated to provisioning +decisions: this tool may help you evaluate different expansion factors and +view the disk consumption of each. It is also acquiring some sections with +availability/reliability numbers, as well as preliminary cost analysis data. +This tool will continue to evolve as our analysis improves. diff --git a/docs/architecture.txt b/docs/architecture.txt deleted file mode 100644 index 604b5947..00000000 --- a/docs/architecture.txt +++ /dev/null @@ -1,547 +0,0 @@ -= Tahoe-LAFS Architecture = - -1. Overview -2. The Key-Value Store -3. File Encoding -4. Capabilities -5. Server Selection -6. Swarming Download, Trickling Upload -7. The Filesystem Layer -8. Leases, Refreshing, Garbage Collection -9. File Repairer -10. Security -11. Reliability - - -== Overview == - -(See the docs/specifications directory for more details.) - -There are three layers: the key-value store, the filesystem, and the -application. - -The lowest layer is the key-value store. The keys are "capabilities" -- short -ascii strings -- and the values are sequences of data bytes. This data is -encrypted and distributed across a number of nodes, such that it will survive -the loss of most of the nodes. There are no hard limits on the size of the -values, but there may be performance issues with extremely large values (just -due to the limitation of network bandwidth). In practice, values as small as -a few bytes and as large as tens of gigabytes are in common use. - -The middle layer is the decentralized filesystem: a directed graph in which -the intermediate nodes are directories and the leaf nodes are files. The leaf -nodes contain only the data -- they contain no metadata other than the length -in bytes. The edges leading to leaf nodes have metadata attached to them -about the file they point to. Therefore, the same file may be associated with -different metadata if it is referred to through different edges. - -The top layer consists of the applications using the filesystem. -Allmydata.com uses it for a backup service: the application periodically -copies files from the local disk onto the decentralized filesystem. We later -provide read-only access to those files, allowing users to recover them. -There are several other applications built on top of the Tahoe-LAFS -filesystem (see the RelatedProjects page of the wiki for a list). - - -== The Key-Value Store == - -The key-value store is implemented by a grid of Tahoe-LAFS storage servers -- -user-space processes. Tahoe-LAFS storage clients communicate with the storage -servers over TCP. - -Storage servers hold data in the form of "shares". Shares are encoded pieces -of files. There are a configurable number of shares for each file, 10 by -default. Normally, each share is stored on a separate server, but in some -cases a single server can hold multiple shares of a file. - -Nodes learn about each other through an "introducer". Each server connects to -the introducer at startup and announces its presence. Each client connects to -the introducer at startup, and receives a list of all servers from it. Each -client then connects to every server, creating a "bi-clique" topology. In the -current release, nodes behind NAT boxes will connect to all nodes that they -can open connections to, but they cannot open connections to other nodes -behind NAT boxes. Therefore, the more nodes behind NAT boxes, the less the -topology resembles the intended bi-clique topology. - -The introducer is a Single Point of Failure ("SPoF"), in that clients who -never connect to the introducer will be unable to connect to any storage -servers, but once a client has been introduced to everybody, it does not need -the introducer again until it is restarted. The danger of a SPoF is further -reduced in two ways. First, the introducer is defined by a hostname and a -private key, which are easy to move to a new host in case the original one -suffers an unrecoverable hardware problem. Second, even if the private key is -lost, clients can be reconfigured to use a new introducer. - -For future releases, we have plans to decentralize introduction, allowing any -server to tell a new client about all the others. - - -== File Encoding == - -When a client stores a file on the grid, it first encrypts the file. It then -breaks the encrypted file into small segments, in order to reduce the memory -footprint, and to decrease the lag between initiating a download and -receiving the first part of the file; for example the lag between hitting -"play" and a movie actually starting. - -The client then erasure-codes each segment, producing blocks of which only a -subset are needed to reconstruct the segment (3 out of 10, with the default -settings). - -It sends one block from each segment to a given server. The set of blocks on -a given server constitutes a "share". Therefore a subset f the shares (3 out -of 10, by default) are needed to reconstruct the file. - -A hash of the encryption key is used to form the "storage index", which is -used for both server selection (described below) and to index shares within -the Storage Servers on the selected nodes. - -The client computes secure hashes of the ciphertext and of the shares. It -uses Merkle Trees so that it is possible to verify the correctness of a -subset of the data without requiring all of the data. For example, this -allows you to verify the correctness of the first segment of a movie file and -then begin playing the movie file in your movie viewer before the entire -movie file has been downloaded. - -These hashes are stored in a small datastructure named the Capability -Extension Block which is stored on the storage servers alongside each share. - -The capability contains the encryption key, the hash of the Capability -Extension Block, and any encoding parameters necessary to perform the -eventual decoding process. For convenience, it also contains the size of the -file being stored. - -To download, the client that wishes to turn a capability into a sequence of -bytes will obtain the blocks from storage servers, use erasure-decoding to -turn them into segments of ciphertext, use the decryption key to convert that -into plaintext, then emit the plaintext bytes to the output target. - - -== Capabilities == - -Capabilities to immutable files represent a specific set of bytes. Think of -it like a hash function: you feed in a bunch of bytes, and you get out a -capability, which is deterministically derived from the input data: changing -even one bit of the input data will result in a completely different -capability. - -Read-only capabilities to mutable files represent the ability to get a set of -bytes representing some version of the file, most likely the latest version. -Each read-only capability is unique. In fact, each mutable file has a unique -public/private key pair created when the mutable file is created, and the -read-only capability to that file includes a secure hash of the public key. - -Read-write capabilities to mutable files represent the ability to read the -file (just like a read-only capability) and also to write a new version of -the file, overwriting any extant version. Read-write capabilities are unique --- each one includes the secure hash of the private key associated with that -mutable file. - -The capability provides both "location" and "identification": you can use it -to retrieve a set of bytes, and then you can use it to validate ("identify") -that these potential bytes are indeed the ones that you were looking for. - -The "key-value store" layer doesn't include human-meaningful names. -Capabilities sit on the "global+secure" edge of Zooko's Triangle[1]. They are -self-authenticating, meaning that nobody can trick you into accepting a file -that doesn't match the capability you used to refer to that file. The -filesystem layer (described below) adds human-meaningful names atop the -key-value layer. - - -== Server Selection == - -When a file is uploaded, the encoded shares are sent to some servers. But to -which ones? The "server selection" algorithm is used to make this choice. - -The storage index is used to consistently-permute the set of all servers nodes -(by sorting them by HASH(storage_index+nodeid)). Each file gets a different -permutation, which (on average) will evenly distribute shares among the grid -and avoid hotspots. Each server has announced its available space when it -connected to the introducer, and we use that available space information to -remove any servers that cannot hold an encoded share for our file. Then we ask -some of the servers thus removed if they are already holding any encoded shares -for our file; we use this information later. (We ask any servers which are in -the first 2*N elements of the permuted list.) - -We then use the permuted list of servers to ask each server, in turn, if it -will hold a share for us (a share that was not reported as being already -present when we talked to the full servers earlier, and that we have not -already planned to upload to a different server). We plan to send a share to a -server by sending an 'allocate_buckets() query' to the server with the number -of that share. Some will say yes they can hold that share, others (those who -have become full since they announced their available space) will say no; when -a server refuses our request, we take that share to the next server on the -list. In the response to allocate_buckets() the server will also inform us of -any shares of that file that it already has. We keep going until we run out of -shares that need to be stored. At the end of the process, we'll have a table -that maps each share number to a server, and then we can begin the encode and -push phase, using the table to decide where each share should be sent. - -Most of the time, this will result in one share per server, which gives us -maximum reliability. If there are fewer writable servers than there are -unstored shares, we'll be forced to loop around, eventually giving multiple -shares to a single server. - -If we have to loop through the node list a second time, we accelerate the query -process, by asking each node to hold multiple shares on the second pass. In -most cases, this means we'll never send more than two queries to any given -node. - -If a server is unreachable, or has an error, or refuses to accept any of our -shares, we remove it from the permuted list, so we won't query it again for -this file. If a server already has shares for the file we're uploading, we add -that information to the share-to-server table. This lets us do less work for -files which have been uploaded once before, while making sure we still wind up -with as many shares as we desire. - -Before a file upload is called successful, it has to pass an upload health -check. For immutable files, we check to see that a condition called -'servers-of-happiness' is satisfied. When satisfied, 'servers-of-happiness' -assures us that enough pieces of the file are distributed across enough -servers on the grid to ensure that the availability of the file will not be -affected if a few of those servers later fail. For mutable files and -directories, we check to see that all of the encoded shares generated during -the upload process were successfully placed on the grid. This is a weaker -check than 'servers-of-happiness'; it does not consider any information about -how the encoded shares are placed on the grid, and cannot detect situations in -which all or a majority of the encoded shares generated during the upload -process reside on only one storage server. We hope to extend -'servers-of-happiness' to mutable files in a future release of Tahoe-LAFS. If, -at the end of the upload process, the appropriate upload health check fails, -the upload is considered a failure. - -The current defaults use k=3, servers_of_happiness=7, and N=10. N=10 means that -we'll try to place 10 shares. k=3 means that we need any three shares to -recover the file. servers_of_happiness=7 means that we'll consider an immutable -file upload to be successful if we can place shares on enough servers that -there are 7 different servers, the correct functioning of any k of which -guarantee the availability of the immutable file. - -N=10 and k=3 means there is a 3.3x expansion factor. On a small grid, you -should set N about equal to the number of storage servers in your grid; on a -large grid, you might set it to something smaller to avoid the overhead of -contacting every server to place a file. In either case, you should then set k -such that N/k reflects your desired availability goals. The best value for -servers_of_happiness will depend on how you use Tahoe-LAFS. In a friendnet with -a variable number of servers, it might make sense to set it to the smallest -number of servers that you expect to have online and accepting shares at any -given time. In a stable environment without much server churn, it may make -sense to set servers_of_happiness = N. - -When downloading a file, the current version just asks all known servers for -any shares they might have. Once it has received enough responses that it -knows where to find the needed k shares, it downloads at least the first -segment from those servers. This means that it tends to download shares from -the fastest servers. If some servers had more than one share, it will continue -sending "Do You Have Block" requests to other servers, so that it can download -subsequent segments from distinct servers (sorted by their DYHB round-trip -times), if possible. - - *future work* - - A future release will use the server selection algorithm to reduce the - number of queries that must be sent out. - - Other peer-node selection algorithms are possible. One earlier version - (known as "Tahoe 3") used the permutation to place the nodes around a large - ring, distributed the shares evenly around the same ring, then walked - clockwise from 0 with a basket. Each time it encountered a share, it put it - in the basket, each time it encountered a server, give it as many shares - from the basket as they'd accept. This reduced the number of queries - (usually to 1) for small grids (where N is larger than the number of - nodes), but resulted in extremely non-uniform share distribution, which - significantly hurt reliability (sometimes the permutation resulted in most - of the shares being dumped on a single node). - - Another algorithm (known as "denver airport"[2]) uses the permuted hash to - decide on an approximate target for each share, then sends lease requests - via Chord routing. The request includes the contact information of the - uploading node, and asks that the node which eventually accepts the lease - should contact the uploader directly. The shares are then transferred over - direct connections rather than through multiple Chord hops. Download uses - the same approach. This allows nodes to avoid maintaining a large number of - long-term connections, at the expense of complexity and latency. - - -== Swarming Download, Trickling Upload == - -Because the shares being downloaded are distributed across a large number of -nodes, the download process will pull from many of them at the same time. The -current encoding parameters require 3 shares to be retrieved for each -segment, which means that up to 3 nodes will be used simultaneously. For -larger networks, 8-of-22 encoding could be used, meaning 8 nodes can be used -simultaneously. This allows the download process to use the sum of the -available nodes' upload bandwidths, resulting in downloads that take full -advantage of the common 8x disparity between download and upload bandwith on -modern ADSL lines. - -On the other hand, uploads are hampered by the need to upload encoded shares -that are larger than the original data (3.3x larger with the current default -encoding parameters), through the slow end of the asymmetric connection. This -means that on a typical 8x ADSL line, uploading a file will take about 32 -times longer than downloading it again later. - -Smaller expansion ratios can reduce this upload penalty, at the expense of -reliability (see RELIABILITY, below). By using an "upload helper", this -penalty is eliminated: the client does a 1x upload of encrypted data to the -helper, then the helper performs encoding and pushes the shares to the -storage servers. This is an improvement if the helper has significantly -higher upload bandwidth than the client, so it makes the most sense for a -commercially-run grid for which all of the storage servers are in a colo -facility with high interconnect bandwidth. In this case, the helper is placed -in the same facility, so the helper-to-storage-server bandwidth is huge. - -See "helper.txt" for details about the upload helper. - - -== The Filesystem Layer == - -The "filesystem" layer is responsible for mapping human-meaningful pathnames -(directories and filenames) to pieces of data. The actual bytes inside these -files are referenced by capability, but the filesystem layer is where the -directory names, file names, and metadata are kept. - -The filesystem layer is a graph of directories. Each directory contains a -table of named children. These children are either other directories or -files. All children are referenced by their capability. - -A directory has two forms of capability: read-write caps and read-only caps. -The table of children inside the directory has a read-write and read-only -capability for each child. If you have a read-only capability for a given -directory, you will not be able to access the read-write capability of its -children. This results in "transitively read-only" directory access. - -By having two different capabilities, you can choose which you want to share -with someone else. If you create a new directory and share the read-write -capability for it with a friend, then you will both be able to modify its -contents. If instead you give them the read-only capability, then they will -*not* be able to modify the contents. Any capability that you receive can be -linked in to any directory that you can modify, so very powerful -shared+published directory structures can be built from these components. - -This structure enable individual users to have their own personal space, with -links to spaces that are shared with specific other users, and other spaces -that are globally visible. - - -== Leases, Refreshing, Garbage Collection == - -When a file or directory in the virtual filesystem is no longer referenced, -the space that its shares occupied on each storage server can be freed, -making room for other shares. Tahoe-LAFS uses a garbage collection ("GC") -mechanism to implement this space-reclamation process. Each share has one or -more "leases", which are managed by clients who want the file/directory to be -retained. The storage server accepts each share for a pre-defined period of -time, and is allowed to delete the share if all of the leases are cancelled -or allowed to expire. - -Garbage collection is not enabled by default: storage servers will not delete -shares without being explicitly configured to do so. When GC is enabled, -clients are responsible for renewing their leases on a periodic basis at -least frequently enough to prevent any of the leases from expiring before the -next renewal pass. - -See docs/garbage-collection.txt for further information, and how to configure -garbage collection. - - -== File Repairer == - -Shares may go away because the storage server hosting them has suffered a -failure: either temporary downtime (affecting availability of the file), or a -permanent data loss (affecting the preservation of the file). Hard drives -crash, power supplies explode, coffee spills, and asteroids strike. The goal -of a robust distributed filesystem is to survive these setbacks. - -To work against this slow, continual loss of shares, a File Checker is used -to periodically count the number of shares still available for any given -file. A more extensive form of checking known as the File Verifier can -download the ciphertext of the target file and perform integrity checks -(using strong hashes) to make sure the data is stil intact. When the file is -found to have decayed below some threshold, the File Repairer can be used to -regenerate and re-upload the missing shares. These processes are conceptually -distinct (the repairer is only run if the checker/verifier decides it is -necessary), but in practice they will be closely related, and may run in the -same process. - -The repairer process does not get the full capability of the file to be -maintained: it merely gets the "repairer capability" subset, which does not -include the decryption key. The File Verifier uses that data to find out -which nodes ought to hold shares for this file, and to see if those nodes are -still around and willing to provide the data. If the file is not healthy -enough, the File Repairer is invoked to download the ciphertext, regenerate -any missing shares, and upload them to new nodes. The goal of the File -Repairer is to finish up with a full set of "N" shares. - -There are a number of engineering issues to be resolved here. The bandwidth, -disk IO, and CPU time consumed by the verification/repair process must be -balanced against the robustness that it provides to the grid. The nodes -involved in repair will have very different access patterns than normal -nodes, such that these processes may need to be run on hosts with more memory -or network connectivity than usual. The frequency of repair will directly -affect the resources consumed. In some cases, verification of multiple files -can be performed at the same time, and repair of files can be delegated off -to other nodes. - - *future work* - - Currently there are two modes of checking on the health of your file: - "Checker" simply asks storage servers which shares they have and does - nothing to try to verify that they aren't lying. "Verifier" downloads and - cryptographically verifies every bit of every share of the file from every - server, which costs a lot of network and CPU. A future improvement would be - to make a random-sampling verifier which downloads and cryptographically - verifies only a few randomly-chosen blocks from each server. This would - require much less network and CPU but it could make it extremely unlikely - that any sort of corruption -- even malicious corruption intended to evade - detection -- would evade detection. This would be an instance of a - cryptographic notion called "Proof of Retrievability". Note that to implement - this requires no change to the server or to the cryptographic data structure - -- with the current data structure and the current protocol it is up to the - client which blocks they choose to download, so this would be solely a change - in client behavior. - - -== Security == - -The design goal for this project is that an attacker may be able to deny -service (i.e. prevent you from recovering a file that was uploaded earlier) -but can accomplish none of the following three attacks: - - 1) violate confidentiality: the attacker gets to view data to which you have - not granted them access - 2) violate integrity: the attacker convinces you that the wrong data is - actually the data you were intending to retrieve - 3) violate unforgeability: the attacker gets to modify a mutable file or - directory (either the pathnames or the file contents) to which you have - not given them write permission - -Integrity (the promise that the downloaded data will match the uploaded data) -is provided by the hashes embedded in the capability (for immutable files) or -the digital signature (for mutable files). Confidentiality (the promise that -the data is only readable by people with the capability) is provided by the -encryption key embedded in the capability (for both immutable and mutable -files). Data availability (the hope that data which has been uploaded in the -past will be downloadable in the future) is provided by the grid, which -distributes failures in a way that reduces the correlation between individual -node failure and overall file recovery failure, and by the erasure-coding -technique used to generate shares. - -Many of these security properties depend upon the usual cryptographic -assumptions: the resistance of AES and RSA to attack, the resistance of -SHA-256 to collision attacks and pre-image attacks, and upon the proximity of -2^-128 and 2^-256 to zero. A break in AES would allow a confidentiality -violation, a collision break in SHA-256 would allow a consistency violation, -and a break in RSA would allow a mutability violation. - -There is no attempt made to provide anonymity, neither of the origin of a -piece of data nor the identity of the subsequent downloaders. In general, -anyone who already knows the contents of a file will be in a strong position -to determine who else is uploading or downloading it. Also, it is quite easy -for a sufficiently large coalition of nodes to correlate the set of nodes who -are all uploading or downloading the same file, even if the attacker does not -know the contents of the file in question. - -Also note that the file size and (when convergence is being used) a keyed -hash of the plaintext are not protected. Many people can determine the size -of the file you are accessing, and if they already know the contents of a -given file, they will be able to determine that you are uploading or -downloading the same one. - -The capability-based security model is used throughout this project. -Directory operations are expressed in terms of distinct read- and write- -capabilities. Knowing the read-capability of a file is equivalent to the -ability to read the corresponding data. The capability to validate the -correctness of a file is strictly weaker than the read-capability (possession -of read-capability automatically grants you possession of -validate-capability, but not vice versa). These capabilities may be expressly -delegated (irrevocably) by simply transferring the relevant secrets. - -The application layer can provide whatever access model is desired, built on -top of this capability access model. The first big user of this system so far -is allmydata.com. The allmydata.com access model currently works like a -normal web site, using username and password to give a user access to her -"virtual drive". In addition, allmydata.com users can share individual files -(using a file sharing interface built on top of the immutable file read -capabilities). - - -== Reliability == - -File encoding and peer-node selection parameters can be adjusted to achieve -different goals. Each choice results in a number of properties; there are -many tradeoffs. - -First, some terms: the erasure-coding algorithm is described as K-out-of-N -(for this release, the default values are K=3 and N=10). Each grid will have -some number of nodes; this number will rise and fall over time as nodes join, -drop out, come back, and leave forever. Files are of various sizes, some are -popular, others are unpopular. Nodes have various capacities, variable -upload/download bandwidths, and network latency. Most of the mathematical -models that look at node failure assume some average (and independent) -probability 'P' of a given node being available: this can be high (servers -tend to be online and available >90% of the time) or low (laptops tend to be -turned on for an hour then disappear for several days). Files are encoded in -segments of a given maximum size, which affects memory usage. - -The ratio of N/K is the "expansion factor". Higher expansion factors improve -reliability very quickly (the binomial distribution curve is very sharp), but -consumes much more grid capacity. When P=50%, the absolute value of K affects -the granularity of the binomial curve (1-out-of-2 is much worse than -50-out-of-100), but high values asymptotically approach a constant (i.e. -500-of-1000 is not much better than 50-of-100). When P is high and the -expansion factor is held at a constant, higher values of K and N give much -better reliability (for P=99%, 50-out-of-100 is much much better than -5-of-10, roughly 10^50 times better), because there are more shares that can -be lost without losing the file. - -Likewise, the total number of nodes in the network affects the same -granularity: having only one node means a single point of failure, no matter -how many copies of the file you make. Independent nodes (with uncorrelated -failures) are necessary to hit the mathematical ideals: if you have 100 nodes -but they are all in the same office building, then a single power failure -will take out all of them at once. The "Sybil Attack" is where a single -attacker convinces you that they are actually multiple servers, so that you -think you are using a large number of independent nodes, but in fact you have -a single point of failure (where the attacker turns off all their machines at -once). Large grids, with lots of truly independent nodes, will enable the use -of lower expansion factors to achieve the same reliability, but will increase -overhead because each node needs to know something about every other, and the -rate at which nodes come and go will be higher (requiring network maintenance -traffic). Also, the File Repairer work will increase with larger grids, -although then the job can be distributed out to more nodes. - -Higher values of N increase overhead: more shares means more Merkle hashes -that must be included with the data, and more nodes to contact to retrieve -the shares. Smaller segment sizes reduce memory usage (since each segment -must be held in memory while erasure coding runs) and improves "alacrity" -(since downloading can validate a smaller piece of data faster, delivering it -to the target sooner), but also increase overhead (because more blocks means -more Merkle hashes to validate them). - -In general, small private grids should work well, but the participants will -have to decide between storage overhead and reliability. Large stable grids -will be able to reduce the expansion factor down to a bare minimum while -still retaining high reliability, but large unstable grids (where nodes are -coming and going very quickly) may require more repair/verification bandwidth -than actual upload/download traffic. - -Tahoe-LAFS nodes that run a webserver have a page dedicated to provisioning -decisions: this tool may help you evaluate different expansion factors and -view the disk consumption of each. It is also acquiring some sections with -availability/reliability numbers, as well as preliminary cost analysis data. -This tool will continue to evolve as our analysis improves. - ------------------------------- - -[1]: http://en.wikipedia.org/wiki/Zooko%27s_triangle - -[2]: all of these names are derived from the location where they were - concocted, in this case in a car ride from Boulder to DEN. To be - precise, "Tahoe 1" was an unworkable scheme in which everyone who holds - shares for a given file would form a sort of cabal which kept track of - all the others, "Tahoe 2" is the first-100-nodes in the permuted hash - described in this document, and "Tahoe 3" (or perhaps "Potrero hill 1") - was the abandoned ring-with-many-hands approach. - diff --git a/docs/backdoors.rst b/docs/backdoors.rst new file mode 100644 index 00000000..ad12610e --- /dev/null +++ b/docs/backdoors.rst @@ -0,0 +1,51 @@ +====================== +Statement on Backdoors +====================== + +October 5, 2010 + +The New York Times has recently reported that the current U.S. administration +is proposing a bill that would apparently, if passed, require communication +systems to facilitate government wiretapping and access to encrypted data: + + http://www.nytimes.com/2010/09/27/us/27wiretap.html (login required; username/password pairs available at http://www.bugmenot.com/view/nytimes.com). + +Commentary by the Electronic Frontier Foundation +(https://www.eff.org/deeplinks/2010/09/government-seeks ), Peter Suderman / +Reason (http://reason.com/blog/2010/09/27/obama-administration-frustrate ), +Julian Sanchez / Cato Institute +(http://www.cato-at-liberty.org/designing-an-insecure-internet/ ). + +The core Tahoe developers promise never to change Tahoe-LAFS to facilitate +government access to data stored or transmitted by it. Even if it were +desirable to facilitate such access—which it is not—we believe it would not be +technically feasible to do so without severely compromising Tahoe-LAFS' +security against other attackers. There have been many examples in which +backdoors intended for use by government have introduced vulnerabilities +exploitable by other parties (a notable example being the Greek cellphone +eavesdropping scandal in 2004/5). RFCs 1984 and 2804 elaborate on the +security case against such backdoors. + +Note that since Tahoe-LAFS is open-source software, forks by people other than +the current core developers are possible. In that event, we would try to +persuade any such forks to adopt a similar policy. + +The following Tahoe-LAFS developers agree with this statement: + +David-Sarah Hopwood + +Zooko Wilcox-O'Hearn + +Brian Warner + +Kevan Carstensen + +Frédéric Marti + +Jack Lloyd + +François Deppierraz + +Yu Xue + +Marc Tooley diff --git a/docs/backdoors.txt b/docs/backdoors.txt deleted file mode 100644 index c08e26c9..00000000 --- a/docs/backdoors.txt +++ /dev/null @@ -1,25 +0,0 @@ -Statement on Backdoors - -October 5, 2010 - -The New York Times has recently reported that the current U.S. administration is proposing a bill that would apparently, if passed, require communication systems to facilitate government wiretapping and access to encrypted data: - - http://www.nytimes.com/2010/09/27/us/27wiretap.html (login required; username/password pairs available at http://www.bugmenot.com/view/nytimes.com). - -Commentary by the Electronic Frontier Foundation (https://www.eff.org/deeplinks/2010/09/government-seeks ), Peter Suderman / Reason (http://reason.com/blog/2010/09/27/obama-administration-frustrate ), Julian Sanchez / Cato Institute (http://www.cato-at-liberty.org/designing-an-insecure-internet/ ). - -The core Tahoe developers promise never to change Tahoe-LAFS to facilitate government access to data stored or transmitted by it. Even if it were desirable to facilitate such access—which it is not—we believe it would not be technically feasible to do so without severely compromising Tahoe-LAFS' security against other attackers. There have been many examples in which backdoors intended for use by government have introduced vulnerabilities exploitable by other parties (a notable example being the Greek cellphone eavesdropping scandal in 2004/5). RFCs 1984 and 2804 elaborate on the security case against such backdoors. - -Note that since Tahoe-LAFS is open-source software, forks by people other than the current core developers are possible. In that event, we would try to persuade any such forks to adopt a similar policy. - -The following Tahoe-LAFS developers agree with this statement: - -David-Sarah Hopwood -Zooko Wilcox-O'Hearn -Brian Warner -Kevan Carstensen -Frédéric Marti -Jack Lloyd -François Deppierraz -Yu Xue -Marc Tooley diff --git a/docs/backupdb.rst b/docs/backupdb.rst new file mode 100644 index 00000000..b91a8e4d --- /dev/null +++ b/docs/backupdb.rst @@ -0,0 +1,190 @@ +================== +The Tahoe BackupDB +================== + +1. `Overview`_ +2. `Schema`_ +3. `Upload Operation`_ +4. `Directory Operations`_ + +Overview +======== +To speed up backup operations, Tahoe maintains a small database known as the +"backupdb". This is used to avoid re-uploading files which have already been +uploaded recently. + +This database lives in ~/.tahoe/private/backupdb.sqlite, and is a SQLite +single-file database. It is used by the "tahoe backup" command. In the future, +it will also be used by "tahoe mirror", and by "tahoe cp" when the +--use-backupdb option is included. + +The purpose of this database is twofold: to manage the file-to-cap +translation (the "upload" step) and the directory-to-cap translation (the +"mkdir-immutable" step). + +The overall goal of optimizing backup is to reduce the work required when the +source disk has not changed (much) since the last backup. In the ideal case, +running "tahoe backup" twice in a row, with no intervening changes to the +disk, will not require any network traffic. Minimal changes to the source +disk should result in minimal traffic. + +This database is optional. If it is deleted, the worst effect is that a +subsequent backup operation may use more effort (network bandwidth, CPU +cycles, and disk IO) than it would have without the backupdb. + +The database uses sqlite3, which is included as part of the standard python +library with python2.5 and later. For python2.4, Tahoe will try to install the +"pysqlite" package at build-time, but this will succeed only if sqlite3 with +development headers is already installed. On Debian and Debian derivatives +you can install the "python-pysqlite2" package (which, despite the name, +actually provides sqlite3 rather than sqlite2), but on old distributions such +as Debian etch (4.0 "oldstable") or Ubuntu Edgy (6.10) the "python-pysqlite2" +package won't work, but the "sqlite3-dev" package will. + +Schema +====== + +The database contains the following tables:: + + CREATE TABLE version + ( + version integer # contains one row, set to 1 + ); + + CREATE TABLE local_files + ( + path varchar(1024), PRIMARY KEY -- index, this is os.path.abspath(fn) + size integer, -- os.stat(fn)[stat.ST_SIZE] + mtime number, -- os.stat(fn)[stat.ST_MTIME] + ctime number, -- os.stat(fn)[stat.ST_CTIME] + fileid integer + ); + + CREATE TABLE caps + ( + fileid integer PRIMARY KEY AUTOINCREMENT, + filecap varchar(256) UNIQUE -- URI:CHK:... + ); + + CREATE TABLE last_upload + ( + fileid INTEGER PRIMARY KEY, + last_uploaded TIMESTAMP, + last_checked TIMESTAMP + ); + + CREATE TABLE directories + ( + dirhash varchar(256) PRIMARY KEY, + dircap varchar(256), + last_uploaded TIMESTAMP, + last_checked TIMESTAMP + ); + +Upload Operation +================ + +The upload process starts with a pathname (like ~/.emacs) and wants to end up +with a file-cap (like URI:CHK:...). + +The first step is to convert the path to an absolute form +(/home/warner/.emacs) and do a lookup in the local_files table. If the path +is not present in this table, the file must be uploaded. The upload process +is: + +1. record the file's size, creation time, and modification time + +2. upload the file into the grid, obtaining an immutable file read-cap + +3. add an entry to the 'caps' table, with the read-cap, to get a fileid + +4. add an entry to the 'last_upload' table, with the current time + +5. add an entry to the 'local_files' table, with the fileid, the path, + and the local file's size/ctime/mtime + +If the path *is* present in 'local_files', the easy-to-compute identifying +information is compared: file size and ctime/mtime. If these differ, the file +must be uploaded. The row is removed from the local_files table, and the +upload process above is followed. + +If the path is present but ctime or mtime differs, the file may have changed. +If the size differs, then the file has certainly changed. At this point, a +future version of the "backup" command might hash the file and look for a +match in an as-yet-defined table, in the hopes that the file has simply been +moved from somewhere else on the disk. This enhancement requires changes to +the Tahoe upload API before it can be significantly more efficient than +simply handing the file to Tahoe and relying upon the normal convergence to +notice the similarity. + +If ctime, mtime, or size is different, the client will upload the file, as +above. + +If these identifiers are the same, the client will assume that the file is +unchanged (unless the --ignore-timestamps option is provided, in which case +the client always re-uploads the file), and it may be allowed to skip the +upload. For safety, however, we require the client periodically perform a +filecheck on these probably-already-uploaded files, and re-upload anything +that doesn't look healthy. The client looks the fileid up in the +'last_checked' table, to see how long it has been since the file was last +checked. + +A "random early check" algorithm should be used, in which a check is +performed with a probability that increases with the age of the previous +results. E.g. files that were last checked within a month are not checked, +files that were checked 5 weeks ago are re-checked with 25% probability, 6 +weeks with 50%, more than 8 weeks are always checked. This reduces the +"thundering herd" of filechecks-on-everything that would otherwise result +when a backup operation is run one month after the original backup. If a +filecheck reveals the file is not healthy, it is re-uploaded. + +If the filecheck shows the file is healthy, or if the filecheck was skipped, +the client gets to skip the upload, and uses the previous filecap (from the +'caps' table) to add to the parent directory. + +If a new file is uploaded, a new entry is put in the 'caps' and 'last_upload' +table, and an entry is made in the 'local_files' table to reflect the mapping +from local disk pathname to uploaded filecap. If an old file is re-uploaded, +the 'last_upload' entry is updated with the new timestamps. If an old file is +checked and found healthy, the 'last_upload' entry is updated. + +Relying upon timestamps is a compromise between efficiency and safety: a file +which is modified without changing the timestamp or size will be treated as +unmodified, and the "tahoe backup" command will not copy the new contents +into the grid. The --no-timestamps can be used to disable this optimization, +forcing every byte of the file to be hashed and encoded. + +Directory Operations +==================== + +Once the contents of a directory are known (a filecap for each file, and a +dircap for each directory), the backup process must find or create a tahoe +directory node with the same contents. The contents are hashed, and the hash +is queried in the 'directories' table. If found, the last-checked timestamp +is used to perform the same random-early-check algorithm described for files +above, but no new upload is performed. Since "tahoe backup" creates immutable +directories, it is perfectly safe to re-use a directory from a previous +backup. + +If not found, the webapi "mkdir-immutable" operation is used to create a new +directory, and an entry is stored in the table. + +The comparison operation ignores timestamps and metadata, and pays attention +solely to the file names and contents. + +By using a directory-contents hash, the "tahoe backup" command is able to +re-use directories from other places in the backed up data, or from old +backups. This means that renaming a directory and moving a subdirectory to a +new parent both count as "minor changes" and will result in minimal Tahoe +operations and subsequent network traffic (new directories will be created +for the modified directory and all of its ancestors). It also means that you +can perform a backup ("#1"), delete a file or directory, perform a backup +("#2"), restore it, and then the next backup ("#3") will re-use the +directories from backup #1. + +The best case is a null backup, in which nothing has changed. This will +result in minimal network bandwidth: one directory read and two modifies. The +Archives/ directory must be read to locate the latest backup, and must be +modified to add a new snapshot, and the Latest/ directory will be updated to +point to that same snapshot. + diff --git a/docs/backupdb.txt b/docs/backupdb.txt deleted file mode 100644 index 7e4842fa..00000000 --- a/docs/backupdb.txt +++ /dev/null @@ -1,175 +0,0 @@ -= The Tahoe BackupDB = - -== Overview == -To speed up backup operations, Tahoe maintains a small database known as the -"backupdb". This is used to avoid re-uploading files which have already been -uploaded recently. - -This database lives in ~/.tahoe/private/backupdb.sqlite, and is a SQLite -single-file database. It is used by the "tahoe backup" command. In the future, -it will also be used by "tahoe mirror", and by "tahoe cp" when the ---use-backupdb option is included. - -The purpose of this database is twofold: to manage the file-to-cap -translation (the "upload" step) and the directory-to-cap translation (the -"mkdir-immutable" step). - -The overall goal of optimizing backup is to reduce the work required when the -source disk has not changed (much) since the last backup. In the ideal case, -running "tahoe backup" twice in a row, with no intervening changes to the -disk, will not require any network traffic. Minimal changes to the source -disk should result in minimal traffic. - -This database is optional. If it is deleted, the worst effect is that a -subsequent backup operation may use more effort (network bandwidth, CPU -cycles, and disk IO) than it would have without the backupdb. - -The database uses sqlite3, which is included as part of the standard python -library with python2.5 and later. For python2.4, Tahoe will try to install the -"pysqlite" package at build-time, but this will succeed only if sqlite3 with -development headers is already installed. On Debian and Debian derivatives -you can install the "python-pysqlite2" package (which, despite the name, -actually provides sqlite3 rather than sqlite2), but on old distributions such -as Debian etch (4.0 "oldstable") or Ubuntu Edgy (6.10) the "python-pysqlite2" -package won't work, but the "sqlite3-dev" package will. - -== Schema == - -The database contains the following tables: - -CREATE TABLE version -( - version integer # contains one row, set to 1 -); - -CREATE TABLE local_files -( - path varchar(1024), PRIMARY KEY -- index, this is os.path.abspath(fn) - size integer, -- os.stat(fn)[stat.ST_SIZE] - mtime number, -- os.stat(fn)[stat.ST_MTIME] - ctime number, -- os.stat(fn)[stat.ST_CTIME] - fileid integer -); - -CREATE TABLE caps -( - fileid integer PRIMARY KEY AUTOINCREMENT, - filecap varchar(256) UNIQUE -- URI:CHK:... -); - -CREATE TABLE last_upload -( - fileid INTEGER PRIMARY KEY, - last_uploaded TIMESTAMP, - last_checked TIMESTAMP -); - -CREATE TABLE directories -( - dirhash varchar(256) PRIMARY KEY, - dircap varchar(256), - last_uploaded TIMESTAMP, - last_checked TIMESTAMP -); - -== Upload Operation == - -The upload process starts with a pathname (like ~/.emacs) and wants to end up -with a file-cap (like URI:CHK:...). - -The first step is to convert the path to an absolute form -(/home/warner/.emacs) and do a lookup in the local_files table. If the path -is not present in this table, the file must be uploaded. The upload process -is: - - 1. record the file's size, creation time, and modification time - 2. upload the file into the grid, obtaining an immutable file read-cap - 3. add an entry to the 'caps' table, with the read-cap, to get a fileid - 4. add an entry to the 'last_upload' table, with the current time - 5. add an entry to the 'local_files' table, with the fileid, the path, - and the local file's size/ctime/mtime - -If the path *is* present in 'local_files', the easy-to-compute identifying -information is compared: file size and ctime/mtime. If these differ, the file -must be uploaded. The row is removed from the local_files table, and the -upload process above is followed. - -If the path is present but ctime or mtime differs, the file may have changed. -If the size differs, then the file has certainly changed. At this point, a -future version of the "backup" command might hash the file and look for a -match in an as-yet-defined table, in the hopes that the file has simply been -moved from somewhere else on the disk. This enhancement requires changes to -the Tahoe upload API before it can be significantly more efficient than -simply handing the file to Tahoe and relying upon the normal convergence to -notice the similarity. - -If ctime, mtime, or size is different, the client will upload the file, as -above. - -If these identifiers are the same, the client will assume that the file is -unchanged (unless the --ignore-timestamps option is provided, in which case -the client always re-uploads the file), and it may be allowed to skip the -upload. For safety, however, we require the client periodically perform a -filecheck on these probably-already-uploaded files, and re-upload anything -that doesn't look healthy. The client looks the fileid up in the -'last_checked' table, to see how long it has been since the file was last -checked. - -A "random early check" algorithm should be used, in which a check is -performed with a probability that increases with the age of the previous -results. E.g. files that were last checked within a month are not checked, -files that were checked 5 weeks ago are re-checked with 25% probability, 6 -weeks with 50%, more than 8 weeks are always checked. This reduces the -"thundering herd" of filechecks-on-everything that would otherwise result -when a backup operation is run one month after the original backup. If a -filecheck reveals the file is not healthy, it is re-uploaded. - -If the filecheck shows the file is healthy, or if the filecheck was skipped, -the client gets to skip the upload, and uses the previous filecap (from the -'caps' table) to add to the parent directory. - -If a new file is uploaded, a new entry is put in the 'caps' and 'last_upload' -table, and an entry is made in the 'local_files' table to reflect the mapping -from local disk pathname to uploaded filecap. If an old file is re-uploaded, -the 'last_upload' entry is updated with the new timestamps. If an old file is -checked and found healthy, the 'last_upload' entry is updated. - -Relying upon timestamps is a compromise between efficiency and safety: a file -which is modified without changing the timestamp or size will be treated as -unmodified, and the "tahoe backup" command will not copy the new contents -into the grid. The --no-timestamps can be used to disable this optimization, -forcing every byte of the file to be hashed and encoded. - -== Directory Operations == - -Once the contents of a directory are known (a filecap for each file, and a -dircap for each directory), the backup process must find or create a tahoe -directory node with the same contents. The contents are hashed, and the hash -is queried in the 'directories' table. If found, the last-checked timestamp -is used to perform the same random-early-check algorithm described for files -above, but no new upload is performed. Since "tahoe backup" creates immutable -directories, it is perfectly safe to re-use a directory from a previous -backup. - -If not found, the webapi "mkdir-immutable" operation is used to create a new -directory, and an entry is stored in the table. - -The comparison operation ignores timestamps and metadata, and pays attention -solely to the file names and contents. - -By using a directory-contents hash, the "tahoe backup" command is able to -re-use directories from other places in the backed up data, or from old -backups. This means that renaming a directory and moving a subdirectory to a -new parent both count as "minor changes" and will result in minimal Tahoe -operations and subsequent network traffic (new directories will be created -for the modified directory and all of its ancestors). It also means that you -can perform a backup ("#1"), delete a file or directory, perform a backup -("#2"), restore it, and then the next backup ("#3") will re-use the -directories from backup #1. - -The best case is a null backup, in which nothing has changed. This will -result in minimal network bandwidth: one directory read and two modifies. The -Archives/ directory must be read to locate the latest backup, and must be -modified to add a new snapshot, and the Latest/ directory will be updated to -point to that same snapshot. - diff --git a/docs/configuration.rst b/docs/configuration.rst new file mode 100644 index 00000000..1e7fcb95 --- /dev/null +++ b/docs/configuration.rst @@ -0,0 +1,572 @@ +======================== +Configuring a Tahoe node +======================== + +1. `Overall Node Configuration`_ +2. `Client Configuration`_ +3. `Storage Server Configuration`_ +4. `Running A Helper`_ +5. `Running An Introducer`_ +6. `Other Files in BASEDIR`_ +7. `Other files`_ +8. `Backwards Compatibility Files`_ +9. `Example`_ + +A Tahoe node is configured by writing to files in its base directory. These +files are read by the node when it starts, so each time you change them, you +need to restart the node. + +The node also writes state to its base directory, so it will create files on +its own. + +This document contains a complete list of the config files that are examined +by the client node, as well as the state files that you'll observe in its +base directory. + +The main file is named 'tahoe.cfg', which is an ".INI"-style configuration +file (parsed by the Python stdlib 'ConfigParser' module: "[name]" section +markers, lines with "key.subkey: value", rfc822-style continuations). There +are other files that contain information which does not easily fit into this +format. The 'tahoe create-node' or 'tahoe create-client' command will create +an initial tahoe.cfg file for you. After creation, the node will never modify +the 'tahoe.cfg' file: all persistent state is put in other files. + +The item descriptions below use the following types: + +boolean + one of (True, yes, on, 1, False, off, no, 0), case-insensitive + +strports string + a Twisted listening-port specification string, like "tcp:80" + or "tcp:3456:interface=127.0.0.1". For a full description of + the format, see + http://twistedmatrix.com/documents/current/api/twisted.application.strports.html + +FURL string + a Foolscap endpoint identifier, like + pb://soklj4y7eok5c3xkmjeqpw@192.168.69.247:44801/eqpwqtzm + + +Overall Node Configuration +========================== + +This section controls the network behavior of the node overall: which ports +and IP addresses are used, when connections are timed out, etc. This +configuration is independent of the services that the node is offering: the +same controls are used for client and introducer nodes. + +If your node is behind a firewall or NAT device and you want other clients to +connect to it, you'll need to open a port in the firewall or NAT, and specify +that port number in the tub.port option. If behind a NAT, you *may* need to +set the tub.location option described below. + +:: + + [node] + + nickname = (UTF-8 string, optional) + + This value will be displayed in management tools as this node's "nickname". + If not provided, the nickname will be set to "". This string + shall be a UTF-8 encoded unicode string. + + web.port = (strports string, optional) + + This controls where the node's webserver should listen, providing filesystem + access and node status as defined in webapi.txt . This file contains a + Twisted "strports" specification such as "3456" or + "tcp:3456:interface=127.0.0.1". The 'tahoe create-node' or 'tahoe create-client' + commands set the web.port to "tcp:3456:interface=127.0.0.1" by default; this + is overridable by the "--webport" option. You can make it use SSL by writing + "ssl:3456:privateKey=mykey.pem:certKey=cert.pem" instead. + + If this is not provided, the node will not run a web server. + + web.static = (string, optional) + + This controls where the /static portion of the URL space is served. The + value is a directory name (~username is allowed, and non-absolute names are + interpreted relative to the node's basedir) which can contain HTML and other + files. This can be used to serve a javascript-based frontend to the Tahoe + node, or other services. + + The default value is "public_html", which will serve $BASEDIR/public_html . + With the default settings, http://127.0.0.1:3456/static/foo.html will serve + the contents of $BASEDIR/public_html/foo.html . + + tub.port = (integer, optional) + + This controls which port the node uses to accept Foolscap connections from + other nodes. If not provided, the node will ask the kernel for any available + port. The port will be written to a separate file (named client.port or + introducer.port), so that subsequent runs will re-use the same port. + + tub.location = (string, optional) + + In addition to running as a client, each Tahoe node also runs as a server, + listening for connections from other Tahoe clients. The node announces its + location by publishing a "FURL" (a string with some connection hints) to the + Introducer. The string it publishes can be found in + $BASEDIR/private/storage.furl . The "tub.location" configuration controls + what location is published in this announcement. + + If you don't provide tub.location, the node will try to figure out a useful + one by itself, by using tools like 'ifconfig' to determine the set of IP + addresses on which it can be reached from nodes both near and far. It will + also include the TCP port number on which it is listening (either the one + specified by tub.port, or whichever port was assigned by the kernel when + tub.port is left unspecified). + + You might want to override this value if your node lives behind a firewall + that is doing inbound port forwarding, or if you are using other proxies + such that the local IP address or port number is not the same one that + remote clients should use to connect. You might also want to control this + when using a Tor proxy to avoid revealing your actual IP address through the + Introducer announcement. + + The value is a comma-separated string of host:port location hints, like + this: + + 123.45.67.89:8098,tahoe.example.com:8098,127.0.0.1:8098 + + A few examples: + + Emulate default behavior, assuming your host has IP address 123.45.67.89 + and the kernel-allocated port number was 8098: + + tub.port = 8098 + tub.location = 123.45.67.89:8098,127.0.0.1:8098 + + Use a DNS name so you can change the IP address more easily: + + tub.port = 8098 + tub.location = tahoe.example.com:8098 + + Run a node behind a firewall (which has an external IP address) that has + been configured to forward port 7912 to our internal node's port 8098: + + tub.port = 8098 + tub.location = external-firewall.example.com:7912 + + Run a node behind a Tor proxy (perhaps via torsocks), in client-only mode + (i.e. we can make outbound connections, but other nodes will not be able to + connect to us). The literal 'unreachable.example.org' will not resolve, but + will serve as a reminder to human observers that this node cannot be + reached. "Don't call us.. we'll call you": + + tub.port = 8098 + tub.location = unreachable.example.org:0 + + Run a node behind a Tor proxy, and make the server available as a Tor + "hidden service". (this assumes that other clients are running their node + with torsocks, such that they are prepared to connect to a .onion address). + The hidden service must first be configured in Tor, by giving it a local + port number and then obtaining a .onion name, using something in the torrc + file like: + + HiddenServiceDir /var/lib/tor/hidden_services/tahoe + HiddenServicePort 29212 127.0.0.1:8098 + + once Tor is restarted, the .onion hostname will be in + /var/lib/tor/hidden_services/tahoe/hostname . Then set up your tahoe.cfg + like: + + tub.port = 8098 + tub.location = ualhejtq2p7ohfbb.onion:29212 + + Most users will not need to set tub.location . + + Note that the old 'advertised_ip_addresses' file from earlier releases is no + longer supported. Tahoe 1.3.0 and later will ignore this file. + + log_gatherer.furl = (FURL, optional) + + If provided, this contains a single FURL string which is used to contact a + 'log gatherer', which will be granted access to the logport. This can be + used by centralized storage meshes to gather operational logs in a single + place. Note that when an old-style BASEDIR/log_gatherer.furl file exists + (see 'Backwards Compatibility Files', below), both are used. (for most other + items, the separate config file overrides the entry in tahoe.cfg) + + timeout.keepalive = (integer in seconds, optional) + timeout.disconnect = (integer in seconds, optional) + + If timeout.keepalive is provided, it is treated as an integral number of + seconds, and sets the Foolscap "keepalive timer" to that value. For each + connection to another node, if nothing has been heard for a while, we will + attempt to provoke the other end into saying something. The duration of + silence that passes before sending the PING will be between KT and 2*KT. + This is mainly intended to keep NAT boxes from expiring idle TCP sessions, + but also gives TCP's long-duration keepalive/disconnect timers some traffic + to work with. The default value is 240 (i.e. 4 minutes). + + If timeout.disconnect is provided, this is treated as an integral number of + seconds, and sets the Foolscap "disconnect timer" to that value. For each + connection to another node, if nothing has been heard for a while, we will + drop the connection. The duration of silence that passes before dropping the + connection will be between DT-2*KT and 2*DT+2*KT (please see ticket #521 for + more details). If we are sending a large amount of data to the other end + (which takes more than DT-2*KT to deliver), we might incorrectly drop the + connection. The default behavior (when this value is not provided) is to + disable the disconnect timer. + + See ticket #521 for a discussion of how to pick these timeout values. Using + 30 minutes means we'll disconnect after 22 to 68 minutes of inactivity. + Receiving data will reset this timeout, however if we have more than 22min + of data in the outbound queue (such as 800kB in two pipelined segments of 10 + shares each) and the far end has no need to contact us, our ping might be + delayed, so we may disconnect them by accident. + + ssh.port = (strports string, optional) + ssh.authorized_keys_file = (filename, optional) + + This enables an SSH-based interactive Python shell, which can be used to + inspect the internal state of the node, for debugging. To cause the node to + accept SSH connections on port 8022 from the same keys as the rest of your + account, use: + + [tub] + ssh.port = 8022 + ssh.authorized_keys_file = ~/.ssh/authorized_keys + + tempdir = (string, optional) + + This specifies a temporary directory for the webapi server to use, for + holding large files while they are being uploaded. If a webapi client + attempts to upload a 10GB file, this tempdir will need to have at least 10GB + available for the upload to complete. + + The default value is the "tmp" directory in the node's base directory (i.e. + $NODEDIR/tmp), but it can be placed elsewhere. This directory is used for + files that usually (on a unix system) go into /tmp . The string will be + interpreted relative to the node's base directory. + +Client Configuration +==================== + +:: + + [client] + introducer.furl = (FURL string, mandatory) + + This FURL tells the client how to connect to the introducer. Each Tahoe grid + is defined by an introducer. The introducer's furl is created by the + introducer node and written into its base directory when it starts, + whereupon it should be published to everyone who wishes to attach a client + to that grid + + helper.furl = (FURL string, optional) + + If provided, the node will attempt to connect to and use the given helper + for uploads. See docs/helper.txt for details. + + key_generator.furl = (FURL string, optional) + + If provided, the node will attempt to connect to and use the given + key-generator service, using RSA keys from the external process rather than + generating its own. + + stats_gatherer.furl = (FURL string, optional) + + If provided, the node will connect to the given stats gatherer and provide + it with operational statistics. + + shares.needed = (int, optional) aka "k", default 3 + shares.total = (int, optional) aka "N", N >= k, default 10 + shares.happy = (int, optional) 1 <= happy <= N, default 7 + + These three values set the default encoding parameters. Each time a new file + is uploaded, erasure-coding is used to break the ciphertext into separate + pieces. There will be "N" (i.e. shares.total) pieces created, and the file + will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved. + The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10). + Setting k to 1 is equivalent to simple replication (uploading N copies of + the file). + + These values control the tradeoff between storage overhead, performance, and + reliability. To a first approximation, a 1MB file will use (1MB*N/k) of + backend storage space (the actual value will be a bit more, because of other + forms of overhead). Up to N-k shares can be lost before the file becomes + unrecoverable, so assuming there are at least N servers, up to N-k servers + can be offline without losing the file. So large N/k ratios are more + reliable, and small N/k ratios use less disk space. Clearly, k must never be + smaller than N. + + Large values of N will slow down upload operations slightly, since more + servers must be involved, and will slightly increase storage overhead due to + the hash trees that are created. Large values of k will cause downloads to + be marginally slower, because more servers must be involved. N cannot be + larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe + uses. + + shares.happy allows you control over the distribution of your immutable file. + For a successful upload, shares are guaranteed to be initially placed on + at least 'shares.happy' distinct servers, the correct functioning of any + k of which is sufficient to guarantee the availability of the uploaded file. + This value should not be larger than the number of servers on your grid. + + A value of shares.happy <= k is allowed, but does not provide any redundancy + if some servers fail or lose shares. + + (Mutable files use a different share placement algorithm that does not + consider this parameter.) + + +Storage Server Configuration +============================ + +:: + + [storage] + enabled = (boolean, optional) + + If this is True, the node will run a storage server, offering space to other + clients. If it is False, the node will not run a storage server, meaning + that no shares will be stored on this node. Use False this for clients who + do not wish to provide storage service. The default value is True. + + readonly = (boolean, optional) + + If True, the node will run a storage server but will not accept any shares, + making it effectively read-only. Use this for storage servers which are + being decommissioned: the storage/ directory could be mounted read-only, + while shares are moved to other servers. Note that this currently only + affects immutable shares. Mutable shares (used for directories) will be + written and modified anyway. See ticket #390 for the current status of this + bug. The default value is False. + + reserved_space = (str, optional) + + If provided, this value defines how much disk space is reserved: the storage + server will not accept any share which causes the amount of free disk space + to drop below this value. (The free space is measured by a call to statvfs(2) + on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the + user account under which the storage server runs.) + + This string contains a number, with an optional case-insensitive scale + suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So + "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same + thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing. + + expire.enabled = + expire.mode = + expire.override_lease_duration = + expire.cutoff_date = + expire.immutable = + expire.mutable = + + These settings control garbage-collection, in which the server will delete + shares that no longer have an up-to-date lease on them. Please see the + neighboring "garbage-collection.txt" document for full details. + + +Running A Helper +================ + +A "helper" is a regular client node that also offers the "upload helper" +service. + +:: + + [helper] + enabled = (boolean, optional) + + If True, the node will run a helper (see docs/helper.txt for details). The + helper's contact FURL will be placed in private/helper.furl, from which it + can be copied to any clients which wish to use it. Clearly nodes should not + both run a helper and attempt to use one: do not create both helper.furl and + run_helper in the same node. The default is False. + + +Running An Introducer +===================== + +The introducer node uses a different '.tac' file (named introducer.tac), and +pays attention to the "[node]" section, but not the others. + +The Introducer node maintains some different state than regular client +nodes. + +BASEDIR/introducer.furl : This is generated the first time the introducer +node is started, and used again on subsequent runs, to give the introduction +service a persistent long-term identity. This file should be published and +copied into new client nodes before they are started for the first time. + + +Other Files in BASEDIR +====================== + +Some configuration is not kept in tahoe.cfg, for the following reasons: + +* it is generated by the node at startup, e.g. encryption keys. The node + never writes to tahoe.cfg +* it is generated by user action, e.g. the 'tahoe create-alias' command + +In addition, non-configuration persistent state is kept in the node's base +directory, next to the configuration knobs. + +This section describes these other files. + +private/node.pem + This contains an SSL private-key certificate. The node + generates this the first time it is started, and re-uses it on subsequent + runs. This certificate allows the node to have a cryptographically-strong + identifier (the Foolscap "TubID"), and to establish secure connections to + other nodes. + +storage/ + Nodes which host StorageServers will create this directory to hold + shares of files on behalf of other clients. There will be a directory + underneath it for each StorageIndex for which this node is holding shares. + There is also an "incoming" directory where partially-completed shares are + held while they are being received. + +client.tac + this file defines the client, by constructing the actual Client + instance each time the node is started. It is used by the 'twistd' + daemonization program (in the "-y" mode), which is run internally by the + "tahoe start" command. This file is created by the "tahoe create-node" or + "tahoe create-client" commands. + +private/control.furl + this file contains a FURL that provides access to a + control port on the client node, from which files can be uploaded and + downloaded. This file is created with permissions that prevent anyone else + from reading it (on operating systems that support such a concept), to insure + that only the owner of the client node can use this feature. This port is + intended for debugging and testing use. + +private/logport.furl + this file contains a FURL that provides access to a + 'log port' on the client node, from which operational logs can be retrieved. + Do not grant logport access to strangers, because occasionally secret + information may be placed in the logs. + +private/helper.furl + if the node is running a helper (for use by other + clients), its contact FURL will be placed here. See docs/helper.txt for more + details. + +private/root_dir.cap (optional) + The command-line tools will read a directory + cap out of this file and use it, if you don't specify a '--dir-cap' option or + if you specify '--dir-cap=root'. + +private/convergence (automatically generated) + An added secret for encrypting + immutable files. Everyone who has this same string in their + private/convergence file encrypts their immutable files in the same way when + uploading them. This causes identical files to "converge" -- to share the + same storage space since they have identical ciphertext -- which conserves + space and optimizes upload time, but it also exposes files to the possibility + of a brute-force attack by people who know that string. In this attack, if + the attacker can guess most of the contents of a file, then they can use + brute-force to learn the remaining contents. + +So the set of people who know your private/convergence string is the set of +people who converge their storage space with you when you and they upload +identical immutable files, and it is also the set of people who could mount +such an attack. + +The content of the private/convergence file is a base-32 encoded string. If +the file doesn't exist, then when the Tahoe client starts up it will generate +a random 256-bit string and write the base-32 encoding of this string into +the file. If you want to converge your immutable files with as many people as +possible, put the empty string (so that private/convergence is a zero-length +file). + +Other files +=========== + +logs/ + Each Tahoe node creates a directory to hold the log messages produced + as the node runs. These logfiles are created and rotated by the "twistd" + daemonization program, so logs/twistd.log will contain the most recent + messages, logs/twistd.log.1 will contain the previous ones, logs/twistd.log.2 + will be older still, and so on. twistd rotates logfiles after they grow + beyond 1MB in size. If the space consumed by logfiles becomes troublesome, + they should be pruned: a cron job to delete all files that were created more + than a month ago in this logs/ directory should be sufficient. + +my_nodeid + this is written by all nodes after startup, and contains a + base32-encoded (i.e. human-readable) NodeID that identifies this specific + node. This NodeID is the same string that gets displayed on the web page (in + the "which peers am I connected to" list), and the shortened form (the first + characters) is recorded in various log messages. + +Backwards Compatibility Files +============================= + +Tahoe releases before 1.3.0 had no 'tahoe.cfg' file, and used distinct files +for each item listed below. For each configuration knob, if the distinct file +exists, it will take precedence over the corresponding item in tahoe.cfg. + +=========================== =============================== ================= +Config setting File Comment +=========================== =============================== ================= +[node]nickname BASEDIR/nickname +[node]web.port BASEDIR/webport +[node]tub.port BASEDIR/client.port (for Clients, not Introducers) +[node]tub.port BASEDIR/introducer.port (for Introducers, not Clients) (note that, unlike other keys, tahoe.cfg overrides this file) +[node]tub.location BASEDIR/advertised_ip_addresses +[node]log_gatherer.furl BASEDIR/log_gatherer.furl (one per line) +[node]timeout.keepalive BASEDIR/keepalive_timeout +[node]timeout.disconnect BASEDIR/disconnect_timeout +[client]introducer.furl BASEDIR/introducer.furl +[client]helper.furl BASEDIR/helper.furl +[client]key_generator.furl BASEDIR/key_generator.furl +[client]stats_gatherer.furl BASEDIR/stats_gatherer.furl +[storage]enabled BASEDIR/no_storage (False if no_storage exists) +[storage]readonly BASEDIR/readonly_storage (True if readonly_storage exists) +[storage]sizelimit BASEDIR/sizelimit +[storage]debug_discard BASEDIR/debug_discard_storage +[helper]enabled BASEDIR/run_helper (True if run_helper exists) +=========================== =============================== ================= + +Note: the functionality of [node]ssh.port and [node]ssh.authorized_keys_file +were previously combined, controlled by the presence of a +BASEDIR/authorized_keys.SSHPORT file, in which the suffix of the filename +indicated which port the ssh server should listen on, and the contents of the +file provided the ssh public keys to accept. Support for these files has been +removed completely. To ssh into your Tahoe node, add [node]ssh.port and +[node].ssh_authorized_keys_file statements to your tahoe.cfg. + +Likewise, the functionality of [node]tub.location is a variant of the +now-unsupported BASEDIR/advertised_ip_addresses . The old file was additive +(the addresses specified in advertised_ip_addresses were used in addition to +any that were automatically discovered), whereas the new tahoe.cfg directive +is not (tub.location is used verbatim). + + +Example +======= + +The following is a sample tahoe.cfg file, containing values for all keys +described above. Note that this is not a recommended configuration (most of +these are not the default values), merely a legal one. + +:: + + [node] + nickname = Bob's Tahoe Node + tub.port = 34912 + tub.location = 123.45.67.89:8098,44.55.66.77:8098 + web.port = 3456 + log_gatherer.furl = pb://soklj4y7eok5c3xkmjeqpw@192.168.69.247:44801/eqpwqtzm + timeout.keepalive = 240 + timeout.disconnect = 1800 + ssh.port = 8022 + ssh.authorized_keys_file = ~/.ssh/authorized_keys + + [client] + introducer.furl = pb://ok45ssoklj4y7eok5c3xkmj@tahoe.example:44801/ii3uumo + helper.furl = pb://ggti5ssoklj4y7eok5c3xkmj@helper.tahoe.example:7054/kk8lhr + + [storage] + enabled = True + readonly_storage = True + sizelimit = 10000000000 + + [helper] + run_helper = True diff --git a/docs/configuration.txt b/docs/configuration.txt deleted file mode 100644 index 2f932140..00000000 --- a/docs/configuration.txt +++ /dev/null @@ -1,529 +0,0 @@ - -= Configuring a Tahoe node = - -A Tahoe node is configured by writing to files in its base directory. These -files are read by the node when it starts, so each time you change them, you -need to restart the node. - -The node also writes state to its base directory, so it will create files on -its own. - -This document contains a complete list of the config files that are examined -by the client node, as well as the state files that you'll observe in its -base directory. - -The main file is named 'tahoe.cfg', which is an ".INI"-style configuration -file (parsed by the Python stdlib 'ConfigParser' module: "[name]" section -markers, lines with "key.subkey: value", rfc822-style continuations). There -are other files that contain information which does not easily fit into this -format. The 'tahoe create-node' or 'tahoe create-client' command will create -an initial tahoe.cfg file for you. After creation, the node will never modify -the 'tahoe.cfg' file: all persistent state is put in other files. - -The item descriptions below use the following types: - - boolean: one of (True, yes, on, 1, False, off, no, 0), case-insensitive - strports string: a Twisted listening-port specification string, like "tcp:80" - or "tcp:3456:interface=127.0.0.1". For a full description of - the format, see - http://twistedmatrix.com/documents/current/api/twisted.application.strports.html - FURL string: a Foolscap endpoint identifier, like - pb://soklj4y7eok5c3xkmjeqpw@192.168.69.247:44801/eqpwqtzm - - -== Overall Node Configuration == - -This section controls the network behavior of the node overall: which ports -and IP addresses are used, when connections are timed out, etc. This -configuration is independent of the services that the node is offering: the -same controls are used for client and introducer nodes. - -If your node is behind a firewall or NAT device and you want other clients to -connect to it, you'll need to open a port in the firewall or NAT, and specify -that port number in the tub.port option. If behind a NAT, you *may* need to -set the tub.location option described below. - - -[node] - -nickname = (UTF-8 string, optional) - - This value will be displayed in management tools as this node's "nickname". - If not provided, the nickname will be set to "". This string - shall be a UTF-8 encoded unicode string. - -web.port = (strports string, optional) - - This controls where the node's webserver should listen, providing filesystem - access and node status as defined in webapi.txt . This file contains a - Twisted "strports" specification such as "3456" or - "tcp:3456:interface=127.0.0.1". The 'tahoe create-node' or 'tahoe create-client' - commands set the web.port to "tcp:3456:interface=127.0.0.1" by default; this - is overridable by the "--webport" option. You can make it use SSL by writing - "ssl:3456:privateKey=mykey.pem:certKey=cert.pem" instead. - - If this is not provided, the node will not run a web server. - -web.static = (string, optional) - - This controls where the /static portion of the URL space is served. The - value is a directory name (~username is allowed, and non-absolute names are - interpreted relative to the node's basedir) which can contain HTML and other - files. This can be used to serve a javascript-based frontend to the Tahoe - node, or other services. - - The default value is "public_html", which will serve $BASEDIR/public_html . - With the default settings, http://127.0.0.1:3456/static/foo.html will serve - the contents of $BASEDIR/public_html/foo.html . - -tub.port = (integer, optional) - - This controls which port the node uses to accept Foolscap connections from - other nodes. If not provided, the node will ask the kernel for any available - port. The port will be written to a separate file (named client.port or - introducer.port), so that subsequent runs will re-use the same port. - -tub.location = (string, optional) - - In addition to running as a client, each Tahoe node also runs as a server, - listening for connections from other Tahoe clients. The node announces its - location by publishing a "FURL" (a string with some connection hints) to the - Introducer. The string it publishes can be found in - $BASEDIR/private/storage.furl . The "tub.location" configuration controls - what location is published in this announcement. - - If you don't provide tub.location, the node will try to figure out a useful - one by itself, by using tools like 'ifconfig' to determine the set of IP - addresses on which it can be reached from nodes both near and far. It will - also include the TCP port number on which it is listening (either the one - specified by tub.port, or whichever port was assigned by the kernel when - tub.port is left unspecified). - - You might want to override this value if your node lives behind a firewall - that is doing inbound port forwarding, or if you are using other proxies - such that the local IP address or port number is not the same one that - remote clients should use to connect. You might also want to control this - when using a Tor proxy to avoid revealing your actual IP address through the - Introducer announcement. - - The value is a comma-separated string of host:port location hints, like - this: - - 123.45.67.89:8098,tahoe.example.com:8098,127.0.0.1:8098 - - A few examples: - - Emulate default behavior, assuming your host has IP address 123.45.67.89 - and the kernel-allocated port number was 8098: - - tub.port = 8098 - tub.location = 123.45.67.89:8098,127.0.0.1:8098 - - Use a DNS name so you can change the IP address more easily: - - tub.port = 8098 - tub.location = tahoe.example.com:8098 - - Run a node behind a firewall (which has an external IP address) that has - been configured to forward port 7912 to our internal node's port 8098: - - tub.port = 8098 - tub.location = external-firewall.example.com:7912 - - Run a node behind a Tor proxy (perhaps via torsocks), in client-only mode - (i.e. we can make outbound connections, but other nodes will not be able to - connect to us). The literal 'unreachable.example.org' will not resolve, but - will serve as a reminder to human observers that this node cannot be - reached. "Don't call us.. we'll call you": - - tub.port = 8098 - tub.location = unreachable.example.org:0 - - Run a node behind a Tor proxy, and make the server available as a Tor - "hidden service". (this assumes that other clients are running their node - with torsocks, such that they are prepared to connect to a .onion address). - The hidden service must first be configured in Tor, by giving it a local - port number and then obtaining a .onion name, using something in the torrc - file like: - - HiddenServiceDir /var/lib/tor/hidden_services/tahoe - HiddenServicePort 29212 127.0.0.1:8098 - - once Tor is restarted, the .onion hostname will be in - /var/lib/tor/hidden_services/tahoe/hostname . Then set up your tahoe.cfg - like: - - tub.port = 8098 - tub.location = ualhejtq2p7ohfbb.onion:29212 - - Most users will not need to set tub.location . - - Note that the old 'advertised_ip_addresses' file from earlier releases is no - longer supported. Tahoe 1.3.0 and later will ignore this file. - -log_gatherer.furl = (FURL, optional) - - If provided, this contains a single FURL string which is used to contact a - 'log gatherer', which will be granted access to the logport. This can be - used by centralized storage meshes to gather operational logs in a single - place. Note that when an old-style BASEDIR/log_gatherer.furl file exists - (see 'Backwards Compatibility Files', below), both are used. (for most other - items, the separate config file overrides the entry in tahoe.cfg) - -timeout.keepalive = (integer in seconds, optional) -timeout.disconnect = (integer in seconds, optional) - - If timeout.keepalive is provided, it is treated as an integral number of - seconds, and sets the Foolscap "keepalive timer" to that value. For each - connection to another node, if nothing has been heard for a while, we will - attempt to provoke the other end into saying something. The duration of - silence that passes before sending the PING will be between KT and 2*KT. - This is mainly intended to keep NAT boxes from expiring idle TCP sessions, - but also gives TCP's long-duration keepalive/disconnect timers some traffic - to work with. The default value is 240 (i.e. 4 minutes). - - If timeout.disconnect is provided, this is treated as an integral number of - seconds, and sets the Foolscap "disconnect timer" to that value. For each - connection to another node, if nothing has been heard for a while, we will - drop the connection. The duration of silence that passes before dropping the - connection will be between DT-2*KT and 2*DT+2*KT (please see ticket #521 for - more details). If we are sending a large amount of data to the other end - (which takes more than DT-2*KT to deliver), we might incorrectly drop the - connection. The default behavior (when this value is not provided) is to - disable the disconnect timer. - - See ticket #521 for a discussion of how to pick these timeout values. Using - 30 minutes means we'll disconnect after 22 to 68 minutes of inactivity. - Receiving data will reset this timeout, however if we have more than 22min - of data in the outbound queue (such as 800kB in two pipelined segments of 10 - shares each) and the far end has no need to contact us, our ping might be - delayed, so we may disconnect them by accident. - -ssh.port = (strports string, optional) -ssh.authorized_keys_file = (filename, optional) - - This enables an SSH-based interactive Python shell, which can be used to - inspect the internal state of the node, for debugging. To cause the node to - accept SSH connections on port 8022 from the same keys as the rest of your - account, use: - - [tub] - ssh.port = 8022 - ssh.authorized_keys_file = ~/.ssh/authorized_keys - -tempdir = (string, optional) - - This specifies a temporary directory for the webapi server to use, for - holding large files while they are being uploaded. If a webapi client - attempts to upload a 10GB file, this tempdir will need to have at least 10GB - available for the upload to complete. - - The default value is the "tmp" directory in the node's base directory (i.e. - $NODEDIR/tmp), but it can be placed elsewhere. This directory is used for - files that usually (on a unix system) go into /tmp . The string will be - interpreted relative to the node's base directory. - -== Client Configuration == - -[client] -introducer.furl = (FURL string, mandatory) - - This FURL tells the client how to connect to the introducer. Each Tahoe grid - is defined by an introducer. The introducer's furl is created by the - introducer node and written into its base directory when it starts, - whereupon it should be published to everyone who wishes to attach a client - to that grid - -helper.furl = (FURL string, optional) - - If provided, the node will attempt to connect to and use the given helper - for uploads. See docs/helper.txt for details. - -key_generator.furl = (FURL string, optional) - - If provided, the node will attempt to connect to and use the given - key-generator service, using RSA keys from the external process rather than - generating its own. - -stats_gatherer.furl = (FURL string, optional) - - If provided, the node will connect to the given stats gatherer and provide - it with operational statistics. - -shares.needed = (int, optional) aka "k", default 3 -shares.total = (int, optional) aka "N", N >= k, default 10 -shares.happy = (int, optional) 1 <= happy <= N, default 7 - - These three values set the default encoding parameters. Each time a new file - is uploaded, erasure-coding is used to break the ciphertext into separate - pieces. There will be "N" (i.e. shares.total) pieces created, and the file - will be recoverable if any "k" (i.e. shares.needed) pieces are retrieved. - The default values are 3-of-10 (i.e. shares.needed = 3, shares.total = 10). - Setting k to 1 is equivalent to simple replication (uploading N copies of - the file). - - These values control the tradeoff between storage overhead, performance, and - reliability. To a first approximation, a 1MB file will use (1MB*N/k) of - backend storage space (the actual value will be a bit more, because of other - forms of overhead). Up to N-k shares can be lost before the file becomes - unrecoverable, so assuming there are at least N servers, up to N-k servers - can be offline without losing the file. So large N/k ratios are more - reliable, and small N/k ratios use less disk space. Clearly, k must never be - smaller than N. - - Large values of N will slow down upload operations slightly, since more - servers must be involved, and will slightly increase storage overhead due to - the hash trees that are created. Large values of k will cause downloads to - be marginally slower, because more servers must be involved. N cannot be - larger than 256, because of the 8-bit erasure-coding algorithm that Tahoe - uses. - - shares.happy allows you control over the distribution of your immutable file. - For a successful upload, shares are guaranteed to be initially placed on - at least 'shares.happy' distinct servers, the correct functioning of any - k of which is sufficient to guarantee the availability of the uploaded file. - This value should not be larger than the number of servers on your grid. - - A value of shares.happy <= k is allowed, but does not provide any redundancy - if some servers fail or lose shares. - - (Mutable files use a different share placement algorithm that does not - consider this parameter.) - - -== Storage Server Configuration == - -[storage] -enabled = (boolean, optional) - - If this is True, the node will run a storage server, offering space to other - clients. If it is False, the node will not run a storage server, meaning - that no shares will be stored on this node. Use False this for clients who - do not wish to provide storage service. The default value is True. - -readonly = (boolean, optional) - - If True, the node will run a storage server but will not accept any shares, - making it effectively read-only. Use this for storage servers which are - being decommissioned: the storage/ directory could be mounted read-only, - while shares are moved to other servers. Note that this currently only - affects immutable shares. Mutable shares (used for directories) will be - written and modified anyway. See ticket #390 for the current status of this - bug. The default value is False. - -reserved_space = (str, optional) - - If provided, this value defines how much disk space is reserved: the storage - server will not accept any share which causes the amount of free disk space - to drop below this value. (The free space is measured by a call to statvfs(2) - on Unix, or GetDiskFreeSpaceEx on Windows, and is the space available to the - user account under which the storage server runs.) - - This string contains a number, with an optional case-insensitive scale - suffix like "K" or "M" or "G", and an optional "B" or "iB" suffix. So - "100MB", "100M", "100000000B", "100000000", and "100000kb" all mean the same - thing. Likewise, "1MiB", "1024KiB", and "1048576B" all mean the same thing. - -expire.enabled = -expire.mode = -expire.override_lease_duration = -expire.cutoff_date = -expire.immutable = -expire.mutable = - - These settings control garbage-collection, in which the server will delete - shares that no longer have an up-to-date lease on them. Please see the - neighboring "garbage-collection.txt" document for full details. - - -== Running A Helper == - -A "helper" is a regular client node that also offers the "upload helper" -service. - -[helper] -enabled = (boolean, optional) - - If True, the node will run a helper (see docs/helper.txt for details). The - helper's contact FURL will be placed in private/helper.furl, from which it - can be copied to any clients which wish to use it. Clearly nodes should not - both run a helper and attempt to use one: do not create both helper.furl and - run_helper in the same node. The default is False. - - -== Running An Introducer == - -The introducer node uses a different '.tac' file (named introducer.tac), and -pays attention to the "[node]" section, but not the others. - -The Introducer node maintains some different state than regular client -nodes. - -BASEDIR/introducer.furl : This is generated the first time the introducer -node is started, and used again on subsequent runs, to give the introduction -service a persistent long-term identity. This file should be published and -copied into new client nodes before they are started for the first time. - - -== Other Files in BASEDIR == - -Some configuration is not kept in tahoe.cfg, for the following reasons: - - * it is generated by the node at startup, e.g. encryption keys. The node - never writes to tahoe.cfg - * it is generated by user action, e.g. the 'tahoe create-alias' command - -In addition, non-configuration persistent state is kept in the node's base -directory, next to the configuration knobs. - -This section describes these other files. - - -private/node.pem : This contains an SSL private-key certificate. The node -generates this the first time it is started, and re-uses it on subsequent -runs. This certificate allows the node to have a cryptographically-strong -identifier (the Foolscap "TubID"), and to establish secure connections to -other nodes. - -storage/ : Nodes which host StorageServers will create this directory to hold -shares of files on behalf of other clients. There will be a directory -underneath it for each StorageIndex for which this node is holding shares. -There is also an "incoming" directory where partially-completed shares are -held while they are being received. - -client.tac : this file defines the client, by constructing the actual Client -instance each time the node is started. It is used by the 'twistd' -daemonization program (in the "-y" mode), which is run internally by the -"tahoe start" command. This file is created by the "tahoe create-node" or -"tahoe create-client" commands. - -private/control.furl : this file contains a FURL that provides access to a -control port on the client node, from which files can be uploaded and -downloaded. This file is created with permissions that prevent anyone else -from reading it (on operating systems that support such a concept), to insure -that only the owner of the client node can use this feature. This port is -intended for debugging and testing use. - -private/logport.furl : this file contains a FURL that provides access to a -'log port' on the client node, from which operational logs can be retrieved. -Do not grant logport access to strangers, because occasionally secret -information may be placed in the logs. - -private/helper.furl : if the node is running a helper (for use by other -clients), its contact FURL will be placed here. See docs/helper.txt for more -details. - -private/root_dir.cap (optional): The command-line tools will read a directory -cap out of this file and use it, if you don't specify a '--dir-cap' option or -if you specify '--dir-cap=root'. - -private/convergence (automatically generated): An added secret for encrypting -immutable files. Everyone who has this same string in their -private/convergence file encrypts their immutable files in the same way when -uploading them. This causes identical files to "converge" -- to share the -same storage space since they have identical ciphertext -- which conserves -space and optimizes upload time, but it also exposes files to the possibility -of a brute-force attack by people who know that string. In this attack, if -the attacker can guess most of the contents of a file, then they can use -brute-force to learn the remaining contents. - -So the set of people who know your private/convergence string is the set of -people who converge their storage space with you when you and they upload -identical immutable files, and it is also the set of people who could mount -such an attack. - -The content of the private/convergence file is a base-32 encoded string. If -the file doesn't exist, then when the Tahoe client starts up it will generate -a random 256-bit string and write the base-32 encoding of this string into -the file. If you want to converge your immutable files with as many people as -possible, put the empty string (so that private/convergence is a zero-length -file). - - -== Other files == - -logs/ : Each Tahoe node creates a directory to hold the log messages produced -as the node runs. These logfiles are created and rotated by the "twistd" -daemonization program, so logs/twistd.log will contain the most recent -messages, logs/twistd.log.1 will contain the previous ones, logs/twistd.log.2 -will be older still, and so on. twistd rotates logfiles after they grow -beyond 1MB in size. If the space consumed by logfiles becomes troublesome, -they should be pruned: a cron job to delete all files that were created more -than a month ago in this logs/ directory should be sufficient. - -my_nodeid : this is written by all nodes after startup, and contains a -base32-encoded (i.e. human-readable) NodeID that identifies this specific -node. This NodeID is the same string that gets displayed on the web page (in -the "which peers am I connected to" list), and the shortened form (the first -characters) is recorded in various log messages. - - -== Backwards Compatibility Files == - -Tahoe releases before 1.3.0 had no 'tahoe.cfg' file, and used distinct files -for each item listed below. For each configuration knob, if the distinct file -exists, it will take precedence over the corresponding item in tahoe.cfg . - - -[node]nickname : BASEDIR/nickname -[node]web.port : BASEDIR/webport -[node]tub.port : BASEDIR/client.port (for Clients, not Introducers) -[node]tub.port : BASEDIR/introducer.port (for Introducers, not Clients) - (note that, unlike other keys, tahoe.cfg overrides the *.port file) -[node]tub.location : replaces BASEDIR/advertised_ip_addresses -[node]log_gatherer.furl : BASEDIR/log_gatherer.furl (one per line) -[node]timeout.keepalive : BASEDIR/keepalive_timeout -[node]timeout.disconnect : BASEDIR/disconnect_timeout -[client]introducer.furl : BASEDIR/introducer.furl -[client]helper.furl : BASEDIR/helper.furl -[client]key_generator.furl : BASEDIR/key_generator.furl -[client]stats_gatherer.furl : BASEDIR/stats_gatherer.furl -[storage]enabled : BASEDIR/no_storage (False if no_storage exists) -[storage]readonly : BASEDIR/readonly_storage (True if readonly_storage exists) -[storage]sizelimit : BASEDIR/sizelimit -[storage]debug_discard : BASEDIR/debug_discard_storage -[helper]enabled : BASEDIR/run_helper (True if run_helper exists) - -Note: the functionality of [node]ssh.port and [node]ssh.authorized_keys_file -were previously combined, controlled by the presence of a -BASEDIR/authorized_keys.SSHPORT file, in which the suffix of the filename -indicated which port the ssh server should listen on, and the contents of the -file provided the ssh public keys to accept. Support for these files has been -removed completely. To ssh into your Tahoe node, add [node]ssh.port and -[node].ssh_authorized_keys_file statements to your tahoe.cfg . - -Likewise, the functionality of [node]tub.location is a variant of the -now-unsupported BASEDIR/advertised_ip_addresses . The old file was additive -(the addresses specified in advertised_ip_addresses were used in addition to -any that were automatically discovered), whereas the new tahoe.cfg directive -is not (tub.location is used verbatim). - - -== Example == - -The following is a sample tahoe.cfg file, containing values for all keys -described above. Note that this is not a recommended configuration (most of -these are not the default values), merely a legal one. - -[node] -nickname = Bob's Tahoe Node -tub.port = 34912 -tub.location = 123.45.67.89:8098,44.55.66.77:8098 -web.port = 3456 -log_gatherer.furl = pb://soklj4y7eok5c3xkmjeqpw@192.168.69.247:44801/eqpwqtzm -timeout.keepalive = 240 -timeout.disconnect = 1800 -ssh.port = 8022 -ssh.authorized_keys_file = ~/.ssh/authorized_keys - -[client] -introducer.furl = pb://ok45ssoklj4y7eok5c3xkmj@tahoe.example:44801/ii3uumo -helper.furl = pb://ggti5ssoklj4y7eok5c3xkmj@helper.tahoe.example:7054/kk8lhr - -[storage] -enabled = True -readonly_storage = True -sizelimit = 10000000000 - -[helper] -run_helper = True diff --git a/docs/debian.rst b/docs/debian.rst new file mode 100644 index 00000000..a0248f18 --- /dev/null +++ b/docs/debian.rst @@ -0,0 +1,180 @@ +============== +Debian Support +============== + +1. `Overview`_ +2. `TL;DR supporting package building instructions`_ +3. `TL;DR package building instructions for Tahoe`_ +4. `Building Debian Packages`_ +5. `Using Pre-Built Debian Packages`_ +6. `Building From Source on Debian Systems`_ + +Overview +======== + +One convenient way to install Tahoe-LAFS is with debian packages. +This document attempts to explain how to complete a desert island build for +people in a hurry. It also attempts to explain more about our Debian packaging +for those willing to read beyond the simple pragmatic packaging exercises. + +TL;DR supporting package building instructions +============================================== + +There are only four supporting packages that are currently not available from +the debian apt repositories in Debian Lenny:: + + python-foolscap python-zfec argparse zbase32 + +First, we'll install some common packages for development:: + + sudo apt-get install -y build-essential debhelper cdbs python-central \ + python-setuptools python python-dev python-twisted-core \ + fakeroot darcs python-twisted python-nevow \ + python-simplejson python-pycryptopp devscripts \ + apt-file + sudo apt-file update + + +To create packages for Lenny, we'll also install stdeb:: + + sudo apt-get install python-all-dev + STDEB_VERSION="0.5.1" + wget http://pypi.python.org/packages/source/s/stdeb/stdeb-$STDEB_VERSION.tar.gz + tar xzf stdeb-$STDEB_VERSION.tar.gz + cd stdeb-$STDEB_VERSION + python setup.py --command-packages=stdeb.command bdist_deb + sudo dpkg -i deb_dist/python-stdeb_$STDEB_VERSION-1_all.deb + +Now we're ready to build and install the zfec Debian package:: + + darcs get http://allmydata.org/source/zfec/trunk zfac + cd zfac/zfec/ + python setup.py sdist_dsc + cd `find deb_dist -mindepth 1 -maxdepth 1 -type d` && \ + dpkg-buildpackage -rfakeroot -uc -us + sudo dpkg -i ../python-zfec_1.4.6-r333-1_amd64.deb + +We need to build a pyutil package:: + + wget http://pypi.python.org/packages/source/p/pyutil/pyutil-1.6.1.tar.gz + tar -xvzf pyutil-1.6.1.tar.gz + cd pyutil-1.6.1/ + python setup.py --command-packages=stdeb.command sdist_dsc + cd deb_dist/pyutil-1.6.1/ + dpkg-buildpackage -rfakeroot -uc -us + sudo dpkg -i ../python-pyutil_1.6.1-1_all.deb + +We also need to install argparse and zbase32:: + + sudo easy_install argparse # argparse won't install with stdeb (!) :-( + sudo easy_install zbase32 # XXX TODO: package with stdeb + +Finally, we'll fetch, unpack, build and install foolscap:: + + # You may not already have Brian's key: + # gpg --recv-key 0x1514A7BD + wget http://foolscap.lothar.com/releases/foolscap-0.5.0.tar.gz.asc + wget http://foolscap.lothar.com/releases/foolscap-0.5.0.tar.gz + gpg --verify foolscap-0.5.0.tar.gz.asc + tar -xvzf foolscap-0.5.0.tar.gz + cd foolscap-0.5.0/ + python setup.py --command-packages=stdeb.command sdist_dsc + cd deb_dist/foolscap-0.5.0/ + dpkg-buildpackage -rfakeroot -uc -us + sudo dpkg -i ../python-foolscap_0.5.0-1_all.deb + +TL;DR package building instructions for Tahoe +============================================= + +If you want to build your own Debian packages from the darcs tree or from +a source release, do the following:: + + cd ~/ + mkdir src && cd src/ + darcs get --lazy http://allmydata.org/source/tahoe-lafs/trunk tahoe-lafs + cd tahoe-lafs + # set this for your Debian release name (lenny, sid, etc) + make deb-lenny-head + # You must have your dependency issues worked out by hand for this to work + sudo dpkg -i ../allmydata-tahoe_1.6.1-r4262_all.deb + +You should now have a functional desert island build of Tahoe with all of the +supported libraries as .deb packages. You'll need to edit the Debian specific +/etc/defaults/allmydata-tahoe file to get Tahoe started. Data is by default +stored in /var/lib/tahoelafsd/ and Tahoe runs as the 'tahoelafsd' user. + +Building Debian Packages +======================== + +The Tahoe source tree comes with limited support for building debian packages +on a variety of Debian and Ubuntu platforms. For each supported platform, +there is a "deb-PLATFORM-head" target in the Makefile that will produce a +debian package from a darcs checkout, using a version number that is derived +from the most recent darcs tag, plus the total number of revisions present in +the tree (e.g. "1.1-r2678"). + +To create debian packages from a Tahoe tree, you will need some additional +tools installed. The canonical list of these packages is in the +"Build-Depends" clause of misc/sid/debian/control , and includes:: + + build-essential + debhelper + cdbs + python-central + python-setuptools + python + python-dev + python-twisted-core + +In addition, to use the "deb-$PLATFORM-head" target, you will also need the +"debchange" utility from the "devscripts" package, and the "fakeroot" package. + +Some recent platforms can be handled by using the targets for the previous +release, for example if there is no "deb-hardy-head" target, try building +"deb-gutsy-head" and see if the resulting package will work. + +Note that we haven't tried to build source packages (.orig.tar.gz + dsc) yet, +and there are no such source packages in our APT repository. + +Using Pre-Built Debian Packages +=============================== + +The allmydata.org site hosts an APT repository with debian packages that are +built after each checkin. `This wiki page +`_ describes this +repository. + +The allmydata.org APT repository also includes debian packages of support +libraries, like Foolscap, zfec, pycryptopp, and everything else you need that +isn't already in debian. + +Building From Source on Debian Systems +====================================== + +Many of Tahoe's build dependencies can be satisfied by first installing +certain debian packages: simplejson is one of these. Some debian/ubuntu +platforms do not provide the necessary .egg-info metadata with their +packages, so the Tahoe build process may not believe they are present. Some +Tahoe dependencies are not present in most debian systems (such as foolscap +and zfec): debs for these are made available in the APT repository described +above. + +The Tahoe build process will acquire (via setuptools) most of the libraries +that it needs to run and which are not already present in the build +environment). + +We have observed occasional problems with this acquisition process. In some +cases, setuptools will only be half-aware of an installed debian package, +just enough to interfere with the automatic download+build of the dependency. +For example, on some platforms, if Nevow-0.9.26 is installed via a debian +package, setuptools will believe that it must download Nevow anyways, but it +will insist upon downloading that specific 0.9.26 version. Since the current +release of Nevow is 0.9.31, and 0.9.26 is no longer available for download, +this will fail. + +The Tahoe source tree currently ships with a directory full of tarballs of +dependent libraries (misc/dependencies/), to enable a "desert-island build". +There are plans to remove these tarballs from the source repository (but +still provide a way to get Tahoe source plus dependencies). This Nevow-0.9.26 +-type problem can be mitigated by putting the right dependency tarball in +misc/dependencies/ . diff --git a/docs/debian.txt b/docs/debian.txt deleted file mode 100644 index afa0a7fe..00000000 --- a/docs/debian.txt +++ /dev/null @@ -1,172 +0,0 @@ -= Debian Support = - -1. Overview -2. TL;DR supporting package building instructions -3. TL;DR package building instructions for Tahoe -4. Building Debian Packages -5. Using Pre-Built Debian Packages -6. Building From Source on Debian Systems - -= Overview == - -One convenient way to install Tahoe-LAFS is with debian packages. -This document attempts to explain how to complete a desert island build for -people in a hurry. It also attempts to explain more about our Debian packaging -for those willing to read beyond the simple pragmatic packaging exercises. - -== TL;DR supporting package building instructions == - -There are only four supporting packages that are currently not available from -the debian apt repositories in Debian Lenny: - - python-foolscap python-zfec argparse zbase32 - -First, we'll install some common packages for development: - - sudo apt-get install -y build-essential debhelper cdbs python-central \ - python-setuptools python python-dev python-twisted-core \ - fakeroot darcs python-twisted python-nevow \ - python-simplejson python-pycryptopp devscripts \ - apt-file - sudo apt-file update - - -To create packages for Lenny, we'll also install stdeb: - - sudo apt-get install python-all-dev - STDEB_VERSION="0.5.1" - wget http://pypi.python.org/packages/source/s/stdeb/stdeb-$STDEB_VERSION.tar.gz - tar xzf stdeb-$STDEB_VERSION.tar.gz - cd stdeb-$STDEB_VERSION - python setup.py --command-packages=stdeb.command bdist_deb - sudo dpkg -i deb_dist/python-stdeb_$STDEB_VERSION-1_all.deb - -Now we're ready to build and install the zfec Debian package: - - darcs get http://allmydata.org/source/zfec/trunk zfac - cd zfac/zfec/ - python setup.py sdist_dsc - cd `find deb_dist -mindepth 1 -maxdepth 1 -type d` && \ - dpkg-buildpackage -rfakeroot -uc -us - sudo dpkg -i ../python-zfec_1.4.6-r333-1_amd64.deb - -We need to build a pyutil package: - - wget http://pypi.python.org/packages/source/p/pyutil/pyutil-1.6.1.tar.gz - tar -xvzf pyutil-1.6.1.tar.gz - cd pyutil-1.6.1/ - python setup.py --command-packages=stdeb.command sdist_dsc - cd deb_dist/pyutil-1.6.1/ - dpkg-buildpackage -rfakeroot -uc -us - sudo dpkg -i ../python-pyutil_1.6.1-1_all.deb - -We also need to install argparse and zbase32: - - sudo easy_install argparse # argparse won't install with stdeb (!) :-( - sudo easy_install zbase32 # XXX TODO: package with stdeb - -Finally, we'll fetch, unpack, build and install foolscap: - - # You may not already have Brian's key: - # gpg --recv-key 0x1514A7BD - wget http://foolscap.lothar.com/releases/foolscap-0.5.0.tar.gz.asc - wget http://foolscap.lothar.com/releases/foolscap-0.5.0.tar.gz - gpg --verify foolscap-0.5.0.tar.gz.asc - tar -xvzf foolscap-0.5.0.tar.gz - cd foolscap-0.5.0/ - python setup.py --command-packages=stdeb.command sdist_dsc - cd deb_dist/foolscap-0.5.0/ - dpkg-buildpackage -rfakeroot -uc -us - sudo dpkg -i ../python-foolscap_0.5.0-1_all.deb - -== TL;DR package building instructions for Tahoe == - -If you want to build your own Debian packages from the darcs tree or from -a source release, do the following: - - cd ~/ - mkdir src && cd src/ - darcs get --lazy http://allmydata.org/source/tahoe-lafs/trunk tahoe-lafs - cd tahoe-lafs - # set this for your Debian release name (lenny, sid, etc) - make deb-lenny-head - # You must have your dependency issues worked out by hand for this to work - sudo dpkg -i ../allmydata-tahoe_1.6.1-r4262_all.deb - -You should now have a functional desert island build of Tahoe with all of the -supported libraries as .deb packages. You'll need to edit the Debian specific -/etc/defaults/allmydata-tahoe file to get Tahoe started. Data is by default -stored in /var/lib/tahoelafsd/ and Tahoe runs as the 'tahoelafsd' user. - -== Building Debian Packages == - -The Tahoe source tree comes with limited support for building debian packages -on a variety of Debian and Ubuntu platforms. For each supported platform, -there is a "deb-PLATFORM-head" target in the Makefile that will produce a -debian package from a darcs checkout, using a version number that is derived -from the most recent darcs tag, plus the total number of revisions present in -the tree (e.g. "1.1-r2678"). - -To create debian packages from a Tahoe tree, you will need some additional -tools installed. The canonical list of these packages is in the -"Build-Depends" clause of misc/sid/debian/control , and includes: - - build-essential - debhelper - cdbs - python-central - python-setuptools - python - python-dev - python-twisted-core - -In addition, to use the "deb-$PLATFORM-head" target, you will also need the -"debchange" utility from the "devscripts" package, and the "fakeroot" package. - -Some recent platforms can be handled by using the targets for the previous -release, for example if there is no "deb-hardy-head" target, try building -"deb-gutsy-head" and see if the resulting package will work. - -Note that we haven't tried to build source packages (.orig.tar.gz + dsc) yet, -and there are no such source packages in our APT repository. - -== Using Pre-Built Debian Packages == - -The allmydata.org site hosts an APT repository with debian packages that are -built after each checkin. The following wiki page describes this repository: - - http://allmydata.org/trac/tahoe/wiki/DownloadDebianPackages - -The allmydata.org APT repository also includes debian packages of support -libraries, like Foolscap, zfec, pycryptopp, and everything else you need that -isn't already in debian. - -== Building From Source on Debian Systems == - -Many of Tahoe's build dependencies can be satisfied by first installing -certain debian packages: simplejson is one of these. Some debian/ubuntu -platforms do not provide the necessary .egg-info metadata with their -packages, so the Tahoe build process may not believe they are present. Some -Tahoe dependencies are not present in most debian systems (such as foolscap -and zfec): debs for these are made available in the APT repository described -above. - -The Tahoe build process will acquire (via setuptools) most of the libraries -that it needs to run and which are not already present in the build -environment). - -We have observed occasional problems with this acquisition process. In some -cases, setuptools will only be half-aware of an installed debian package, -just enough to interfere with the automatic download+build of the dependency. -For example, on some platforms, if Nevow-0.9.26 is installed via a debian -package, setuptools will believe that it must download Nevow anyways, but it -will insist upon downloading that specific 0.9.26 version. Since the current -release of Nevow is 0.9.31, and 0.9.26 is no longer available for download, -this will fail. - -The Tahoe source tree currently ships with a directory full of tarballs of -dependent libraries (misc/dependencies/), to enable a "desert-island build". -There are plans to remove these tarballs from the source repository (but -still provide a way to get Tahoe source plus dependencies). This Nevow-0.9.26 --type problem can be mitigated by putting the right dependency tarball in -misc/dependencies/ . diff --git a/docs/filesystem-notes.rst b/docs/filesystem-notes.rst new file mode 100644 index 00000000..7c261b6d --- /dev/null +++ b/docs/filesystem-notes.rst @@ -0,0 +1,24 @@ +========================= +Filesystem-specific notes +========================= + +1. ext3_ + +Tahoe storage servers use a large number of subdirectories to store their +shares on local disk. This format is simple and robust, but depends upon the +local filesystem to provide fast access to those directories. + +ext3 +==== + +For moderate- or large-sized storage servers, you'll want to make sure the +"directory index" feature is enabled on your ext3 directories, otherwise +share lookup may be very slow. Recent versions of ext3 enable this +automatically, but older filesystems may not have it enabled:: + + $ sudo tune2fs -l /dev/sda1 |grep feature + Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file + +If "dir_index" is present in the "features:" line, then you're all set. If +not, you'll need to use tune2fs and e2fsck to enable and build the index. See + for some hints. diff --git a/docs/filesystem-notes.txt b/docs/filesystem-notes.txt deleted file mode 100644 index 34e02029..00000000 --- a/docs/filesystem-notes.txt +++ /dev/null @@ -1,18 +0,0 @@ - -Tahoe storage servers use a large number of subdirectories to store their -shares on local disk. This format is simple and robust, but depends upon the -local filesystem to provide fast access to those directories. - -= ext3 = - -For moderate- or large-sized storage servers, you'll want to make sure the -"directory index" feature is enabled on your ext3 directories, otherwise -share lookup may be very slow. Recent versions of ext3 enable this -automatically, but older filesystems may not have it enabled. - -$ sudo tune2fs -l /dev/sda1 |grep feature -Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file - -If "dir_index" is present in the "features:" line, then you're all set. If -not, you'll need to use tune2fs and e2fsck to enable and build the index. See -this page for some hints: http://wiki.dovecot.org/MailboxFormat/Maildir . diff --git a/docs/garbage-collection.rst b/docs/garbage-collection.rst new file mode 100644 index 00000000..5bfe6548 --- /dev/null +++ b/docs/garbage-collection.rst @@ -0,0 +1,295 @@ +=========================== +Garbage Collection in Tahoe +=========================== + +1. `Overview`_ +2. `Client-side Renewal`_ +3. `Server Side Expiration`_ +4. `Expiration Progress`_ +5. `Future Directions`_ + +Overview +======== + +When a file or directory in the virtual filesystem is no longer referenced, +the space that its shares occupied on each storage server can be freed, +making room for other shares. Tahoe currently uses a garbage collection +("GC") mechanism to implement this space-reclamation process. Each share has +one or more "leases", which are managed by clients who want the +file/directory to be retained. The storage server accepts each share for a +pre-defined period of time, and is allowed to delete the share if all of the +leases are cancelled or allowed to expire. + +Garbage collection is not enabled by default: storage servers will not delete +shares without being explicitly configured to do so. When GC is enabled, +clients are responsible for renewing their leases on a periodic basis at +least frequently enough to prevent any of the leases from expiring before the +next renewal pass. + +There are several tradeoffs to be considered when choosing the renewal timer +and the lease duration, and there is no single optimal pair of values. See +the "lease-tradeoffs.svg" diagram to get an idea for the tradeoffs involved. +If lease renewal occurs quickly and with 100% reliability, than any renewal +time that is shorter than the lease duration will suffice, but a larger ratio +of duration-over-renewal-time will be more robust in the face of occasional +delays or failures. + +The current recommended values for a small Tahoe grid are to renew the leases +once a week, and to give each lease a duration of 31 days. Renewing leases +can be expected to take about one second per file/directory, depending upon +the number of servers and the network speeds involved. Note that in the +current release, the server code enforces a 31 day lease duration: there is +not yet a way for the client to request a different duration (however the +server can use the "expire.override_lease_duration" configuration setting to +increase or decrease the effective duration to something other than 31 days). + +Client-side Renewal +=================== + +If all of the files and directories which you care about are reachable from a +single starting point (usually referred to as a "rootcap"), and you store +that rootcap as an alias (via "tahoe create-alias"), then the simplest way to +renew these leases is with the following CLI command: + + tahoe deep-check --add-lease ALIAS: + +This will recursively walk every directory under the given alias and renew +the leases on all files and directories. (You may want to add a --repair flag +to perform repair at the same time). Simply run this command once a week (or +whatever other renewal period your grid recommends) and make sure it +completes successfully. As a side effect, a manifest of all unique files and +directories will be emitted to stdout, as well as a summary of file sizes and +counts. It may be useful to track these statistics over time. + +Note that newly uploaded files (and newly created directories) get an initial +lease too: the --add-lease process is only needed to ensure that all older +objects have up-to-date leases on them. + +For larger systems (such as a commercial grid), a separate "maintenance +daemon" is under development. This daemon will acquire manifests from +rootcaps on a periodic basis, keep track of checker results, manage +lease-addition, and prioritize repair needs, using multiple worker nodes to +perform these jobs in parallel. Eventually, this daemon will be made +appropriate for use by individual users as well, and may be incorporated +directly into the client node. + +Server Side Expiration +====================== + +Expiration must be explicitly enabled on each storage server, since the +default behavior is to never expire shares. Expiration is enabled by adding +config keys to the "[storage]" section of the tahoe.cfg file (as described +below) and restarting the server node. + +Each lease has two parameters: a create/renew timestamp and a duration. The +timestamp is updated when the share is first uploaded (i.e. the file or +directory is created), and updated again each time the lease is renewed (i.e. +"tahoe check --add-lease" is performed). The duration is currently fixed at +31 days, and the "nominal lease expiration time" is simply $duration seconds +after the $create_renew timestamp. (In a future release of Tahoe, the client +will get to request a specific duration, and the server will accept or reject +the request depending upon its local configuration, so that servers can +achieve better control over their storage obligations). + +The lease-expiration code has two modes of operation. The first is age-based: +leases are expired when their age is greater than their duration. This is the +preferred mode: as long as clients consistently update their leases on a +periodic basis, and that period is shorter than the lease duration, then all +active files and directories will be preserved, and the garbage will +collected in a timely fashion. + +Since there is not yet a way for clients to request a lease duration of other +than 31 days, there is a tahoe.cfg setting to override the duration of all +leases. If, for example, this alternative duration is set to 60 days, then +clients could safely renew their leases with an add-lease operation perhaps +once every 50 days: even though nominally their leases would expire 31 days +after the renewal, the server would not actually expire the leases until 60 +days after renewal. + +The other mode is an absolute-date-cutoff: it compares the create/renew +timestamp against some absolute date, and expires any lease which was not +created or renewed since the cutoff date. If all clients have performed an +add-lease some time after March 20th, you could tell the storage server to +expire all leases that were created or last renewed on March 19th or earlier. +This is most useful if you have a manual (non-periodic) add-lease process. +Note that there is not much point to running a storage server in this mode +for a long period of time: once the lease-checker has examined all shares and +expired whatever it is going to expire, the second and subsequent passes are +not going to find any new leases to remove. + +The tahoe.cfg file uses the following keys to control lease expiration:: + + [storage] + + expire.enabled = (boolean, optional) + + If this is True, the storage server will delete shares on which all leases + have expired. Other controls dictate when leases are considered to have + expired. The default is False. + + expire.mode = (string, "age" or "cutoff-date", required if expiration enabled) + + If this string is "age", the age-based expiration scheme is used, and the + "expire.override_lease_duration" setting can be provided to influence the + lease ages. If it is "cutoff-date", the absolute-date-cutoff mode is used, + and the "expire.cutoff_date" setting must be provided to specify the cutoff + date. The mode setting currently has no default: you must provide a value. + + In a future release, this setting is likely to default to "age", but in this + release it was deemed safer to require an explicit mode specification. + + expire.override_lease_duration = (duration string, optional) + + When age-based expiration is in use, a lease will be expired if its + "lease.create_renew" timestamp plus its "lease.duration" time is + earlier/older than the current time. This key, if present, overrides the + duration value for all leases, changing the algorithm from: + + if (lease.create_renew_timestamp + lease.duration) < now: + expire_lease() + + to: + + if (lease.create_renew_timestamp + override_lease_duration) < now: + expire_lease() + + The value of this setting is a "duration string", which is a number of days, + months, or years, followed by a units suffix, and optionally separated by a + space, such as one of the following: + + 7days + 31day + 60 days + 2mo + 3 month + 12 months + 2years + + This key is meant to compensate for the fact that clients do not yet have + the ability to ask for leases that last longer than 31 days. A grid which + wants to use faster or slower GC than a 31-day lease timer permits can use + this parameter to implement it. The current fixed 31-day lease duration + makes the server behave as if "lease.override_lease_duration = 31days" had + been passed. + + This key is only valid when age-based expiration is in use (i.e. when + "expire.mode = age" is used). It will be rejected if cutoff-date expiration + is in use. + + expire.cutoff_date = (date string, required if mode=cutoff-date) + + When cutoff-date expiration is in use, a lease will be expired if its + create/renew timestamp is older than the cutoff date. This string will be a + date in the following format: + + 2009-01-16 (January 16th, 2009) + 2008-02-02 + 2007-12-25 + + The actual cutoff time shall be midnight UTC at the beginning of the given + day. Lease timers should naturally be generous enough to not depend upon + differences in timezone: there should be at least a few days between the + last renewal time and the cutoff date. + + This key is only valid when cutoff-based expiration is in use (i.e. when + "expire.mode = cutoff-date"). It will be rejected if age-based expiration is + in use. + + expire.immutable = (boolean, optional) + + If this is False, then immutable shares will never be deleted, even if their + leases have expired. This can be used in special situations to perform GC on + mutable files but not immutable ones. The default is True. + + expire.mutable = (boolean, optional) + + If this is False, then mutable shares will never be deleted, even if their + leases have expired. This can be used in special situations to perform GC on + immutable files but not mutable ones. The default is True. + +Expiration Progress +=================== + +In the current release, leases are stored as metadata in each share file, and +no separate database is maintained. As a result, checking and expiring leases +on a large server may require multiple reads from each of several million +share files. This process can take a long time and be very disk-intensive, so +a "share crawler" is used. The crawler limits the amount of time looking at +shares to a reasonable percentage of the storage server's overall usage: by +default it uses no more than 10% CPU, and yields to other code after 100ms. A +typical server with 1.1M shares was observed to take 3.5 days to perform this +rate-limited crawl through the whole set of shares, with expiration disabled. +It is expected to take perhaps 4 or 5 days to do the crawl with expiration +turned on. + +The crawler's status is displayed on the "Storage Server Status Page", a web +page dedicated to the storage server. This page resides at $NODEURL/storage, +and there is a link to it from the front "welcome" page. The "Lease +Expiration crawler" section of the status page shows the progress of the +current crawler cycle, expected completion time, amount of space recovered, +and details of how many shares have been examined. + +The crawler's state is persistent: restarting the node will not cause it to +lose significant progress. The state file is located in two files +($BASEDIR/storage/lease_checker.state and lease_checker.history), and the +crawler can be forcibly reset by stopping the node, deleting these two files, +then restarting the node. + +Future Directions +================= + +Tahoe's GC mechanism is undergoing significant changes. The global +mark-and-sweep garbage-collection scheme can require considerable network +traffic for large grids, interfering with the bandwidth available for regular +uploads and downloads (and for non-Tahoe users of the network). + +A preferable method might be to have a timer-per-client instead of a +timer-per-lease: the leases would not be expired until/unless the client had +not checked in with the server for a pre-determined duration. This would +reduce the network traffic considerably (one message per week instead of +thousands), but retain the same general failure characteristics. + +In addition, using timers is not fail-safe (from the client's point of view), +in that a client which leaves the network for an extended period of time may +return to discover that all of their files have been garbage-collected. (It +*is* fail-safe from the server's point of view, in that a server is not +obligated to provide disk space in perpetuity to an unresponsive client). It +may be useful to create a "renewal agent" to which a client can pass a list +of renewal-caps: the agent then takes the responsibility for keeping these +leases renewed, so the client can go offline safely. Of course, this requires +a certain amount of coordination: the renewal agent should not be keeping +files alive that the client has actually deleted. The client can send the +renewal-agent a manifest of renewal caps, and each new manifest should +replace the previous set. + +The GC mechanism is also not immediate: a client which deletes a file will +nevertheless be consuming extra disk space (and might be charged or otherwise +held accountable for it) until the ex-file's leases finally expire on their +own. If the client is certain that they've removed their last reference to +the file, they could accelerate the GC process by cancelling their lease. The +current storage server API provides a method to cancel a lease, but the +client must be careful to coordinate with anyone else who might be +referencing the same lease (perhaps a second directory in the same virtual +drive), otherwise they might accidentally remove a lease that should have +been retained. + +In the current release, these leases are each associated with a single "node +secret" (stored in $BASEDIR/private/secret), which is used to generate +renewal- and cancel- secrets for each lease. Two nodes with different secrets +will produce separate leases, and will not be able to renew or cancel each +others' leases. + +Once the Accounting project is in place, leases will be scoped by a +sub-delegatable "account id" instead of a node secret, so clients will be able +to manage multiple leases per file. In addition, servers will be able to +identify which shares are leased by which clients, so that clients can safely +reconcile their idea of which files/directories are active against the +server's list, and explicitly cancel leases on objects that aren't on the +active list. + +By reducing the size of the "lease scope", the coordination problem is made +easier. In general, mark-and-sweep is easier to implement (it requires mere +vigilance, rather than coordination), so unless the space used by deleted +files is not expiring fast enough, the renew/expire timed lease approach is +recommended. + diff --git a/docs/garbage-collection.txt b/docs/garbage-collection.txt deleted file mode 100644 index 8543a3fb..00000000 --- a/docs/garbage-collection.txt +++ /dev/null @@ -1,288 +0,0 @@ -= Garbage Collection in Tahoe = - -1. Overview -2. Client-side Renewal -3. Server Side Expiration -4. Expiration Progress -5. Future Directions - -== Overview == - -When a file or directory in the virtual filesystem is no longer referenced, -the space that its shares occupied on each storage server can be freed, -making room for other shares. Tahoe currently uses a garbage collection -("GC") mechanism to implement this space-reclamation process. Each share has -one or more "leases", which are managed by clients who want the -file/directory to be retained. The storage server accepts each share for a -pre-defined period of time, and is allowed to delete the share if all of the -leases are cancelled or allowed to expire. - -Garbage collection is not enabled by default: storage servers will not delete -shares without being explicitly configured to do so. When GC is enabled, -clients are responsible for renewing their leases on a periodic basis at -least frequently enough to prevent any of the leases from expiring before the -next renewal pass. - -There are several tradeoffs to be considered when choosing the renewal timer -and the lease duration, and there is no single optimal pair of values. See -the "lease-tradeoffs.svg" diagram to get an idea for the tradeoffs involved. -If lease renewal occurs quickly and with 100% reliability, than any renewal -time that is shorter than the lease duration will suffice, but a larger ratio -of duration-over-renewal-time will be more robust in the face of occasional -delays or failures. - -The current recommended values for a small Tahoe grid are to renew the leases -once a week, and to give each lease a duration of 31 days. Renewing leases -can be expected to take about one second per file/directory, depending upon -the number of servers and the network speeds involved. Note that in the -current release, the server code enforces a 31 day lease duration: there is -not yet a way for the client to request a different duration (however the -server can use the "expire.override_lease_duration" configuration setting to -increase or decrease the effective duration to something other than 31 days). - -== Client-side Renewal == - -If all of the files and directories which you care about are reachable from a -single starting point (usually referred to as a "rootcap"), and you store -that rootcap as an alias (via "tahoe create-alias"), then the simplest way to -renew these leases is with the following CLI command: - - tahoe deep-check --add-lease ALIAS: - -This will recursively walk every directory under the given alias and renew -the leases on all files and directories. (You may want to add a --repair flag -to perform repair at the same time). Simply run this command once a week (or -whatever other renewal period your grid recommends) and make sure it -completes successfully. As a side effect, a manifest of all unique files and -directories will be emitted to stdout, as well as a summary of file sizes and -counts. It may be useful to track these statistics over time. - -Note that newly uploaded files (and newly created directories) get an initial -lease too: the --add-lease process is only needed to ensure that all older -objects have up-to-date leases on them. - -For larger systems (such as a commercial grid), a separate "maintenance -daemon" is under development. This daemon will acquire manifests from -rootcaps on a periodic basis, keep track of checker results, manage -lease-addition, and prioritize repair needs, using multiple worker nodes to -perform these jobs in parallel. Eventually, this daemon will be made -appropriate for use by individual users as well, and may be incorporated -directly into the client node. - -== Server Side Expiration == - -Expiration must be explicitly enabled on each storage server, since the -default behavior is to never expire shares. Expiration is enabled by adding -config keys to the "[storage]" section of the tahoe.cfg file (as described -below) and restarting the server node. - -Each lease has two parameters: a create/renew timestamp and a duration. The -timestamp is updated when the share is first uploaded (i.e. the file or -directory is created), and updated again each time the lease is renewed (i.e. -"tahoe check --add-lease" is performed). The duration is currently fixed at -31 days, and the "nominal lease expiration time" is simply $duration seconds -after the $create_renew timestamp. (In a future release of Tahoe, the client -will get to request a specific duration, and the server will accept or reject -the request depending upon its local configuration, so that servers can -achieve better control over their storage obligations). - -The lease-expiration code has two modes of operation. The first is age-based: -leases are expired when their age is greater than their duration. This is the -preferred mode: as long as clients consistently update their leases on a -periodic basis, and that period is shorter than the lease duration, then all -active files and directories will be preserved, and the garbage will -collected in a timely fashion. - -Since there is not yet a way for clients to request a lease duration of other -than 31 days, there is a tahoe.cfg setting to override the duration of all -leases. If, for example, this alternative duration is set to 60 days, then -clients could safely renew their leases with an add-lease operation perhaps -once every 50 days: even though nominally their leases would expire 31 days -after the renewal, the server would not actually expire the leases until 60 -days after renewal. - -The other mode is an absolute-date-cutoff: it compares the create/renew -timestamp against some absolute date, and expires any lease which was not -created or renewed since the cutoff date. If all clients have performed an -add-lease some time after March 20th, you could tell the storage server to -expire all leases that were created or last renewed on March 19th or earlier. -This is most useful if you have a manual (non-periodic) add-lease process. -Note that there is not much point to running a storage server in this mode -for a long period of time: once the lease-checker has examined all shares and -expired whatever it is going to expire, the second and subsequent passes are -not going to find any new leases to remove. - -The tahoe.cfg file uses the following keys to control lease expiration: - -[storage] - -expire.enabled = (boolean, optional) - - If this is True, the storage server will delete shares on which all leases - have expired. Other controls dictate when leases are considered to have - expired. The default is False. - -expire.mode = (string, "age" or "cutoff-date", required if expiration enabled) - - If this string is "age", the age-based expiration scheme is used, and the - "expire.override_lease_duration" setting can be provided to influence the - lease ages. If it is "cutoff-date", the absolute-date-cutoff mode is used, - and the "expire.cutoff_date" setting must be provided to specify the cutoff - date. The mode setting currently has no default: you must provide a value. - - In a future release, this setting is likely to default to "age", but in this - release it was deemed safer to require an explicit mode specification. - -expire.override_lease_duration = (duration string, optional) - - When age-based expiration is in use, a lease will be expired if its - "lease.create_renew" timestamp plus its "lease.duration" time is - earlier/older than the current time. This key, if present, overrides the - duration value for all leases, changing the algorithm from: - - if (lease.create_renew_timestamp + lease.duration) < now: - expire_lease() - - to: - - if (lease.create_renew_timestamp + override_lease_duration) < now: - expire_lease() - - The value of this setting is a "duration string", which is a number of days, - months, or years, followed by a units suffix, and optionally separated by a - space, such as one of the following: - - 7days - 31day - 60 days - 2mo - 3 month - 12 months - 2years - - This key is meant to compensate for the fact that clients do not yet have - the ability to ask for leases that last longer than 31 days. A grid which - wants to use faster or slower GC than a 31-day lease timer permits can use - this parameter to implement it. The current fixed 31-day lease duration - makes the server behave as if "lease.override_lease_duration = 31days" had - been passed. - - This key is only valid when age-based expiration is in use (i.e. when - "expire.mode = age" is used). It will be rejected if cutoff-date expiration - is in use. - -expire.cutoff_date = (date string, required if mode=cutoff-date) - - When cutoff-date expiration is in use, a lease will be expired if its - create/renew timestamp is older than the cutoff date. This string will be a - date in the following format: - - 2009-01-16 (January 16th, 2009) - 2008-02-02 - 2007-12-25 - - The actual cutoff time shall be midnight UTC at the beginning of the given - day. Lease timers should naturally be generous enough to not depend upon - differences in timezone: there should be at least a few days between the - last renewal time and the cutoff date. - - This key is only valid when cutoff-based expiration is in use (i.e. when - "expire.mode = cutoff-date"). It will be rejected if age-based expiration is - in use. - -expire.immutable = (boolean, optional) - - If this is False, then immutable shares will never be deleted, even if their - leases have expired. This can be used in special situations to perform GC on - mutable files but not immutable ones. The default is True. - -expire.mutable = (boolean, optional) - - If this is False, then mutable shares will never be deleted, even if their - leases have expired. This can be used in special situations to perform GC on - immutable files but not mutable ones. The default is True. - -== Expiration Progress == - -In the current release, leases are stored as metadata in each share file, and -no separate database is maintained. As a result, checking and expiring leases -on a large server may require multiple reads from each of several million -share files. This process can take a long time and be very disk-intensive, so -a "share crawler" is used. The crawler limits the amount of time looking at -shares to a reasonable percentage of the storage server's overall usage: by -default it uses no more than 10% CPU, and yields to other code after 100ms. A -typical server with 1.1M shares was observed to take 3.5 days to perform this -rate-limited crawl through the whole set of shares, with expiration disabled. -It is expected to take perhaps 4 or 5 days to do the crawl with expiration -turned on. - -The crawler's status is displayed on the "Storage Server Status Page", a web -page dedicated to the storage server. This page resides at $NODEURL/storage, -and there is a link to it from the front "welcome" page. The "Lease -Expiration crawler" section of the status page shows the progress of the -current crawler cycle, expected completion time, amount of space recovered, -and details of how many shares have been examined. - -The crawler's state is persistent: restarting the node will not cause it to -lose significant progress. The state file is located in two files -($BASEDIR/storage/lease_checker.state and lease_checker.history), and the -crawler can be forcibly reset by stopping the node, deleting these two files, -then restarting the node. - -== Future Directions == - -Tahoe's GC mechanism is undergoing significant changes. The global -mark-and-sweep garbage-collection scheme can require considerable network -traffic for large grids, interfering with the bandwidth available for regular -uploads and downloads (and for non-Tahoe users of the network). - -A preferable method might be to have a timer-per-client instead of a -timer-per-lease: the leases would not be expired until/unless the client had -not checked in with the server for a pre-determined duration. This would -reduce the network traffic considerably (one message per week instead of -thousands), but retain the same general failure characteristics. - -In addition, using timers is not fail-safe (from the client's point of view), -in that a client which leaves the network for an extended period of time may -return to discover that all of their files have been garbage-collected. (It -*is* fail-safe from the server's point of view, in that a server is not -obligated to provide disk space in perpetuity to an unresponsive client). It -may be useful to create a "renewal agent" to which a client can pass a list -of renewal-caps: the agent then takes the responsibility for keeping these -leases renewed, so the client can go offline safely. Of course, this requires -a certain amount of coordination: the renewal agent should not be keeping -files alive that the client has actually deleted. The client can send the -renewal-agent a manifest of renewal caps, and each new manifest should -replace the previous set. - -The GC mechanism is also not immediate: a client which deletes a file will -nevertheless be consuming extra disk space (and might be charged or otherwise -held accountable for it) until the ex-file's leases finally expire on their -own. If the client is certain that they've removed their last reference to -the file, they could accelerate the GC process by cancelling their lease. The -current storage server API provides a method to cancel a lease, but the -client must be careful to coordinate with anyone else who might be -referencing the same lease (perhaps a second directory in the same virtual -drive), otherwise they might accidentally remove a lease that should have -been retained. - -In the current release, these leases are each associated with a single "node -secret" (stored in $BASEDIR/private/secret), which is used to generate -renewal- and cancel- secrets for each lease. Two nodes with different secrets -will produce separate leases, and will not be able to renew or cancel each -others' leases. - -Once the Accounting project is in place, leases will be scoped by a -sub-delegatable "account id" instead of a node secret, so clients will be able -to manage multiple leases per file. In addition, servers will be able to -identify which shares are leased by which clients, so that clients can safely -reconcile their idea of which files/directories are active against the -server's list, and explicitly cancel leases on objects that aren't on the -active list. - -By reducing the size of the "lease scope", the coordination problem is made -easier. In general, mark-and-sweep is easier to implement (it requires mere -vigilance, rather than coordination), so unless the space used by deleted -files is not expiring fast enough, the renew/expire timed lease approach is -recommended. - diff --git a/docs/helper.rst b/docs/helper.rst new file mode 100644 index 00000000..8d0283a2 --- /dev/null +++ b/docs/helper.rst @@ -0,0 +1,176 @@ +======================= +The Tahoe Upload Helper +======================= + +1. `Overview`_ +2. `Setting Up A Helper`_ +3. `Using a Helper`_ +4. `Other Helper Modes`_ + +Overview +======== + +As described in the "SWARMING DOWNLOAD, TRICKLING UPLOAD" section of +architecture.txt, Tahoe uploads require more bandwidth than downloads: you +must push the redundant shares during upload, but you do not need to retrieve +them during download. With the default 3-of-10 encoding parameters, this +means that an upload will require about 3.3x the traffic as a download of the +same file. + +Unfortunately, this "expansion penalty" occurs in the same upstream direction +that most consumer DSL lines are slow anyways. Typical ADSL lines get 8 times +as much download capacity as upload capacity. When the ADSL upstream penalty +is combined with the expansion penalty, the result is uploads that can take +up to 32 times longer than downloads. + +The "Helper" is a service that can mitigate the expansion penalty by +arranging for the client node to send data to a central Helper node instead +of sending it directly to the storage servers. It sends ciphertext to the +Helper, so the security properties remain the same as with non-Helper +uploads. The Helper is responsible for applying the erasure encoding +algorithm and placing the resulting shares on the storage servers. + +Of course, the helper cannot mitigate the ADSL upstream penalty. + +The second benefit of using an upload helper is that clients who lose their +network connections while uploading a file (because of a network flap, or +because they shut down their laptop while an upload was in progress) can +resume their upload rather than needing to start again from scratch. The +helper holds the partially-uploaded ciphertext on disk, and when the client +tries to upload the same file a second time, it discovers that the partial +ciphertext is already present. The client then only needs to upload the +remaining ciphertext. This reduces the "interrupted upload penalty" to a +minimum. + +This also serves to reduce the number of active connections between the +client and the outside world: most of their traffic flows over a single TCP +connection to the helper. This can improve TCP fairness, and should allow +other applications that are sharing the same uplink to compete more evenly +for the limited bandwidth. + +Setting Up A Helper +=================== + +Who should consider running a helper? + +* Benevolent entities which wish to provide better upload speed for clients + that have slow uplinks +* Folks which have machines with upload bandwidth to spare. +* Server grid operators who want clients to connect to a small number of + helpers rather than a large number of storage servers (a "multi-tier" + architecture) + +What sorts of machines are good candidates for running a helper? + +* The Helper needs to have good bandwidth to the storage servers. In + particular, it needs to have at least 3.3x better upload bandwidth than + the client does, or the client might as well upload directly to the + storage servers. In a commercial grid, the helper should be in the same + colo (and preferably in the same rack) as the storage servers. +* The Helper will take on most of the CPU load involved in uploading a file. + So having a dedicated machine will give better results. +* The Helper buffers ciphertext on disk, so the host will need at least as + much free disk space as there will be simultaneous uploads. When an upload + is interrupted, that space will be used for a longer period of time. + +To turn a Tahoe-LAFS node into a helper (i.e. to run a helper service in +addition to whatever else that node is doing), edit the tahoe.cfg file in your +node's base directory and set "enabled = true" in the section named +"[helper]". + +Then restart the node. This will signal the node to create a Helper service +and listen for incoming requests. Once the node has started, there will be a +file named private/helper.furl which contains the contact information for the +helper: you will need to give this FURL to any clients that wish to use your +helper. + +:: + + cat $BASEDIR/private/helper.furl | mail -s "helper furl" friend@example.com + +You can tell if your node is running a helper by looking at its web status +page. Assuming that you've set up the 'webport' to use port 3456, point your +browser at http://localhost:3456/ . The welcome page will say "Helper: 0 +active uploads" or "Not running helper" as appropriate. The +http://localhost:3456/helper_status page will also provide details on what +the helper is currently doing. + +The helper will store the ciphertext that is is fetching from clients in +$BASEDIR/helper/CHK_incoming/ . Once all the ciphertext has been fetched, it +will be moved to $BASEDIR/helper/CHK_encoding/ and erasure-coding will +commence. Once the file is fully encoded and the shares are pushed to the +storage servers, the ciphertext file will be deleted. + +If a client disconnects while the ciphertext is being fetched, the partial +ciphertext will remain in CHK_incoming/ until they reconnect and finish +sending it. If a client disconnects while the ciphertext is being encoded, +the data will remain in CHK_encoding/ until they reconnect and encoding is +finished. For long-running and busy helpers, it may be a good idea to delete +files in these directories that have not been modified for a week or two. +Future versions of tahoe will try to self-manage these files a bit better. + +Using a Helper +============== + +Who should consider using a Helper? + +* clients with limited upstream bandwidth, such as a consumer ADSL line +* clients who believe that the helper will give them faster uploads than + they could achieve with a direct upload +* clients who experience problems with TCP connection fairness: if other + programs or machines in the same home are getting less than their fair + share of upload bandwidth. If the connection is being shared fairly, then + a Tahoe upload that is happening at the same time as a single FTP upload + should get half the bandwidth. +* clients who have been given the helper.furl by someone who is running a + Helper and is willing to let them use it + +To take advantage of somebody else's Helper, take the helper.furl file that +they give you, and copy it into your node's base directory, then restart the +node: + +:: + + cat email >$BASEDIR/helper.furl + tahoe restart $BASEDIR + +This will signal the client to try and connect to the helper. Subsequent +uploads will use the helper rather than using direct connections to the +storage server. + +If the node has been configured to use a helper, that node's HTTP welcome +page (http://localhost:3456/) will say "Helper: $HELPERFURL" instead of +"Helper: None". If the helper is actually running and reachable, the next +line will say "Connected to helper?: yes" instead of "no". + +The helper is optional. If a helper is connected when an upload begins, the +upload will use the helper. If there is no helper connection present when an +upload begins, that upload will connect directly to the storage servers. The +client will automatically attempt to reconnect to the helper if the +connection is lost, using the same exponential-backoff algorithm as all other +tahoe/foolscap connections. + +The upload/download status page (http://localhost:3456/status) will announce +the using-helper-or-not state of each upload, in the "Helper?" column. + +Other Helper Modes +================== + +The Tahoe Helper only currently helps with one kind of operation: uploading +immutable files. There are three other things it might be able to help with +in the future: + +* downloading immutable files +* uploading mutable files (such as directories) +* downloading mutable files (like directories) + +Since mutable files are currently limited in size, the ADSL upstream penalty +is not so severe for them. There is no ADSL penalty to downloads, but there +may still be benefit to extending the helper interface to assist with them: +fewer connections to the storage servers, and better TCP fairness. + +A future version of the Tahoe helper might provide assistance with these +other modes. If it were to help with all four modes, then the clients would +not need direct connections to the storage servers at all: clients would +connect to helpers, and helpers would connect to servers. For a large grid +with tens of thousands of clients, this might make the grid more scalable. diff --git a/docs/helper.txt b/docs/helper.txt deleted file mode 100644 index fa53791c..00000000 --- a/docs/helper.txt +++ /dev/null @@ -1,168 +0,0 @@ -= The Tahoe Upload Helper = - -1. Overview -2. Setting Up A Helper -3. Using a Helper -4. Other Helper Modes - -== Overview == - -As described in the "SWARMING DOWNLOAD, TRICKLING UPLOAD" section of -architecture.txt, Tahoe uploads require more bandwidth than downloads: you -must push the redundant shares during upload, but you do not need to retrieve -them during download. With the default 3-of-10 encoding parameters, this -means that an upload will require about 3.3x the traffic as a download of the -same file. - -Unfortunately, this "expansion penalty" occurs in the same upstream direction -that most consumer DSL lines are slow anyways. Typical ADSL lines get 8 times -as much download capacity as upload capacity. When the ADSL upstream penalty -is combined with the expansion penalty, the result is uploads that can take -up to 32 times longer than downloads. - -The "Helper" is a service that can mitigate the expansion penalty by -arranging for the client node to send data to a central Helper node instead -of sending it directly to the storage servers. It sends ciphertext to the -Helper, so the security properties remain the same as with non-Helper -uploads. The Helper is responsible for applying the erasure encoding -algorithm and placing the resulting shares on the storage servers. - -Of course, the helper cannot mitigate the ADSL upstream penalty. - -The second benefit of using an upload helper is that clients who lose their -network connections while uploading a file (because of a network flap, or -because they shut down their laptop while an upload was in progress) can -resume their upload rather than needing to start again from scratch. The -helper holds the partially-uploaded ciphertext on disk, and when the client -tries to upload the same file a second time, it discovers that the partial -ciphertext is already present. The client then only needs to upload the -remaining ciphertext. This reduces the "interrupted upload penalty" to a -minimum. - -This also serves to reduce the number of active connections between the -client and the outside world: most of their traffic flows over a single TCP -connection to the helper. This can improve TCP fairness, and should allow -other applications that are sharing the same uplink to compete more evenly -for the limited bandwidth. - - - -== Setting Up A Helper == - -Who should consider running a helper? - - * Benevolent entities which wish to provide better upload speed for clients - that have slow uplinks - * Folks which have machines with upload bandwidth to spare. - * Server grid operators who want clients to connect to a small number of - helpers rather than a large number of storage servers (a "multi-tier" - architecture) - -What sorts of machines are good candidates for running a helper? - - * The Helper needs to have good bandwidth to the storage servers. In - particular, it needs to have at least 3.3x better upload bandwidth than - the client does, or the client might as well upload directly to the - storage servers. In a commercial grid, the helper should be in the same - colo (and preferably in the same rack) as the storage servers. - * The Helper will take on most of the CPU load involved in uploading a file. - So having a dedicated machine will give better results. - * The Helper buffers ciphertext on disk, so the host will need at least as - much free disk space as there will be simultaneous uploads. When an upload - is interrupted, that space will be used for a longer period of time. - -To turn a Tahoe-LAFS node into a helper (i.e. to run a helper service in -addition to whatever else that node is doing), edit the tahoe.cfg file in your -node's base directory and set "enabled = true" in the section named -"[helper]". - -Then restart the node. This will signal the node to create a Helper service -and listen for incoming requests. Once the node has started, there will be a -file named private/helper.furl which contains the contact information for the -helper: you will need to give this FURL to any clients that wish to use your -helper. - - cat $BASEDIR/private/helper.furl |mail -s "helper furl" friend@example.com - -You can tell if your node is running a helper by looking at its web status -page. Assuming that you've set up the 'webport' to use port 3456, point your -browser at http://localhost:3456/ . The welcome page will say "Helper: 0 -active uploads" or "Not running helper" as appropriate. The -http://localhost:3456/helper_status page will also provide details on what -the helper is currently doing. - -The helper will store the ciphertext that is is fetching from clients in -$BASEDIR/helper/CHK_incoming/ . Once all the ciphertext has been fetched, it -will be moved to $BASEDIR/helper/CHK_encoding/ and erasure-coding will -commence. Once the file is fully encoded and the shares are pushed to the -storage servers, the ciphertext file will be deleted. - -If a client disconnects while the ciphertext is being fetched, the partial -ciphertext will remain in CHK_incoming/ until they reconnect and finish -sending it. If a client disconnects while the ciphertext is being encoded, -the data will remain in CHK_encoding/ until they reconnect and encoding is -finished. For long-running and busy helpers, it may be a good idea to delete -files in these directories that have not been modified for a week or two. -Future versions of tahoe will try to self-manage these files a bit better. - -== Using a Helper == - -Who should consider using a Helper? - - * clients with limited upstream bandwidth, such as a consumer ADSL line - * clients who believe that the helper will give them faster uploads than - they could achieve with a direct upload - * clients who experience problems with TCP connection fairness: if other - programs or machines in the same home are getting less than their fair - share of upload bandwidth. If the connection is being shared fairly, then - a Tahoe upload that is happening at the same time as a single FTP upload - should get half the bandwidth. - * clients who have been given the helper.furl by someone who is running a - Helper and is willing to let them use it - -To take advantage of somebody else's Helper, take the helper.furl file that -they give you, and copy it into your node's base directory, then restart the -node: - - cat email >$BASEDIR/helper.furl - tahoe restart $BASEDIR - -This will signal the client to try and connect to the helper. Subsequent -uploads will use the helper rather than using direct connections to the -storage server. - -If the node has been configured to use a helper, that node's HTTP welcome -page (http://localhost:3456/) will say "Helper: $HELPERFURL" instead of -"Helper: None". If the helper is actually running and reachable, the next -line will say "Connected to helper?: yes" instead of "no". - -The helper is optional. If a helper is connected when an upload begins, the -upload will use the helper. If there is no helper connection present when an -upload begins, that upload will connect directly to the storage servers. The -client will automatically attempt to reconnect to the helper if the -connection is lost, using the same exponential-backoff algorithm as all other -tahoe/foolscap connections. - -The upload/download status page (http://localhost:3456/status) will announce -the using-helper-or-not state of each upload, in the "Helper?" column. - -== Other Helper Modes == - -The Tahoe Helper only currently helps with one kind of operation: uploading -immutable files. There are three other things it might be able to help with -in the future: - - * downloading immutable files - * uploading mutable files (such as directories) - * downloading mutable files (like directories) - -Since mutable files are currently limited in size, the ADSL upstream penalty -is not so severe for them. There is no ADSL penalty to downloads, but there -may still be benefit to extending the helper interface to assist with them: -fewer connections to the storage servers, and better TCP fairness. - -A future version of the Tahoe helper might provide assistance with these -other modes. If it were to help with all four modes, then the clients would -not need direct connections to the storage servers at all: clients would -connect to helpers, and helpers would connect to servers. For a large grid -with tens of thousands of clients, this might make the grid more scalable. diff --git a/docs/how_to_make_a_tahoe-lafs_release.rst b/docs/how_to_make_a_tahoe-lafs_release.rst new file mode 100644 index 00000000..ad3b1d5f --- /dev/null +++ b/docs/how_to_make_a_tahoe-lafs_release.rst @@ -0,0 +1,19 @@ +[ ] 1 update doc files: relnotes.txt, CREDITS, docs/known_issues.txt, NEWS. Add release name and date to top-most item in NEWS. +[ ] 2 change docs/quickstart.html to point to just the current allmydata-tahoe-X.Y.Z.zip source code file, or else to point to a directory which contains only allmydata-tahoe-X.Y.Z.* source code files +[ ] 3 darcs pull +[ ] 4 make tag +[ ] 5 build locally to make sure the release is reporting itself as the intended version +[ ] 6 make sure buildbot is green +[ ] 7 make sure other people aren't committing at that moment +[ ] 8 push tag along with some other documentation-only patch (typically to relnotes.txt) to trigger buildslaves +[ ] 9 make sure buildbot is green +[ ] 10 make sure debs got built and uploaded properly +[ ] 11 make sure a sumo sdist tarball got built and uploaded properly +[ ] 12 symlink the release tarball on tahoe-lafs.org: /var/www/source/tahoe-lafs/releases/ +[ ] 13 update Wiki: front page news, news, old news, parade of release notes +[ ] 14 send out relnotes.txt: [ ] tahoe-announce@tahoe-lafs.org, [ ] tahoe-dev@tahoe-lafs.org, [ ] p2p-hackers@lists.zooko.com, [ ] lwn@lwn.net, [ ] cap-talk@mail.eros-os.org, [ ] cryptography@metzdown.com & cryptography@randombit.net, [ ] twisted-python@twistedmatrix.com, [ ] owncloud@kde.org, [ ] the "decentralization" group on groups.yahoo.com, [ ] pycrypto mailing list, -> fuse-devel@lists.sourceforge.net, -> fuse-sshfs@lists.sourceforge.net, [ ] duplicity-talk@nongnu.org, [ ] news@phoronix.com, [ ] python-list@python.org, -> cygwin@cygwin.com, [ ] The Boulder Linux Users' Group, [ ] The Boulder Hackerspace mailing list, [ ] cryptopp-users@googlegroups.com, [ ] tiddlywiki, [ ] hadoop, [ ] bzr, [ ] mercurial, [ ] http://listcultures.org/pipermail/p2presearch_listcultures.org/ , openstack, deltacloud, libcloud +[ ] 15 update hacktahoe.org +[ ] 16 make an "announcement of new release" on freshmeat +[ ] 17 upload to pypi with "python ./setup.py sdist upload register" +[ ] 18 update "current version" information and make an "announcement of new release" on launchpad +[ ] 19 close the Milestone on the trac Roadmap diff --git a/docs/how_to_make_a_tahoe-lafs_release.txt b/docs/how_to_make_a_tahoe-lafs_release.txt deleted file mode 100644 index ad3b1d5f..00000000 --- a/docs/how_to_make_a_tahoe-lafs_release.txt +++ /dev/null @@ -1,19 +0,0 @@ -[ ] 1 update doc files: relnotes.txt, CREDITS, docs/known_issues.txt, NEWS. Add release name and date to top-most item in NEWS. -[ ] 2 change docs/quickstart.html to point to just the current allmydata-tahoe-X.Y.Z.zip source code file, or else to point to a directory which contains only allmydata-tahoe-X.Y.Z.* source code files -[ ] 3 darcs pull -[ ] 4 make tag -[ ] 5 build locally to make sure the release is reporting itself as the intended version -[ ] 6 make sure buildbot is green -[ ] 7 make sure other people aren't committing at that moment -[ ] 8 push tag along with some other documentation-only patch (typically to relnotes.txt) to trigger buildslaves -[ ] 9 make sure buildbot is green -[ ] 10 make sure debs got built and uploaded properly -[ ] 11 make sure a sumo sdist tarball got built and uploaded properly -[ ] 12 symlink the release tarball on tahoe-lafs.org: /var/www/source/tahoe-lafs/releases/ -[ ] 13 update Wiki: front page news, news, old news, parade of release notes -[ ] 14 send out relnotes.txt: [ ] tahoe-announce@tahoe-lafs.org, [ ] tahoe-dev@tahoe-lafs.org, [ ] p2p-hackers@lists.zooko.com, [ ] lwn@lwn.net, [ ] cap-talk@mail.eros-os.org, [ ] cryptography@metzdown.com & cryptography@randombit.net, [ ] twisted-python@twistedmatrix.com, [ ] owncloud@kde.org, [ ] the "decentralization" group on groups.yahoo.com, [ ] pycrypto mailing list, -> fuse-devel@lists.sourceforge.net, -> fuse-sshfs@lists.sourceforge.net, [ ] duplicity-talk@nongnu.org, [ ] news@phoronix.com, [ ] python-list@python.org, -> cygwin@cygwin.com, [ ] The Boulder Linux Users' Group, [ ] The Boulder Hackerspace mailing list, [ ] cryptopp-users@googlegroups.com, [ ] tiddlywiki, [ ] hadoop, [ ] bzr, [ ] mercurial, [ ] http://listcultures.org/pipermail/p2presearch_listcultures.org/ , openstack, deltacloud, libcloud -[ ] 15 update hacktahoe.org -[ ] 16 make an "announcement of new release" on freshmeat -[ ] 17 upload to pypi with "python ./setup.py sdist upload register" -[ ] 18 update "current version" information and make an "announcement of new release" on launchpad -[ ] 19 close the Milestone on the trac Roadmap diff --git a/docs/known_issues.rst b/docs/known_issues.rst new file mode 100644 index 00000000..58be6ab9 --- /dev/null +++ b/docs/known_issues.rst @@ -0,0 +1,202 @@ +============ +Known issues +============ + +* `Overview`_ +* `Issues in Tahoe-LAFS v1.8.0, released 2010-09-23` + + * `Potential unauthorized access by JavaScript in unrelated files`_ + * `Potential disclosure of file through embedded hyperlinks or JavaScript in that file`_ + * `Command-line arguments are leaked to other local users`_ + * `Capabilities may be leaked to web browser phishing filter / "safe browsing" servers`_ + * `Known issues in the FTP and SFTP frontends`_ + +Overview +======== + +Below is a list of known issues in recent releases of Tahoe-LAFS, and how to +manage them. The current version of this file can be found at + +http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/known_issues.txt + +If you've been using Tahoe-LAFS since v1.1 (released 2008-06-11) or if you're +just curious about what sort of mistakes we've made in the past, then you might +want to read the "historical known issues" document: + +http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/historical/historical_known_issues.txt + +Issues in Tahoe-LAFS v1.8.0, released 2010-09-23 +================================================ + +Potential unauthorized access by JavaScript in unrelated files +-------------------------------------------------------------- + +If you view a file stored in Tahoe-LAFS through a web user interface, +JavaScript embedded in that file might be able to access other files or +directories stored in Tahoe-LAFS which you view through the same web +user interface. Such a script would be able to send the contents of +those other files or directories to the author of the script, and if you +have the ability to modify the contents of those files or directories, +then that script could modify or delete those files or directories. + +how to manage it +~~~~~~~~~~~~~~~~ + +For future versions of Tahoe-LAFS, we are considering ways to close off +this leakage of authority while preserving ease of use -- the discussion +of this issue is ticket `#615 `_. + +For the present, either do not view files stored in Tahoe-LAFS through a +web user interface, or turn off JavaScript in your web browser before +doing so, or limit your viewing to files which you know don't contain +malicious JavaScript. + + +Potential disclosure of file through embedded hyperlinks or JavaScript in that file +----------------------------------------------------------------------------------- + +If there is a file stored on a Tahoe-LAFS storage grid, and that file +gets downloaded and displayed in a web browser, then JavaScript or +hyperlinks within that file can leak the capability to that file to a +third party, which means that third party gets access to the file. + +If there is JavaScript in the file, then it could deliberately leak +the capability to the file out to some remote listener. + +If there are hyperlinks in the file, and they get followed, then +whichever server they point to receives the capability to the +file. Note that IMG tags are typically followed automatically by web +browsers, so being careful which hyperlinks you click on is not +sufficient to prevent this from happening. + +how to manage it +~~~~~~~~~~~~~~~~ + +For future versions of Tahoe-LAFS, we are considering ways to close off +this leakage of authority while preserving ease of use -- the discussion +of this issue is ticket `#127 `_. + +For the present, a good work-around is that if you want to store and +view a file on Tahoe-LAFS and you want that file to remain private, then +remove from that file any hyperlinks pointing to other people's servers +and remove any JavaScript unless you are sure that the JavaScript is not +written to maliciously leak access. + + +Command-line arguments are leaked to other local users +------------------------------------------------------ + +Remember that command-line arguments are visible to other users (through +the 'ps' command, or the windows Process Explorer tool), so if you are +using a Tahoe-LAFS node on a shared host, other users on that host will +be able to see (and copy) any caps that you pass as command-line +arguments. This includes directory caps that you set up with the "tahoe +add-alias" command. + +how to manage it +~~~~~~~~~~~~~~~~ + +As of Tahoe-LAFS v1.3.0 there is a "tahoe create-alias" command that does +the following technique for you. + +Bypass add-alias and edit the NODEDIR/private/aliases file directly, by +adding a line like this: + + fun: URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa + +By entering the dircap through the editor, the command-line arguments +are bypassed, and other users will not be able to see them. Once you've +added the alias, if you use that alias instead of a cap itself on the +command-line, then no secrets are passed through the command line. Then +other processes on the system can still see your filenames and other +arguments you type there, but not the caps that Tahoe-LAFS uses to permit +access to your files and directories. + + +Capabilities may be leaked to web browser phishing filter / "safe browsing" servers +----------------------------------------------------------------------------------- + +Firefox, Internet Explorer, and Chrome include a "phishing filter" or +"safe browing" component, which is turned on by default, and which sends +any URLs that it deems suspicious to a central server. + +Microsoft gives a brief description of their filter's operation at +. Firefox +and Chrome both use Google's "safe browsing API" which is documented +at and +. + +This of course has implications for the privacy of general web browsing +(especially in the cases of Firefox and Chrome, which send your main +personally identifying Google cookie along with these requests without +your explicit consent, as described for Firefox in +). + +The reason for documenting this issue here, though, is that when using the +Tahoe-LAFS web user interface, it could also affect confidentiality and integrity +by leaking capabilities to the filter server. + +Since IE's filter sends URLs by SSL/TLS, the exposure of caps is limited to +the filter server operators (or anyone able to hack the filter server) rather +than to network eavesdroppers. The "safe browsing API" protocol used by +Firefox and Chrome, on the other hand, is *not* encrypted, although the +URL components are normally hashed. + +Opera also has a similar facility that is disabled by default. A previous +version of this file stated that Firefox had abandoned their phishing +filter; this was incorrect. + +how to manage it +~~~~~~~~~~~~~~~~ + +If you use any phishing filter or "safe browsing" feature, consider either +disabling it, or not using the WUI via that browser. Phishing filters have +very limited effectiveness (see +), and phishing +or malware attackers have learnt how to bypass them. + +To disable the filter in IE7 or IE8: +```````````````````````````````````` + +- Click Internet Options from the Tools menu. + +- Click the Advanced tab. + +- If an "Enable SmartScreen Filter" option is present, uncheck it. + If a "Use Phishing Filter" or "Phishing Filter" option is present, + set it to Disable. + +- Confirm (click OK or Yes) out of all dialogs. + +If you have a version of IE that splits the settings between security +zones, do this for all zones. + +To disable the filter in Firefox: +````````````````````````````````` + +- Click Options from the Tools menu. + +- Click the Security tab. + +- Uncheck both the "Block reported attack sites" and "Block reported + web forgeries" options. + +- Click OK. + +To disable the filter in Chrome: +```````````````````````````````` + +- Click Options from the Tools menu. + +- Click the "Under the Hood" tab and find the "Privacy" section. + +- Uncheck the "Enable phishing and malware protection" option. + +- Click Close. + + +Known issues in the FTP and SFTP frontends +------------------------------------------ + +These are documented in docs/frontends/FTP-and-SFTP.txt and at +. diff --git a/docs/known_issues.txt b/docs/known_issues.txt deleted file mode 100644 index aa112708..00000000 --- a/docs/known_issues.txt +++ /dev/null @@ -1,173 +0,0 @@ -= known issues = - -* overview -* issues in Tahoe-LAFS v1.8.0, released 2010-09-23 - - potential unauthorized access by JavaScript in unrelated files - - potential disclosure of file through embedded hyperlinks or JavaScript in that file - - command-line arguments are leaked to other local users - - capabilities may be leaked to web browser phishing filter / "safe browsing" servers === - - known issues in the FTP and SFTP frontends === - -== overview == - -Below is a list of known issues in recent releases of Tahoe-LAFS, and how to -manage them. The current version of this file can be found at - -http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/known_issues.txt - -If you've been using Tahoe-LAFS since v1.1 (released 2008-06-11) or if you're -just curious about what sort of mistakes we've made in the past, then you might -want to read the "historical known issues" document: - -http://tahoe-lafs.org/source/tahoe-lafs/trunk/docs/historical/historical_known_issues.txt - -== issues in Tahoe-LAFS v1.8.0, released 2010-09-18 == - -=== potential unauthorized access by JavaScript in unrelated files === - -If you view a file stored in Tahoe-LAFS through a web user interface, -JavaScript embedded in that file might be able to access other files or -directories stored in Tahoe-LAFS which you view through the same web -user interface. Such a script would be able to send the contents of -those other files or directories to the author of the script, and if you -have the ability to modify the contents of those files or directories, -then that script could modify or delete those files or directories. - -==== how to manage it ==== - -For future versions of Tahoe-LAFS, we are considering ways to close off -this leakage of authority while preserving ease of use -- the discussion -of this issue is ticket #615. - -For the present, either do not view files stored in Tahoe-LAFS through a -web user interface, or turn off JavaScript in your web browser before -doing so, or limit your viewing to files which you know don't contain -malicious JavaScript. - - -=== potential disclosure of file through embedded hyperlinks or JavaScript in that file === - -If there is a file stored on a Tahoe-LAFS storage grid, and that file -gets downloaded and displayed in a web browser, then JavaScript or -hyperlinks within that file can leak the capability to that file to a -third party, which means that third party gets access to the file. - -If there is JavaScript in the file, then it could deliberately leak -the capability to the file out to some remote listener. - -If there are hyperlinks in the file, and they get followed, then -whichever server they point to receives the capability to the -file. Note that IMG tags are typically followed automatically by web -browsers, so being careful which hyperlinks you click on is not -sufficient to prevent this from happening. - -==== how to manage it ==== - -For future versions of Tahoe-LAFS, we are considering ways to close off -this leakage of authority while preserving ease of use -- the discussion -of this issue is ticket #127. - -For the present, a good work-around is that if you want to store and -view a file on Tahoe-LAFS and you want that file to remain private, then -remove from that file any hyperlinks pointing to other people's servers -and remove any JavaScript unless you are sure that the JavaScript is not -written to maliciously leak access. - - -=== command-line arguments are leaked to other local users === - -Remember that command-line arguments are visible to other users (through -the 'ps' command, or the windows Process Explorer tool), so if you are -using a Tahoe-LAFS node on a shared host, other users on that host will -be able to see (and copy) any caps that you pass as command-line -arguments. This includes directory caps that you set up with the "tahoe -add-alias" command. - -==== how to manage it ==== - -As of Tahoe-LAFS v1.3.0 there is a "tahoe create-alias" command that does -the following technique for you. - -Bypass add-alias and edit the NODEDIR/private/aliases file directly, by -adding a line like this: - -fun: URI:DIR2:ovjy4yhylqlfoqg2vcze36dhde:4d4f47qko2xm5g7osgo2yyidi5m4muyo2vjjy53q4vjju2u55mfa - -By entering the dircap through the editor, the command-line arguments -are bypassed, and other users will not be able to see them. Once you've -added the alias, if you use that alias instead of a cap itself on the -command-line, then no secrets are passed through the command line. Then -other processes on the system can still see your filenames and other -arguments you type there, but not the caps that Tahoe-LAFS uses to permit -access to your files and directories. - - -=== capabilities may be leaked to web browser phishing filter / "safe browsing" servers === - -Firefox, Internet Explorer, and Chrome include a "phishing filter" or -"safe browing" component, which is turned on by default, and which sends -any URLs that it deems suspicious to a central server. - -Microsoft gives a brief description of their filter's operation at -. Firefox -and Chrome both use Google's "safe browsing API" which is documented -at and -. - -This of course has implications for the privacy of general web browsing -(especially in the cases of Firefox and Chrome, which send your main -personally identifying Google cookie along with these requests without -your explicit consent, as described for Firefox in -). - -The reason for documenting this issue here, though, is that when using the -Tahoe-LAFS web user interface, it could also affect confidentiality and integrity -by leaking capabilities to the filter server. - -Since IE's filter sends URLs by SSL/TLS, the exposure of caps is limited to -the filter server operators (or anyone able to hack the filter server) rather -than to network eavesdroppers. The "safe browsing API" protocol used by -Firefox and Chrome, on the other hand, is *not* encrypted, although the -URL components are normally hashed. - -Opera also has a similar facility that is disabled by default. A previous -version of this file stated that Firefox had abandoned their phishing -filter; this was incorrect. - -==== how to manage it ==== - -If you use any phishing filter or "safe browsing" feature, consider either -disabling it, or not using the WUI via that browser. Phishing filters have -very limited effectiveness (see -), and phishing -or malware attackers have learnt how to bypass them. - -To disable the filter in IE7 or IE8: - - Click Internet Options from the Tools menu. - - Click the Advanced tab. - - If an "Enable SmartScreen Filter" option is present, uncheck it. - If a "Use Phishing Filter" or "Phishing Filter" option is present, - set it to Disable. - - Confirm (click OK or Yes) out of all dialogs. - -If you have a version of IE that splits the settings between security -zones, do this for all zones. - -To disable the filter in Firefox: - - Click Options from the Tools menu. - - Click the Security tab. - - Uncheck both the "Block reported attack sites" and "Block reported - web forgeries" options. - - Click OK. - -To disable the filter in Chrome: - - Click Options from the Tools menu. - - Click the "Under the Hood" tab and find the "Privacy" section. - - Uncheck the "Enable phishing and malware protection" option. - - Click Close. - - -=== known issues in the FTP and SFTP frontends === - -These are documented in docs/frontends/FTP-and-SFTP.txt and at -. diff --git a/docs/logging.rst b/docs/logging.rst new file mode 100644 index 00000000..936e8467 --- /dev/null +++ b/docs/logging.rst @@ -0,0 +1,284 @@ +============= +Tahoe Logging +============= + +1. `Overview`_ +2. `Realtime Logging`_ +3. `Incidents`_ +4. `Working with flogfiles`_ +5. `Gatherers`_ + + 1. `Incident Gatherer`_ + 2. `Log Gatherer`_ + +6. `Local twistd.log files`_ +7. `Adding log messages`_ +8. `Log Messages During Unit Tests`_ + +Overview +======== + +Tahoe uses the Foolscap logging mechanism (known as the "flog" subsystem) to +record information about what is happening inside the Tahoe node. This is +primarily for use by programmers and grid operators who want to find out what +went wrong. + +The foolscap logging system is documented here: + + http://foolscap.lothar.com/docs/logging.html + +The foolscap distribution includes a utility named "flogtool" (usually at +/usr/bin/flogtool) which is used to get access to many foolscap logging +features. + +Realtime Logging +================ + +When you are working on Tahoe code, and want to see what the node is doing, +the easiest tool to use is "flogtool tail". This connects to the tahoe node +and subscribes to hear about all log events. These events are then displayed +to stdout, and optionally saved to a file. + +"flogtool tail" connects to the "logport", for which the FURL is stored in +BASEDIR/private/logport.furl . The following command will connect to this +port and start emitting log information: + + flogtool tail BASEDIR/private/logport.furl + +The "--save-to FILENAME" option will save all received events to a file, +where then can be examined later with "flogtool dump" or "flogtool +web-viewer". The --catch-up flag will ask the node to dump all stored events +before subscribing to new ones (without --catch-up, you will only hear about +events that occur after the tool has connected and subscribed). + +Incidents +========= + +Foolscap keeps a short list of recent events in memory. When something goes +wrong, it writes all the history it has (and everything that gets logged in +the next few seconds) into a file called an "incident". These files go into +BASEDIR/logs/incidents/ , in a file named +"incident-TIMESTAMP-UNIQUE.flog.bz2". The default definition of "something +goes wrong" is the generation of a log event at the log.WEIRD level or +higher, but other criteria could be implemented. + +The typical "incident report" we've seen in a large Tahoe grid is about 40kB +compressed, representing about 1800 recent events. + +These "flogfiles" have a similar format to the files saved by "flogtool tail +--save-to". They are simply lists of log events, with a small header to +indicate which event triggered the incident. + +The "flogtool dump FLOGFILE" command will take one of these .flog.bz2 files +and print their contents to stdout, one line per event. The raw event +dictionaries can be dumped by using "flogtool dump --verbose FLOGFILE". + +The "flogtool web-viewer" command can be used to examine the flogfile in a +web browser. It runs a small HTTP server and emits the URL on stdout. This +view provides more structure than the output of "flogtool dump": the +parent/child relationships of log events is displayed in a nested format. +"flogtool web-viewer" is still fairly immature. + +Working with flogfiles +====================== + +The "flogtool filter" command can be used to take a large flogfile (perhaps +one created by the log-gatherer, see below) and copy a subset of events into +a second file. This smaller flogfile may be easier to work with than the +original. The arguments to "flogtool filter" specify filtering criteria: a +predicate that each event must match to be copied into the target file. +--before and --after are used to exclude events outside a given window of +time. --above will retain events above a certain severity level. --from +retains events send by a specific tubid. --strip-facility removes events that +were emitted with a given facility (like foolscap.negotiation or +tahoe.upload). + +Gatherers +========= + +In a deployed Tahoe grid, it is useful to get log information automatically +transferred to a central log-gatherer host. This offloads the (admittedly +modest) storage requirements to a different host and provides access to +logfiles from multiple nodes (webapi/storage/helper) nodes in a single place. + +There are two kinds of gatherers. Both produce a FURL which needs to be +placed in the NODEDIR/log_gatherer.furl file (one FURL per line) of the nodes +that are to publish their logs to the gatherer. When the Tahoe node starts, +it will connect to the configured gatherers and offer its logport: the +gatherer will then use the logport to subscribe to hear about events. + +The gatherer will write to files in its working directory, which can then be +examined with tools like "flogtool dump" as described above. + +Incident Gatherer +----------------- + +The "incident gatherer" only collects Incidents: records of the log events +that occurred just before and slightly after some high-level "trigger event" +was recorded. Each incident is classified into a "category": a short string +that summarizes what sort of problem took place. These classification +functions are written after examining a new/unknown incident. The idea is to +recognize when the same problem is happening multiple times. + +A collection of classification functions that are useful for Tahoe nodes are +provided in misc/incident-gatherer/support_classifiers.py . There is roughly +one category for each log.WEIRD-or-higher level event in the Tahoe source +code. + +The incident gatherer is created with the "flogtool create-incident-gatherer +WORKDIR" command, and started with "tahoe start". The generated +"gatherer.tac" file should be modified to add classifier functions. + +The incident gatherer writes incident names (which are simply the relative +pathname of the incident-\*.flog.bz2 file) into classified/CATEGORY. For +example, the classified/mutable-retrieve-uncoordinated-write-error file +contains a list of all incidents which were triggered by an uncoordinated +write that was detected during mutable file retrieval (caused when somebody +changed the contents of the mutable file in between the node's mapupdate step +and the retrieve step). The classified/unknown file contains a list of all +incidents that did not match any of the classification functions. + +At startup, the incident gatherer will automatically reclassify any incident +report which is not mentioned in any of the classified/* files. So the usual +workflow is to examine the incidents in classified/unknown, add a new +classification function, delete classified/unknown, then bound the gatherer +with "tahoe restart WORKDIR". The incidents which can be classified with the +new functions will be added to their own classified/FOO lists, and the +remaining ones will be put in classified/unknown, where the process can be +repeated until all events are classifiable. + +The incident gatherer is still fairly immature: future versions will have a +web interface and an RSS feed, so operations personnel can track problems in +the storage grid. + +In our experience, each Incident takes about two seconds to transfer from the +node which generated it to the gatherer. The gatherer will automatically +catch up to any incidents which occurred while it is offline. + +Log Gatherer +------------ + +The "Log Gatherer" subscribes to hear about every single event published by +the connected nodes, regardless of severity. This server writes these log +events into a large flogfile that is rotated (closed, compressed, and +replaced with a new one) on a periodic basis. Each flogfile is named +according to the range of time it represents, with names like +"from-2008-08-26-132256--to-2008-08-26-162256.flog.bz2". The flogfiles +contain events from many different sources, making it easier to correlate +things that happened on multiple machines (such as comparing a client node +making a request with the storage servers that respond to that request). + +The Log Gatherer is created with the "flogtool create-gatherer WORKDIR" +command, and started with "tahoe start". The log_gatherer.furl it creates +then needs to be copied into the BASEDIR/log_gatherer.furl file of all nodes +which should be sending it log events. + +The "flogtool filter" command, described above, is useful to cut down the +potentially-large flogfiles into more a narrowly-focussed form. + +Busy nodes, particularly wapi nodes which are performing recursive +deep-size/deep-stats/deep-check operations, can produce a lot of log events. +To avoid overwhelming the node (and using an unbounded amount of memory for +the outbound TCP queue), publishing nodes will start dropping log events when +the outbound queue grows too large. When this occurs, there will be gaps +(non-sequential event numbers) in the log-gatherer's flogfiles. + +Local twistd.log files +====================== + +[TODO: not yet true, requires foolscap-0.3.1 and a change to allmydata.node] + +In addition to the foolscap-based event logs, certain high-level events will +be recorded directly in human-readable text form, in the +BASEDIR/logs/twistd.log file (and its rotated old versions: twistd.log.1, +twistd.log.2, etc). This form does not contain as much information as the +flogfiles available through the means described previously, but they are +immediately available to the curious developer, and are retained until the +twistd.log.NN files are explicitly deleted. + +Only events at the log.OPERATIONAL level or higher are bridged to twistd.log +(i.e. not the log.NOISY debugging events). In addition, foolscap internal +events (like connection negotiation messages) are not bridged to twistd.log . + +Adding log messages +=================== + +When adding new code, the Tahoe developer should add a reasonable number of +new log events. For details, please see the Foolscap logging documentation, +but a few notes are worth stating here: + +* use a facility prefix of "tahoe.", like "tahoe.mutable.publish" + +* assign each severe (log.WEIRD or higher) event a unique message + identifier, as the umid= argument to the log.msg() call. The + misc/coding_tools/make_umid script may be useful for this purpose. This will make it + easier to write a classification function for these messages. + +* use the parent= argument whenever the event is causally/temporally + clustered with its parent. For example, a download process that involves + three sequential hash fetches could announce the send and receipt of those + hash-fetch messages with a parent= argument that ties them to the overall + download process. However, each new wapi download request should be + unparented. + +* use the format= argument in preference to the message= argument. E.g. + use log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k) instead of + log.msg("got %d shares, need %d" % (n,k)). This will allow later tools to + analyze the event without needing to scrape/reconstruct the structured + data out of the formatted string. + +* Pass extra information as extra keyword arguments, even if they aren't + included in the format= string. This information will be displayed in the + "flogtool dump --verbose" output, as well as being available to other + tools. The umid= argument should be passed this way. + +* use log.err for the catch-all addErrback that gets attached to the end of + any given Deferred chain. When used in conjunction with LOGTOTWISTED=1, + log.err() will tell Twisted about the error-nature of the log message, + causing Trial to flunk the test (with an "ERROR" indication that prints a + copy of the Failure, including a traceback). Don't use log.err for events + that are BAD but handled (like hash failures: since these are often + deliberately provoked by test code, they should not cause test failures): + use log.msg(level=BAD) for those instead. + + +Log Messages During Unit Tests +============================== + +If a test is failing and you aren't sure why, start by enabling +FLOGTOTWISTED=1 like this: + + make test FLOGTOTWISTED=1 + +With FLOGTOTWISTED=1, sufficiently-important log events will be written into +_trial_temp/test.log, which may give you more ideas about why the test is +failing. Note, however, that _trial_temp/log.out will not receive messages +below the level=OPERATIONAL threshold, due to this issue: + + + +If that isn't enough, look at the detailed foolscap logging messages instead, +by running the tests like this: + + make test FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1 + +The first environment variable will cause foolscap log events to be written +to ./flog.out.bz2 (instead of merely being recorded in the circular buffers +for the use of remote subscribers or incident reports). The second will cause +all log events to be written out, not just the higher-severity ones. The +third will cause twisted log events (like the markers that indicate when each +unit test is starting and stopping) to be copied into the flogfile, making it +easier to correlate log events with unit tests. + +Enabling this form of logging appears to roughly double the runtime of the +unit tests. The flog.out.bz2 file is approximately 2MB. + +You can then use "flogtool dump" or "flogtool web-viewer" on the resulting +flog.out file. + +("flogtool tail" and the log-gatherer are not useful during unit tests, since +there is no single Tub to which all the log messages are published). + +It is possible for setting these environment variables to cause spurious test +failures in tests with race condition bugs. All known instances of this have +been fixed as of Tahoe-LAFS v1.7.1. diff --git a/docs/logging.txt b/docs/logging.txt deleted file mode 100644 index 641c9ca6..00000000 --- a/docs/logging.txt +++ /dev/null @@ -1,270 +0,0 @@ -= Tahoe Logging = - -1. Overview -2. Realtime Logging -3. Incidents -4. Working with flogfiles -5. Gatherers - 5.1. Incident Gatherer - 5.2. Log Gatherer -6. Local twistd.log files -7. Adding log messages -8. Log Messages During Unit Tests - -== Overview == - -Tahoe uses the Foolscap logging mechanism (known as the "flog" subsystem) to -record information about what is happening inside the Tahoe node. This is -primarily for use by programmers and grid operators who want to find out what -went wrong. - -The foolscap logging system is documented here: - - http://foolscap.lothar.com/docs/logging.html - -The foolscap distribution includes a utility named "flogtool" (usually at -/usr/bin/flogtool) which is used to get access to many foolscap logging -features. - -== Realtime Logging == - -When you are working on Tahoe code, and want to see what the node is doing, -the easiest tool to use is "flogtool tail". This connects to the tahoe node -and subscribes to hear about all log events. These events are then displayed -to stdout, and optionally saved to a file. - -"flogtool tail" connects to the "logport", for which the FURL is stored in -BASEDIR/private/logport.furl . The following command will connect to this -port and start emitting log information: - - flogtool tail BASEDIR/private/logport.furl - -The "--save-to FILENAME" option will save all received events to a file, -where then can be examined later with "flogtool dump" or "flogtool -web-viewer". The --catch-up flag will ask the node to dump all stored events -before subscribing to new ones (without --catch-up, you will only hear about -events that occur after the tool has connected and subscribed). - -== Incidents == - -Foolscap keeps a short list of recent events in memory. When something goes -wrong, it writes all the history it has (and everything that gets logged in -the next few seconds) into a file called an "incident". These files go into -BASEDIR/logs/incidents/ , in a file named -"incident-TIMESTAMP-UNIQUE.flog.bz2". The default definition of "something -goes wrong" is the generation of a log event at the log.WEIRD level or -higher, but other criteria could be implemented. - -The typical "incident report" we've seen in a large Tahoe grid is about 40kB -compressed, representing about 1800 recent events. - -These "flogfiles" have a similar format to the files saved by "flogtool tail ---save-to". They are simply lists of log events, with a small header to -indicate which event triggered the incident. - -The "flogtool dump FLOGFILE" command will take one of these .flog.bz2 files -and print their contents to stdout, one line per event. The raw event -dictionaries can be dumped by using "flogtool dump --verbose FLOGFILE". - -The "flogtool web-viewer" command can be used to examine the flogfile in a -web browser. It runs a small HTTP server and emits the URL on stdout. This -view provides more structure than the output of "flogtool dump": the -parent/child relationships of log events is displayed in a nested format. -"flogtool web-viewer" is still fairly immature. - -== Working with flogfiles == - -The "flogtool filter" command can be used to take a large flogfile (perhaps -one created by the log-gatherer, see below) and copy a subset of events into -a second file. This smaller flogfile may be easier to work with than the -original. The arguments to "flogtool filter" specify filtering criteria: a -predicate that each event must match to be copied into the target file. ---before and --after are used to exclude events outside a given window of -time. --above will retain events above a certain severity level. --from -retains events send by a specific tubid. --strip-facility removes events that -were emitted with a given facility (like foolscap.negotiation or -tahoe.upload). - -== Gatherers == - -In a deployed Tahoe grid, it is useful to get log information automatically -transferred to a central log-gatherer host. This offloads the (admittedly -modest) storage requirements to a different host and provides access to -logfiles from multiple nodes (webapi/storage/helper) nodes in a single place. - -There are two kinds of gatherers. Both produce a FURL which needs to be -placed in the NODEDIR/log_gatherer.furl file (one FURL per line) of the nodes -that are to publish their logs to the gatherer. When the Tahoe node starts, -it will connect to the configured gatherers and offer its logport: the -gatherer will then use the logport to subscribe to hear about events. - -The gatherer will write to files in its working directory, which can then be -examined with tools like "flogtool dump" as described above. - -=== Incident Gatherer === - -The "incident gatherer" only collects Incidents: records of the log events -that occurred just before and slightly after some high-level "trigger event" -was recorded. Each incident is classified into a "category": a short string -that summarizes what sort of problem took place. These classification -functions are written after examining a new/unknown incident. The idea is to -recognize when the same problem is happening multiple times. - -A collection of classification functions that are useful for Tahoe nodes are -provided in misc/incident-gatherer/support_classifiers.py . There is roughly -one category for each log.WEIRD-or-higher level event in the Tahoe source -code. - -The incident gatherer is created with the "flogtool create-incident-gatherer -WORKDIR" command, and started with "tahoe start". The generated -"gatherer.tac" file should be modified to add classifier functions. - -The incident gatherer writes incident names (which are simply the relative -pathname of the incident-*.flog.bz2 file) into classified/CATEGORY. For -example, the classified/mutable-retrieve-uncoordinated-write-error file -contains a list of all incidents which were triggered by an uncoordinated -write that was detected during mutable file retrieval (caused when somebody -changed the contents of the mutable file in between the node's mapupdate step -and the retrieve step). The classified/unknown file contains a list of all -incidents that did not match any of the classification functions. - -At startup, the incident gatherer will automatically reclassify any incident -report which is not mentioned in any of the classified/* files. So the usual -workflow is to examine the incidents in classified/unknown, add a new -classification function, delete classified/unknown, then bound the gatherer -with "tahoe restart WORKDIR". The incidents which can be classified with the -new functions will be added to their own classified/FOO lists, and the -remaining ones will be put in classified/unknown, where the process can be -repeated until all events are classifiable. - -The incident gatherer is still fairly immature: future versions will have a -web interface and an RSS feed, so operations personnel can track problems in -the storage grid. - -In our experience, each Incident takes about two seconds to transfer from the -node which generated it to the gatherer. The gatherer will automatically -catch up to any incidents which occurred while it is offline. - -=== Log Gatherer === - -The "Log Gatherer" subscribes to hear about every single event published by -the connected nodes, regardless of severity. This server writes these log -events into a large flogfile that is rotated (closed, compressed, and -replaced with a new one) on a periodic basis. Each flogfile is named -according to the range of time it represents, with names like -"from-2008-08-26-132256--to-2008-08-26-162256.flog.bz2". The flogfiles -contain events from many different sources, making it easier to correlate -things that happened on multiple machines (such as comparing a client node -making a request with the storage servers that respond to that request). - -The Log Gatherer is created with the "flogtool create-gatherer WORKDIR" -command, and started with "tahoe start". The log_gatherer.furl it creates -then needs to be copied into the BASEDIR/log_gatherer.furl file of all nodes -which should be sending it log events. - -The "flogtool filter" command, described above, is useful to cut down the -potentially-large flogfiles into more a narrowly-focussed form. - -Busy nodes, particularly wapi nodes which are performing recursive -deep-size/deep-stats/deep-check operations, can produce a lot of log events. -To avoid overwhelming the node (and using an unbounded amount of memory for -the outbound TCP queue), publishing nodes will start dropping log events when -the outbound queue grows too large. When this occurs, there will be gaps -(non-sequential event numbers) in the log-gatherer's flogfiles. - -== Local twistd.log files == - -[TODO: not yet true, requires foolscap-0.3.1 and a change to allmydata.node] - -In addition to the foolscap-based event logs, certain high-level events will -be recorded directly in human-readable text form, in the -BASEDIR/logs/twistd.log file (and its rotated old versions: twistd.log.1, -twistd.log.2, etc). This form does not contain as much information as the -flogfiles available through the means described previously, but they are -immediately available to the curious developer, and are retained until the -twistd.log.NN files are explicitly deleted. - -Only events at the log.OPERATIONAL level or higher are bridged to twistd.log -(i.e. not the log.NOISY debugging events). In addition, foolscap internal -events (like connection negotiation messages) are not bridged to twistd.log . - -== Adding log messages == - -When adding new code, the Tahoe developer should add a reasonable number of -new log events. For details, please see the Foolscap logging documentation, -but a few notes are worth stating here: - - * use a facility prefix of "tahoe.", like "tahoe.mutable.publish" - - * assign each severe (log.WEIRD or higher) event a unique message - identifier, as the umid= argument to the log.msg() call. The - misc/coding_tools/make_umid script may be useful for this purpose. This will make it - easier to write a classification function for these messages. - - * use the parent= argument whenever the event is causally/temporally - clustered with its parent. For example, a download process that involves - three sequential hash fetches could announce the send and receipt of those - hash-fetch messages with a parent= argument that ties them to the overall - download process. However, each new wapi download request should be - unparented. - - * use the format= argument in preference to the message= argument. E.g. - use log.msg(format="got %(n)d shares, need %(k)d", n=n, k=k) instead of - log.msg("got %d shares, need %d" % (n,k)). This will allow later tools to - analyze the event without needing to scrape/reconstruct the structured - data out of the formatted string. - - * Pass extra information as extra keyword arguments, even if they aren't - included in the format= string. This information will be displayed in the - "flogtool dump --verbose" output, as well as being available to other - tools. The umid= argument should be passed this way. - - * use log.err for the catch-all addErrback that gets attached to the end of - any given Deferred chain. When used in conjunction with LOGTOTWISTED=1, - log.err() will tell Twisted about the error-nature of the log message, - causing Trial to flunk the test (with an "ERROR" indication that prints a - copy of the Failure, including a traceback). Don't use log.err for events - that are BAD but handled (like hash failures: since these are often - deliberately provoked by test code, they should not cause test failures): - use log.msg(level=BAD) for those instead. - - -== Log Messages During Unit Tests == - -If a test is failing and you aren't sure why, start by enabling -FLOGTOTWISTED=1 like this: - - make test FLOGTOTWISTED=1 - -With FLOGTOTWISTED=1, sufficiently-important log events will be written into -_trial_temp/test.log, which may give you more ideas about why the test is -failing. Note, however, that _trial_temp/log.out will not receive messages -below the level=OPERATIONAL threshold, due to this issue: - - - -If that isn't enough, look at the detailed foolscap logging messages instead, -by running the tests like this: - - make test FLOGFILE=flog.out.bz2 FLOGLEVEL=1 FLOGTOTWISTED=1 - -The first environment variable will cause foolscap log events to be written -to ./flog.out.bz2 (instead of merely being recorded in the circular buffers -for the use of remote subscribers or incident reports). The second will cause -all log events to be written out, not just the higher-severity ones. The -third will cause twisted log events (like the markers that indicate when each -unit test is starting and stopping) to be copied into the flogfile, making it -easier to correlate log events with unit tests. - -Enabling this form of logging appears to roughly double the runtime of the -unit tests. The flog.out.bz2 file is approximately 2MB. - -You can then use "flogtool dump" or "flogtool web-viewer" on the resulting -flog.out file. - -("flogtool tail" and the log-gatherer are not useful during unit tests, since -there is no single Tub to which all the log messages are published). - -It is possible for setting these environment variables to cause spurious test -failures in tests with race condition bugs. All known instances of this have -been fixed as of Tahoe-LAFS v1.7.1. diff --git a/docs/performance.rst b/docs/performance.rst new file mode 100644 index 00000000..4165b776 --- /dev/null +++ b/docs/performance.rst @@ -0,0 +1,162 @@ +============================================ +Performance costs for some common operations +============================================ + +1. `Publishing an A-byte immutable file`_ +2. `Publishing an A-byte mutable file`_ +3. `Downloading B bytes of an A-byte immutable file`_ +4. `Downloading B bytes of an A-byte mutable file`_ +5. `Modifying B bytes of an A-byte mutable file`_ +6. `Inserting/Removing B bytes in an A-byte mutable file`_ +7. `Adding an entry to an A-entry directory`_ +8. `Listing an A entry directory`_ +9. `Performing a file-check on an A-byte file`_ +10. `Performing a file-verify on an A-byte file`_ +11. `Repairing an A-byte file (mutable or immutable)`_ + +Publishing an ``A``-byte immutable file +======================================= + +network: A + +memory footprint: N/k*128KiB + +notes: An immutable file upload requires an additional I/O pass over the entire +source file before the upload process can start, since convergent +encryption derives the encryption key in part from the contents of the +source file. + +Publishing an ``A``-byte mutable file +===================================== + +network: A + +memory footprint: N/k*A + +cpu: O(A) + a large constant for RSA keypair generation + +notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that it +publishes to a grid. This takes up to 1 or 2 seconds on a typical desktop PC. + +Part of the process of encrypting, encoding, and uploading a mutable file to a +Tahoe-LAFS grid requires that the entire file be in memory at once. For larger +files, this may cause Tahoe-LAFS to have an unacceptably large memory footprint +(at least when uploading a mutable file). + +Downloading ``B`` bytes of an ``A``-byte immutable file +======================================================= + +network: B + +memory footprint: 128KiB + +notes: When Tahoe-LAFS 1.8.0 or later is asked to read an arbitrary range +of an immutable file, only the 128-KiB segments that overlap the +requested range will be downloaded. + +(Earlier versions would download from the beginning of the file up +until the end of the requested range, and then continue to download +the rest of the file even after the request was satisfied.) + +Downloading ``B`` bytes of an ``A``-byte mutable file +===================================================== + +network: A + +memory footprint: A + +notes: As currently implemented, mutable files must be downloaded in +their entirety before any part of them can be read. We are +exploring fixes for this; see ticket #393 for more information. + +Modifying ``B`` bytes of an ``A``-byte mutable file +=================================================== + +network: A + +memory footprint: N/k*A + +notes: If you upload a changed version of a mutable file that you +earlier put onto your grid with, say, 'tahoe put --mutable', +Tahoe-LAFS will replace the old file with the new file on the +grid, rather than attempting to modify only those portions of the +file that have changed. Modifying a file in this manner is +essentially uploading the file over again, except that it re-uses +the existing RSA keypair instead of generating a new one. + +Inserting/Removing ``B`` bytes in an ``A``-byte mutable file +============================================================ + +network: A + +memory footprint: N/k*A + +notes: Modifying any part of a mutable file in Tahoe-LAFS requires that +the entire file be downloaded, modified, held in memory while it is +encrypted and encoded, and then re-uploaded. A future version of the +mutable file layout ("LDMF") may provide efficient inserts and +deletes. Note that this sort of modification is mostly used internally +for directories, and isn't something that the WUI, CLI, or other +interfaces will do -- instead, they will simply overwrite the file to +be modified, as described in "Modifying B bytes of an A-byte mutable +file". + +Adding an entry to an ``A``-entry directory +=========================================== + +network: O(A) + +memory footprint: N/k*A + +notes: In Tahoe-LAFS, directories are implemented as specialized mutable +files. So adding an entry to a directory is essentially adding B +(actually, 300-330) bytes somewhere in an existing mutable file. + +Listing an ``A`` entry directory +================================ + +network: O(A) + +memory footprint: N/k*A + +notes: Listing a directory requires that the mutable file storing the +directory be downloaded from the grid. So listing an A entry +directory requires downloading a (roughly) 330 * A byte mutable +file, since each directory entry is about 300-330 bytes in size. + +Performing a file-check on an ``A``-byte file +============================================= + +network: O(S), where S is the number of servers on your grid + +memory footprint: negligible + +notes: To check a file, Tahoe-LAFS queries all the servers that it knows +about. Note that neither of these values directly depend on the size +of the file. This is relatively inexpensive, compared to the verify +and repair operations. + +Performing a file-verify on an ``A``-byte file +============================================== + +network: N/k*A + +memory footprint: N/k*128KiB + +notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext +shares that were originally uploaded to the grid and integrity +checks them. This is, for well-behaved grids, likely to be more +expensive than downloading an A-byte file, since only a fraction +of these shares are necessary to recover the file. + +Repairing an ``A``-byte file (mutable or immutable) +=================================================== + +network: variable; up to around O(A) + +memory footprint: from 128KiB to (1+N/k)*128KiB + +notes: To repair a file, Tahoe-LAFS downloads the file, and generates/uploads +missing shares in the same way as when it initially uploads the file. +So, depending on how many shares are missing, this can be about as +expensive as initially uploading the file in the first place. diff --git a/docs/performance.txt b/docs/performance.txt deleted file mode 100644 index ba9d5055..00000000 --- a/docs/performance.txt +++ /dev/null @@ -1,139 +0,0 @@ -= Performance costs for some common operations = - -1. Publishing an A-byte immutable file -2. Publishing an A-byte mutable file -3. Downloading B bytes of an A-byte immutable file -4. Downloading B bytes of an A-byte mutable file -5. Modifying B bytes of an A-byte mutable file -6. Inserting/Removing B bytes in an A-byte mutable file -7. Adding an entry to an A-entry directory -8. Listing an A entry directory -9. Performing a file-check on an A-byte file -10. Performing a file-verify on an A-byte file -11. Repairing an A-byte file (mutable or immutable) - -== Publishing an A-byte immutable file == - -network: A -memory footprint: N/k*128KiB - -notes: An immutable file upload requires an additional I/O pass over the entire - source file before the upload process can start, since convergent - encryption derives the encryption key in part from the contents of the - source file. - -== Publishing an A-byte mutable file == - -network: A -memory footprint: N/k*A -cpu: O(A) + a large constant for RSA keypair generation - -notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that - it publishes to a grid. This takes up to 1 or 2 seconds on a - typical desktop PC. - - Part of the process of encrypting, encoding, and uploading a - mutable file to a Tahoe-LAFS grid requires that the entire file - be in memory at once. For larger files, this may cause - Tahoe-LAFS to have an unacceptably large memory footprint (at - least when uploading a mutable file). - -== Downloading B bytes of an A-byte immutable file == - -network: B -memory footprint: 128KiB - -notes: When Tahoe-LAFS 1.8.0 or later is asked to read an arbitrary range - of an immutable file, only the 128-KiB segments that overlap the - requested range will be downloaded. - - (Earlier versions would download from the beginning of the file up - until the end of the requested range, and then continue to download - the rest of the file even after the request was satisfied.) - -== Downloading B bytes of an A-byte mutable file == - -network: A -memory footprint: A - -notes: As currently implemented, mutable files must be downloaded in - their entirety before any part of them can be read. We are - exploring fixes for this; see ticket #393 for more information. - -== Modifying B bytes of an A-byte mutable file == - -network: A -memory footprint: N/k*A - -notes: If you upload a changed version of a mutable file that you - earlier put onto your grid with, say, 'tahoe put --mutable', - Tahoe-LAFS will replace the old file with the new file on the - grid, rather than attempting to modify only those portions of the - file that have changed. Modifying a file in this manner is - essentially uploading the file over again, except that it re-uses - the existing RSA keypair instead of generating a new one. - -== Inserting/Removing B bytes in an A-byte mutable file == - -network: A -memory footprint: N/k*A - -notes: Modifying any part of a mutable file in Tahoe-LAFS requires that - the entire file be downloaded, modified, held in memory while it is - encrypted and encoded, and then re-uploaded. A future version of the - mutable file layout ("LDMF") may provide efficient inserts and - deletes. Note that this sort of modification is mostly used internally - for directories, and isn't something that the WUI, CLI, or other - interfaces will do -- instead, they will simply overwrite the file to - be modified, as described in "Modifying B bytes of an A-byte mutable - file". - -== Adding an entry to an A-entry directory == - -network: O(A) -memory footprint: N/k*A - -notes: In Tahoe-LAFS, directories are implemented as specialized mutable - files. So adding an entry to a directory is essentially adding B - (actually, 300-330) bytes somewhere in an existing mutable file. - -== Listing an A entry directory == - -network: O(A) -memory footprint: N/k*A - -notes: Listing a directory requires that the mutable file storing the - directory be downloaded from the grid. So listing an A entry - directory requires downloading a (roughly) 330 * A byte mutable - file, since each directory entry is about 300-330 bytes in size. - -== Performing a file-check on an A-byte file == - -network: O(S), where S is the number of servers on your grid -memory footprint: negligible - -notes: To check a file, Tahoe-LAFS queries all the servers that it knows - about. Note that neither of these values directly depend on the size - of the file. This is relatively inexpensive, compared to the verify - and repair operations. - -== Performing a file-verify on an A-byte file == - -network: N/k*A -memory footprint: N/k*128KiB - -notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext - shares that were originally uploaded to the grid and integrity - checks them. This is, for well-behaved grids, likely to be more - expensive than downloading an A-byte file, since only a fraction - of these shares are necessary to recover the file. - -== Repairing an A-byte file (mutable or immutable) == - -network: variable; up to around O(A) -memory footprint: from 128KiB to (1+N/k)*128KiB - -notes: To repair a file, Tahoe-LAFS downloads the file, and generates/uploads - missing shares in the same way as when it initially uploads the file. - So, depending on how many shares are missing, this can be about as - expensive as initially uploading the file in the first place. diff --git a/docs/stats.rst b/docs/stats.rst new file mode 100644 index 00000000..47f6b04b --- /dev/null +++ b/docs/stats.rst @@ -0,0 +1,337 @@ +================ +Tahoe Statistics +================ + +1. `Overview`_ +2. `Statistics Categories`_ +3. `Running a Tahoe Stats-Gatherer Service`_ +4. `Using Munin To Graph Stats Values`_ + +Overview +======== + +Each Tahoe node collects and publishes statistics about its operations as it +runs. These include counters of how many files have been uploaded and +downloaded, CPU usage information, performance numbers like latency of +storage server operations, and available disk space. + +The easiest way to see the stats for any given node is use the web interface. +From the main "Welcome Page", follow the "Operational Statistics" link inside +the small "This Client" box. If the welcome page lives at +http://localhost:3456/, then the statistics page will live at +http://localhost:3456/statistics . This presents a summary of the stats +block, along with a copy of the raw counters. To obtain just the raw counters +(in JSON format), use /statistics?t=json instead. + +Statistics Categories +===================== + +The stats dictionary contains two keys: 'counters' and 'stats'. 'counters' +are strictly counters: they are reset to zero when the node is started, and +grow upwards. 'stats' are non-incrementing values, used to measure the +current state of various systems. Some stats are actually booleans, expressed +as '1' for true and '0' for false (internal restrictions require all stats +values to be numbers). + +Under both the 'counters' and 'stats' dictionaries, each individual stat has +a key with a dot-separated name, breaking them up into groups like +'cpu_monitor' and 'storage_server'. + +The currently available stats (as of release 1.6.0 or so) are described here: + +**counters.storage_server.\*** + + this group counts inbound storage-server operations. They are not provided + by client-only nodes which have been configured to not run a storage server + (with [storage]enabled=false in tahoe.cfg) + + allocate, write, close, abort + these are for immutable file uploads. 'allocate' is incremented when a + client asks if it can upload a share to the server. 'write' is + incremented for each chunk of data written. 'close' is incremented when + the share is finished. 'abort' is incremented if the client abandons + the upload. + + get, read + these are for immutable file downloads. 'get' is incremented + when a client asks if the server has a specific share. 'read' is + incremented for each chunk of data read. + + readv, writev + these are for immutable file creation, publish, and retrieve. 'readv' + is incremented each time a client reads part of a mutable share. + 'writev' is incremented each time a client sends a modification + request. + + add-lease, renew, cancel + these are for share lease modifications. 'add-lease' is incremented + when an 'add-lease' operation is performed (which either adds a new + lease or renews an existing lease). 'renew' is for the 'renew-lease' + operation (which can only be used to renew an existing one). 'cancel' + is used for the 'cancel-lease' operation. + + bytes_freed + this counts how many bytes were freed when a 'cancel-lease' + operation removed the last lease from a share and the share + was thus deleted. + + bytes_added + this counts how many bytes were consumed by immutable share + uploads. It is incremented at the same time as the 'close' + counter. + +**stats.storage_server.\*** + + allocated + this counts how many bytes are currently 'allocated', which + tracks the space that will eventually be consumed by immutable + share upload operations. The stat is increased as soon as the + upload begins (at the same time the 'allocated' counter is + incremented), and goes back to zero when the 'close' or 'abort' + message is received (at which point the 'disk_used' stat should + incremented by the same amount). + + disk_total, disk_used, disk_free_for_root, disk_free_for_nonroot, disk_avail, reserved_space + these all reflect disk-space usage policies and status. + 'disk_total' is the total size of disk where the storage + server's BASEDIR/storage/shares directory lives, as reported + by /bin/df or equivalent. 'disk_used', 'disk_free_for_root', + and 'disk_free_for_nonroot' show related information. + 'reserved_space' reports the reservation configured by the + tahoe.cfg [storage]reserved_space value. 'disk_avail' + reports the remaining disk space available for the Tahoe + server after subtracting reserved_space from disk_avail. All + values are in bytes. + + accepting_immutable_shares + this is '1' if the storage server is currently accepting uploads of + immutable shares. It may be '0' if a server is disabled by + configuration, or if the disk is full (i.e. disk_avail is less than + reserved_space). + + total_bucket_count + this counts the number of 'buckets' (i.e. unique + storage-index values) currently managed by the storage + server. It indicates roughly how many files are managed + by the server. + + latencies.*.* + these stats keep track of local disk latencies for + storage-server operations. A number of percentile values are + tracked for many operations. For example, + 'storage_server.latencies.readv.50_0_percentile' records the + median response time for a 'readv' request. All values are in + seconds. These are recorded by the storage server, starting + from the time the request arrives (post-deserialization) and + ending when the response begins serialization. As such, they + are mostly useful for measuring disk speeds. The operations + tracked are the same as the counters.storage_server.* counter + values (allocate, write, close, get, read, add-lease, renew, + cancel, readv, writev). The percentile values tracked are: + mean, 01_0_percentile, 10_0_percentile, 50_0_percentile, + 90_0_percentile, 95_0_percentile, 99_0_percentile, + 99_9_percentile. (the last value, 99.9 percentile, means that + 999 out of the last 1000 operations were faster than the + given number, and is the same threshold used by Amazon's + internal SLA, according to the Dynamo paper). + +**counters.uploader.files_uploaded** + +**counters.uploader.bytes_uploaded** + +**counters.downloader.files_downloaded** + +**counters.downloader.bytes_downloaded** + + These count client activity: a Tahoe client will increment these when it + uploads or downloads an immutable file. 'files_uploaded' is incremented by + one for each operation, while 'bytes_uploaded' is incremented by the size of + the file. + +**counters.mutable.files_published** + +**counters.mutable.bytes_published** + +**counters.mutable.files_retrieved** + +**counters.mutable.bytes_retrieved** + + These count client activity for mutable files. 'published' is the act of + changing an existing mutable file (or creating a brand-new mutable file). + 'retrieved' is the act of reading its current contents. + +**counters.chk_upload_helper.\*** + + These count activity of the "Helper", which receives ciphertext from clients + and performs erasure-coding and share upload for files that are not already + in the grid. The code which implements these counters is in + src/allmydata/immutable/offloaded.py . + + upload_requests + incremented each time a client asks to upload a file + upload_already_present: incremented when the file is already in the grid + + upload_need_upload + incremented when the file is not already in the grid + + resumes + incremented when the helper already has partial ciphertext for + the requested upload, indicating that the client is resuming an + earlier upload + + fetched_bytes + this counts how many bytes of ciphertext have been fetched + from uploading clients + + encoded_bytes + this counts how many bytes of ciphertext have been + encoded and turned into successfully-uploaded shares. If no + uploads have failed or been abandoned, encoded_bytes should + eventually equal fetched_bytes. + +**stats.chk_upload_helper.\*** + + These also track Helper activity: + + active_uploads + how many files are currently being uploaded. 0 when idle. + + incoming_count + how many cache files are present in the incoming/ directory, + which holds ciphertext files that are still being fetched + from the client + + incoming_size + total size of cache files in the incoming/ directory + + incoming_size_old + total size of 'old' cache files (more than 48 hours) + + encoding_count + how many cache files are present in the encoding/ directory, + which holds ciphertext files that are being encoded and + uploaded + + encoding_size + total size of cache files in the encoding/ directory + + encoding_size_old + total size of 'old' cache files (more than 48 hours) + +**stats.node.uptime** + how many seconds since the node process was started + +**stats.cpu_monitor.\*** + + 1min_avg, 5min_avg, 15min_avg + estimate of what percentage of system CPU time was consumed by the + node process, over the given time interval. Expressed as a float, 0.0 + for 0%, 1.0 for 100% + + total + estimate of total number of CPU seconds consumed by node since + the process was started. Ticket #472 indicates that .total may + sometimes be negative due to wraparound of the kernel's counter. + +**stats.load_monitor.\*** + + When enabled, the "load monitor" continually schedules a one-second + callback, and measures how late the response is. This estimates system load + (if the system is idle, the response should be on time). This is only + enabled if a stats-gatherer is configured. + + avg_load + average "load" value (seconds late) over the last minute + + max_load + maximum "load" value over the last minute + + +Running a Tahoe Stats-Gatherer Service +====================================== + +The "stats-gatherer" is a simple daemon that periodically collects stats from +several tahoe nodes. It could be useful, e.g., in a production environment, +where you want to monitor dozens of storage servers from a central management +host. It merely gatherers statistics from many nodes into a single place: it +does not do any actual analysis. + +The stats gatherer listens on a network port using the same Foolscap_ +connection library that Tahoe clients use to connect to storage servers. +Tahoe nodes can be configured to connect to the stats gatherer and publish +their stats on a periodic basis. (In fact, what happens is that nodes connect +to the gatherer and offer it a second FURL which points back to the node's +"stats port", which the gatherer then uses to pull stats on a periodic basis. +The initial connection is flipped to allow the nodes to live behind NAT +boxes, as long as the stats-gatherer has a reachable IP address.) + +.. _Foolscap: http://foolscap.lothar.com/trac + +The stats-gatherer is created in the same fashion as regular tahoe client +nodes and introducer nodes. Choose a base directory for the gatherer to live +in (but do not create the directory). Then run: + +:: + + tahoe create-stats-gatherer $BASEDIR + +and start it with "tahoe start $BASEDIR". Once running, the gatherer will +write a FURL into $BASEDIR/stats_gatherer.furl . + +To configure a Tahoe client/server node to contact the stats gatherer, copy +this FURL into the node's tahoe.cfg file, in a section named "[client]", +under a key named "stats_gatherer.furl", like so: + +:: + + [client] + stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@192.168.0.8:49997/wxycb4kaexzskubjnauxeoptympyf45y + +or simply copy the stats_gatherer.furl file into the node's base directory +(next to the tahoe.cfg file): it will be interpreted in the same way. + +The first time it is started, the gatherer will listen on a random unused TCP +port, so it should not conflict with anything else that you have running on +that host at that time. On subsequent runs, it will re-use the same port (to +keep its FURL consistent). To explicitly control which port it uses, write +the desired portnumber into a file named "portnum" (i.e. $BASEDIR/portnum), +and the next time the gatherer is started, it will start listening on the +given port. The portnum file is actually a "strports specification string", +as described in docs/configuration.txt . + +Once running, the stats gatherer will create a standard python "pickle" file +in $BASEDIR/stats.pickle . Once a minute, the gatherer will pull stats +information from every connected node and write them into the pickle. The +pickle will contain a dictionary, in which node identifiers (known as "tubid" +strings) are the keys, and the values are a dict with 'timestamp', +'nickname', and 'stats' keys. d[tubid][stats] will contain the stats +dictionary as made available at http://localhost:3456/statistics?t=json . The +pickle file will only contain the most recent update from each node. + +Other tools can be built to examine these stats and render them into +something useful. For example, a tool could sum the +"storage_server.disk_avail' values from all servers to compute a +total-disk-available number for the entire grid (however, the "disk watcher" +daemon, in misc/operations_helpers/spacetime/, is better suited for this specific task). + +Using Munin To Graph Stats Values +================================= + +The misc/munin/ directory contains various plugins to graph stats for Tahoe +nodes. They are intended for use with the Munin_ system-management tool, which +typically polls target systems every 5 minutes and produces a web page with +graphs of various things over multiple time scales (last hour, last month, +last year). + +.. _Munin: http://munin-monitoring.org/ + +Most of the plugins are designed to pull stats from a single Tahoe node, and +are configured with the e.g. http://localhost:3456/statistics?t=json URL. The +"tahoe_stats" plugin is designed to read from the pickle file created by the +stats-gatherer. Some plugins are to be used with the disk watcher, and a few +(like tahoe_nodememory) are designed to watch the node processes directly +(and must therefore run on the same host as the target node). + +Please see the docstrings at the beginning of each plugin for details, and +the "tahoe-conf" file for notes about configuration and installing these +plugins into a Munin environment. diff --git a/docs/stats.txt b/docs/stats.txt deleted file mode 100644 index 8cb8dfb4..00000000 --- a/docs/stats.txt +++ /dev/null @@ -1,276 +0,0 @@ -= Tahoe Statistics = - -1. Overview -2. Statistics Categories -3. Running a Tahoe Stats-Gatherer Service -4. Using Munin To Graph Stats Values - -== Overview == - -Each Tahoe node collects and publishes statistics about its operations as it -runs. These include counters of how many files have been uploaded and -downloaded, CPU usage information, performance numbers like latency of -storage server operations, and available disk space. - -The easiest way to see the stats for any given node is use the web interface. -From the main "Welcome Page", follow the "Operational Statistics" link inside -the small "This Client" box. If the welcome page lives at -http://localhost:3456/, then the statistics page will live at -http://localhost:3456/statistics . This presents a summary of the stats -block, along with a copy of the raw counters. To obtain just the raw counters -(in JSON format), use /statistics?t=json instead. - -== Statistics Categories == - -The stats dictionary contains two keys: 'counters' and 'stats'. 'counters' -are strictly counters: they are reset to zero when the node is started, and -grow upwards. 'stats' are non-incrementing values, used to measure the -current state of various systems. Some stats are actually booleans, expressed -as '1' for true and '0' for false (internal restrictions require all stats -values to be numbers). - -Under both the 'counters' and 'stats' dictionaries, each individual stat has -a key with a dot-separated name, breaking them up into groups like -'cpu_monitor' and 'storage_server'. - -The currently available stats (as of release 1.6.0 or so) are described here: - -counters.storage_server.*: this group counts inbound storage-server - operations. They are not provided by client-only - nodes which have been configured to not run a - storage server (with [storage]enabled=false in - tahoe.cfg) - allocate, write, close, abort: these are for immutable file uploads. - 'allocate' is incremented when a client asks - if it can upload a share to the server. - 'write' is incremented for each chunk of - data written. 'close' is incremented when - the share is finished. 'abort' is - incremented if the client abandons the - uploaed. - get, read: these are for immutable file downloads. 'get' is incremented - when a client asks if the server has a specific share. 'read' is - incremented for each chunk of data read. - readv, writev: these are for immutable file creation, publish, and - retrieve. 'readv' is incremented each time a client reads - part of a mutable share. 'writev' is incremented each time a - client sends a modification request. - add-lease, renew, cancel: these are for share lease modifications. - 'add-lease' is incremented when an 'add-lease' - operation is performed (which either adds a new - lease or renews an existing lease). 'renew' is - for the 'renew-lease' operation (which can only - be used to renew an existing one). 'cancel' is - used for the 'cancel-lease' operation. - bytes_freed: this counts how many bytes were freed when a 'cancel-lease' - operation removed the last lease from a share and the share - was thus deleted. - bytes_added: this counts how many bytes were consumed by immutable share - uploads. It is incremented at the same time as the 'close' - counter. - -stats.storage_server.*: - allocated: this counts how many bytes are currently 'allocated', which - tracks the space that will eventually be consumed by immutable - share upload operations. The stat is increased as soon as the - upload begins (at the same time the 'allocated' counter is - incremented), and goes back to zero when the 'close' or 'abort' - message is received (at which point the 'disk_used' stat should - incremented by the same amount). - disk_total - disk_used - disk_free_for_root - disk_free_for_nonroot - disk_avail - reserved_space: these all reflect disk-space usage policies and status. - 'disk_total' is the total size of disk where the storage - server's BASEDIR/storage/shares directory lives, as reported - by /bin/df or equivalent. 'disk_used', 'disk_free_for_root', - and 'disk_free_for_nonroot' show related information. - 'reserved_space' reports the reservation configured by the - tahoe.cfg [storage]reserved_space value. 'disk_avail' - reports the remaining disk space available for the Tahoe - server after subtracting reserved_space from disk_avail. All - values are in bytes. - accepting_immutable_shares: this is '1' if the storage server is currently - accepting uploads of immutable shares. It may be - '0' if a server is disabled by configuration, or - if the disk is full (i.e. disk_avail is less - than reserved_space). - total_bucket_count: this counts the number of 'buckets' (i.e. unique - storage-index values) currently managed by the storage - server. It indicates roughly how many files are managed - by the server. - latencies.*.*: these stats keep track of local disk latencies for - storage-server operations. A number of percentile values are - tracked for many operations. For example, - 'storage_server.latencies.readv.50_0_percentile' records the - median response time for a 'readv' request. All values are in - seconds. These are recorded by the storage server, starting - from the time the request arrives (post-deserialization) and - ending when the response begins serialization. As such, they - are mostly useful for measuring disk speeds. The operations - tracked are the same as the counters.storage_server.* counter - values (allocate, write, close, get, read, add-lease, renew, - cancel, readv, writev). The percentile values tracked are: - mean, 01_0_percentile, 10_0_percentile, 50_0_percentile, - 90_0_percentile, 95_0_percentile, 99_0_percentile, - 99_9_percentile. (the last value, 99.9 percentile, means that - 999 out of the last 1000 operations were faster than the - given number, and is the same threshold used by Amazon's - internal SLA, according to the Dynamo paper). - -counters.uploader.files_uploaded -counters.uploader.bytes_uploaded -counters.downloader.files_downloaded -counters.downloader.bytes_downloaded - - These count client activity: a Tahoe client will increment these when it - uploads or downloads an immutable file. 'files_uploaded' is incremented by - one for each operation, while 'bytes_uploaded' is incremented by the size of - the file. - -counters.mutable.files_published -counters.mutable.bytes_published -counters.mutable.files_retrieved -counters.mutable.bytes_retrieved - - These count client activity for mutable files. 'published' is the act of - changing an existing mutable file (or creating a brand-new mutable file). - 'retrieved' is the act of reading its current contents. - -counters.chk_upload_helper.* - - These count activity of the "Helper", which receives ciphertext from clients - and performs erasure-coding and share upload for files that are not already - in the grid. The code which implements these counters is in - src/allmydata/immutable/offloaded.py . - - upload_requests: incremented each time a client asks to upload a file - upload_already_present: incremented when the file is already in the grid - upload_need_upload: incremented when the file is not already in the grid - resumes: incremented when the helper already has partial ciphertext for - the requested upload, indicating that the client is resuming an - earlier upload - fetched_bytes: this counts how many bytes of ciphertext have been fetched - from uploading clients - encoded_bytes: this counts how many bytes of ciphertext have been - encoded and turned into successfully-uploaded shares. If no - uploads have failed or been abandoned, encoded_bytes should - eventually equal fetched_bytes. - -stats.chk_upload_helper.* - - These also track Helper activity: - - active_uploads: how many files are currently being uploaded. 0 when idle. - incoming_count: how many cache files are present in the incoming/ directory, - which holds ciphertext files that are still being fetched - from the client - incoming_size: total size of cache files in the incoming/ directory - incoming_size_old: total size of 'old' cache files (more than 48 hours) - encoding_count: how many cache files are present in the encoding/ directory, - which holds ciphertext files that are being encoded and - uploaded - encoding_size: total size of cache files in the encoding/ directory - encoding_size_old: total size of 'old' cache files (more than 48 hours) - -stats.node.uptime: how many seconds since the node process was started - -stats.cpu_monitor.*: - .1min_avg, 5min_avg, 15min_avg: estimate of what percentage of system CPU - time was consumed by the node process, over - the given time interval. Expressed as a - float, 0.0 for 0%, 1.0 for 100% - .total: estimate of total number of CPU seconds consumed by node since - the process was started. Ticket #472 indicates that .total may - sometimes be negative due to wraparound of the kernel's counter. - -stats.load_monitor.*: - When enabled, the "load monitor" continually schedules a one-second - callback, and measures how late the response is. This estimates system load - (if the system is idle, the response should be on time). This is only - enabled if a stats-gatherer is configured. - - .avg_load: average "load" value (seconds late) over the last minute - .max_load: maximum "load" value over the last minute - - -== Running a Tahoe Stats-Gatherer Service == - -The "stats-gatherer" is a simple daemon that periodically collects stats from -several tahoe nodes. It could be useful, e.g., in a production environment, -where you want to monitor dozens of storage servers from a central management -host. It merely gatherers statistics from many nodes into a single place: it -does not do any actual analysis. - -The stats gatherer listens on a network port using the same Foolscap -connection library that Tahoe clients use to connect to storage servers. -Tahoe nodes can be configured to connect to the stats gatherer and publish -their stats on a periodic basis. (in fact, what happens is that nodes connect -to the gatherer and offer it a second FURL which points back to the node's -"stats port", which the gatherer then uses to pull stats on a periodic basis. -The initial connection is flipped to allow the nodes to live behind NAT -boxes, as long as the stats-gatherer has a reachable IP address) - -The stats-gatherer is created in the same fashion as regular tahoe client -nodes and introducer nodes. Choose a base directory for the gatherer to live -in (but do not create the directory). Then run: - - tahoe create-stats-gatherer $BASEDIR - -and start it with "tahoe start $BASEDIR". Once running, the gatherer will -write a FURL into $BASEDIR/stats_gatherer.furl . - -To configure a Tahoe client/server node to contact the stats gatherer, copy -this FURL into the node's tahoe.cfg file, in a section named "[client]", -under a key named "stats_gatherer.furl", like so: - - [client] - stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@192.168.0.8:49997/wxycb4kaexzskubjnauxeoptympyf45y - -or simply copy the stats_gatherer.furl file into the node's base directory -(next to the tahoe.cfg file): it will be interpreted in the same way. - -The first time it is started, the gatherer will listen on a random unused TCP -port, so it should not conflict with anything else that you have running on -that host at that time. On subsequent runs, it will re-use the same port (to -keep its FURL consistent). To explicitly control which port it uses, write -the desired portnumber into a file named "portnum" (i.e. $BASEDIR/portnum), -and the next time the gatherer is started, it will start listening on the -given port. The portnum file is actually a "strports specification string", -as described in docs/configuration.txt . - -Once running, the stats gatherer will create a standard python "pickle" file -in $BASEDIR/stats.pickle . Once a minute, the gatherer will pull stats -information from every connected node and write them into the pickle. The -pickle will contain a dictionary, in which node identifiers (known as "tubid" -strings) are the keys, and the values are a dict with 'timestamp', -'nickname', and 'stats' keys. d[tubid][stats] will contain the stats -dictionary as made available at http://localhost:3456/statistics?t=json . The -pickle file will only contain the most recent update from each node. - -Other tools can be built to examine these stats and render them into -something useful. For example, a tool could sum the -"storage_server.disk_avail' values from all servers to compute a -total-disk-available number for the entire grid (however, the "disk watcher" -daemon, in misc/operations_helpers/spacetime/, is better suited for this specific task). - -== Using Munin To Graph Stats Values == - -The misc/munin/ directory contains various plugins to graph stats for Tahoe -nodes. They are intended for use with the Munin system-management tool, which -typically polls target systems every 5 minutes and produces a web page with -graphs of various things over multiple time scales (last hour, last month, -last year). - -Most of the plugins are designed to pull stats from a single Tahoe node, and -are configured with the e.g. http://localhost:3456/statistics?t=json URL. The -"tahoe_stats" plugin is designed to read from the pickle file created by the -stats-gatherer. Some plugins are to be used with the disk watcher, and a few -(like tahoe_nodememory) are designed to watch the node processes directly -(and must therefore run on the same host as the target node). - -Please see the docstrings at the beginning of each plugin for details, and -the "tahoe-conf" file for notes about configuration and installing these -plugins into a Munin environment.