From: Zooko O'Whielacronx
Date: Mon, 28 Jan 2008 18:33:46 +0000 (-0700)
Subject: docs: edit architecture.txt with Amber's help
X-Git-Url: https://git.rkrishnan.org/tahoe_css?a=commitdiff_plain;h=5bc69329fc5a0d324f66eda1a7717ce24ba8848a;p=tahoe-lafs%2Ftahoe-lafs.git

docs: edit architecture.txt with Amber's help
---

diff --git a/docs/architecture.txt b/docs/architecture.txt
index 2da00407..6a83a999 100644
--- a/docs/architecture.txt
+++ b/docs/architecture.txt
@@ -3,54 +3,43 @@
 
 OVERVIEW
 
-The high-level view of this system consists of three layers: the grid, the
-virtual drive, and the application that sits on top.
-
-The lowest layer is the "grid", basically a DHT (Distributed Hash Table)
-which maps capabilities to data. The capabilities are relatively short ascii
-strings, and each is used as a reference to an arbitrary-length sequence of
-data bytes. This data is encrypted and distributed around the grid across a
-large number of nodes, such that a large fraction of the nodes would have to
-be unavailable for the data to become unavailable.
-
-The middle layer is the virtual drive: a directed-acyclic-graph-shaped data
-structure in which the intermediate nodes are directories and the leaf nodes
-are files. The leaf nodes contain only the file data -- they don't contain
-any metadata about the file except for the length. The edges that lead to
-leaf nodes have metadata attached to them about the file that they point to.
-Therefore, the same file may have different metadata associated with it if it
-is dereferenced through different edges.
-
-The top layer is where the applications that use this virtual drive operate.
-Allmydata uses this for a backup service, in which the application copies the
-files to be backed up from the local disk into the virtual drive on a
-periodic basis. By providing read-only access to the same virtual drive
-later, a user can recover older versions of their files. Other sorts of
-applications can run on top of the virtual drive, of course -- anything that
-has a use for a secure, decentralized, fault-tolerant filesystem.
-
-
-THE BIG GRID OF STORAGE SERVERS
-
-Underlying the grid is a large collection of peer nodes. These are processes
-running on a wide variety of computers, all of which know about each other in
-some way or another. They establish TCP connections to one another using
-Foolscap, an encrypted+authenticated remote message passing library (using
-TLS connections and self-authenticating identifiers called "FURLs").
-
-Each peer offers certain services to the others. The primary service is the
-StorageServer, which offers to hold data. Each StorageServer has a quota, and
-it will reject storage requests that would cause it to consume more space
-than it wants to provide.
-
-This storage is used to hold "shares", which are encoded pieces of files in
-the grid. There are many shares for each file, typically between 10 and 100
-(the exact number depends upon the tradeoffs made between reliability,
-overhead, and storage space consumed). The files are indexed by a
-"StorageIndex", which is derived from the encryption key, which is derived
-from the contents of the file. Leases are indexed by StorageIndex, and a
-single StorageServer may hold multiple shares for the corresponding
-file. Multiple peers can hold leases on the same file.
+At a high-level this system consists of three layers: the grid, the
+filesystem, and the application.
+
+The lowest layer is the "grid", a mapping from capabilities to
+data. The capabilities are relatively short ascii strings, each used
+as a reference to an arbitrary-length sequence of data bytes. This
+data is encrypted and distributed across a number of nodes, such that
+it will survive the loss of most of the nodes.
+
+The middle layer is the decentralized filesystem: a directed graph in
+which the intermediate nodes are directories and the leaf nodes are
+files. The leaf nodes contain only the file data -- they contain no
+metadata about the file other than the length. The edges leading to
+leaf nodes have metadata attached to them about the file they point
+to. Therefore, the same file may be associated with different
+metadata if it is dereferenced through different edges.
+
+The top layer consists of the applications using the filesystem.
+Allmydata.com uses it for a backup service: the application
+periodically copies files from the local disk onto the decentralized
+filesystem We later provide read-only access to those files, allowing
+users to recover them. The filesystem can be used by other
+applications, too.
+
+
+THE GRID OF STORAGE SERVERS
+
+Underlying the grid is a collection of peer nodes -- processes running
+on computers. They establish TCP connections to each other using
+Foolscap, a secure remote message passing library.
+
+Each peer offers certain services to the others. The primary service
+is that of the storage server, which holds data in the form of
+"shares". Shares are encoded pieces of files. There are a
+configurable number of shares for each file, 12 by default. Normally,
+each share is stored on a separate server, but a single server can
+hold multiple shares for a single file.
 
 Peers learn about each other through the "introducer". Each peer connects to
 this central introducer at startup, and receives a list of all other peers