From: Brian Warner Date: Mon, 9 Feb 2009 00:47:48 +0000 (-0700) Subject: docs/specifications: add an outline of the spec documents we'd like to have some day X-Git-Tag: allmydata-tahoe-1.3.0~63 X-Git-Url: https://git.rkrishnan.org/specifications/components/com_hotproperty/reliability?a=commitdiff_plain;h=bbef104315956b558ea82a782682b37ef2f14df2;p=tahoe-lafs%2Ftahoe-lafs.git docs/specifications: add an outline of the spec documents we'd like to have some day --- diff --git a/docs/specifications/outline.txt b/docs/specifications/outline.txt new file mode 100644 index 00000000..204878e4 --- /dev/null +++ b/docs/specifications/outline.txt @@ -0,0 +1,210 @@ += Specification Document Outline = + +While we do not yet have a clear set of specification documents for Tahoe +(explaining the file formats, so that others can write interoperable +implementations), this document is intended to lay out an outline for what +these specs ought to contain. Think of this as the ISO 7-Layer Model for +Tahoe. + +We currently imagine 4 documents. + +== #1: Share Format, Encoding Algorithm == + +This document will describe the way that files are encrypted and encoded into +shares. It will include a specification of the share format, and explain both +the encoding and decoding algorithms. It will cover both mutable and +immutable files. + +The immutable encoding algorithm, as described by this document, will start +with a plaintext series of bytes, encoding parameters "k" and "N", and either +an encryption key or a mechanism for deterministically deriving the key from +the plaintext (the CHK specification). The algorithm will end with a set of N +shares, and a set of values that must be included in the filecap to provide +confidentiality (the encryption key) and integrity (the UEB hash). + +The immutable decoding algorithm will start with the filecap values (key and +UEB hash) and "k" shares. It will explain how to validate the shares against +the integrity information, how to reverse the erasure-coding, and how to +decrypt the resulting ciphertext. It will result in the original plaintext +bytes (or some subrange thereof). + +The sections on mutable files will contain similar information. + +This document is *not* responsible for explaining the filecap format, since +full filecaps may need to contain additional information as described in +document #3. Likewise it it not responsible for explaining where to put the +generated shares or where to find them again later. + +It is also not responsible for explaining the access control mechanisms +surrounding share upload, download, or modification ("Accounting" is the +business of controlling share upload to conserve space, and mutable file +shares require some sort of access control to prevent non-writecap holders +from destroying shares). We don't yet have a document dedicated to explaining +these, but let's call it "Access Control" for now. + + +== #2: Share Exchange Protocol == + +This document explains the wire-protocol used to upload, download, and modify +shares on the various storage servers. + +Given the N shares created by the algorithm described in document #1, and a +set of servers who are willing to accept those shares, the protocols in this +document will be sufficient to get the shares onto the servers. Likewise, +given a set of servers who hold at least k shares, these protocols will be +enough to retrieve the shares necessary to begin the decoding process +described in document #1. The notion of a "storage index" is used to +reference a particular share: the storage index is generated by the encoding +process described in document #1. + +This document does *not* describe how to identify or choose those servers, +rather it explains what to do once they have been selected (by the mechanisms +in document #3). + +This document also explains the protocols that a client uses to ask a server +whether or not it is willing to accept an uploaded share, and whether it has +a share available for download. These protocols will be used by the +mechanisms in document #3 to help decide where the shares should be placed. + +Where cryptographic mechanisms are necessary to implement access-control +policy, this document will explain those mechanisms. + +In the future, Tahoe will be able to use multiple protocols to speak to +storage servers. There will be alternative forms of this document, one for +each protocol. The first one to be written will describe the Foolscap-based +protocol that tahoe currently uses, but we anticipate a subsequent one to +describe a more HTTP-based protocol. + +== #3: Server Selection Algorithm, filecap format == + +This document has two interrelated purposes. With a deeper understanding of +the issues, we may be able to separate these more cleanly in the future. + +The first purpose is to explain the server selection algorithm. Given a set +of N shares, where should those shares be uploaded? Given some information +stored about a previously-uploaded file, how should a downloader locate and +recover at least k shares? Given a previously-uploaded mutable file, how +should a modifier locate all (or most of) the shares with a reasonable amount +of work? + +This question implies many things, all of which should be explained in this +document: + + * the notion of a "grid", nominally a set of servers who could potentially + hold shares, which might change over time + * a way to configure which grid should be used + * a way to discover which servers are a part of that grid + * a way to decide which servers are reliable enough to be worth sending + shares + * an algorithm to handle servers which refuse shares + * a way for a downloader to locate which servers have shares + * a way to choose which shares should be used for download + +The server-selection algorithm has several obviously competing goals: + + * minimize the amount of work that must be done during upload + * minimize the total storage resources used + * avoid "hot spots", balance load among multiple servers + * maximize the chance that enough shares will be downloadable later, by + uploading lots of shares, and by placing them on reliable servers + * minimize the work that the future downloader must do + * tolerate temporary server failures, permanent server departure, and new + server insertions + * minimize the amount of information that must be added to the filecap + +The server-selection algorithm is defined in some context: some set of +expectations about the servers or grid with which it is expected to operate. +Different algorithms are appropriate for different situtations, so there will +be multiple alternatives of this document. + +The first version of this document will describe the algorithm that the +current (1.3.0) release uses, which is heavily weighted towards the two main +use case scenarios for which Tahoe has been designed: the small, stable +friendnet, and the allmydata.com managed grid. In both cases, we assume that +the storage servers are online most of the time, they are uniformly highly +reliable, and that the set of servers does not change very rapidly. The +server-selection algorithm for this environment uses a permuted server list +to achieve load-balancing, uses all servers identically, and derives the +permutation key from the storage index to avoid adding a new field to the +filecap. + +An alternative algorithm could give clients more precise control over share +placement, for example by a user who wished to make sure that k+1 shares are +located in each datacenter (to allow downloads to take place using only local +bandwidth). This algorithm could skip the permuted list and use other +mechanisms to accomplish load-balancing (or ignore the issue altogether). It +could add additional information to the filecap (like a list of which servers +received the shares) in lieu of performing a search at download time, perhaps +at the expense of allowing a repairer to move shares to a new server after +the initial upload. It might make up for this by storing "location hints" +next to each share, to indicate where other shares are likely to be found, +and obligating the repairer to update these hints. + +The second purpose of this document is to explain the format of the file +capability string (or "filecap" for short). There are multiple kinds of +capabilties (read-write, read-only, verify-only, repaircap, lease-renewal +cap, traverse-only, etc). There are multiple ways to represent the filecap +(compressed binary, human-readable, clickable-HTTP-URL, "tahoe:" URL, etc), +but they must all contain enough information to reliably retrieve a file +(given some context, of course). It must at least contain the confidentiality +and integrity information from document #1 (i.e. the encryption key and the +UEB hash). It must also contain whatever additional information the +upload-time server-selection algorithm generated that will be required by the +downloader. + +For some server-selection algorithms, the additional information will be +minimal. For example, the 1.3.0 release uses the hash of the encryption key +as a storage index, and uses the storage index to permute the server list, +and uses an Introducer to learn the current list of servers. This allows a +"close-enough" list of servers to be compressed into a filecap field that is +already required anyways (the encryption key). It also adds k and N to the +filecap, to speed up the downloader's search (the downloader knows how many +shares it needs, so it can send out multiple queries in parallel). + +But other server-selection algorithms might require more information. Each +variant of this document will explain how to encode that additional +information into the filecap, and how to extract and use that information at +download time. + +These two purposes are interrelated. A filecap that is interpreted in the +context of the allmydata.com commercial grid, which uses tahoe-1.3.0, implies +a specific peer-selection algorithm, a specific Introducer, and therefore a +fairly-specific set of servers to query for shares. A filecap which is meant +to be interpreted on a different sort of grid would need different +information. + +Some filecap formats can be designed to contain more information (and depend +less upon context), such as the way an HTTP URL implies the existence of a +single global DNS system. Ideally a tahoe filecap should be able to specify +which "grid" it lives in, with enough information to allow a compatible +implementation of Tahoe to locate that grid and retrieve the file (regardless +of which server-selection algorithm was used for upload). + +This more-universal format might come at the expense of reliability, however. +Tahoe-1.3.0 filecaps do not contain hostnames, because the failure of DNS or +an individual host might then impact file availability (however the +Introducer contains DNS names or IP addresses). + +== #4: Directory Format == + +Tahoe directories are a special way of interpreting and managing the contents +of a file (either mutable or immutable). These "dirnode" files are basically +serialized tables that map child name to filecap/dircap. This document +describes the format of these files. + +Tahoe-1.3.0 directories are "transitively readonly", which is accomplished by +applying an additional layer of encryption to the list of child writecaps. +The key for this encryption is derived from the containing file's writecap. +This document must explain how to derive this key and apply it to the +appropriate portion of the table. + +Future versions of the directory format are expected to contain +"deep-traversal caps", which allow verification/repair of files without +exposing their plaintext to the repair agent. This document wil be +responsible for explaining traversal caps too. + +Future versions of the directory format will probably contain an index and +more advanced data structures (for efficiency and fast lookups), instead of a +simple flat list of (childname, childcap). This document will also need to +describe metadata formats, including what access-control policies are defined +for the metadata.