From: Brian Warner
Date: Mon, 9 Feb 2009 00:47:48 +0000 (-0700)
Subject: docs/specifications: add an outline of the spec documents we'd like to have some day
X-Git-Tag: allmydata-tahoe-1.3.0~63
X-Git-Url: https://git.rkrishnan.org/provisioning?a=commitdiff_plain;h=bbef104315956b558ea82a782682b37ef2f14df2;p=tahoe-lafs%2Ftahoe-lafs.git

docs/specifications: add an outline of the spec documents we'd like to have some day
---

diff --git a/docs/specifications/outline.txt b/docs/specifications/outline.txt
new file mode 100644
index 00000000..204878e4
--- /dev/null
+++ b/docs/specifications/outline.txt
@@ -0,0 +1,210 @@

= Specification Document Outline =

While we do not yet have a clear set of specification documents for Tahoe (explaining the file formats, so that others can write interoperable implementations), this document is intended to lay out an outline for what these specs ought to contain. Think of this as the ISO 7-Layer Model for Tahoe.

We currently imagine 4 documents.

== #1: Share Format, Encoding Algorithm ==

This document will describe the way that files are encrypted and encoded into shares. It will include a specification of the share format, and explain both the encoding and decoding algorithms. It will cover both mutable and immutable files.

The immutable encoding algorithm, as described by this document, will start with a plaintext series of bytes, encoding parameters "k" and "N", and either an encryption key or a mechanism for deterministically deriving the key from the plaintext (the CHK specification). The algorithm will end with a set of N shares, and a set of values that must be included in the filecap to provide confidentiality (the encryption key) and integrity (the UEB hash).

The immutable decoding algorithm will start with the filecap values (key and UEB hash) and "k" shares. It will explain how to validate the shares against the integrity information, how to reverse the erasure-coding, and how to decrypt the resulting ciphertext. It will result in the original plaintext bytes (or some subrange thereof).

The sections on mutable files will contain similar information.

This document is *not* responsible for explaining the filecap format, since full filecaps may need to contain additional information as described in document #3. Likewise it is not responsible for explaining where to put the generated shares or where to find them again later.

It is also not responsible for explaining the access control mechanisms surrounding share upload, download, or modification ("Accounting" is the business of controlling share upload to conserve space, and mutable file shares require some sort of access control to prevent non-writecap holders from destroying shares). We don't yet have a document dedicated to explaining these, but let's call it "Access Control" for now.
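To make the immutable encoding flow described above concrete, here is a minimal Python sketch of the pipeline from plaintext to shares plus filecap values. It is illustrative only: the encryption and erasure-coding helpers are hypothetical stand-ins (Tahoe actually uses AES in CTR mode and zfec's Reed-Solomon codes), and the real share format and UEB contents are exactly what this document would have to specify.

    import hashlib

    # Stand-ins only: Tahoe really uses AES-CTR for encryption and zfec
    # (Reed-Solomon) for erasure coding. These stubs just mark where those
    # steps occur in the pipeline.
    def encrypt(key, plaintext):
        raise NotImplementedError("stand-in for AES-CTR encryption")

    def erasure_encode(ciphertext, k, N):
        raise NotImplementedError("stand-in for zfec: N shares, any k recover the data")

    def encode_immutable(plaintext, key, k, N):
        """Immutable encoding flow: plaintext -> N shares + values for the filecap."""
        ciphertext = encrypt(key, plaintext)
        shares = erasure_encode(ciphertext, k, N)
        # The real UEB (URI Extension Block) commits to much more (block hash
        # trees, the share hash tree, encoding parameters); a single hash over
        # the shares stands in for it here.
        ueb_hash = hashlib.sha256(b"".join(shares)).digest()
        # 'key' provides confidentiality, 'ueb_hash' provides integrity; both
        # must be carried in the filecap (document #3 defines how).
        return shares, {"key": key, "UEB_hash": ueb_hash, "k": k, "N": N}

Decoding reverses the same steps: validate the "k" retrieved shares against the UEB hash, reverse the erasure coding, then decrypt with the key from the filecap.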
== #2: Share Exchange Protocol ==

This document explains the wire protocol used to upload, download, and modify shares on the various storage servers.

Given the N shares created by the algorithm described in document #1, and a set of servers who are willing to accept those shares, the protocols in this document will be sufficient to get the shares onto the servers. Likewise, given a set of servers who hold at least k shares, these protocols will be enough to retrieve the shares necessary to begin the decoding process described in document #1. The notion of a "storage index" is used to reference a particular share: the storage index is generated by the encoding process described in document #1.

This document does *not* describe how to identify or choose those servers; rather, it explains what to do once they have been selected (by the mechanisms in document #3).

This document also explains the protocols that a client uses to ask a server whether or not it is willing to accept an uploaded share, and whether it has a share available for download. These protocols will be used by the mechanisms in document #3 to help decide where the shares should be placed.

Where cryptographic mechanisms are necessary to implement access-control policy, this document will explain those mechanisms.

In the future, Tahoe will be able to use multiple protocols to speak to storage servers. There will be alternative forms of this document, one for each protocol. The first one to be written will describe the Foolscap-based protocol that Tahoe currently uses, but we anticipate a subsequent one describing a more HTTP-based protocol.
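The sketch below lists the kinds of operations such a protocol must cover, expressed as a plain Python interface. The method names and signatures are hypothetical stand-ins, not the actual Foolscap remote interface; the point is only that document #2 has to pin down an equivalent of each operation and its encoding on the wire.

    from typing import Set


    class StorageServerProtocol:
        """Illustrative client-side view of a storage-server protocol (names invented)."""

        def get_share_numbers(self, storage_index: bytes) -> Set[int]:
            """Which shares for this storage index does the server already hold?"""
            raise NotImplementedError

        def allocate_shares(self, storage_index: bytes, sharenums: Set[int],
                            share_size: int) -> Set[int]:
            """Will the server accept these shares? Returns the accepted subset."""
            raise NotImplementedError

        def write_share(self, storage_index: bytes, sharenum: int, data: bytes) -> None:
            """Upload one immutable share."""
            raise NotImplementedError

        def read_share(self, storage_index: bytes, sharenum: int,
                       offset: int, length: int) -> bytes:
            """Download a byte range from one share."""
            raise NotImplementedError

        def test_and_write_mutable(self, storage_index: bytes, sharenum: int,
                                   expected_version: bytes, new_data: bytes) -> bool:
            """Modify a mutable share only if it still holds the expected version.

            Some conditional-write operation like this, gated by the
            access-control mechanisms mentioned above, is needed so that only
            writecap holders can change mutable shares."""
            raise NotImplementedError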
== #3: Server Selection Algorithm, filecap format ==

This document has two interrelated purposes. With a deeper understanding of the issues, we may be able to separate these more cleanly in the future.

The first purpose is to explain the server selection algorithm. Given a set of N shares, where should those shares be uploaded? Given some information stored about a previously-uploaded file, how should a downloader locate and recover at least k shares? Given a previously-uploaded mutable file, how should a modifier locate all (or most) of the shares with a reasonable amount of work?

This question implies many things, all of which should be explained in this document:

 * the notion of a "grid", nominally a set of servers who could potentially hold shares, which might change over time
 * a way to configure which grid should be used
 * a way to discover which servers are a part of that grid
 * a way to decide which servers are reliable enough to be worth sending shares to
 * an algorithm to handle servers which refuse shares
 * a way for a downloader to locate which servers have shares
 * a way to choose which shares should be used for download

The server-selection algorithm has several obviously competing goals:

 * minimize the amount of work that must be done during upload
 * minimize the total storage resources used
 * avoid "hot spots": balance load among multiple servers
 * maximize the chance that enough shares will be downloadable later, by uploading lots of shares and by placing them on reliable servers
 * minimize the work that the future downloader must do
 * tolerate temporary server failures, permanent server departures, and new server insertions
 * minimize the amount of information that must be added to the filecap

The server-selection algorithm is defined in some context: some set of expectations about the servers or grid with which it is expected to operate. Different algorithms are appropriate for different situations, so there will be multiple alternative versions of this document.

The first version of this document will describe the algorithm that the current (1.3.0) release uses, which is heavily weighted towards the two main use cases for which Tahoe has been designed: the small, stable friendnet, and the allmydata.com managed grid. In both cases, we assume that the storage servers are online most of the time, that they are uniformly highly reliable, and that the set of servers does not change very rapidly. The server-selection algorithm for this environment uses a permuted server list to achieve load-balancing, uses all servers identically, and derives the permutation key from the storage index to avoid adding a new field to the filecap.
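The permuted-list idea can be sketched in a few lines of Python. This is an illustration of the general technique only: the actual hash function, the encoding of server identifiers, and the handling of full or refusing servers are exactly what this version of the document must specify.

    import hashlib
    from typing import List


    def permuted_server_list(storage_index: bytes, server_ids: List[bytes]) -> List[bytes]:
        """Order servers by hashing each server id together with the storage index.

        Every client that knows the storage index computes the same ordering, so
        uploaders and downloaders agree on which servers to try first without
        having to record a server list in the filecap. (Hash choice and
        concatenation order here are illustrative, not normative.)
        """
        return sorted(server_ids,
                      key=lambda sid: hashlib.sha256(sid + storage_index).digest())


    def choose_upload_targets(storage_index: bytes, server_ids: List[bytes], N: int) -> List[bytes]:
        # Walk the permuted list and assign one share per server until N shares
        # are placed; a real uploader must also handle refusals and full servers.
        return permuted_server_list(storage_index, server_ids)[:N]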
An alternative algorithm could give clients more precise control over share placement, for example for a user who wishes to make sure that k+1 shares are located in each datacenter (to allow downloads to take place using only local bandwidth). This algorithm could skip the permuted list and use other mechanisms to accomplish load-balancing (or ignore the issue altogether). It could add additional information to the filecap (like a list of which servers received the shares) in lieu of performing a search at download time, perhaps at the expense of allowing a repairer to move shares to a new server after the initial upload. It might make up for this by storing "location hints" next to each share, to indicate where other shares are likely to be found, and by obligating the repairer to update these hints.

The second purpose of this document is to explain the format of the file capability string (or "filecap" for short). There are multiple kinds of capabilities (read-write, read-only, verify-only, repaircap, lease-renewal cap, traverse-only, etc). There are multiple ways to represent the filecap (compressed binary, human-readable, clickable-HTTP-URL, "tahoe:" URL, etc), but they must all contain enough information to reliably retrieve a file (given some context, of course). It must at least contain the confidentiality and integrity information from document #1 (i.e. the encryption key and the UEB hash). It must also contain whatever additional information the upload-time server-selection algorithm generated that will be required by the downloader.

For some server-selection algorithms, the additional information will be minimal. For example, the 1.3.0 release uses the hash of the encryption key as a storage index, uses the storage index to permute the server list, and uses an Introducer to learn the current list of servers. This allows a "close-enough" list of servers to be compressed into a filecap field that is already required anyway (the encryption key). It also adds k and N to the filecap, to speed up the downloader's search (the downloader knows how many shares it needs, so it can send out multiple queries in parallel).

But other server-selection algorithms might require more information. Each variant of this document will explain how to encode that additional information into the filecap, and how to extract and use that information at download time.

These two purposes are interrelated. A filecap that is interpreted in the context of the allmydata.com commercial grid, which uses Tahoe-1.3.0, implies a specific peer-selection algorithm, a specific Introducer, and therefore a fairly specific set of servers to query for shares. A filecap which is meant to be interpreted on a different sort of grid would need different information.

Some filecap formats can be designed to contain more information (and depend less upon context), much as an HTTP URL implies the existence of a single global DNS system. Ideally a Tahoe filecap should be able to specify which "grid" it lives in, with enough information to allow a compatible implementation of Tahoe to locate that grid and retrieve the file (regardless of which server-selection algorithm was used for upload).

This more-universal format might come at the expense of reliability, however. Tahoe-1.3.0 filecaps do not contain hostnames, because the failure of DNS or an individual host might then impact file availability (the Introducer, however, does contain DNS names or IP addresses).

== #4: Directory Format ==

Tahoe directories are a special way of interpreting and managing the contents of a file (either mutable or immutable). These "dirnode" files are basically serialized tables that map child name to filecap/dircap. This document describes the format of these files.

Tahoe-1.3.0 directories are "transitively readonly", which is accomplished by applying an additional layer of encryption to the list of child writecaps. The key for this encryption is derived from the containing file's writecap. This document must explain how to derive this key and apply it to the appropriate portion of the table; a toy sketch of the idea appears at the end of this section.

Future versions of the directory format are expected to contain "deep-traversal caps", which allow verification/repair of files without exposing their plaintext to the repair agent. This document will be responsible for explaining traversal caps too.

Future versions of the directory format will probably contain an index and more advanced data structures (for efficiency and fast lookups), instead of a simple flat list of (childname, childcap). This document will also need to describe metadata formats, including what access-control policies are defined for the metadata.
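As a toy sketch of the transitively-readonly mechanism (and of the kind of table a dirnode holds): the key-derivation tag and cipher below are hypothetical stand-ins, not the actual serialization or KDF that Tahoe uses.

    import hashlib
    from typing import Dict, Optional


    def derive_writecap_key(parent_writecap: bytes) -> bytes:
        # Hypothetical derivation; the real spec must define the exact KDF and
        # tag used to turn the containing file's writecap into this key.
        return hashlib.sha256(b"dirnode-writecap-key:" + parent_writecap).digest()


    def encrypt(key: bytes, data: bytes) -> bytes:
        raise NotImplementedError("stand-in for a real cipher (Tahoe uses AES)")


    def make_entry(parent_writecap: bytes, name: str, child_readcap: bytes,
                   child_writecap: Optional[bytes], metadata: Dict) -> tuple:
        """One conceptual dirnode row: name -> (readcap, encrypted writecap, metadata)."""
        enc_writecap = None
        if child_writecap is not None:
            enc_writecap = encrypt(derive_writecap_key(parent_writecap), child_writecap)
        # A holder of only the parent's readcap can see child readcaps and
        # metadata but cannot decrypt the child writecaps, so the directory is
        # "transitively readonly".
        return (name, (child_readcap, enc_writecap, metadata))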