docs/specifications/outline.rst

   1 .. -*- coding: utf-8-with-signature -*-
   2
   3 ==============================
   4 Specification Document Outline
   5 ==============================
   6
   7 While we do not yet have a clear set of specification documents for Tahoe
   8 (explaining the file formats, so that others can write interoperable
   9 implementations), this document is intended to lay out an outline for what
  10 these specs ought to contain. Think of this as the ISO 7-Layer Model for
  11 Tahoe.
  12
  13 We currently imagine 4 documents.
  14
  15 1.  `#1: Share Format, Encoding Algorithm`_
  16 2.  `#2: Share Exchange Protocol`_
  17 3.  `#3: Server Selection Algorithm, filecap format`_
  18 4.  `#4: Directory Format`_
  19
  20 #1: Share Format, Encoding Algorithm
  21 ====================================
  22
  23 This document will describe the way that files are encrypted and encoded into
  24 shares. It will include a specification of the share format, and explain both
  25 the encoding and decoding algorithms. It will cover both mutable and
  26 immutable files.
  27
  28 The immutable encoding algorithm, as described by this document, will start
  29 with a plaintext series of bytes, encoding parameters "k" and "N", and either
  30 an encryption key or a mechanism for deterministically deriving the key from
  31 the plaintext (the CHK specification). The algorithm will end with a set of N
  32 shares, and a set of values that must be included in the filecap to provide
  33 confidentiality (the encryption key) and integrity (the UEB hash).
  34
  35 The immutable decoding algorithm will start with the filecap values (key and
  36 UEB hash) and "k" shares. It will explain how to validate the shares against
  37 the integrity information, how to reverse the erasure-coding, and how to
  38 decrypt the resulting ciphertext. It will result in the original plaintext
  39 bytes (or some subrange thereof).
  40
  41 The sections on mutable files will contain similar information.
  42
  43 This document is *not* responsible for explaining the filecap format, since
  44 full filecaps may need to contain additional information as described in
  45 document #3. Likewise it it not responsible for explaining where to put the
  46 generated shares or where to find them again later.
  47
  48 It is also not responsible for explaining the access control mechanisms
  49 surrounding share upload, download, or modification ("Accounting" is the
  50 business of controlling share upload to conserve space, and mutable file
  51 shares require some sort of access control to prevent non-writecap holders
  52 from destroying shares). We don't yet have a document dedicated to explaining
  53 these, but let's call it "Access Control" for now.
  54
  55
  56 #2: Share Exchange Protocol
  57 ===========================
  58
  59 This document explains the wire-protocol used to upload, download, and modify
  60 shares on the various storage servers.
  61
  62 Given the N shares created by the algorithm described in document #1, and a
  63 set of servers who are willing to accept those shares, the protocols in this
  64 document will be sufficient to get the shares onto the servers. Likewise,
  65 given a set of servers who hold at least k shares, these protocols will be
  66 enough to retrieve the shares necessary to begin the decoding process
  67 described in document #1. The notion of a "storage index" is used to
  68 reference a particular share: the storage index is generated by the encoding
  69 process described in document #1.
  70
  71 This document does *not* describe how to identify or choose those servers,
  72 rather it explains what to do once they have been selected (by the mechanisms
  73 in document #3).
  74
  75 This document also explains the protocols that a client uses to ask a server
  76 whether or not it is willing to accept an uploaded share, and whether it has
  77 a share available for download. These protocols will be used by the
  78 mechanisms in document #3 to help decide where the shares should be placed.
  79
  80 Where cryptographic mechanisms are necessary to implement access-control
  81 policy, this document will explain those mechanisms.
  82
  83 In the future, Tahoe will be able to use multiple protocols to speak to
  84 storage servers. There will be alternative forms of this document, one for
  85 each protocol. The first one to be written will describe the Foolscap-based
  86 protocol that tahoe currently uses, but we anticipate a subsequent one to
  87 describe a more HTTP-based protocol.
  88
  89 #3: Server Selection Algorithm, filecap format
  90 ==============================================
  91
  92 This document has two interrelated purposes. With a deeper understanding of
  93 the issues, we may be able to separate these more cleanly in the future.
  94
  95 The first purpose is to explain the server selection algorithm. Given a set
  96 of N shares, where should those shares be uploaded? Given some information
  97 stored about a previously-uploaded file, how should a downloader locate and
  98 recover at least k shares? Given a previously-uploaded mutable file, how
  99 should a modifier locate all (or most of) the shares with a reasonable amount
 100 of work?
 101
 102 This question implies many things, all of which should be explained in this
 103 document:
 104
 105 * the notion of a "grid", nominally a set of servers who could potentially
 106   hold shares, which might change over time
 107 * a way to configure which grid should be used
 108 * a way to discover which servers are a part of that grid
 109 * a way to decide which servers are reliable enough to be worth sending
 110   shares
 111 * an algorithm to handle servers which refuse shares
 112 * a way for a downloader to locate which servers have shares
 113 * a way to choose which shares should be used for download
 114
 115 The server-selection algorithm has several obviously competing goals:
 116
 117 * minimize the amount of work that must be done during upload
 118 * minimize the total storage resources used
 119 * avoid "hot spots", balance load among multiple servers
 120 * maximize the chance that enough shares will be downloadable later, by
 121   uploading lots of shares, and by placing them on reliable servers
 122 * minimize the work that the future downloader must do
 123 * tolerate temporary server failures, permanent server departure, and new
 124   server insertions
 125 * minimize the amount of information that must be added to the filecap
 126
 127 The server-selection algorithm is defined in some context: some set of
 128 expectations about the servers or grid with which it is expected to operate.
 129 Different algorithms are appropriate for different situtations, so there will
 130 be multiple alternatives of this document.
 131
 132 The first version of this document will describe the algorithm that the
 133 current (1.3.0) release uses, which is heavily weighted towards the two main
 134 use case scenarios for which Tahoe has been designed: the small, stable
 135 friendnet, and the allmydata.com managed grid. In both cases, we assume that
 136 the storage servers are online most of the time, they are uniformly highly
 137 reliable, and that the set of servers does not change very rapidly. The
 138 server-selection algorithm for this environment uses a permuted server list
 139 to achieve load-balancing, uses all servers identically, and derives the
 140 permutation key from the storage index to avoid adding a new field to the
 141 filecap.
 142
 143 An alternative algorithm could give clients more precise control over share
 144 placement, for example by a user who wished to make sure that k+1 shares are
 145 located in each datacenter (to allow downloads to take place using only local
 146 bandwidth). This algorithm could skip the permuted list and use other
 147 mechanisms to accomplish load-balancing (or ignore the issue altogether). It
 148 could add additional information to the filecap (like a list of which servers
 149 received the shares) in lieu of performing a search at download time, perhaps
 150 at the expense of allowing a repairer to move shares to a new server after
 151 the initial upload. It might make up for this by storing "location hints"
 152 next to each share, to indicate where other shares are likely to be found,
 153 and obligating the repairer to update these hints.
 154
 155 The second purpose of this document is to explain the format of the file
 156 capability string (or "filecap" for short). There are multiple kinds of
 157 capabilties (read-write, read-only, verify-only, repaircap, lease-renewal
 158 cap, traverse-only, etc). There are multiple ways to represent the filecap
 159 (compressed binary, human-readable, clickable-HTTP-URL, "tahoe:" URL, etc),
 160 but they must all contain enough information to reliably retrieve a file
 161 (given some context, of course). It must at least contain the confidentiality
 162 and integrity information from document #1 (i.e. the encryption key and the
 163 UEB hash). It must also contain whatever additional information the
 164 upload-time server-selection algorithm generated that will be required by the
 165 downloader.
 166
 167 For some server-selection algorithms, the additional information will be
 168 minimal. For example, the 1.3.0 release uses the hash of the encryption key
 169 as a storage index, and uses the storage index to permute the server list,
 170 and uses an Introducer to learn the current list of servers. This allows a
 171 "close-enough" list of servers to be compressed into a filecap field that is
 172 already required anyways (the encryption key). It also adds k and N to the
 173 filecap, to speed up the downloader's search (the downloader knows how many
 174 shares it needs, so it can send out multiple queries in parallel).
 175
 176 But other server-selection algorithms might require more information. Each
 177 variant of this document will explain how to encode that additional
 178 information into the filecap, and how to extract and use that information at
 179 download time.
 180
 181 These two purposes are interrelated. A filecap that is interpreted in the
 182 context of the allmydata.com commercial grid, which uses tahoe-1.3.0, implies
 183 a specific peer-selection algorithm, a specific Introducer, and therefore a
 184 fairly-specific set of servers to query for shares. A filecap which is meant
 185 to be interpreted on a different sort of grid would need different
 186 information.
 187
 188 Some filecap formats can be designed to contain more information (and depend
 189 less upon context), such as the way an HTTP URL implies the existence of a
 190 single global DNS system. Ideally a tahoe filecap should be able to specify
 191 which "grid" it lives in, with enough information to allow a compatible
 192 implementation of Tahoe to locate that grid and retrieve the file (regardless
 193 of which server-selection algorithm was used for upload).
 194
 195 This more-universal format might come at the expense of reliability, however.
 196 Tahoe-1.3.0 filecaps do not contain hostnames, because the failure of DNS or
 197 an individual host might then impact file availability (however the
 198 Introducer contains DNS names or IP addresses).
 199
 200 #4: Directory Format
 201 ====================
 202
 203 Tahoe directories are a special way of interpreting and managing the contents
 204 of a file (either mutable or immutable). These "dirnode" files are basically
 205 serialized tables that map child name to filecap/dircap. This document
 206 describes the format of these files.
 207
 208 Tahoe-1.3.0 directories are "transitively readonly", which is accomplished by
 209 applying an additional layer of encryption to the list of child writecaps.
 210 The key for this encryption is derived from the containing file's writecap.
 211 This document must explain how to derive this key and apply it to the
 212 appropriate portion of the table.
 213
 214 Future versions of the directory format are expected to contain
 215 "deep-traversal caps", which allow verification/repair of files without
 216 exposing their plaintext to the repair agent. This document wil be
 217 responsible for explaining traversal caps too.
 218
 219 Future versions of the directory format will probably contain an index and
 220 more advanced data structures (for efficiency and fast lookups), instead of a
 221 simple flat list of (childname, childcap). This document will also need to
 222 describe metadata formats, including what access-control policies are defined
 223 for the metadata.