docs/specifications/file-encoding.rst

   1 .. -*- coding: utf-8-with-signature -*-
   2
   3 =============
   4 File Encoding
   5 =============
   6
   7 When the client wishes to upload an immutable file, the first step is to
   8 decide upon an encryption key. There are two methods: convergent or random.
   9 The goal of the convergent-key method is to make sure that multiple uploads
  10 of the same file will result in only one copy on the grid, whereas the
  11 random-key method does not provide this "convergence" feature.
  12
  13 The convergent-key method computes the SHA-256d hash of a single-purpose tag,
  14 the encoding parameters, a "convergence secret", and the contents of the
  15 file. It uses a portion of the resulting hash as the AES encryption key.
  16 There are security concerns with using convergence this approach (the
  17 "partial-information guessing attack", please see ticket #365 for some
  18 references), so Tahoe uses a separate (randomly-generated) "convergence
  19 secret" for each node, stored in NODEDIR/private/convergence . The encoding
  20 parameters (k, N, and the segment size) are included in the hash to make sure
  21 that two different encodings of the same file will get different keys. This
  22 method requires an extra IO pass over the file, to compute this key, and
  23 encryption cannot be started until the pass is complete. This means that the
  24 convergent-key method will require at least two total passes over the file.
  25
  26 The random-key method simply chooses a random encryption key. Convergence is
  27 disabled, however this method does not require a separate IO pass, so upload
  28 can be done with a single pass. This mode makes it easier to perform
  29 streaming upload.
  30
  31 Regardless of which method is used to generate the key, the plaintext file is
  32 encrypted (using AES in CTR mode) to produce a ciphertext. This ciphertext is
  33 then erasure-coded and uploaded to the servers. Two hashes of the ciphertext
  34 are generated as the encryption proceeds: a flat hash of the whole
  35 ciphertext, and a Merkle tree. These are used to verify the correctness of
  36 the erasure decoding step, and can be used by a "verifier" process to make
  37 sure the file is intact without requiring the decryption key.
  38
  39 The encryption key is hashed (with SHA-256d and a single-purpose tag) to
  40 produce the "Storage Index". This Storage Index (or SI) is used to identify
  41 the shares produced by the method described below. The grid can be thought of
  42 as a large table that maps Storage Index to a ciphertext. Since the
  43 ciphertext is stored as erasure-coded shares, it can also be thought of as a
  44 table that maps SI to shares.
  45
  46 Anybody who knows a Storage Index can retrieve the associated ciphertext:
  47 ciphertexts are not secret.
  48
  49 .. image:: file-encoding1.svg
  50
  51 The ciphertext file is then broken up into segments. The last segment is
  52 likely to be shorter than the rest. Each segment is erasure-coded into a
  53 number of "blocks". This takes place one segment at a time. (In fact,
  54 encryption and erasure-coding take place at the same time, once per plaintext
  55 segment). Larger segment sizes result in less overhead overall, but increase
  56 both the memory footprint and the "alacrity" (the number of bytes we have to
  57 receive before we can deliver validated plaintext to the user). The current
  58 default segment size is 128KiB.
  59
  60 One block from each segment is sent to each shareholder (aka leaseholder,
  61 aka landlord, aka storage node, aka peer). The "share" held by each remote
  62 shareholder is nominally just a collection of these blocks. The file will
  63 be recoverable when a certain number of shares have been retrieved.
  64
  65 .. image:: file-encoding2.svg
  66
  67 The blocks are hashed as they are generated and transmitted. These
  68 block hashes are put into a Merkle hash tree. When the last share has been
  69 created, the merkle tree is completed and delivered to the peer. Later, when
  70 we retrieve these blocks, the peer will send many of the merkle hash tree
  71 nodes ahead of time, so we can validate each block independently.
  72
  73 The root of this block hash tree is called the "block root hash" and
  74 used in the next step.
  75
  76 .. image:: file-encoding3.svg
  77
  78 There is a higher-level Merkle tree called the "share hash tree". Its leaves
  79 are the block root hashes from each share. The root of this tree is called
  80 the "share root hash" and is included in the "URI Extension Block", aka UEB.
  81 The ciphertext hash and Merkle tree are also put here, along with the
  82 original file size, and the encoding parameters. The UEB contains all the
  83 non-secret values that could be put in the URI, but would have made the URI
  84 too big. So instead, the UEB is stored with the share, and the hash of the
  85 UEB is put in the URI.
  86
  87 The URI then contains the secret encryption key and the UEB hash. It also
  88 contains the basic encoding parameters (k and N) and the file size, to make
  89 download more efficient (by knowing the number of required shares ahead of
  90 time, sufficient download queries can be generated in parallel).
  91
  92 The URI (also known as the immutable-file read-cap, since possessing it
  93 grants the holder the capability to read the file's plaintext) is then
  94 represented as a (relatively) short printable string like so::
  95
  96  URI:CHK:auxet66ynq55naiy2ay7cgrshm:6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000
  97
  98 .. image:: file-encoding4.svg
  99
 100 During download, when a peer begins to transmit a share, it first transmits
 101 all of the parts of the share hash tree that are necessary to validate its
 102 block root hash. Then it transmits the portions of the block hash tree
 103 that are necessary to validate the first block. Then it transmits the
 104 first block. It then continues this loop: transmitting any portions of the
 105 block hash tree to validate block#N, then sending block#N.
 106
 107 .. image:: file-encoding5.svg
 108
 109 So the "share" that is sent to the remote peer actually consists of three
 110 pieces, sent in a specific order as they become available, and retrieved
 111 during download in a different order according to when they are needed.
 112
 113 The first piece is the blocks themselves, one per segment. The last
 114 block will likely be shorter than the rest, because the last segment is
 115 probably shorter than the rest. The second piece is the block hash tree,
 116 consisting of a total of two SHA-1 hashes per block. The third piece is a
 117 hash chain from the share hash tree, consisting of log2(numshares) hashes.
 118
 119 During upload, all blocks are sent first, followed by the block hash
 120 tree, followed by the share hash chain. During download, the share hash chain
 121 is delivered first, followed by the block root hash. The client then uses
 122 the hash chain to validate the block root hash. Then the peer delivers
 123 enough of the block hash tree to validate the first block, followed by
 124 the first block itself. The block hash chain is used to validate the
 125 block, then it is passed (along with the first block from several other
 126 peers) into decoding, to produce the first segment of crypttext, which is
 127 then decrypted to produce the first segment of plaintext, which is finally
 128 delivered to the user.
 129
 130 .. image:: file-encoding6.svg
 131
 132
 133 Hashes
 134 ======
 135
 136 All hashes use SHA-256d, as defined in Practical Cryptography (by Ferguson
 137 and Schneier). All hashes use a single-purpose tag, e.g. the hash that
 138 converts an encryption key into a storage index is defined as follows::
 139
 140  SI = SHA256d(netstring("allmydata_immutable_key_to_storage_index_v1") + key)
 141
 142 When two separate values need to be combined together in a hash, we wrap each
 143 in a netstring.
 144
 145 Using SHA-256d (instead of plain SHA-256) guards against length-extension
 146 attacks. Using the tag protects our Merkle trees against attacks in which the
 147 hash of a leaf is confused with a hash of two children (allowing an attacker
 148 to generate corrupted data that nevertheless appears to be valid), and is
 149 simply good "cryptograhic hygiene". The `“Chosen Protocol Attack” by Kelsey,
 150 Schneier, and Wagner`_ is relevant. Putting the tag in a netstring guards
 151 against attacks that seek to confuse the end of the tag with the beginning of
 152 the subsequent value.
 153
 154 .. _“Chosen Protocol Attack” by Kelsey, Schneier, and Wagner: http://www.schneier.com/paper-chosen-protocol.html