From: Brian Warner
Date: Tue, 3 Jun 2008 02:58:27 +0000 (-0700)
Subject: docs/file-encoding.txt: move this over from the wiki
X-Git-Tag: allmydata-tahoe-1.1.0~59
X-Git-Url: https://git.rkrishnan.org/listings/reliability?a=commitdiff_plain;h=f9fe63fd7ab1997ffec2be35f8c44ce56a129a0a;p=tahoe-lafs%2Ftahoe-lafs.git

docs/file-encoding.txt: move this over from the wiki
---

diff --git a/docs/file-encoding.txt b/docs/file-encoding.txt
new file mode 100644
index 00000000..4b0572d5
--- /dev/null
+++ b/docs/file-encoding.txt
@@ -0,0 +1,148 @@

== FileEncoding ==

When the client wishes to upload an immutable file, the first step is to
decide upon an encryption key. There are two methods: convergent or random.
The goal of the convergent-key method is to make sure that multiple uploads
of the same file will result in only one copy on the grid, whereas the
random-key method does not provide this "convergence" feature.

The convergent-key method computes the SHA-256d hash of a single-purpose tag,
the encoding parameters, a "convergence secret", and the contents of the
file. It uses a portion of the resulting hash as the AES encryption key.
There are security concerns with this convergence approach (the
"partial-information guessing attack"; please see ticket #365 for some
references), so Tahoe uses a separate (randomly-generated) "convergence
secret" for each node, stored in NODEDIR/private/convergence . The encoding
parameters (k, N, and the segment size) are included in the hash to make sure
that two different encodings of the same file will get different keys. This
method requires an extra I/O pass over the file to compute the key, and
encryption cannot begin until that pass is complete. This means that the
convergent-key method requires at least two total passes over the file.

The random-key method simply chooses a random encryption key. This disables
convergence, but it does not require a separate I/O pass, so the upload can
be done in a single pass. This mode makes it easier to perform streaming
uploads.

Regardless of which method is used to generate the key, the plaintext file is
encrypted (using AES in CTR mode) to produce a ciphertext. This ciphertext is
then erasure-coded and uploaded to the servers. Two hashes of the ciphertext
are generated as the encryption proceeds: a flat hash of the whole
ciphertext, and a Merkle tree. These are used to verify the correctness of
the erasure-decoding step, and can be used by a "verifier" process to make
sure the file is intact without requiring the decryption key.

The encryption key is hashed (with SHA-256d and a single-purpose tag) to
produce the "Storage Index". This Storage Index (or SI) is used to identify
the shares produced by the method described below. The grid can be thought of
as a large table that maps each Storage Index to a ciphertext. Since the
ciphertext is stored as erasure-coded shares, it can also be thought of as a
table that maps SI to shares.

Anybody who knows a Storage Index can retrieve the associated ciphertext:
ciphertexts are not secret.

[[Image(file-encoding1.png)]]

The ciphertext file is then broken up into segments. The last segment is
likely to be shorter than the rest. Each segment is erasure-coded into a
number of "subshares". This takes place one segment at a time. (In fact,
encryption and erasure-coding take place at the same time, once per plaintext
segment.) Larger segment sizes result in less overhead overall, but increase
both the memory footprint and the "alacrity" (the number of bytes we have to
receive before we can deliver validated plaintext to the user). The current
default segment size is 128KiB.
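
To make the two-pass structure concrete, here is a minimal sketch (Python,
not Tahoe's actual code) of the convergent-key derivation and the per-segment
encryption described above. The key-derivation tag string is hypothetical,
the key is assumed to be the first 16 bytes of the hash (AES-128), the CTR
counter is assumed to start at zero, and the "cryptography" package merely
stands in for an AES implementation; only the overall recipe comes from this
document.

  import hashlib
  from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

  SEGMENT_SIZE = 128 * 1024   # current default segment size

  def netstring(s: bytes) -> bytes:
      return b"%d:%s," % (len(s), s)

  def sha256d(data: bytes) -> bytes:
      return hashlib.sha256(hashlib.sha256(data).digest()).digest()

  def convergent_key(convergence_secret: bytes, k: int, n: int,
                     segment_size: int, plaintext: bytes) -> bytes:
      # first pass over the file: hash a single-purpose tag (hypothetical
      # name here), the encoding parameters, the per-node convergence
      # secret, and the file contents
      tag = netstring(b"hypothetical_convergent_key_tag_v1")
      params = netstring(b"%d,%d,%d" % (k, n, segment_size))
      h = sha256d(tag + params + netstring(convergence_secret) + plaintext)
      return h[:16]   # assumed AES-128 key

  def encrypt_and_hash(key: bytes, plaintext: bytes):
      # second pass: AES-CTR encryption, one 128KiB segment at a time, while
      # the flat SHA-256d hash of the whole ciphertext is accumulated
      encryptor = Cipher(algorithms.AES(key), modes.CTR(b"\x00" * 16)).encryptor()
      inner = hashlib.sha256()
      segments = []
      for start in range(0, len(plaintext), SEGMENT_SIZE):
          segment = encryptor.update(plaintext[start:start + SEGMENT_SIZE])
          inner.update(segment)
          segments.append(segment)   # each ciphertext segment would be
                                     # erasure-coded into N subshares here
      flat_hash = hashlib.sha256(inner.digest()).digest()   # SHA-256d
      return segments, flat_hash

In practice the ciphertext Merkle tree is built over these same segments, and
the erasure coding (Tahoe uses the zfec library) runs on each segment as it
is produced rather than after the whole file has been read into memory.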

One subshare from each segment is sent to each shareholder (aka leaseholder,
aka landlord, aka storage node, aka peer). The "share" held by each remote
shareholder is nominally just a collection of these subshares. The file will
be recoverable when a certain number of shares have been retrieved.

[[Image(file-encoding2.png)]]

The subshares are hashed as they are generated and transmitted. These
subshare hashes are put into a Merkle hash tree. When the last subshare has
been created, the Merkle tree is completed and delivered to the peer. Later,
when we retrieve these subshares, the peer will send many of the Merkle hash
tree nodes ahead of time, so we can validate each subshare independently.

The root of this subshare hash tree is called the "subshare root hash" and is
used in the next step.

[[Image(file-encoding3.png)]]

There is a higher-level Merkle tree called the "share hash tree". Its leaves
are the subshare root hashes from each share. The root of this tree is called
the "share root hash" and is included in the "URI Extension Block", aka UEB.
The ciphertext hash and ciphertext Merkle tree are also put here, along with
the original file size and the encoding parameters. The UEB contains all the
non-secret values that could have been put in the URI but would have made the
URI too big. So instead, the UEB is stored with the share, and the hash of
the UEB is put in the URI.

The URI then contains the secret encryption key and the UEB hash. It also
contains the basic encoding parameters (k and N) and the file size, to make
download more efficient (by knowing the number of required shares ahead of
time, sufficient download queries can be generated in parallel).

The URI (also known as the immutable-file read-cap, since possessing it
grants the holder the capability to read the file's plaintext) is then
represented as a (relatively) short printable string like so:

 URI:CHK:auxet66ynq55naiy2ay7cgrshm:6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000

[[Image(file-encoding4.png)]]

During download, when a peer begins to transmit a share, it first transmits
all of the parts of the share hash tree that are necessary to validate its
subshare root hash. Then it transmits the portions of the subshare hash tree
that are necessary to validate the first subshare. Then it transmits the
first subshare. It then continues this loop: transmitting any portions of the
subshare hash tree needed to validate subshare #N, then sending subshare #N.

[[Image(file-encoding5.png)]]

So the "share" that is sent to the remote peer actually consists of three
pieces, sent in a specific order as they become available, and retrieved
during download in a different order according to when they are needed.

The first piece is the subshares themselves, one per segment. The last
subshare will likely be shorter than the rest, because the last segment is
probably shorter than the rest. The second piece is the subshare hash tree,
consisting of a total of roughly two SHA-256d hashes per subshare. The third
piece is a hash chain from the share hash tree, consisting of
log2(numshares) hashes.

During upload, all subshares are sent first, followed by the subshare hash
tree, followed by the share hash chain. During download, the share hash chain
is delivered first, followed by the subshare root hash. The client then uses
the hash chain to validate the subshare root hash. Then the peer delivers
enough of the subshare hash tree to validate the first subshare, followed by
the first subshare itself. Those subshare hash tree nodes are used to
validate the subshare, which is then passed (along with the corresponding
subshares from several other peers) into the decoder to produce the first
segment of ciphertext. That segment is then decrypted to produce the first
segment of plaintext, which is finally delivered to the user.

[[Image(file-encoding6.png)]]
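
The subshare hash tree and the hash-chain validation described above can be
sketched as follows (a toy model, not Tahoe's wire format: the tag strings
are made up, and the number of subshares is assumed to be a power of two,
whereas real code pads the leaves out to the next power of two). The point is
that a downloader can check any single subshare against the root with only
log2(number of subshares) extra hashes.

  import hashlib

  def netstring(s: bytes) -> bytes:
      return b"%d:%s," % (len(s), s)

  def sha256d(data: bytes) -> bytes:
      return hashlib.sha256(hashlib.sha256(data).digest()).digest()

  def leaf_hash(subshare: bytes) -> bytes:
      # distinct (made-up) tags for leaves and internal nodes, following the
      # advice in the Hashes section below
      return sha256d(netstring(b"example_merkle_leaf_v1") + netstring(subshare))

  def pair_hash(left: bytes, right: bytes) -> bytes:
      return sha256d(netstring(b"example_merkle_pair_v1") +
                     netstring(left) + netstring(right))

  def build_tree(subshares):
      # returns the tree as a list of levels, leaves first, root last;
      # assumes a power-of-two number of subshares
      level = [leaf_hash(s) for s in subshares]
      levels = [level]
      while len(level) > 1:
          level = [pair_hash(level[i], level[i + 1])
                   for i in range(0, len(level), 2)]
          levels.append(level)
      return levels

  def chain_for(levels, index):
      # the sibling hashes a peer sends ahead of a subshare, so the subshare
      # can be validated independently of all the others
      chain = []
      for level in levels[:-1]:
          chain.append(level[index ^ 1])
          index //= 2
      return chain

  def validate(subshare, index, chain, root):
      # recompute the path from this leaf up to the root
      h = leaf_hash(subshare)
      for sibling in chain:
          h = pair_hash(h, sibling) if index % 2 == 0 else pair_hash(sibling, h)
          index //= 2
      return h == root

The share hash tree works the same way, with the subshare root hashes as its
leaves; the "share hash chain" sent at the start of a download is just the
chain_for() output for that share's leaf.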
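
As an aside, the CHK read-cap shown earlier is just a colon-separated string,
so unpacking its fields is straightforward. A tiny sketch (the field order is
taken from this document; decoding of the base32 key and UEB-hash fields is
omitted, since the exact alphabet is not described here):

  def parse_chk_uri(uri: str) -> dict:
      scheme, kind, key_b32, ueb_hash_b32, k, n, size = uri.split(":")
      assert scheme == "URI" and kind == "CHK"
      return {
          "key": key_b32,            # AES encryption key (base32)
          "ueb_hash": ueb_hash_b32,  # hash of the URI Extension Block (base32)
          "needed_shares": int(k),   # k
          "total_shares": int(n),    # N
          "size": int(size),         # original file size in bytes
      }

  parse_chk_uri("URI:CHK:auxet66ynq55naiy2ay7cgrshm:"
                "6rudoctmbxsmbg7gwtjlimd6umtwrrsxkjzthuldsmo4nnfoc6fa:3:10:1000000")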

== Hashes ==

All hashes use SHA-256d, as defined in Practical Cryptography (by Ferguson
and Schneier). All hashes use a single-purpose tag, e.g. the hash that
converts an encryption key into a storage index is defined as follows:

 SI = SHA256d(netstring("allmydata_immutable_key_to_storage_index_v1") + key)

When two separate values need to be combined in a hash, we wrap each in a
netstring.

Using SHA-256d (instead of plain SHA-256) guards against length-extension
attacks. Using the tag protects our Merkle trees against attacks in which the
hash of a leaf is confused with a hash of two children (allowing an attacker
to generate corrupted data that nevertheless appears to be valid), and is
simply good "cryptographic hygiene". The "Chosen Protocol Attack" by Kelsey,
Schneier, and Wagner (http://www.schneier.com/paper-chosen-protocol.html) is
relevant. Putting the tag in a netstring guards against attacks that seek to
confuse the end of the tag with the beginning of the subsequent value.
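
Written out with Python's hashlib, the storage-index formula above looks like
this; the only assumption beyond what this section states is the key length
(16 bytes, i.e. AES-128):

  import hashlib

  def netstring(s: bytes) -> bytes:
      return b"%d:%s," % (len(s), s)

  def sha256d(data: bytes) -> bytes:
      return hashlib.sha256(hashlib.sha256(data).digest()).digest()

  def storage_index(key: bytes) -> bytes:
      tag = netstring(b"allmydata_immutable_key_to_storage_index_v1")
      return sha256d(tag + key)

  si = storage_index(b"\x00" * 16)   # a dummy 16-byte AES key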