docs/proposed/mutable-DSA.txt

   1
   2 (protocol proposal, work-in-progress, not authoritative)
   3
   4 (this document describes DSA-based mutable files, as opposed to the RSA-based
   5 mutable files that were introduced in tahoe-0.7.0 . This proposal has not yet
   6 been implemented. Please see mutable-DSA.svg for a quick picture of the
   7 crypto scheme described herein)
   8
   9 This file shows only the differences from RSA-based mutable files to
  10 (EC)DSA-based mutable files.  You have to read and understand
  11 docs/specifications/mutable.rst before reading this file (mutable-DSA.txt).
  12
  13 == new design criteria ==
  14
  15 * provide for variable number of semiprivate sections?
  16 * put e.g. filenames in one section, readcaps in another, writecaps in a third
  17   (ideally, to modify a filename you'd only have to modify one section, and
  18   we'd make encrypting/hashing more efficient by doing it on larger blocks of
  19   data, preferably one segment at a time instead of one writecap at a time)
  20 * cleanly distinguish between "container" (leases, write-enabler) and
  21   "slot contents" (everything that comes from share encoding)
  22 * sign all slot bits (to allow server-side verification)
  23 * someone reading the whole file should be able to read the share in a single
  24   linear pass with just a single seek to zero
  25 * writing the file should occur in two passes (3 seeks) in mostly linear order
  26   1: write version/pubkey/topbits/salt
  27   2: write zeros / seek+prefill where the hashchain/tree goes
  28   3: write blocks
  29   4: seek back
  30   5: write hashchain/tree
  31 * storage format: consider putting container bits in a separate file
  32   - $SI.index (contains list of shnums, leases, other-cabal-members, WE, etc)
  33   - $SI-$shnum.share (actual share data)
  34 * possible layout:
  35     - version
  36     - pubkey
  37     - topbits (k, N, size, segsize, etc)
  38     - salt? (salt tree root?)
  39     - share hash root
  40     - share hash chain
  41     - block hash tree
  42     - (salts?) (salt tree?)
  43     - blocks
  44     - signature (of [version .. share hash root])
  45
  46 === SDMF slots overview ===
  47
  48 Each SDMF slot is created with a DSA public/private key pair, using a
  49 system-wide common modulus and generator, in which the private key is a
  50 random 256 bit number, and the public key is a larger value (about 2048 bits)
  51 that can be derived with a bit of math from the private key. The public key
  52 is known as the "verification key", while the private key is called the
  53 "signature key".
  54
  55 The 256-bit signature key is used verbatim as the "write capability". This
  56 can be converted into the 2048ish-bit verification key through a fairly cheap
  57 set of modular exponentiation operations; this is done any time the holder of
  58 the write-cap wants to read the data. (Note that the signature key can either
  59 be a newly-generated random value, or the hash of something else, if we found
  60 a need for a capability that's stronger than the write-cap).
  61
  62 This results in a write-cap which is 256 bits long and can thus be expressed
  63 in an ASCII/transport-safe encoded form (base62 encoding, fits in 72
  64 characters, including a local-node http: convenience prefix).
  65
  66 The private key is hashed to form a 256-bit "salt". The public key is also
  67 hashed to form a 256-bit "pubkey hash". These two values are concatenated,
  68 hashed, and truncated to 192 bits to form the first 192 bits of the read-cap.
  69 The pubkey hash is hashed by itself and truncated to 64 bits to form the last
  70 64 bits of the read-cap. The full read-cap is 256 bits long, just like the
  71 write-cap.
  72
  73 The first 192 bits of the read-cap are hashed and truncated to form the first
  74 192 bits of the "traversal cap". The last 64 bits of the read-cap are hashed
  75 to form the last 64 bits of the traversal cap. This gives us a 256-bit
  76 traversal cap.
  77
  78 The first 192 bits of the traversal-cap are hashed and truncated to form the
  79 first 64 bits of the storage index. The last 64 bits of the traversal-cap are
  80 hashed to form the last 64 bits of the storage index. This gives us a 128-bit
  81 storage index.
  82
  83 The verification-cap is the first 64 bits of the storage index plus the
  84 pubkey hash, 320 bits total. The verification-cap doesn't need to be
  85 expressed in a printable transport-safe form, so it's ok that it's longer.
  86
  87 The read-cap is hashed one way to form an AES encryption key that is used to
  88 encrypt the salt; this key is called the "salt key". The encrypted salt is
  89 stored in the share. The private key never changes, therefore the salt never
  90 changes, and the salt key is only used for a single purpose, so there is no
  91 need for an IV.
  92
  93 The read-cap is hashed a different way to form the master data encryption
  94 key. A random "data salt" is generated each time the share's contents are
  95 replaced, and the master data encryption key is concatenated with the data
  96 salt, then hashed, to form the AES CTR-mode "read key" that will be used to
  97 encrypt the actual file data. This is to avoid key-reuse. An outstanding
  98 issue is how to avoid key reuse when files are modified in place instead of
  99 being replaced completely; this is not done in SDMF but might occur in MDMF.
 100
 101 The master data encryption key is used to encrypt data that should be visible
 102 to holders of a write-cap or a read-cap, but not to holders of a
 103 traversal-cap.
 104
 105 The private key is hashed one way to form the salt, and a different way to
 106 form the "write enabler master". For each storage server on which a share is
 107 kept, the write enabler master is concatenated with the server's nodeid and
 108 hashed, and the result is called the "write enabler" for that particular
 109 server. Note that multiple shares of the same slot stored on the same server
 110 will all get the same write enabler, i.e. the write enabler is associated
 111 with the "bucket", rather than the individual shares.
 112
 113 The private key is hashed a third way to form the "data write key", which can
 114 be used by applications which wish to store some data in a form that is only
 115 available to those with a write-cap, and not to those with merely a read-cap.
 116 This is used to implement transitive read-onlyness of dirnodes.
 117
 118 The traversal cap is hashed to work the "traversal key", which can be used by
 119 applications that wish to store data in a form that is available to holders
 120 of a write-cap, read-cap, or traversal-cap.
 121
 122 The idea is that dirnodes will store child write-caps under the writekey,
 123 child names and read-caps under the read-key, and verify-caps (for files) or
 124 deep-verify-caps (for directories) under the traversal key. This would give
 125 the holder of a root deep-verify-cap the ability to create a verify manifest
 126 for everything reachable from the root, but not the ability to see any
 127 plaintext or filenames. This would make it easier to delegate filechecking
 128 and repair to a not-fully-trusted agent.
 129
 130 The public key is stored on the servers, as is the encrypted salt, the
 131 (non-encrypted) data salt, the encrypted data, and a signature. The container
 132 records the write-enabler, but of course this is not visible to readers. To
 133 make sure that every byte of the share can be verified by a holder of the
 134 verify-cap (and also by the storage server itself), the signature covers the
 135 version number, the sequence number, the root hash "R" of the share merkle
 136 tree, the encoding parameters, and the encrypted salt. "R" itself covers the
 137 hash trees and the share data.
 138
 139 The read-write URI is just the private key. The read-only URI is the read-cap
 140 key. The deep-verify URI is the traversal-cap. The verify-only URI contains
 141 the the pubkey hash and the first 64 bits of the storage index.
 142
 143  FMW:b2a(privatekey)
 144  FMR:b2a(readcap)
 145  FMT:b2a(traversalcap)
 146  FMV:b2a(storageindex[:64])b2a(pubkey-hash)
 147
 148 Note that this allows the read-only, deep-verify, and verify-only URIs to be
 149 derived from the read-write URI without actually retrieving any data from the
 150 share, but instead by regenerating the public key from the private one. Users
 151 of the read-only, deep-verify, or verify-only caps must validate the public
 152 key against their pubkey hash (or its derivative) the first time they
 153 retrieve the pubkey, before trusting any signatures they see.
 154
 155 The SDMF slot is allocated by sending a request to the storage server with a
 156 desired size, the storage index, and the write enabler for that server's
 157 nodeid. If granted, the write enabler is stashed inside the slot's backing
 158 store file. All further write requests must be accompanied by the write
 159 enabler or they will not be honored. The storage server does not share the
 160 write enabler with anyone else.
 161
 162 The SDMF slot structure will be described in more detail below. The important
 163 pieces are:
 164
 165   * a sequence number
 166   * a root hash "R"
 167   * the data salt
 168   * the encoding parameters (including k, N, file size, segment size)
 169   * a signed copy of [seqnum,R,data_salt,encoding_params] (using signature key)
 170   * the verification key (not encrypted)
 171   * the share hash chain (part of a Merkle tree over the share hashes)
 172   * the block hash tree (Merkle tree over blocks of share data)
 173   * the share data itself (erasure-coding of read-key-encrypted file data)
 174   * the salt, encrypted with the salt key
 175
 176 The access pattern for read (assuming we hold the write-cap) is:
 177  * generate public key from the private one
 178  * hash private key to get the salt, hash public key, form read-cap
 179  * form storage-index
 180  * use storage-index to locate 'k' shares with identical 'R' values
 181    * either get one share, read 'k' from it, then read k-1 shares
 182    * or read, say, 5 shares, discover k, either get more or be finished
 183    * or copy k into the URIs
 184  * .. jump to "COMMON READ", below
 185
 186 To read (assuming we only hold the read-cap), do:
 187  * hash read-cap pieces to generate storage index and salt key
 188  * use storage-index to locate 'k' shares with identical 'R' values
 189  * retrieve verification key and encrypted salt
 190  * decrypt salt
 191  * hash decrypted salt and pubkey to generate another copy of the read-cap,
 192    make sure they match (this validates the pubkey)
 193  * .. jump to "COMMON READ"
 194
 195  * COMMON READ:
 196  * read seqnum, R, data salt, encoding parameters, signature
 197  * verify signature against verification key
 198  * hash data salt and read-cap to generate read-key
 199  * read share data, compute block-hash Merkle tree and root "r"
 200  * read share hash chain (leading from "r" to "R")
 201  * validate share hash chain up to the root "R"
 202  * submit share data to erasure decoding
 203  * decrypt decoded data with read-key
 204  * submit plaintext to application
 205
 206 The access pattern for write is:
 207  * generate pubkey, salt, read-cap, storage-index as in read case
 208  * generate data salt for this update, generate read-key
 209  * encrypt plaintext from application with read-key
 210    * application can encrypt some data with the data-write-key to make it
 211      only available to writers (used for transitively-readonly dirnodes)
 212  * erasure-code crypttext to form shares
 213  * split shares into blocks
 214  * compute Merkle tree of blocks, giving root "r" for each share
 215  * compute Merkle tree of shares, find root "R" for the file as a whole
 216  * create share data structures, one per server:
 217    * use seqnum which is one higher than the old version
 218    * share hash chain has log(N) hashes, different for each server
 219    * signed data is the same for each server
 220    * include pubkey, encrypted salt, data salt
 221  * now we have N shares and need homes for them
 222  * walk through peers
 223    * if share is not already present, allocate-and-set
 224    * otherwise, try to modify existing share:
 225    * send testv_and_writev operation to each one
 226    * testv says to accept share if their(seqnum+R) <= our(seqnum+R)
 227    * count how many servers wind up with which versions (histogram over R)
 228    * keep going until N servers have the same version, or we run out of servers
 229      * if any servers wound up with a different version, report error to
 230        application
 231      * if we ran out of servers, initiate recovery process (described below)
 232
 233 ==== Cryptographic Properties ====
 234
 235 This scheme protects the data's confidentiality with 192 bits of key
 236 material, since the read-cap contains 192 secret bits (derived from an
 237 encrypted salt, which is encrypted using those same 192 bits plus some
 238 additional public material).
 239
 240 The integrity of the data (assuming that the signature is valid) is protected
 241 by the 256-bit hash which gets included in the signature. The privilege of
 242 modifying the data (equivalent to the ability to form a valid signature) is
 243 protected by a 256 bit random DSA private key, and the difficulty of
 244 computing a discrete logarithm in a 2048-bit field.
 245
 246 There are a few weaker denial-of-service attacks possible. If N-k+1 of the
 247 shares are damaged or unavailable, the client will be unable to recover the
 248 file. Any coalition of more than N-k shareholders will be able to effect this
 249 attack by merely refusing to provide the desired share. The "write enabler"
 250 shared secret protects existing shares from being displaced by new ones,
 251 except by the holder of the write-cap. One server cannot affect the other
 252 shares of the same file, once those other shares are in place.
 253
 254 The worst DoS attack is the "roadblock attack", which must be made before
 255 those shares get placed. Storage indexes are effectively random (being
 256 derived from the hash of a random value), so they are not guessable before
 257 the writer begins their upload, but there is a window of vulnerability during
 258 the beginning of the upload, when some servers have heard about the storage
 259 index but not all of them.
 260
 261 The roadblock attack we want to prevent is when the first server that the
 262 uploader contacts quickly runs to all the other selected servers and places a
 263 bogus share under the same storage index, before the uploader can contact
 264 them. These shares will normally be accepted, since storage servers create
 265 new shares on demand. The bogus shares would have randomly-generated
 266 write-enablers, which will of course be different than the real uploader's
 267 write-enabler, since the malicious server does not know the write-cap.
 268
 269 If this attack were successful, the uploader would be unable to place any of
 270 their shares, because the slots have already been filled by the bogus shares.
 271 The uploader would probably try for peers further and further away from the
 272 desired location, but eventually they will hit a preconfigured distance limit
 273 and give up. In addition, the further the writer searches, the less likely it
 274 is that a reader will search as far. So a successful attack will either cause
 275 the file to be uploaded but not be reachable, or it will cause the upload to
 276 fail.
 277
 278 If the uploader tries again (creating a new privkey), they may get lucky and
 279 the malicious servers will appear later in the query list, giving sufficient
 280 honest servers a chance to see their share before the malicious one manages
 281 to place bogus ones.
 282
 283 The first line of defense against this attack is the timing challenges: the
 284 attacking server must be ready to act the moment a storage request arrives
 285 (which will only occur for a certain percentage of all new-file uploads), and
 286 only has a few seconds to act before the other servers will have allocated
 287 the shares (and recorded the write-enabler, terminating the window of
 288 vulnerability).
 289
 290 The second line of defense is post-verification, and is possible because the
 291 storage index is partially derived from the public key hash. A storage server
 292 can, at any time, verify every public bit of the container as being signed by
 293 the verification key (this operation is recommended as a continual background
 294 process, when disk usage is minimal, to detect disk errors). The server can
 295 also hash the verification key to derive 64 bits of the storage index. If it
 296 detects that these 64 bits do not match (but the rest of the share validates
 297 correctly), then the implication is that this share was stored to the wrong
 298 storage index, either due to a bug or a roadblock attack.
 299
 300 If an uploader finds that they are unable to place their shares because of
 301 "bad write enabler errors" (as reported by the prospective storage servers),
 302 it can "cry foul", and ask the storage server to perform this verification on
 303 the share in question. If the pubkey and storage index do not match, the
 304 storage server can delete the bogus share, thus allowing the real uploader to
 305 place their share. Of course the origin of the offending bogus share should
 306 be logged and reported to a central authority, so corrective measures can be
 307 taken. It may be necessary to have this "cry foul" protocol include the new
 308 write-enabler, to close the window during which the malicious server can
 309 re-submit the bogus share during the adjudication process.
 310
 311 If the problem persists, the servers can be placed into pre-verification
 312 mode, in which this verification is performed on all potential shares before
 313 being committed to disk. This mode is more CPU-intensive (since normally the
 314 storage server ignores the contents of the container altogether), but would
 315 solve the problem completely.
 316
 317 The mere existence of these potential defenses should be sufficient to deter
 318 any actual attacks. Note that the storage index only has 64 bits of
 319 pubkey-derived data in it, which is below the usual crypto guidelines for
 320 security factors. In this case it's a pre-image attack which would be needed,
 321 rather than a collision, and the actual attack would be to find a keypair for
 322 which the public key can be hashed three times to produce the desired portion
 323 of the storage index. We believe that 64 bits of material is sufficiently
 324 resistant to this form of pre-image attack to serve as a suitable deterrent.
 325
 326 === SMDF Slot Format ===
 327
 328 This SMDF data lives inside a server-side MutableSlot container. The server
 329 is generally oblivious to this format, but it may look inside the container
 330 when verification is desired.
 331
 332 This data is tightly packed. There are no gaps left between the different
 333 fields, and the offset table is mainly present to allow future flexibility of
 334 key sizes.
 335
 336  #   offset   size    name
 337  1    0        1       version byte, \x01 for this format
 338  2    1        8       sequence number. 2^64-1 must be handled specially, TBD
 339  3    9        32      "R" (root of share hash Merkle tree)
 340  4    41       32      data salt (readkey is H(readcap+data_salt))
 341  5    73       32      encrypted salt (AESenc(key=H(readcap), salt)
 342  6    105       18      encoding parameters:
 343        105      1        k
 344        106      1        N
 345        107      8        segment size
 346        115      8        data length (of original plaintext)
 347  7    123      36      offset table:
 348        127      4        (9) signature
 349        131      4        (10) share hash chain
 350        135      4        (11) block hash tree
 351        139      4        (12) share data
 352        143      8        (13) EOF
 353  8    151      256     verification key (2048bit DSA key)
 354  9    407      40      signature=DSAsig(H([1,2,3,4,5,6]))
 355 10    447      (a)     share hash chain, encoded as:
 356                         "".join([pack(">H32s", shnum, hash)
 357                                  for (shnum,hash) in needed_hashes])
 358 11    ??       (b)     block hash tree, encoded as:
 359                         "".join([pack(">32s",hash) for hash in block_hash_tree])
 360 12    ??       LEN     share data
 361 13    ??       --      EOF
 362
 363 (a) The share hash chain contains ceil(log(N)) hashes, each 32 bytes long.
 364     This is the set of hashes necessary to validate this share's leaf in the
 365     share Merkle tree. For N=10, this is 4 hashes, i.e. 128 bytes.
 366 (b) The block hash tree contains ceil(length/segsize) hashes, each 32 bytes
 367     long. This is the set of hashes necessary to validate any given block of
 368     share data up to the per-share root "r". Each "r" is a leaf of the share
 369     has tree (with root "R"), from which a minimal subset of hashes is put in
 370     the share hash chain in (8).
 371
 372 == TODO ==
 373
 374 Every node in a given tahoe grid must have the same common DSA moduli and
 375 exponent, but different grids could use different parameters. We haven't
 376 figured out how to define a "grid id" yet, but I think the DSA parameters
 377 should be part of that identifier. In practical terms, this might mean that
 378 the Introducer tells each node what parameters to use, or perhaps the node
 379 could have a config file which specifies them instead.
 380
 381 The shares MUST have a ciphertext hash of some sort (probably a merkle tree
 382 over the blocks, and/or a flat hash of the ciphertext), just like immutable
 383 files do. Without this, a malicious publisher could produce some shares that
 384 result in file A, and other shares that result in file B, and upload both of
 385 them (incorporating both into the share hash tree). The result would be a
 386 read-cap that would sometimes resolve to file A, and sometimes to file B,
 387 depending upon which servers were used for the download. By including a
 388 ciphertext hash in the SDMF data structure, the publisher must commit to just
 389 a single ciphertext, closing this hole. See ticket #492 for more details.
 390