From: Daira Hopwood
Date: Thu, 5 Sep 2013 18:17:15 +0000 (+0100)
Subject: Sun Jun 24 00:13:38 BST 2012 david-sarah@jacaranda.org
X-Git-Url: https://git.rkrishnan.org/%5B/frontends/%22news.html/%22doc.html/reliability?a=commitdiff_plain;h=0d632529fcee2e49eede190c62571793a2f8c26a;p=tahoe-lafs%2Ftahoe-lafs.git

Sun Jun 24 00:13:38 BST 2012 david-sarah@jacaranda.org
* Update docs, notably performance.rst, to include MDMF. fixes #1772
---

diff --git a/docs/configuration.rst b/docs/configuration.rst
index 0c6ede0b..31962d47 100644
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -365,10 +365,13 @@ Client Configuration
 mutable-type parameter in the webapi. If you do not specify a value here,
 Tahoe-LAFS will use SDMF for all newly-created mutable files.

- Note that this parameter only applies to mutable files. Mutable
- directories, which are stored as mutable files, are not controlled by
- this parameter and will always use SDMF. We may revisit this decision in
- future versions of Tahoe-LAFS.
+ Note that this parameter applies only to files, not to directories.
+ Mutable directories, which are stored in mutable files, are not
+ controlled by this parameter and will always use SDMF. We may revisit
+ this decision in future versions of Tahoe-LAFS.
+
+ See ``_ for details about mutable
+ file formats.

 Frontend Configuration
 ======================
diff --git a/docs/performance.rst b/docs/performance.rst
index 1766d381..d9bd28ff 100644
--- a/docs/performance.rst
+++ b/docs/performance.rst
@@ -10,8 +10,8 @@ Performance costs for some common operations
 6. `Inserting/Removing B bytes in an A-byte mutable file`_
 7. `Adding an entry to an A-entry directory`_
 8. `Listing an A entry directory`_
-9. `Performing a file-check on an A-byte file`_
-10. `Performing a file-verify on an A-byte file`_
+9. `Checking an A-byte file`_
+10. `Verifying an A-byte file (immutable)`_
 11.
`Repairing an A-byte file (mutable or immutable)`_

 ``K`` indicates the number of shares required to reconstruct the file
@@ -23,7 +23,7 @@ Performance costs for some common operations

 ``A`` indicates the number of bytes in a file

-``B`` indicates the number of bytes of a file which are being read or
+``B`` indicates the number of bytes of a file that are being read or
 written

 ``G`` indicates the number of storage servers on your grid
@@ -179,8 +179,8 @@ directory be downloaded from the grid. So listing an A entry directory
 requires downloading a (roughly) 330 * A byte mutable file, since each
 directory entry is about 300-330 bytes in size.

-Performing a file-check on an ``A``-byte file
-=============================================
+Checking an ``A``-byte file
+===========================

 cpu: ~G

@@ -193,8 +193,8 @@ about.
 Note that neither of these values directly depend on the size of the file.
 This is relatively inexpensive, compared to the verify and repair operations.

-Performing a file-verify on an ``A``-byte file
-==============================================
+Verifying an A-byte file (immutable)
+====================================

 cpu: ~N/K*A

@@ -204,9 +204,24 @@ memory footprint: N/K*S

 notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
 shares that were originally uploaded to the grid and integrity checks
-them. This is (for well-behaved grids) more expensive than downloading
-an A-byte file, since only a fraction of these shares are necessary to
-recover the file.
+them. This is (for grids with good redundancy) more expensive than
+downloading an A-byte file, since only a fraction of these shares would
+be necessary to recover the file.
+
+Verifying an A-byte file (mutable)
+==================================
+
+cpu: ~N/K*A
+
+network: N/K*A
+
+memory footprint: N/K*A
+
+notes: To verify a file, Tahoe-LAFS downloads all of the ciphertext
+shares that were originally uploaded to the grid and integrity checks
+them.
This is (for grids with good redundancy) more expensive than
+downloading an A-byte file, since only a fraction of these shares would
+be necessary to recover the file.

 Repairing an ``A``-byte file (mutable or immutable)
 ===================================================
diff --git a/docs/specifications/mutable.rst b/docs/specifications/mutable.rst
index 298e3a9e..ea7ac49b 100644
--- a/docs/specifications/mutable.rst
+++ b/docs/specifications/mutable.rst
@@ -2,8 +2,6 @@
 Mutable Files
 =============

-This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.
-
 1. `Mutable Formats`_
 2. `Consistency vs. Availability`_
 3. `The Prime Coordination Directive: "Don't Do That"`_
@@ -19,33 +17,38 @@ This describes the "RSA-based mutable files" which were shipped in Tahoe v0.8.0.
 6. `Large Distributed Mutable Files`_
 7. `TODO`_

-Mutable File Slots are places with a stable identifier that can hold data
-that changes over time. In contrast to CHK slots, for which the
-URI/identifier is derived from the contents themselves, the Mutable File Slot
-URI remains fixed for the life of the slot, regardless of what data is placed
-inside it.
+Mutable files are places with a stable identifier that can hold data that
+changes over time. In contrast to immutable slots, for which the
+identifier/capability is derived from the contents themselves, the mutable
+file identifier remains fixed for the life of the slot, regardless of what
+data is placed inside it.

-Each mutable slot is referenced by two different URIs. The "read-write" URI
+Each mutable file is referenced by two different caps. The "read-write" cap
 grants read-write access to its holder, allowing them to put whatever
-contents they like into the slot. The "read-only" URI is less powerful, only
+contents they like into the slot. The "read-only" cap is less powerful, only
 granting read access, and not enabling modification of the data.
The
-read-write URI can be turned into the read-only URI, but not the other way
+read-write cap can be turned into the read-only cap, but not the other way
 around.

-The data in these slots is distributed over a number of servers, using the
-same erasure coding that CHK files use, with 3-of-10 being a typical choice
-of encoding parameters. The data is encrypted and signed in such a way that
-only the holders of the read-write URI will be able to set the contents of
-the slot, and only the holders of the read-only URI will be able to read
-those contents. Holders of either URI will be able to validate the contents
-as being written by someone with the read-write URI. The servers who hold the
-shares cannot read or modify them: the worst they can do is deny service (by
-deleting or corrupting the shares), or attempt a rollback attack (which can
-only succeed with the cooperation of at least k servers).
+The data in these files is distributed over a number of servers, using the
+same erasure coding that immutable files use, with 3-of-10 being a typical
+choice of encoding parameters. The data is encrypted and signed in such a way
+that only the holders of the read-write cap will be able to set the contents
+of the slot, and only the holders of the read-only cap will be able to read
+those contents. Holders of either cap will be able to validate the contents
+as being written by someone with the read-write cap. The servers who hold the
+shares are not automatically given the ability read or modify them: the worst
+they can do is deny service (by deleting or corrupting the shares), or
+attempt a rollback attack (which can only succeed with the cooperation of at
+least k servers).
+

 Mutable Formats
 ===============

+History
+-------
+
 When mutable files first shipped in Tahoe-0.8.0 (15-Feb-2008), the only
 version available was "SDMF", described below.
This was a
 limited-functionality placeholder, intended to be replaced with
@@ -75,8 +78,11 @@ SDMF a clean subset of MDMF, where any single-segment MDMF file could be
 handled by the old SDMF code). In the fall of 2011, Kevan's code was finally
 integrated, and first made available in the Tahoe-1.9.0 release.

-The main improvement of MDMF is the use of multiple segments: individual
-128KiB sections of the file can be retrieved or modified independently. The
+SDMF vs. MDMF
+-------------
+
+The improvement of MDMF is the use of multiple segments: individual 128-KiB
+sections of the file can be retrieved or modified independently. The
 improvement can be seen when fetching just a portion of the file (using a
 Range: header on the webapi), or when modifying a portion (again with a
 Range: header). It can also be seen indirectly when fetching the whole file:
@@ -84,12 +90,14 @@ the first segment of data should be delivered faster from a large MDMF file
 than from an SDMF file, although the overall download will then proceed at
 the same rate.

-We've decided to make it opt-in for the first release while we shake out the
-bugs, just in case a problem is found which requires an incompatible format
-change. All new mutable files will be in SDMF format unless the user
-specifically chooses to use MDMF instead. The code can read and modify
-existing files of either format without user intervention. We expect to make
-MDMF the default in a subsequent release, perhaps 2.0.
+We've decided to make it opt-in for now: mutable files default to
+SDMF format unless explicitly configured to use MDMF, either in ``tahoe.cfg``
+(see ``__) or in the WUI or CLI command that created a
+new mutable file.
+
+The code can read and modify existing files of either format without user
+intervention. We expect to make MDMF the default in a subsequent release,
+perhaps 2.0.

 Which format should you use?
SDMF works well for files up to a few MB, and
 can be handled by older versions (Tahoe-1.8.3 and earlier). If you do not
@@ -114,8 +122,9 @@ As we develop more sophisticated mutable slots, the API may expose multiple
 read versions to the application layer. The tahoe philosophy is to defer most
 consistency recovery logic to the higher layers. Some applications have
 effective ways to merge multiple versions, so inconsistency is not
-necessarily a problem (i.e. directory nodes can usually merge multiple "add
-child" operations).
+necessarily a problem (i.e. directory nodes can usually merge multiple
+"add child" operations).
+

 The Prime Coordination Directive: "Don't Do That"
 =================================================
@@ -697,38 +706,30 @@ Medium Distributed Mutable Files

 These are just like the SDMF case, but:

-* we actually take advantage of the Merkle hash tree over the blocks, by
+* We actually take advantage of the Merkle hash tree over the blocks, by
 reading a single segment of data at a time (and its necessary hashes), to
- reduce the read-time alacrity
-* we allow arbitrary writes to the file (i.e. seek() is provided, and
- O_TRUNC is no longer required)
-* we write more code on the client side (in the MutableFileNode class), to
- first read each segment that a write must modify. This looks exactly like
- the way a normal filesystem uses a block device, or how a CPU must perform
- a cache-line fill before modifying a single word.
-* we might implement some sort of copy-based atomic update server call,
+ reduce the read-time alacrity.
+* We allow arbitrary writes to any range of the file.
+* We add more code to first read each segment that a write must modify.
+ This looks exactly like the way a normal filesystem uses a block device,
+ or how a CPU must perform a cache-line fill before modifying a single word.
+* We might implement some sort of copy-based atomic update server call,
 to allow multiple writev() calls to appear atomic to any readers.
 MDMF slots provide fairly efficient in-place edits of very large files (a few
-GB). Appending data is also fairly efficient, although each time a power of 2
-boundary is crossed, the entire file must effectively be re-uploaded (because
-the size of the block hash tree changes), so if the filesize is known in
-advance, that space ought to be pre-allocated (by leaving extra space between
-the block hash tree and the actual data).
+GB). Appending data is also fairly efficient.

-MDMF1 uses the Merkle tree to enable low-alacrity random-access reads. MDMF2
-adds cache-line reads to allow random-access writes.

 Large Distributed Mutable Files
 ===============================

-LDMF slots use a fundamentally different way to store the file, inspired by
-Mercurial's "revlog" format. They enable very efficient insert/remove/replace
-editing of arbitrary spans. Multiple versions of the file can be retained, in
-a revision graph that can have multiple heads. Each revision can be
-referenced by a cryptographic identifier. There are two forms of the URI, one
-that means "most recent version", and a longer one that points to a specific
-revision.
+LDMF slots (not implemented) would use a fundamentally different way to store
+the file, inspired by Mercurial's "revlog" format. This would enable very
+efficient insert/remove/replace editing of arbitrary spans. Multiple versions
+of the file can be retained, in a revision graph that can have multiple heads.
+Each revision can be referenced by a cryptographic identifier. There are two
+forms of the URI, one that means "most recent version", and a longer one that
+points to a specific revision.

 Metadata can be attached to the revisions, like timestamps, to enable rolling
 back an entire tree to a specific point in history.
@@ -736,6 +737,7 @@ back an entire tree to a specific point in history.

 LDMF1 provides deltas but tries to avoid dealing with multiple heads. LDMF2
 provides explicit support for revision identifiers and branching.
+
 TODO
 ====