From: zooko
Date: Sun, 20 Jan 2008 18:39:38 +0000 (+0530)
Subject: docs: README.txt: reflow to 80 cols and strip trailing whitespace
X-Git-Url: https://git.rkrishnan.org/specifications/components/com_hotproperty/architecture.txt?a=commitdiff_plain;h=dda098a7b957774a085f4de7b92ef5caaa21c657;p=tahoe-lafs%2Fzfec.git

docs: README.txt: reflow to 80 cols and strip trailing whitespace

darcs-hash:a19d7ac2afbef62f914f36ff5c199a0892c0bd16
---

diff --git a/zfec/README.txt b/zfec/README.txt
index 9bddf88..9789ddd 100644
--- a/zfec/README.txt
+++ b/zfec/README.txt
@@ -1,7 +1,6 @@
 * Intro and Licence
 
-This package implements an "erasure code", or "forward error correction
-code".
+This package implements an "erasure code", or "forward error correction code".
 
 You may use this package under the GNU General Public License, version 2 or, at
 your option, any later version. You may use this package under the Transitive
@@ -11,12 +10,12 @@ for the terms of the GNU General Public License, version 2. See the file
 COPYING.TGPPL.html for the terms of the Transitive Grace Period Public Licence,
 version 1.0.
 
-The most widely known example of an erasure code is the RAID-5 algorithm
-which makes it so that in the event of the loss of any one hard drive, the
-stored data can be completely recovered. The algorithm in the zfec package
-has a similar effect, but instead of recovering from the loss of only a
-single element, it can be parameterized to choose in advance the number of
-elements whose loss it can tolerate.
+The most widely known example of an erasure code is the RAID-5 algorithm which
+makes it so that in the event of the loss of any one hard drive, the stored data
+can be completely recovered. The algorithm in the zfec package has a similar
+effect, but instead of recovering from the loss of only a single element, it can
+be parameterized to choose in advance the number of elements whose loss it can
+tolerate.
 
 This package is largely based on the old "fec" library by Luigi Rizzo et al.,
 which is a mature and optimized implementation of erasure coding. The zfec
@@ -28,17 +27,16 @@ addition of a command-line tool named "zfec".
 
 * Installation
 
-This package is managed with the "setuptools" package management tool. To
-build and install the package directly into your system, just run "python
-./setup.py install". If you prefer to keep the package limited to a specific
-directory so that you can manage it yourself (perhaps by using the "GNU
-stow") tool, then give it these arguments: "python ./setup.py install
+This package is managed with the "setuptools" package management tool. To build
+and install the package directly into your system, just run "python ./setup.py
+install". If you prefer to keep the package limited to a specific directory so
+that you can manage it yourself (perhaps by using the "GNU stow") tool, then
+give it these arguments: "python ./setup.py install
 --single-version-externally-managed
 --record=${specificdirectory}/zfec-install.log --prefix=${specificdirectory}"
 
-To run the self-tests, execute "python ./setup.py test" (or if you have
-Twisted Python installed, you can run "trial zfec" for nicer output and test
-options.)
+To run the self-tests, execute "python ./setup.py test" (or if you have Twisted
+Python installed, you can run "trial zfec" for nicer output and test options.)
 
 
 * Community
@@ -71,21 +69,21 @@ and k is required to be at least 1 and at most m.
 degenerates to the equivalent of the Unix "split" utility which simply splits
 the input into successive segments. Similarly, when k == 1 it degenerates to
 the equivalent of the unix "cp" utility -- each block is a complete copy of the
-input data. The "zfec" command-line tool does not implement these degenerate
+input data. The "zfec" command-line tool does not implement these degenerate
 cases.)
 
-Note that each "primary block" is a segment of the original data, so its size
-is 1/k'th of the size of original data, and each "secondary block" is of the
-same size, so the total space used by all the blocks is m/k times the size of
-the original data (plus some padding to fill out the last primary block to be
-the same size as all the others). In addition to the data contained in the
-blocks themselves there are also a few pieces of metadata which are necessary
-for later reconstruction. Those pieces are: 1. the value of K, 2. the value
-of M, 3. the sharenum of each block, 4. the number of bytes of padding
-that were used. The "zfec" command-line tool compresses these pieces of data
-and prepends them to the beginning of each share, so each the sharefile
-produced by the "zfec" command-line tool is between one and four bytes larger
-than the share data alone.
+Note that each "primary block" is a segment of the original data, so its size is
+1/k'th of the size of original data, and each "secondary block" is of the same
+size, so the total space used by all the blocks is m/k times the size of the
+original data (plus some padding to fill out the last primary block to be the
+same size as all the others). In addition to the data contained in the blocks
+themselves there are also a few pieces of metadata which are necessary for later
+reconstruction. Those pieces are: 1. the value of K, 2. the value of M, 3.
+the sharenum of each block, 4. the number of bytes of padding that were used.
+The "zfec" command-line tool compresses these pieces of data and prepends them
+to the beginning of each share, so each the sharefile produced by the "zfec"
+command-line tool is between one and four bytes larger than the share data
+alone.
 
 The decoding step requires as input k of the blocks which were produced by the
 encoding step. The decoding step produces as output the data that was earlier
@@ -94,38 +92,37 @@ input to the encoding step.
 
 * Command-Line Tool
 
-NOTE: the format of the sharefiles was changed in zfec v1.1 to allow K == 1
-and K == M. This change of the format of sharefiles means that zfec >= v1.1
-cannot read sharefiles produced by zfec < v1.1.
+NOTE: the format of the sharefiles was changed in zfec v1.1 to allow K == 1 and
+K == M. This change of the format of sharefiles means that zfec >= v1.1 cannot
+read sharefiles produced by zfec < v1.1.
 
-The bin/ directory contains two Unix-style, command-line tools "zfec" and
+The bin/ directory contains two Unix-style, command-line tools "zfec" and
 "zunfec". Execute "zfec --help" or "zunfec --help" for usage instructions.
 
-Note: a Unix-style tool like "zfec" does only one thing -- in this case
-erasure coding -- and leaves other tasks to other tools. Other Unix-style
-tools that go well with zfec include "GNU tar" for archiving multiple files
-and directories into one file, "rzip" or "lrzip" for compression, and "GNU
-Privacy Guard" for encryption or "sha256sum" for integrity. It is important
-to do things in order: first archive, then compress, then either encrypt or
-sha256sum, then erasure code. Note that if GNU Privacy Guard is used for
-privacy, then it will also ensure integrity, so the use of sha256sum is
-unnecessary in that case.
+Note: a Unix-style tool like "zfec" does only one thing -- in this case erasure
+coding -- and leaves other tasks to other tools. Other Unix-style tools that go
+well with zfec include "GNU tar" for archiving multiple files and directories
+into one file, "rzip" or "lrzip" for compression, and "GNU Privacy Guard" for
+encryption or "sha256sum" for integrity. It is important to do things in order:
+first archive, then compress, then either encrypt or sha256sum, then erasure
+code. Note that if GNU Privacy Guard is used for privacy, then it will also
+ensure integrity, so the use of sha256sum is unnecessary in that case.
 
 
 * Performance Measurements
 
 On my Athlon 64 2.4 GHz workstation (running Linux), the "zfec" command-line
 tool encoded a 160 MB file with m=100, k=94 (about 6% redundancy) in 3.9
-seconds, where the "par2" tool encoded the file with about 6% redundancy in
-27 seconds. zfec encoded the same file with m=12, k=6 (100% redundancy) in
-4.1 seconds, where par2 encoded it with about 100% redundancy in 7 minutes
-and 56 seconds.
+seconds, where the "par2" tool encoded the file with about 6% redundancy in 27
+seconds. zfec encoded the same file with m=12, k=6 (100% redundancy) in 4.1
+seconds, where par2 encoded it with about 100% redundancy in 7 minutes and 56
+seconds.
 
-The underlying C library in benchmark mode encoded from a file at about
-4.9 million bytes per second and decoded at about 5.8 million bytes per second.
+The underlying C library in benchmark mode encoded from a file at about 4.9
+million bytes per second and decoded at about 5.8 million bytes per second.
 
-On Peter's fancy Intel Mac laptop (2.16 GHz Core Duo), it encoded from a file
-at about 6.2 million bytes per second.
+On Peter's fancy Intel Mac laptop (2.16 GHz Core Duo), it encoded from a file at
+about 6.2 million bytes per second.
 
 On my even fancier Intel Mac laptop (2.33 GHz Core Duo), it encoded from a file
 at about 6.8 million bytes per second.
@@ -148,19 +145,19 @@ inclusive.)
 
 ** C API
 
-fec_encode() takes as input an array of k pointers, where each pointer points
-to a memory buffer containing the input data (i.e., the i'th buffer contains
-the i'th primary block). There is also a second parameter which is an array of
-the blocknums of the secondary blocks which are to be produced. (Each element
-in that array is required to be the blocknum of a secondary block, i.e. it is
+fec_encode() takes as input an array of k pointers, where each pointer points to
+a memory buffer containing the input data (i.e., the i'th buffer contains the
+i'th primary block). There is also a second parameter which is an array of the
+blocknums of the secondary blocks which are to be produced. (Each element in
+that array is required to be the blocknum of a secondary block, i.e. it is
 required to be >= k and < m.)
 
 The output from fec_encode() is the requested set of secondary blocks which are
 written into output buffers provided by the caller.
 
-fec_decode() takes as input an array of k pointers, where each pointer points
-to a buffer containing a block. There is also a separate input parameter which
-is an array of blocknums, indicating the blocknum of each of the blocks which is
+fec_decode() takes as input an array of k pointers, where each pointer points to
+a buffer containing a block. There is also a separate input parameter which is
+an array of blocknums, indicating the blocknum of each of the blocks which is
 being passed in.
 
 The output from fec_decode() is the set of primary blocks which were missing
@@ -205,9 +202,9 @@ objects (e.g. Python strings) to hold the data that you pass to zfec.
 
 * Utilities
 
-The filefec.py module has a utility function for efficiently reading a file
-and encoding it piece by piece. This module is used by the "zfec" and
-"zunfec" command-line tools from the bin/ directory.
+The filefec.py module has a utility function for efficiently reading a file and
+encoding it piece by piece. This module is used by the "zfec" and "zunfec"
+command-line tools from the bin/ directory.
 
 
 * Dependencies
@@ -221,15 +218,15 @@ v2.5.
 
 Thanks to the author of the original fec lib, Luigi Rizzo, and the folks that
 contributed to it: Phil Karn, Robert Morelos-Zaragoza, Hari Thirumoorthy, and
-Dan Rubenstein. Thanks to the Mnet hackers who wrote an earlier Python
-wrapper, especially Myers Carpenter and Hauke Johannknecht. Thanks to Brian
-Warner and Amber O'Whielacronx for help with the API, documentation,
-debugging, compression, and unit tests. Thanks to the creators of GCC
-(starting with Richard M. Stallman) and Valgrind (starting with Julian Seward)
-for a pair of excellent tools. Thanks to my coworkers at Allmydata --
-http://allmydata.com -- Fabrice Grinda, Peter Secor, Rob Kinninmont, Brian
-Warner, Zandr Milewski, Justin Boreta, Mark Meras for sponsoring this work and
-releasing it under a Free Software licence.
+Dan Rubenstein. Thanks to the Mnet hackers who wrote an earlier Python wrapper,
+especially Myers Carpenter and Hauke Johannknecht. Thanks to Brian Warner and
+Amber O'Whielacronx for help with the API, documentation, debugging,
+compression, and unit tests. Thanks to the creators of GCC (starting with
+Richard M. Stallman) and Valgrind (starting with Julian Seward) for a pair of
+excellent tools. Thanks to my coworkers at Allmydata -- http://allmydata.com --
+Fabrice Grinda, Peter Secor, Rob Kinninmont, Brian Warner, Zandr Milewski,
+Justin Boreta, Mark Meras for sponsoring this work and releasing it under a Free
+Software licence.
 
 Enjoy!
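
Reviewer note on the size arithmetic in the reflowed Overview paragraph: the
README text says each block is 1/k'th of the original data (with the last
primary block padded to match the others) and that all m blocks together take
m/k times the original size. A minimal sketch of that arithmetic follows. The
function name share_layout and its return keys are made up for illustration and
are not part of the zfec API; the one-to-four-byte metadata header that the
"zfec" tool prepends to each sharefile is deliberately left out.

```python
def share_layout(data_size: int, k: int, m: int) -> dict:
    """Block sizes implied by the README's Overview paragraph.

    Hypothetical helper for illustration only; not a zfec function.
    The per-sharefile metadata header is not included.
    """
    if not 1 <= k <= m:
        raise ValueError("k is required to be at least 1 and at most m")
    if data_size < 0:
        raise ValueError("data_size must be non-negative")

    # Each primary block holds 1/k'th of the data, rounded up so that
    # all k primary blocks have the same size.
    block_size = -(-data_size // k)  # integer ceiling division

    # Bytes added to fill out the last primary block.
    padding = block_size * k - data_size

    # Secondary blocks are the same size as primary blocks, so the
    # m blocks together take about m/k times the original size.
    total = block_size * m

    return {"block_size": block_size, "padding": padding, "total": total}


if __name__ == "__main__":
    layout = share_layout(1000, k=3, m=10)
    print(layout)
```

For a 1000-byte input with k=3 and m=10, this gives 334-byte blocks, 2 bytes of
padding, and 3340 bytes in total, which is m/k times the padded input size of
1002 bytes.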