* Intro and Licence
This package implements an "erasure code", or "forward error correction code".
You may use this package under the GNU General Public License, version 2 or, at
your option, any later version. You may also use this package under the
Transitive Grace Period Public Licence, version 1.0; see the file
COPYING.TGPPL.html for the terms of the Transitive Grace Period Public Licence,
version 1.0.
The most widely known example of an erasure code is the RAID-5 algorithm which
makes it so that in the event of the loss of any one hard drive, the stored data
can be completely recovered. The algorithm in the zfec package has a similar
effect, but instead of recovering from the loss of only a single element, it can
be parameterized to choose in advance the number of elements whose loss it can
tolerate.
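For example, here is a minimal sketch of that round trip using zfec's Python
API; the parameters k=3, m=5 and the sample data are purely illustrative:

    import zfec

    k, m = 3, 5   # any 3 of the 5 blocks suffice; loss of m-k = 2 is tolerated
    encoder = zfec.Encoder(k, m)

    # The k input buffers are required to all be the same length.
    segments = [b"one.", b"two.", b"3333"]
    blocks = encoder.encode(segments)  # all m blocks: k primary, m-k secondary

    # Discard any two blocks; any k survivors reconstruct the segments.
    decoder = zfec.Decoder(k, m)
    recovered = decoder.decode([blocks[1], blocks[3], blocks[4]], [1, 3, 4])
    assert [bytes(b) for b in recovered] == segments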
This package is largely based on the old "fec" library by Luigi Rizzo et al.,
which is a mature and optimized implementation of erasure coding. The zfec
package makes several changes from the original "fec" package, including the
addition of the Python API, refactoring of the C API to support zero-copy
operation, a few clean-ups and micro-optimizations of the core code itself, and
the addition of a command-line tool named "zfec".
* Installation
This package is managed with the "setuptools" package management tool. To build
and install the package directly into your system, just run "python ./setup.py
install". If you prefer to keep the package limited to a specific directory so
that you can manage it yourself (perhaps by using the "GNU stow" tool), then
give it these arguments: "python ./setup.py install
--single-version-externally-managed
--record=${specificdirectory}/zfec-install.log --prefix=${specificdirectory}"
To run the self-tests, execute "python ./setup.py test" (or if you have Twisted
Python installed, you can run "trial zfec" for nicer output and test options).
* Community
(Note that when k == m then there is no point in doing erasure coding -- it
degenerates to the equivalent of the Unix "split" utility which simply splits
the input into successive segments. Similarly, when k == 1 it degenerates to
the equivalent of the Unix "cp" utility -- each block is a complete copy of the
input data. The "zfec" command-line tool does not implement these degenerate
cases.)
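As a quick illustration of the k == m case via the Python API (a sketch with
illustrative parameters):

    import zfec

    segments = [b"ab", b"cd", b"ef"]

    # With k == m == 3 there are no secondary blocks, so encoding simply
    # hands back the three primary segments -- the "split" behaviour.
    blocks = zfec.Encoder(3, 3).encode(segments)
    assert [bytes(b) for b in blocks] == segments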
Note that each "primary block" is a segment of the original data, so its size is
1/k'th of the size of the original data, and each "secondary block" is of the
same size, so the total space used by all the blocks is m/k times the size of
the original data (plus some padding to fill out the last primary block to be
the same size as all the others). In addition to the data contained in the
blocks themselves there are also a few pieces of metadata which are necessary
for later reconstruction. Those pieces are: 1. the value of K, 2. the value of
M, 3. the sharenum of each block, and 4. the number of bytes of padding that
were used. The "zfec" command-line tool compresses these pieces of data and
prepends them to the beginning of each share, so each sharefile produced by the
"zfec" command-line tool is between one and four bytes larger than the share
data alone.
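As a worked example of those sizes (the numbers are illustrative and computed
by hand, not by zfec):

    # 10 bytes of data with k=3, m=5: pad up to a multiple of k.
    k, m, data_len = 3, 5, 10
    block_size = -(-data_len // k)       # ceil(10/3) = 4 bytes per block
    padding = block_size * k - data_len  # 2 bytes of padding
    total = block_size * m               # 20 bytes: m/k = 5/3 of padded data
    assert (padding, total) == (2, 20)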
The decoding step requires as input k of the blocks which were produced by the
encoding step. The decoding step produces as output the data that was earlier
input to the encoding step.
* Command-Line Tool
NOTE: the format of the sharefiles was changed in zfec v1.1 to allow K == 1 and
K == M. This change of the format of sharefiles means that zfec >= v1.1 cannot
read sharefiles produced by zfec < v1.1.
The bin/ directory contains two Unix-style command-line tools, "zfec" and
"zunfec". Execute "zfec --help" or "zunfec --help" for usage instructions.
Note: a Unix-style tool like "zfec" does only one thing -- in this case erasure
coding -- and leaves other tasks to other tools. Other Unix-style tools that go
well with zfec include "GNU tar" for archiving multiple files and directories
into one file, "rzip" or "lrzip" for compression, and "GNU Privacy Guard" for
encryption or "sha256sum" for integrity. It is important to do things in order:
first archive, then compress, then either encrypt or sha256sum, then erasure
code. Note that if GNU Privacy Guard is used for privacy, then it will also
ensure integrity, so the use of sha256sum is unnecessary in that case.
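Here is a sketch of that ordering in Python, substituting the standard
library's gzip for rzip/lrzip and hashlib for sha256sum; the directory name,
the k=3/m=5 parameters, and the manual split-and-pad step are all illustrative:

    import gzip, hashlib, io, tarfile
    import zfec

    # 1. Archive (a hypothetical directory named "mydata").
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add("mydata")

    # 2. Compress.
    compressed = gzip.compress(buf.getvalue())

    # 3. Integrity: record a SHA-256 digest of the compressed stream.
    digest = hashlib.sha256(compressed).hexdigest()

    # 4. Erasure code: pad to a multiple of k, split into k segments, encode.
    k, m = 3, 5
    padded = compressed + b"\x00" * (-len(compressed) % k)
    seg = len(padded) // k
    blocks = zfec.Encoder(k, m).encode(
        [padded[i * seg:(i + 1) * seg] for i in range(k)])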
* Performance Measurements
On my Athlon 64 2.4 GHz workstation (running Linux), the "zfec" command-line
tool encoded a 160 MB file with m=100, k=94 (about 6% redundancy) in 3.9
seconds, where the "par2" tool encoded the file with about 6% redundancy in 27
seconds. zfec encoded the same file with m=12, k=6 (100% redundancy) in 4.1
seconds, where par2 encoded it with about 100% redundancy in 7 minutes and 56
seconds.
The underlying C library in benchmark mode encoded from a file at about 4.9
million bytes per second and decoded at about 5.8 million bytes per second.
On Peter's fancy Intel Mac laptop (2.16 GHz Core Duo), it encoded from a file at
about 6.2 million bytes per second.
On my even fancier Intel Mac laptop (2.33 GHz Core Duo), it encoded from a file
at about 6.8 million bytes per second.
** C API
fec_encode() takes as input an array of k pointers, where each pointer points to
a memory buffer containing the input data (i.e., the i'th buffer contains the
i'th primary block). There is also a second parameter which is an array of the
blocknums of the secondary blocks which are to be produced. (Each element in
that array is required to be the blocknum of a secondary block, i.e. it is
required to be >= k and < m.)
The output from fec_encode() is the requested set of secondary blocks which are
written into output buffers provided by the caller.
fec_decode() takes as input an array of k pointers, where each pointer points to
a buffer containing a block. There is also a separate input parameter which is
an array of blocknums, indicating the blocknum of each of the blocks which is
being passed in.
The output from fec_decode() is the set of primary blocks which were missing
from the input and had to be reconstructed.
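The Python binding exposes the same semantics, which makes for a compact
illustration of the blocknum rules. A sketch with illustrative parameters;
note that, unlike fec_decode(), the Python Decoder returns all k primary
segments in order:

    import zfec

    k, m = 3, 5
    enc = zfec.Encoder(k, m)
    primary = [b"aaaa", b"bbbb", b"cccc"]

    # Request only secondary blocks; each requested blocknum must be >= k
    # and < m, exactly as fec_encode() requires.
    secondary = enc.encode(primary, [3, 4])

    # Feed back any k blocks with their blocknums, mixing primary and
    # secondary, to recover the original k segments.
    dec = zfec.Decoder(k, m)
    recovered = dec.decode([primary[0], secondary[0], secondary[1]], [0, 3, 4])
    assert [bytes(b) for b in recovered] == primary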
* Utilities
The filefec.py module has a utility function for efficiently reading a file and
encoding it piece by piece. This module is used by the "zfec" and "zunfec"
command-line tools from the bin/ directory.
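For instance, a sketch of driving it directly from Python; the argument order
of encode_to_files() below is an assumption based on one version of the
module, so check it against the filefec.py you have installed:

    import os
    from zfec import filefec

    # Split "myfile" into m=5 share files in ./shares, any k=3 of which
    # suffice to reconstruct it (names and parameters are illustrative).
    os.makedirs("shares", exist_ok=True)
    fsize = os.path.getsize("myfile")
    with open("myfile", "rb") as f:
        filefec.encode_to_files(f, fsize, "shares", "myfile", 3, 5)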
* Dependencies
Thanks to the author of the original fec lib, Luigi Rizzo, and the folks that
contributed to it: Phil Karn, Robert Morelos-Zaragoza, Hari Thirumoorthy, and
Dan Rubenstein. Thanks to the Mnet hackers who wrote an earlier Python wrapper,
especially Myers Carpenter and Hauke Johannknecht. Thanks to Brian Warner and
Amber O'Whielacronx for help with the API, documentation, debugging,
compression, and unit tests. Thanks to the creators of GCC (starting with
Richard M. Stallman) and Valgrind (starting with Julian Seward) for a pair of
excellent tools. Thanks to my coworkers at Allmydata -- http://allmydata.com --
Fabrice Grinda, Peter Secor, Rob Kinninmont, Brian Warner, Zandr Milewski,
Justin Boreta, Mark Meras for sponsoring this work and releasing it under a Free
Software licence.
Enjoy!