zfec/README.txt

   1  * Intro and Licence
   2
   3 This package implements an "erasure code", or "forward error correction code".
   4
   5 You may use this package under the GNU General Public License, version 2 or, at
   6 your option, any later version.  You may use this package under the Transitive
   7 Grace Period Public Licence, version 1.0.  (You may choose to use this package
   8 under the terms of either licence, at your option.)  See the file COPYING.GPL
   9 for the terms of the GNU General Public License, version 2.  See the file
  10 COPYING.TGPPL.html for the terms of the Transitive Grace Period Public Licence,
  11 version 1.0.  In addition, Allmydata, Inc. offers other licensing terms.  If you
  12 would like to inquire about a commercial relationship with Allmydata, Inc.,
  13 please contact partnerships@allmydata.com and visit http://allmydata.com .
  14
  15 The most widely known example of an erasure code is the RAID-5 algorithm which
  16 makes it so that in the event of the loss of any one hard drive, the stored data
  17 can be completely recovered.  The algorithm in the zfec package has a similar
  18 effect, but instead of recovering from the loss of only a single element, it can
  19 be parameterized to choose in advance the number of elements whose loss it can
  20 tolerate.
  21
  22 This package is largely based on the old "fec" library by Luigi Rizzo et al.,
  23 which is a mature and optimized implementation of erasure coding.  The zfec
  24 package makes several changes from the original "fec" package, including
  25 addition of the Python API, refactoring of the C API to support zero-copy
  26 operation, a few clean-ups and optimizations of the core code itself, and the
  27 addition of a command-line tool named "zfec".
  28
  29
  30  * Installation
  31
  32 This package is managed with the "setuptools" package management tool.  To build
  33 and install the package directly into your system, just run "python ./setup.py
  34 install".  If you prefer to keep the package limited to a specific directory so
  35 that you can manage it yourself (perhaps by using the "GNU stow") tool, then
  36 give it these arguments: "python ./setup.py install
  37 --single-version-externally-managed
  38 --record=${specificdirectory}/zfec-install.log --prefix=${specificdirectory}"
  39
  40 To run the self-tests, execute "python ./setup.py test" (or if you have Twisted
  41 Python installed, you can run "trial zfec" for nicer output and test options.)
  42 This will run the tests of the C API, the Python API, and the command-line
  43 tools.
  44
  45 To run the tests of the Haskell API:
  46   % runhaskell haskell/test/FECTest.hs
  47
  48 Note that in order to run the Haskell API tests you must have installed the
  49 library first due to the fact that the interpreter cannot process FEC.hs as it
  50 takes a reference to an FFI function.
  51
  52
  53  * Community
  54
  55 The source is currently available via darcs on the web with the command:
  56
  57 darcs get http://allmydata.org/source/zfec
  58
  59 More information on darcs is available at http://darcs.net
  60
  61 Please join the zfec mailing list and submit patches:
  62
  63 <http://allmydata.org/cgi-bin/mailman/listinfo/zfec-dev>
  64
  65
  66  * Overview
  67
  68 This package performs two operations, encoding and decoding.  Encoding takes
  69 some input data and expands its size by producing extra "check blocks", also
  70 called "secondary blocks".  Decoding takes some data -- any combination of
  71 blocks of the original data (called "primary blocks") and "secondary blocks",
  72 and produces the original data.
  73
  74 The encoding is parameterized by two integers, k and m.  m is the total number
  75 of blocks produced, and k is how many of those blocks are necessary to
  76 reconstruct the original data.  m is required to be at least 1 and at most 256,
  77 and k is required to be at least 1 and at most m.
  78
  79 (Note that when k == m then there is no point in doing erasure coding -- it
  80 degenerates to the equivalent of the Unix "split" utility which simply splits
  81 the input into successive segments.  Similarly, when k == 1 it degenerates to
  82 the equivalent of the unix "cp" utility -- each block is a complete copy of the
  83 input data.)
  84
  85 Note that each "primary block" is a segment of the original data, so its size is
  86 1/k'th of the size of original data, and each "secondary block" is of the same
  87 size, so the total space used by all the blocks is m/k times the size of the
  88 original data (plus some padding to fill out the last primary block to be the
  89 same size as all the others).  In addition to the data contained in the blocks
  90 themselves there are also a few pieces of metadata which are necessary for later
  91 reconstruction.  Those pieces are: 1.  the value of K, 2.  the value of M, 3.
  92 the sharenum of each block, 4.  the number of bytes of padding that were used.
  93 The "zfec" command-line tool compresses these pieces of data and prepends them
  94 to the beginning of each share, so each the sharefile produced by the "zfec"
  95 command-line tool is between one and four bytes larger than the share data
  96 alone.
  97
  98 The decoding step requires as input k of the blocks which were produced by the
  99 encoding step.  The decoding step produces as output the data that was earlier
 100 input to the encoding step.
 101
 102
 103  * Command-Line Tool
 104
 105 The bin/ directory contains two Unix-style, command-line tools "zfec" and
 106 "zunfec".  Execute "zfec --help" or "zunfec --help" for usage instructions.
 107
 108 Note: a Unix-style tool like "zfec" does only one thing -- in this case
 109 erasure coding -- and leaves other tasks to other tools.  Other Unix-style
 110 tools that go well with zfec include "GNU tar" for archiving multiple files
 111 and directories into one file, "rzip" or "lrzip" for compression, and "GNU
 112 Privacy Guard" for encryption or "sha256sum" for integrity.  It is important
 113 to do things in order: first archive, then compress, then either encrypt or
 114 sha256sum, then erasure code.  Note that if GNU Privacy Guard is used for
 115 privacy, then it will also ensure integrity, so the use of sha256sum is
 116 unnecessary in that case.
 117
 118
 119  * Performance Measurements
 120
 121 On my Athlon 64 2.4 GHz workstation (running Linux), the "zfec" command-line
 122 tool encoded a 160 MB file with m=100, k=94 (about 6% redundancy) in 3.9
 123 seconds, where the "par2" tool encoded the file with about 6% redundancy in 27
 124 seconds.  zfec encoded the same file with m=12, k=6 (100% redundancy) in 4.1
 125 seconds, where par2 encoded it with about 100% redundancy in 7 minutes and 56
 126 seconds.
 127
 128 The underlying C library in benchmark mode encoded from a file at about 4.9
 129 million bytes per second and decoded at about 5.8 million bytes per second.
 130
 131 On Peter's fancy Intel Mac laptop (2.16 GHz Core Duo), it encoded from a file at
 132 about 6.2 million bytes per second.
 133
 134 On my even fancier Intel Mac laptop (2.33 GHz Core Duo), it encoded from a file
 135 at about 6.8 million bytes per second.
 136
 137 On my old PowerPC G4 867 MHz Mac laptop, it encoded from a file at about 1.3
 138 million bytes per second.
 139
 140
 141  * API
 142
 143 Each block is associated with "blocknum".  The blocknum of each primary block is
 144 its index (starting from zero), so the 0'th block is the first primary block,
 145 which is the first few bytes of the file, the 1'st block is the next primary
 146 block, which is the next few bytes of the file, and so on.  The last primary
 147 block has blocknum k-1.  The blocknum of each secondary block is an arbitrary
 148 integer between k and 255 inclusive.  (When using the Python API, if you don't
 149 specify which secondary blocks you want when invoking encode(), then it will by
 150 default provide the blocks with ids from k to m-1 inclusive.)
 151
 152  ** C API
 153
 154 fec_encode() takes as input an array of k pointers, where each pointer points to
 155 a memory buffer containing the input data (i.e., the i'th buffer contains the
 156 i'th primary block).  There is also a second parameter which is an array of the
 157 blocknums of the secondary blocks which are to be produced.  (Each element in
 158 that array is required to be the blocknum of a secondary block, i.e. it is
 159 required to be >= k and < m.)
 160
 161 The output from fec_encode() is the requested set of secondary blocks which are
 162 written into output buffers provided by the caller.
 163
 164 Note that this fec_encode() is a "low-level" API in that it requires the input
 165 data to be provided in a set of memory buffers of exactly the right sizes.  If
 166 you are starting instead with a single buffer containing all of the data then
 167 please see easyfec.py's "class Encoder" as an example of how to split a single
 168 large buffer into the appropriate set of input buffers for fec_encode().  If you
 169 are starting with a file on disk, then please see filefec.py's
 170 encode_file_stringy_easyfec() for an example of how to read the data from a file
 171 and pass it to "class Encoder".  The Python interface provides these
 172 higher-level operations, as does the Haskell interface.  If you implement
 173 functions to do these higher-level tasks in other languages than Python or
 174 Haskell, then please send a patch to zfec-dev@allmydata.org so that your API can
 175 be included in future releases of zfec.
 176
 177
 178 fec_decode() takes as input an array of k pointers, where each pointer points to
 179 a buffer containing a block.  There is also a separate input parameter which is
 180 an array of blocknums, indicating the blocknum of each of the blocks which is
 181 being passed in.
 182
 183 The output from fec_decode() is the set of primary blocks which were missing
 184 from the input and had to be reconstructed.  These reconstructed blocks are
 185 written into output buffers provided by the caller.
 186
 187
 188  ** Python API
 189
 190 encode() and decode() take as input a sequence of k buffers, where a "sequence"
 191 is any object that implements the Python sequence protocol (such as a list or
 192 tuple) and a "buffer" is any object that implements the Python buffer protocol
 193 (such as a string or array).  The contents that are required to be present in
 194 these buffers are the same as for the C API.
 195
 196 encode() also takes a list of desired blocknums.  Unlike the C API, the Python
 197 API accepts blocknums of primary blocks as well as secondary blocks in its list
 198 of desired blocknums.  encode() returns a list of buffer objects which contain
 199 the blocks requested.  For each requested block which is a primary block, the
 200 resulting list contains a reference to the apppropriate primary block from the
 201 input list.  For each requested block which is a secondary block, the list
 202 contains a newly created string object containing that block.
 203
 204 decode() also takes a list of integers indicating the blocknums of the blocks
 205 being passed int.  decode() returns a list of buffer objects which contain all
 206 of the primary blocks of the original data (in order).  For each primary block
 207 which was present in the input list, then the result list simply contains a
 208 reference to the object that was passed in the input list.  For each primary
 209 block which was not present in the input, the result list contains a newly
 210 created string object containing that primary block.
 211
 212 Beware of a "gotcha" that can result from the combination of mutable data and
 213 the fact that the Python API returns references to inputs when possible.
 214
 215 Returning references to its inputs is efficient since it avoids making an
 216 unnecessary copy of the data, but if the object which was passed as input is
 217 mutable and if that object is mutated after the call to zfec returns, then the
 218 result from zfec -- which is just a reference to that same object -- will also
 219 be mutated.  This subtlety is the price you pay for avoiding data copying.  If
 220 you don't want to have to worry about this then you can simply use immutable
 221 objects (e.g. Python strings) to hold the data that you pass to zfec.
 222
 223  ** Haskell API
 224
 225 The Haskell code is fully Haddocked, to generate the documentation, run
 226   % runhaskell Setup.lhs haddock
 227
 228
 229  * Utilities
 230
 231 The filefec.py module has a utility function for efficiently reading a file and
 232 encoding it piece by piece.  This module is used by the "zfec" and "zunfec"
 233 command-line tools from the bin/ directory.
 234
 235
 236  * Dependencies
 237
 238 A C compiler is required.  To use the Python API or the command-line tools a
 239 Python interpreter is also required.  We have tested it with Python v2.4 and
 240 v2.5.  For the Haskell interface, GHC >= 6.8.1 is required.
 241
 242
 243  * Acknowledgements
 244
 245 Thanks to the author of the original fec lib, Luigi Rizzo, and the folks that
 246 contributed to it: Phil Karn, Robert Morelos-Zaragoza, Hari Thirumoorthy, and
 247 Dan Rubenstein.  Thanks to the Mnet hackers who wrote an earlier Python wrapper,
 248 especially Myers Carpenter and Hauke Johannknecht.  Thanks to Brian Warner and
 249 Amber O'Whielacronx for help with the API, documentation, debugging,
 250 compression, and unit tests.  Thanks to Adam Langley for improving the C API and
 251 contributing the Haskell API.  Thanks to the creators of GCC (starting with
 252 Richard M. Stallman) and Valgrind (starting with Julian Seward) for a pair of
 253 excellent tools.  Thanks to my coworkers at Allmydata -- http://allmydata.com --
 254 Fabrice Grinda, Peter Secor, Rob Kinninmont, Brian Warner, Zandr Milewski,
 255 Justin Boreta, Mark Meras for sponsoring this work and releasing it under a Free
 256 Software licence.
 257
 258
 259 Enjoy!
 260
 261 Zooko Wilcox-O'Hearn
 262 2008-01-20
 263 Boulder, Colorado