README.rst

   1
   2
   3 zfec -- efficient, portable erasure coding tool
   4 ===============================================
   5
   6 Generate redundant blocks of information such that if some of the blocks are
   7 lost then the original data can be recovered from the remaining blocks. This
   8 package includes command-line tools, C API, Python API, and Haskell API.
   9
  10
  11 Intro and Licence
  12 -----------------
  13
  14 This package implements an "erasure code", or "forward error correction
  15 code".
  16
  17 You may use this package under the GNU General Public License, version 2 or,
  18 at your option, any later version.  You may use this package under the
  19 Transitive Grace Period Public Licence, version 1.0 or, at your option, any
  20 later version.  (You may choose to use this package under the terms of either
  21 licence, at your option.)  See the file COPYING.GPL for the terms of the GNU
  22 General Public License, version 2.  See the file COPYING.TGPPL.rst for the
  23 terms of the Transitive Grace Period Public Licence, version 1.0.
  24
  25 The most widely known example of an erasure code is the RAID-5 algorithm
  26 which makes it so that in the event of the loss of any one hard drive, the
  27 stored data can be completely recovered.  The algorithm in the zfec package
  28 has a similar effect, but instead of recovering from the loss of only a
  29 single element, it can be parameterized to choose in advance the number of
  30 elements whose loss it can tolerate.
  31
  32 This package is largely based on the old "fec" library by Luigi Rizzo et al.,
  33 which is a mature and optimized implementation of erasure coding.  The zfec
  34 package makes several changes from the original "fec" package, including
  35 addition of the Python API, refactoring of the C API to support zero-copy
  36 operation, a few clean-ups and optimizations of the core code itself, and the
  37 addition of a command-line tool named "zfec".
  38
  39
  40 Installation
  41 ------------
  42
  43 This package is managed with the "setuptools" package management tool.  To
  44 build and install the package directly into your system, just run ``python
  45 ./setup.py install``.  If you prefer to keep the package limited to a
  46 specific directory so that you can manage it yourself (perhaps by using the
  47 "GNU stow") tool, then give it these arguments: ``python ./setup.py install
  48 --single-version-externally-managed
  49 --record=${specificdirectory}/zfec-install.log
  50 --prefix=${specificdirectory}``
  51
  52 To run the self-tests, execute ``python ./setup.py test`` (or if you have
  53 Twisted Python installed, you can run ``trial zfec`` for nicer output and
  54 test options.)  This will run the tests of the C API, the Python API, and the
  55 command-line tools.
  56
  57 To run the tests of the Haskell API: ``runhaskell haskell/test/FECTest.hs``
  58
  59 Note that in order to run the Haskell API tests you must have installed the
  60 library first due to the fact that the interpreter cannot process FEC.hs as
  61 it takes a reference to an FFI function.
  62
  63
  64 Community
  65 ---------
  66
  67 The source is currently available via darcs on the web with the command:
  68
  69 darcs get https://tahoe-lafs.org/source/zfec/trunk
  70
  71 More information on darcs is available at http://darcs.net
  72
  73 Please post about zfec to the Tahoe-LAFS mailing list and contribute patches:
  74
  75 <https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev>
  76
  77
  78 Overview
  79 --------
  80
  81 This package performs two operations, encoding and decoding.  Encoding takes
  82 some input data and expands its size by producing extra "check blocks", also
  83 called "secondary blocks".  Decoding takes some data -- any combination of
  84 blocks of the original data (called "primary blocks") and "secondary blocks",
  85 and produces the original data.
  86
  87 The encoding is parameterized by two integers, k and m.  m is the total
  88 number of blocks produced, and k is how many of those blocks are necessary to
  89 reconstruct the original data.  m is required to be at least 1 and at most
  90 256, and k is required to be at least 1 and at most m.
  91
  92 (Note that when k == m then there is no point in doing erasure coding -- it
  93 degenerates to the equivalent of the Unix "split" utility which simply splits
  94 the input into successive segments.  Similarly, when k == 1 it degenerates to
  95 the equivalent of the unix "cp" utility -- each block is a complete copy of
  96 the input data.)
  97
  98 Note that each "primary block" is a segment of the original data, so its size
  99 is 1/k'th of the size of original data, and each "secondary block" is of the
 100 same size, so the total space used by all the blocks is m/k times the size of
 101 the original data (plus some padding to fill out the last primary block to be
 102 the same size as all the others).  In addition to the data contained in the
 103 blocks themselves there are also a few pieces of metadata which are necessary
 104 for later reconstruction.  Those pieces are: 1.  the value of K, 2.  the
 105 value of M, 3.  the sharenum of each block, 4.  the number of bytes of
 106 padding that were used.  The "zfec" command-line tool compresses these pieces
 107 of data and prepends them to the beginning of each share, so each the
 108 sharefile produced by the "zfec" command-line tool is between one and four
 109 bytes larger than the share data alone.
 110
 111 The decoding step requires as input k of the blocks which were produced by
 112 the encoding step.  The decoding step produces as output the data that was
 113 earlier input to the encoding step.
 114
 115
 116 Command-Line Tool
 117 -----------------
 118
 119 The bin/ directory contains two Unix-style, command-line tools "zfec" and
 120 "zunfec".  Execute ``zfec --help`` or ``zunfec --help`` for usage
 121 instructions.
 122
 123
 124 Performance
 125 -----------
 126
 127 To run the benchmarks, execute the included bench/bench_zfec.py script with
 128 optional --k= and --m= arguments.
 129
 130 On my Athlon 64 2.4 GHz workstation (running Linux), the "zfec" command-line
 131 tool encoded a 160 MB file with m=100, k=94 (about 6% redundancy) in 3.9
 132 seconds, where the "par2" tool encoded the file with about 6% redundancy in
 133 27 seconds.  zfec encoded the same file with m=12, k=6 (100% redundancy) in
 134 4.1 seconds, where par2 encoded it with about 100% redundancy in 7 minutes
 135 and 56 seconds.
 136
 137 The underlying C library in benchmark mode encoded from a file at about 4.9
 138 million bytes per second and decoded at about 5.8 million bytes per second.
 139
 140 On Peter's fancy Intel Mac laptop (2.16 GHz Core Duo), it encoded from a file
 141 at about 6.2 million bytes per second.
 142
 143 On my even fancier Intel Mac laptop (2.33 GHz Core Duo), it encoded from a
 144 file at about 6.8 million bytes per second.
 145
 146 On my old PowerPC G4 867 MHz Mac laptop, it encoded from a file at about 1.3
 147 million bytes per second.
 148
 149 Here is a paper analyzing the performance of various erasure codes and their
 150 implementations, including zfec:
 151
 152 http://www.usenix.org/events/fast09/tech/full_papers/plank/plank.pdf
 153
 154 Zfec shows good performance on different machines and with different values
 155 of K and M. It also has a nice small memory footprint.
 156
 157
 158 API
 159 ---
 160
 161 Each block is associated with "blocknum".  The blocknum of each primary block
 162 is its index (starting from zero), so the 0'th block is the first primary
 163 block, which is the first few bytes of the file, the 1'st block is the next
 164 primary block, which is the next few bytes of the file, and so on.  The last
 165 primary block has blocknum k-1.  The blocknum of each secondary block is an
 166 arbitrary integer between k and 255 inclusive.  (When using the Python API,
 167 if you don't specify which secondary blocks you want when invoking encode(),
 168 then it will by default provide the blocks with ids from k to m-1 inclusive.)
 169
 170 - C API
 171
 172   fec_encode() takes as input an array of k pointers, where each pointer
 173   points to a memory buffer containing the input data (i.e., the i'th buffer
 174   contains the i'th primary block).  There is also a second parameter which
 175   is an array of the blocknums of the secondary blocks which are to be
 176   produced.  (Each element in that array is required to be the blocknum of a
 177   secondary block, i.e. it is required to be >= k and < m.)
 178
 179   The output from fec_encode() is the requested set of secondary blocks which
 180   are written into output buffers provided by the caller.
 181
 182   Note that this fec_encode() is a "low-level" API in that it requires the
 183   input data to be provided in a set of memory buffers of exactly the right
 184   sizes.  If you are starting instead with a single buffer containing all of
 185   the data then please see easyfec.py's "class Encoder" as an example of how
 186   to split a single large buffer into the appropriate set of input buffers
 187   for fec_encode().  If you are starting with a file on disk, then please see
 188   filefec.py's encode_file_stringy_easyfec() for an example of how to read
 189   the data from a file and pass it to "class Encoder".  The Python interface
 190   provides these higher-level operations, as does the Haskell interface.  If
 191   you implement functions to do these higher-level tasks in other languages,
 192   please send a patch to tahoe-dev@tahoe-lafs.org so that your API can be
 193   included in future releases of zfec.
 194
 195   fec_decode() takes as input an array of k pointers, where each pointer
 196   points to a buffer containing a block.  There is also a separate input
 197   parameter which is an array of blocknums, indicating the blocknum of each
 198   of the blocks which is being passed in.
 199
 200   The output from fec_decode() is the set of primary blocks which were
 201   missing from the input and had to be reconstructed.  These reconstructed
 202   blocks are written into output buffers provided by the caller.
 203
 204
 205 - Python API
 206
 207   encode() and decode() take as input a sequence of k buffers, where a
 208   "sequence" is any object that implements the Python sequence protocol (such
 209   as a list or tuple) and a "buffer" is any object that implements the Python
 210   buffer protocol (such as a string or array).  The contents that are
 211   required to be present in these buffers are the same as for the C API.
 212
 213   encode() also takes a list of desired blocknums.  Unlike the C API, the
 214   Python API accepts blocknums of primary blocks as well as secondary blocks
 215   in its list of desired blocknums.  encode() returns a list of buffer
 216   objects which contain the blocks requested.  For each requested block which
 217   is a primary block, the resulting list contains a reference to the
 218   apppropriate primary block from the input list.  For each requested block
 219   which is a secondary block, the list contains a newly created string object
 220   containing that block.
 221
 222   decode() also takes a list of integers indicating the blocknums of the
 223   blocks being passed int.  decode() returns a list of buffer objects which
 224   contain all of the primary blocks of the original data (in order).  For
 225   each primary block which was present in the input list, then the result
 226   list simply contains a reference to the object that was passed in the input
 227   list.  For each primary block which was not present in the input, the
 228   result list contains a newly created string object containing that primary
 229   block.
 230
 231   Beware of a "gotcha" that can result from the combination of mutable data
 232   and the fact that the Python API returns references to inputs when
 233   possible.
 234
 235   Returning references to its inputs is efficient since it avoids making an
 236   unnecessary copy of the data, but if the object which was passed as input
 237   is mutable and if that object is mutated after the call to zfec returns,
 238   then the result from zfec -- which is just a reference to that same object
 239   -- will also be mutated.  This subtlety is the price you pay for avoiding
 240   data copying.  If you don't want to have to worry about this then you can
 241   simply use immutable objects (e.g. Python strings) to hold the data that
 242   you pass to zfec.
 243
 244 - Haskell API
 245
 246   The Haskell code is fully Haddocked, to generate the documentation, run
 247   ``runhaskell Setup.lhs haddock``.
 248
 249
 250 Utilities
 251 ---------
 252
 253 The filefec.py module has a utility function for efficiently reading a file
 254 and encoding it piece by piece.  This module is used by the "zfec" and
 255 "zunfec" command-line tools from the bin/ directory.
 256
 257
 258 Dependencies
 259 ------------
 260
 261 A C compiler is required.  To use the Python API or the command-line tools a
 262 Python interpreter is also required.  We have tested it with Python v2.4,
 263 v2.5, v2.6, and v2.7.  For the Haskell interface, GHC >= 6.8.1 is required.
 264
 265
 266 Acknowledgements
 267 ----------------
 268
 269 Thanks to the author of the original fec lib, Luigi Rizzo, and the folks that
 270 contributed to it: Phil Karn, Robert Morelos-Zaragoza, Hari Thirumoorthy, and
 271 Dan Rubenstein.  Thanks to the Mnet hackers who wrote an earlier Python
 272 wrapper, especially Myers Carpenter and Hauke Johannknecht.  Thanks to Brian
 273 Warner and Amber O'Whielacronx for help with the API, documentation,
 274 debugging, compression, and unit tests.  Thanks to Adam Langley for improving
 275 the C API and contributing the Haskell API.  Thanks to the creators of GCC
 276 (starting with Richard M. Stallman) and Valgrind (starting with Julian
 277 Seward) for a pair of excellent tools.  Thanks to my coworkers at Allmydata
 278 -- http://allmydata.com -- Fabrice Grinda, Peter Secor, Rob Kinninmont, Brian
 279 Warner, Zandr Milewski, Justin Boreta, Mark Meras for sponsoring this work
 280 and releasing it under a Free Software licence. Thanks to Jack Lloyd, Samuel
 281 Neves, and David-Sarah Hopwood.
 282
 283
 284 Related Works
 285 -------------
 286
 287 Note: a Unix-style tool like "zfec" does only one thing -- in this case
 288 erasure coding -- and leaves other tasks to other tools.  Other Unix-style
 289 tools that go well with zfec include `GNU tar`_ for archiving multiple files
 290 and directories into one file, `lzip`_ for compression, and `GNU Privacy
 291 Guard`_ for encryption or `b2sum`_ for integrity.  It is important to do
 292 things in order: first archive, then compress, then either encrypt or
 293 integrity-check, then erasure code.  Note that if GNU Privacy Guard is used
 294 for privacy, then it will also ensure integrity, so the use of b2sum is
 295 unnecessary in that case. Note also that you also need to do integrity
 296 checking (such as with b2sum) on the blocks that result from the erasure
 297 coding in *addition* to doing it on the file contents! (There are two
 298 different subtle failure modes -- see "more than one file can match an
 299 immutable file cap" on the `Hack Tahoe-LAFS!`_ Hall of Fame.)
 300
 301 The `Tahoe-LAFS`_ project uses zfec as part of a complete distributed
 302 filesystem with integrated encryption, integrity, remote distribution of the
 303 blocks, directory structure, backup of changed files or directories, access
 304 control, immutable files and directories, proof-of-retrievability, and repair
 305 of damaged files and directories.
 306
 307 `fecpp`_ is an alternative to zfec. It implements a bitwise-compatible
 308 algorithm to zfec and is BSD-licensed.
 309
 310 .. _GNU tar: http://directory.fsf.org/project/tar/
 311 .. _lzip: http://www.nongnu.org/lzip/lzip.html
 312 .. _GNU Privacy Guard: http://gnupg.org/
 313 .. _b2sum: https://blake2.net/
 314 .. _Tahoe-LAFS: https://tahoe-lafs.org
 315 .. _Hack Tahoe-LAFS!: https://tahoe-lafs.org/hacktahoelafs/
 316 .. _fecpp: http://www.randombit.net/code/fecpp/
 317
 318
 319 Enjoy!
 320
 321 Zooko Wilcox-O'Hearn
 322
 323 2013-05-15
 324
 325 Boulder, Colorado