From: zooko Date: Fri, 30 Mar 2007 18:52:43 +0000 (+0530) Subject: pyfec: rename and clarify -- "blocks" are the units of input/output of the codec... X-Git-Url: https://git.rkrishnan.org/specifications/components/something?a=commitdiff_plain;h=e3b10ed34a0b3afa34a7b7d85bcc9b48ce574fcc;p=tahoe-lafs%2Fzfec.git pyfec: rename and clarify -- "blocks" are the units of input/output of the codec, "shares" are sequences of blocks (used to process arbitrary-length files) darcs-hash:8125140adbe753ca182fb227049b264c344f1719 --- diff --git a/pyfec/README.txt b/pyfec/README.txt index d57532d..59695fc 100644 --- a/pyfec/README.txt +++ b/pyfec/README.txt @@ -39,63 +39,63 @@ Please join the pyfec mailing list and submit patches: This package performs two operations, encoding and decoding. Encoding takes some input data and expands its size by producing extra "check blocks", also -called "secondary shares". Decoding takes some data -- any combination of -blocks of the original data (called "primary shares") and "secondary shares", +called "secondary blocks". Decoding takes some data -- any combination of +blocks of the original data (called "primary blocks") and "secondary blocks", and produces the original data. The encoding is parameterized by two integers, k and m. m is the total number -of shares produced, and k is how many of those shares are necessary to +of blocks produced, and k is how many of those blocks are necessary to reconstruct the original data. m is required to be at least 1 and at most 256, and k is required to be at least 1 and at most m. (Note that when k == m then there is no point in doing erasure coding -- it degenerates to the equivalent of the Unix "split" utility which simply splits the input into successive segments. Similarly, when k == 1 it degenerates to -the equivalent of the unix "cp" utility -- each share is a complete copy of the +the equivalent of the unix "cp" utility -- each block is a complete copy of the input data.) 
-Note that each "primary share" is a segment of the original data, so its size -is 1/k'th of the size of original data, and each "secondary share" is of the -same size, so the total space used by all the shares is m/k times the size of -the original data (plus some padding to fill out the last primary share to be +Note that each "primary block" is a segment of the original data, so its size +is 1/k'th of the size of original data, and each "secondary block" is of the +same size, so the total space used by all the blocks is m/k times the size of +the original data (plus some padding to fill out the last primary block to be the same size as all the others). -The decoding step requires as input k of the shares which were produced by the +The decoding step requires as input k of the blocks which were produced by the encoding step. The decoding step produces as output the data that was earlier input to the encoding step. * API -Each share is associated with "shareid". The shareid of each primary share is -its index (starting from zero), so the 0'th share is the first primary share, -which is the first few bytes of the file, the 1'st share is the next primary -share, which is the next few bytes of the file, and so on. The last primary -share has shareid k-1. The shareid of each secondary share is an arbitrary +Each block is associated with "blocknum". The blocknum of each primary block is +its index (starting from zero), so the 0'th block is the first primary block, +which is the first few bytes of the file, the 1'st block is the next primary +block, which is the next few bytes of the file, and so on. The last primary +block has blocknum k-1. The blocknum of each secondary block is an arbitrary integer between k and 256 inclusive. 
(When using the Python API, if you don't -specify which shareids you want for your secondary shares when invoking -encode(), then it will by default provide the shares with ids from k to m-1 +specify which blocknums you want for your secondary blocks when invoking +encode(), then it will by default provide the blocks with ids from k to m-1 inclusive.) ** C API fec_encode() takes as input an array of k pointers, where each pointer points to a memory buffer containing the input data (i.e., the i'th buffer contains -the i'th primary share). There is also a second parameter which is an array of -the shareids of the secondary shares which are to be produced. (Each element -in that array is required to be the shareid of a secondary share, i.e. it is +the i'th primary block). There is also a second parameter which is an array of +the blocknums of the secondary blocks which are to be produced. (Each element +in that array is required to be the blocknum of a secondary block, i.e. it is required to be >= k and < m.) -The output from fec_encode() is the requested set of secondary shares which are +The output from fec_encode() is the requested set of secondary blocks which are written into output buffers provided by the caller. fec_decode() takes as input an array of k pointers, where each pointer points -to a buffer containing a share. There is also a separate input parameter which -is an array of shareids, indicating the shareid of each of the shares which is +to a buffer containing a block. There is also a separate input parameter which +is an array of blocknums, indicating the blocknum of each of the blocks which is being passed in. -The output from fec_decode() is the set of primary shares which were missing -from the input and had to be reconstructed. These reconstructed shares are +The output from fec_decode() is the set of primary blocks which were missing +from the input and had to be reconstructed. 
These reconstructed blocks are written into putput buffers provided by the caller. ** Python API @@ -106,21 +106,21 @@ tuple) and a "buffer" is any object that implements the Python buffer protocol (such as a string or array). The contents that are required to be present in these buffers are the same as for the C API. -encode() also takes a list of desired shareids. Unlike the C API, the Python -API accepts shareids of primary shares as well as secondary shares in its list -of desired shareids. encode() returns a list of buffer objects which contain -the shares requested. For each requested share which is a primary share, the -resulting list contains a reference to the apppropriate primary share from the -input list. For each requested share which is a secondary share, the list -contains a newly created string object containing that share. +encode() also takes a list of desired blocknums. Unlike the C API, the Python +API accepts blocknums of primary blocks as well as secondary blocks in its list +of desired blocknums. encode() returns a list of buffer objects which contain +the blocks requested. For each requested block which is a primary block, the +resulting list contains a reference to the apppropriate primary block from the +input list. For each requested block which is a secondary block, the list +contains a newly created string object containing that block. -decode() also takes a list of integers indicating the shareids of the shares +decode() also takes a list of integers indicating the blocknums of the blocks being passed int. decode() returns a list of buffer objects which contain all -of the primary shares of the original data (in order). For each primary share +of the primary blocks of the original data (in order). For each primary block which was present in the input list, then the result list simply contains a reference to the object that was passed in the input list. 
For each primary -share which was not present in the input, the result list contains a newly -created string object containing that primary share. +block which was not present in the input, the result list contains a newly +created string object containing that primary block. Beware of a "gotcha" that can result from the combination of mutable data and the fact that the Python API returns references to inputs when possible. diff --git a/pyfec/fec/_fecmodule.c b/pyfec/fec/_fecmodule.c index eed4bcd..929bfe1 100644 --- a/pyfec/fec/_fecmodule.c +++ b/pyfec/fec/_fecmodule.c @@ -124,131 +124,131 @@ Encoder_init(Encoder *self, PyObject *args, PyObject *kwdict) { static char Encoder_encode__doc__[] = "\ Encode data into m packets.\n\ \n\ -@param inshares: a sequence of k buffers of data to encode -- these are the k primary shares, i.e. the input data split into k pieces (for best performance, make it a tuple instead of a list); All shares are required to be the same length.\n\ -@param desired_shares_ids optional sequence of shareids indicating which shares to produce and return; If None, all m shares will be returned (in order). (For best performance, make it a tuple instead of a list.)\n\ -@returns: a list of buffers containing the requested shares; Note that if any of the input shares were 'primary shares', i.e. their shareid was < k, then the result sequence will contain a Python reference to the same Python object as was passed in. As long as the Python object in question is immutable (i.e. a string) then you don't have to think about this detail, but if it is mutable (i.e. an array), then you have to be aware that if you subsequently mutate the contents of that object then that will also change the contents of the sequence that was returned from this call to encode().\n\ +@param inblocks: a sequence of k buffers of data to encode -- these are the k primary blocks, i.e. 
the input data split into k pieces (for best performance, make it a tuple instead of a list); All blocks are required to be the same length.\n\ +@param desired_blocks_nums optional sequence of blocknums indicating which blocks to produce and return; If None, all m blocks will be returned (in order). (For best performance, make it a tuple instead of a list.)\n\ +@returns: a list of buffers containing the requested blocks; Note that if any of the input blocks were 'primary blocks', i.e. their blocknum was < k, then the result sequence will contain a Python reference to the same Python object as was passed in. As long as the Python object in question is immutable (i.e. a string) then you don't have to think about this detail, but if it is mutable (i.e. an array), then you have to be aware that if you subsequently mutate the contents of that object then that will also change the contents of the sequence that was returned from this call to encode().\n\ "; static PyObject * Encoder_encode(Encoder *self, PyObject *args) { - PyObject* inshares; - PyObject* desired_shares_ids = NULL; /* The shareids of the shares that should be returned. */ + PyObject* inblocks; + PyObject* desired_blocks_nums = NULL; /* The blocknums of the blocks that should be returned. */ PyObject* result = NULL; - if (!PyArg_ParseTuple(args, "O|O", &inshares, &desired_shares_ids)) + if (!PyArg_ParseTuple(args, "O|O", &inblocks, &desired_blocks_nums)) return NULL; - gf* check_shares_produced[self->mm - self->kk]; /* This is an upper bound -- we will actually use only num_check_shares_produced of these elements (see below). */ - PyObject* pystrs_produced[self->mm - self->kk]; /* This is an upper bound -- we will actually use only num_check_shares_produced of these elements (see below). */ - unsigned num_check_shares_produced = 0; /* The first num_check_shares_produced elements of the check_shares_produced array and of the pystrs_produced array will be used. 
*/ - const gf* incshares[self->kk]; - unsigned num_desired_shares; - PyObject* fast_desired_shares_ids = NULL; - PyObject** fast_desired_shares_ids_items; - unsigned c_desired_shares_ids[self->mm]; - unsigned c_desired_checkshares_ids[self->mm - self->kk]; + gf* check_blocks_produced[self->mm - self->kk]; /* This is an upper bound -- we will actually use only num_check_blocks_produced of these elements (see below). */ + PyObject* pystrs_produced[self->mm - self->kk]; /* This is an upper bound -- we will actually use only num_check_blocks_produced of these elements (see below). */ + unsigned num_check_blocks_produced = 0; /* The first num_check_blocks_produced elements of the check_blocks_produced array and of the pystrs_produced array will be used. */ + const gf* incblocks[self->kk]; + unsigned num_desired_blocks; + PyObject* fast_desired_blocks_nums = NULL; + PyObject** fast_desired_blocks_nums_items; + unsigned c_desired_blocks_nums[self->mm]; + unsigned c_desired_checkblocks_ids[self->mm - self->kk]; unsigned i; - PyObject* fastinshares = NULL; + PyObject* fastinblocks = NULL; for (i=0; i<self->mm - self->kk; i++) pystrs_produced[i] = NULL; - if (desired_shares_ids) { - fast_desired_shares_ids = PySequence_Fast(desired_shares_ids, "Second argument (optional) was not a sequence."); - if (!fast_desired_shares_ids) + if (desired_blocks_nums) { + fast_desired_blocks_nums = PySequence_Fast(desired_blocks_nums, "Second argument (optional) was not a sequence."); + if (!fast_desired_blocks_nums) goto err; - num_desired_shares = PySequence_Fast_GET_SIZE(fast_desired_shares_ids); - fast_desired_shares_ids_items = PySequence_Fast_ITEMS(fast_desired_shares_ids); - for (i=0; i= self->kk) - num_check_shares_produced++; + c_desired_blocks_nums[i] = PyInt_AsLong(fast_desired_blocks_nums_items[i]); + if (c_desired_blocks_nums[i] >= self->kk) + num_check_blocks_produced++; } } else { - num_desired_shares = self->mm; - for (i=0; imm - self->kk; + num_desired_blocks = self->mm; + for 
(i=0; imm - self->kk; } - fastinshares = PySequence_Fast(inshares, "First argument was not a sequence."); - if (!fastinshares) + fastinblocks = PySequence_Fast(inblocks, "First argument was not a sequence."); + if (!fastinblocks) goto err; - if (PySequence_Fast_GET_SIZE(fastinshares) != self->kk) { - py_raise_fec_error("Precondition violation: Wrong length -- first argument is required to contain exactly k shares. len(first): %d, k: %d", PySequence_Fast_GET_SIZE(fastinshares), self->kk); + if (PySequence_Fast_GET_SIZE(fastinblocks) != self->kk) { + py_raise_fec_error("Precondition violation: Wrong length -- first argument is required to contain exactly k blocks. len(first): %d, k: %d", PySequence_Fast_GET_SIZE(fastinblocks), self->kk); goto err; } /* Construct a C array of gf*'s of the input data. */ - PyObject** fastinsharesitems = PySequence_Fast_ITEMS(fastinshares); - if (!fastinsharesitems) + PyObject** fastinblocksitems = PySequence_Fast_ITEMS(fastinblocks); + if (!fastinblocksitems) goto err; Py_ssize_t sz, oldsz = 0; for (i=0; i<self->kk; i++) { - if (!PyObject_CheckReadBuffer(fastinsharesitems[i])) { + if (!PyObject_CheckReadBuffer(fastinblocksitems[i])) { py_raise_fec_error("Precondition violation: %u'th item is required to offer the single-segment read character buffer protocol, but it does not.\n", i); goto err; } - if (PyObject_AsReadBuffer(fastinsharesitems[i], (const void**)&(incshares[i]), &sz)) + if (PyObject_AsReadBuffer(fastinblocksitems[i], (const void**)&(incblocks[i]), &sz)) goto err; if (oldsz != 0 && oldsz != sz) { - py_raise_fec_error("Precondition violation: Input shares are required to be all the same length. oldsz: %Zu, sz: %Zu\n", oldsz, sz); + py_raise_fec_error("Precondition violation: Input blocks are required to be all the same length. oldsz: %Zu, sz: %Zu\n", oldsz, sz); goto err; } oldsz = sz; } - /* Allocate space for all of the check shares. 
*/ - unsigned check_share_index = 0; /* index into the check_shares_produced and (parallel) pystrs_produced arrays */ - for (i=0; i= self->kk) { - c_desired_checkshares_ids[check_share_index] = c_desired_shares_ids[i]; - pystrs_produced[check_share_index] = PyString_FromStringAndSize(NULL, sz); - if (pystrs_produced[check_share_index] == NULL) + /* Allocate space for all of the check blocks. */ + unsigned check_block_index = 0; /* index into the check_blocks_produced and (parallel) pystrs_produced arrays */ + for (i=0; i= self->kk) { + c_desired_checkblocks_ids[check_block_index] = c_desired_blocks_nums[i]; + pystrs_produced[check_block_index] = PyString_FromStringAndSize(NULL, sz); + if (pystrs_produced[check_block_index] == NULL) goto err; - check_shares_produced[check_share_index] = (gf*)PyString_AsString(pystrs_produced[check_share_index]); - if (check_shares_produced[check_share_index] == NULL) + check_blocks_produced[check_block_index] = (gf*)PyString_AsString(pystrs_produced[check_block_index]); + if (check_blocks_produced[check_block_index] == NULL) goto err; - check_share_index++; + check_block_index++; } } - assert (check_share_index == num_check_shares_produced); + assert (check_block_index == num_check_blocks_produced); - /* Encode any check shares that are needed. */ - fec_encode(self->fec_matrix, incshares, check_shares_produced, c_desired_checkshares_ids, num_check_shares_produced, sz); + /* Encode any check blocks that are needed. */ + fec_encode(self->fec_matrix, incblocks, check_blocks_produced, c_desired_checkblocks_ids, num_check_blocks_produced, sz); - /* Wrap all requested shares up into a Python list of Python strings. */ - result = PyList_New(num_desired_shares); + /* Wrap all requested blocks up into a Python list of Python strings. 
*/ + result = PyList_New(num_desired_blocks); if (result == NULL) goto err; - check_share_index = 0; - for (i=0; ikk) { - Py_INCREF(fastinsharesitems[c_desired_shares_ids[i]]); - if (PyList_SetItem(result, i, fastinsharesitems[c_desired_shares_ids[i]]) == -1) { - Py_DECREF(fastinsharesitems[c_desired_shares_ids[i]]); + check_block_index = 0; + for (i=0; ikk) { + Py_INCREF(fastinblocksitems[c_desired_blocks_nums[i]]); + if (PyList_SetItem(result, i, fastinblocksitems[c_desired_blocks_nums[i]]) == -1) { + Py_DECREF(fastinblocksitems[c_desired_blocks_nums[i]]); goto err; } } else { - if (PyList_SetItem(result, i, pystrs_produced[check_share_index]) == -1) + if (PyList_SetItem(result, i, pystrs_produced[check_block_index]) == -1) goto err; - pystrs_produced[check_share_index] = NULL; - check_share_index++; + pystrs_produced[check_block_index] = NULL; + check_block_index++; } } goto cleanup; err: - for (i=0; ikk]; - unsigned cshareids[self->kk]; + const gf*restrict cblocks[self->kk]; + unsigned cblocknums[self->kk]; gf*restrict recoveredcstrs[self->kk]; /* self->kk is actually an upper bound -- we probably won't need all of this space. */ PyObject*restrict recoveredpystrs[self->kk]; /* self->kk is actually an upper bound -- we probably won't need all of this space. 
*/ unsigned i; for (i=0; i<self->kk; i++) recoveredpystrs[i] = NULL; - PyObject*restrict fastshareids = NULL; - PyObject*restrict fastshares = PySequence_Fast(shares, "First argument was not a sequence."); - if (!fastshares) + PyObject*restrict fastblocknums = NULL; + PyObject*restrict fastblocks = PySequence_Fast(blocks, "First argument was not a sequence."); + if (!fastblocks) goto err; - fastshareids = PySequence_Fast(shareids, "Second argument was not a sequence."); - if (!fastshareids) + fastblocknums = PySequence_Fast(blocknums, "Second argument was not a sequence."); + if (!fastblocknums) goto err; - if (PySequence_Fast_GET_SIZE(fastshares) != self->kk) { - py_raise_fec_error("Precondition violation: Wrong length -- first argument is required to contain exactly k shares. len(first): %d, k: %d", PySequence_Fast_GET_SIZE(fastshares), self->kk); + if (PySequence_Fast_GET_SIZE(fastblocks) != self->kk) { + py_raise_fec_error("Precondition violation: Wrong length -- first argument is required to contain exactly k blocks. len(first): %d, k: %d", PySequence_Fast_GET_SIZE(fastblocks), self->kk); goto err; } - if (PySequence_Fast_GET_SIZE(fastshareids) != self->kk) { - py_raise_fec_error("Precondition violation: Wrong length -- shareids is required to contain exactly k shares. len(shareids): %d, k: %d", PySequence_Fast_GET_SIZE(fastshareids), self->kk); + if (PySequence_Fast_GET_SIZE(fastblocknums) != self->kk) { + py_raise_fec_error("Precondition violation: Wrong length -- blocknums is required to contain exactly k blocks. len(blocknums): %d, k: %d", PySequence_Fast_GET_SIZE(fastblocknums), self->kk); goto err; } - /* Construct a C array of gf*'s of the data and another of C ints of the shareids. */ + /* Construct a C array of gf*'s of the data and another of C ints of the blocknums. 
*/ unsigned needtorecover=0; - PyObject** fastshareidsitems = PySequence_Fast_ITEMS(fastshareids); - if (!fastshareidsitems) + PyObject** fastblocknumsitems = PySequence_Fast_ITEMS(fastblocknums); + if (!fastblocknumsitems) goto err; - PyObject** fastsharesitems = PySequence_Fast_ITEMS(fastshares); - if (!fastsharesitems) + PyObject** fastblocksitems = PySequence_Fast_ITEMS(fastblocks); + if (!fastblocksitems) goto err; Py_ssize_t sz, oldsz = 0; for (i=0; i<self->kk; i++) { - if (!PyInt_Check(fastshareidsitems[i])) { + if (!PyInt_Check(fastblocknumsitems[i])) { py_raise_fec_error("Precondition violation: second argument is required to contain int."); goto err; } - long tmpl = PyInt_AsLong(fastshareidsitems[i]); + long tmpl = PyInt_AsLong(fastblocknumsitems[i]); if (tmpl < 0 || tmpl > 255) { - py_raise_fec_error("Precondition violation: Share ids can't be less than zero or greater than 255. %ld\n", tmpl); + py_raise_fec_error("Precondition violation: block nums can't be less than zero or greater than 255. %ld\n", tmpl); goto err; } - cshareids[i] = (unsigned)tmpl; - if (cshareids[i] >= self->kk) + cblocknums[i] = (unsigned)tmpl; + if (cblocknums[i] >= self->kk) needtorecover+=1; - if (!PyObject_CheckReadBuffer(fastsharesitems[i])) { + if (!PyObject_CheckReadBuffer(fastblocksitems[i])) { py_raise_fec_error("Precondition violation: %u'th item is required to offer the single-segment read character buffer protocol, but it does not.\n", i); goto err; } - if (PyObject_AsReadBuffer(fastsharesitems[i], (const void**)&(cshares[i]), &sz)) + if (PyObject_AsReadBuffer(fastblocksitems[i], (const void**)&(cblocks[i]), &sz)) goto err; if (oldsz != 0 && oldsz != sz) { - py_raise_fec_error("Precondition violation: Input shares are required to be all the same length. oldsz: %Zu, sz: %Zu\n", oldsz, sz); + py_raise_fec_error("Precondition violation: Input blocks are required to be all the same length. 
oldsz: %Zu, sz: %Zu\n", oldsz, sz); goto err; } oldsz = sz; @@ -455,19 +455,19 @@ Decoder_decode(Decoder *self, PyObject *args) { /* move src packets into position */ for (i=0; i<self->kk;) { - if (cshareids[i] >= self->kk || cshareids[i] == i) + if (cblocknums[i] >= self->kk || cblocknums[i] == i) i++; else { /* put pkt in the right position. */ - unsigned c = cshareids[i]; + unsigned c = cblocknums[i]; - SWAP (cshareids[i], cshareids[c], int); - SWAP (cshares[i], cshares[c], const gf*); - SWAP (fastsharesitems[i], fastsharesitems[c], PyObject*); + SWAP (cblocknums[i], cblocknums[c], int); + SWAP (cblocks[i], cblocks[c], const gf*); + SWAP (fastblocksitems[i], fastblocksitems[c], PyObject*); } } - /* Allocate space for all of the recovered shares. */ + /* Allocate space for all of the recovered blocks. */ for (i=0; ifec_matrix, cshares, recoveredcstrs, cshareids, sz); + /* Decode any recovered blocks that are needed. */ + fec_decode(self->fec_matrix, cblocks, recoveredcstrs, cblocknums, sz); - /* Wrap up both original primary shares and decoded shares into a Python list of Python strings. */ + /* Wrap up both original primary blocks and decoded blocks into a Python list of Python strings. */ unsigned nextrecoveredix=0; result = PyList_New(self->kk); if (result == NULL) goto err; for (i=0; i<self->kk; i++) { - if (cshareids[i] == i) { - /* Original primary share. */ - Py_INCREF(fastsharesitems[i]); - if (PyList_SetItem(result, i, fastsharesitems[i]) == -1) { - Py_DECREF(fastsharesitems[i]); + if (cblocknums[i] == i) { + /* Original primary block. */ + Py_INCREF(fastblocksitems[i]); + if (PyList_SetItem(result, i, fastblocksitems[i]) == -1) { + Py_DECREF(fastblocksitems[i]); goto err; } } else { - /* Recovered share. */ + /* Recovered block. 
*/ if (PyList_SetItem(result, i, recoveredpystrs[nextrecoveredix]) == -1) goto err; recoveredpystrs[nextrecoveredix] = NULL; @@ -508,8 +508,8 @@ Decoder_decode(Decoder *self, PyObject *args) { Py_XDECREF(recoveredpystrs[i]); Py_XDECREF(result); result = NULL; cleanup: - Py_XDECREF(fastshares); fastshares=NULL; - Py_XDECREF(fastshareids); fastshareids=NULL; + Py_XDECREF(fastblocks); fastblocks=NULL; + Py_XDECREF(fastblocknums); fastblocknums=NULL; return result; } diff --git a/pyfec/fec/easyfec.py b/pyfec/fec/easyfec.py index a2b5ec4..de7e8b9 100644 --- a/pyfec/fec/easyfec.py +++ b/pyfec/fec/easyfec.py @@ -24,6 +24,6 @@ class Encoder(object): l[-1] = l[-1] + ('\x00'*(len(l[0])-len(l[-1]))) return self.fec.encode(l) - def decode(self, shares): - return self.fec.decode(shares) + def decode(self, blocks): + return self.fec.decode(blocks) diff --git a/pyfec/fec/fec.c b/pyfec/fec/fec.c index f563b86..24c0ba3 100644 --- a/pyfec/fec/fec.c +++ b/pyfec/fec/fec.c @@ -564,13 +564,13 @@ fec_new(unsigned k, unsigned n) { } void -fec_encode(const fec_t* code, const gf*restrict const*restrict const src, gf*restrict const*restrict const fecs, const unsigned*restrict const share_ids, size_t num_share_ids, size_t sz) { +fec_encode(const fec_t* code, const gf*restrict const*restrict const src, gf*restrict const*restrict const fecs, const unsigned*restrict const block_nums, size_t num_block_nums, size_t sz) { unsigned i, j; unsigned fecnum; gf* p; - for (i=0; i= code->k); memset(fecs[i], 0, sz); p = &(code->enc_matrix[fecnum * code->k]); diff --git a/pyfec/fec/fec.h b/pyfec/fec/fec.h index abce468..2c689c1 100644 --- a/pyfec/fec/fec.h +++ b/pyfec/fec/fec.h @@ -76,24 +76,24 @@ typedef struct { } fec_t; /** - * param k the number of shares required to reconstruct - * param m the total number of share created + * param k the number of blocks required to reconstruct + * param m the total number of blocks created */ fec_t* fec_new(unsigned k, unsigned m); void fec_free(fec_t* p); /** - * 
@param inpkts the "primary shares" i.e. the chunks of the input data - * @param fecs buffers into which the secondary shares will be written - * @param share_ids the numbers of the desired shares -- including both primary shares (the id < k) which fec_encode() ignores and check shares (the id >= k) which fec_encode() will produce and store into the buffers of the fecs parameter - * @param num_share_ids the length of the share_ids array + * @param inpkts the "primary blocks" i.e. the chunks of the input data + * @param fecs buffers into which the secondary blocks will be written + * @param block_nums the numbers of the desired blocks -- including both primary blocks (the id < k) which fec_encode() ignores and check blocks (the id >= k) which fec_encode() will produce and store into the buffers of the fecs parameter + * @param num_block_nums the length of the block_nums array */ -void fec_encode(const fec_t* code, const gf*restrict const*restrict const src, gf*restrict const*restrict const fecs, const unsigned*restrict const share_ids, size_t num_share_ids, size_t sz); +void fec_encode(const fec_t* code, const gf*restrict const*restrict const src, gf*restrict const*restrict const fecs, const unsigned*restrict const block_nums, size_t num_block_nums, size_t sz); /** * @param inpkts an array of packets (size k) * @param outpkts an array of buffers into which the reconstructed output packets will be written (only packets which are not present in the inpkts input will be reconstructed and written to outpkts) - * @param index an array of the shareids of the packets in inpkts + * @param index an array of the blocknums of the packets in inpkts * @param sz size of a packet in bytes */ void fec_decode(const fec_t* code, const gf*restrict const*restrict const inpkts, gf*restrict const*restrict const outpkts, const unsigned*restrict const index, size_t sz); diff --git a/pyfec/fec/filefec.py b/pyfec/fec/filefec.py index c857427..6c9ae13 100644 --- a/pyfec/fec/filefec.py +++ 
b/pyfec/fec/filefec.py @@ -29,37 +29,37 @@ import array, random def encode_to_files_easyfec(inf, prefix, k, m): """ - Encode inf, writing the shares to named $prefix+$shareid. + Encode inf, writing the shares to a file named $prefix+$sharenum. """ - l = [ open(prefix+str(shareid), "wb") for shareid in range(m) ] - def cb(shares, length): - assert len(shares) == len(l) - for i in range(len(shares)): - l[i].write(shares[i]) + l = [ open(prefix+str(sharenum), "wb") for sharenum in range(m) ] + def cb(blocks, length): + assert len(blocks) == len(l) + for i in range(len(blocks)): + l[i].write(blocks[i]) encode_file_stringy_easyfec(inf, cb, k, m, chunksize=4096) def encode_to_files_stringy(inf, prefix, k, m): """ - Encode inf, writing the shares to named $prefix+$shareid. + Encode inf, writing the shares to a file named named $prefix+$sharenum. """ - l = [ open(prefix+str(shareid), "wb") for shareid in range(m) ] - def cb(shares, length): - assert len(shares) == len(l) - for i in range(len(shares)): - l[i].write(shares[i]) + l = [ open(prefix+str(sharenum), "wb") for sharenum in range(m) ] + def cb(blocks, length): + assert len(blocks) == len(l) + for i in range(len(blocks)): + l[i].write(blocks[i]) encode_file_stringy(inf, cb, k, m, chunksize=4096) def encode_to_files(inf, prefix, k, m): """ - Encode inf, writing the shares to named $prefix+$shareid. + Encode inf, writing the shares to named $prefix+$sharenum. 
     """
-    l = [ open(prefix+str(shareid), "wb") for shareid in range(m) ]
-    def cb(shares, length):
-        assert len(shares) == len(l)
-        for i in range(len(shares)):
-            l[i].write(shares[i])
+    l = [ open(prefix+str(sharenum), "wb") for sharenum in range(m) ]
+    def cb(blocks, length):
+        assert len(blocks) == len(l)
+        for i in range(len(blocks)):
+            l[i].write(blocks[i])

     encode_file(inf, cb, k, m, chunksize=4096)

@@ -70,13 +70,13 @@ def decode_from_files(outf, filesize, prefix, k, m):
     """
     import os
     infs = []
-    shareids = []
+    sharenums = []
     listd = os.listdir(".")
     random.shuffle(listd)
     for f in listd:
         if f.startswith(prefix):
             infs.append(open(f, "rb"))
-            shareids.append(int(f[len(prefix):]))
+            sharenums.append(int(f[len(prefix):]))
             if len(infs) == k:
                 break

@@ -84,31 +84,31 @@ def decode_from_files(outf, filesize, prefix, k, m):
     dec = fec.Decoder(k, m)
     while True:
         x = [ inf.read(CHUNKSIZE) for inf in infs ]
-        decshares = dec.decode(x, shareids)
-        for decshare in decshares:
-            if len(decshare) == 0:
+        decblocks = dec.decode(x, sharenums)
+        for decblock in decblocks:
+            if len(decblock) == 0:
                 raise "error -- probably share was too short -- was it stored in a file which got truncated? chunksizes: %s" % ([len(chunk) for chunk in x],)
-            if filesize >= len(decshare):
-                outf.write(decshare)
-                filesize -= len(decshare)
-                # print "filesize is now %s after subtracting %s" % (filesize, len(decshare),)
+            if filesize >= len(decblock):
+                outf.write(decblock)
+                filesize -= len(decblock)
+                # print "filesize is now %s after subtracting %s" % (filesize, len(decblock),)
             else:
-                outf.write(decshare[:filesize])
+                outf.write(decblock[:filesize])
                 return

 def encode_file(inf, cb, k, m, chunksize=4096):
     """
     Read in the contents of inf, encode, and call cb with the results.

-    First, k "input shares" will be read from inf, each input share being of
-    size chunksize. Then these k shares will be encoded into m "result
-    shares". Then cb will be invoked, passing a list of the m result shares
+    First, k "input blocks" will be read from inf, each input block being of
+    size chunksize. Then these k blocks will be encoded into m "result
+    blocks". Then cb will be invoked, passing a list of the m result blocks
     as its first argument, and the length of the encoded data as its second
     argument. (The length of the encoded data is always equal to k*chunksize,
     until the last iteration, when the end of the file has been reached and
     less than k*chunksize bytes could be read from the file.) This procedure
     is iterated until the end of the file is reached, in which case the space
-    of the input shares that is unused is filled with zeroes before encoding.
+    of the input blocks that is unused is filled with zeroes before encoding.

     Note that the sequence passed in calls to cb() contains mutable array
     objects in its first k elements whose contents will be overwritten when
@@ -123,7 +123,7 @@ def encode_file(inf, cb, k, m, chunksize=4096):
     @param k the number of shares required to reconstruct the file
     @param m the total number of shares created
     @param chunksize how much data to read from inf for each of the k input
-        shares
+        blocks
     """
     enc = fec.Encoder(k, m)
     l = tuple([ array.array('c') for i in range(k) ])
@@ -158,14 +158,14 @@ def encode_file_stringy(inf, cb, k, m, chunksize=4096):
     """
     Read in the contents of inf, encode, and call cb with the results.

-    First, k "input shares" will be read from inf, each input share being of
-    size chunksize. Then these k shares will be encoded into m "result
-    shares". Then cb will be invoked, passing a list of the m result shares
+    First, k "input blocks" will be read from inf, each input block being of
+    size chunksize. Then these k blocks will be encoded into m "result
+    blocks". Then cb will be invoked, passing a list of the m result blocks
     as its first argument, and the length of the encoded data as its second
     argument. (The length of the encoded data is always equal to k*chunksize,
     until the last iteration, when the end of the file has been reached and
     less than k*chunksize bytes could be read from the file.) This procedure
-    is iterated until the end of the file is reached, in which case the space
+    is iterated until the end of the file is reached, in which case the part
     of the input shares that is unused is filled with zeroes before encoding.

     @param inf the file object from which to read the data
@@ -173,7 +173,7 @@ def encode_file_stringy(inf, cb, k, m, chunksize=4096):
     @param k the number of shares required to reconstruct the file
     @param m the total number of shares created
     @param chunksize how much data to read from inf for each of the k input
-        shares
+        blocks
     """
     enc = fec.Encoder(k, m)
     indatasize = k*chunksize # will be reset to shorter upon EOF
@@ -209,7 +209,7 @@ def encode_file_not_really(inf, cb, k, m, chunksize=4096):
     @param k the number of shares required to reconstruct the file
     @param m the total number of shares created
     @param chunksize how much data to read from inf for each of the k input
-        shares
+        blocks
     """
     enc = fec.Encoder(k, m)
     l = tuple([ array.array('c') for i in range(k) ])
@@ -244,8 +244,8 @@ def encode_file_stringy_easyfec(inf, cb, k, m, chunksize=4096):
     Read in the contents of inf, encode, and call cb with the results.

     First, chunksize*k bytes will be read from inf, then encoded into m
-    "result shares". Then cb will be invoked, passing a list of the m result
-    shares as its first argument, and the length of the encoded data as its
+    "result blocks". Then cb will be invoked, passing a list of the m result
+    blocks as its first argument, and the length of the encoded data as its
     second argument. (The length of the encoded data is always equal to
     k*chunksize, until the last iteration, when the end of the file has been
     reached and less than k*chunksize bytes could be read from the file.)
@@ -258,7 +258,7 @@ def encode_file_stringy_easyfec(inf, cb, k, m, chunksize=4096):
     @param k the number of shares required to reconstruct the file
     @param m the total number of shares created
     @param chunksize how much data to read from inf for each of the k input
-        shares
+        blocks
     """
     enc = easyfec.Encoder(k, m)

diff --git a/pyfec/fec/test/test_pyfec.py b/pyfec/fec/test/test_pyfec.py
index f957891..e4a1a24 100755
--- a/pyfec/fec/test/test_pyfec.py
+++ b/pyfec/fec/test/test_pyfec.py
@@ -45,17 +45,17 @@ def _h(k, m, ss):
     # sys.stdout.write("k: %s, m: %s, len(ss): %r, len(ss[0]): %r" % (k, m, len(ss), len(ss[0]),)) ; sys.stdout.flush()
     encer = fec.Encoder(k, m)
     # sys.stdout.write("constructed.\n") ; sys.stdout.flush()
-    nums_and_shares = list(enumerate(encer.encode(ss)))
+    nums_and_blocks = list(enumerate(encer.encode(ss)))
     # sys.stdout.write("encoded.\n") ; sys.stdout.flush()
-    assert isinstance(nums_and_shares, list), nums_and_shares
-    assert len(nums_and_shares) == m, (len(nums_and_shares), m,)
-    nums_and_shares = random.sample(nums_and_shares, k)
-    shares = [ x[1] for x in nums_and_shares ]
-    nums = [ x[0] for x in nums_and_shares ]
+    assert isinstance(nums_and_blocks, list), nums_and_blocks
+    assert len(nums_and_blocks) == m, (len(nums_and_blocks), m,)
+    nums_and_blocks = random.sample(nums_and_blocks, k)
+    blocks = [ x[1] for x in nums_and_blocks ]
+    nums = [ x[0] for x in nums_and_blocks ]
     # sys.stdout.write("about to construct Decoder.\n") ; sys.stdout.flush()
     decer = fec.Decoder(k, m)
     # sys.stdout.write("about to decode from %s.\n"%nums) ; sys.stdout.flush()
-    decoded = decer.decode(shares, nums)
+    decoded = decer.decode(blocks, nums)
     # sys.stdout.write("decoded.\n") ; sys.stdout.flush()
     assert len(decoded) == len(ss), (len(decoded), len(ss),)
     assert tuple([str(s) for s in decoded]) == tuple([str(s) for s in ss]), (tuple([ab(str(s)) for s in decoded]), tuple([ab(str(s)) for s in ss]),)
@@ -101,7 +101,7 @@ def test_random():
 def test_bad_args_enc():
     encer = fec.Encoder(2, 4)
     try:
-        encer.encode(["a", "b", ], ["c", "I am not an integer shareid",])
+        encer.encode(["a", "b", ], ["c", "I am not an integer blocknum",])
     except fec.Error, e:
         assert "Precondition violation: second argument is required to contain int" in str(e), e
     else: