The lowest layer is the "grid", basically a DHT (Distributed Hash Table)
which maps URIs to data. The URIs are relatively short ASCII strings
(currently about 140 bytes), and each is used as a reference to an immutable
arbitrary-length sequence of data bytes. This data is encrypted and
distributed around the grid across a large number of nodes, such that a
statistically unlikely number of nodes would have to be unavailable for the
data to become unavailable.
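
To make "statistically unlikely" concrete: zfec provides a k-of-N erasure
code, so a file remains retrievable as long as at least k of its N shares can
be fetched. The sketch below (with made-up values for k, N, and per-server
availability) computes the chance of that when each server is independently
available; it is an illustration, not the project's actual reliability model.

  # Illustrative only: probability that a file survives when each server
  # holding one of its N shares is independently up with probability p, and
  # any k shares suffice to reconstruct the file. The 25/100/0.90 values are
  # made up, not project defaults.
  from math import comb

  def availability(k: int, n: int, p: float) -> float:
      """Probability that at least k of n independent shares are retrievable."""
      return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

  print("%.12f" % availability(25, 100, 0.90))

Even with servers that are each up only 90% of the time, a wide spread of
shares drives the probability of losing a file extremely low; this is the
reliability/overhead tradeoff discussed below.
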
The middle layer is the virtual drive: a tree-shaped data structure in which
the intermediate nodes are directories and the leaf nodes are files.
Applications can run on top of the virtual drive, of course -- anything that
has a use for a secure, robust, distributed filestore.
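
As a purely illustrative sketch of that shape (these are not the actual
dirnode classes; the names are invented), a directory maps child names to
either further directories or to file nodes that carry the grid URI of their
contents:

  # Toy model of the virtual drive layer: directories are interior nodes,
  # files are leaves holding the grid URI of their contents.
  from dataclasses import dataclass, field
  from typing import Dict, Union

  @dataclass
  class FileNode:
      uri: str   # grid URI referencing the file's immutable contents

  @dataclass
  class DirectoryNode:
      children: Dict[str, Union["DirectoryNode", "FileNode"]] = field(default_factory=dict)

  root = DirectoryNode()
  root.children["photos"] = DirectoryNode()
  root.children["photos"].children["cat.jpg"] = FileNode("URI:example")
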
Note: some of the text below describes design targets rather than actual code
present in the current release. Please take a look at roadmap.txt to get an
idea of how much of this has been implemented so far.
THE BIG GRID OF PEERS
Peers provide storage in the form of "leases", and a peer will refuse a lease
that would cause it to consume more space than it wants to provide. When a
lease expires, the data is deleted. Peers may renew their leases.
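
A minimal sketch of that lease lifecycle, with invented names and an invented
31-day duration (the real server's policy is not shown here): renewal pushes
the expiration forward, and a share whose leases have all expired becomes
eligible for deletion.

  # Illustrative lease bookkeeping; field names and the 31-day duration are
  # assumptions for the example, not the project's actual policy.
  import time

  LEASE_DURATION = 31 * 24 * 3600  # seconds

  class Lease:
      def __init__(self, owner: str):
          self.owner = owner
          self.expiration = time.time() + LEASE_DURATION

      def renew(self) -> None:
          self.expiration = time.time() + LEASE_DURATION

      def expired(self) -> bool:
          return time.time() > self.expiration
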
This storage is used to hold "shares", which are encoded pieces of files in
the grid. There are many shares for each file, typically between 10 and 100
(the exact number depends upon the tradeoffs made between reliability,
overhead, and storage space consumed). The files are indexed by a
"StorageIndex", which is derived from the encryption key, which may be
randomly generated or it may be derived from the contents of the file. Leases
are indexed by StorageIndex, and a single StorageServer may hold multiple
shares for the corresponding file. Multiple peers can hold leases on the same
file, in which case the shares will not be deleted until the last lease
expires.
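
As a toy picture of that indexing (the container layout and names are
invented, not taken from storageserver.py), both shares and leases hang off
the StorageIndex, and one server may hold several share numbers for the same
file:

  # storage_index -> {share_number: share_data}
  shares = {}
  # storage_index -> {lease_owner: expiration_time}
  leases = {}

  def store_share(storage_index, sharenum, data, owner, expiration):
      shares.setdefault(storage_index, {})[sharenum] = data
      leases.setdefault(storage_index, {})[owner] = expiration
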
A tagged hash of the encryption key is used to form the "storage index",
which is used for both server selection (described below) and to index shares
within the StorageServers on the selected peers.
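
A "tagged" hash simply mixes a fixed, purpose-specific tag string into the
hash so the result cannot collide with hashes computed for other purposes. A
minimal sketch, with an invented tag and with SHA-256 standing in for
whichever hash function the project actually uses:

  # Sketch of deriving a storage index as a tagged hash of the encryption
  # key. The tag string, hash function, and output length are assumptions.
  import hashlib

  def storage_index(encryption_key: bytes) -> bytes:
      tag = b"example-storage-index-tag:"   # hypothetical tag
      return hashlib.sha256(tag + encryption_key).digest()[:16]
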
A variety of hashes are computed while the shares are being produced, to
allow the downloader to validate the data it retrieves. The file's URI is
then recorded in a directory node as part of the virtual drive structure.
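
As a bare-bones illustration of that kind of check (SHA-256 and the names are
assumptions for the example), a downloader can recompute the hash of a
retrieved share and compare it against the expected value it already holds:

  # Sketch of hash-based validation of a retrieved share; SHA-256 and the
  # function name are assumptions for the example.
  import hashlib

  def validate_share(share_data: bytes, expected_hash: bytes) -> bool:
      return hashlib.sha256(share_data).digest() == expected_hash
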
SERVER SELECTION
When a file is uploaded, the encoded shares are sent to other peers. But to
which ones? The "server selection" algorithm is used to make this choice.
In the current version, the verifierid is used to consistently permute the
set of all peers (by sorting the peers by HASH(verifierid+peerid)). Each file
thus gets a different, but deterministic, ordering of the peers.
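
A sketch of that permutation (SHA-256 stands in for the unspecified HASH, and
the identifiers are plain byte strings for the example). Because the sort key
mixes in the verifierid, every node computes the same ordering for a given
file, and different files see the peers in different orders:

  # Consistent permutation for server selection: sort peers by
  # HASH(verifierid + peerid). SHA-256 is an assumption standing in for the
  # hash the project actually uses.
  import hashlib

  def permute_peers(verifierid: bytes, peerids):
      return sorted(peerids,
                    key=lambda peerid: hashlib.sha256(verifierid + peerid).digest())

  peers = [b"peer-a", b"peer-b", b"peer-c"]
  print(permute_peers(b"file-1-verifierid", peers))
  print(permute_peers(b"file-2-verifierid", peers))
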
A brief map to where the code lives in this distribution:
- src/zfec: the erasure-coding library, turns data into shares and back again.
- When installed, this provides the 'zfec' package.
 src/allmydata: the code for this project. When installed, this provides the
                'allmydata' package. This includes a few pieces copied from
                the PyCrypto package, in allmydata/Crypto/* .
Within src/allmydata/ :
storageserver.py: provides storage services to other nodes
 codec.py: low-level erasure coding, wraps the zfec library
encode.py: handles turning data into shares and blocks, computes hash trees
 upload.py: upload server selection, reading data from upload sources
 download.py: download server selection, share retrieval, decoding
dirnode.py: implements the directory nodes. One part runs on the
global vdrive server, the other runs inside a client