Welcome to the AllMyData "tahoe" project. This project implements a
-secure, distributed, fault-tolerant storage mesh.
+secure, distributed, fault-tolerant storage grid.
-The basic idea is that the data in this storage mesh is spread over all
+The basic idea is that the data in this storage grid is spread over all
participating nodes, using an algorithm that can recover the data even if a
majority of the nodes are no longer available.
-The interface to the storage mesh allows you to store and fetch files, either
+The interface to the storage grid allows you to store and fetch files, either
by self-authenticating cryptographic identifier or by filename and path.
directory, inside of which you can add files to configure and control the
node. Nodes also read and write files within that directory.
- A mesh consists of a single central 'introducer and vdrive' node and a large
- number of 'client' nodes. If you are joining an existing mesh, the
+ A grid consists of a single central 'introducer and vdrive' node and a large
+ number of 'client' nodes. If you are joining an existing grid, the
introducer-and-vdrive node will already be running, and you'll just need to
- create a client node. If you're creating a brand new mesh, you'll need to
+ create a client node. If you're creating a brand new grid, you'll need to
create both an introducer-and-vdrive and a client (and then invite other
- people to create their own client nodes and join your mesh).
+ people to create their own client nodes and join your grid).
The introducer (-and-vdrive) node is constructed by running 'allmydata-tahoe
create-introducer --basedir $HERE'. Once constructed, you can start the
from the introducer into this new directory, then run 'allmydata-tahoe start
--basedir $HERE'. After that, the client node should be off and running. The
first thing it will do is connect to the introducer and introduce itself to
- all other nodes on the mesh. You can follow its progress by looking at the
+ all other nodes on the grid. You can follow its progress by looking at the
$HERE/twistd.log file.
To actually use the client, enable the web interface by writing a port
To stop it again, use 'make stop-client'. Similar makefile targets exist for
making and running an introducer node.
- There is a public mesh available for testing. Look at the wiki page
+ There is a public grid available for testing. Look at the wiki page
(http://allmydata.org) for the necessary .furl data.
OVERVIEW
-The high-level view of this system consists of three layers: the mesh, the
+The high-level view of this system consists of three layers: the grid, the
virtual drive, and the application that sits on top.
-The lowest layer is the "mesh", basically a DHT (Distributed Hash Table)
+The lowest layer is the "grid", basically a DHT (Distributed Hash Table)
which maps URIs to data. The URIs are relatively short ascii strings
(currently about 140 bytes), and each is used as a reference to an immutable
arbitrary-length sequence of data bytes. This data is distributed around the
-mesh in a large number of nodes, such that a statistically unlikely number
+grid in a large number of nodes, such that a statistically unlikely number
of nodes would have to be unavailable for the data to become unavailable.
The middle layer is the virtual drive: a tree-shaped data structure in which
to get an idea of how much of this has been implemented so far.
-THE BIG MESH OF PEERS
+THE BIG GRID OF PEERS
-Underlying the mesh is a large collection of peer nodes. These are processes
+Underlying the grid is a large collection of peer nodes. These are processes
running on a wide variety of computers, all of which know about each other in
some way or another. They establish TCP connections to one another using
Foolscap, an encrypted+authenticated remote message passing library (using
lease expires, the data is deleted. Peers might renew their leases.
This storage is used to hold "shares", which are themselves used to store
-files in the mesh. There are many shares for each file, typically around 100
+files in the grid. There are many shares for each file, typically around 100
(the exact number depends upon the tradeoffs made between reliability,
overhead, and storage space consumed). The files are indexed by a piece of
the URI called the "verifierid", which is derived from the contents of the
peer connects to this central introducer at startup, and receives a list of
all other peers from it. Each peer then connects to all other peers, creating
a fully-connected topology. Future versions will reduce the number of
-connections considerably, to enable the mesh to scale to larger sizes: the
+connections considerably, to enable the grid to scale to larger sizes: the
design target is one million nodes. In addition, future versions will offer
relay and NAT-traversal services to allow nodes without full internet
-connectivity to participate. In the current release, only one node may be
-behind a NAT box and still permit the mesh to achieve full connectivity.
+connectivity to participate. In the current release, nodes behind NAT boxes
+will connect to all nodes that they can open connections to, but they cannot
+open connections to other nodes behind NAT boxes. Therefore, the more nodes
+there are behind NAT boxes, the less the topology resembles the intended
+fully-connected mesh topology.
FILE ENCODING
-When a file is to be added to the mesh, it is first encrypted using a key
+When a file is to be added to the grid, it is first encrypted using a key
that is derived from the hash of the file itself. The encrypted file is then
broken up into segments so it can be processed in small pieces (to minimize
the memory footprint of both encode and decode operations, and to increase
looking for.
URIs refer to an immutable set of bytes. If you modify a file and upload the
-new version to the mesh, you will get a different URI. URIs do not represent
+new version to the grid, you will get a different URI. URIs do not represent
filenames at all, just the data that a filename might point to at some given
-point in time. This is why the "mesh" layer is insufficient to provide a
+point in time. This is why the "grid" layer is insufficient to provide a
virtual drive: an actual filesystem requires human-meaningful names and
mutability, while URIs provide neither. URIs sit on the "global+secure" edge
of Zooko's Triangle[1]. They are self-authenticating, meaning that nobody can
In the current version, the verifierid is used to consistently-permute the
set of all peers (by sorting the peers by HASH(verifierid+peerid)). Each file
gets a different permutation, which (on average) will evenly distribute
-shares among the mesh and avoid hotspots.
+shares among the grid and avoid hotspots.
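The consistent permutation described above can be sketched as follows. This is
only an illustration under stated assumptions: SHA-256 stands in for HASH (the
text does not name the hash function), and permute_peers, the peer ids, and
the verifierid values are hypothetical names, not Tahoe's actual API.

```python
import hashlib

def permute_peers(verifierid, peerids):
    """Return peerids sorted by HASH(verifierid + peerid).

    Assumption: SHA-256 stands in for the unspecified HASH function.
    Both verifierid and each peerid are byte strings.
    """
    return sorted(
        peerids,
        key=lambda peerid: hashlib.sha256(verifierid + peerid).digest(),
    )

# Hypothetical peer ids and verifierids, for illustration only.
peers = [b"peer-a", b"peer-b", b"peer-c", b"peer-d"]
order_one = permute_peers(b"verifierid-1", peers)
order_two = permute_peers(b"verifierid-2", peers)
```

The ordering depends only on the verifierid and the set of peer ids, so an
uploader and a later downloader that see the same peers compute the same
permutation without coordinating. Different verifierids will generally
produce different orderings, which is what spreads the load.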
This permutation places the peers around a 2^256-sized ring, like the rim of
a big clock. The 100-or-so shares are then placed around the same ring (at 0,
of which 25 must be retrievable to recover the file, and the peer selection
algorithm is happy if it was able to place at least 75 shares. These numbers
are adjustable: 25-out-of-100 means an expansion factor of 4x (every file in
-the mesh consumes four times as much space when totalled across all
+the grid consumes four times as much space when totalled across all
StorageServers), but is highly reliable (the actual reliability is a binomial
distribution function of the expected availability of the individual peers,
but in general it goes up very quickly with the expansion factor).
them quickly if there is significant overlap between the set of peers that
were present when the file was uploaded and the set of peers that are present
as it is downloaded (i.e. if the "peerlist stability" is high). Some limits
-may be imposed in large meshes to avoid querying a million peers; this
+may be imposed in large grids to avoid querying a million peers; this
provides a tradeoff between the work spent to discover that a file is
unrecoverable and the probability that a retrieval will fail when it could
have succeeded if we had just tried a little bit harder. The appropriate
-value of this tradeoff will depend upon the size of the mesh, and will change
+value of this tradeoff will depend upon the size of the grid, and will change
over time.
Other peer selection algorithms are being evaluated. One of them (known as
There are a number of engineering issues to be resolved here. The bandwidth,
disk IO, and CPU time consumed by the verification/repair process must be
-balanced against the robustness that it provides to the mesh. The nodes
+balanced against the robustness that it provides to the grid. The nodes
involved in repair will have very different access patterns than normal
nodes, such that these processes may need to be run on hosts with more memory
or network connectivity than usual. The frequency of repair runs directly
URI. Data security (the promise that the data is only readable by people with
the URI) is provided by the encryption key embedded in the URI. Data
availability (the hope that data which has been uploaded in the past will be
-downloadable in the future) is provided by the mesh, which distributes
+downloadable in the future) is provided by the grid, which distributes
failures in a way that reduces the correlation between individual node
failure and overall file recovery failure.
many tradeoffs.
First, some terms: the erasure-coding algorithm is described as K-out-of-N
-(for this release, the default values are K=25 and N=100). Each mesh will
+(for this release, the default values are K=25 and N=100). Each grid will
have some number of peers; this number will rise and fall over time as peers
join, drop out, come back, and leave forever. Files are of various sizes,
some are popular, others are rare. Peers have various capacities, variable
The ratio of N/K is the "expansion factor". Higher expansion factors improve
reliability very quickly (the binomial distribution curve is very sharp), but
-consumes much more mesh capacity. The absolute value of K affects the
+consume much more grid capacity. The absolute value of K affects the
granularity of the binomial curve (1-out-of-2 is much worse than
50-out-of-100), but high values asymptotically approach a constant that
depends upon 'P' (i.e. 500-of-1000 is not much better than 50-of-100).
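To get a feel for these tradeoffs, the sketch below computes the probability
that a file is unrecoverable under a simple binomial model. It assumes that
each of the N shares sits on a different peer, that peers fail independently,
and that 'P' is the probability that any one peer is available (this reading
of 'P' is an assumption here). failure_probability and expansion_factor are
hypothetical helpers, not part of Tahoe.

```python
from fractions import Fraction
from math import comb

def failure_probability(k, n, p):
    """Probability that fewer than k of n shares are retrievable.

    Assumes one share per peer and independent peer availability p.
    Exact rational arithmetic avoids floating-point underflow.
    """
    p = Fraction(p)
    q = 1 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(k))

def expansion_factor(k, n):
    """Ratio N/K: total space consumed per byte of original data."""
    return Fraction(n, k)

P = Fraction(9, 10)  # assumed per-peer availability, for illustration
for k, n in [(1, 2), (25, 100), (50, 100)]:
    print("%d-of-%d: expansion %sx, failure probability %.3e"
          % (k, n, expansion_factor(k, n),
             float(failure_probability(k, n, P))))
```

Under these assumptions, 1-of-2 fails with probability (1-P)^2 = 0.01, while
50-of-100, at the same 2x expansion factor, fails far less often. Raising the
expansion factor to 4x (25-of-100) lowers the failure probability further.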
attacker convinces you that they are actually multiple servers, so that you
think you are using a large number of independent peers, but in fact you have
a single point of failure (where the attacker turns off all their machines at
-once). Large meshes, with lots of truly-independent peers, will enable the
+once). Large grids, with lots of truly-independent peers, will enable the
use of lower expansion factors to achieve the same reliability, but increase
overhead because each peer needs to know something about every other, and the
rate at which peers come and go will be higher (requiring network maintenance
-traffic). Also, the File Repairer work will increase with larger meshes,
+traffic). Also, the File Repairer work will increase with larger grids,
although then the job can be distributed out to more peers.
Higher values of N increase overhead: more shares means more Merkle hashes
to the target sooner), but also increase overhead (because more blocks means
more Merkle hashes to validate them).
-In general, small private meshes should work well, but the participants will
-have to decide between storage overhead and reliability. Large stable meshes
+In general, small private grids should work well, but the participants will
+have to decide between storage overhead and reliability. Large stable grids
will be able to reduce the expansion factor down to a bare minimum while
-still retaining high reliability, but large unstable meshes (where nodes are
+still retaining high reliability, but large unstable grids (where nodes are
coming and going very quickly) may require more repair/verification bandwidth
than actual upload/download traffic.
Allmydata, Inc. [1], provider of the "Allmydata" consumer backup product, is
pleased to announce the first public release of "Tahoe", a secure, distributed
-storage mesh with a free-software licence.
+storage grid with a free-software licence.
The source code that we are releasing is the current working prototype for
Allmydata's next-generation product. This release is targeted at hackers who
This prototype is not recommended for storage of confidential data nor for
data which is not otherwise backed up, but it already implements a functional
-distributed storage mesh and is useful for experimentation, prototyping, and
+distributed storage grid and is useful for experimentation, prototyping, and
extension.
Once installed, create a "client node" as described in the README. Instruct
this client node to connect to a specific "introducer node" by means of
-config files in the client node's working directory. To join a public mesh,
-copy in the .furl files for that mesh. To create a private mesh, run your
+config files in the client node's working directory. To join a public grid,
+copy in the .furl files for that grid. To create a private grid, run your
own introducer, and copy its .furl files.
Each client node runs a local webserver (enabled by writing the desired port
We are well aware of the limitations of decentralization and scalability
inherent in this prototype. In particular, the completely-connected property
-of the mesh and the requirement of a single distinct introducer and vdrive
-server limits the possible size of the mesh. We have plans to loosen these
+of the grid and the requirement of a single distinct introducer and vdrive
+server limit the possible size of the grid. We have plans to loosen these
limitations (see roadmap.txt [5]). Currently it should be noted that the
-mesh already depends as little as possible on the accessibility and
+grid already depends as little as possible on the accessibility and
correctness of the introduction server and the vdrive server. Also note that
the choice of which servers to use is easily configured -- you should be able
-to set up a private mesh for you and your friends almost as easily as to
-connect to our public test mesh.
+to set up a private grid for you and your friends almost as easily as to
+connect to our public test grid.
SOFTWARE ARCHITECTURE
It uses the Foolscap library [10] which provides a remote object protocol
inspired by the capability-secure "E" programming language [11]. Foolscap
-allows us to express the intended behavior of the distributed mesh directly
+allows us to express the intended behavior of the distributed grid directly
in object-oriented terms while relying on a well-engineered, secure transport
layer.
#! /usr/bin/env python
-# Allmydata Tahoe -- secure, distributed storage mesh
+# Allmydata Tahoe -- secure, distributed storage grid
#
# Copyright (C) 2007 Allmydata, Inc.
#
setup(name='allmydata-tahoe',
version='0.1.4b2',
- description='secure, distributed storage mesh',
+ description='secure, distributed storage grid',
long_description="""Welcome to the AllMyData "tahoe" project. This project implements a
-secure, distributed, fault-tolerant storage mesh.
+secure, distributed, fault-tolerant storage grid.
-The basic idea is that the data in this storage mesh is spread over all
+The basic idea is that the data in this storage grid is spread over all
participating nodes, using an algorithm that can recover the data even if a
majority of the nodes are no longer available.""",
author='Allmydata, Inc.',
"""
-Decentralized storage mesh.
+Decentralized storage grid.
maintainer web site: U{http://allmydata.com/}
The goal of the encoder is to turn the original file into a series of
'shares'. Each share is going to a 'shareholder' (nominally each shareholder
-is a different host, but for small meshes there may be overlap). The number
+is a different host, but for small grids there may be overlap). The number
of shares is chosen to hit our reliability goals (more shares on more
machines means more reliability), and is limited by overhead (proportional to
numshares or log(numshares)) and the encoding technology in use (Reed-Solomon
# self.populate_from_node must be defined by the subclass (CHK or
# SSK), since it controls how the spec is interpreted. It will
# probably use the contents of the node to figure out what to
- # download from the mesh, then pass this downloaded serialized data
+ # download from the grid, then pass this downloaded serialized data
# to populate_from_data()
raise NotImplementedError
Each subtree's populate_from_node() method is expected to use the
downloader to obtain a file with the subtree's serialized contents
- (probably by pulling data from some source, like the mesh, the vdrive
+ (probably by pulling data from some source, like the grid, the vdrive
server, an HTTP server, or somewhere on the local filesystem), then
unserialize them and populate the subtree's state.
def serialize_subtree_to_file(f):
"""Create a string which describes my structure and write it to the
given filehandle (using only .write()). This string should be
- suitable for uploading to the mesh or storing in a local file."""
+ suitable for uploading to the grid or storing in a local file."""
def update_now(uploader):
"""Perform whatever work is necessary to record this subtree to
the subtree has been persisted.
For directory subtrees, this will cause the subtree to serialize
- itself to a file, then upload this file to the mesh, then create an
+ itself to a file, then upload this file to the grid, then create an
INode-providing instance which describes where the file wound up. For
redirections, this will cause the subtree to modify the redirection's
persistent storage, then return the (unmodified) INode that describes
generated uri."""
def add_upload_chk(source_filename, stash_uri_in_boxname):
- """This step uploads a file to the mesh and obtains a content-based
+ """This step uploads a file to the grid and obtains a content-based
URI which can be used to later retrieve the same contents ('CHK'
mode). This URI includes unlink rights. It does not mark the file for
retention.
"""
def add_upload_ssk(write_capability, previous_version, source_filename):
- """This step uploads a file to the mesh in a way that replaces the
+ """This step uploads a file to the grid in a way that replaces the
previous version and does not require a change to the ID referenced
by the parent.
"""
class RIControlClient(RemoteInterface):
def upload_from_file_to_uri(filename=str):
- """Upload a file to the mesh. This accepts a filename (which must be
+ """Upload a file to the grid. This accepts a filename (which must be
absolute) that points to a file on the node's local disk. The node
- will read the contents of this file, upload it to the mesh, then
+ will read the contents of this file, upload it to the grid, then
return the URI at which it was uploaded.
"""
return URI
def download_from_uri_to_file(uri=URI, filename=str):
- """Download a file from the mesh, placing it on the node's local disk
+ """Download a file from the grid, placing it on the node's local disk
at the given filename (which must be absolute[?]). Returns the
absolute filename where the file was written."""
return str
from allmydata import workqueue
from cStringIO import StringIO
-class FakeMesh(object):
+class FakeGrid(object):
implements(IDownloader, IUploader)
"""
pairs = list(directory.in_pairs(l))
self.failUnlessEqual(pairs, [(0,1), (2,3), (4,5), (6,7)])
-class FakeMesh(object):
+class FakeGrid(object):
implements(IDownloader, IUploader)
debug = False
def upload(self, uploadable):
uri = "stub-uri-%d" % len(self.files)
if self.debug:
- print "FakeMesh.upload -> %s" % uri
+ print "FakeGrid.upload -> %s" % uri
assert upload.IUploadable.providedBy(uploadable)
f = uploadable.get_filehandle()
data = f.read()
def upload_filename(self, filename):
if self.debug:
- print "FakeMesh.upload_filename(%s)" % filename
+ print "FakeGrid.upload_filename(%s)" % filename
return self.upload(upload.FileName(filename))
def upload_data(self, data):
if self.debug:
- print "FakeMesh.upload_data(%s)" % data
+ print "FakeGrid.upload_data(%s)" % data
return self.upload(upload.Data(data))
def download(self, uri, target):
if self.debug:
- print "FakeMesh.download(%s)" % uri
+ print "FakeGrid.download(%s)" % uri
target.open()
target.write(self.files[uri])
target.close()
class VDrive(unittest.TestCase):
- def makeVirtualDrive(self, basedir, root_node=None, mesh=None):
+ def makeVirtualDrive(self, basedir, root_node=None, grid=None):
wq = workqueue.WorkQueue(os.path.join("test_filetree",
"VDrive",
basedir, "1.workqueue"))
- if mesh:
- assert IUploader.providedBy(mesh)
- assert IDownloader.providedBy(mesh)
- dl = ul = mesh
+ if grid:
+ assert IUploader.providedBy(grid)
+ assert IDownloader.providedBy(grid)
+ dl = ul = grid
else:
- dl = ul = FakeMesh()
+ dl = ul = FakeGrid()
if not root_node:
root_node = directory.LocalFileSubTreeNode()
root_node.new("rootdirtree.save")
def makeCHKTree(self, basename):
# create a LocalFileRedirection pointing at a CHKDirectorySubTree.
# Returns a VirtualDrive instance.
- mesh = FakeMesh()
+ grid = FakeGrid()
topdir = directory.CHKDirectorySubTree().new()
- d = topdir.update_now(mesh)
+ d = topdir.update_now(grid)
def _updated(topnode):
root = redirect.LocalFileRedirection()
root.new("%s-root" % basename, topnode)
- return root.update_now(mesh)
+ return root.update_now(grid)
d.addCallback(_updated)
d.addCallback(lambda rootnode:
self.makeVirtualDrive("%s-vdrive" % basename,
- rootnode, mesh))
+ rootnode, grid))
return d
def failUnlessListsAreEqual(self, list1, list2):
self.failUnlessEqual(c1a, c2a)
def testDirectory(self):
- stm = vdrive.SubTreeMaker(FakeMesh())
+ stm = vdrive.SubTreeMaker(FakeGrid())
# create an empty directory (stored locally)
subtree = directory.LocalFileSubTree()
<p>To view the global shared filestore, <a href="../vdrive">Click Here!</a></p>
-<h2>Mesh Status</h2>
+<h2>Grid Status</h2>
<div>My nodeid: <span n:render="string" n:data="my_nodeid" /></div>
<div>Introducer: <span n:render="string" n:data="introducer_pburl" /></div>