<body>
<h1>Welcome to Tahoe</h1>
- <p>Welcome to the Tahoe project, a secure, decentralized, fault-tolerant filesystem. All of the source code is available under a Free Software, Open Source licence.</p>
+ <p>Welcome to allmydata.org Tahoe: a secure, decentralized, fault-tolerant filesystem. All of the source code is available under a Free Software, Open Source licence.</p>
<p>This filesystem is encrypted and spread over multiple peers in such a way that it remains available even when some of the peers are unavailable, malfunctioning, or malicious.</p>
<p>See the web site for information, news, and discussion:</p>
<p><a href="http://allmydata.org">http://allmydata.org</a></p>
<h2>Overview</h2>
<p>A "storage grid" is made up of a number of storage servers. A storage server has local attached storage (typically one or more SATA hard disks). A "gateway" uses the storage servers and provides to its clients a filesystem over a standard protocol such as HTTP(S), FUSE, or SMB.</p>
<p>Users do not rely on storage servers to provide <i>confidentiality</i> or <i>integrity</i> for the data -- instead all of the data is encrypted and integrity-checked by the gateway, so that the servers can neither read nor alter the contents of the files.</p>
- <p>Users do rely on storage servers for <i>availability</i>. The ciphertext is erasure-coded and distributed across <cite>N</cite> storage servers (the default value for <cite>N</cite> is 12) so that it can be recovered from any <cite>K</cite> of these servers (the default value of <cite>K</cite> is 3). Therefore only the simultaneous failure of <cite>N-K+1</cite> (with the defaults, 10) servers can make the data unavailable. Phrasing this in terms of <i>reliance</i>, we say that the users <i>rely on</i> the gateway for the confidentiality and integrity of the data, and on any 3 of the 10 servers for the availability of the data.</p>
- <p>In the typical deployment mode each user runs her own gateway on her own machine. This way she need rely only on her own machine for the confidentiality and integrity of the data, and she can take advantage of filesystem integration using FUSE or SMB.</p>
+ <p>Users do rely on storage servers for <i>availability</i>. The ciphertext is erasure-coded and distributed across <cite>N</cite> storage servers (the default value for <cite>N</cite> is 10) so that it can be recovered from any <cite>K</cite> of these servers (the default value of <cite>K</cite> is 3). Therefore the data becomes unavailable only if at least <cite>N-K+1</cite> servers (with the defaults, 8) fail simultaneously. Phrasing this in terms of <i>reliance</i>, we say that the users <i>rely on</i> the gateway for the confidentiality and integrity of the data, and on any 3 of the 10 servers for the availability of the data.</p>
+ <p>In the typical deployment mode each user runs her own gateway on her own machine. This way she relies on only her own machine for the confidentiality and integrity of the data, and she can take advantage of filesystem integration using FUSE or SMB.</p>
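<p>As a rough illustration of the availability arithmetic above, the following Python sketch computes how many simultaneous server failures it takes to make a file unavailable. It assumes exactly one share per server; the function names are hypothetical and are not part of Tahoe.</p>

```python
def min_failures_for_loss(n, k):
    """Smallest number of simultaneous failures that leaves fewer than k shares.

    Assumes each of the n servers holds exactly one share.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return n - k + 1


def is_recoverable(n, k, failed):
    """True if at least k of the n servers are still reachable."""
    return n - failed >= k


N, K = 10, 3  # the defaults quoted in the text
print(min_failures_for_loss(N, K))  # 8
print(is_recoverable(N, K, 7))      # True: 3 servers remain
print(is_recoverable(N, K, 8))      # False: only 2 servers remain
```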
<p>An alternate deployment mode is that the gateway runs on a remote machine and the user connects to it over HTTPS. This means that the operator of the gateway can view and modify the user's data (the user <i>relies on</i> the gateway for confidentiality and integrity), but the user can access the filesystem with a client that doesn't have the gateway software installed, such as an Internet kiosk or cell phone.</p>
<p>A user who has read-write access to a file or directory in this filesystem can give another user read-write access to that file or directory, or read-only access to that file or directory. A user who has read-only access to a file or directory can give another user read-only access to it.</p>
<p>When linking a file or directory into a parent directory, you can use a read-write link or a read-only link. If you use a read-write link, then anyone who has read-write access to the parent directory can gain read-write access to the child, and anyone who has read-only access to the parent directory can gain read-only access to the child. If you use a read-only link, then anyone who has either read-write or read-only access to the parent directory can gain read-only access to the child.</p>
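<p>The linking rules above reduce to a small decision table. The Python sketch below is only an illustration of those rules; the constants and the <cite>child_access</cite> function are hypothetical names, not Tahoe APIs.</p>

```python
READ_WRITE = "read-write"
READ_ONLY = "read-only"


def child_access(parent_access, link_type):
    """Access to a child obtainable through a link in a parent directory.

    Read-write access to the child requires both read-write access to the
    parent and a read-write link; every other combination yields read-only.
    """
    for value in (parent_access, link_type):
        if value not in (READ_WRITE, READ_ONLY):
            raise ValueError("unknown access level: %r" % (value,))
    if parent_access == READ_WRITE and link_type == READ_WRITE:
        return READ_WRITE
    return READ_ONLY


for parent in (READ_WRITE, READ_ONLY):
    for link in (READ_WRITE, READ_ONLY):
        print(parent, "parent +", link, "link ->", child_access(parent, link))
```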
grid</a>), the Introducer will already be running, and you'll need to
create a node.</p>
- <p>To construct a node run <cite>tahoe create-client</cite>, which will
+ <p>To construct an introducer, create a new base directory for it (the name
+ of the directory is up to you), cd into it, and run "<cite>tahoe
+ create-introducer .</cite>". Now start the introducer by running "<cite>tahoe
+ start .</cite>". After it starts, there will be a file named
+ <cite>introducer.furl</cite> in that base directory. This file contains
+ the URL the nodes must use in order to connect to this
+ introducer.</p>
+
+ <p>To construct a node, run "<cite>tahoe create-client</cite>", which will
create <cite>~/.tahoe</cite> to be the node's base directory. Acquire a copy
of the <cite>introducer.furl</cite> from the introducer and put it into this
- directory, then run <cite>tahoe start</cite>. After that, the node should be
+ directory, then run "<cite>tahoe start</cite>". After that, the node should be
off and running. The first thing it will do is connect to the introducer and
get itself connected to all other nodes on the grid.</p>
- <p>To construct an introducer, create a new base directory for it (the name
- of the directory is up to you), cd into it, and run <cite>tahoe
- create-introducer .</cite>. Now start the introducer by running <cite>tahoe
- start .</cite>. After it starts, there will be a file named
- <cite>introducer.furl</cite> in that base directory. This file contains
- the URL which the nodes must use in order to connect to this
- introducer.</p>
+ <p>To stop a running node, run "<cite>tahoe stop</cite>".</p>
<h2>Do Stuff With It</h2>
</head>
<body>
+ <p>This is how to use your Tahoe node. First, you have to run your own local Tahoe node, as described in <a href="running.html">running.html</a>.</p>
+
<h1>The WUI</h1>
- <p>Point your web browser to <a href="http://127.0.0.1:8123">http://127.0.0.1:8123</a> to use the node.</p>
+ <p>Point your web browser to <a href="http://127.0.0.1:8123">http://127.0.0.1:8123</a> -- which is the URL of your own local computer -- to use your newly created node.</p>
<p>Create a new directory (with the button labelled "create a directory"). Your web browser will load the new directory. If you want to be able to come back to this directory later, you have to bookmark it or otherwise save its URL. If you lose the URL to this directory, you can never come back to it.</p>
<h1>The CLI</h1>
- <p>Prefer the command-line? Run <cite>tahoe --help</cite> (the same command-line tool that is used to start and stop nodes serves to navigate and use the decentralized filesystem). To make commands like <cite>tahoe ls</cite> work without the <cite>--dir-cap=</cite> option, you have to put a directory capability (e.g. <cite>http://127.0.0.1:8123/uri/URI:DIR2:yar9nnzsho6czczieeesc65sry:upp1pmypwxits3w9izkszgo1zbdnsyk3nm6h7e19s7os7s6yhh9y</cite>) into <cite>~/.tahoe/private/root_dir.cap</cite>.</p>
+ <p>Prefer the command-line? Run "<cite>tahoe --help</cite>" (the same command-line tool that is used to start and stop nodes serves to navigate and use the decentralized filesystem). To make commands like "<cite>tahoe ls</cite>" work without the <cite>--dir-cap=</cite> option, you have to put a directory capability (e.g. <cite>http://127.0.0.1:8123/uri/URI%3ADIR2%3Ax2pdyrez6nemrby5jzw6lxkxye%3Arl654zxxdppmhgzvaaaxdyf6bhbzbszmmoynm3h7kzuxtlksbynq/</cite>) into <cite>~/.tahoe/private/root_dir.cap</cite>.</p>
<p>As with the WUI (and with all current interfaces to Tahoe), you are responsible for remembering directory capabilities yourself. If you create a new directory and lose the capability to it, then you cannot access that directory ever again.</p>
<p>P.S. "CLI" is pronounced "clee".</p>
<h1>The FUSE Extension</h1>
- <p>You can plug Tahoe into your computer's local filesystem using the FUSE extension, found in the <cite>contrib</cite> directory. Warning: unlike most of Tahoe, and unlike the rest of the user interfaces described on this page, the FUSE plugin doesn't have extensive unit tests that are automatically run on every check-in of the source. Therefore, we don't for sure how complete and reliable it is.</p>
+ <p>You can plug Tahoe into your computer's local filesystem using the FUSE extension, found in the <cite>contrib</cite> directory. Warning: unlike most of Tahoe, and unlike the rest of the user interfaces described on this page, the FUSE plugin doesn't have extensive unit tests that are automatically run on every check-in of the source. Therefore, we can't be sure how complete and reliable it is.</p>
<p>P.S. "FUSE" rhymes with "booze".</p>
is handled at a time: all blocks for segment A are delivered before any
work is begun on segment B.
-As blocks are created, we retain the hash of each one. The list of
-block hashes for a single share (say, hash(A1), hash(B1), hash(C1)) is
-used to form the base of a Merkle hash tree for that share (hashtrees[1]).
+As blocks are created, we retain the hash of each one. The list of block hashes
+for a single share (say, hash(A1), hash(B1), hash(C1)) is used to form the base
+of a Merkle hash tree for that share, called the block hash tree.
+
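The construction of a per-share block hash tree can be sketched in Python as
follows. This is a minimal illustration, not Tahoe's implementation: it uses
plain SHA-256 rather than Tahoe's tagged hashes, and it pads an odd-length
level by repeating its last hash, which is an assumption made only to keep the
example short. The name build_block_hash_tree is hypothetical.

```python
import hashlib


def sha256(data):
    return hashlib.sha256(data).digest()


def build_block_hash_tree(blocks):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    if not blocks:
        raise ValueError("at least one block is required")
    level = [sha256(block) for block in blocks]  # one leaf per block
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Assumption: pad an odd level by repeating its last hash.
            level = level + [level[-1]]
        # Each parent is the hash of its two children concatenated.
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


# Blocks of share 1 for segments A, B and C (placeholder contents).
levels = build_block_hash_tree([b"A1", b"B1", b"C1"])
root = levels[-1][0]
print(len(levels[0]), "leaves, root =", root.hex())
```

Changing any single block changes the root, which is what lets a decoder
validate one block using only that block's hash and the sibling hashes on the
path up to the root.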
This hash tree has one terminal leaf per block. The complete block hash
tree is sent to the shareholder after all the data has been sent. At
retrieval time, the decoder will ask for specific pieces of this tree before
def create_client(basedir, config, out=sys.stdout, err=sys.stderr):
    if os.path.exists(basedir):
        if os.listdir(basedir):
-             print >>err, "The base directory already exists: %s" % basedir
-             print >>err, "To avoid clobbering anything, I am going to quit now"
-             print >>err, "Please use a different directory, or delete this one"
+             print >>err, "The base directory \"%s\", which is \"%s\", is not empty." % (basedir, os.path.abspath(basedir))
+             print >>err, "To avoid clobbering anything, I am going to quit now."
+             print >>err, "Please use a different directory, or empty this one."
            return -1
        # we're willing to use an empty directory
    else:
def create_introducer(basedir, config, out=sys.stdout, err=sys.stderr):
    if os.path.exists(basedir):
        if os.listdir(basedir):
-             print >>err, "The base directory already exists: %s" % basedir
-             print >>err, "To avoid clobbering anything, I am going to quit now"
-             print >>err, "Please use a different directory, or delete this one"
+             print >>err, "The base directory \"%s\", which is \"%s\", is not empty." % (basedir, os.path.abspath(basedir))
+             print >>err, "To avoid clobbering anything, I am going to quit now."
+             print >>err, "Please use a different directory, or empty this one."
            return -1
        # we're willing to use an empty directory
    else:
from cStringIO import StringIO
from twisted.python import usage, runtime
from twisted.internet import defer
-import os.path
+import os.path, re
from allmydata.scripts import runner
from allmydata.util import fileutil, testutil
        rc = runner.runner(argv, stdout=out, stderr=err)
        self.failIfEqual(rc, 0)
        self.failUnlessEqual(out.getvalue(), "")
-         self.failUnless("The base directory already exists" in err.getvalue())
+         self.failUnless("is not empty." in err.getvalue())
+
+         # Fail if there is a line that doesn't end with a PUNCTUATION MARK.
+         self.failIf(re.search(r"[^.!?]\n", err.getvalue()), err.getvalue())
        c2 = os.path.join(basedir, "c2")
        argv = ["--quiet", "create-client", c2]
        rc = runner.runner(argv, stdout=out, stderr=err)
        self.failIfEqual(rc, 0)
        self.failUnlessEqual(out.getvalue(), "")
-         self.failUnless("The base directory already exists" in err.getvalue())
+         self.failUnless("is not empty." in err.getvalue())
+
+         # Fail if there is a line that doesn't end with a PUNCTUATION MARK.
+         self.failIf(re.search(r"[^.!?]\n", err.getvalue()), err.getvalue())
        c2 = os.path.join(basedir, "c2")
        argv = ["--quiet", "create-introducer", c2]
    return tagged_hasher("allmydata_plaintext_segment_v1")
def content_hash_key_hash(k, n, segsize, data):
-     # this is defined to return a 16-byte AES key. We use SHA-256d here..
+     # This is defined to return a 16-byte AES key. We use SHA-256d here.
    # We'd like to use it everywhere, but we're only switching algorithms
    # when we can hide the compatibility breaks in other necessary changes.
    param_tag = netstring("%d,%d,%d" % (k, n, segsize))