From: Zooko O'Whielacronx
Date: Fri, 15 Feb 2008 20:11:02 +0000 (-0700)
Subject: docs: update install and usage docs, improve cli "usage" output, make new example...
X-Git-Tag: allmydata-tahoe-0.8.0~46
X-Git-Url: https://git.rkrishnan.org/architecture.txt?a=commitdiff_plain;h=fc0d6375235a00c689368876629dd5f7964100de;p=tahoe-lafs%2Ftahoe-lafs.git

docs: update install and usage docs, improve cli "usage" output, make new example directories, add unit test that fails code which prints out sentences that don't end with punctuation marks
---
diff --git a/docs/about.html b/docs/about.html
index b662269a..f8f8dd0b 100644
--- a/docs/about.html
+++ b/docs/about.html
@@ -9,7 +9,7 @@

Welcome to Tahoe

-

Welcome to the Tahoe project, a secure, decentralized, fault-tolerant filesystem. All of the source code is available under a Free Software, Open Source licence.

+

Welcome to allmydata.org Tahoe: a secure, decentralized, fault-tolerant filesystem. All of the source code is available under a Free Software, Open Source licence.

This filesystem is encrypted and spread over multiple peers in such a way that it remains available even when some of the peers are unavailable, malfunctioning, or malicious.

See the web site for information, news, and discussion:

http://allmydata.org

@@ -17,8 +17,8 @@

Overview

A "storage grid" is made up of a number of storage servers. A storage server has local attached storage (typically one or more SATA hard disks). A "gateway" uses the storage servers and provides to its clients a filesystem over a standard protocol such as HTTP(S), FUSE, or SMB.

Users do not rely on storage servers to provide confidentiality nor integrity for the data -- instead all of the data is encrypted and integrity-checked by the gateway, so that the servers can neither read nor alter the contents of the files.

-

Users do rely on storage servers for availability. The ciphertext is erasure-coded and distributed across N storage servers (the default value for N is 12) so that it can be recovered from any K of these servers (the default value of K is 3). Therefore only the simultaneous failure of N-K+1 (with the defaults, 10) servers can make the data unavailable. Phrasing this in terms of reliance, we say that the users rely on the gateway for the confidentiality and integrity of the data, and on any 3 of the 10 servers for the availability of the data.

-

In the typical deployment mode each user runs her own gateway on her own machine. This way she need rely only on her own machine for the confidentiality and integrity of the data, and she can take advantage of filesystem integration using FUSE or SMB.

+

Users do rely on storage servers for availability. The ciphertext is erasure-coded and distributed across N storage servers (the default value for N is 10) so that it can be recovered from any K of these servers (the default value of K is 3). Therefore only the simultaneous failure of N-K+1 (with the defaults, 8) servers can make the data unavailable. Phrasing this in terms of reliance, we say that the users rely on the gateway for the confidentiality and integrity of the data, and on any 3 of the 10 servers for the availability of the data.
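The availability arithmetic in the paragraph above can be checked with a short sketch. It assumes one share per storage server, as the text describes; the function names are illustrative and are not part of Tahoe.

```python
def failures_to_lose_data(n: int, k: int) -> int:
    """Smallest number of simultaneous server failures that leaves
    fewer than k shares reachable (assuming one share per server)."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return n - k + 1


def is_recoverable(n: int, k: int, failed: int) -> bool:
    """True if at least k of the n servers are still up."""
    return n - failed >= k


if __name__ == "__main__":
    N, K = 10, 3  # defaults quoted in the updated paragraph
    print(failures_to_lose_data(N, K))  # 8
    print(is_recoverable(N, K, 7))      # True: 3 servers remain
    print(is_recoverable(N, K, 8))      # False: only 2 servers remain
```

With the old default of N = 12 quoted in the removed paragraph, the same formula gives 12 - 3 + 1 = 10.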

+

In the typical deployment mode each user runs her own gateway on her own machine. This way she relies on only her own machine for the confidentiality and integrity of the data, and she can take advantage of filesystem integration using FUSE or SMB.

An alternate deployment mode is that the gateway runs on a remote machine and the user connects to it over HTTPS. This means that the operator of the gateway can view and modify the user's data (the user relies on the gateway for confidentiality and integrity), but the user can access the filesystem with a client that doesn't have the gateway software installed, such as an Internet kiosk or cell phone.

A user who has read-write access to a file or directory in this filesystem can give another user read-write access to that file or directory, or read-only access to that file or directory. A user who has read-only access to a file or directory can give another user read-only access to it.

When linking a file or directory into a parent directory, you can use a read-write link or a read-only link. If you use a read-write link, then anyone who has read-write access to the parent directory can gain read-write access to the child, and anyone who has read-only access to the parent directory can gain read-only access to the child. If you use a read-only link, then anyone who has either read-write or read-only access to the parent directory can gain read-only access to the child.
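The linking rules in the paragraph above reduce to a small truth table. The following is a minimal sketch of that table only; the function and the string labels are hypothetical and do not correspond to any Tahoe API.

```python
READ_WRITE = "read-write"
READ_ONLY = "read-only"


def child_access(parent_access: str, link_type: str) -> str:
    """Access a user can gain to a child, given their access to the
    parent directory and the kind of link used to attach the child."""
    valid = (READ_WRITE, READ_ONLY)
    if parent_access not in valid or link_type not in valid:
        raise ValueError("access and link type must be read-write or read-only")
    # Read-write access to the child requires both a read-write link
    # and read-write access to the parent; every other case is read-only.
    if parent_access == READ_WRITE and link_type == READ_WRITE:
        return READ_WRITE
    return READ_ONLY


if __name__ == "__main__":
    for parent in (READ_WRITE, READ_ONLY):
        for link in (READ_WRITE, READ_ONLY):
            print(parent, "parent +", link, "link ->", child_access(parent, link))
```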

diff --git a/docs/running.html b/docs/running.html
index 35f80f11..e2e647b5 100644
--- a/docs/running.html
+++ b/docs/running.html
@@ -28,20 +28,22 @@ grid), the Introducer will already be running, and you'll need to create a node.

-To construct a node run tahoe create-client, which will
+To construct an introducer, create a new base directory for it (the name
+of the directory is up to you), cd into it, and run "tahoe
+create-introducer .". Now start the introducer by running "tahoe
+start .". After it starts, there will be a file named
+introducer.furl in that base directory. This file contains
+the URL the nodes must use in order to connect to this
+introducer.
+
+To construct a node run "tahoe create-client", which will
 create ~/.tahoe to be the node's base directory. Acquire a copy of the
 introducer.furl from the introducer and put it into this
-directory, then run tahoe start. After that, the node should be
+directory, then run "tahoe start". After that, the node should be
 off and running. The first thing it will do is connect to the introducer and
 get itself connected to all other nodes on the grid.

-To construct an introducer, create a new base directory for it (the name
-of the directory is up to you), cd into it, and run tahoe
-create-introducer .. Now start the introducer by running tahoe
-start .. After it starts, there will be a file named
-introducer.furl in that base directory. This file contains
-the URL which the nodes must use in order to connect to this
-introducer.
+To stop a running node run "tahoe stop".

Do Stuff With It

diff --git a/docs/using.html b/docs/using.html
index a86907d1..d3443808 100644
--- a/docs/using.html
+++ b/docs/using.html
@@ -9,9 +9,11 @@
+

This is how to use your Tahoe node. First, you have to run your own local Tahoe node, as described in running.html.

+

The WUI

-

Point your web browser to http://127.0.0.1:8123 to use the node.

+

Point your web browser to http://127.0.0.1:8123 -- which is the URL of your own local computer -- to use your newly created node.

Create a new directory (with the button labelled "create a directory"). Your web browser will load the new directory. Now if you want to be able to come back to this directory later, you have to bookmark it, or otherwise save the URL of it. If you lose the URL to this directory, then you can never again come back to this directory.

@@ -21,14 +23,14 @@

The CLI

-

Prefer the command-line? Run tahoe --help (the same command-line tool that is used to start and stop nodes serves to navigate and use the decentralized filesystem). To make commands like tahoe ls work without the --dir-cap= option, you have to put a directory capability (e.g. http://127.0.0.1:8123/uri/URI:DIR2:yar9nnzsho6czczieeesc65sry:upp1pmypwxits3w9izkszgo1zbdnsyk3nm6h7e19s7os7s6yhh9y) into ~/.tahoe/private/root_dir.cap.

+

Prefer the command-line? Run "tahoe --help" (the same command-line tool that is used to start and stop nodes serves to navigate and use the decentralized filesystem). To make commands like "tahoe ls" work without the --dir-cap= option, you have to put a directory capability (e.g. http://127.0.0.1:8123/uri/URI%3ADIR2%3Ax2pdyrez6nemrby5jzw6lxkxye%3Arl654zxxdppmhgzvaaaxdyf6bhbzbszmmoynm3h7kzuxtlksbynq/) into ~/.tahoe/private/root_dir.cap.

As with the WUI (and with all current interfaces to Tahoe), you are responsible for remembering directory capabilities yourself. If you create a new directory and lose the capability to it, then you cannot access that directory ever again.
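Saving a directory capability to the file named above could be scripted roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, the trailing newline is a guess about what the CLI tolerates, and the demo writes into a temporary directory rather than a real ~/.tahoe.

```python
import tempfile
from pathlib import Path


def save_root_dir_cap(cap: str, base_dir: Path) -> Path:
    """Write a directory capability to <base_dir>/private/root_dir.cap.

    Hypothetical helper: for a real node, base_dir would be ~/.tahoe.
    """
    private_dir = base_dir / "private"
    private_dir.mkdir(parents=True, exist_ok=True)
    cap_file = private_dir / "root_dir.cap"
    # Assumption: a single line ending in a newline is acceptable.
    cap_file.write_text(cap.strip() + "\n")
    return cap_file


if __name__ == "__main__":
    # Example capability copied from the text above.
    example_cap = (
        "http://127.0.0.1:8123/uri/URI%3ADIR2%3Ax2pdyrez6nemrby5jzw6lxkxye"
        "%3Arl654zxxdppmhgzvaaaxdyf6bhbzbszmmoynm3h7kzuxtlksbynq/"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = save_root_dir_cap(example_cap, Path(tmp))
        print(path.read_text().strip() == example_cap)
```

Because anyone who reads this file gains the access the capability grants, it is worth keeping the file readable only by its owner.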

P.S. "CLI" is pronounced "clee".

The FUSE Extension

-

You can plug Tahoe into your computer's local filesystem using the FUSE extension, found in the contrib directory. Warning: unlike most of Tahoe, and unlike the rest of the user interfaces described on this page, the FUSE plugin doesn't have extensive unit tests that are automatically run on every check-in of the source. Therefore, we don't for sure how complete and reliable it is.

+

You can plug Tahoe into your computer's local filesystem using the FUSE extension, found in the contrib directory. Warning: unlike most of Tahoe, and unlike the rest of the user interfaces described on this page, the FUSE plugin doesn't have extensive unit tests that are automatically run on every check-in of the source. Therefore, we can't be sure how complete and reliable it is.

P.S. "FUSE" rhymes with "booze".

diff --git a/src/allmydata/encode.py b/src/allmydata/encode.py
index def30526..a815a5d7 100644
--- a/src/allmydata/encode.py
+++ b/src/allmydata/encode.py
@@ -37,9 +37,10 @@ information necessary to validate the data upon retrieval.
 Only one segment is handled at a time: all blocks for segment A are delivered
 before any work is begun on segment B.

-As blocks are created, we retain the hash of each one. The list of
-block hashes for a single share (say, hash(A1), hash(B1), hash(C1)) is
-used to form the base of a Merkle hash tree for that share (hashtrees[1]).
+As blocks are created, we retain the hash of each one. The list of block hashes
+for a single share (say, hash(A1), hash(B1), hash(C1)) is used to form the base
+of a Merkle hash tree for that share, called the block hash tree.
+
 This hash tree has one terminal leaf per block. The complete block hash tree
 is sent to the shareholder after all the data has been sent. At retrieval
 time, the decoder will ask for specific pieces of this tree before
diff --git a/src/allmydata/scripts/create_node.py b/src/allmydata/scripts/create_node.py
index db56d45a..24ab929f 100644
--- a/src/allmydata/scripts/create_node.py
+++ b/src/allmydata/scripts/create_node.py
@@ -42,9 +42,9 @@ c.setServiceParent(application)
 def create_client(basedir, config, out=sys.stdout, err=sys.stderr):
     if os.path.exists(basedir):
         if os.listdir(basedir):
-            print >>err, "The base directory already exists: %s" % basedir
-            print >>err, "To avoid clobbering anything, I am going to quit now"
-            print >>err, "Please use a different directory, or delete this one"
+            print >>err, "The base directory \"%s\", which is \"%s\" is not empty." % (basedir, os.path.abspath(basedir))
+            print >>err, "To avoid clobbering anything, I am going to quit now."
+            print >>err, "Please use a different directory, or empty this one."
             return -1
         # we're willing to use an empty directory
     else:
@@ -67,9 +67,9 @@ def create_client(basedir, config, out=sys.stdout, err=sys.stderr):
 def create_introducer(basedir, config, out=sys.stdout, err=sys.stderr):
     if os.path.exists(basedir):
         if os.listdir(basedir):
-            print >>err, "The base directory already exists: %s" % basedir
-            print >>err, "To avoid clobbering anything, I am going to quit now"
-            print >>err, "Please use a different directory, or delete this one"
+            print >>err, "The base directory \"%s\", which is \"%s\" is not empty." % (basedir, os.path.abspath(basedir))
+            print >>err, "To avoid clobbering anything, I am going to quit now."
+            print >>err, "Please use a different directory, or empty this one."
             return -1
         # we're willing to use an empty directory
     else:
diff --git a/src/allmydata/test/test_runner.py b/src/allmydata/test/test_runner.py
index 9b0b5e0c..aa71c179 100644
--- a/src/allmydata/test/test_runner.py
+++ b/src/allmydata/test/test_runner.py
@@ -4,7 +4,7 @@ from twisted.trial import unittest
 from cStringIO import StringIO
 from twisted.python import usage, runtime
 from twisted.internet import defer
-import os.path
+import os.path, re
 from allmydata.scripts import runner
 from allmydata.util import fileutil, testutil

@@ -31,7 +31,10 @@ class CreateNode(unittest.TestCase):
         rc = runner.runner(argv, stdout=out, stderr=err)
         self.failIfEqual(rc, 0)
         self.failUnlessEqual(out.getvalue(), "")
-        self.failUnless("The base directory already exists" in err.getvalue())
+        self.failUnless("is not empty." in err.getvalue())
+
+        # Fail if there is a line that doesn't end with a PUNCTUATION MARK.
+        self.failIf(re.search("[^\.!?]\n", err.getvalue()), err.getvalue())

         c2 = os.path.join(basedir, "c2")
         argv = ["--quiet", "create-client", c2]
@@ -62,7 +65,10 @@ class CreateNode(unittest.TestCase):
         rc = runner.runner(argv, stdout=out, stderr=err)
         self.failIfEqual(rc, 0)
         self.failUnlessEqual(out.getvalue(), "")
-        self.failUnless("The base directory already exists" in err.getvalue())
+        self.failUnless("is not empty" in err.getvalue())
+
+        # Fail if there is a line that doesn't end with a PUNCTUATION MARK.
+        self.failIf(re.search("[^\.!?]\n", err.getvalue()), err.getvalue())

         c2 = os.path.join(basedir, "c2")
         argv = ["--quiet", "create-introducer", c2]
diff --git a/src/allmydata/util/hashutil.py b/src/allmydata/util/hashutil.py
index 412fd70a..ee2b0cd2 100644
--- a/src/allmydata/util/hashutil.py
+++ b/src/allmydata/util/hashutil.py
@@ -102,7 +102,7 @@ def plaintext_segment_hasher():
     return tagged_hasher("allmydata_plaintext_segment_v1")

 def content_hash_key_hash(k, n, segsize, data):
-    # this is defined to return a 16-byte AES key. We use SHA-256d here..
+    # This is defined to return a 16-byte AES key. We use SHA-256d here.
     # we'd like to use it everywhere, but we're only switching algorithms
     # when we can hide the compatibility breaks in other necessary changes.
     param_tag = netstring("%d,%d,%d" % (k, n, segsize))
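The encode.py docstring touched by this commit describes a per-share block hash tree: a Merkle tree with one leaf per block hash. The sketch below illustrates the general idea only. It uses plain SHA-256 and duplicates the last node on odd-sized levels, both of which are assumptions made for illustration; Tahoe's own hash tree code may use different hashing and padding.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes: list) -> bytes:
    """Combine leaf hashes pairwise, level by level, until one root remains."""
    if not leaf_hashes:
        raise ValueError("need at least one leaf hash")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad an odd level by repeating its last node.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    # Blocks of share 1 from segments A, B, C, named as in the docstring.
    blocks = [b"A1", b"B1", b"C1"]
    leaves = [sha256(block) for block in blocks]
    print(merkle_root(leaves).hex())
```

Because every leaf feeds into the root, changing any single block changes the root, which is what lets a decoder validate individual blocks against a small set of tree nodes.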