From: Brian Warner Date: Tue, 3 Jun 2008 01:43:31 +0000 (-0700) Subject: docs/historical: move 'tahoe2' from wiki into source tree X-Git-Url: https://git.rkrishnan.org/components/com_hotproperty/class-simplejson.JSONEncoder-index.html?a=commitdiff_plain;h=2443ffe30537d09462b3c5ada71f705352651e15;p=tahoe-lafs%2Ftahoe-lafs.git docs/historical: move 'tahoe2' from wiki into source tree --- diff --git a/docs/historical/peer-selection-tahoe2.txt b/docs/historical/peer-selection-tahoe2.txt new file mode 100644 index 00000000..1e9a2684 --- /dev/null +++ b/docs/historical/peer-selection-tahoe2.txt @@ -0,0 +1,56 @@ += THIS PAGE DESCRIBES HISTORICAL DESIGN CHOICES. SEE docs/architecture.txt FOR CURRENT DOCUMENTATION = + +When a file is uploaded, the encoded shares are sent to other peers. But to +which ones? The PeerSelection algorithm is used to make this choice. + +Early in 2007, we were planning to use the following "Tahoe Two" algorithm. +By the time we released 0.2.0, we switched to "tahoe3", but when we released +v0.6, we switched back (ticket #132). + +As in Tahoe Three, the verifierid is used to consistently-permute the set of +all peers (by sorting the peers by HASH(verifierid+peerid)). Each file gets a +different permutation, which (on average) will evenly distribute shares among +the grid and avoid hotspots. + +With our basket of (usually 10) shares to distribute in hand, we start at the +beginning of the list and ask each peer in turn if they are willing to hold +on to one of our shares (the "lease request"). If they say yes, we remove +that share from the basket and remember who agreed to host it. Then we go to +the next peer in the list and ask them the same question about another share. +If a peer says no, we remove them from the list. If a peer says that they +already have one or more shares for this file, we remove those shares from +the basket. If we reach the end of the list, we start again at the beginning. +If we run out of peers before we run out of shares, we fail unless we've +managed to place at least some number of the shares: the likely threshold is +to attempt to place 10 shares (out of which we'll need 3 to recover the +file), and be content if we can find homes for at least 7 of them. + +In small networks, this approach will loop around several times and place +several shares with each node (e.g. in a 5-host network with plenty of space, +each node will get 2 shares). In large networks with plenty of space, the +shares will be placed with the first 10 peers in the permuted list. In large +networks that are somewhat full, we'll need to traverse more of the list +before we find homes for the shares. The average number of peers that we'll +need to talk to is vaguely equal to 10 / (1-utilization), with a bunch of +other terms that relate to the distribution of free space on the peers and +the size of the shares being offered. Small files with small shares will fit +anywhere, large files with large shares will only fit on certain peers, so +the mesh may have free space but no holes large enough for a very large file, +which might indicate that we should try again with a larger number of +(smaller) shares. + +When it comes time to download, we compute a similar list of permuted +peerids, and start asking for shares beginning with the start of the list. +Each peer gives us a list of the shareids that they are holding. Eventually +(depending upon how much churn the peerlist has experienced), we'll find +holders for at least 3 shares, or we'll run out of peers. If the mesh is very +large and we want to fail faster, we can establish an upper bound on how many +peers we should talk to (perhaps by recording the permuted peerid of the last +node to which we sent a share, or a count of the total number of peers we +talked to during upload). + +I suspect that this approach handles churn more efficiently than tahoe3, but +I haven't gotten my head around the math that could be used to show it. On +the other hand, it takes a lot more round trips to find homes in small meshes +(one per share, whereas tahoe three can do just one per node). + diff --git a/docs/historical/peer-selection.txt b/docs/historical/peer-selection.txt index 0d16c3bb..c5c80c02 100644 --- a/docs/historical/peer-selection.txt +++ b/docs/historical/peer-selection.txt @@ -8,11 +8,12 @@ where the peers involved would check up on each other to make sure the data was still available. The big limitation was the expense of tracking which nodes were parts of which cabals. -Formerly (until v0.6, ticket #132), we used the "tahoe3" algorithm (see -peer-selection-tahoe3.txt), but now we use the "tahoe2" algorithm (see the -PEER SELECTION section of docs/architecture.txt), which uses a permuted -peerid list and packs the shares into the first 10 or so members of this -list. (It is named "tahoe2" because it was designed before "tahoe3" was.) +When we release 0.2.0, we used the "tahoe3" algorithm (see +peer-selection-tahoe3.txt), but in v0.6 (ticket #132) we switched back to +"tahoe2" (see the peer-selection-tahoe2.txt, and the PEER SELECTION section +of docs/architecture.txt), which uses a permuted peerid list and packs the +shares into the first 10 or so members of this list. (It is named "tahoe2" +because it was designed before "tahoe3" was.) In the future, we might move to an algorithm known as "denver airport", which uses Chord-like routing to minimize the number of active connections.