From: Zooko O'Whielacronx
Date: Tue, 9 Apr 2013 05:33:42 +0000 (-0600)
Subject: convergence secret doc by CtB, marlowe, zooko
X-Git-Tag: allmydata-tahoe-1.10b1~18
X-Git-Url: https://git.rkrishnan.org/specifications/components/status?a=commitdiff_plain;h=07f7d50afa74635dfce955e8eff275117e2333a6;p=tahoe-lafs%2Ftahoe-lafs.git

convergence secret doc by CtB, marlowe, zooko
---

diff --git a/docs/convergence-secret.rst b/docs/convergence-secret.rst
new file mode 100644
index 00000000..8ddaa4b7
--- /dev/null
+++ b/docs/convergence-secret.rst
@@ -0,0 +1,75 @@
+
+
+What Is It?
+-----------
+
+The identifier of a file (also called the "capability" to a file) is derived
+from two pieces of information when the file is uploaded: the content of the
+file and the upload node's "convergence secret". By default, the convergence
+secret is randomly generated by the node when it first starts up, then stored
+in the node's base directory (/private/convergence) and
+re-used after that. So the same file content uploaded from the same node will
+always have the same cap. Uploading the file from a different node with a
+different convergence secret would result in a different cap, and in a second
+copy of the file's contents stored on the grid. If you want files you upload
+to converge (also known as "deduplicate") with files uploaded by someone
+else, just make sure you're using the same convergence secret when you upload
+files as they do.
+
+The advantages of deduplication should be clear, but keep in mind that the
+convergence secret was created to protect confidentiality. There are two
+attacks that can be used against you by someone who knows the convergence
+secret you use.
+
+The first one is called the "Confirmation-of-a-File Attack". Someone who
+knows the convergence secret that you used when you uploaded a file, and who
+has a copy of that file themselves, can check whether you have a copy of that
+file. This is usually not a problem, but it could be if that file is, for
+example, a book or movie that is banned in your country.
+
+The second attack is more subtle. It is called the
+"Learn-the-Remaining-Information Attack". Suppose you've received a
+confidential document, such as a PDF from your bank which contains many pages
+of boilerplate text as well as your bank account number and
+balance. Someone who knows your convergence secret can generate a file with
+all of the boilerplate text (perhaps they would open an account with the same
+bank so they receive the same document with their account number and
+balance). Then they can try a "brute force search" to find your account
+number and your balance.
+
+The defense against these attacks is that only someone who knows the
+convergence secret that you used on each file can perform these attacks on
+that file.
+
+Both of these attacks and the defense are described in more detail in `Drew
+Perttula's Hack Tahoe-LAFS Hall Of Fame entry`_.
+
+.. _`Drew Perttula's Hack Tahoe-LAFS Hall Of Fame entry`:
+   https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html
+
+What If I Change My Convergence Secret?
+---------------------------------------
+
+All your old file capabilities will still work, but the new data that you
+upload will not be deduplicated with the old data. If you upload all of the
+same things to the grid, you will end up using twice the space until garbage
+collection kicks in (if it's enabled). Changing the convergence secret that a
+node uses for uploads can be thought of as moving the node to a new
+"deduplication domain".
+
+How To Use It
+-------------
+
+To enable deduplication between different clients, **securely** copy the
+convergence secret file from one client to all the others.
+
+For example, if you are on host A and have an account on host B and you have
+scp installed, run:
+
+  *scp ~/.tahoe/private/convergence
+  my_other_account@B:.tahoe/private/convergence*
+
+If you have two different nodes on a single computer, say one for each disk,
+you would do:
+
+  *cp /tahoe1/private/convergence /tahoe2/private/convergence*
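
Note (not part of the patch): the behaviour the new doc describes can be illustrated with a small toy model. The sketch below is only an assumption-laden illustration, not the derivation Tahoe-LAFS actually uses: it stands in for a cap with an HMAC-SHA256 of the file content keyed by the convergence secret, and it assumes the attacker can observe that value. The names `toy_cap`, `confirm_file`, and `learn_remaining` are hypothetical and exist only for this example.

```python
import hashlib
import hmac
import os


def toy_cap(convergence_secret: bytes, content: bytes) -> str:
    """Toy stand-in for a cap: a keyed hash of the file content.

    Illustration only; this is NOT Tahoe-LAFS's real derivation.
    """
    return hmac.new(convergence_secret, content, hashlib.sha256).hexdigest()


def confirm_file(known_secret: bytes, candidate: bytes, observed_cap: str) -> bool:
    """Confirmation-of-a-File: does the candidate produce the observed cap?"""
    return hmac.compare_digest(toy_cap(known_secret, candidate), observed_cap)


def learn_remaining(known_secret, template, guesses, observed_cap):
    """Learn-the-Remaining-Information: brute-force the one unknown field."""
    for guess in guesses:
        if confirm_file(known_secret, template % guess, observed_cap):
            return guess
    return None


# Two nodes with independently generated convergence secrets.
secret_a = os.urandom(32)
secret_b = os.urandom(32)

# A "bank statement": fixed boilerplate plus one secret number.
template = b"Standard bank boilerplate. Account number: %d\n"
victim_file = template % 4217
victim_cap = toy_cap(secret_a, victim_file)

# Same content and same secret converge on the same cap.
same_secret_converges = toy_cap(secret_a, victim_file) == victim_cap
# A different secret yields a different cap (a second stored copy).
other_secret_differs = toy_cap(secret_b, victim_file) != victim_cap

# An attacker who knows secret_a recovers the account number by search.
recovered = learn_remaining(secret_a, template, range(10000), victim_cap)
# Without the right secret, the same search finds nothing.
not_recovered = learn_remaining(secret_b, template, range(10000), victim_cap)
```

In this model both attacks reduce to recomputing the keyed hash for candidate contents, which is why keeping the convergence secret private is the whole defense.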