From: Shawn Willden Date: Tue, 28 Jul 2009 22:44:30 +0000 (-0700) Subject: Lossmodel updates X-Git-Tag: allmydata-tahoe-1.5.0~9 X-Git-Url: https://git.rkrishnan.org/simplejson/components/flags/?a=commitdiff_plain;h=3782c27ac55e903e72f9d1e5a3c39dc77052c129;p=tahoe-lafs%2Ftahoe-lafs.git Lossmodel updates Various improvements to the lossmodel, plus addition of README.lossmodel that provides a link to the PDF. --- diff --git a/docs/proposed/README.lossmodel b/docs/proposed/README.lossmodel new file mode 100644 index 00000000..068affd7 --- /dev/null +++ b/docs/proposed/README.lossmodel @@ -0,0 +1,28 @@ +The lossmodel.lyx file is the source document for an in-progress paper +that analyzes the probability of losing files stored in a Tahoe +Least-acces File System under various scenarios. It describes: + +1. How to estimate peer reliabilities, based on peer MTBF failure +data. + +2. How to compute file loss probabilities, based on a given set of +shares stored on peers with estimated reliabilities. The peer +reliabilities do not have to be uniform, and the model takes into +account the file repair process. + +3. How to estimate Tahoe parameters for k (shares needed), n (shares +distributed) and A (repair interval) to achieve a file reliability +target. + +4. How to compute the estimated repair cost over time, discounted at +a fixed rate, of maintaining a file for a time period T. + +Future work will also address the latter three issues in the context +of "non-aggressive" repair, where repair will only be performed if +too many shares are lost, and it will also extend the repair cost +estimation model to suggest cost functions appropriate for common +network architectures. + +A PDF of the current version of the file may be downloaded from: + + http://willden.org/~shawn/lossmodel.pdf \ No newline at end of file diff --git a/docs/proposed/lossmodel.lyx b/docs/proposed/lossmodel.lyx index 63ecfc8f..77f62c03 100644 --- a/docs/proposed/lossmodel.lyx +++ b/docs/proposed/lossmodel.lyx @@ -1,4 +1,4 @@ -#LyX 1.6.1 created this file. For more info see http://www.lyx.org/ +#LyX 1.6.2 created this file. For more info see http://www.lyx.org/ \lyxformat 345 \begin_document \begin_header @@ -56,7 +56,7 @@ Shawn Willden \end_layout \begin_layout Date -01/14/2009 +07/22/2009 \end_layout \begin_layout Address @@ -81,8 +81,8 @@ The allmydata Tahoe distributed file system uses Reed-Solomon erasure coding \begin_inset Formula $N$ \end_inset - shares, each of which is then delivered to a randomly-selected peer in - a distributed network. + shares which are delivered to randomly-selected peers in a distributed + network. The file can later be reassembled from any \begin_inset Formula $k\leq N$ \end_inset @@ -112,7 +112,7 @@ s the missing ones, bringing the availability back up to full. \end_layout \begin_layout Standard -The question we're trying to answer is "What's the probability that we'll +The question we're trying to answer is "What is the probability that we'll be able to reassemble the file at some later time \begin_inset Formula $T$ \end_inset @@ -136,11 +136,11 @@ The question we're trying to answer is "What's the probability that we'll \end_inset in order to ensure -\begin_inset Formula $Pr[loss]\leq t$ +\begin_inset Formula $Pr[loss]\leq r$ \end_inset for some threshold probability -\begin_inset Formula $t$ +\begin_inset Formula $r$ \end_inset . @@ -149,45 +149,33 @@ The question we're trying to answer is "What's the probability that we'll \begin_inset Formula $Pr[loss]$ \end_inset - by choosing small -\begin_inset Formula $k,$ -\end_inset - - large -\begin_inset Formula $N$ -\end_inset - -, small -\begin_inset Formula $A$ -\end_inset - -, and setting -\begin_inset Formula $L=N$ -\end_inset - -, these choices have costs. + by selecting conservative parameters, these choices have costs. The peer storage and bandwidth consumed by the share distribution process are approximately \begin_inset Formula $\nicefrac{N}{k}$ \end_inset - times the size of the original file, so we would like to reduce this ratio - as far as possible consistent with -\begin_inset Formula $Pr[loss]\leq t$ + times the size of the original file, so we would like to minimize +\begin_inset Formula $\nicefrac{N}{k}$ +\end_inset + +, consistent with +\begin_inset Formula $Pr[loss]\leq r$ \end_inset . - Likewise, frequent and aggressive repair process can be used to ensure - that the number of shares available at any time is very close to + Likewise, a frequent and aggressive repair process keeps the number of + shares available close to \begin_inset Formula $N,$ \end_inset - but at a cost in bandwidth as the repair agent downloads + but at a cost in bandwidth and processing time as the repair agent downloads + \begin_inset Formula $k$ \end_inset - shares to reconstruct the file and uploads new shares to replace those - that are lost. + shares, reconstructs the file and uploads new shares to replace those that + are lost. \end_layout \begin_layout Section @@ -198,13 +186,13 @@ Reliability The probability that the file becomes unrecoverable is dependent upon the probability that the peers to whom we send shares are able to return those copies on demand. - Shares that are returned in corrupted form can be detected and discarded, - so there is no need to distinguish between corruption and loss. + Shares that are corrupted are detected and discarded, so there is no need + to distinguish between corruption and loss. \end_layout \begin_layout Standard -There are a large number of factors that affect share availability. - Availability can be temporarily interrupted by peer unavailability, due +Many factors affect share availability. + Availability can be temporarily interrupted by peer unavailability due to network outages, power failures or administrative shutdown, among other reasons. Availability can be permanently lost due to failure or corruption of storage @@ -238,12 +226,12 @@ Another consideration is that some failures affect multiple shares. \end_layout \begin_layout Standard -While the types of failures that may occur are pretty consistent across - even very different peers, their probabilities differ dramatically. - A professionally-administered blade server with redundant storage, power - and Internet located in a carefully-monitored data center with automatic - fire suppression systems is much less likely to become either temporarily - or permanently unavailable than the typical virus and malware-ridden home +While the types of failures that may occur are quite consistent across peers, + their probabilities differ dramatically. + A professionally-administered server with redundant storage, power and + Internet located in a carefully-monitored data center with automatic fire + suppression systems is much less likely to become either temporarily or + permanently unavailable than the typical virus and malware-ridden home computer on a single cable modem connection. A variety of situations in between exist as well, such as the case of the author's home file server, which is administered by an IT professional @@ -268,7 +256,7 @@ Reliability \begin_inset Formula $s_{i}$ \end_inset - will surve to (be retrievable at) time + will survive to (be retrievable at) time \begin_inset Formula $T=A$ \end_inset @@ -278,8 +266,12 @@ Reliability any lost shares. \end_layout -\begin_layout Definition -Reliability is clearly dependent on +\begin_layout Standard +Reliability +\begin_inset Formula $p_{i}$ +\end_inset + + is clearly dependent on \begin_inset Formula $A$ \end_inset @@ -296,7 +288,181 @@ decay \end_layout \begin_layout Subsection -Fixed Reliability +Peer Reliability +\end_layout + +\begin_layout Standard +Since peer reliability is the basis for any computations we may do on share + and file reliability, we must have a way to estimate it. + Reliability modeling of hardware, software and human performance are each + complex topics, the subject of much ongoing research. + In particular, the reliability of one of the key components of any peer + from our perspective -- the hard drive where file shares are stored -- + is the subject of much current debate. +\end_layout + +\begin_layout Standard +A common assumption about hardware failure is that it follows the +\begin_inset Quotes eld +\end_inset + +bathtub curve +\begin_inset Quotes erd +\end_inset + +, with frequent failures during the first few months, a constant failure + rate for a few years and then a rising failure rate as the hardware wears + out. + This curve is often flattened by burn-in stress testing, and by periodic + replacement that assures that in-service components never reach +\begin_inset Quotes eld +\end_inset + +old age +\begin_inset Quotes erd +\end_inset + +. +\end_layout + +\begin_layout Standard +In any case, we're generally going to ignore all of that complexity and + focus on the bottom of the bathtub, assuming constant failure rates. + This is a particularly reasonable assumption as long as we're focused on + failures during a particular, relatively short interval +\begin_inset Formula $A$ +\end_inset + +. + Towards the end of this paper, as we examine failures over many repair + intervals, the assumption becomes more tenuous, and we note some of the + issues. +\end_layout + +\begin_layout Subsubsection +Estimate Adaptation +\end_layout + +\begin_layout Standard +Even assuming constant failure rates, however, it will be rare that the + duration of +\begin_inset Formula $A$ +\end_inset + + coincides with the available failure rate data, particularly since we want + to view +\begin_inset Formula $A$ +\end_inset + + as a tunable parameter. + It's necessary to be able adapt failure rates baselined against any given + duration to the selected value of +\begin_inset Formula $A$ +\end_inset + +. +\end_layout + +\begin_layout Standard +Another issue is that failure rates of hardware, etc., are necessarily continuous + in nature, while the per-interval failure/survival rates that are of interest + for file reliability calculations are discrete -- a peer either survives + or fails during the interval. + The continuous nature of failure rates means that the common and obvious + methods for estimating failure rates result in values that follow continuous, + not discrete distributions. + The difference is minor for small failure probabilities, and converges + to zero as the number of intervals goes to infinity, but is important enough + in some cases to be worth correcting for. +\end_layout + +\begin_layout Standard +Continuous failure rates are described in terms of mean time to failure, + and under the assumption that failure rates are constant, are exponentially + distributed. + Under these assumptions, the probability that a machine fails at time +\begin_inset Formula $t$ +\end_inset + +, is +\begin_inset Formula \[ +f\left(t\right)=\lambda e^{-\lambda t}\] + +\end_inset + +where +\begin_inset Formula $\lambda$ +\end_inset + + represents the per unit-time failure rate. + The probability that a machine fails at or before time +\begin_inset Formula $A$ +\end_inset + + is therefore +\begin_inset Formula \begin{align} +F\left(t\right) & =\int_{0}^{A}f\left(x\right)dx\nonumber \\ + & =\int_{0}^{A}\lambda e^{-\lambda x}dx\nonumber \\ + & =1-e^{-\lambda A}\label{eq:failure-time}\end{align} + +\end_inset + + +\end_layout + +\begin_layout Standard +Note that +\begin_inset Formula $A$ +\end_inset + + and +\begin_inset Formula $\lambda$ +\end_inset + + in +\begin_inset CommandInset ref +LatexCommand ref +reference "eq:failure-time" + +\end_inset + + must be expressed in consistent time units. + If they're different, unit conversions should be applied in the normal + way. + For example, if the estimate for +\begin_inset Formula $\lambda$ +\end_inset + + is 750 failures per million hours, and +\begin_inset Formula $A$ +\end_inset + + is one month, then either +\begin_inset Formula $A$ +\end_inset + + should be represented as +\begin_inset Formula $30\cdot24/1000000=.00072$ +\end_inset + +, or +\begin_inset Formula $\lambda$ +\end_inset + + should be converted to failures per month. + Or both may be converted to hours. +\end_layout + +\begin_layout Subsubsection +Acquiring Peer Reliability Estimates +\end_layout + +\begin_layout Standard +Need to write this. +\end_layout + +\begin_layout Subsection +Uniform Reliability \begin_inset CommandInset label LatexCommand label name "sub:Fixed-Reliability" @@ -323,8 +489,8 @@ In the simplest case, the peers holding the file shares all have the same \end_inset . - Each share's survival can be viewed as an indepedent Bernoulli trial with - a succes probability of + Each share's survival can be viewed as an independent Bernoulli trial with + a success probability of \begin_inset Formula $p$ \end_inset @@ -332,7 +498,7 @@ In the simplest case, the peers holding the file shares all have the same \begin_inset Formula $K$ \end_inset - follows the binomial distribution with paramaters + follows the binomial distribution with parameters \begin_inset Formula $N$ \end_inset @@ -377,7 +543,15 @@ A Bernoulli trial is simply a test of some sort that results in one of two \begin_inset Formula $K$ \end_inset - be a random variable that represents the number of successes. + be a random variable that represents the number, +\begin_inset Formula $m$ +\end_inset + +, of successes, +\begin_inset Formula $0\le m\le n$ +\end_inset + +. We say that \begin_inset Formula $K$ \end_inset @@ -387,7 +561,8 @@ A Bernoulli trial is simply a test of some sort that results in one of two \end_inset . - The probability that + The probability mass function (PMF) of K is a function that gives the probabili +ty that \begin_inset Formula $K$ \end_inset @@ -403,9 +578,10 @@ A Bernoulli trial is simply a test of some sort that results in one of two \begin_inset Formula $n-m$ \end_inset - failures) is called the probability mass function and is given by: + failures). + The PMF of K is \begin_inset Formula \begin{equation} -Pr[K=m]=f(m;n,p)=\binom{n}{p}p^{m}(1-p)^{n-m}\label{eq:binomial-pmf}\end{equation} +Pr[K=m]=f(m;n,p)=\binom{n}{m}p^{m}(1-p)^{n-m}\label{eq:binomial-pmf}\end{equation} \end_inset @@ -455,7 +631,8 @@ Now consider any reordering of these Since multiplication is commutative, each such reordering has the same probability. There are n-choose-m such orderings, and each ordering is an independent - event, so the probability that any ordering of + event, meaning we can sum the probabilities of the individual orderings, + so the probability that any ordering of \begin_inset Formula $m$ \end_inset @@ -544,7 +721,7 @@ reference "eq:simple-failure" \end_inset - assumes that each share has the same probability of survival, but as explained + assumes that all shares have the same probability of survival, but as explained above, this is not necessarily true. A more accurate model allows each share \begin_inset Formula $s_{i}$ @@ -570,11 +747,7 @@ reference "eq:simple-failure" \end_inset and -\begin_inset Formula $p_{i}$ -\end_inset - - where -\begin_inset Formula $1\leq i\leq N$ +\begin_inset Formula $p_{1},p_{2},\dots,p_{N}$ \end_inset . @@ -589,7 +762,7 @@ The PMF for this generalized However, the PMFs for random variables representing individual share survival do. Let -\begin_inset Formula $S_{i}$ +\begin_inset Formula $K_{i}$ \end_inset be a random variable such that: @@ -597,7 +770,7 @@ The PMF for this generalized \begin_layout Standard \begin_inset Formula \[ -S_{i}=\begin{cases} +K_{i}=\begin{cases} 1 & \textnormal{if }s_{i}\textnormal{ survives}\\ 0 & \textnormal{if }s_{i}\textnormal{ fails}\end{cases}\] @@ -608,14 +781,20 @@ S_{i}=\begin{cases} \begin_layout Standard The PMF for -\begin_inset Formula $S_{i}$ +\begin_inset Formula $K_{i}$ \end_inset is very simple: \begin_inset Formula \[ -Pr[S_{i}=j]=\begin{cases} -1-p_{i} & j=0\\ -p_{i} & j=1\end{cases}\] +Pr[K_{i}=j]=\begin{cases} +p_{i} & j=1\\ +1-p_{i} & j=0\end{cases}\] + +\end_inset + + which can also be expressed as +\begin_inset Formula \[ +Pr[K_{i}=j]=f\left(j\right)=\left(1-p_{i}\right)\left(1-j\right)+p_{i}\left(j\right)\] \end_inset @@ -624,7 +803,7 @@ p_{i} & j=1\end{cases}\] \begin_layout Standard Note that since each -\begin_inset Formula $S_{i}$ +\begin_inset Formula $K_{i}$ \end_inset represents the count of shares @@ -635,16 +814,15 @@ Note that since each counts, we get the group survivor count. That is: \begin_inset Formula \[ -\sum_{i=1}^{N}S_{i}=K\] +\sum_{i=1}^{N}K_{i}=K\] \end_inset -Effectively, +Effectively, we have separated \begin_inset Formula $K$ \end_inset - has just been separated into the series of Bernoulli trials that make it - up. + into the series of Bernoulli trials that make it up. \end_layout \begin_layout Theorem @@ -709,54 +887,38 @@ where \begin_layout Proof The proof is beyond the scope of this paper. -\begin_inset Foot -status collapsed - -\begin_layout Plain Layout -\begin_inset Quotes eld -\end_inset +\end_layout -Beyond the scope of this paper -\begin_inset Quotes erd +\begin_layout Standard +If we denote the PMF of +\begin_inset Formula $K$ \end_inset - usually means -\begin_inset Quotes eld + with +\begin_inset Formula $f$ \end_inset -Too long and nasty to bore you with -\begin_inset Quotes erd + and the PMF of +\begin_inset Formula $K_{i}$ \end_inset -. - In this case it means -\begin_inset Quotes eld + with +\begin_inset Formula $g_{i}$ \end_inset -The author hasn't the foggiest idea why this is true, or how to prove it, - but reliable authorities say it's real, and in practice it works a treat. -\begin_inset Quotes erd + (more formally, +\begin_inset Formula $Pr[K=x]=f(x)$ \end_inset - -\end_layout - + and +\begin_inset Formula $Pr[K_{i}=x]=g_{i}(x)$ \end_inset - If you don't believe it's true, look it up on Wikipedia, which is never - wrong. -\end_layout - -\begin_layout Standard -Applying the discrete convolution theorem, if -\begin_inset Formula $Pr[K=i]=f(i)$ +) then since +\begin_inset Formula $K=\sum_{i=1}^{N}K_{i}$ \end_inset - and -\begin_inset Formula $Pr[S_{i}=j]=g_{i}(j)$ -\end_inset - -, then +, according to the discrete convolution theorem \begin_inset Formula $f=g_{1}\star g_{2}\star g_{3}\star\ldots\star g_{N}$ \end_inset @@ -777,7 +939,7 @@ Therefore, can be computed as a sequence of convolution operations on the simple PMFs of the random variables -\begin_inset Formula $S_{i}$ +\begin_inset Formula $K_{i}$ \end_inset . @@ -796,8 +958,13 @@ reference "eq:convolution" \begin_inset Formula $K$ \end_inset - even in the case of the standard binomial distribution, primarily because - the binomial calculation in equation + than the binomial theorem. + even in the case of shares with identical survival probability. + The reason it's better is because the calculation of +\begin_inset Formula $\binom{n}{m}$ +\end_inset + + in equation \begin_inset CommandInset ref LatexCommand ref reference "eq:binomial-pmf" @@ -811,24 +978,20 @@ reference "eq:binomial-pmf" \begin_layout Standard Note also that it is not necessary to have very simple PMFs like those of the -\begin_inset Formula $S_{i}$ +\begin_inset Formula $K_{i}$ \end_inset . Any share or set of shares that has a known PMF can be combined with any other set with a known PMF by convolution, as long as the two share sets are independent. - Since PMFs are easily represented as simple lists of probabilities, where - the -\begin_inset Formula $i$ -\end_inset - -th element in the list corresponds to -\begin_inset Formula $Pr[K=i]$ + The reverse holds as well; given a group with an empirically-derived PMF, + in it's theoretically possible to solve for an individual PMF, and thereby + determine +\begin_inset Formula $p_{i}$ \end_inset -, these functions are easily managed in software, and computing the convolution - is both simple and efficient. + even when per-share data is unavailable. \end_layout \begin_layout Subsection @@ -845,8 +1008,8 @@ name "sub:Multiple-Failure-Modes" \begin_layout Standard In modeling share survival probabilities, it's useful to be able to analyze separately each of the various failure modes. - If reliable statistics for disk failure can be obtained, then a probability - mass function for that form of failure can be generated. + For example, if reliable statistics for disk failure can be obtained, then + a probability mass function for that form of failure can be generated. Similarly, statistics on other hardware failures, administrative errors, network losses, etc., can all be estimated independently. If those estimates can then be combined into a single PMF for a share, @@ -873,7 +1036,7 @@ th failure mode of share , then \begin_inset Formula \[ -Pr[S_{i}=k]=f_{i}(k)=\begin{cases} +Pr[K_{i}=k]=f_{i}(k)=\begin{cases} \prod_{j=1}^{m}p_{i,j} & k=1\\ 1-\prod_{j=1}^{m}p_{i,j} & k=0\end{cases}\] @@ -918,12 +1081,12 @@ If there are failure modes that affect multiple computers, we can also construct \begin_inset Formula $K$ \end_inset -, a random variable representing the number of surviors is +, a random variable representing the number of survivors is \begin_inset Formula \[ -Pr[K=i]=f(i)=\begin{cases} -p & i=n\\ +Pr[K=k]=f(k)=\begin{cases} +p & k=n\\ 0 & 0