From: Shawn Willden Date: Thu, 15 Jan 2009 04:00:58 +0000 (-0700) Subject: Loss model work (temp1) X-Git-Tag: allmydata-tahoe-1.4.0~227 X-Git-Url: https://git.rkrishnan.org/pf/reliability?a=commitdiff_plain;h=d65b0ff91138c4134e2d5b30732c96454eacfd1f;p=tahoe-lafs%2Ftahoe-lafs.git Loss model work (temp1) --- diff --git a/docs/lossmodel.lyx b/docs/lossmodel.lyx index d2d852bf..8a438e55 100644 --- a/docs/lossmodel.lyx +++ b/docs/lossmodel.lyx @@ -20,6 +20,7 @@ theorems-ams-extended \font_tt_scale 100 \graphics default +\float_placement h \paperfontsize default \spacing single \use_hyperref false @@ -54,6 +55,14 @@ Tahoe Distributed Filesharing System Loss Model Shawn Willden \end_layout +\begin_layout Date +01/14/2009 +\end_layout + +\begin_layout Address +South Weber, Utah +\end_layout + \begin_layout Email shawn@willden.org \end_layout @@ -91,11 +100,11 @@ Over time shares are lost for a variety of reasons. and determines how many of the shares remain. If less than -\begin_inset Formula $R$ +\begin_inset Formula $L$ \end_inset ( -\begin_inset Formula $k\leq R\leq N$ +\begin_inset Formula $k\leq L\leq N$ \end_inset ) shares remain, then the repairer reconstructs the file shares and redistribute @@ -123,7 +132,7 @@ The question we're trying to answer is "What's the probability that we'll \end_inset , and -\begin_inset Formula $R$ +\begin_inset Formula $L$ \end_inset in order to ensure @@ -153,7 +162,7 @@ The question we're trying to answer is "What's the probability that we'll \end_inset , and setting -\begin_inset Formula $R=N$ +\begin_inset Formula $L=N$ \end_inset , these choices have costs. @@ -173,7 +182,12 @@ The question we're trying to answer is "What's the probability that we'll \begin_inset Formula $N,$ \end_inset - but at a cost in bandwidth. + but at a cost in bandwidth as the repair agent downloads +\begin_inset Formula $k$ +\end_inset + + shares to reconstruct the file and uploads new shares to replace those + that are lost. \end_layout \begin_layout Section @@ -283,6 +297,13 @@ decay \begin_layout Subsection Fixed Reliability +\begin_inset CommandInset label +LatexCommand label +name "sub:Fixed-Reliability" + +\end_inset + + \end_layout \begin_layout Standard @@ -319,18 +340,143 @@ In the simplest case, the peers holding the file shares all have the same \begin_inset Formula $p$ \end_inset - ( +. + That is, \begin_inset Formula $K\sim B(N,p)$ \end_inset -). - The probability mass function (PMF) of the binomial distribution is: +. +\end_layout + +\begin_layout Theorem +Binomial Distribution Theorem +\end_layout + +\begin_layout Theorem +Consider +\begin_inset Formula $n$ +\end_inset + + independent Bernoulli trials +\begin_inset Foot +status collapsed + +\begin_layout Plain Layout +A Bernoulli trial is simply a test of some sort that results in one of two + outcomes, one of which is designated success and the other failure. + The classic example of a Bernoulli trial is a coin toss. +\end_layout + +\end_inset + + that succeed with probability +\begin_inset Formula $p$ +\end_inset + +, and let +\begin_inset Formula $K$ +\end_inset + + be a random variable that represents the number of successes. + We say that +\begin_inset Formula $K$ +\end_inset + + follows the Binomial Distribution with parameters n and p, denoted +\begin_inset Formula $K\sim B(n,p)$ +\end_inset + +. + The probability that +\begin_inset Formula $K$ +\end_inset + + takes a particular value +\begin_inset Formula $m$ +\end_inset + + (the probability that there are exactly +\begin_inset Formula $m$ +\end_inset + + successful trials, and therefore +\begin_inset Formula $n-m$ +\end_inset + + failures) is called the probability mass function and is given by: \begin_inset Formula \begin{equation} -Pr(K=i)=f(i;N,p)=\binom{n}{i}p^{i}(1-p)^{n-i}\label{eq:binomial-pdf}\end{equation} +Pr[K=m]=f(m;n,p)=\binom{n}{p}p^{m}(1-p)^{n-m}\label{eq:binomial-pmf}\end{equation} + +\end_inset + + +\end_layout + +\begin_layout Proof +Consider the specific case of exactly +\begin_inset Formula $m$ +\end_inset + + successes followed by +\begin_inset Formula $n-m$ +\end_inset + + failures, because each success has probability +\begin_inset Formula $p$ +\end_inset + +, each failure has probability +\begin_inset Formula $1-p$ +\end_inset +, and the trials are independent, the probability of this exact case occurring + is +\begin_inset Formula $p^{m}\left(1-p\right)^{\left(n-m\right)}$ \end_inset +, the product of the probabilities of the outcome of each trial. +\end_layout + +\begin_layout Proof +Now consider any reordering of these +\begin_inset Formula $m$ +\end_inset + + successes and +\begin_inset Formula $n$ +\end_inset + + failures. + Any such reordering occurs with the same probability +\begin_inset Formula $p^{m}\left(1-p\right)^{\left(n-m\right)}$ +\end_inset + +, but with the terms of the product reordered. + Since multiplication is commutative, each such reordering has the same + probability. + There are n-choose-m such orderings, and each ordering is an independent + event, so the probability that any ordering of +\begin_inset Formula $m$ +\end_inset + + successes and +\begin_inset Formula $n-m$ +\end_inset + + failures occurs is given by +\begin_inset Formula \[ +\binom{n}{m}p^{m}\left(1-p\right)^{\left(n-m\right)}\] + +\end_inset + +which is the right-hand-side of equation +\begin_inset CommandInset ref +LatexCommand ref +reference "eq:binomial-pmf" +\end_inset + +. \end_layout \begin_layout Standard @@ -346,7 +492,7 @@ A file survives if at least Equation \begin_inset CommandInset ref LatexCommand ref -reference "eq:binomial-pdf" +reference "eq:binomial-pmf" \end_inset @@ -354,7 +500,11 @@ reference "eq:binomial-pdf" \begin_inset Formula $i$ \end_inset - shares survive, so the probability that fewer than + shares survive, for any +\begin_inset Formula $1\leq i\leq n$ +\end_inset + +, so the probability that fewer than \begin_inset Formula $k$ \end_inset @@ -368,7 +518,7 @@ reference "eq:binomial-pdf" \begin_layout Standard \begin_inset Formula \begin{equation} -Pr[failure]=\sum_{i=0}^{k-1}\binom{n}{i}p^{i}(1-p)^{n-i}\label{eq:simple-failure}\end{equation} +Pr[file\, lost]=\sum_{i=0}^{k-1}\binom{n}{i}p^{i}(1-p)^{n-i}\label{eq:simple-failure}\end{equation} \end_inset @@ -377,6 +527,13 @@ Pr[failure]=\sum_{i=0}^{k-1}\binom{n}{i}p^{i}(1-p)^{n-i}\label{eq:simple-failure \begin_layout Subsection Independent Reliability +\begin_inset CommandInset label +LatexCommand label +name "sub:Independent-Reliability" + +\end_inset + + \end_layout \begin_layout Standard @@ -388,7 +545,7 @@ reference "eq:simple-failure" \end_inset assumes that each share has the same probability of survival, but as explained - above, this is not typically true. + above, this is not necessarily true. A more accurate model allows each share \begin_inset Formula $s_{i}$ \end_inset @@ -408,12 +565,16 @@ reference "eq:simple-failure" \begin_inset Formula $K$ \end_inset - follows a generalized distribution with parameters + follows a generalized binomial distribution with parameters \begin_inset Formula $N$ \end_inset and -\begin_inset Formula $p_{i},1\leq i\leq N$ +\begin_inset Formula $p_{i}$ +\end_inset + + where +\begin_inset Formula $1\leq i\leq N$ \end_inset . @@ -447,27 +608,38 @@ S_{i}=\begin{cases} \begin_layout Standard The PMF for -\begin_inset Formula $Si$ +\begin_inset Formula $S_{i}$ \end_inset - is very simple, -\begin_inset Formula $Pr(S_{i}=1)=p_{i}$ -\end_inset + is very simple: +\begin_inset Formula \[ +Pr[S_{i}=j]=\begin{cases} +1-p_{i} & j=0\\ +p_{i} & j=1\end{cases}\] - and -\begin_inset Formula $Pr(S_{i}=0)=p_{i}$ \end_inset -. + \end_layout \begin_layout Standard -Observe that -\begin_inset Formula $\sum_{i=1}^{N}S_{i}=K$ +Note that since each +\begin_inset Formula $S_{i}$ \end_inset -. - Effectively, + represents the count of shares +\begin_inset Formula $s_{i}$ +\end_inset + + that survives (either 0 or 1), if we add up all of the individual survivor + counts, we get the group survivor count. + That is: +\begin_inset Formula \[ +\sum_{i=1}^{N}S_{i}=K\] + +\end_inset + +Effectively, \begin_inset Formula $K$ \end_inset @@ -475,8 +647,12 @@ Observe that up. \end_layout -\begin_layout Standard -The discrete convolution theorem states that given random variables +\begin_layout Theorem +Discrete Convolution Theorem +\end_layout + +\begin_layout Theorem +Let \begin_inset Formula $X$ \end_inset @@ -484,52 +660,55 @@ The discrete convolution theorem states that given random variables \begin_inset Formula $Y$ \end_inset - and their sum -\begin_inset Formula $Z=X+Y$ + be discrete random variables with probability mass functions given by +\begin_inset Formula $Pr\left[X=x\right]=f(x)$ \end_inset -, if -\begin_inset Formula $Pr[X=x]=f(x)$ + and +\begin_inset Formula $Pr\left[Y=y\right]=g(y).$ \end_inset - and -\begin_inset Formula $Pr[Y=y]=f(y)$ + Let +\begin_inset Formula $Z$ \end_inset - then -\begin_inset Formula $Pr[Z=z]=(f\star g)(z)$ + be the discrete random random variable obtained by summing +\begin_inset Formula $X$ \end_inset - where -\begin_inset Formula $\star$ + and +\begin_inset Formula $Y$ \end_inset - denotes the convolution operation. - Stated in English, the probability mass function of the sum of two random - variables is the convolution of the probability mass functions of the two - random variables. +. \end_layout -\begin_layout Standard -Discrete convolution is defined as -\end_layout +\begin_layout Theorem +The probability mass function of +\begin_inset Formula $Z$ +\end_inset -\begin_layout Standard + is given by \begin_inset Formula \[ -(f\star g)(n)=\sum_{m=-\infty}^{\infty}f(m)\cdot g(n-m)\] +Pr[Z=z]=h(z)=\left(f\star g\right)(z)\] \end_inset +where +\begin_inset Formula $\star$ +\end_inset + + denotes the discrete convolution operation: +\begin_inset Formula \[ +\left(f\star g\right)\left(n\right)=\sum_{m=-\infty}^{\infty}f\left(m\right)g\left(m-n\right)\] + +\end_inset -\end_layout -\begin_layout Standard -The infinite summation is no problem because the probability mass functions - we need to convolve are zero outside of a small range. \end_layout \begin_layout Standard -According to the discrete convolution theorem, then, if +Applying to the discrete convolution theorem, if \begin_inset Formula $Pr[K=i]=f(i)$ \end_inset @@ -538,10 +717,6 @@ According to the discrete convolution theorem, then, if \end_inset , then -\begin_inset Formula $ $ -\end_inset - - \begin_inset Formula $f=g_{1}\star g_{2}\star g_{3}\star\ldots\star g_{N}$ \end_inset @@ -556,21 +731,21 @@ f=(g_{1}\star g_{2})\star g_{3})\star\ldots)\star g_{N})\label{eq:convolution}\e \end_inset -which enables +Therefore, \begin_inset Formula $f$ \end_inset - to be implemented as a sequence of convolution operations on the simple - PMFs of the random variables + can be computed as a sequence of convolution operations on the simple PMFs + of the random variables \begin_inset Formula $S_{i}$ \end_inset . - In fact, as values of + In fact, for large \begin_inset Formula $N$ \end_inset - get large, equation + equation \begin_inset CommandInset ref LatexCommand ref reference "eq:convolution" @@ -585,7 +760,7 @@ reference "eq:convolution" the binomial calculation in equation \begin_inset CommandInset ref LatexCommand ref -reference "eq:binomial-pdf" +reference "eq:binomial-pmf" \end_inset @@ -627,6 +802,13 @@ th element in the list corresponds to \begin_layout Subsection Multiple Failure Modes +\begin_inset CommandInset label +LatexCommand label +name "sub:Multiple-Failure-Modes" + +\end_inset + + \end_layout \begin_layout Standard @@ -636,13 +818,12 @@ In modeling share survival probabilities, it's useful to be able to analyze mass function for that form of failure can be generated. Similarly, statistics on other hardware failures, administrative errors, network losses, etc., can all be estimated independently. - If those estimates can then be combined into a single PMF for that server, - then we can use it to predict failures for that server. + If those estimates can then be combined into a single PMF for a share, + then we can use it to predict failures for that share. \end_layout \begin_layout Standard -In the case of independent failure modes for a single server, this is very - simple to do. +Combining independent failure modes for a single share is straightforward. If \begin_inset Formula $p_{i,j}$ \end_inset @@ -651,37 +832,1069 @@ In the case of independent failure modes for a single server, this is very \begin_inset Formula $j$ \end_inset -th failure mode of server +th failure mode of share \begin_inset Formula $i$ \end_inset -, and there are -\begin_inset Formula $m$ +, +\begin_inset Formula $1\leq j\leq m$ +\end_inset + +, then +\begin_inset Formula \[ +Pr[S_{i}=k]=f_{i}(k)=\begin{cases} +\prod_{j=1}^{m}p_{i,j} & k=1\\ +1-\prod_{j=1}^{m}p_{i,j} & k=0\end{cases}\] + +\end_inset + +is the survival PMF. +\end_layout + +\begin_layout Subsection +Multi-share failures +\begin_inset CommandInset label +LatexCommand label +name "sub:Multi-share-failures" + +\end_inset + + +\end_layout + +\begin_layout Standard +If there are failure modes that affect multiple computers, we can also construct + the PMF that predicts their survival. + The key observation is that the PMF has non-zero probabilities only for + +\begin_inset Formula $0$ +\end_inset + + survivors and +\begin_inset Formula $n$ +\end_inset + + survivors, where +\begin_inset Formula $n$ +\end_inset + + is the number of shares in the set. + If +\begin_inset Formula $p$ +\end_inset + + is the probability of survival, the PMF of +\begin_inset Formula $K$ \end_inset - failure modes then +, a random variable representing the number of surviors is \begin_inset Formula \[ -p_{i}=\prod_{j=1}^{m}p_{i,j}\] +Pr[K=i]=f(i)=\begin{cases} +p & i=n\\ +0 & 0 + + + + + + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $k$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $Pr[K=k]$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $Pr[file\, loss]=Pr[K + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $N/k$ +\end_inset + + +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +1 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $1.60\times10^{-9}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $2.53\times10^{-11}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +12 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +2 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $3.80\times10^{-8}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $1.63\times10^{-9}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +6 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +3 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $4.04\times10^{-7}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $3.70\times10^{-8}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +4 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +4 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $2.06\times10^{-6}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $4.44\times10^{-7}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +3 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +5 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $2.10\times10^{-5}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $2.50\times10^{-6}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +2.4 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +6 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.000428$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $2.35\times10^{-5}$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +2 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +7 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.00417$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.000452$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +1.7 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +8 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.0157$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.00462$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +1.5 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +9 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.00127$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.0203$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +1.3 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +10 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.0230$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.0216$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +1.2 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +11 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.208$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.0446$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +1.1 +\end_layout + +\end_inset + + + + +\begin_inset Text + +\begin_layout Plain Layout +12 +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.747$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +\begin_inset Formula $0.253$ +\end_inset + + +\end_layout + +\end_inset + + +\begin_inset Text + +\begin_layout Plain Layout +1 +\end_layout + +\end_inset + + + + +\end_inset + + +\end_layout + +\begin_layout Plain Layout +\begin_inset Caption + +\begin_layout Plain Layout +\align left +\begin_inset CommandInset label +LatexCommand label +name "tab:Example-PMF" + +\end_inset + +Example PMF +\end_layout + +\end_inset + + +\end_layout + +\begin_layout Plain Layout + +\end_layout + +\end_inset + + +\end_layout + +\begin_layout Standard +The table demonstrates the importance of the selection of +\begin_inset Formula $k$ +\end_inset + +, and the tradeoff against file size expansion. + Note that the survival of exactly 9 servers is significantly less likely + than the survival of 8 or 10 servers. + This is, again, an artifact of the group failure modes. + Because of this, there is no reason to choose +\begin_inset Formula $k=9$ +\end_inset + + over +\begin_inset Formula $k=10$ +\end_inset + +. + Normally, reducing the number of shares needed for reassembly improve the + file's chances of survival, but in this case it provides a miniscule gain + in reliability at the cost of a 10% increase in bandwidth and storage consumed. +\end_layout + +\begin_layout Section +Long-Term Reliability +\end_layout + +\begin_layout Standard +Thus far, we've focused entirely on the probability that a file survives + the interval +\begin_inset Formula $A$ +\end_inset + + between repair times. + The probability that a file survives long-term, though, is also important. + As long as the probability of failure during a repair period is non-zero, + a given file will eventually be lost. + We want to know what the probability of surviving for time +\begin_inset Formula $T$ +\end_inset + + is, and how the parameters +\begin_inset Formula $A$ +\end_inset + + (time between repairs) and +\begin_inset Formula $L$ +\end_inset + + (share low watermark) affect survival time. +\end_layout + +\begin_layout Standard +To model file survival time, let +\begin_inset Formula $T$ +\end_inset + + be a random variable denoting the time at which a given file becomes unrecovera +ble, and +\begin_inset Formula $R(t)=Pr[T>t]$ +\end_inset + + be a function that gives the probability that the file survives to time + +\begin_inset Formula $t$ +\end_inset + +. + +\begin_inset Formula $R(t)$ +\end_inset + + is the cumulative distribution function of +\begin_inset Formula $T$ +\end_inset + +. +\end_layout + +\begin_layout Standard +Most survival functions are continuous, but +\begin_inset Formula $R(t)$ +\end_inset + + is inherently discrete, and stochastic. + The time steps are the repair intervals +\end_layout + +\begin_layout Section +Time-Sensitive Retrieval +\end_layout + +\begin_layout Standard +The above work has almost entirely ignored the distinction between availability + and reliability. + In reality, temporary and permanent failures need to be modeled separately, + and \end_layout \end_body