# 17.1.9.3 Algorithms (Distribution Fit)

Use the Distribution Fit to fit a distribution to a variable.

Seven distributions can be used to fit a given variable. Their parameters are estimated by Maximum Likelihood Estimation (MLE). For some of the continuous distributions, both confidence limits and goodness-of-fit tests are provided.

## Distributions and Maximum Likelihood Estimation (MLE)

### Normal Distribution

#### PDF

$\frac{1}{\sqrt{2\pi \sigma^2}}\exp [-\frac{(x-\mu)^2}{2\sigma^2}]$

where $-\infty < x < \infty$, $-\infty < \mu < \infty$ and $0 < \sigma$, with $E(X)=\mu$ and $Var(X)=\sigma^2$.

#### Maximum Likelihood Estimation (MLE)

##### Parameters
• $\hat{\mu} = \bar{X}_n$
• $\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2}$.
##### Confidence Intervals

The confidence intervals for $\mu$ and $\sigma$ are:

$\left[ \hat{\mu} - z \hat{\mu}_{se}, \hat{\mu} + z\hat{\mu}_{se} \right]$

$\left[ \frac{\hat{\sigma}}{\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right]},\hat{\sigma}\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right] \right]$

where $z$ is the $0.975$ quantile of the standard normal distribution ($z \approx 1.96$), which gives a $95\%$ confidence level, and $\hat{\mu}_{se}$ and $\hat{\sigma}_{se}$ are the standard errors of $\hat{\mu}$ and $\hat{\sigma}$.
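A minimal Python sketch of the normal fit and its two intervals follows. The standard errors $\hat{\mu}_{se} = \hat{\sigma}/\sqrt{n}$ and $\hat{\sigma}_{se} = \hat{\sigma}/\sqrt{2n}$ are the usual large-sample (inverse Fisher information) values; they are an assumption here, since the text does not say how Origin computes them. The helper name `normal_mle_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def normal_mle_ci(data, level=0.95):
    """Normal MLE and confidence intervals (hypothetical helper).

    Assumes large-sample standard errors:
    se(mu) = sigma / sqrt(n), se(sigma) = sigma / sqrt(2n).
    """
    n = len(data)
    mu = sum(data) / n
    # The MLE of sigma divides by n, not n - 1
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    mu_se = sigma / math.sqrt(n)
    sigma_se = sigma / math.sqrt(2 * n)
    mu_ci = (mu - z * mu_se, mu + z * mu_se)
    # Multiplicative interval for sigma: the lower limit stays positive
    factor = math.exp(z * sigma_se / sigma)
    sigma_ci = (sigma / factor, sigma * factor)
    return mu, sigma, mu_ci, sigma_ci


mu, sigma, mu_ci, sigma_ci = normal_mle_ci([4.1, 5.0, 5.9, 4.6, 5.4])
print(mu, sigma, mu_ci, sigma_ci)
```

The interval for $\sigma$ is symmetric on the log scale, so the product of its two limits equals $\hat{\sigma}^2$.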

### LogNormal Distribution

#### PDF

$\frac{1}{x\sqrt{2\pi \sigma^2}} \exp\left[ -\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right]$,

where $0 < x$, $-\infty < \mu < \infty$ and $0 < \sigma$, with $E(X)=\exp(\mu + \sigma^2/2)$ and $Var(X)=\exp(2(\mu + \sigma^2)) -\exp(2\mu + \sigma^2 )$.

#### Maximum Likelihood Estimation (MLE)

##### Parameters
• $\hat{\mu} = \frac{1}{n}\sum_{i=1}^n \ln X_i$
• $\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^n \left(\ln X_i - \hat{\mu}\right)^2}$.
##### Confidence Interval

The confidence intervals for $\mu$ and $\sigma$ are:

$\left[ \hat{\mu} - z \hat{\mu}_{se}, \hat{\mu} + z \hat{\mu}_{se} \right]$

$\left[ \frac{\hat{\sigma}}{\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right]},\hat{\sigma}\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right] \right]$

where $z$ is the $0.975$ quantile of the standard normal distribution ($z \approx 1.96$), which gives a $95\%$ confidence level, and $\hat{\mu}_{se}$ and $\hat{\sigma}_{se}$ are the standard errors of $\hat{\mu}$ and $\hat{\sigma}$.
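Because $\ln X$ is normally distributed when $X$ is lognormal, the lognormal MLE is the normal MLE applied to the logged data. The sketch below shows that step only; the helper name `lognormal_mle` is hypothetical, and zero or negative values are rejected because $\ln x$ is undefined for them.

```python
import math


def lognormal_mle(data):
    """Lognormal MLE (hypothetical helper): the normal MLE applied to ln(x)."""
    if any(x <= 0 for x in data):
        raise ValueError("lognormal fit requires strictly positive data")
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    # Divide by n (MLE), working entirely on the log scale
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    return mu, sigma


mu, sigma = lognormal_mle([math.e, math.e ** 3])
print(mu, sigma)  # close to 2.0 and 1.0
```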

### Weibull Distribution

#### PDF

$\frac{\beta}{\alpha^\beta}x^{\beta -1} \exp\left[ -\left(\frac{x}{\alpha}\right)^\beta\right],$

where $x \geq 0$ and $\alpha , \beta > 0$, with $E(X)=\alpha \Gamma \left(1+ \frac{1}{\beta}\right)$ and $Var(X)=\alpha ^2 \{ \Gamma \left(1+\frac{2}{\beta}\right) -\Gamma ^2 \left(1+\frac{1}{\beta} \right) \}$.

#### Maximum Likelihood Estimation (MLE)

Origin calls the NAG function nag_estim_weibull (g07bec) to compute the MLE of the Weibull parameters. Refer to the NAG documentation for details of the algorithm.
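For readers without access to the NAG library, the sketch below computes the same MLE by a different, plainly substituted route: it solves the profile likelihood equation $\frac{1}{\beta} + \frac{1}{n}\sum \ln x_i - \frac{\sum x_i^\beta \ln x_i}{\sum x_i^\beta} = 0$ for $\hat{\beta}$ by bisection, then sets $\hat{\alpha} = \left(\frac{1}{n}\sum x_i^{\hat{\beta}}\right)^{1/\hat{\beta}}$. This is not the g07bec algorithm, and the helper name `weibull_mle` is hypothetical.

```python
import math


def weibull_mle(data, tol=1e-12):
    """Weibull MLE (scale alpha, shape beta) by bisection on the profile
    likelihood equation. Illustrative only; not the NAG g07bec algorithm."""
    if any(x <= 0 for x in data) or len(set(data)) < 2:
        raise ValueError("need positive data with at least two distinct values")
    x_max = max(data)
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / len(logs)

    def g(beta):
        # sum(x^b ln x) / sum(x^b) - 1/b - mean(ln x); strictly increasing in b.
        # Weights use (x / x_max)^b to avoid overflow; the ratio is unchanged.
        w = [(x / x_max) ** beta for x in data]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1 / beta - mean_log

    lo, hi = 1e-6, 1.0                 # g(lo) < 0 because of the -1/b term
    while g(hi) < 0:                   # expand until the sign changes
        lo, hi = hi, 2 * hi
    while hi - lo > tol * hi:          # bisection on [lo, hi]
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2
    # alpha^beta = mean(x^beta), evaluated in scaled form
    mean_scaled = sum((x / x_max) ** beta for x in data) / len(data)
    alpha = x_max * mean_scaled ** (1 / beta)
    return alpha, beta


alpha, beta = weibull_mle([1.2, 2.3, 0.7, 3.1, 1.9, 2.6])
print(alpha, beta)
```

Bisection is chosen over Newton's method here only because the profile function is monotone, so a sign change is guaranteed once the bracket is wide enough.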

### Exponential Distribution

#### PDF

$\frac{1}{\sigma} \exp\left[ -\frac{x}{\sigma}\right]$,

where $0 \leq x$ and $0 < \sigma$, with $E(X)=\sigma$ and $Var(X)=\sigma^2$.

#### Maximum Likelihood Estimation (MLE)

##### Parameters

$\hat{\sigma} = \bar{X}_n$

##### Confidence Interval

The confidence interval for $\sigma$ is:

$\left[ \frac{\hat{\sigma}}{\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right]},\hat{\sigma}\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right] \right]$

where $z$ is the $0.975$ quantile of the standard normal distribution ($z \approx 1.96$), which gives a $95\%$ confidence level, and $\hat{\sigma}_{se}$ is the standard error of $\hat{\sigma}$.
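A short sketch of the exponential fit follows. It assumes the large-sample standard error $\hat{\sigma}_{se} = \hat{\sigma}/\sqrt{n}$ (not stated in the text), in which case the interval factor reduces to $\exp(z/\sqrt{n})$. The helper name `exponential_mle_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def exponential_mle_ci(data, level=0.95):
    """Exponential MLE and log-scale interval (hypothetical helper).

    Assumes the large-sample standard error se(sigma) = sigma / sqrt(n).
    """
    n = len(data)
    sigma = sum(data) / n                    # the MLE is the sample mean
    sigma_se = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    factor = math.exp(z * sigma_se / sigma)  # equals exp(z / sqrt(n))
    return sigma, (sigma / factor, sigma * factor)


sigma, ci = exponential_mle_ci([1.0, 2.0, 3.0, 4.0])
print(sigma, ci)
```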

### Gamma Distribution

#### PDF

$\frac{1}{\Gamma(\alpha)\sigma^\alpha}x^{\alpha -1} \exp(-x/\sigma),$

where $x > 0$ and $\alpha , \sigma > 0$, with $E(X)=\alpha \sigma$ and $Var(X)=\alpha \sigma ^2$.

#### Maximum Likelihood Estimation (MLE)

##### Parameters

The MLEs of $\alpha$ and $\sigma$ have no closed form, so they are computed numerically with the Newton-Raphson method. To converge to the correct root of the likelihood equation, the iteration needs a good initial estimate, which is given by:

$\alpha_0 = \frac{3-s+\sqrt{(s-3)^2+24s}}{12s},$

where $s = \ln \left(\frac{1}{n}\sum_{i=1}^{n}x_i \right) - \frac{1}{n}\sum_{i=1}^{n}\ln (x_i)$.
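A minimal Newton-Raphson sketch follows. Substituting $\sigma = \bar{x}/\alpha$ into the likelihood equations leaves the single equation $\ln\alpha - \psi(\alpha) = s$, where $\psi$ is the digamma function; once $\hat{\alpha}$ is found, $\hat{\sigma} = \bar{x}/\hat{\alpha}$. Python's standard library has no digamma or trigamma, so the sketch approximates them with the recurrence $\psi(x) = \psi(x+1) - 1/x$ plus the asymptotic series. These helpers, `gamma_mle`, and the step-halving safeguard are illustrative assumptions, not Origin's code.

```python
import math


def digamma(x):
    """psi(x) for x > 0: shift x above 10, then use the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)))


def trigamma(x):
    """psi'(x) for x > 0, computed the same way."""
    result = 0.0
    while x < 10.0:
        result += 1.0 / (x * x)    # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + 1.0 / x + 0.5 * inv2
            + inv2 / x * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30))))


def gamma_mle(data, tol=1e-12, max_iter=100):
    """Gamma MLE (shape alpha, scale sigma) by Newton-Raphson (hypothetical helper)."""
    n = len(data)
    mean = sum(data) / n
    s = math.log(mean) - sum(math.log(x) for x in data) / n
    if s <= 0:
        raise ValueError("need positive data with at least two distinct values")
    # Starting value alpha_0 from the text
    alpha = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    for _ in range(max_iter):
        f = math.log(alpha) - digamma(alpha) - s
        f_prime = 1.0 / alpha - trigamma(alpha)
        new_alpha = alpha - f / f_prime
        if new_alpha <= 0:         # safeguard: stay in the valid domain
            new_alpha = alpha / 2
        converged = abs(new_alpha - alpha) < tol * alpha
        alpha = new_alpha
        if converged:
            break
    return alpha, mean / alpha     # sigma_hat = mean / alpha_hat


alpha, sigma = gamma_mle([1.5, 2.7, 0.9, 3.8, 2.2, 1.1, 4.5])
print(alpha, sigma)
```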

##### Confidence Interval

The confidence intervals for $\alpha$ and $\sigma$ are:

$\left[ \hat{\alpha} - z \hat{\alpha}_{se}, \hat{\alpha} + z\hat{\alpha}_{se} \right]$

$\left[ \frac{\hat{\sigma}}{\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right]},\hat{\sigma}\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right] \right]$

where $z$ is the $0.975$ quantile of the standard normal distribution ($z \approx 1.96$), which gives a $95\%$ confidence level, and $\hat{\alpha}_{se}$ and $\hat{\sigma}_{se}$ are the standard errors of $\hat{\alpha}$ and $\hat{\sigma}$.

### Binomial Distribution

#### PDF

$\left( \begin{matrix} n \\ x \end{matrix}\right) p^x (1-p)^{n-x},$

where $0 \leq p \leq 1$ and $x=0,1,2,...,n$, with $E(X)=np$ and $Var(X)=np(1-p)$. Here $x$ is the observed number of successes and $n$ is the number of trials (sample size).

#### Maximum Likelihood Estimation (MLE)

##### Parameters

$\hat{p} = x/n$

##### Confidence Interval
$\left[\frac{1}{1+z^2/n}\left(\hat{p}+\frac{z^2}{2n} - z \sqrt{\frac{1}{n}\hat{p}(1-\hat{p})+\frac{z^2}{4n^2}}\right),\frac{1}{1+z^2/n}\left(\hat{p}+\frac{z^2}{2n} + z \sqrt{\frac{1}{n}\hat{p}(1-\hat{p})+\frac{z^2}{4n^2}}\right)\right]$

where $z$ is the $0.975$ quantile of the standard normal distribution ($z \approx 1.96$), which gives a $95\%$ confidence level. This is the Wilson score interval.
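The interval above can be evaluated directly; a minimal sketch follows (the helper name `binomial_wilson_ci` is hypothetical). Unlike the simple $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$ interval, its limits always stay within $[0, 1]$, even when $x = 0$ or $x = n$.

```python
import math
from statistics import NormalDist


def binomial_wilson_ci(x, n, level=0.95):
    """p_hat = x / n and its Wilson score interval (hypothetical helper)."""
    p = x / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return p, ((center - half) / denom, (center + half) / denom)


p_hat, (low, high) = binomial_wilson_ci(7, 20)
print(p_hat, low, high)
```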

### Poisson Distribution

#### PDF

$e^{-\lambda}\frac{{\lambda}^x}{x!},$

where $x=0,1,2,...$ and $\lambda > 0$, with $E(X)=Var(X)=\lambda$.

#### Maximum Likelihood Estimation (MLE)

##### Parameters

$\hat{\lambda} = \frac{1}{n}\sum_{k=1}^{n}x_k$.

##### Confidence Interval

The confidence interval for $\lambda$ is:

$\left[ \hat{\lambda} - z \sqrt{\hat{\lambda}}, \hat{\lambda} + z \sqrt{\hat{\lambda}} \right]$

where $z$ is the $0.975$ quantile of the standard normal distribution ($z \approx 1.96$), which gives a $95\%$ confidence level.
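A minimal sketch that evaluates $\hat{\lambda}$ and the interval exactly as written above (the helper name `poisson_mle_ci` is hypothetical):

```python
import math
from statistics import NormalDist


def poisson_mle_ci(data, level=0.95):
    """lambda_hat (sample mean) and the interval as written above (hypothetical helper)."""
    lam = sum(data) / len(data)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(lam)  # half-width z * sqrt(lambda_hat), per the formula
    return lam, (lam - half, lam + half)


lam, ci = poisson_mle_ci([2, 4, 3, 3])
print(lam, ci)
```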

## Goodness of Fit

### Kolmogorov-Smirnov

Origin calls the NAG function nag_1_sample_ks_test (g08cbc) to compute the statistic. Refer to the NAG documentation for details of the algorithm.
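For reference, the one-sample statistic $D_n$ is the largest vertical distance between the empirical CDF of the sorted data and the model CDF. The sketch below computes it directly and is not the NAG routine; `ks_statistic` is a hypothetical name, and `cdf` can be any callable, for example `NormalDist(mu, sigma).cdf` from Python's `statistics` module.

```python
from statistics import NormalDist


def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n = max(D+, D-)."""
    y = sorted(data)
    n = len(y)
    # D+: largest amount the empirical CDF rises above the model CDF
    d_plus = max((i + 1) / n - cdf(v) for i, v in enumerate(y))
    # D-: largest amount the model CDF rises above the empirical CDF
    d_minus = max(cdf(v) - i / n for i, v in enumerate(y))
    return max(d_plus, d_minus)


d_n = ks_statistic([4.1, 5.0, 5.9, 4.6, 5.4], NormalDist(5.0, 0.62).cdf)
print(d_n)
```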

### Kolmogorov-Smirnov (Modified)

• Modified Kolmogorov-Smirnov Statistic

The modified Kolmogorov-Smirnov statistic $D$ adjusts the Kolmogorov-Smirnov statistic with a sample-size correction whose form depends on the fitted distribution.

• P-value

The p-value for the modified statistic is computed from the critical values tables below, provided by D’Agostino and Stephens (1986). If $D$ falls between two tabulated critical values, the p-value is estimated by linear interpolation between the corresponding probability levels.

Here $D_n$ is the Kolmogorov-Smirnov statistic and $N$ is the sample size.

#### Normal/Lognormal Distribution

• Modified Kolmogorov-Smirnov Statistic:
$D=D_n\left(\sqrt{N}-0.01+\frac{0.85}{\sqrt{N}}\right)$
• Critical Values Table

| $D$ | P-value |
|---|---|
| < 0.775 | ≥ 0.15 |
| 0.775 | 0.15 |
| 0.819 | 0.1 |
| 0.895 | 0.05 |
| 0.995 | 0.025 |
| 1.035 | 0.01 |
| > 1.035 | ≤ 0.01 |
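The table lookup with linear interpolation can be sketched as follows for the normal/lognormal case. The helper names are hypothetical, and returning a `(p, relation)` pair to flag the open-ended ≥ and ≤ rows is a design choice of this sketch; the same `table_p_value` helper would work with the values from any of the other tables.

```python
import math

# Normal/lognormal critical values from the table above (D increasing, p decreasing)
NORMAL_KS_D = [0.775, 0.819, 0.895, 0.995, 1.035]
NORMAL_KS_P = [0.15, 0.10, 0.05, 0.025, 0.01]


def modified_ks_normal(d_n, n):
    """Modified statistic D for the normal/lognormal case."""
    root_n = math.sqrt(n)
    return d_n * (root_n - 0.01 + 0.85 / root_n)


def table_p_value(stat, crit, probs):
    """Return (p, relation): '>=' or '<=' outside the table, '=' when p is
    interpolated linearly between two tabulated critical values."""
    if stat < crit[0]:
        return probs[0], ">="
    if stat > crit[-1]:
        return probs[-1], "<="
    for k in range(len(crit) - 1):
        if crit[k] <= stat <= crit[k + 1]:
            t = (stat - crit[k]) / (crit[k + 1] - crit[k])
            return probs[k] + t * (probs[k + 1] - probs[k]), "="


d = modified_ks_normal(0.09, 50)
print(d, table_p_value(d, NORMAL_KS_D, NORMAL_KS_P))
```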

#### Weibull distribution

• Modified Kolmogorov-Smirnov Statistic:
$D=D_n\sqrt{N}$
• Critical Values Table

| $D$ | P-value |
|---|---|
| < 1.372 | ≥ 0.1 |
| 1.372 | 0.1 |
| 1.477 | 0.05 |
| 1.577 | 0.025 |
| 1.671 | 0.01 |
| > 1.671 | ≤ 0.01 |

#### Exponential Distribution

• Modified Kolmogorov-Smirnov Statistic:
$D=\left(D_n-\frac{0.2}{N}\right)\left(\sqrt{N}+0.26+\frac{0.5}{\sqrt{N}}\right)$
• Critical Values Table

| $D$ | P-value |
|---|---|
| < 0.926 | ≥ 0.15 |
| 0.926 | 0.15 |
| 0.995 | 0.1 |
| 1.094 | 0.05 |
| 1.184 | 0.025 |
| 1.298 | 0.01 |
| > 1.298 | ≤ 0.01 |

#### Gamma Distribution

• Modified Kolmogorov-Smirnov Statistic:
$D=D_n\left(\sqrt{N}+\frac{0.3}{\sqrt{N}}\right)$
• Critical Values Table

| $D$ | P-value |
|---|---|
| < 0.74 | ≥ 0.25 |
| 0.74 | 0.25 |
| 0.78 | 0.2 |
| 0.8 | 0.15 |
| 0.858 | 0.1 |
| 0.928 | 0.05 |
| 0.99 | 0.025 |
| 1.069 | 0.01 |
| 1.13 | 0.005 |
| > 1.13 | ≤ 0.005 |

### Anderson-Darling

• Anderson-Darling Statistics
$z=-N-\sum_{i=1}^{N}\frac{2i-1}{N}\left[\ln F(Y_i)+\ln\left(1-F(Y_{N+1-i})\right)\right]$
where
• $F$ is the cumulative distribution function of the specified distribution
• $Y_i$ are the ordered data points: $Y_{1} \leq Y_2 \leq ... \leq Y_{N-1} \leq Y_N$
• $N$ is the sample size
• P-value
The p-value for the adjusted Anderson-Darling statistic is computed from the formulas or critical values tables below, provided by D’Agostino and Stephens (1986). If the statistic falls between two tabulated critical values, linear interpolation is used to estimate the p-value.
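The statistic $z$ can be computed directly from the sorted data, as in the sketch below. Note the index pairing: term $i$ combines the $i$-th smallest and the $i$-th largest observation. The helper name `anderson_darling` is hypothetical, and `cdf` must return values strictly between 0 and 1, otherwise a logarithm is undefined.

```python
import math


def anderson_darling(data, cdf):
    """Anderson-Darling statistic z for a fully specified CDF (hypothetical helper).

    cdf must return values strictly between 0 and 1 at every data point.
    """
    y = sorted(data)
    n = len(y)
    total = 0.0
    for i in range(1, n + 1):       # i is 1-based, as in the formula
        f_low = cdf(y[i - 1])       # F(Y_i)
        f_high = cdf(y[n - i])      # F(Y_{N+1-i})
        total += (2 * i - 1) * (math.log(f_low) + math.log(1 - f_high))
    return -n - total / n


z = anderson_darling([0.25, 0.75], lambda t: t)  # uniform CDF on [0, 1]
print(z)
```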

#### Normal/Lognormal Distribution

$z^*=z\left(1 + \frac{0.75}{N}+\frac{2.25}{N^2}\right)$
• P-value
$p=\begin{cases} 1-e^{-13.436+101.14z^{*}-223.73z^{*2}}, z^{*} \leq 0.2\\ 1-e^{-8.318+42.796z^{*}-59.938z^{*2}}, 0.2 < z^{*} \leq 0.34\\ e^{0.9177-4.279z^{*}-1.38z^{*2}}, 0.34 < z^{*} \leq 0.6\\ e^{1.2937-5.709z^{*}+0.0186z^{*2}}, 0.6 < z^{*} \leq 153.467\\ 0, z^{*} > 153.467 \end{cases}$
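A sketch of the normal/lognormal p-value calculation follows: it applies the small-sample adjustment to $z$ and then evaluates the piecewise formula. The code uses $e^{1.2937-5.709z^{*}+0.0186z^{*2}}$ for $0.6 < z^{*} \leq 153.467$ and returns 0 above that point, mirroring the cutoff used in the exponential case (153.467 is where that exponent stops decreasing); treat this boundary handling as an assumption. The helper name `ad_p_value_normal` is hypothetical.

```python
import math


def ad_p_value_normal(z, n):
    """P-value for the normal/lognormal Anderson-Darling test (hypothetical helper)."""
    zs = z * (1 + 0.75 / n + 2.25 / n ** 2)   # adjusted statistic z*
    if zs <= 0.2:
        return 1 - math.exp(-13.436 + 101.14 * zs - 223.73 * zs ** 2)
    if zs <= 0.34:
        return 1 - math.exp(-8.318 + 42.796 * zs - 59.938 * zs ** 2)
    if zs <= 0.6:
        return math.exp(0.9177 - 4.279 * zs - 1.38 * zs ** 2)
    if zs <= 153.467:
        return math.exp(1.2937 - 5.709 * zs + 0.0186 * zs ** 2)
    return 0.0   # assumed cutoff, see the lead-in


print(ad_p_value_normal(0.45, 30))
```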

#### Weibull distribution

$z^{*}=z\left(1+\frac{0.2}{N}\right)$
• Critical Values Table

| $z^{*}$ | P-value |
|---|---|
| < 0.474 | ≥ 0.25 |
| 0.474 | 0.25 |
| 0.637 | 0.1 |
| 0.757 | 0.05 |
| 0.877 | 0.025 |
| 1.038 | 0.01 |
| > 1.038 | ≤ 0.01 |

#### Exponential Distribution

$z^{*}=z\left(1+\frac{0.6}{N}\right)$
• P-value
$p=\begin{cases} 1-e^{-12.2204+67.459z^{*}-110.3z^{*2}}, z^{*} \leq 0.26\\ 1-e^{-6.1327+20.218z^{*}-18.663z^{*2}}, 0.26 < z^{*} \leq 0.51\\ e^{0.9209-3.353z^{*}-0.3z^{*2}}, 0.51 < z^{*} \leq 0.95\\ e^{0.731-3.009z^{*}+0.15z^{*2}}, 0.95 < z^{*} \leq 10.03\\ 0, z^{*} \geq 10.03 \end{cases}$

#### Gamma Distribution

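• Critical Values Tables

| $z$ | P-value |
|---|---|
| < 0.486 | ≥ 0.25 |
| 0.486 | 0.25 |
| 0.657 | 0.1 |
| 0.786 | 0.05 |
| 0.917 | 0.025 |
| 1.092 | 0.01 |
| 1.227 | 0.005 |
| > 1.227 | ≤ 0.005 |

| $z$ | P-value |
|---|---|
| < 0.473 | ≥ 0.25 |
| 0.473 | 0.25 |
| 0.637 | 0.1 |
| 0.759 | 0.05 |
| 0.883 | 0.025 |
| 1.048 | 0.01 |
| 1.173 | 0.005 |
| > 1.173 | ≤ 0.005 |

| $z$ | P-value |
|---|---|
| < 0.470 | ≥ 0.25 |
| 0.470 | 0.25 |
| 0.631 | 0.1 |
| 0.752 | 0.05 |
| 0.873 | 0.025 |
| 1.035 | 0.01 |
| 1.159 | 0.005 |
| > 1.159 | ≤ 0.005 |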

## Mean Test

### Z-Test

#### Test Statistics

$z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$

where

• $\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i$: The sample mean
• $n$: The sample size
• $\mu_0$: The specified test mean
• $\sigma$: The specified standard deviation

#### P-Value

The p-value is computed from the standard normal distribution of the test statistic $z$.

#### Confidence Intervals

For a significance level $\alpha$, the confidence interval for the population mean is:

| Null Hypothesis | Confidence Interval |
|---|---|
| $H_0:\mu=\mu_0$ | $\left[\bar{x}-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x}+Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$ |
| $H_0:\mu \le \mu_0$ | $\left[\bar{x}-Z_{\alpha}\frac{\sigma}{\sqrt{n}},\ \infty\right)$ |
| $H_0:\mu \ge \mu_0$ | $\left(-\infty,\ \bar{x}+Z_{\alpha}\frac{\sigma}{\sqrt{n}}\right]$ |

where $Z_{\alpha/2}$ and $Z_{\alpha}$ are the upper $\alpha/2$ and upper $\alpha$ critical values of the standard normal distribution.
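The Z-test steps can be sketched end to end as follows. Assumptions, since the text does not spell them out: the two-sided p-value is $2(1-\Phi(|z|))$, the one-sided p-values are the corresponding single tails, and the one-sided bounds use the critical value $Z_{\alpha}$. The helper `z_test` and its `tail` values (`"two"`, `"upper"`, `"lower"`) are hypothetical.

```python
import math
from statistics import NormalDist


def z_test(data, mu0, sigma, alpha=0.05, tail="two"):
    """One-sample Z-test with known sigma (hypothetical helper)."""
    n = len(data)
    xbar = sum(data) / n
    se = sigma / math.sqrt(n)
    z = (xbar - mu0) / se
    std = NormalDist()
    if tail == "two":        # H0: mu = mu0
        p = 2 * (1 - std.cdf(abs(z)))
        q = std.inv_cdf(1 - alpha / 2)
        ci = (xbar - q * se, xbar + q * se)
    elif tail == "upper":    # H0: mu <= mu0, lower confidence bound
        p = 1 - std.cdf(z)
        q = std.inv_cdf(1 - alpha)
        ci = (xbar - q * se, math.inf)
    elif tail == "lower":    # H0: mu >= mu0, upper confidence bound
        p = std.cdf(z)
        q = std.inv_cdf(1 - alpha)
        ci = (-math.inf, xbar + q * se)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z, p, ci


z, p, ci = z_test([6.0, 6.0, 6.0, 6.0], mu0=5.0, sigma=2.0)
print(z, p, ci)
```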