# 17.9.2 Algorithms (ROC Curve)

In this part, the following notation is used:

- $x_j$: Test result score for case $j$
- $n_{TP}$: Number of true positive decisions
- $n_{FN}$: Number of false negative decisions
- $n_{TN}$: Number of true negative decisions
- $n_{FP}$: Number of false positive decisions
- $n_{-}$: Number of cases with negative actual state
- $n_{+}$: Number of cases with positive actual state
- $n_{-=j}$: Number of cases with negative actual state whose test result equals $x_j$
- $n_{+>j}$: Number of cases with positive actual state whose test result is greater than $x_j$
- $n_{+=j}$: Number of cases with positive actual state whose test result equals $x_j$
- $n_{-<j}$: Number of cases with negative actual state whose test result is less than $x_j$

1. ROC Values

1 - Specificity (X): $1-\frac{n_{TN}}{n_{TN}+n_{FP}}\,\!$

Sensitivity (Y): $\frac{n_{TP}}{n_{TP}+n_{FN}}\,\!$
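As an illustration, the sketch below computes the ROC points from two lists of test scores, one for cases with positive actual state and one for cases with negative actual state. It assumes that a case is classified as positive when its score is greater than or equal to the cutoff, and that the cutoffs are the distinct observed scores taken from highest to lowest; the function name `roc_points` and the sample scores are hypothetical.

```python
def roc_points(pos_scores, neg_scores):
    """Return (1 - specificity, sensitivity) pairs, one per cutoff.

    Assumption (not taken from the source): a case is classified positive
    when score >= cutoff; cutoffs are the distinct scores, highest first.
    """
    cutoffs = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every score: nothing called positive
    for c in cutoffs:
        n_tp = sum(s >= c for s in pos_scores)
        n_fn = len(pos_scores) - n_tp
        n_fp = sum(s >= c for s in neg_scores)
        n_tn = len(neg_scores) - n_fp
        x = 1 - n_tn / (n_tn + n_fp)  # 1 - specificity
        y = n_tp / (n_tp + n_fn)      # sensitivity
        points.append((x, y))
    return points


pos = [5, 4, 4, 3, 2]  # hypothetical scores, positive actual state
neg = [4, 3, 2, 1, 1]  # hypothetical scores, negative actual state
print([(round(x, 3), round(y, 3)) for x, y in roc_points(pos, neg)])
# [(0.0, 0.0), (0.0, 0.2), (0.2, 0.6), (0.4, 0.8), (0.6, 1.0), (1.0, 1.0)]
```

The lowest cutoff classifies every case as positive, so the curve always ends at (1, 1).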

2. The area under the ROC curve

Let $x$ be the scale of the test result variable. Denote by $x_{-}$ the $x$ values for cases with negative actual state and by $x_{+}$ the values for cases with positive actual state. Then the nonparametric approximation of the "true" area under the ROC curve, $\theta$, is

$A_Z=\frac{1}{n_{+}n_{-}}\sum_{j=1}^{n_{-}}\sum_{i=1}^{n_{+}}\Psi(x_{+,i},x_{-,j})$

where $n_{+}$ is the sample size of $D^{+}$ (cases with positive actual state), $n_{-}$ is the sample size of $D^{-}$ (cases with negative actual state), and

$\Psi(x_{+},x_{-})=\begin{cases} 1, & \mbox{if }x_{+}>x_{-} \\ 0.5, & \mbox{if }x_{+}=x_{-} \\ 0, & \mbox{if }x_{+}<x_{-} \end{cases}$
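A direct translation of the double sum, as a minimal sketch: `psi` implements $\Psi$ and `auc_pairwise` averages it over all $n_{+}n_{-}$ (positive, negative) pairs. Both names and the sample scores are hypothetical. The cost is $O(n_{+}n_{-})$, which is acceptable for small samples.

```python
def psi(x_pos, x_neg):
    """Kernel Psi: 1 if the positive case scores higher, 0.5 on a tie, else 0."""
    if x_pos > x_neg:
        return 1.0
    if x_pos == x_neg:
        return 0.5
    return 0.0


def auc_pairwise(pos_scores, neg_scores):
    """Nonparametric A_Z: mean of Psi over all (positive, negative) pairs."""
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    total = sum(psi(xp, xn) for xn in neg_scores for xp in pos_scores)
    return total / (n_pos * n_neg)


pos = [5, 4, 4, 3, 2]  # hypothetical scores, positive actual state
neg = [4, 3, 2, 1, 1]  # hypothetical scores, negative actual state
print(auc_pairwise(pos, neg))  # 0.8
```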

Note that $A_Z$ is the observed area under the empirical ROC curve, in which successive points are connected by straight lines; that is, it equals the area computed by the trapezoidal rule.
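To illustrate the trapezoidal-rule interpretation, the sketch below builds the empirical ROC points (again assuming that score >= cutoff means "classified positive") and sums the trapezoids between successive points. For the hypothetical scores used here it returns 0.8, the same value as the pairwise formula. The tied score 4 (two positive cases, one negative case) produces a diagonal segment, and the trapezoid under that segment contributes exactly the half credit that $\Psi$ assigns to ties. The function names are hypothetical.

```python
def roc_points(pos_scores, neg_scores):
    """Empirical ROC points (1 - specificity, sensitivity), starting at (0, 0).

    Assumption: score >= cutoff means "classified positive".
    """
    cutoffs = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutoffs:
        fpr = sum(s >= c for s in neg_scores) / len(neg_scores)  # n_FP / n_-
        tpr = sum(s >= c for s in pos_scores) / len(pos_scores)  # n_TP / n_+
        points.append((fpr, tpr))
    return points


def trapezoid_area(points):
    """Area under the piecewise-linear curve through the given (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pos = [5, 4, 4, 3, 2]  # hypothetical scores, positive actual state
neg = [4, 3, 2, 1, 1]  # hypothetical scores, negative actual state
print(f"{trapezoid_area(roc_points(pos, neg)):.4f}")  # 0.8000
```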

An alternative way to compute $A_Z$, summing over the distinct test result values $x_j$, is $A_Z=\frac{1}{n_{+}n_{-}}\sum_{j}\left\{ n_{-=j}n_{+>j}+\frac{n_{-=j}n_{+=j}}{2}\right\}$
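The count-based form avoids enumerating every pair. A minimal sketch, assuming the scores arrive as two plain lists: `Counter` supplies $n_{-=j}$ for each distinct negative score, and only those values are visited because every other term has $n_{-=j}=0$. The function name and sample scores are hypothetical.

```python
from collections import Counter


def auc_from_counts(pos_scores, neg_scores):
    """A_Z = (1 / (n_+ n_-)) * sum_j [n_{-=j} n_{+>j} + n_{-=j} n_{+=j} / 2]."""
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    total = 0.0
    # Only values that occur among the negatives matter: other terms have n_{-=j} = 0.
    for x_j, n_neg_eq in Counter(neg_scores).items():
        n_pos_gt = sum(s > x_j for s in pos_scores)   # n_{+>j}
        n_pos_eq = sum(s == x_j for s in pos_scores)  # n_{+=j}
        total += n_neg_eq * n_pos_gt + n_neg_eq * n_pos_eq / 2
    return total / (n_pos * n_neg)


pos = [5, 4, 4, 3, 2]  # hypothetical scores, positive actual state
neg = [4, 3, 2, 1, 1]  # hypothetical scores, negative actual state
print(auc_from_counts(pos, neg))  # 0.8
```

On these scores it agrees with the pairwise double sum.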

3. The SE of the area under the ROC curve statistic

The standard error of $A_Z$ is estimated by $SE(A_Z)=\sqrt{\frac{A_Z(1-A_Z)+(n_{+}-1)(Q_1-A_Z^2)+(n_{-}-1)(Q_2-A_Z^2)}{n_{+}n_{-}}}$

where $Q_1=\frac{1}{n_{-}n_{+}^2}\sum_j n_{-=j}\left[n_{+>j}^2+n_{+>j}n_{+=j}+\frac{n_{+=j}^2}{3}\right]$

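and $Q_2=\frac{1}{n_{-}^2n_{+}}\sum_j n_{+=j}\left[n_{-<j}^2+n_{-<j}n_{-=j}+\frac{n_{-=j}^2}{3}\right]$

Here $Q_1$ is the probability that two randomly chosen positive cases both score higher than a randomly chosen negative case, and $Q_2$ is the probability that a randomly chosen positive case scores higher than two randomly chosen negative cases, with ties counted fractionally.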
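The following sketch evaluates $A_Z$, $Q_1$, $Q_2$, and $SE(A_Z)$ in one pass over the distinct test values $x_j$ of both groups. Terms with $n_{-=j}=0$ (for $Q_1$ and $A_Z$) or $n_{+=j}=0$ (for $Q_2$) vanish, so summing over all distinct values is equivalent to summing over the values present in the relevant group. The function name `auc_se` and the sample scores are hypothetical.

```python
import math


def auc_se(pos_scores, neg_scores):
    """Return (A_Z, Q1, Q2, SE(A_Z)) using the count-based formulas."""
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    a_z = q1_sum = q2_sum = 0.0
    for x in sorted(set(pos_scores) | set(neg_scores)):
        n_pos_gt = sum(s > x for s in pos_scores)   # n_{+>j}
        n_pos_eq = sum(s == x for s in pos_scores)  # n_{+=j}
        n_neg_lt = sum(s < x for s in neg_scores)   # n_{-<j}
        n_neg_eq = sum(s == x for s in neg_scores)  # n_{-=j}
        a_z += n_neg_eq * (n_pos_gt + n_pos_eq / 2)
        q1_sum += n_neg_eq * (n_pos_gt**2 + n_pos_gt * n_pos_eq + n_pos_eq**2 / 3)
        q2_sum += n_pos_eq * (n_neg_lt**2 + n_neg_lt * n_neg_eq + n_neg_eq**2 / 3)
    a_z /= n_pos * n_neg
    q1 = q1_sum / (n_neg * n_pos**2)
    q2 = q2_sum / (n_neg**2 * n_pos)
    var = (a_z * (1 - a_z)
           + (n_pos - 1) * (q1 - a_z**2)
           + (n_neg - 1) * (q2 - a_z**2)) / (n_pos * n_neg)
    return a_z, q1, q2, math.sqrt(var)


pos = [5, 4, 4, 3, 2]  # hypothetical scores, positive actual state
neg = [4, 3, 2, 1, 1]  # hypothetical scores, negative actual state
a_z, q1, q2, se = auc_se(pos, neg)
print(f"A_Z = {a_z:.4f}, Q1 = {q1:.4f}, Q2 = {q2:.4f}, SE = {se:.4f}")
# A_Z = 0.8000, Q1 = 0.6960, Q2 = 0.6747, SE = 0.1446
```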

4. The asymptotic confidence interval of the area under the ROC curve

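A two-sided asymptotic $c\%=(100-\alpha)\%$ confidence interval for the true area under the ROC curve is $A_Z\pm z_{\alpha/2}\,SE(A_Z)$, where $z_{\alpha/2}$ is the upper $(\alpha/2)\%$ point of the standard normal distribution (e.g., $z_{\alpha/2}\approx 1.96$ for a 95% interval).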
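A minimal sketch of the interval, assuming $A_Z$ and $SE(A_Z)$ have already been computed. The function name and the parameter `level` (a fraction, so 0.95 corresponds to $c=95$) are hypothetical; the normal quantile comes from the standard-library `statistics.NormalDist`. The interval is not clipped to $[0, 1]$.

```python
from statistics import NormalDist


def auc_confidence_interval(a_z, se, level=0.95):
    """Two-sided asymptotic CI: A_Z +/- z * SE(A_Z), z = upper (1 - level)/2 quantile."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for level = 0.95
    return a_z - z * se, a_z + z * se


# Hypothetical inputs: A_Z = 0.8 with SE(A_Z) = 0.1.
lower, upper = auc_confidence_interval(0.8, 0.1, level=0.95)
print(f"({lower:.4f}, {upper:.4f})")  # (0.6040, 0.9960)
```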

5. The asymptotic P-value under the null hypothesis that $\theta=0.5\ \,\!$ vs. the alternative hypothesis that $\theta \neq 0.5\ \,\!$

Since $A_Z$ is asymptotically normal under the null hypothesis $\theta=0.5$, the two-sided asymptotic P-value for $\theta=0.5$ vs. $\theta\neq 0.5$ is $P\left( \left| Z\right| >\left| \frac{A_Z-0.5}{SD(A_Z)|_{\theta =0.5}}\right| \right) =2P\left( Z>\left| \frac{A_Z-0.5}{SD(A_Z)|_{\theta =0.5}}\right| \right)$, where $Z$ is a standard normal random variable.

In the nonparametric case, $Q_1=Q_2=\frac 13$ under the null hypothesis (assuming no ties), so $SD(A_Z)|_{\theta =0.5}=\sqrt{\frac{\theta (1-\theta )+(n_{+}-1)(Q_1-\theta ^2)+(n_{-}-1)(Q_2-\theta ^2)}{n_{+}n_{-}}}\Bigg|_{\theta =0.5}=\sqrt{\frac{0.5(1-0.5)+(n_{+}-1)(\frac 13-0.5^2)+(n_{-}-1)(\frac 13-0.5^2)}{n_{+}n_{-}}}$
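Because $\frac 13-0.5^2=\frac{1}{12}$, the null standard deviation simplifies to $\sqrt{\frac{n_{+}+n_{-}+1}{12\,n_{+}n_{-}}}$. The sketch below evaluates the expression term by term and then forms the two-sided P-value with the standard-library normal CDF. The function names and the example inputs are hypothetical, and the sketch assumes no ties, as the $Q_1=Q_2=\frac 13$ simplification does.

```python
import math
from statistics import NormalDist


def auc_null_sd(n_pos, n_neg):
    """SD(A_Z) under theta = 0.5, using Q1 = Q2 = 1/3 (no ties)."""
    theta, q = 0.5, 1 / 3
    var = (theta * (1 - theta)
           + (n_pos - 1) * (q - theta**2)
           + (n_neg - 1) * (q - theta**2)) / (n_pos * n_neg)
    return math.sqrt(var)  # equals sqrt((n_pos + n_neg + 1) / (12 * n_pos * n_neg))


def auc_p_value(a_z, n_pos, n_neg):
    """Two-sided asymptotic P-value for H0: theta = 0.5 vs H1: theta != 0.5."""
    z = abs(a_z - 0.5) / auc_null_sd(n_pos, n_neg)
    return 2 * (1 - NormalDist().cdf(z))  # 2 P(Z > |z|)


# Hypothetical inputs: observed A_Z = 0.8 with 5 positive and 5 negative cases.
print(f"{auc_p_value(0.8, 5, 5):.3f}")
```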