\documentclass{article}

\usepackage{amsmath}
\usepackage{fullpage}
\usepackage{url}
\usepackage{xspace}
\newcommand{\R}{\texttt{R}\xspace}


\title{Confidence intervals for binomial data: Simulations}
\author{Patrick Breheny}
\date{September 29, 2016}

<<include=FALSE, purl=FALSE>>=
require(knitr)
opts_chunk$set(prompt = TRUE)
opts_chunk$set(comment = NA)
opts_template$set(fig = list(fig.height=5, fig.width=5, out.width='.5\\linewidth', fig.align='center'))
@

\begin{document}
\maketitle

The purpose of today's lab is to carry out simulations to see what the coverage is for the various intervals for binomial data we have derived in class: the Clopper-Pearson interval, the Bayesian HPD interval, the Wald interval, and the score interval.  We proved in class that the Clopper-Pearson interval must have at least 95\% coverage, but is the coverage exactly 95\%, or it is, say, 99\%.  The Wald and score intervals have approximate 95\% coverage, but how close is the approximation?  And what's the coverage of a Bayesian interval?  I'm not entirely sure if we'll have time for all this in today's lab, but if we don't finish, we'll pick up next time where we left off.

Let's let $\pi$ denote the true probability of success; we'll start off supposing that $\pi=0.25$ and $n=10$.  Let's carry out the simulation.  Typically, simulations involve the use of {\tt for} loops.  The syntax of {\tt for} loops is pretty straightforward, and shown below.  The idea is that we want to loop over a block of code (enclosed in curly brackets), with an index ({\tt i} in the example below) that updates every time the loop is run again.  Note that with the {\tt rbinom} function, we can actually do the sampling outside of a {\tt for} loop, but we still need the {\tt for} loop to calculate the CI:

<<Coverage_simulation>>=
N <- 10000
n <- 10
pi <- 0.25
x <- rbinom(N, prob=0.25, size=n)
covered <- numeric(N)
for (i in 1:N) {
  ci <- binom.test(x[i], n)$conf
  covered[i] <- (ci[1] < pi) & (pi < ci[2])
}
mean(covered)
@

So our interval is actually fairly conservative here: its coverage is about 98\%.  What if we re-run the simulation with $n=50$?

It would be interesting to see what happens to the coverage as a function of $\pi$.  We can accomplish this with {\em nested} {\tt for} loops.  This actually takes a little while to run, so let's add a progress bar to let us know how things are going.

<<Coverage_simulation_loop, results="hide">>=
N <- 10000
n <- 20
pi <- c(0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.5, 0.68, 0.84, 0.92, 0.96, 0.98, 0.99)
coverage <- numeric(length(pi))
pb <- txtProgressBar(1, length(pi), style=3)
for (j in 1:length(pi)) {
  x <- rbinom(N, prob=pi[j], size=n)
  covered <- numeric(N)
  for (i in 1:N) {
    ci <- binom.test(x[i], n)$conf
    covered[i] <- (ci[1] < pi[j]) & (pi[j] < ci[2])
  }
  coverage[j] <- mean(covered)
  setTxtProgressBar(pb, j)
}
@ 
<<Coverage_results, opts.label="fig">>=
plot(pi, coverage, type="l", lwd=3, ylim=c(0.8,1), las=1)
abline(h=0.95, col="red", lwd=3)
@

So our Clopper-Pearson interval seems quite conservative when $\pi$ is close to 0 or 1, but pretty close to the {\em nominal coverage} when $\pi$ is close to 0.5.  As guaranteed by theory, the coverage is indeed always at least 95\%.

There are lots of directions we could head here:

\begin{itemize}
  \item Re-run the simulation above with a different sample size
  \item Try a similar simulation, only keeping $\pi$ fixed and plotting coverage vs. {\tt n}
  \item Include one or both of the Bayesian intervals and see what their coverage looks like
\end{itemize}

Now, let's expand on this simulation, adding the Wald, score, and Bayesian HPD intervals as competitors.  I have posted a function online, {\tt bayes.binom}, that we can use to calculate HPD intervals for binomial data (as you know from your homework assignment, there is no standard \R function for this).  The score interval is available via the \R function {\tt prop.test} (for inference concerning proportions).  Lastly, there is no \R function that returns the Wald interval, but it is trivial to write one.  So, after we source the following:

<<Sourcing_ci_methods>>=
source("http://myweb.uiowa.edu/pbreheny/5710/f16/labs/binom.bayes.R")
prop.wald <- function(x, n, level=0.95) {
  pi.hat <- x/n
  SE <- sqrt(pi.hat*(1-pi.hat)/n)
  z <- qnorm(c((1-level)/2, 1-(1-level)/2))
  pi.hat + z*SE
}
prop.test(31,39) ## Just to show how prop.test works
@

We can now run our simulation loop as in the last lab, using the {\tt binom.test}, {\tt binom.bayes}, {\tt prop.test}, and {\tt prop.wald} functions to calculate our various intervals.  Here, for the sake of organization, I'll store all the coverage results in a matrix (previously, a vector was sufficient).

<<Coverage_simulation_loop_2, results="hide">>=
N <- 10000
n <- 20
pi <- c(0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.5, 0.68, 0.84, 0.92, 0.96, 0.98, 0.99)
coverage <- matrix(NA, length(pi), 4, dimnames=list(pi, c("CP", "Bayes", "Score", "Wald")))
pb <- txtProgressBar(1, length(pi), style=3)
for (j in 1:length(pi)) {
  x <- rbinom(N, prob=pi[j], size=n)
  covered <- matrix(NA, N, 4)
  for (i in 1:N) {
    ci <- binom.test(x[i], n)$conf
    covered[i,1] <- (ci[1] < pi[j]) & (pi[j] < ci[2])
    ci <- binom.bayes(x[i], n)$ci.hpd
    covered[i,2] <- (ci[1] < pi[j]) & (pi[j] < ci[2])
    ci <- prop.test(x[i], n, correct=FALSE)$conf
    covered[i,3] <- (ci[1] < pi[j]) & (pi[j] < ci[2])
    ci <- prop.wald(x[i], n)
    covered[i,4] <- (ci[1] < pi[j]) & (pi[j] < ci[2])
  }
  coverage[j,] <- apply(covered, 2, mean)
  setTxtProgressBar(pb, j)
}
@ 

<<Coverage_results_2, opts.label="fig">>=
col <- hcl(seq(15, 375, len = 5), 150, 60)[1:4] ## Setting up some colors
matplot(pi, coverage, type="l", lwd=3, lty=1, col=col)
legend("bottom", legend=colnames(coverage), col=col, lwd=3)
matplot(pi, coverage, type="l", lwd=3, lty=1, col=col, ylim=c(0.8,1), las=1)
legend("bottom", legend=colnames(coverage), col=col, lwd=3)
abline(h=0.95, col="gray", lwd=3)
@

Note the use of {\tt apply} here.  {\tt apply} is a very useful function that applies a function across all rows or columns of a matrix (or a larger array).  We could write a for loop to calculate the average coverage for each method, but with {\tt apply}, we can obtain the result we're interested in with just a single line of code.  Also, the function {\tt matplot} works a lot like {\tt plot}, but allows you to plot several lines at once.

Some observations I would make:
\begin{itemize}
  \item The Wald interval is simply unacceptable.  Its coverage is nowhere even close to 95\%.  The other three intervals are at least reasonable here, never falling too far below 95\%.
  \item The Clopper-Pearson interval is the most conservative, but is also the only interval that always provides at least 95\% coverage.
  \item The Bayesian interval is not a confidence interval at all -- it provides an interval for the middle 95\% of the posterior distribution and is not concerned with coverage in the long-run frequency sense -- but still has reasonable frequentist properties.  This is often the case for Bayesian methods.
\end{itemize}

As a brief aside, the default behavior for {\tt prop.test} is to carry out a minor ``continuity correction''.  This is a fairly simple adjustment, but we didn't really discuss it in class.  To get the version of the interval we discussed in class, you can turn off the correction with {\tt correct=FALSE}.

\end{document}