Sequential interval estimation for Bernoulli trials
Interval estimation of a binomial proportion is one of the most basic problems in statistics, with many important real-world applications. Classical applications include estimating the prevalence of a rare disease and accuracy assessment in remote sensing. In these applications, the sample size is fixed beforehand, and a confidence interval for the proportion is obtained. However, in many modern applications, sampling is especially costly and time-consuming, e.g., estimating the customer click-through probability in online marketing campaigns and estimating the probability that a stochastic system satisfies a specific property, as in Statistical Model Checking. Because these applications tend to require extensive time and cost, it is advantageous to reduce the sample size while still assuring a satisfactory quality (coverage) level for the corresponding interval estimates. The sequential version of interval estimation pursues this goal by allowing the sample size to be random and, in particular, by formulating a stopping time controlled by the observations themselves. The literature on the sequential setup of the problem is limited compared to its fixed-sample-size counterpart, and optimality of the sampling procedure has not been established. This thesis aims to extend the body of knowledge on sequential interval estimation for Bernoulli trials, addressing both theoretical and practical concerns.

In the first part of this thesis, we propose an optimal sequential methodology for obtaining fixed-width confidence intervals for a binomial proportion when prior knowledge of the proportion is available. We assume that there exists a prior distribution for the binomial proportion, and our goal is to minimize the expected number of samples while guaranteeing that the coverage probability is at least a specified nominal level.
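To make the idea of a stopping time controlled by the observations concrete, the following is a minimal Python sketch of a simple posterior-coverage stopping rule. It is not the optimal procedure proposed in the thesis. It assumes a uniform Beta(1, 1) prior, centers the interval on the posterior mean, and stops as soon as the posterior probability of the fixed-width interval reaches 1 - alpha. All function names and the cap `max_n` are hypothetical choices made for illustration.

```python
import random
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(max(k, 0), n + 1))


def posterior_cdf(x, s, n):
    """CDF at x of Beta(s + 1, n - s + 1): the posterior after s successes
    in n trials under the assumed uniform prior.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    valid for integer a and b.
    """
    x = min(max(x, 0.0), 1.0)  # clip to the parameter space [0, 1]
    return binom_tail(n + 1, x, s + 1)


def posterior_coverage(s, n, h):
    """Posterior probability of the interval of half-width h centered
    on the posterior mean."""
    center = (s + 1) / (n + 2)
    return posterior_cdf(center + h, s, n) - posterior_cdf(center - h, s, n)


def run_until_covered(p_true, h, alpha, rng, max_n=500):
    """Sample Bernoulli(p_true) trials until the posterior coverage of the
    fixed-width interval is at least 1 - alpha, or until max_n trials.

    max_n is an illustrative cap; it also keeps the binomial coefficients
    within floating-point range.
    """
    s = 0
    for n in range(1, max_n + 1):
        s += rng.random() < p_true  # one Bernoulli observation
        if posterior_coverage(s, n, h) >= 1 - alpha:
            break
    return n, (s + 1) / (n + 2)


if __name__ == "__main__":
    n_stop, estimate = run_until_covered(0.3, 0.1, 0.05, random.Random(0))
    print(f"stopped after {n_stop} trials, estimate {estimate:.3f}")
```

The number of trials returned by `run_until_covered` is random: it depends on the observed sequence, which is the defining feature of the sequential setup. Note that this rule controls posterior coverage given the data, which is only one of several ways a coverage requirement could be formalized.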
We demonstrate that our stopping time is always bounded from above and below: a sufficient amount of information must first be accumulated before the stopping rule is applied, and sampling always terminates in finite time. We also compare our method with the optimum fixed-sample-size procedure as well as with existing alternative sequential schemes.

In the second part of this thesis, we propose a two-stage sequential method for obtaining tandem-width confidence intervals for a binomial proportion when no prior knowledge of the proportion is available and a computationally efficient method is desired. By tandem-width, we mean that the half-width of the confidence interval is not fixed beforehand; instead, it is required to satisfy one of two different upper bounds, depending on the underlying value of the binomial proportion. To tackle this problem, we propose a simple but useful sequential method for obtaining fixed-width confidence intervals for the binomial proportion based on the minimax estimator of the binomial proportion.

Finally, we extend the idea developed for Bernoulli distributions in the first part of this thesis to interval estimation for arbitrary distributions, using an alternative optimality formulation. Here, we propose a conditional-cost formulation to circumvent certain analytical and computational difficulties. Specifically, we assume that an independent and identically distributed process is observed sequentially, with its common probability density function having a random parameter that must be estimated. We follow a semi-Bayesian approach in which we assign a cost to the pair (estimator, true parameter), and our goal is to minimize the average sample size while guaranteeing an average cost below some prescribed level. For a variety of examples, we compare our method with the optimum fixed-sample-size procedure and with other existing sequential schemes.
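As a point of reference for the fixed-sample-size comparisons and the minimax estimator mentioned above, here is a small Python sketch. It assumes the standard minimax estimator of a binomial proportion under squared-error loss, (S + sqrt(n)/2) / (n + sqrt(n)), which may differ in detail from the estimator used in the thesis. It computes the exact coverage of a fixed-width interval centered on that estimator and searches for the first sample size whose worst-case coverage over a grid of proportions meets the nominal level. The function names and the grid are hypothetical.

```python
from math import comb, sqrt


def minimax_estimate(s, n):
    """Minimax estimate (squared-error loss) after s successes in n trials."""
    return (s + sqrt(n) / 2) / (n + sqrt(n))


def exact_coverage(p, n, h):
    """Exact probability that |estimate - p| <= h when S ~ Binomial(n, p)."""
    total = 0.0
    for s in range(n + 1):
        if abs(minimax_estimate(s, n) - p) <= h:
            total += comb(n, s) * p**s * (1 - p) ** (n - s)
    return total


def first_adequate_n(h, alpha, grid, max_n=400):
    """First n whose worst-case coverage over `grid` is at least 1 - alpha.

    Coverage is not monotone in n, so this is only the first n that
    satisfies the requirement on the grid, not a guarantee for larger n.
    Returns None if no n up to max_n qualifies.
    """
    for n in range(1, max_n + 1):
        if min(exact_coverage(p, n, h) for p in grid) >= 1 - alpha:
            return n
    return None


if __name__ == "__main__":
    grid = [i / 20 for i in range(1, 20)]  # p = 0.05, 0.10, ..., 0.95
    print(first_adequate_n(h=0.1, alpha=0.05, grid=grid))
```

A fixed-sample-size design must be sized for the least favorable proportion on the grid, whereas a sequential rule can stop earlier when the data indicate a proportion far from that worst case. This is the source of the sample-size savings that the comparisons above are meant to quantify.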