1$ and $0<\theta_d<1$. With PFM, $Z_{k}(\xv)= \Tkbar(\xv) + \lambda^{v_k}(\xv) \times \max \{0, q - \Rkbar(\xv)\}$ is calculated. Then, to solve Problem (\ref{chap3:eqn:originalprob}), a solution with the smallest $Z_k(\xv)$ is selected as the current best at search iteration $k$. Section~\ref{chap2:sec:PO} shows that the solution with the smallest $Z_k(\xv)$ at search iteration $k$ converges to the true best feasible solution of Problem (\ref{chap3:eqn:originalprob}) as $k$ goes to infinity when (i) PFM is designed so that the penalty value converges to zero for a feasible solution but diverges to infinity for an infeasible solution and (ii) a globally convergent DOvS algorithm is applied with the PFM. The version of PFM presented above satisfies this convergence property when no solution is located exactly on the boundary between the feasible and infeasible regions (i.e., no active constraint). However, if a solution has an active constraint, its penalty value converges in distribution to a random variable with two possible values, zero and infinity, as in Section~\ref{chap2:psc}. Recall that Section~\ref{subsec:psforPSc} provides guidance for choosing parameters that balance fast convergence against a high probability of returning a true optimal solution when PS$_c$ is used. This choice of parameters, in turn, guarantees good performance of NP+PFM when the optimization problem is difficult: the best feasible solution is located close to the boundary between the feasible and infeasible regions and there are many infeasible solutions with better primary performance measures near the best feasible solution.
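For concreteness, the penalty bookkeeping just described can be sketched as follows. The simple multiplicative update by $\theta_a$ or $\theta_d$ at each visit is an illustrative reading of PFM, and the function names are placeholders, not the thesis's exact implementation:

```python
def update_penalty_multiplier(lam, r_bar, q, theta_a, theta_d):
    """One penalty-multiplier update at a visit to a solution:
    amplify by theta_a > 1 when the sample reliability r_bar falls
    below q (the solution looks infeasible), shrink by theta_d < 1
    otherwise.  This multiplicative rule is an illustrative assumption."""
    return lam * theta_a if r_bar < q else lam * theta_d

def z_value(t_bar, r_bar, lam, q):
    """Z_k(x) = T_bar_k(x) + lambda^{v_k}(x) * max(0, q - R_bar_k(x))."""
    return t_bar + lam * max(0.0, q - r_bar)
```

Under this sketch, a strictly feasible solution eventually keeps its sample reliability above $q$, so its multiplier, and hence its penalty, shrinks toward zero, while a strictly infeasible solution's multiplier grows without bound, matching the convergence property stated above.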
\subsection{NP+PFM} \label{sec:PO} Among DOvS algorithms, we choose a version of NP presented in \cite{pjnb}. NP focuses on a special region called the most promising region and spends more computational effort on it by sampling more solutions from it. More specifically, NP systematically partitions the most promising region into a number of subregions. Then it samples and assesses solutions from the subregions. At the same time, it keeps sampling some solutions from the region outside the most promising region, called the surrounding region. If the current sample best solution is in one of the subregions, the subregion containing it becomes the next most promising region. Otherwise, the most promising region becomes the whole set, $\Theta$. Prior to the description of NP+PFM, some additional notation for NP is defined:\\ \noindent ${\cal{R}}_k :=$ the most promising region at search iteration $k$; \\ ${\cal{R}}_k (\ell):=$ the $\ell$th subregion at search iteration $k$; \\ $\Theta \setminus {\cal{R}}_k := $ the surrounding region at search iteration $k$;\\ $\Sk := $ the set of solutions sampled at iteration $k$;\\ $\Vs_{k} := $ the set of all solutions visited up to iteration $k$;\\ $\omega := $ the number of subregions;\\ $\tau_k := $ the total number of sampled solutions at iteration $k$; and\\ $\tau_k (\ell) := $ the number of sampled solutions at iteration $k$ from subregion $\ell$.
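Using this notation, one NP move (bisect the most promising region, sample from the subregions and the surrounding region, then move to the subregion holding the sample best or backtrack) can be sketched as below. The set representation, the even split into two subregions, and the per-region sample sizes are simplifications for illustration, not the exact scheme used in this chapter:

```python
import random

def one_np_iteration(theta, region, incumbent, z, tau=6, rng=random):
    """One Nested Partitions step over a finite solution set (sketch).

    theta:     the whole space Theta (a set)
    region:    current most promising region R_k (a subset of theta)
    incumbent: current best solution, always re-sampled
    z:         function giving the current Z_k value of a solution

    Returns the next most promising region: the subregion holding the
    sample best, or theta when backtracking occurs."""
    items = sorted(region)
    half = (len(items) + 1) // 2
    sub1, sub2 = set(items[:half]), set(items[half:])
    surround = theta - region

    sampled = {incumbent}
    for r in (sub1, sub2, surround):
        if r:  # sample a few solutions from each nonempty region
            sampled |= set(rng.sample(sorted(r), min(len(r), tau // 3)))

    best = min(sampled, key=z)  # sample best by the current Z_k values
    if best in sub1:
        return sub1
    if best in sub2:
        return sub2
    return theta  # backtrack: the sample best came from the surrounding region
```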
We specialize NP+PFM for the water quality monitoring design problem, and its detailed steps are given in Figure~\ref{alg:nppfm}. \begin{figure} \begin{center} \fbox{ \begin{minipage}{\textwidth} \begin{hangref} \vspace{0.1in} \item \begin{center}\textbf{Algorithm NP + PFM}\end{center} \vspace{0.1in} \item \textbf{Step 0. Initialization:}\\ - Perform the indexing algorithm (Section 3.2.1).\\ - Set $k = 1$, ${\cal{R}}_k = \Theta$, and $\Vs_k=\emptyset$.\\ - Sample an initial solution $\hat{\xv}_0^{*}$ randomly from $\Theta$.\\ - Select constants $\omega$, $\tau_k$, $\Delta n$, $\lambda^0(\xv)$, $\theta_a$, and $\theta_d$.\vspace{6pt} \vspace{0.1in} \item \textbf{Step 1. Partitioning:} (Section 3.2.2)\\ - Partition ${\cal{R}}_k$ into $\omega$ disjoint subregions, ${\cal{R}}_{k}(1), {\cal{R}}_{k}(2), \ldots, {\cal{R}}_{k}(\omega)$. If ${\cal{R}}_k$ is a singleton, set ${\cal{R}}_{k}(1) = {\cal{R}}_k$ and ${\cal{R}}_k(2) = \ldots = {\cal{R}}_k(\omega) = \emptyset$.\\ - Set ${\cal{R}}_{k}(\omega+1) = \Theta \setminus {\cal{R}}_k$, which denotes the surrounding region. \vspace{6pt} \vspace{0.1in} \item \textbf{Step 2. Sampling Solutions:} (Section 3.2.3)\\ - From each region ${\cal{R}}_{k}(j)$, $j = 1,2, \ldots, \omega+1$, sample $\tau_{k}(j)$ solutions. Always sample $\hat{\xv}_{k-1}^{*}$ so that $\hat{\xv}_{k-1}^{*} \in \Sk$.\\ - Include all sampled solutions $\xv$ in $\Sk$.\\ - For each $\xv \in \Sk$ with $\xv \notin \Vs_k$, set $\Vs_k=\Vs_k \cup \{ \xv \}$. \vspace{6pt} \vspace{0.1in} \item \textbf{Step 3.
Estimating the Promising Index:}\\ - For each $\xv \in \Sk$, take $\Delta n$ observations, set $n_{v_k}(\xv)=n_{v_{k-1}}(\xv)+\Delta n$, where $n_0(\xv) = 0$ for any $\xv \in \Theta$, and update $Z_{k}(\xv)$.\\ - Select $\hat{\xv}^{*}_{k}$ such that $\hat{\xv}^*_{k} \equiv \mbox{argmin}_{\xv \in \Vs_k} Z_{k}(\xv).$ \vspace{6pt} \vspace{0.1in} \item \textbf{Step 4. Selecting the Most Promising Region and Backtracking:} \\ - Determine $j^*$ such that $\hat{\xv}^*_{k} \in {\cal{R}}_{k}(j^{*})$.\\ - If $j^* \leq \omega$, then ${\cal{R}}_{k+1}={\cal{R}}_{k}(j^{*})$. Otherwise, ${\cal{R}}_{k+1}=\Theta$.\\ - Set $k=k+1$. \vspace{6pt} \vspace{0.1in} \item \textbf{Step 5. Stopping Rule} (Section 3.2.4)\\ - If the stopping rule is satisfied, then stop and return $\hat{\xv}^*_k$ as the best solution. Otherwise go to Step 1. \vspace{0.1in} \end{hangref} \end{minipage} } \caption{Algorithm NP + PFM for the water quality monitoring design problem. \label{alg:nppfm}} \end{center} \end{figure} \cite{sobook} point out that the performance of NP on a combinatorial-type problem depends highly on how nodes are indexed, the search space is partitioned, and solutions are sampled. The next section discusses how to perform each step of NP+PFM efficiently, including the stopping criteria. \section{Implementation} This section addresses implementation issues in NP+PFM for the water quality monitoring design problem, including indexing, partitioning, sampling, and stopping. \subsection{Indexing} Many search methods tend to generate alternative solutions from neighbors of the current best solution. In this chapter, a solution is an $M$-dimensional vector whose elements are integers in increasing order. Changing a few elements of $\xv$ up and down generates neighbor solutions.
As NP+PFM spends more computing effort on the most promising region, it is desirable that neighbors of a solution tend to share the same feasibility as the current solution. \begin{figure} \begin{center} \scalebox{0.5}{\includegraphics[width=8.0in,height=6.0in]{Oldhypo.eps}} \caption{ The hypothetical river with 12 nodes indexed based on quality. \label{fig:oldhypo}} \end{center} \end{figure} The idea is to index each node based on quality, where quality is defined as the probability that a spill is detected, assuming only one monitoring device is placed at the node and the device does not miss a spill under any circumstances. Nodes are then indexed in decreasing order of quality. If there are nodes with the same quality, they are indexed in increasing order of distance from the node with the highest quality. Figure~\ref{fig:oldhypo} shows quality levels in parentheses for all nodes in the hypothetical river \cite{ouyang:designrivga}, assuming an equal spill probability at each node (i.e., $1/12$), and the indices of nodes based on the quality levels. This indexing method helps ensure that neighbors of a solution have similar reliability levels. \subsection{Partitioning} The current most promising region $\mathcal{R}_k$ needs to be partitioned into two subregions, $\mathcal{R}_k(1)$ and $\mathcal{R}_k(2)$, if $\omega=2$. The partitioning scheme in this chapter is based on the bisection method. More specifically, the algorithm selects an element of $\xv$ whose range of possible values contains two or more integers. The partitioning is then done by dividing that range into two approximately equal ranges. Figure \ref{fig:hypopartition} illustrates the partitioning scheme for the monitoring system with three monitoring devices in the hypothetical river, assuming no backtracking occurs by $k=5$.
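A minimal sketch of this bisection step follows, assuming a region is represented by per-coordinate integer ranges for the $M$ device positions; this representation is an illustrative assumption, not necessarily the one used in the implementation:

```python
def bisect_region(ranges):
    """Bisection partitioning of a region (illustrative sketch).

    `ranges` is [(lo_1, hi_1), ..., (lo_M, hi_M)], one inclusive
    integer range per device position.  The first coordinate whose
    range holds two or more integers is split into two roughly equal
    halves, yielding the two subregions R_k(1) and R_k(2)."""
    for u, (lo, hi) in enumerate(ranges):
        if hi > lo:                      # range contains >= 2 integers
            mid = (lo + hi) // 2
            left = ranges[:u] + [(lo, mid)] + ranges[u + 1:]
            right = ranges[:u] + [(mid + 1, hi)] + ranges[u + 1:]
            return left, right
    return ranges, None                  # singleton region: nothing to split
```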
Although the partitioning is presented only for $\omega = 2$ in this chapter, it can easily be extended to $\omega \geq 3$, in which case the range is divided into $\omega$ approximately equal ranges. \begin{figure} \begin{center} \scalebox{0.7}{\includegraphics[width=6.0in,height=4.0in]{Hypopartition.eps}} \caption{Partitioning for the hypothetical river. \label{fig:hypopartition}} \end{center} \end{figure} \subsection{Sampling} Solutions are sampled from three regions: the two subregions, $\mathcal{R}_k(1)$ and $\mathcal{R}_k(2)$, and the surrounding region, $\Theta \setminus \mathcal{R}_k$. Since the sizes of the two subregions can differ, $\tau_k(1)$ and $\tau_k(2)$ are set proportional to the sizes of subregions 1 and 2, respectively. Although the surrounding region $\Theta \setminus \mathcal{R}_k$ tends to be much bigger than $\mathcal{R}_k$, $\min(\lfloor \tau_k/2 \rfloor, |\mathcal{R}_k|)$ solutions are sampled from $\mathcal{R}_k$ and the remaining $\tau_k - \tau_k(1) - \tau_k(2)$ solutions are sampled from the surrounding region. This ensures that NP+PFM visits more solutions in the most promising region. Detailed steps of the sampling scheme are given in Figure \ref{fig:samplingscheme}. \begin{figure} \vspace{0.1in} \begin{center} \fbox{ \begin{minipage}{\textwidth} \begin{hangref}\vspace{0.1in} \item \begin{center}\textbf{Sampling scheme}\end{center} \vspace{0.1in} \begin{enumerate} \item Set $\left \{ \begin{array}{l} \tau_k(1) = \min \Big( |\mathcal{R}_k (1)|, \max \Big( 1, \lfloor \frac{|\mathcal{R}_k(1)|}{|\mathcal{R}_k|} \cdot \frac{\tau_k}{2}\rfloor \Big)\Big);\\ \tau_k(2) = \min \Big( |\mathcal{R}_k (2)|, \lfloor \frac{\tau_k}{2} \rfloor - \tau_k(1) \Big); \mbox{ and}\\ \tau_k(3) = \tau_k - \tau_k(1) -\tau_k(2).
\end{array} \right.$ \vspace{0.1in} \vspace{0.1in} \item Sample $\tau_k(1)$ and $\tau_k(2)$ distinct solutions from $\mathcal{R}_k(1)$ and $\mathcal{R}_k(2)$, respectively, by uniform sampling. \vspace{0.1in} \item If $\mathcal{R}_k$ is not a singleton (i.e., $|\mathcal{R}_k| \ge 2$), sample $\tau_k(3)$ distinct solutions by uniform sampling from $\Theta \setminus \mathcal{R}_k$. Otherwise, sample $\lfloor \frac{\tau_k}{2} \rfloor$ distinct solutions by local search sampling and the remaining solutions by uniform sampling from $\Theta \setminus \mathcal{R}_k$. \\ \end{enumerate} \end{hangref} \end{minipage} } \caption{Sampling scheme for NP+PFM. \label{fig:samplingscheme}} \end{center} \end{figure} In general, it is important that a search algorithm visits some neighbor solutions of the current best solution because good solutions tend to cluster together. Visiting neighbor solutions is even more important for OvS problems with stochastic constraints. Finding the best feasible solution becomes difficult when the best feasible solution is one whose reliability is close to the minimum reliability level, $q$. An extreme case occurs when the reliability of the best feasible solution is exactly equal to $q$, in which case there is always a positive probability that the solution is declared infeasible at each visit. This, in turn, increases the chance of labeling the region that contains the best feasible solution as the surrounding region. PFM is designed to ensure that NP+PFM eventually selects the best feasible solution as long as there is a nonzero probability of visiting it. When the most promising region is not a singleton, the sampling scheme ensures that some neighbor solutions of the current best are visited. To ensure that this also happens when $\mathcal{R}_k$ is a singleton, local search sampling is adopted. In the local search sampling, two methods are used.
The first method randomly selects an element $x_{u^*}$ of the current best solution in the singleton set $\mathcal{R}_k$ and then changes the value of $x_{u^*}$ by $\pm 1$ at random. The second method changes the value of $x_{u^*}$ to any integer strictly between $x_{u^*-1}$ and $x_{u^*+1}$ other than its current value. For example, if the solution in the singleton set $\mathcal{R}_k$ is $(1,4,7)$ and $x_{u^*} = 4$ (or $u^* = 2$), then the first method randomly selects either $(1,3,7)$ or $(1,5,7)$ with equal probability and the second method generates one solution among $(1,2,7), (1,3,7), (1,5,7)$, and $(1,6,7)$ with equal probability. About half of the $\lfloor \tau_k/2 \rfloor$ solutions are generated using the first method and the other half using the second method. \subsection{Stopping Criteria} \label{subsec:stop} Global convergence is achieved as $k$ goes to infinity, but in practice the algorithm must terminate after finitely many search iterations. \cite{hncompass} give some popular stopping criteria. In this chapter, the following stopping criterion is used: the algorithm stops when event $E_1$ occurs consecutively $n_E$ times, where \begin{equation} \label{eq:stopct} E_1 := \{{\hat{\xv}_{k}^{*}}={\hat{\xv}_{k-1}^{*}},\ |Z_{k}(\hat{\xv}_{k}^{*})-Z_{k-1}(\hat{\xv}_{k-1}^{*})|<\epsilon,\ {\cal{R}}_k \mbox{ is a singleton} \} \end{equation} for a small positive constant $\epsilon$. The decision maker needs to choose $\epsilon$ and $n_E$. \section{Case Study} \label{sec:PS} This section compares the performance of NP + PFM with that of the GA in \cite{telci:optimalwqmnd} on the water quality monitoring design problem for the Altamaha River. The Altamaha River is located in Georgia, U.S.A.\@ and is known to have the largest watershed in the state. Figure \ref{fig: Altriver} shows the Altamaha River with one hundred nodes located at the most upstream points, confluences, and points evenly distributed along each river reach.
Each node can be a possible monitoring location or a spill location. The Altamaha River is composed of 60 river reaches and a total of 62 river junctions. To construct the river system, the U.S. Geological Survey (USGS) National Elevation Dataset is used, as in \cite{telci:optimalwqmnd}. Each process simulation continues until the simulation clock reaches 40 days and uses 15 minutes of simulation clock time as the inter-reporting interval. \cite{telci:optimalwqmnd} define \begin{equation*} d'(x_u) = \left\{ \begin{array}{ll} \mbox{detected time }- \mbox{ spill time}, & \mbox{ if detected at } x_u;\\ P_v, & \mbox{ otherwise},\\ \end{array} \right. \end{equation*} where $P_v$ is a constant penalty value, and $t'(\xv)= \min_{1\leq u \leq M} d'(x_u)$. Then they solve $\mbox{argmin}_{\xv \in \Theta} \E [t'(\xv)]$ using the GA. \begin{figure} \begin{center} \scalebox{0.7}{\includegraphics[width=6.0in,height=4.0in]{Altamahariver.eps}} \caption{Shape of the Altamaha River and possible monitoring locations \cite{telci:optimalwqmnd}. \label{fig: Altriver}} \end{center} \end{figure} As discussed in the beginning of this chapter, the GA tends to return a solution with 100$\%$ or the highest possible reliability. For the implementation of the GA, three parameters are needed: the number of observations (i.e., SWMM runs), a generation size, and a population size. Large values of these parameters help the GA return a solution close to the true best solution, but at the cost of computational effort. In general, it is hard to pick values of these parameters that balance computational efficiency against the accuracy of the returned solution. In addition, when a decision maker is more interested in finding a solution with the shortest detection time at a slight cost in reliability (e.g., $99\%$ or $95\%$), the GA is hard to use because there is no available information about what values of $P_v$ would be appropriate for finding the best feasible solution.
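The objective $t'(\xv)$ defined above can be computed per simulated spill as in the following sketch; representing the detection delays as a dictionary keyed by monitored node is an illustrative assumption:

```python
def t_prime(detection_delays, P_v):
    """t'(x) = min_u d'(x_u) for one simulated spill.

    detection_delays maps each monitored node x_u to its detection
    delay (detected time minus spill time), or to None when the spill
    is never detected at that node; P_v is the constant miss penalty."""
    return min(P_v if d is None else d for d in detection_delays.values())
```

When every device misses the spill, $t'(\xv) = P_v$, which is how the GA objective implicitly trades detection reliability against detection time through the choice of $P_v$.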
\subsection{Experimental Setup} Spill location $L_i^S$ is modeled as a discrete random variable on the sample space $\{ 1,2, \ldots, 100\}$, either uniformly or non-uniformly distributed. The non-uniformly distributed case represents monitoring for a specific type of chemical produced by paper mills. Two paper mills are located close to nodes 30 and 68 in Figure~\ref{fig: Subcatchment} around the Altamaha River \cite{mccarthy11}; thus the probability of a spill occurring at these two nodes is assumed to be ten times that at the other nodes. Threshold $C_{th}$ for the monitoring devices is set to $C_{th} \in \{ 0.0001, 0.05\}$. Then, four different cases are examined: $C_{th} = 0.0001$ $mg/\ell$ and uniformly distributed $L_i^S$; $C_{th} = 0.0001$ $mg/\ell$ and non-uniformly distributed $L_i^S$; $C_{th} = 0.05$ $mg/\ell$ and uniformly distributed $L_i^S$; and $C_{th} = 0.05$ $mg/\ell$ and non-uniformly distributed $L_i^S$. Under each case, $S_i^S$, $I_i^S$, $P_i^R$, $M$, and $q$ are varied. \begin{figure} \begin{center} \scalebox{0.7}{\includegraphics[width=5.0in,height=5.0in]{Subcatchment.eps}} \caption{Ten sub-catchments of the Altamaha River \cite{telci:real11}. \label{fig: Subcatchment}} \end{center} \end{figure} Only a single instantaneous spill is considered per spill event. Spill starting time $S_i^S$ is uniformly distributed between 0 and 240 hours on the simulation clock, and spill intensity $I_i^S$ is uniformly distributed between $10$ and $1000$ $g/\ell$. The hydrodynamics of the river system is adopted from \cite{telci:real11}, where the steady-state hydraulic system is calibrated to the flow pattern in the river based on annual average flow rates measured in 2006 at twenty USGS gauging stations distributed throughout the river network.
In this application, all lakes and impoundments were approximated as river reaches, with an adjustment to the reach length, to simplify the network. The rain events for the case study are generated in the same way as in \cite{telci:real11}. The Altamaha River watershed is divided into ten sub-catchments as in Figure~\ref{fig: Subcatchment}. The rainfall measurements are obtained from USGS observation stations close to these ten sub-catchments in 2006. Then, using the results of a statistical analysis of these observations, five rain patterns are generated for each sub-catchment. Note that the five rain patterns differ across sub-catchments, so there are a total of fifty rain patterns for the entire watershed. Each rain pattern describes time-dependent rainfall events, so the hydrologic conditions in each sub-catchment keep changing during a process simulation. For each SWMM run, one of the five rain patterns is randomly selected for each sub-catchment, which defines $P_i^R$. All nodes in the same sub-catchment have the same rain pattern during a process simulation. The number of monitoring devices $M$ is set to $M \in \{ 5,6,7,10\}$. The minimum reliability requirement is $q \in \{ 0.9, 1.0\}$ if $C_{th} = 0.0001$ $mg/\ell$. Otherwise $q \in \{ 0.9, 0.95\}$ is used, since no solution achieves 100$\%$ detection reliability.
In a SWMM run, simulating hydrodynamics under a random rain event $P^R_i$ takes a long time, but simulating contaminant transport under the hydrodynamics can be done relatively quickly. Thus, \cite{telci:real11} design one SWMM run to generate 100 observations of $t'(\xv)$ by (i) simulating hydrodynamics with $P^R_i$ and (ii) performing 100 different spill events, each with randomly generated $I^S_i$ and $S_i^S$, under the hydrodynamics. Moreover, \cite{telci:real11} generate 1000 SWMM runs (thus, 100,000 observations of $t'(\xv)$) and apply the GA to find a solution with the smallest sample mean of the 100,000 observations. Following their experiments, when the spill location is uniformly distributed, one SWMM run performs 100 spill events (one spill event at each node with randomly generated $I^S_{i}$ and $S_i^S$) under shared hydrodynamics and obtains 100 observations of $t(\xv)$ and $R(\xv)$. When the spill location is non-uniformly distributed, 10 spill events are simulated at each of nodes 30 and 68 and one spill event is simulated at each of the other nodes in one SWMM run. Thus, one SWMM run performs 118 spill events, resulting in 118 observations of $t_i(\xv)$ and $R_i(\xv)$. While the GA completes 1000 SWMM runs prior to the optimization search, NP+PFM performs additional SWMM runs as the optimization search continues; NP+PFM stops either when the stopping criterion in Section~\ref{subsec:stop} is satisfied with $n_E=10$ and $\epsilon = 0.5$ or when the total number of SWMM runs reaches 1000. In the implementation of the GA, every visited solution is evaluated with all 1000 SWMM runs. As parameters of the GA, a population size of 100 and a generation size of 400 are initially used.
Unless the GA with the initial population and generation sizes finds a solution comparable to that of NP+PFM, the population and generation sizes are increased, up to 200 and 800 respectively, until a comparable solution is found by the GA. This chapter reports only the best results obtained by the GA with this parameter adjustment. It is clear that monitoring devices should be located toward the downstream if one is to increase reliability, but this tends to increase the expected detection time. Thus there exists positive dependence between reliability and the expected detection time. This implies that the best feasible solution is likely (but not always) to have reliability exactly equal to or close to $q$. Thus $\theta_a$ and $\theta_d$ need to be selected carefully to ensure both convergence and efficiency, as discussed in Section~\ref{subsec:psforPSc}. Recall that $\rho_c$ represents the probability that $\lambda_{\ell}^{v_k}(\xv)$ converges to $0$ for any active constraint. It is known that $\sqrt{1.3}\leq \theta_a \leq \sqrt{1.9}$ and $0.7\leq \rho_c \leq 0.9$ are generally a good compromise between efficiency and accuracy for PS$_c$. Thus we select $\theta_a = \sqrt{1.5}$ and $\theta_d = \frac{\sqrt{1.5}}{10}$, which ensures $\rho_c \approx 0.8082$ by Theorem~\ref{thm:PScactived}. For the implementation of NP+PFM, $\tau_k= 200$ is used. Also, $\Delta n = 100$ is selected for the uniform spill probability case and $\Delta n = 118$ for the non-uniform spill probability case. This ensures that each iteration of NP+PFM requires exactly one SWMM run. Three performance measures are reported to compare the GA and NP+PFM: NUM, EMD, and ER. NUM represents the number of SWMM runs required by each optimization algorithm until it stops.
It takes about 2 hours to complete one SWMM run on a PC with a 2.6 GHz Intel Core 2 Quad Q8300 CPU, while one iteration of NP+PFM takes less than two seconds. This implies that, with a single CPU, it would take about 2000 hours (83 days) to perform 1000 SWMM runs. In this chapter, multiple CPUs were used to obtain the 1000 SWMM runs in two weeks. Since the computation cost mainly depends on the number of SWMM runs needed, NUM is used as a measure of computation cost. Note that NUM is always 1000 for the GA because it requires the 1000 SWMM runs to be completed prior to optimization. On the other hand, NUM is usually smaller than 1000 for NP+PFM because it continues only until the stopping criterion is satisfied. EMD and ER are quality measures for the solutions returned by the two competing methods. EMD denotes the estimated conditional minimum detection time in minutes and is calculated as the conditional sample average of $t_i (\xv)$ given $R_i(\xv) = 1$ over 1000 SWMM runs. EMD can be interpreted as the estimated time interval between the time when a spill occurs and the time when the spill is detected. ER represents the estimated reliability and is calculated as the sample average of $R_i(\xv)$ over 1000 SWMM runs. Note that the EMD and ER values reported in the tables are calculated based on all observations from 1000 SWMM runs, even when NP+PFM stops after fewer than 1000 SWMM runs. EMD and ER are meaningful up to the tenths digit and the ten-thousandths digit, respectively.
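As a sketch, EMD and ER can be computed from paired observations as follows; the list-based representation of the SWMM output is illustrative:

```python
def emd_and_er(t_obs, r_obs):
    """Estimate EMD and ER from paired observations (sketch).

    t_obs[i] is the detection time t_i(x) of spill i and r_obs[i] in
    {0, 1} indicates whether spill i was detected.  ER is the sample
    mean of r_obs; EMD is the sample mean of t_obs over detected
    spills only (the conditional average given R_i(x) = 1)."""
    detected = [t for t, r in zip(t_obs, r_obs) if r == 1]
    er = sum(r_obs) / len(r_obs)
    emd = sum(detected) / len(detected) if detected else float("nan")
    return emd, er
```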
\subsection{Results} The main results are summarized in this section. For all figures in this section, the node indices are generated by the rules in Section 3.1. \subsubsection{Uniform spill probability and small threshold monitoring system} Figure \ref{fig:Result1} shows solutions returned by NP+PFM (red circles) and the GA (black triangles) when $C_{th} = 0.0001$ $mg/\ell$, uniformly distributed $L_i^S$, and $q=1.0$ for $M=5,6,7$, and $10$. When $M=5$ and $7$, NP+PFM and the GA return exactly the same solution, and when $M=6$ and $10$, only one placement from NP+PFM differs from that of the GA. Table \ref{tab:result1} shows that NP+PFM uses a smaller number of SWMM runs (smaller NUM) and yet returns equal or better solutions than the GA. More specifically, when $M=6$, NP+PFM stopped after only 107 SWMM runs and the returned solution's EMD is 16 minutes shorter than that of the solution returned by the GA with 1000 SWMM runs. That is, the GA and NP+PFM find similar solutions with respect to EMD, but NP+PFM saves approximately 1786 hours of CPU computation time when using a single PC with a 2.6 GHz Intel Core 2 Quad Q8300 CPU. Figure \ref{fig:Result2} and Table \ref{tab:result2} show results when $C_{th} = 0.0001$ $mg/\ell$, uniformly distributed $L_i^S$, and $q=0.9$. Since the GA returns the same solutions as in the $q=1.0$ case, only the results of NP+PFM are reported, as discussed earlier. The optimal solutions when $q=0.9$ tend to be located more upstream than those when $q=1.0$. Table \ref{tab:result2} shows NUM, EMD, and ER when $q=0.9$. The EMD values are smaller than those when $q=1.0$. Note that $C_{th}=0.0001$ $mg/\ell$ is so small that a monitoring device rarely misses any spill. As a result, the estimated reliability is simply the maximum probability of detecting a spill that the monitoring devices can achieve, assuming no misses.
\begin{figure} \begin{center} \scalebox{0.40}{\includegraphics[width=12.74in,height=10.0in]{Result1.eps}} \caption{Optimal solutions when $C_{th}=0.0001\ mg/\ell$, uniformly distributed $L_i^S$, and $q=1.0$. \label{fig:Result1}} \end{center} \end{figure} \begin{figure} \begin{center} \scalebox{0.40}{\includegraphics[width=12.74in,height=10.0in]{Result2.eps}} \caption{Optimal solutions when $C_{th}=0.0001\ mg/\ell$, uniformly distributed $L_i^S$, and $q=0.9$. \label{fig:Result2}} \end{center} \end{figure} \begin{table} \begin{center} \caption{Results of NP+PFM and GA when $C_{th}=0.0001\ mg/\ell$, uniformly distributed $L_i^S$, and $q=1.0$. \label{tab:result1}} {\begin{tabular}{@{}lcccccc}\hline M &\multicolumn{3}{c}{NP+PFM}&\multicolumn{3}{c}{GA}\\ \cline{2-7} & NUM & EMD & ER & NUM & EMD & ER\\ \hline 5 & 91 & 3156.1&1.0000&1000&3156.1&1.0000\\ 6 & 107& 2718.8&1.0000&1000&2734.8&1.0000\\ 7 & 277& 2329.0&1.0000&1000&2329.0&1.0000\\ 10 & 757& 1917.3&1.0000&1000&1923.3&1.0000\\ \hline \end{tabular}} \end{center} \end{table} \begin{table} \begin{center} \caption{Results of NP+PFM when $C_{th}=0.0001\ mg/\ell$, uniformly distributed $L_i^S$, and $q=0.9$. \label{tab:result2}} {\begin{tabular}{@{}lccc} \hline M & NUM & EMD & ER \\ \hline 5 & 154& 2782.7&0.9300\\ 6 & 321& 2407.1&0.9300\\ 7 & 205& 2222.7&0.9300\\ 10 & 459& 1738.1&0.9100\\ \hline \end{tabular}} \end{center} \end{table} \subsubsection{Non-uniform spill probability and small threshold monitoring system} \begin{figure} \begin{center} \scalebox{0.40}{\includegraphics[width=12.74in,height=10.0in]{Result3.eps}} \caption{Optimal solutions when $C_{th}=0.0001\ mg/\ell$, non-uniformly distributed $L_i^S$, and $q=1.0$. \label{fig:Result3}} \end{center} \end{figure} \begin{figure} \begin{center} \scalebox{0.40}{\includegraphics[width=12.74in,height=10.0in]{Result4.eps}} \caption{Optimal solutions when $C_{th}=0.0001\ mg/\ell$, non-uniformly distributed $L_i^S$, and $q=0.9$.
\label{fig:Result4}} \end{center} \end{figure} \begin{table} \begin{center} \caption{Results of NP+PFM and GA when $C_{th}=0.0001\ mg/\ell$, non-uniformly distributed $L_i^S$, and $q=1.0$. \label{tab:result3}} {\begin{tabular}{@{}lcccccc}\hline M &\multicolumn{3}{c}{NP+PFM}&\multicolumn{3}{c}{GA}\\ \cline{2-7} & NUM & EMD & ER & NUM & EMD & ER\\ \hline 5 & 79 & 2778.2&1.0000&1000&2778.2&1.0000\\ 6 & 126& 2407.1&1.0000&1000&2407.1&1.0000\\ 7 & 241& 2122.8&1.0000&1000&2122.8&1.0000\\ 10 & 827& 1677.9&1.0000&1000&1762.2&1.0000\\ \hline \end{tabular}} \end{center} \end{table} \begin{table} \begin{center} \caption{Results of NP+PFM when $C_{th}=0.0001\ mg/\ell$, non-uniformly distributed $L_i^S$, and $q=0.9$. \label{tab:result4}} {\begin{tabular}{@{}lccc} \hline M & NUM & EMD & ER \\ \hline 5 & 120& 2450.4&0.9407\\ 6 & 305& 2144.6&0.9407\\ 7 & 156& 1930.0&0.9407\\ 10 & 602& 1488.9&0.9068\\ \hline \end{tabular}} \end{center} \end{table} This section now discusses the results with $C_{th} = 0.0001$ $mg/\ell$ and non-uniformly distributed $L_i^S$. Figures \ref{fig:Result3} and \ref{fig:Result4} show the optimal locations of monitoring devices when $q=1.0$ and $q=0.9$, respectively. In Figure \ref{fig:Result3}, the solutions from NP+PFM and the GA are exactly the same when $M=5,6$, and $7$, but for the $M=10$ case, the results from NP+PFM and the GA differ. Monitoring devices are located exactly at the nodes close to the paper mills for every value of $M$ and for both NP+PFM and the GA. When $q = 0.9$, monitors tend to be placed more upstream than in the $q=1.0$ case, as shown in Figures~\ref{fig:Result3} and \ref{fig:Result4}, but two monitoring devices are still located exactly at the nodes close to the paper mills. Tables \ref{tab:result3} and \ref{tab:result4} show the three main performance measures. Table~\ref{tab:result3} shows that NP+PFM returns the same or a better solution with a significantly smaller number of SWMM runs than the GA.
\subsubsection{Uniform spill probability and large threshold monitoring system} Next, this section considers the case with uniformly distributed $L_i^S$ but a larger threshold $C_{th}$ (i.e., $C_{th}=0.05$ $mg/\ell$). The larger threshold $C_{th}$ increases the chance that a monitoring device misses a spill due to flow speed and dilution by dynamic flow and rain. Figure \ref{fig:Result5} shows solutions returned by NP+PFM and GA and Table \ref{tab:result5} shows NUM, EMD, and ER of the returned solutions. NP+PFM returns solutions with smaller EMD and smaller ER. In particular, when $M=10$, the solution from NP+PFM saves about 400 minutes in EMD, but its detection reliability is also $1.34\ \%$ lower than that of the solution from GA. \begin{figure} \begin{center} \scalebox{0.40}{\includegraphics[width=12.74in,height=10.0in]{Result5.eps}} \caption{Optimal solutions with $C_{th}=0.05\ mg/\ell$, uniformly distributed $L_i^S$, and $q=0.95$. \label{fig:Result5}} \end{center} \end{figure} \begin{figure} \begin{center} \scalebox{0.40}{\includegraphics[width=12.74in,height=10.0in]{Result6.eps}} \caption{Optimal solutions with $C_{th}=0.05\ mg/\ell$, uniformly distributed $L_i^S$, and $q=0.9$. \label{fig:Result6}} \end{center} \end{figure} By comparing Figures \ref{fig:Result2} and \ref{fig:Result6}, one can notice that the optimal locations of monitoring devices when $C_{th}=0.05$ $mg/\ell$ tend to be further downstream than those when $C_{th}=0.0001$ $mg/\ell$. Also, Table~\ref{tab:result6} shows that NP+PFM needs more SWMM runs (thus, a larger NUM) until it stops and the estimated detection times (EMD) are much larger when $C_{th}=0.05$ $mg/\ell$ than when $C_{th}=0.0001$ $mg/\ell$. This is expected because this case is more difficult than the first case with the small threshold. 
The estimated reliability levels (ER) of the returned solutions are greater than $q=0.9$, providing evidence that the solutions are feasible, but are smaller than the reliability levels obtained when $C_{th} = 0.0001$ $mg/\ell$. \begin{table} \begin{center} \caption{Results of NP+PFM and GA when $C_{th}=0.05\ mg/\ell$, uniformly distributed $L_i^S$, and $q=0.95$. \label{tab:result5}} {\begin{tabular}{@{}lcccccc}\hline M &\multicolumn{3}{c}{NP+PFM}&\multicolumn{3}{c}{GA}\\ \cline{2-7} & NUM & EMD & ER & NUM & EMD & ER\\ \hline 5 & 893& 3750.7&0.9534&1000&3779.9&0.9559\\ 6 & 656& 3230.0&0.9570&1000&3322.1&0.9600\\ 7 & 598& 2764.7&0.9553&1000&3043.0&0.9728\\ 10 & 809& 2272.6&0.9626&1000&2670.3&0.9760\\ \hline \end{tabular}} \end{center} \end{table} \begin{table} \begin{center} \caption{Results of NP+PFM when $C_{th}=0.05\ mg/\ell$, uniformly distributed $L_i^S$, and $q=0.9$. \label{tab:result6}} {\begin{tabular}{@{}lccc} \hline M & NUM & EMD & ER \\ \hline 5 &614&3349.7 &0.9041 \\ 6 &702&2851.2 &0.9018 \\ 7 &800&2668.4 &0.9162 \\ 10 &586&2130.3 &0.9021 \\ \hline \end{tabular}} \end{center} \end{table} \subsubsection{Non-uniform spill probability and large threshold monitoring system} Finally, non-uniformly distributed $L_i^S$ are considered with $C_{th}=0.05$ $mg/\ell$. Figure~\ref{fig:Result7} and Table~\ref{tab:result7} show the optimal solutions returned by NP+PFM and GA and their performance measures, respectively. Similar to Table \ref{tab:result5}, NP+PFM provides solutions whose EMD and ER are both smaller than those of the solutions from GA. \begin{figure} \begin{center} \scalebox{0.4}{\includegraphics[width=12.74in,height=10.0in]{Result7.eps}} \caption{Optimal solutions with $C_{th}=0.05\ mg/\ell$, non-uniform $L_i^S$, and $q= 0.95$. 
\label{fig:Result7}} \end{center} \end{figure} \begin{figure} \begin{center} \scalebox{0.4}{\includegraphics[width=12.74in,height=10.0in]{Result8.eps}} \caption{Optimal solutions with $C_{th}=0.05\ mg/\ell$, non-uniform $L_i^S$, and $q= 0.9$. \label{fig:Result8}} \end{center} \end{figure} Figure \ref{fig:Result8} and Table \ref{tab:result8} show the solutions returned by NP+PFM when $q=0.9$. Compared to the second case, where $C_{th} = 0.0001$ $mg/\ell$ with non-uniformly distributed $L_i^S$, this is a more difficult case because a spill can be missed. NP+PFM needs more SWMM runs until it stops and the EMD values are larger than those from the second case. The returned solutions for $M=5,6,7,$ and $10$ all appear feasible based on the estimated reliability ER. \begin{table} \begin{center} \caption{Results of NP+PFM and GA when $C_{th}=0.05\ mg/\ell$, non-uniformly distributed $L_i^S$, and $q=0.95$. \label{tab:result7}} {\begin{tabular}{@{}lcccccc}\hline M &\multicolumn{3}{c}{NP+PFM}&\multicolumn{3}{c}{GA}\\ \cline{2-7} & NUM & EMD & ER & NUM & EMD & ER\\ \hline 5 & 421 &3198.3&0.9527&1000&3446.4&0.9674\\ 6 & 954 &2795.4&0.9571&1000&2895.5&0.9706\\ 7 & 544 &2472.8&0.9507&1000&2660.3&0.9763\\ 10 & 1000&1961.5&0.9545&1000&2269.9&0.9808\\ \hline \end{tabular}} \end{center} \end{table} \begin{table} \begin{center} \caption{Results of NP+PFM when $C_{th}=0.05\ mg/\ell$, non-uniformly distributed $L_i^S$, and $q=0.9$. 
\label{tab:result8}} {\begin{tabular}{@{}lccc} \hline M & NUM & EMD & ER \\ \hline 5 &320 &2834.9 & 0.9035\\ 6 &350 &2498.5 & 0.9109\\ 7 &543 &2261.1 & 0.9132\\ 10 &675 &1835.3 & 0.9135\\ \hline \end{tabular}} \end{center} \end{table} \section{Conclusions} \label{sec:Conclusion} This chapter considers the problem of designing a water quality monitoring network for river systems, where the goal is to find the optimal locations of a finite number of monitoring devices that minimize the expected detection time of a contaminant spill event while guaranteeing detection reliability greater than or equal to some constant. The problem is formulated as a DOvS problem with a stochastic constraint, and implementation issues in a DOvS algorithm, namely NP+PFM, are discussed. The advantage of NP+PFM is that it obtains additional observations as needed, while the GA requires the decision maker to determine how many observations to take prior to optimization. The chosen number of observations for the GA may be too small, introducing large estimation error in performance measures, which in turn increases the chance of returning a solution quite different from the true optimal, or too large, wasting time on taking unnecessary observations. In addition, NP+PFM with the penalty sequence used in this chapter has a guarantee on global convergence when there is no active constraint, while the GA is a heuristic algorithm that provides no such guarantee. The experimental results on the Altamaha River show that NP+PFM handles the feasibility of solutions with respect to a stochastic constraint effectively and returns a good feasible solution. Also, NP+PFM finds the same or a better solution than the GA with significant savings in computational effort. The proposed algorithm can be applied to other river systems and extended to more realistic and complex settings, such as those that account for multiple spills, complicated rain patterns, or detection error of a monitoring device. 
\chapter{Improving Performance of Penalty Function with Memory by Approximate Simulation Budget Allocation} In discrete optimization via simulation (DOvS), decision variables are discrete and the number of potential solutions is large, requiring a search method to determine which solutions to simulate in the next iteration. A number of efficient DOvS algorithms have been presented; for example, see \cite{as_convergence}, \cite{BEESE}, \cite{hncompass}, \cite{fumras}, \cite{pjnb}, \cite{sostochastic}, and \cite{xu:AHA}. Recently, stochastically constrained DOvS problems have received attention from the simulation community. For such problems, we propose the penalty function with memory (PFM) in Chapter II to handle stochastic constraints on secondary performance measures. PFM determines a penalty sequence for each solution based on past results of feasibility checks, and a DOvS algorithm combined with PFM, denoted as ${\cal D}+$PFM, can solve a stochastically constrained DOvS problem. To ensure good performance of ${\cal D}+$PFM, two things need to happen: (i) PFM should make a correct decision on feasibility with high probability when a solution is visited; and (ii) a DOvS algorithm should find the most promising region (or solution) correctly at each search iteration. The version proposed in Chapter II takes an equal number of observations from each visited solution (equal allocation) and uses cumulative sample means to make a feasibility decision or find a feasible promising region. If we can efficiently and accurately select the most promising region (or solution) with accurate feasibility checks, the performance of ${\cal D}+$PFM can be further improved. For stochastically unconstrained DOvS problems, a number of ranking and selection (R\&S) procedures are used to further improve the efficiency or accuracy of a DOvS algorithm. 
For example, \cite{bosel:cleanup} and \cite{pjnb} propose statistical comparison procedures for either cleaning up solutions at the end of the search (in order to find the best among all solutions visited during the search) or assisting a DOvS algorithm with finding a correct promising region. Although the statistical comparison procedures are shown to be very useful for clean-up, it is known that they tend to `overdo' when applied within a DOvS algorithm at each search iteration because they try to provide a guarantee on the probability of correctly selecting the true best solution (PCS). On the other hand, there are statistical procedures called optimal computing budget allocation (OCBA). These procedures allocate a finite computing budget (either count or time) among a finite number of solutions to maximize PCS. In R\&S, OCBA may suffer from low PCS or from its inability to tell how many observations would be needed for a correct selection. However, OCBA finds a good application in DOvS because DOvS algorithms need to allocate a finite budget among visited solutions at each search iteration. For example, \cite{He10CEOCBA} shows that OCBA improves the performance of optimization algorithms significantly compared to equal allocation. To improve the performance of ${\cal D}+$PFM for stochastically constrained DOvS problems, constrained R\&S procedures can be considered. \cite{ak} and \cite{Healey:HAK} propose statistically valid procedures that select the best feasible solution with a PCS guarantee in the presence of multiple stochastic constraints. They are good candidate procedures for clean-up at the end of the optimization search. \cite{huterragu} and \cite{Lee12OCBACO} provide optimal budget allocation methods that allocate a finite sampling budget to maximize the probability of correctly selecting the best feasible solution. 
\cite{huterragu} allows observations to have general marginal distributions, but their procedure is hard to implement when the number of solutions is large. On the other hand, \cite{Lee12OCBACO} provides a procedure, called OCBA-CO, that is easy to implement even with a large number of solutions, but the procedure requires observations to be normally distributed. In this chapter, we focus on improving the performance of ${\cal D}+$PFM by combining it with a budget allocation method. More specifically, we consider OCBA-CO from \cite{Lee12OCBACO} due to its ease of implementation and its applicability to a large number of solutions, modify it so that the convergence properties of ${\cal D}+$PFM are preserved when the modified budget allocation method is combined with ${\cal D}+$PFM, and test how much improvement is achieved over equal allocation. We call the modified budget allocation method the approximate budget allocation (ABA). ABA differs from OCBA-CO in two aspects: \begin{enumerate} \item OCBA-CO always considers a fixed set of simulated solutions and re-allocates the cumulative simulation budget among all solutions in the set at each iteration. On the other hand, the size of the set of sampled solutions and the identities of its elements can differ in ${\cal D}+$PFM at each iteration. Thus ABA allocates only the new additional simulation budget among sampled solutions at each iteration. \item Assuming a minimization problem, OCBA-CO chooses the solution with the smallest sample mean of the primary performance measure as the sample best among solutions whose estimates of the secondary performance measures satisfy the stochastic constraints. However, ABA combined with ${\cal D}+$PFM (which we denote as ${\cal D}+$PFM+ABA) defines the sample best as the solution with the smallest sum of the sample mean of the primary performance measure and the penalty value. 
\end{enumerate} This chapter is organized as follows: Section~\ref{chap4:sec:background} defines our problem and notation, and then briefly reviews PFM and OCBA-CO. Section~\ref{chap4:sec:methodology} introduces ${\cal D}+$PFM+ABA and provides its asymptotic convergence properties along with proofs. Experimental results of ${\cal D}$+PFM+ABA on three numerical examples are presented in Section~\ref{chap4:sec:numericalexample}, followed by concluding remarks in Section~\ref{chap4:sec:conclusion}. \section{Background} \label{chap4:sec:background} In this section, we define our problem and notation, and review PFM and OCBA-CO. \subsection{Problem} From Section~\ref{chapt:subsec:formulation}, recall that $\Theta$ represents the whole decision variable space, which is a discrete and finite set in $\mathbb{R}^d$, and $\xv = (x_1, \ldots, x_d)$ represents a solution (or decision variable). Let $G(\xv)$ and $H_{\ell}(\xv)$ represent the primary performance measure and the secondary performance measure of the $\ell$th constraint, respectively, and let $\Gixv$ and $\Hlixv$ represent their $i$th observations from stochastic simulation. Then our DOvS problem with stochastic constraints is defined as follows: \begin{equation} \label{chap4:eqn:originalprob} \begin{array}{l} \mbox{argmin}_{\xv \in \Theta} \E [G(\xv)],\\ \mbox{subject to } \E [H_{\ell}(\xv)] \geq \ql,\ \ell \in \Lambda,\\ \end{array} \end{equation} where $m$ is the number of stochastic constraints and $\Lambda$ is the set of all indices for stochastic constraints (i.e., $\Lambda = \{1,2,\ldots,m\}$). We also make the following assumption throughout the chapter. 
\vspace{-6pt} \begin{assumption} \label{chap4:assump:normal} The original problem (\ref{chap4:eqn:originalprob}) satisfies the following:\vspace{-12pt} \begin{enumerate} \item $G_i(\xv)$ are normally distributed and independent for all $i=1,2, \ldots$, given any $\xv \in \Theta$,\vspace{-12pt} \item $H_{\ell i}(\xv)$ are normally distributed and independent for all $i=1,2, \ldots$, given any $\ell \in \{1,2, \ldots ,m\}$ and any $\xv \in \Theta$.\vspace{-12pt} \item There exists a unique best feasible solution. \end{enumerate} \end{assumption} The first two assumptions in Assumption~\ref{chap4:assump:normal} are often used in R$\&$S. In practice, $G_i(\xv)$ and $H_{\ell i}(\xv)$ have approximately normal distributions if they are either within-replication averages or batch means. We test the performance of ${\cal D}$+PFM+ABA when the normality assumption is violated in Section~\ref{chap4:section:numex:sS}. \subsection{PFM} We now review PFM. In this chapter, ``solution $\xv$ is visited'' means that solution $\xv$ is simulated to obtain additional observations. Thus, it is possible that a solution is sampled under a DOvS algorithm ${\cal D}$ but not visited, if no additional observation is obtained from the solution based on a simulation budget allocation method. 
We list notation for PFM below: \noindent $k\equiv$ iteration counter;\\ $r\equiv$ counter for the number of visits;\\ $v_k(\xv)\equiv$ the number of visits up to iteration $k$ for $\xv$;\\ $n_r(\xv)\equiv$ the total number of observations obtained up to the $r$th visit for $\xv$;\\ $n_{v_k}(\xv) \equiv n_{v_k(\xv)}(\xv)$, the total number of observations obtained up to iteration $k$ for $\xv$;\\ $\Delta n_r (\xv)\equiv$ the number of new observations obtained at the $r$th visit for $\xv$;\\ $\Gkbar(\xv) \equiv$ $\frac{1}{\nvk(\xv)} \sum_{i=1}^{\nvk(\xv)} \Gixv$, the cumulative sample mean of observations $\Gixv$ for the primary performance measure up to iteration $k$;\\ $\Hlkbar(\xv) \equiv$ $\frac{1}{\nvk(\xv)} \sum_{i=1}^{\nvk(\xv)} \Hlixv$, the cumulative sample mean of observations $\Hlixv$ for the $\ell$th secondary performance measure up to iteration $k$;\\ $\hat{\xv}_k^*\equiv$ the sample best among all sampled solutions up to iteration $k$; and \\ $\xv^b_o \equiv$ the true best feasible solution. As discussed in Section \ref{chap2:sec:PO}, a new objective function with PFM at search iteration $k$ is \begin{equation} \label{eq:newobj} Z_{k}(\xv)= \Gkbar(\xv) + \sum_{\ell \in \Lambda} \big[ \lambda_{\ell}^{v_k}(\xv) \times \max \{0, \ql - \Hlkbar(\xv)\} \big], \end{equation} where $\lambda_\ell^{v_k}(\xv)$ is the penalty sequence of the $\ell$th constraint at visit $v_k(\xv)$ for $\xv$. Whenever a solution $\xv$ is visited, a feasibility check is performed. 
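As a minimal numerical sketch of (\ref{eq:newobj}), the following toy computation (the function name and all input values are hypothetical, chosen only for illustration) evaluates the penalized objective for a single solution with two stochastic constraints $\E[H_{\ell}(\xv)] \geq q_\ell$:

```python
# Sketch of the penalized objective Z_k(x) in the equation above.
# All numbers are hypothetical; lams holds the current penalty-sequence
# values lambda_l^{v_k}(x), and qs the thresholds q_l.
def penalized_objective(g_bar, h_bars, lams, qs):
    """g_bar: cumulative sample mean of G(x); h_bars: cumulative sample
    means of H_l(x). Only violated constraints (h_bar < q) add a penalty."""
    penalty = sum(lam * max(0.0, q - h) for lam, q, h in zip(lams, qs, h_bars))
    return g_bar + penalty

# Constraint 1 looks feasible (0.92 >= 0.9, no penalty); constraint 2 is
# short by 0.05, so it contributes about 2.0 * 0.05 = 0.1 to Z_k(x).
z = penalized_objective(g_bar=10.0, h_bars=[0.92, 0.85], lams=[2.0, 2.0], qs=[0.9, 0.9])
```

A search iteration then selects the solution with the smallest such value as the current sample best.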
If $\xv$ is declared as feasible, a positive constant smaller than 1 (depreciation factor) is multiplied to the previous penalty sequence value, $\lambda_\ell^{v_{k-1}}(\xv)$. Similarly, if $\xv$ is declared as infeasible, a positive constant larger than 1 (appreciation factor) is multiplied to $\lambda_\ell^{v_{k-1}}(\xv)$. In Section~\ref{chap2:sec:PO}, two example penalty sequences, PS$_c$ and PS$_f$, are presented. PS$_f$ generally performs better than PS$_c$, as shown in Sections~\ref{chap2:sec:PS} and \ref{chap2:sec:numericalexamples}, because PS$_c$ uses the same appreciation and depreciation factors for every solution while PS$_f$ adaptively adjusts the appreciation and depreciation factors according to the level of feasibility of each solution. When PS$_f$ with finite $\Delta n_r(\xv)$ is used, Assumption~\ref{assump:symmetric} should be guaranteed: both $\lim_{r \rightarrow \infty} \Delta n_r(\xv) = \Delta n (\xv)$ and symmetric marginal distributions of all secondary performance measures. However, a simulation budget allocation method tends to change $\Delta n_r(\xv)$ at every iteration, and we may have a problem with either $\lim_{r \rightarrow \infty} \Delta n_r(\xv) = \Delta n (\xv)$ or the symmetry of the underlying distribution of $H_{\ell}(\xv)$. Therefore, throughout this chapter, we consider only two penalty sequences that do not require Assumption~\ref{assump:symmetric}: (i) PS$_f$ with increasing $\Delta n_r(\xv)$ and (ii) an adaptive penalty sequence with constants (APS$_c$). More specifically, APS$_c$ is constructed as follows. Recall $S_\ell^{v_k}(\xv) \equiv \sum_{r=1}^{v_k(\xv)} \zeta_{\ell r}(\xv)$ where $\zeta_{\ell r}(\xv) \equiv \sum_{i=n_{r-1}(\xv)+1}^{n_{r-1}(\xv) + \Delta n_r(\xv)} { H_{\ell i}(\xv) - q_\ell \over \sqrt{\Delta n_r(\xv)}}$ and the infeasible probability, \begin{equation} \label{eq:activedecision} \hat{p}^{v_k}_{\ell}(\xv) = \frac{\sum_{r=1}^{v_k(\xv)} \mathbb{I} \{ \zeta_{\ell r}(\xv) < 0 \}}{v_k(\xv)}. 
\end{equation} Prior to running APS$_c$, the following parameters should be chosen:\vspace{-6pt} \begin{itemize} \item $0 < h_1 < \ldots < h_u < 0.5 < h_{u+1} < \ldots < h_v < 1$;\vspace{-6pt} \item $ 1 < a_{\nu} $ and $ 0 < d_{\nu} < 1$, for all $\nu=0,1, \ldots, v$.\vspace{-6pt} \end{itemize} Then APS$_c$ is defined as follows: \vspace{-18pt} \begin{center} \fbox{ \begin{minipage}{\textwidth} \vspace{0.05in} \noindent {\bf Adaptive Penalty Sequence with Constants (APS$_c$)} \begin{eqnarray*} \lambda_{\ell}^{v_k}(\xv) = \left\{ \begin{array}{ll} \lambda^{v_{k-1}}_{\ell}(\xv) \times \alpha^{v_k}_{\ell}(\xv), & \mbox{ if } S_\ell^{v_k}(\xv) < 0,\\ \lambda^{v_{k-1}}_{\ell}(\xv) \times \delta^{v_k}_{\ell}(\xv), & \mbox{ if } S_\ell^{v_k}(\xv) \geq 0,\\ \end{array} \right. \end{eqnarray*} where $\lambda^0_{\ell}(\xv)$ is the initial penalty constant $\lambda^0_{\ell}$, $\alpha^{v_k}_{\ell}(\xv)$ is an appreciation function, and $\delta^{v_k}_{\ell}(\xv)$ is a depreciation function whose values are determined by $\hat{p}^{v_k}_{\ell}(\xv)$ and the following table: \begin{center} \small{ \begin{tabular}{cccccc} \hline $\hat{p}^{v_k}_{\ell}(\xv)$ & $ [0, h_1) $ & $[h_1, h_2) $ & $\ldots$& $[h_{v-1}, h_v)$ & $[h_v,1]$\\ \hline $\alpha^{v_k}_{\ell}(\xv)$ & $a_0$ & $a_1$ & $\ldots$ & $a_{v-1}$ & $a_v$ \\ \hline $\delta^{v_k}_{\ell}(\xv)$ & $d_0$ & $d_1$ & $\ldots$ & $d_{v-1}$ & $d_{v}$ \\ \hline\end{tabular} } \end{center} \end{minipage} } \end{center} \vspace{12pt} Note that APS$_c$ is PS$_f$ with finite $\Delta n_r(\xv)$ and $\gamma^k_{\ell} = 0$, but its convergence proofs follow those of PS$_c$ in Chapter II. In this chapter, if the simulation budget allocated at each iteration increases, then PS$_f$ is used. If the simulation budget is finite but can differ at each iteration, then APS$_c$ is used. 
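One APS$_c$ update can be sketched in a few lines (a toy illustration; the breakpoints $h_\nu$ and the factors $a_\nu$, $d_\nu$ below are hypothetical choices, not values recommended in this chapter):

```python
import bisect

def aps_c_update(lam_prev, S, p_hat, h, a, d):
    """One APS_c update for one constraint of one solution.
    lam_prev: previous penalty lambda^{v_{k-1}}; S: cumulative statistic
    S_l^{v_k}(x); p_hat: estimated infeasible probability p_hat^{v_k}_l(x);
    h: breakpoints [h_1, ..., h_v]; a, d: factors a_0..a_v and d_0..d_v."""
    nu = bisect.bisect_right(h, p_hat)  # bin of p_hat: [0,h_1) -> 0, ..., [h_v,1] -> v
    return lam_prev * (a[nu] if S < 0 else d[nu])  # appreciate if S < 0, else depreciate

h = [0.2, 0.4, 0.6, 0.8]        # v = 4 breakpoints around 0.5 (hypothetical)
a = [1.1, 1.5, 2.0, 4.0, 8.0]   # appreciation factors a_0..a_4, all > 1
d = [0.1, 0.3, 0.5, 0.8, 0.9]   # depreciation factors d_0..d_4, all in (0, 1)
# S < 0 and p_hat in the last bin, so the penalty is appreciated by a_4 = 8.0
lam = aps_c_update(lam_prev=1.0, S=-0.7, p_hat=0.85, h=h, a=a, d=d)
```

Solutions that repeatedly look infeasible ($S_\ell^{v_k}(\xv)<0$ with large $\hat{p}^{v_k}_{\ell}(\xv)$) thus accumulate large penalties quickly, while feasible-looking solutions have their penalties driven toward zero.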
\subsection{OCBA-CO} In this section, we review OCBA-CO in \cite{Lee12OCBACO}. OCBA-CO considers a fixed set of solutions, say $\Omega$. Let $T$ denote the total computing budget we want to allocate and $N(\xv)$ denote the number of simulation observations for solution $\xv$ (i.e., $T=\sum_{\xv \in \Omega} N(\xv)$). The goal of the optimal budget allocation is to intelligently control each $N(\xv)$ in a way that PCS is maximized. PCS is defined as follows: \begin{eqnarray*} \mbox{PCS} &= &\Pr \left\{ \cap_{\ell \in \Lambda}( \bar{H}_{\ell N}(\xv^b_o)\geq \ql )\right. \\ & &\left.\cap_{\xv \in \Omega, \xv \ne \xv^b_o} \Big\{ \Big[ \ \cap_{\ell\in \Lambda} (\bar{H}_{\ell N}(\xv) \geq \ql ) \cap (\bar{G}_{N}(\xv^b_o) >\bar{G}_{N}(\xv) ) \Big]^c \Big\}\right\}, \end{eqnarray*} where $\bar{G}_{N}(\xv) = \frac{\sum_{i=1}^{N(\xv)} G_{i}(\xv)}{N(\xv)}$ and $\bar{H}_{\ell N}(\xv) = \frac{\sum_{i=1}^{N(\xv)} H_{\ell i}(\xv)}{N(\xv)}$. The OCBA-CO problem is defined as follows: \begin{equation} \label{chap4:eq:OCBA_CO_Problem} \max_{N(\xv), \forall \xv \in \Omega} \mbox{PCS} \quad \mbox{subject to} \quad \sum_{\xv \in \Omega} N(\xv) = T. \end{equation} However, it is known that there is no closed-form expression for PCS. The Bonferroni inequality implies that PCS is bounded below by APCS (i.e., PCS $\geq$ APCS), which is defined as \vspace{-12pt} \begin{eqnarray}\label{chap4:eq:APSC} \mbox{APCS} &= &\sum_{\ell \in \Lambda} \Pr\big( \bar{H}_{\ell N}(\xv^b_o)\geq \ql \big)+(1-m) \\ \nonumber & &-\sum_{\xv \in \Omega, \xv \ne \xv^b_o} \Big[ \min \Big[ \min_{\ell \in \Lambda} \Pr\big(\bar{H}_{\ell N}(\xv) \geq \ql \big) , \Pr\big(\bar{G}_{N}(\xv^b_o) > \bar{G}_{N}(\xv) \big) \Big] \Big]. \end{eqnarray} Let $\ell^*(\xv)$ denote the most critical constraint (i.e., $\ell^*(\xv) \equiv\mbox{arg} \min_{\ell \in \Lambda} \Pr \{ \bar{H}_{\ell N}(\xv) \geq \ql \}$); then, we can define two sets $\Omega_O$ and $\Omega_F$ as follows. 
\vspace{-6pt} \begin{eqnarray*} \Omega_O & \equiv & \{ \xv | \xv \ne \xv^b_o, \xv \in \Omega, \Pr(\bar{H}_{\ell^* N}(\xv) \geq q_{\ell^*} (\xv)) \geq \Pr(\bar{G}_{N} (\xv^b_o)>\bar{G}_{N} (\xv))\}, \mbox{ and}\\ \Omega_F & \equiv & \{ \xv | \xv \ne \xv^b_o, \xv \in \Omega, \Pr(\bar{H}_{\ell^* N}(\xv) \geq q_{\ell^*} (\xv)) < \Pr(\bar{G}_{N} (\xv^b_o)>\bar{G}_{N} (\xv))\}.\vspace{-6pt} \end{eqnarray*} Note that $\Omega_O$ represents the set of solutions where optimality is the more dominant issue and $\Omega_F$ represents the set of solutions where feasibility is the more dominant issue. Then the APCS of (\ref{chap4:eq:APSC}) can be rewritten as \vspace{-6pt} \begin{eqnarray}\label{chap4:eq:APSCre} \mbox{APCS} &= &\sum_{\ell \in \Lambda} \Pr\big( \bar{H}_{\ell N}(\xv^b_o)\geq \ql \big)+(1-m) \\ \nonumber & &-\sum_{\xv \in \Omega_F} \Pr\big(\bar{H}_{\ell^* N}(\xv) \geq \ql \big) -\sum_{\xv \in \Omega_O} \Pr\big(\bar{G}_{ N}(\xv^b_o) > \bar{G}_{N}(\xv) \big). \end{eqnarray} Let $\beta(\xv)$ be the proportion of the total budget allocated to solution $\xv$ (i.e., $N(\xv) = \beta(\xv) \cdot T$). Instead of Problem (\ref{chap4:eq:OCBA_CO_Problem}), we can solve the following problem to optimally allocate $T$: \vspace{-6pt} \begin{equation} \label{chap4:eq:OCBA_CO_Problemre} \max_{\beta(\xv), \forall \xv \in \Omega} \mbox{APCS} \quad \mbox{subject to} \quad \sum_{\xv \in \Omega} \beta(\xv) = 1. \end{equation} Recall $\sigma_0^2 (\xv) = \Var(G(\xv))$ and $\sigma_{\ell}^2 (\xv) = \Var(H_{\ell}(\xv))$, for any $\ell \in \Lambda$. 
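To make the split concrete, the toy sketch below (all solution names, means, and variances are hypothetical) classifies a solution into $\Omega_O$ or $\Omega_F$ by plugging assumed-known quantities into normal approximations of the two probabilities; a single constraint is used so that $\ell^*$ is trivial:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def classify(sol, best, q, n):
    """Return 'O' if optimality dominates (sol in Omega_O), else 'F'.
    sol/best: dicts of (assumed known) means and standard deviations;
    n: common number of observations per solution, for simplicity."""
    # Pr(H_bar(x) >= q), with H_bar(x) ~ N(mu_h, sigma_h^2 / n)
    p_feas = 1.0 - phi((q - sol["mu_h"]) / (sol["sigma_h"] / sqrt(n)))
    # Pr(G_bar(x_b) > G_bar(x)) for independent normal sample means
    se = sqrt((sol["sigma_g"] ** 2 + best["sigma_g"] ** 2) / n)
    p_opt = phi((best["mu_g"] - sol["mu_g"]) / se)
    return "O" if p_feas >= p_opt else "F"

best = {"mu_g": 1.0, "sigma_g": 1.0, "mu_h": 0.95, "sigma_h": 0.1}
# Clearly feasible but close to the best in objective: optimality dominates.
x = {"mu_g": 1.5, "sigma_g": 1.0, "mu_h": 0.96, "sigma_h": 0.1}
label = classify(x, best, q=0.9, n=25)  # 'O'
```

A solution whose mean objective is close to the best's but whose constraint estimate sits near or below $q_{\ell^*}$ would instead fall into $\Omega_F$, where feasibility dominates.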
In \cite{Lee12OCBACO}, the following result is provided and proved: \vspace{-6pt} \begin{lemma}\label{chap4:lemma:OCBAconverge} Define the noise-to-signal ratio, $\eta(\xv)$, as follows: \begin{eqnarray*} \eta(\xv) = \left\{ \begin{array}{ll} \frac{\sigma_0(\xv)}{\E[G(\xv)]- \E[G(\xv^b_o)]} \sqrt{1+\frac{\sigma_0^2(\xv^b_o)/\beta(\xv^b_o)}{\sigma_0^2(\xv)/\beta(\xv)}}, & \mbox{ if } \xv \in \Omega_O,\\ \frac{\sigma_{\ell^*}(\xv)}{q_{\ell^*} - \E[H_{\ell^*}(\xv)]}, & \mbox{ if } \xv \notin \Omega_O. \end{array}\right. \end{eqnarray*} \vspace{-6pt} Then, {\rm APCS} is asymptotically (as $T\rightarrow \infty$) maximized if \begin{eqnarray*} \frac{\beta(\xv)}{\beta(\yv)} & = & \left( \frac{\eta(\xv)}{\eta(\yv)} \right)^2, \forall \xv \ne \yv \ne \xv^b_o, \mbox{ and}\\ \beta(\xv^b_o) & = &\max(\beta_{O}(\xv^b_o), \beta_{F}(\xv^b_o)), \end{eqnarray*} where $\beta_O(\xv^b_o) = \sigma_{0}(\xv^b_o) \sqrt{\sum_{\xv \in \Omega_O} \frac{\beta^2(\xv)}{\sigma_0^2(\xv)}}$ and $\frac{\beta_F(\xv^b_o)}{\beta(\xv)} = \left(\frac{\eta(\xv^b_o)}{\eta(\xv)}\right)^2, \forall \xv \ne \xv^b_o$. \end{lemma} As in \cite{Lee12OCBACO}, we assume $\beta(\xv^b_o) \gg \beta(\xv)$ for all $\xv \in \Omega_O$ in this chapter. This assumption yields $\eta(\xv) \approx \frac{\sigma_0(\xv)}{\E[G(\xv)]- \E[G(\xv^b_o)]}$, which makes the implementation of OCBA-CO easier. Based on Lemma~\ref{chap4:lemma:OCBAconverge}, a heuristic sequential allocation procedure, OCBA-CO, is constructed to solve (\ref{chap4:eq:OCBA_CO_Problemre}); detailed steps of OCBA-CO are given in \cite{Lee12OCBACO}. \section{Methodology} \label{chap4:sec:methodology} In this section, we provide ${\cal D}$+PFM+ABA and prove its convergence properties. \subsection{${\cal D}$+PFM+ABA} ${\cal D}$+PFM+ABA represents a framework combining ${\cal D}$+PFM of Chapter II with a revised OCBA-CO procedure, ABA. 
To explain ${\cal D}$+PFM+ABA, we first need additional notation. \noindent ${\cal Q}_k \equiv$ a set of solutions to which we apply ABA at iteration $k$;\\ ${\cal V}_k \equiv$ a set of all visited solutions up to iteration $k$;\\ $\Delta N_k (\xv)\equiv$ the number of new observations obtained at iteration $k$ for $\xv$;\\ $\Delta T_k \equiv$ the total number of new observations obtained at iteration $k$;\\ $s^2_{0k}(\xv) \equiv$ the sample variance of all $\Gixv$ obtained up to iteration $k$;\\ $s^2_{\ell k}(\xv) \equiv$ the sample variance of all $\Hlixv$ obtained up to iteration $k$, for $\ell \in \Lambda$; and\\ $\tilde{\xv}_k\equiv$ the sample best among all solutions in ${\cal Q}_k$. OCBA-CO always considers the same set of solutions, say $\Omega$, but in ABA, ${\cal Q}_k$ can be different at each iteration. The budget allocation ABA in ${\cal D}$+PFM+ABA is designed to solve the following problem: \begin{equation} \label{chap4:eq:PFM_OCBA_problem} \max_{\Delta N_k (\xv), \forall \xv \in {\cal Q}_k} \mbox{PCS} \quad \mbox{subject to} \quad \sum_{\xv \in {\cal Q}_k} \Delta N_k (\xv) = \Delta T_k . \end{equation} Let $\hat{\eta}_k(\xv)$ represent a strongly consistent estimator of $\eta(\xv)$ and $\hat{\beta}_{k}(\xv)$ represent an estimator of $\beta(\xv)$ at iteration $k$. $\tilde{\cal Q}_{kO}$ and $\tilde{\cal Q}_{kF}$ represent estimates of ${\cal Q}_{kO}$ and ${\cal Q}_{kF}$ in implementation, where ${\cal Q}_{kO}$ (${\cal Q}_{kF}$) represents the set of solutions in ${\cal Q}_k$ in which optimality (feasibility) is the more dominant issue. An implementation summary of ${\cal D}$+PFM+ABA is provided in Figure \ref{alg:PFM_OCBA}. \noindent {\bf Remark 3}: In practice, an active constraint often produces an extremely small $\bar{H}_{\ell^* k} (\xv) - q_{\ell^*}(\xv)$, which, in turn, results in an extremely large $\hat{\eta}_{k}(\xv)$. 
To prevent such cases, when calculating $\hat{\eta}_{k}(\xv)$, we use $\max \{ \epsilon_{c \ell^*}, |\bar{H}_{\ell^* k} (\xv) - q_{\ell^*}(\xv)| \}$ instead of $\bar{H}_{\ell^* k} (\xv) - q_{\ell^*}(\xv)$, where $\epsilon_{c \ell^*}$ is an error tolerance for constraint $\ell^*$, the amount of error a decision maker is willing to tolerate. \noindent {\bf Remark 4}: Similarly, if $\bar{G}_k(\xv)$ is very close to $\bar{G}_{k} (\tilde{\xv}_k)$, then $\hat{\eta}_{k}(\xv)$ becomes extremely large. To avoid such cases, when calculating $\hat{\eta}_{k}(\xv)$, we use $\max \{\epsilon_{ob}, |\bar{G}_k(\xv) - \bar{G}_k(\tilde{\xv}_k)| \}$ instead of $\bar{G}_k(\xv) - \bar{G}_k(\tilde{\xv}_k)$, where $\epsilon_{ob}$ is an indifference-zone parameter, a practically meaningful difference in the primary performance measure worth detecting. When the search space $\Theta$ is small, we can simply sample all solutions at every $k$ (i.e., ${\cal Q}_k = \Theta$). However, if the search space $\Theta$ is large, then sampling all solutions at every $k$ is undesirable (or even impossible). Instead, existing DOvS algorithms use a solution sampling strategy that forms ${\cal Q}_k$ as a subset of $\Theta$. Depending on the DOvS algorithm, ${\cal Q}_k$ converges to a fixed set or keeps changing: \vspace{-6pt} \begin{itemize} \item Converging ${\cal Q}_k$: As $k$ increases, ${\cal Q}_k$ converges to a fixed set, ${\cal Q}_{\infty}$. This happens if SMRAS of \cite{fumras}, NP of \cite{ShiChen00CENP}, and CE of \cite{He10CEOCBA} are used as ${\cal D}$ because these algorithms take ${\cal Q}_k = {\cal V}_k$ and ${\cal V}_k$ eventually converges to $\Theta$ in the algorithms. \vspace{-6pt} \item Changing ${\cal Q}_k$: As $k$ increases, ${\cal Q}_k$ keeps changing but its cardinality is fixed. 
This happens if NP of \cite{pjnb}, a random search of \cite{as_convergence}, and BEESE of \cite{BEESE} are used as ${\cal D}$ because these algorithms take ${\cal Q}_k$ as a set of sampled solutions at iteration $k$. \end{itemize} \clearpage \begin{figure}[htb!] \begin{center} \fbox{ \begin{minipage}{\textwidth} \noindent {\large \bf Algorithm : ${\cal D}$+PFM+ABA} \begin{hangref} %\vspace{0.1in} %\vspace{0.1in} \item \textbf{Step 0. Initialization:}\\ - Set $k=1$. Select $n_0$, $\epsilon_{ob}$, and $\epsilon_{c \ell}$ for all $\ell \in \Lambda$.\\ - Choose a DOvS algorithm as ${\cal D}$, set ${\cal V}_k = \emptyset$, and $v_0(\xv)=0$ for all $\xv \in \Theta$. \vspace{6pt} \item \textbf{Step 1. Sampling:}\\ - Sample solutions using the solution sampling strategy of ${\cal D}$ to form ${\cal Q}_k$.\\ - Set $n_1 (\xv) = n_0$ and $v_k(\xv) = 1$ for all $\xv \in {\cal Q}_k\setminus {\cal V}_{k-1}$ (i.e., solutions newly visited).\\ - Calculate $\bar{G}_k(\xv)$, $s^2_{0k}(\xv)$, $\bar{H}_{\ell k}(\xv)$, $S^1_{\ell}(\xv)$, and $s^2_{\ell k}(\xv)$, for all $\ell \in \Lambda$ and all $\xv \in {\cal Q}_k\setminus {\cal V}_{k-1}$.\\ - Set ${\cal V}_k = {\cal V}_{k-1} \cup {\cal Q}_k$ and find $\tilde{\xv}_k = \mbox{arg}\min_{\xv \in {\cal Q}_k} Z_k(\xv)$ where $Z_k(\xv)$ is defined in (\ref{eq:newobj}). \vspace{0pt} \item \textbf{Step 2. 
Updating:}\\ - Set \vspace{-16pt} \begin{eqnarray*} \tilde{\cal Q}_{kO} &=&\Big\{ \xv \ | \ \xv \ne \tilde{\xv}_k,\ \xv \in {\cal Q}_k,\ \frac{q_{\ell^*}(\xv)-\bar{H}_{\ell^* k} (\xv)}{s_{\ell^* k} (\xv)} \leq \frac{\bar{G}_k(\xv) - \bar{G}_k(\tilde{\xv}_k)}{s_{0k}(\xv)} \Big\}\\ \tilde{\cal Q}_{kF} &=& \Big\{ \xv \ | \ \xv \ne \tilde{\xv}_k,\ \xv \in {\cal Q}_k,\ \frac{q_{\ell^*}(\xv)-\bar{H}_{\ell^* k} (\xv)}{s_{\ell^* k} (\xv)} > \frac{\bar{G}_k(\xv) - \bar{G}_k(\tilde{\xv}_k)}{s_{0k}(\xv)} \Big\} \end{eqnarray*} where $\ell^*(\xv) \equiv\mbox{arg} \min_{\ell \in \Lambda} \ql - \bar{H}_{\ell k} (\xv)$, $ s^2_{\ell^*k}(\xv)=s^2_{\ell^*(\xv)k}(\xv)$, and $q_{\ell^*}(\xv)=q_{\ell^*(\xv)}$.\\ - If $\xv \in \tilde{\cal Q}_{kO}$, $\hat{\eta}_k(\xv) = \frac{s_{0k}(\xv)}{\max \{\epsilon_{ob}, |\bar{G}_k(\xv) - \bar{G}_k(\tilde{\xv}_k)| \}}$. Otherwise, $\hat{\eta}_k(\xv) = \frac{s_{\ell^* k} (\xv)}{\max \{ \epsilon_{c \ell^*}, |\bar{H}_{\ell^* k} (\xv) - q_{\ell^*}(\xv)| \}}$. \vspace{-6pt} \item \textbf{Step 3. Allocating:}\\ - If $\xv \notin {\cal Q}_k$, set $\hat{\beta}_k(\xv)=0$. 
Otherwise, find $\hat{\beta}_k(\xv)$ such that
%\begin{center}
$\sum_{\xv \in {\cal Q}_k} \hat{\beta}_k(\xv)= 1$, $\frac{\hat{\beta}_k(\xv)}{\hat{\beta}_k(\yv)} = \left(\frac{\hat{\eta}_k(\xv)}{\hat{\eta}_k(\yv)}\right)^2 \mbox{ for all }\xv \ne \yv \ne \tilde{\xv}_k$, and $\hat{\beta}_k(\tilde{\xv}_k) = \max(\beta^{\tilde{O}}_{k+1},\beta^{\tilde{F}}_{k+1})$, where $\beta^{\tilde{O}}_{k+1} = s_{0k} (\tilde{\xv}_k) \sqrt{\sum_{\xv \in \tilde{\cal Q}_{kO}} \left(\frac{\hat{\beta}_k(\xv)}{s_{0k}(\xv)}\right)^2}$ and $\beta^{\tilde{F}}_{k+1} = \hat{\beta}_k(\xv) \left(\frac{\hat{\eta}_k(\tilde{\xv}_k)}{\hat{\eta}_k(\xv)}\right)^2$ for all $\xv \ne \tilde{\xv}_k$.\\
%\end{center}
- Determine the number of additional simulation observations for each $\xv \in {\cal Q}_k$: \begin{equation} \label{eq:allocation} \Delta N_{k}(\xv) = \lfloor \Delta T_k \hat{\beta}_k(\xv) \rfloor + \mathbb{I} \big\{ U < \Delta T_k \hat{\beta}_k(\xv) - \lfloor \Delta T_k \hat{\beta}_k(\xv) \rfloor \big\}, \end{equation} where $\mathbb{I}\{\cdot\}$ represents an indicator function and $U$ is a uniform random variable between 0 and 1. If $\Delta N_k(\xv) > 0$, then set $v_k(\xv) = v_{k-1}(\xv)+1$ and $\Delta n_{v_k}(\xv) = \Delta N_k (\xv)$. Otherwise, set $v_k(\xv) = v_{k-1}(\xv)$ and $\Delta n_{v_k}(\xv) = 0$.\\ - Adjust the allocation so that $\sum_{\xv \in {\cal Q}_k} \Delta n_{v_k}(\xv) = \Delta T_k$. \vspace{6pt} \item \textbf{Step 4. Simulating:}\\ - Obtain $\Delta n_{v_k}(\xv)$ additional observations and update $n_{v_k}(\xv) = n_{v_{k-1}}(\xv) + \Delta n_{v_k}(\xv)$ for all $\xv \in {\cal Q}_k$.\\ - Update $\bar{G}_{k+1}(\xv)$, $s^2_{0,k+1}(\xv)$, $\bar{H}_{\ell,k+1}(\xv)$, $S^{v_{k+1}}_{\ell}(\xv)$, and $s^2_{\ell, k+1}(\xv)$, for all $\ell \in \Lambda$ and for all $\xv \in {\cal Q}_k$. \vspace{6pt} \item \textbf{Step 5.
Stopping Rule:}\\ - Update the sample best $\hat{\xv}^*_{k} = \mbox{arg}\min_{\xv \in {\cal V}_k} Z_{k}(\xv)$.\\ - If a stopping rule is satisfied, then stop and return $\hat{\xv}^*_k$ as the best feasible solution.\\ Otherwise, update the solution sampling strategy, set $k \leftarrow k+1$, and go to Step 1. \vspace{6pt} \end{hangref} \end{minipage} } \caption{Algorithmic statement of ${\cal D}$+PFM+ABA. \label{alg:PFM_OCBA}} \end{center} \end{figure} \clearpage \pagebreak Locally convergent algorithms, such as COMPASS of \cite{hncompass} and \cite{xu:AHA}, can also be used as ${\cal D}$. In this case, ${\cal Q}_k$ converges to the set of neighbor solutions of a local optimal solution. However, with stochastic constraints, the definition of a local optimum used in \cite{hncompass} and \cite{xu:AHA} can unexpectedly create many local optima near the boundary of the feasible and infeasible regions. This will be further discussed in Section~\ref{chap4:section:numex:GS}.
%The global or local convergence is achieved when $k$ goes to infinity but in practice the algorithm should terminate in a finite number of search iterations. The algorithm can stop when all computational budget is consumed or a certain number of search iterations is made.
%\subsection{${\cal D}$+PFM+ABA}
%Converging ${\cal Q}_k$ has a main advantage that PFM+OCBA maximizes APCS given total computing budget (i.e., total number of simulation observations obtained so far) over all solutions in ${\cal Q}_{\infty}$. This advantage aids ${\cal D}$ to enhance performance of ${\cal D}$ with less number of iterations. However, when the size of ${\cal Q}_{k}$ increases fast, then PFM+OCBA needs large computing budget to scan and compare solutions in ${\cal Q}_{k}$. On the other hand, changing ${\cal Q}_{k}$ requires always the same amount of computing budget to scan and compare solutions in ${\cal Q}_{k}$ if the size of ${\cal Q}_{k}$ is fixed.
Note that changing ${\cal Q}_k$ maximizes APCS given only the additional computing budget (i.e., the number of additional simulation observations) over solutions in ${\cal Q}_k$. Convergence proofs are included in the next section. \subsection{Convergence Properties} We first provide a lemma needed to guarantee convergence of ${\cal D}$+PFM+ABA. \begin{lemma} \label{lemma:limit} Under Assumptions \ref{assump:normal} and \ref{assump:limit}, if $\Delta T_k > 0$ for any $k$, then ${\cal D}$+PFM+ABA guarantees \begin{eqnarray*} \Pr \big[ \lim_{k \rightarrow \infty} v_k(\xv) = \infty \big] =1 \quad \mbox{and} \quad \Pr \big[ \lim_{k \rightarrow \infty} n_{v_k}(\xv) = \infty \big] =1 \mbox{ for any }\xv \in \Theta. \end{eqnarray*} \end{lemma} \noindent {\bf Proof of Lemma~\ref{lemma:limit}.} If $\Delta T_k \rightarrow \infty$ as $k\rightarrow \infty$, then $\Delta N_{k}(\xv) \rightarrow \infty$ because $\beta_k (\xv) > 0$ for any $\xv \in {\cal Q}_k$ at any $k$. On the other hand, if $\Delta T_k$ is finite, then $ \Delta N_{k}(\xv) = \lfloor \Delta T_k \hat{\beta}_k(\xv) \rfloor + \mathbb{I} \big\{ U < \Delta T_k \hat{\beta}_k(\xv) - \lfloor \Delta T_k \hat{\beta}_k(\xv) \rfloor \big\}$ can be 0, and a ``sampled" solution is not always ``visited". However, for any $\xv$ and $k$, since $\beta_k(\xv) >0$, we have $\Pr \big\{ U < \Delta T_k \hat{\beta}_k(\xv) - \lfloor \Delta T_k \hat{\beta}_k(\xv) \rfloor \big\}>0$ (i.e., the probability of visiting $\xv$ is always positive when $\xv$ is sampled at iteration $k$), which guarantees $v_k(\xv) \rightarrow \infty$. Therefore, with Assumption \ref{assump:limit}, the result directly follows. $\Box$ \vspace{6pt} Recall that in this chapter, APS$_c$ is used with finite $\Delta T_k$ and PS$_f$ is used with $\Delta T_k \rightarrow \infty$. Then, the following corollary shows the convergence properties of ${\cal D}$+PFM+ABA.
\begin{corollary} \label{chap4:col:convergence} Under Assumptions \ref{assump:normal}, \ref{assump:limit}, and \ref{chap4:assump:normal}, ${\cal D}$+PFM+ABA satisfies Theorems~\ref{thm:PScstrict} and \ref{thm:PScactived} when APS$_c$ (with finite $\Delta T_k$) is used. Similarly, ${\cal D}$+PFM+ABA satisfies Theorem~\ref{thm:PSfaconvI} when PS$_f$ (with $\Delta T_k \rightarrow \infty$) is used. \end{corollary} \vspace{-6pt} \noindent {\bf Proof of Corollary~\ref{chap4:col:convergence}.} The results follow directly from Lemma~\ref{lemma:limit}. $\Box$ \vspace{6pt} Most OCBA methods calculate computing budget based on the total number of observations obtained up to the current iteration over $\Theta$, as in the OCBA-CO procedure of \cite{Lee12OCBACO}. However, ABA uses a different allocation scheme (\ref{eq:allocation}) that calculates computing budget based only on the number of new observations over ${\cal Q}_k$. The next theorem shows that the allocation scheme (\ref{eq:allocation}) still guarantees that the number of observations obtained for solution $\xv$ up to iteration $k$, $n_{v_k}(\xv)$, is proportional to $\beta(\xv)$, if $\hat{\beta}_k(\xv)$ is a strongly consistent estimator of $\beta(\xv)$. \begin{theorem} \label{theorem:PFMOCBA_ACPS} Assume $\hat{\beta}_k(\xv)$ is a strongly consistent estimator of $\beta(\xv)$. For a finite constant $\Delta T$, if either $\lim_{k \rightarrow \infty} \Delta T_k = \Delta T$ or $\lim_{k \rightarrow \infty} \Delta T_k = \infty$ holds, then (\ref{eq:allocation}) of ${\cal D}$+PFM+ABA with converging ${\cal Q}_k$ (i.e., ${\cal Q}_k={\cal V}_k$) guarantees that $\lim_{k \rightarrow \infty} \frac{n_{v_k}(\xv)}{n_{v_k}(\yv)} = \frac{\beta(\xv)}{\beta(\yv)}$ for any $\xv \ne \yv$, $\xv \in \Theta$, and $\yv \in \Theta$. \end{theorem} \vspace{-12pt} \noindent {\bf Proof of Theorem~\ref{theorem:PFMOCBA_ACPS}.} Assumption~\ref{assump:limit} guarantees ${\cal V}_k \rightarrow \Theta$.
We first provide the proof with $\lim_{k \rightarrow \infty} \Delta T_k = \Delta T$. For any $\xv \in \Theta$, by CMT, $\Delta T_k \hat{\beta}_k(\xv) \overset{a.s.}{\longrightarrow} \Delta T \beta(\xv)$, as $k\rightarrow\infty$. For $\xv \in \Theta$, \begin{eqnarray} \lim_{k \rightarrow \infty} \frac{n_{v_k}(\xv)}{k} & = &\lim_{k \rightarrow \infty} \frac{n_0+\sum_{j=1}^k \lfloor \Delta T_j \hat{\beta}_j(\xv) \rfloor + \mathbb{I} \big\{ U < \Delta T_j \hat{\beta}_j(\xv) - \lfloor \Delta T_j \hat{\beta}_j(\xv) \rfloor \big\}}{k} \nonumber \\ & = & \lim_{k \rightarrow \infty} \frac{n_0}{k}+\frac{\sum_{j=1}^k \lfloor \Delta T_j \hat{\beta}_j(\xv) \rfloor}{k} + \frac {\sum_{j=1}^k \mathbb{I} \big\{ U < \Delta T_j \hat{\beta}_j(\xv) - \lfloor \Delta T_j \hat{\beta}_j(\xv) \rfloor \big\}}{k} \nonumber \\ & = & \lfloor \Delta T \beta(\xv) \rfloor + \Pr \big\{ U < \Delta T \beta(\xv) - \lfloor \Delta T \beta(\xv) \rfloor \big\} \quad (\mbox{By Lemmas \ref{lemma:kolmogorovLLN} and \ref{lemma:arithmetic}} ) \nonumber \\ & = & \Delta T \beta(\xv). \label{eq:convDeltaTbeta} \end{eqnarray} For any $\xv \ne \yv$, $\xv \in \Theta$, and $\yv \in \Theta$, (\ref{eq:convDeltaTbeta}) implies \begin{eqnarray*} \lim_{k \rightarrow \infty} \frac{n_{v_k}(\xv)}{n_{v_k}(\yv)} = \lim_{k \rightarrow \infty} \frac{n_{v_k}(\xv)}{k} \frac{k}{n_{v_k}(\yv)} = \frac{\beta(\xv)}{\beta(\yv)}. \end{eqnarray*} Next, we provide the proof with $\lim_{k \rightarrow \infty} \Delta T_k = \infty$. For a fixed $k$ and $1\leq j \leq k$, define a sequence $\chi_{j,k}(\xv)=\frac{ \lfloor \Delta T_j \hat{\beta}_j(\xv) \rfloor + \mathbb{I} \big\{ U < \Delta T_j \hat{\beta}_j(\xv) - \lfloor \Delta T_j \hat{\beta}_j(\xv) \rfloor \big\}}{\Delta T_k}$. 
Then, \begin{eqnarray*} \frac{\Delta T_k \hat{\beta}_k(\xv)-1}{\Delta T_k } \leq \frac{\lfloor\Delta T_k \hat{\beta}_k(\xv)\rfloor}{\Delta T_k }\leq \chi_{k,k}(\xv) \leq \frac{\lceil\Delta T_k \hat{\beta}_k(\xv)\rceil}{\Delta T_k } \leq \frac{\Delta T_k \hat{\beta}_k(\xv)+1}{\Delta T_k } \end{eqnarray*} We have $\lim_{k \rightarrow \infty} \frac{\Delta T_k \hat{\beta}_k(\xv)-1}{\Delta T_k } = \lim_{k \rightarrow \infty} \hat{\beta}_k(\xv)-\frac{1}{\Delta T_k } = \beta(\xv)$ and similarly, we also have $\lim_{k \rightarrow \infty} \frac{\Delta T_k \hat{\beta}_k(\xv)+1}{\Delta T_k } = \beta(\xv)$. Therefore, we have $\lim_{j \rightarrow \infty} \chi_{j,\infty}(\xv) = \beta(\xv)$. \begin{eqnarray} \lim_{k \rightarrow \infty} \frac{n_{v_k}(\xv)}{k \Delta T_k} & = &\lim_{k \rightarrow \infty} \frac{n_0+\sum_{j=1}^k \lfloor \Delta T_j \hat{\beta}_j(\xv) \rfloor + \mathbb{I} \big\{ U < \Delta T_j \hat{\beta}_j(\xv) - \lfloor \Delta T_j \hat{\beta}_j(\xv) \rfloor \big\}}{k \Delta T_k} \nonumber \\ & = & \lim_{k \rightarrow \infty} \frac{n_0}{k \Delta T_k}+\frac{\sum_{j=1}^k \chi_{j,k}(\xv)}{k} \nonumber \\ & = & \beta(\xv) \quad (\mbox{By Lemma~\ref{lemma:arithmetic}} ). \label{eq:convDeltaTbeta2} \end{eqnarray} For any $\xv \ne \yv$, $\xv \in \Theta$, and $\yv \in \Theta$, (\ref{eq:convDeltaTbeta2}) implies \begin{eqnarray*} \lim_{k \rightarrow \infty} \frac{n_{v_k}(\xv)}{n_{v_k}(\yv)} & = & \lim_{k \rightarrow \infty} \frac{n_{v_k}(\xv)}{k \Delta T_k} \frac{k \Delta T_k}{n_{v_k}(\yv)}\\ & = & \frac{\beta(\xv)}{\beta(\yv)}. 
\Box \end{eqnarray*} \noindent {\bf Remark 5}: By Lemma~\ref{lemma:limit} and the strong law of large numbers (SLLN), all estimates converge to their true values (i.e., as $k \rightarrow \infty$, $\bar{G}_{k}(\xv)\overset{a.s.}{\longrightarrow} \E[G(\xv)]$, $s^2_{0k}(\xv) \overset{a.s.}{\longrightarrow} \Var[G(\xv)]$, $\bar{H}_{\ell k}(\xv)\overset{a.s.}{\longrightarrow} \E[H_{\ell}(\xv)]$, and $s^2_{\ell k}(\xv) \overset{a.s.}{\longrightarrow} \Var[H_{\ell}(\xv)]$ for any $\ell$ and any $\xv \in \Theta$). As a result, the sets $\tilde{\cal Q}_{kO}$ and $\tilde{\cal Q}_{kF}$ are determined asymptotically correctly, and the sample allocation $\hat{\beta}_k(\xv)$ is a consistent estimator of $\beta(\xv)$ as $k\rightarrow\infty$ under our assumptions. If we take ${\cal Q}_k$ as a set of sampled solutions at iteration $k$ (i.e., changing ${\cal Q}_k$) in Theorem~\ref{theorem:PFMOCBA_ACPS} while keeping the other conditions the same, then the main results in Theorem~\ref{theorem:PFMOCBA_ACPS} no longer hold. In fact, we cannot derive any asymptotic results on allocating the total number of observations over $\Theta$, $\sum_{\xv \in \Theta} n_{v_k}(\xv)$, with changing ${\cal Q}_k$.
%Instead, $\lim_{k \rightarrow \infty} \frac{\Delta n_{v_k}(\xv)}{\Delta n_{v_k}(\yv)} = \frac{\beta(\xv)}{\beta(\yv)}$ clearly holds for any $\xv \ne \yv$, $\xv \in {\cal Q}_k$, if $\hat{\beta}_k(\xv)$ is a strongly consistent estimator of $\beta(\xv)$.
\section{Numerical Result} \label{chap4:sec:numericalexample} In this section, we revisit the three numerical examples from Section~\ref{chap2:sec:numericalexamples}: (i) the three-system example, (ii) the Goldstein-Price problem, and (iii) the $(s,S)$ inventory policy problem. We test ${\cal D}$+PFM+ABA on these three examples with both tight and strictly feasible optimal solutions. ${\cal D}$+PFM(APS$_c$)+ABA and ${\cal D}$+PFM(PS$_f$)+ABA represent ${\cal D}$+PFM+ABA with APS$_c$ and PS$_f$, respectively.
APS$_c$ uses the same parameters as in Table \ref{tab:expsftable}, but $\gamma^k_{\ell}$ is set to $\gamma^k_{\ell} = 0$. For the three-system example, we do not need a DOvS algorithm ${\cal D}$ because there are only three solutions. We sample all three solutions at each iteration (i.e., ${\cal Q}_k = \Theta$); apply PFM(APS$_c$)+ABA and PFM(PS$_f$)+ABA; and compare their performances with those of PFM(APS$_c$), PFM(PS$_f$), and OCBA-CO. We set $\Delta T_k=9$ when APS$_c$ is used and $\Delta T_k = 3(n_0 + \lceil \log k \rceil)$ when PS$_f$ is used. For the Goldstein-Price problem and the $(s,S)$ inventory policy problem, the search space is large and a DOvS algorithm is needed. We take NP as ${\cal D}$ and combine it with PFM+ABA. The performance of NP+PFM+ABA is compared with that of NP+PFM. When we combine NP+PFM with ABA, we take ${\cal Q}_k$ as a set of sampled solutions at iteration $k$ and set $\Delta T_k= \Delta n \cdot |{\cal Q}_k \cap {\cal V}_{k-1}|$ for APS$_c$ and $\Delta T_k = \sum_{\xv \in {\cal Q}_k \cap {\cal V}_{k-1}}(n_0+ \lceil \log v_k(\xv) \rceil)$ for PS$_f$, where $|\cdot|$ represents the cardinality of a set. The rest of the parameter settings are the same as in Section~\ref{chap2:sec:numericalexamples}. NP+PFM without ABA uses equal allocation in the sense that, at each iteration, the same number of additional observations is obtained across all sampled solutions except newly visited solutions for APS$_c$, or the same number of additional observations is obtained for solutions with the same number of visits for PS$_f$.
%Note that when a solution has ever been visited, NP+PFM(APS$_c$) without ABA uses equal allocation (i.e., $\Delta n_{v_k}(\xv) = \Delta n$, for $v_k(\xv) \geq 2$) and NP+PFM(PS$_f$) without ABA obtains the same number of additional observations at the same number of visits (i.e., $\Delta n_{v_k}(\xv) =n_0+ \lceil \log v_k(\xv)\rceil$, for $v_k(\xv) \geq 2$).
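To make Steps 2 and 3 of the algorithm concrete, the following Python sketch performs one simplified ABA iteration: it classifies each non-best solution by comparing its standardized feasibility margin with its standardized optimality gap, forms noise-to-signal ratios $\hat{\eta}_k(\xv)$ with the safeguards of Remarks 3 and 4, derives weights with $\hat{\beta}_k(\xv)/\hat{\beta}_k(\yv) = (\hat{\eta}_k(\xv)/\hat{\eta}_k(\yv))^2$, and applies the randomized rounding of (\ref{eq:allocation}). The names (`aba_step`, `stats`) are hypothetical, each solution is assumed to carry a single binding constraint, and the sample best's weight uses only the $\beta^{\tilde{O}}_{k+1}$ term rather than the maximum of the two terms, so this is an illustrative sketch rather than the full procedure.

```python
import math
import random

def aba_step(stats, best, delta_T, eps_ob=0.1, eps_c=0.005, rng=random.random):
    """One simplified ABA iteration (Steps 2-3) over the sampled set Q_k.

    stats[x] holds summary statistics for solution x: Gbar (primary mean),
    s0 (primary std.), Hbar (binding secondary mean), s1 (secondary std.),
    and q (constraint threshold).  `best` is the current sample best.
    """
    eta, in_QO = {}, {}
    for x, st in stats.items():
        if x == best:
            continue
        # Step 2: standardized feasibility margin vs. standardized optimality gap.
        margin = (st["q"] - st["Hbar"]) / st["s1"]
        gap = (st["Gbar"] - stats[best]["Gbar"]) / st["s0"]
        in_QO[x] = margin <= gap
        if in_QO[x]:   # optimality is the dominant error source
            eta[x] = st["s0"] / max(eps_ob, abs(st["Gbar"] - stats[best]["Gbar"]))
        else:          # feasibility is the dominant error source
            eta[x] = st["s1"] / max(eps_c, abs(st["Hbar"] - st["q"]))

    # Weights proportional to eta^2, so beta(x)/beta(y) = (eta(x)/eta(y))^2.
    w = {x: e * e for x, e in eta.items()}
    # Weight for the sample best: only the beta^O coupling term (simplified).
    w[best] = stats[best]["s0"] * math.sqrt(
        sum((w[x] / stats[x]["s0"]) ** 2 for x in w if in_QO.get(x, False)))
    total = sum(w.values())
    beta_hat = {x: wx / total for x, wx in w.items()}

    # Step 3: randomized rounding of delta_T * beta_hat, as in the
    # allocation scheme: floor(t) plus one extra draw w.p. the fraction of t.
    alloc = {}
    for x, b in beta_hat.items():
        t = delta_T * b
        alloc[x] = math.floor(t) + (1 if rng() < t - math.floor(t) else 0)
    return beta_hat, alloc
```

The randomized rounding keeps the allocation integer-valued while spending, in expectation, exactly $\Delta T_k \hat{\beta}_k(\xv)$ observations on each sampled solution, which is the property the proof of Lemma~\ref{lemma:limit} relies on.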
\subsection{Three-System Example} The mean and variance configurations of the three systems are given in Section~\ref{chap2:sec:threesystem}. In this problem, we arbitrarily select $\epsilon_{c1} = \epsilon_{ob} = 0.01$ for ABA and $\epsilon_{10} = 0.04$ for PFM. Figure~\ref{fig:TS_active} shows the percentage of time that $\hat{\xv}^*_k = \xv^b_o$ over 500 macro replications for the three-system example. Each run terminates when the total number of observations obtained from all visited solutions reaches 3,000. As shown in Figure \ref{fig:TS_active}, PFM(PS$_{f}$)+ABA achieves the highest percentage, over $90\%$, while OCBA-CO achieves the lowest, around $50\%$. It is not surprising that OCBA-CO does not perform well because OCBA-CO cannot handle tight solutions. Also, comparing PFM(PS$_{f}$)+ABA with PFM(PS$_{f}$) (or PFM(APS$_{c}$)+ABA with PFM(APS$_{c}$)), one can observe the improvement due to ABA over equal allocation. The percentage of time that $\hat{\xv}^*_k = \xv^b_o$ is about $5\% \sim 8\%$ higher for both small and large numbers of observations when ABA is combined with PFM(PS$_f$). However, for APS$_c$, ABA shows about a $10\%$ increase in the percentage compared to equal allocation only when the total number of observations is small; the advantage disappears as more observations are taken. \begin{figure}[!tb] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{ts_active.ps}} \caption{Percentage of time that $\hat{\xv}^*_k = \xv^b_o$ for the three-system example with a tight solution. \label{fig:TS_active}} \end{center} \end{figure} In order to examine the small-sample behaviors more closely, we change $\E[H_1(2)] = 0$ to $\E[H_1(2)] = 0.03$ so that there is no tight solution while system 2 is still the true best feasible solution. Each run terminates when the total number of observations obtained from all visited solutions reaches 1,000.
Figure~\ref{fig:TS_notactive} shows the percentage of time that $\hat{\xv}^*_k = \xv^b_o$ over 500 macro replications for the three-system example. In this case, OCBA-CO achieves a higher percentage, up to $80\%$, but does not outperform any variant of PFM. ABA improves the performance of PFM compared to equal allocation, especially when APS$_c$ is used. \begin{figure}[!tb] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{ts_notactive.ps}} \caption{Percentage of time that $\hat{\xv}^*_k = \xv^b_o$ for the three-system example without any tight solution. \label{fig:TS_notactive}} \end{center} \end{figure} \subsection{Goldstein-Price Problem} \label{chap4:section:numex:GS} We consider the Goldstein-Price problem with the difficult constraint (\ref{eqn:hardsingle}) from Section~\ref{subsec:gp} to examine the performance of NP+PFM+ABA. We arbitrarily select an error tolerance $\epsilon_{c1} = 0.005$ and an indifference-zone parameter $\epsilon_{ob} = 0.1$ for ABA. For PFM with PS$_f$, we use $\epsilon_{10} = 0.04$. \begin{figure}[!tb] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{gs_diff_optimal.ps}} \caption{Percentage of time that $\hat{\xv}^*_k = \xv^b_o$ with Constraint (\ref{eqn:hardsingle}). \label{fig:gs_diff_optimal}} \end{center} \end{figure} Figure~\ref{fig:gs_diff_optimal} shows the percentage of time that $\hat{\xv}^*_k = \xv^b_o$ over 500 macro replications for the Goldstein-Price problem with Constraint (\ref{eqn:hardsingle}); each replication is terminated after a total of one million observations. When PS$_f$ is used as the penalty sequence, ABA slightly improves the performance of the combined procedure compared to equal allocation at the beginning, but as the total number of observations increases, both NP+PFM(PS$_f$)+ABA and NP+PFM(PS$_f$) show similar performance, achieving $\hat{\xv}^*_k = \xv^b_o$ about $90\%$ of the time.
On the other hand, when APS$_c$ is used as the penalty sequence, NP+PFM(APS$_c$)+ABA shows a significant improvement over NP+PFM(APS$_c$). More specifically, NP+PFM(APS$_c$)+ABA achieves $90\%$ and performs almost identically to the procedures with PS$_f$ after the total number of observations reaches 310,000. However, the percentage of NP+PFM(APS$_c$) is significantly smaller than that of NP+PFM(APS$_c$)+ABA, showing up to a $20\%$ difference. As observed in the three-system example, ABA brings more significant improvements when it is used with APS$_c$. Figure~\ref{fig:gs_diff_objective} shows average objective values at the sample best. The figure implies that NP+PFM(APS$_c$) tends to select superior infeasible solutions as the sample best at the beginning of the search while the other three procedures do not. \begin{figure}[!t] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{gs_diff_objective.ps}} \caption{Average estimated objective value of $\hat{\xv}^*_k$ with Constraint (\ref{eqn:hardsingle}).\label{fig:gs_diff_objective}} \end{center} \end{figure} \begin{figure}[!b] \begin{center} \scalebox{0.50}{\includegraphics[width=5.0in,height=5.0in]{Localproblem.ps}} \caption{Local optimal solution in the Goldstein-Price problem with Constraint (\ref{eqn:hardsingle}).\label{fig:gslocalexample}} \end{center} \end{figure} If a locally convergent DOvS algorithm, such as COMPASS of \cite{hncompass}, is used as ${\cal D}$, many local optima can be created unexpectedly by the definition of neighbor solutions used in \cite{hncompass} when stochastic constraints exist. \cite{hncompass} and \cite{xu:AHA} define neighbor solutions as solutions that differ by $\pm 1$ in only one coordinate. For the Goldstein-Price problem with Constraint (\ref{eqn:hardsingle}), Figure~\ref{fig:gslocalexample} shows that any solution on the constraint line becomes a local optimal solution.
More specifically, solution $\xv_0$ has four solutions in its neighborhood: $\xv_1$, $\xv_2$, $\xv_3$, and $\xv_4$. Note that $\xv_1$ and $\xv_2$ are infeasible but have better $\E[G(\xv)]$ than $\xv_0$, while $\xv_3$ and $\xv_4$ are feasible but have worse $\E[G(\xv)]$ than $\xv_0$. In this case, $\xv_0$ becomes a local optimal solution because $\xv_0$ is the best feasible solution among all solutions in the neighborhood. As a result, any solution on the constraint line becomes a local optimal solution, and COMPASS is likely to stop when it reaches a solution on the constraint line. Thus, when there exist stochastic constraints, using a locally convergent algorithm as ${\cal D}$ requires caution. Now we consider an easier constraint: \begin{eqnarray} \E[- x_1 - x_2 + \psi_{1 i}] \geq 1.499 \label{eqn:relexhardsingle}. \end{eqnarray} With Constraint (\ref{eqn:relexhardsingle}), the true optimal solution is still the same but now it is strictly feasible. Figure~\ref{fig:gseasyoptimal} shows the percentage of time that $\hat{\xv}^*_k = \xv^b_o$ and Figure~\ref{fig:gseasyobjective} shows the average objective value of the sample best over 500 macro replications. To focus on the behaviors of the procedures when the total number of observations is small, we terminate each macro replication when the total number of observations reaches 200,000. NP+PFM(PS$_f$)+ABA performs the best. ABA improves the performance of the combined procedures compared to equal allocation, and greater improvement is achieved when APS$_c$ is used as the penalty sequence for PFM. \begin{figure}[!b] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{gs_relex_optimal.ps}} \caption{Percentage of time that $\hat{\xv}^*_k = \xv^b_o$ with Constraint (\ref{eqn:relexhardsingle}).
\label{fig:gseasyoptimal}} \end{center} \end{figure} \begin{figure}[!tb] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{gs_relex_objective.ps}} \caption{Average estimated objective value of $\hat{\xv}^*_k$ with Constraint (\ref{eqn:relexhardsingle}).\label{fig:gseasyobjective}} \end{center} \end{figure} \subsection{$(s,S)$ Inventory Policy Problem} \label{chap4:section:numex:sS} We revisit the $(s,S)$ inventory policy problem in Section~\ref{chap2:sec:numericalexamples} to test the performance of NP+PFM+ABA when a problem includes non-normal observations for the primary and secondary performance measures and there exists correlation across performance measures. We arbitrarily select an error tolerance $\epsilon_{c1} = 0.0001$ and an indifference-zone parameter $\epsilon_{ob} = 0.1$ for ABA. For PFM with PS$_f$, we select $\epsilon_{10} = 0.0001$. We first consider the same problem as in Section~\ref{chap2:sec:numericalexamples}, where the true optimal solution is strictly feasible but nearly tight. Figure \ref{fig:sSdiffopt} shows the percentage of time that $\hat{\xv}^*_k= \xv_o^b$ over 500 macro replications when each macro replication is terminated after 2,000,000 total observations. At the end of the search, NP+PFM(APS$_{c}$)+ABA and NP+PFM(PS$_{f}$)+ABA achieve over $90\%$ while NP+PFM(APS$_{c}$) and NP+PFM(PS$_{f}$) remain under $90\%$. Figure \ref{fig:sSdiffobj} shows the average objective value of the sample best. The average objective values of NP+PFM(APS$_c$)+ABA and NP+PFM(PS$_f$)+ABA start above the true optimal value while those of NP+PFM(APS$_c$) and NP+PFM(PS$_f$) start under the true optimal value. This suggests that ABA enables accurate feasibility checks at the beginning of the search. \begin{figure}[!tb] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{sS_diff_optimal.ps}} \caption{Percentage of time that $\hat{\xv}^*_k = \xv^b_o$ in the $(s,S)$ inventory problem with a difficult constraint.
\label{fig:sSdiffopt}} \end{center} \vspace*{-2.5ex} \end{figure} \begin{figure}[!b] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{sS_diff_objective.ps}} \caption{Average estimated objective value of $\hat{\xv}^*_k$ in the $(s,S)$ inventory problem with a difficult constraint. \label{fig:sSdiffobj}} \end{center} \vspace*{-2.5ex} \end{figure} Now we consider an easier constraint where the failure probability should be less than or equal to 0.05. The true optimal solution changes to $\xv^b_o = (24,58)$, and its expected cost and failure probability are 113.0864 and 0.04878, respectively. \begin{figure}[!t] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{sS_easy_optimal.ps}} \caption{Percentage of time that $\hat{\xv}^*_k = \xv^b_o$ in the $(s,S)$ inventory problem with a relaxed constraint. \label{fig:sSeasyopt}} \end{center} \vspace*{-2.5ex} \end{figure} \begin{figure}[!b] \begin{center} \scalebox{0.50}{\includegraphics[width=11.0in,height=6.0in]{sS_easy_objective.ps}} \caption{Average estimated objective value of $\hat{\xv}^*_k$ in the $(s,S)$ inventory problem with a relaxed constraint. \label{fig:sSeasyobj}} \end{center} \vspace*{-2.5ex} \end{figure} Figures \ref{fig:sSeasyopt} and \ref{fig:sSeasyobj} show the percentage of time that $\hat{\xv}^*_k= \xv_o^b$ and the average objective value at the sample best over 500 macro replications. To focus on small-sample behaviors, each macro replication is terminated after 100,000 total observations. Both figures show that ABA brings significant savings compared to equal allocation. When APS$_c$ is used as the penalty sequence in PFM, NP+PFM(APS$_c$)+ABA achieves about a $20\%$ higher percentage than NP+PFM(APS$_c$). On the other hand, when PS$_f$ is used as the penalty sequence in PFM, the combined procedure with ABA achieves about a $15\%$ higher percentage than the procedure with equal allocation.
\section{Conclusions} \label{chap4:sec:conclusion} In this chapter, we improve the performance of ${\cal D}$+PFM by combining it with ABA. ABA is a modified version of OCBA-CO of \cite{Lee12OCBACO}; it can handle active constraints and satisfies the conditions necessary for the convergence properties of PFM. Convergence properties of the combined procedure are provided with proofs. Our experimental results on the three numerical examples show that ABA improves the performance of ${\cal D}$+PFM. Greater improvement is observed when the total number of observations is small or medium, or when APS$_c$ is used as the penalty sequence in PFM. \chapter{Contributions} We present a novel method, PFM, which converts a stochastically constrained DOvS problem into a series of stochastically unconstrained DOvS problems. To the best of our knowledge, PFM is the first simulation-based optimization method that (i) adaptively determines penalty values based on the observed feasibility of solutions and (ii) can handle tight solutions. The proposed method is general enough to handle problems that occur in a wide range of applications, including IE/OR, health care, and environmental and energy management. We apply PFM to an important problem in environmental management, namely the water quality monitoring network design problem for river systems. The purpose of the problem is to find the optimal locations of a finite number of monitoring devices that minimize the expected detection time of a contaminant spill event while guaranteeing good detection reliability. We formulate this problem as a DOvS problem with a stochastic constraint and present ${\cal D}$+PFM. Experimental results on the Altamaha River show that our algorithm performs better than an existing popular method in environmental management in terms of both efficiency and accuracy.
Finally, we present ABA, which allocates simulation budget among sampled solutions at each iteration instead of taking an equal number of observations from each sampled solution. Our experimental results on three examples show that ABA improves the performance of ${\cal D}$+PFM significantly, especially when the total number of observations is small or a small budget needs to be allocated at each iteration, as in APS$_c$. \chapter*{APPENDIX A} \markboth{\MakeUppercase{APPENDIX A}}{} \addcontentsline{toc}{chapter}{APPENDIX A}
%\section{Proofs of Chapter 2}
\noindent {\bf Proof of Theorem~\ref{thm:convZkx}.} Assumption \ref{assump:limit} and the SLLN ensure that $\overline{G}_k (\xv)$ converges to $\E [G(\xv)]$ and $\overline{H}_{\ell k} (\xv)$ converges to $\E [H_{\ell} (\xv)]$ almost surely. By the continuous mapping theorem (CMT, Theorem 5.5 of \cite{billingsley68}) and Condition~1, the results follow. $\Box$ \noindent {\bf Proof of Theorem~\ref{thm:convDPFM}.} Recall that $\Vs_k$ is the set of all solutions visited by $\mathcal{D}+$PFM up to iteration $k$ and $\Theta_f$ is the set of all feasible solutions. At iteration $k$, $\mathcal{D}+$PFM selects $\hat{\xv}^{*}_{k}$ such that $\hat{\xv}^*_{k} \equiv \mbox{argmin}_{\xv \in \Vs_k} Z_{k}(\xv).$\\ As $k \rightarrow \infty$, \begin{eqnarray*} \min_{\xv \in \Vs_k} Z_k (\xv) & \overset{a.s.}{\longrightarrow} & \min_{\xv \in \Theta} Z_k (\xv) \quad \mbox{ (by Assumption~\ref{assump:limit})}\\ & \overset{a.s.}{\longrightarrow} & \min_{\xv \in \Theta_f} \E[G(\xv)]. \quad \mbox{ (by Theorem \ref{thm:convZkx})} \end{eqnarray*} This implies that $\Pr \{\lim_{k \rightarrow \infty} Z_k(\hat{\xv}_k^*) = \min_{\xv \in \Theta_f} \E[G(\xv)] \} = 1$. $\Box$
%}
\noindent {\bf Proof of Theorem~\ref{thm:PScstrict}.} We start with the case for $\ell \in \Lambda_{S(\xv)}$.
Let $\zeta_{\ell r}^s(\xv)$ denote the random variable $\zeta_{\ell r}^s(\xv) = \sum_{i= n_{r-1}(\xv) + 1}^{n_{r-1}(\xv)+\Delta n_{r}(\xv)} \frac{H_{\ell i} (\xv) - \E[H_{\ell}(\xv)]}{\sqrt{\Delta n_{r}(\xv)}}$. Note that $\zeta_{\ell r}^s(\xv)$, $r=1,2,\ldots$, are independent random variables with mean zero and finite variance $\Var( H_{\ell i} (\xv))$. Let $\mathbb{I}(\cdot)$ denote an indicator function. Then, \begin{eqnarray*} \lefteqn{\mathbb{I} \left\{ S^{v_k}_{\ell}(\xv) \ge 0 \right\}}\\ & = & \mathbb{I} \left\{ \frac{\sum_{r=1}^{v_k(\xv)} \zeta_{\ell r}^s(\xv)}{v_k(\xv)} + \frac{\sum_{r=1}^{v_k(\xv)} \sum_{i=n_{r-1}(\xv)+1}^{n_{r-1}(\xv)+\Delta n_r(\xv)} \frac{\E[H_{\ell}(\xv)] - \ql}{\sqrt{\Delta n_r(\xv)}}}{v_k(\xv)} \ge 0 \right\}\\ & = & \mathbb{I} \left\{ \frac{\sum_{r=1}^{v_k(\xv)} \zeta_{\ell r}^s(\xv)}{v_k(\xv)} \ge - \frac{\sum_{r=1}^{v_k(\xv)} \sqrt{\Delta n_r(\xv)} \cdot (\E[H_{\ell}(\xv)] - \ql)}{v_k(\xv)} \right\}\\ & \ge & \mathbb{I} \left\{ \frac{\sum_{r=1}^{v_k(\xv)} \zeta_{\ell r}^s(\xv)}{v_k(\xv)} \ge \ql- \E[H_{\ell}(\xv)] \right\} \quad \quad(\because \ \Delta n_r(\xv) \geq 1 \mbox{ and } \E[H_{\ell}(\xv)] - \ql > 0 ) \end{eqnarray*} By Lemma~\ref{lemma:kolmogorovLLN}, there exists $N_{\ell}(\xv) \in \mathbb{Z}^{+}$ such that $\frac{\sum_{r=1}^{v_k(\xv)} \zeta_{\ell r}^s(\xv)}{v_k(\xv)} \ge \ql - \E[H_{\ell}(\xv)]$ for any $k \geq N_{\ell}(\xv)$. This implies that solution $\xv$ is declared feasible for all $k \geq N_{\ell}(\xv)$, so $\theta_d$ is repeatedly multiplied into $\lambda^{v_k}_{\ell}(\xv)$, causing the sequence to converge to 0 almost surely. Using similar arguments, it can be shown that $\lambda_{\ell}^{v_k}(\xv) \overset{a.s.}{\longrightarrow} \infty$ for any $\ell \in \Lambda_{I(\xv)}$.\\ $\Box$ \noindent {\bf Proof of Lemma~\ref{lemma:arithmetic}.} For any $\epsilon_b >0$, there exists a positive integer $N_b$ such that $|b_n - c| < \frac{\epsilon_b}{2}$ for all $n \geq N_b$.
Since $b_n$ converges to a finite value, there exists a finite constant $M_b$ such that $|b_n - c| < M_b$ for all $n < N_b$. Therefore, we can select $N \geq N_b$ such that $\frac{(N_b-1) M_b}{N} < \frac{\epsilon_b}{2}$. By the triangle inequality, for $n \geq N$, \begin{eqnarray*} \left|\frac{1}{n} \sum_{r=1}^n b_r - c \right| & \leq & \frac{1}{n} \sum_{r=1}^n |b_r-c|\\ & < & \frac{(N_b -1)M_b}{n} + \frac{(n-N_b +1)\epsilon_b}{2n}\\ & < & \frac{\epsilon_b}{2} + \frac{\epsilon_b}{2} = \epsilon_b. \end{eqnarray*} Thus, $\lim_{n \rightarrow \infty} \frac{1}{n} \sum_{r=1}^n b_r = c$. $\Box$
\noindent {\bf Proof of Theorem~\ref{thm:PScactived}.} By Assumption~\ref{assump:normal}, $\sigma_{\ell}^2(\xv) = \Var(H_{\ell i}(\xv)) <\infty$. Let $T^{v_k}_{\ell}(\xv)= \sum_{r=1}^{v_k(\xv)} \mathbb{I} \{S_\ell^{r}(\xv) \geq 0 \}$ (the number of feasible decisions up to the $v_k(\xv)$th visit). Then for any $\epsilon > 0$, \begin{eqnarray*} \lefteqn{\lim_{k\rightarrow \infty} \Pr \left[ \lambda_{\ell}^{v_k}(\xv) < \epsilon \right]}\\ & = & \lim_{k\rightarrow \infty}\Pr\left[\theta_d^{T^{v_k}_{\ell}(\xv)} \cdot \theta_a^{v_k(\xv)-T^{v_k}_{\ell}(\xv)}<\epsilon\right]\\ & = & \lim_{k\rightarrow \infty}\Pr\left[\theta_a^{v_k(\xv)} \cdot \Big( \frac{\theta_d}{\theta_a} \Big)^{T^{v_k}_{\ell}(\xv)}<\epsilon\right]\\ & = & \lim_{k\rightarrow \infty}\Pr\left[T^{v_k}_{\ell}(\xv) (\log \theta_d - \log \theta_a) < \log \epsilon - v_k(\xv)\log \theta_a\right]\\ & = & \lim_{k\rightarrow \infty}\Pr\left[ \frac{T^{v_k}_{\ell}(\xv)}{v_k(\xv)} > \frac{ {\log \epsilon \over v_k(\xv)} - \log \theta_a}{\log \theta_d - \log \theta_a} \right] \\ & = & 1 - \frac{2}{\pi} \arcsin \sqrt{\frac{- \log \theta_a}{\log \theta_d - \log \theta_a}} \quad \mbox{ (by Lemma~\ref{lemma:arcsinlaw})}.
\end{eqnarray*} Therefore, $\lambda_{\ell}^{v_k}(\xv)$ converges to 0 with probability $1 - \frac{2}{\pi} \arcsin \sqrt{\frac{- \log \theta_a}{\log \theta_d - \log \theta_a}}$ and diverges to $\infty$ with probability $\frac{2}{\pi} \arcsin \sqrt{\frac{- \log \theta_a}{\log \theta_d - \log \theta_a}}$ as $k\rightarrow \infty$. (For instance, when $\theta_d = 1/\theta_a$, each of these probabilities equals $1/2$.) $\Box$
\noindent {\bf Proof of Theorem~\ref{thm:convinfprob}.} Let $J_r \equiv \mathbb{I} \{ \zeta_{\ell r}(\xv)< 0 \}$. Then $J_r$, $r=1,2,\ldots$, are independent Bernoulli random variables with success probability $\E[J_r]$. \begin{eqnarray*} \lefteqn{\lim_{r \rightarrow \infty} \E[J_r]}\\ & = &\lim_{\Delta n_r(\xv) \rightarrow \infty} \Pr\left\{ \frac{1}{\Delta n_r(\xv)} \sum_{i=1}^{\Delta n_r(\xv)} {H}_{\ell i}(\xv) < \ql \right\} \\ & = &\lim_{\Delta n_r(\xv) \rightarrow \infty} \Pr\left\{ \frac{\sum_{i=1}^{\Delta n_r(\xv)} ({H}_{\ell i}(\xv)-\E[{H}_{\ell i}(\xv)])}{\sigma_{\ell}(\xv) \sqrt{\Delta n_r(\xv)}} < \frac{\sqrt{\Delta n_r(\xv)} (\ql-\E[{H}_{\ell i}(\xv)])}{\sigma_{\ell}(\xv)} \right\}. \end{eqnarray*} By Lemma~\ref{lemma:arithmetic} and the CLT, we have \begin{equation} \label{eqn:convsum} \lim_{r \rightarrow \infty} \frac{\sum_{j=1}^{r}\E[J_j]}{r} = \lim_{r \rightarrow \infty} \E[J_r] = \left\{ \begin{array}{ll} \Phi(-\infty) = 0, & \mbox{ if } \ell \in \Lambda_{S(\xv)};\\ \Phi(\infty) = 1, & \mbox{ if } \ell \in \Lambda_{I(\xv)};\\ \Phi(0) = 0.5, & \mbox{ if } \ell \in \Lambda_{A(\xv)}, \end{array} \right. \end{equation} where $\Phi(\cdot)$ represents the cumulative distribution function of the standard normal random variable.
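As a numerical aside (separate from the proofs), the two-point limit of Theorem~\ref{thm:PScactived} can be checked by simulation: for an active constraint, the fraction of visits with $S_{\ell}^{r}(\xv) \ge 0$ behaves like the fraction of time a mean-zero random walk is nonnegative, which obeys the arcsine law. The sketch below is a simplified model rather than the implementation used in this thesis; the parameter values $\theta_a = 2$, $\theta_d = 0.5$ and the standard Gaussian walk are assumptions made for illustration only.

```python
import numpy as np

# Illustrative (assumed) PFM parameters: theta_a > 1 and 0 < theta_d < 1.
theta_a, theta_d = 2.0, 0.5

# Threshold from the proof of Theorem PScactived: the penalty tends to 0
# exactly when the long-run fraction T/v of visits declared feasible
# exceeds this constant.
c = -np.log(theta_a) / (np.log(theta_d) - np.log(theta_a))

# Model each visit's statistic S^r as a partial sum of i.i.d. mean-zero
# Gaussian terms (a simplification of the standardized statistic in the proof).
rng = np.random.default_rng(1)
reps, visits = 3000, 1000
walks = np.cumsum(rng.standard_normal((reps, visits)), axis=1)
frac_nonneg = (walks >= 0).mean(axis=1)   # T^{v}/v for each replication
estimate = (frac_nonneg > c).mean()       # Monte Carlo probability lambda -> 0

# Arcsine-law limit stated in the theorem.
exact = 1 - (2 / np.pi) * np.arcsin(np.sqrt(c))
```

For these parameter values the threshold $c$ equals $0.5$ and the limiting probability is exactly $1/2$; the Monte Carlo estimate should fall within sampling error of that value.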
For $\ell \in \Lambda_{A(\xv)}$, by Lemma~\ref{lemma:kolmogorovLLN} and (\ref{eqn:convsum}) we have \begin{eqnarray*} \Pr \left\{\lim_{v_k(\xv) \rightarrow \infty} \left| \hat{p}^{v_k}_{\ell}(\xv) - \frac{\sum_{r=1}^{v_k(\xv)}\E[J_r]}{v_k(\xv)} \right| + \left| \frac{\sum_{r=1}^{v_k(\xv)}\E[J_r]}{v_k(\xv)} - 0.5 \right| = 0 \right\} & = & 1, \end{eqnarray*} and the triangle inequality implies that \begin{eqnarray*} \Pr \left\{\lim_{v_k(\xv) \rightarrow \infty} |\hat{p}^{v_k}_{\ell}(\xv) - 0.5| = 0 \right\} & = & 1. \end{eqnarray*} Similar arguments apply to $\ell \in \Lambda_{I(\xv)} \cup \Lambda_{S(\xv)}$, which completes the proof. $\Box$
\noindent {\bf Proof of Theorem~\ref{thm:PSfaconvI}.} \underline{(i) When $\ell \in \Lambda_{S(\xv)}$}: By Theorem \ref{thm:PScstrict}, there exists $N_{\ell}(\xv) \in \mathbb{Z}^{+}$ such that $\xv$ is declared feasible for any $k \geq N_{\ell}(\xv)$. Thus, the penalty sequence receives $\delta_\ell^{v_k}(\xv)$, which is smaller than 1, and the sequence converges to $0$ almost surely. \underline{When $\ell \in \Lambda_{A(\xv)}$}: By Theorem~\ref{thm:convinfprob}, for any $k \geq N^{'}_\ell(\xv)$, \begin{equation}\label{eq:ineq0} 0.5-\epsilon_{\ell 0} \leq \hat{p}^{v_k}_{\ell}(\xv) \leq 0.5+\epsilon_{\ell 0}, \end{equation} for $0 < \epsilon_{\ell 0} < \gamma_\ell^k$. Thus, the penalty sequence receives either $\alpha^{v_k}_{\ell}(\xv)$ or $\delta_\ell^{v_k}(\xv)$ for any $k \geq N^{'}_\ell(\xv)$. As both $\alpha^{v_k}_{\ell}(\xv)$ and $\delta_\ell^{v_k}(\xv)$ are less than 1, the sequence converges to $0$ almost surely. \underline{(ii) When $\ell \in \Lambda_{I(\xv)}$}: By Theorem \ref{thm:PScstrict}, there exists $N_{\ell}(\xv)\in \mathbb{Z}^{+}$ such that the penalty sequence for constraint $\ell$ of an infeasible solution $\xv$ receives only the appreciation factor $\alpha_\ell^{v_k}(\xv)$ after $N_{\ell}(\xv)$.
Also, by Theorem~\ref{thm:convinfprob}, there exists $N^{'}_\ell(\xv)$ such that for any $k \ge N^{'}_\ell(\xv)$, \[ \hat{p}_\ell^{v_k}(\xv) > h_{u+1}. \] Then, for any $k \ge \max(N_{\ell}(\xv),N^{'}_\ell(\xv))$, $\xv$ is declared infeasible and the penalty sequence keeps receiving an appreciation factor greater than 1. This results in $\lambda^{v_k}_{\ell}(\xv) \overset{a.s.}{\longrightarrow} \infty$ for any $\ell \in \Lambda_{I(\xv)}$. $\Box$
\noindent {\bf Proof of Theorem~\ref{thm:PSfconvI}.} \noindent{\underline{(i) When $\ell \in \Lambda_{I(\xv)}$}:} By Theorem \ref{thm:PScstrict}, there exists $N_{\ell}(\xv) \in \mathbb{Z}^{+}$ such that the penalty sequence for the infeasible constraint $\ell$ of $\xv$ receives only $\alpha_\ell^{v_k}(\xv)$ after $N_{\ell}(\xv)$. Now, let $p_\ell^I$ represent the smallest infeasibility probability for constraint $\ell$ that exceeds $0.5$, and choose constants $0 < \epsilon_{\ell 0} < \frac{p^I_{\ell}-0.5}{4}$ and $\epsilon_{\ell 1} = {p_\ell^I - 0.5 \over 4}$. By Lemma \ref{lemma:arithmetic} and a similar argument to that used in the proof of Theorem \ref{thm:convinfprob}, there exists $N^{'}_{\ell}(\xv) \in \mathbb{Z}^{+}$ such that for any $k \ge N^{'}_{\ell}(\xv)$, \begin{equation} \label{eqn:equality1} p_\ell (\xv) - \epsilon_{\ell 0} \le \hat{p}_\ell^{v_k}(\xv) \le p_\ell (\xv) + \epsilon_{\ell 0}. \end{equation} Note that Assumption~\ref{assump:symmetric} implies $p_\ell(\xv) \ge p_\ell^I > 0.5$ for any infeasible solution $\xv$. Then the left-hand side of (\ref{eqn:equality1}) satisfies \[ p_\ell (\xv) - \epsilon_{\ell 0} \ge p_\ell^I - \epsilon_{\ell 0} > p_\ell^I - \left({ p_\ell^I - 0.5 \over 4}\right) = {3 \over 4} p_\ell^I + {0.5 \over 4}. \] Thus, we get \begin{equation} \label{eqn:equality2} {3 \over 4} p_\ell^I + {0.5 \over 4} < \hat{p}_\ell^{v_k}(\xv) \le p_\ell (\xv) + \epsilon_{\ell 0}.
\end{equation} Recall \[ \gamma^k_\ell= \left\{ \begin{array}{ll} \min\left(h_{u+1}-0.5, 0.5 -h_u, \min_{\xv \in \{\xv | \hat{p}^{v_{k}}_{\ell}(\xv) > 0.5+\epsilon_{\ell 0}, \ \xv \in \mathcal{V}_{k} \}} \frac{\hat{p}^{v_{k}}_{\ell}(\xv)-0.5}{2}\right), \\ \quad \quad \quad \quad \quad \;\;\; \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \mbox{ if } \{\xv | \hat{p}^{v_{k}}_{\ell}(\xv) > 0.5+\epsilon_{\ell 0}, \ \xv \in \mathcal{V}_{k} \} \ne \emptyset; & \vspace{12pt}\\ \min{(h_{u+1}-0.5, 0.5- h_u)}, \quad \mbox{ otherwise}. \end{array} \right. \] Consequently, when the set of infeasible solutions is not empty, as $k\rightarrow \infty$, \begin{eqnarray*} \gamma_\ell^k & \overset{a.s.}{\longrightarrow} & \min\left(h_{u+1}-0.5, 0.5 -h_u, \min_{\xv \in \{\xv | \hat{p}^{v_{k}}_{\ell}(\xv) > 0.5+\epsilon_{\ell 0}, \ \xv \in \Theta \}} \frac{\hat{p}^{v_{k}}_{\ell}(\xv)-0.5}{2}\right)\\ & & \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; \mbox{(by Assumption~\ref{assump:limit})}, \\ & \overset{a.s.}{\longrightarrow} & \min\left(h_{u+1}-0.5, 0.5 -h_u, {p_\ell^I - 0.5 \over 2}\right) \quad \ \mbox{ (by Lemma \ref{lemma:arithmetic} and CMT)}. \end{eqnarray*} Also, there exists $N^{''}_\ell(\xv) \in \mathbb{Z}^{+}$ such that for any $k \ge \max{(N^{'}_\ell(\xv), N^{''}_\ell(\xv))}$, \[ \gamma_\ell^k \le \min\left(h_{u+1}-0.5, 0.5 -h_u, {p_\ell^I - 0.5 \over 2} \right) + \epsilon_{\ell 1} \quad \mbox{and} \quad \gamma_\ell^k \le {p_\ell^I - 0.5 \over 2} + \epsilon_{\ell 1}. \] This implies that for any $k \ge \max{(N^{'}_\ell(\xv), N^{''}_\ell(\xv))}$ \begin{eqnarray} & 0.5 + \gamma_\ell^k \le 0.5 + {3(p_\ell^I - 0.5) \over 4} = {3 \over 4} p_\ell^I + {0.5 \over 4}. 
\label{eqn:equality3} \end{eqnarray} For $k \geq \max(N_{\ell}(\xv),N^{'}_{\ell}(\xv), N^{''}_\ell(\xv))$, $\xv$ is declared infeasible, the penalty sequence of $\xv$ keeps receiving an appreciation factor, and (\ref{eqn:equality2}) and (\ref{eqn:equality3}) ensure that the appreciation factor is greater than 1. Thus, the penalty sequence diverges to infinity almost surely. \noindent {\underline{(ii) When $\ell \in \Lambda_{A(\xv)}$}:} For any $\ell \in \Lambda_{A(\xv)}$, $p_\ell(\xv) = 0.5$ by Assumption~\ref{assump:symmetric}. We consider two cases: either some solution violates constraint $\ell$, so that $p_\ell^I$ exists, or every solution is feasible with respect to constraint $\ell$, so that $p_\ell^I$ does not exist. If $p^I_{\ell}$ exists, there exists $N^{'''}_{\ell}(\xv) \in \mathbb{Z}^{+}$ such that for $k \geq N^{'''}_\ell(\xv)$, \begin{equation}\label{eq:equality4} \frac{\min\left(h_{u+1}-0.5, 0.5 -h_u, {p_\ell^I - 0.5 \over 2} \right)}{2} \leq \gamma^{k}_{\ell}. \end{equation} Also, there exists $N^{''''}_{\ell}(\xv) \in \mathbb{Z}^{+}$ such that for any $k \geq N^{''''}_\ell(\xv)$, \begin{equation}\label{eq:ineq1} 0.5 - \frac{\epsilon_{\ell 2}}{2} \leq \hat{p}^{v_k}_{\ell}(\xv) \leq 0.5+\frac{\epsilon_{\ell 2}}{2}, \end{equation} where $\epsilon_{\ell 2} = \frac{\min\left(h_{u+1}-0.5, 0.5 -h_u, {p_\ell^I - 0.5 \over 2} \right)}{2}$. From (\ref{eq:equality4}), for $k \geq N^{'''}_{\ell}(\xv)$, \begin{eqnarray*} 0.5-\gamma^k_{\ell} &\leq& 0.5-\frac{\min\left(h_{u+1}-0.5, 0.5 -h_u, {p_\ell^I - 0.5 \over 2} \right)}{2},\; \mbox{ and}\\ 0.5+\gamma^k_{\ell} &\geq& 0.5+\frac{\min\left(h_{u+1}-0.5, 0.5 -h_u, {p_\ell^I - 0.5 \over 2} \right)}{2}.
\end{eqnarray*} Then from (\ref{eq:equality4}) and (\ref{eq:ineq1}), for $k \geq \max(N^{'''}_{\ell}(\xv), N^{''''}_\ell(\xv))$, \begin{equation*} 0.5-\gamma^k_{\ell} \leq 0.5 - \frac{\epsilon_{\ell 2}}{2} \leq \hat{p}^{v_k}_{\ell}(\xv) \leq 0.5+\frac{\epsilon_{\ell 2}}{2} \leq 0.5+\gamma^k_{\ell}, \end{equation*} and $\alpha^{v_k}_{\ell}(\xv)=w_a<1$, which proves $\lambda^{v_k}_{\ell}(\xv) \overset{a.s.}{\longrightarrow} 0$ as $k \rightarrow \infty$ for any $\ell \in \Lambda_{A(\xv)}$. If $p^I_{\ell}$ does not exist, $\min\left(h_{u+1}-0.5, 0.5 -h_u, {p_\ell^I - 0.5 \over 2} \right)$ needs to be replaced with $\min(0.5 - h_u, h_{u+1}-0.5)$ and the results follow by a similar argument. $\Box$
\chapter*{APPENDIX B} \markboth{\MakeUppercase{APPENDIX B}}{} \addcontentsline{toc}{chapter}{APPENDIX B}
%\section{Table of Notation for Chapter 3}
\begin{table}[h]\small {\begin{tabular}{@{}ccl}\hline & $N$ & the number of nodes in a river system\\ & $I$ & the index set, $I = \{1,2,\ldots, N\}$\\ & $M$ & the number of monitoring devices ($M < N$)\\ & $\xv$ & a solution vector representing locations of \\ & &$M$ monitoring devices\\ & $t_d(x_u)$ & spill detection time at the monitoring location $x_u$\\ PROBLEM & $S^{S}$ & spill starting time\\ FORMULATION & $d(x_u)$ & if detected at $x_u$, $d(x_u)=t_d(x_u) - S^{S}$; otherwise, $d(x_u)=\infty$\\ & $t(\xv)$ & the minimum elapsed detection time for $\xv$\\ & & (i.e., $t(\xv)=\min_{1\leq u \leq M} d(x_u)$)\\ & $R(\xv)$ & if a spill is detected, $R(\xv)=1$; otherwise, $R(\xv)=0$\\ &$t_i(\xv)$& an observed value of $t(\xv)$ from the $i$th simulation run\\ &$R_i(\xv)$& an observed value of $R(\xv)$ from the $i$th simulation run\\ \hline & $C_{th}$& detection threshold of a monitoring device\\ & $L_i^S$& spill location for the $i$th simulation run\\ PROCESS & $I_i^S$& spill intensity for the $i$th simulation run\\ SIMULATION & $S_i^S$& spill starting time for the $i$th simulation run\\ &$P^{im}_R$& a rain pattern for the $m$th
sub-catchment in the \\ & &$i$th simulation run\\ &$P^{i}_R$& $(P^{i1}_R, \ldots, P^{iM^{S}}_R)$ where $M^{S}$ is the number of sub-catchments\\ \hline &$v_k(\xv)$& the number of visits to $\xv$ up to iteration $k$\\ DOvS &$n_{v_k}(\xv)$& the total number of observations obtained up to \\ & &iteration $k$ for $\xv$\\ $\&$ PFM &$\Tkbar(\xv)$& conditional sample mean of $t_i(\xv)$ up to iteration $k$\\ & & given that a spill is detected\\ &$\Rkbar(\xv)$& cumulative sample mean of $R_i(\xv)$ up to iteration $k$\\ \hline &${\cal{R}}_k$& the most promising region at search iteration $k$\\ &${\cal{R}}_k (\ell)$& the $\ell$th subregion at search iteration $k$\\ &$\Theta \setminus {\cal{R}}_k$& the surrounding region at search iteration $k$\\ &$\Sk$& the set of solutions sampled at iteration $k$\\ NP+PFM &$\Vs_{k}$& the set of all solutions visited up to iteration $k$\\ &$\omega$& the number of subregions\\ &$\tau_k$& the total number of sampled solutions at iteration $k$\\ &$\tau_k (\ell)$& the number of sampled solutions at iteration $k$ \\ & &from subregion $\ell$\\ &$\hat{\xv}_k^*$& the current best solution obtained by NP+PFM \\ & &(i.e., a solution with the smallest $Z_k(\xv)$ at iteration $k$)\\ \hline \end{tabular}} \end{table}
\newpage \markboth{\MakeUppercase{REFERENCES}}{} \addcontentsline{toc}{chapter}{REFERENCES} \bibliographystyle{plain} \bibliography{parkbib}
%\begin{postliminary}
%\references
%\postfacesection{Index}{%
%% ... generate an index here
%% look into gatech-thesis-index.sty
%}
%\begin{vita}
%Chuljin Park was born in ..
%\end{vita}
%\end{postliminary}
\end{document}