H. Milton Stewart School of Industrial and Systems Engineering (ISyE)
http://hdl.handle.net/1853/6047
Industrial engineering (IE) and operations research (OR) are fields of study intended for individuals who are interested in analyzing complex systems, formulating abstract models of these systems, and solving them with the intention of improving system performance.
2019-06-18T00:59:27Z
http://hdl.handle.net/1853/61252
Simulation optimization under input uncertainty: formulations, algorithms, and insights
Wu, Di
Simulation optimization is concerned with identifying the best design for large, complex and stochastic physical systems via computer simulation models. In building a stochastic simulation model, one often needs to specify a set of distributions, known as "input distributions". However, since these distributions are usually estimated using finite real-world data, the simulation output is subject to the so-called "input uncertainty". Existing studies indicate that ignoring input uncertainty can cause a higher risk of selecting an inferior design in simulation optimization. This thesis is therefore devoted to addressing input uncertainty in the context of simulation optimization by proposing new formulations, devising new algorithms, and developing insights for improving existing algorithms.
In Chapter 2, we study a simulation optimization problem with a general design space. The scenario of interest is when a fixed set of input data is given and no additional data is available. To hedge against the risk of input uncertainty, we propose a Bayesian Risk Optimization (BRO) framework that (i) models input uncertainty using posterior distributions and (ii) incorporates a decision maker's risk preference via risk measures such as value-at-risk (VaR) and conditional value-at-risk (CVaR). We establish the asymptotic properties of BRO and reveal that BRO essentially seeks a balance between predicted average performance and the uncertainty in a design's actual performance.
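The BRO idea described above can be illustrated with a small sketch. Everything specific here is an assumption for illustration and is not taken from the thesis: a toy newsvendor cost, exponential demand with a Gamma(1, 1) prior on the rate, and the hypothetical helper names `var_cvar` and `bro_select`. The sketch draws posterior samples of the input parameter, estimates each design's expected cost under every sample, and picks the design with the smallest CVaR of that expected cost.

```python
import numpy as np

def var_cvar(losses, alpha):
    """Empirical VaR and CVaR at level alpha for losses (larger is worse)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)
    # CVaR is estimated as the mean of the losses at or beyond the VaR.
    cvar = losses[losses >= var].mean()
    return var, cvar

def bro_select(designs, data, alpha=0.9, n_post=200, n_sim=500, seed=0):
    """Pick the design minimizing the CVaR (over the posterior) of expected cost."""
    rng = np.random.default_rng(seed)
    # Assumed model: exponential demand, Gamma(1, 1) prior on its rate,
    # which gives a Gamma(1 + n, 1 + sum(data)) posterior (shape, rate).
    shape = 1.0 + len(data)
    rate = 1.0 + float(np.sum(data))
    thetas = rng.gamma(shape, 1.0 / rate, size=n_post)  # NumPy takes a scale
    # Common random numbers: unit exponentials rescaled by each sampled rate.
    unit = rng.exponential(1.0, size=n_sim)
    demand = unit[None, :] / thetas[:, None]  # shape (n_post, n_sim)
    risk = {}
    for x in designs:
        # Toy newsvendor cost: unit purchase cost 1, unit selling price 3.
        cost = 1.0 * x - 3.0 * np.minimum(x, demand)
        expected_cost = cost.mean(axis=1)  # one estimate per posterior sample
        risk[x] = var_cvar(expected_cost, alpha)[1]
    best = min(risk, key=risk.get)
    return best, risk

best, risk = bro_select([0.5, 1.0, 1.5, 2.0], data=[0.8, 1.2, 1.0, 0.5, 1.5])
print(best, risk)
```

Replacing the CVaR with the posterior mean of `expected_cost` gives a risk-neutral choice, which is one way to see the trade-off between average performance and uncertainty that the abstract mentions.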
Chapter 3 considers optimizing over a finite design space, a problem known as Ranking and Selection (R&S) in the statistics literature, or Best-Arm Identification in the multi-armed bandit literature. We look closely into classical fixed-budget R&S without input uncertainty, which serves as a fundamental building block for the study of R&S under input uncertainty in Chapter 4. Specifically, we investigate the performance of a widely used algorithm called Optimal Computing Budget Allocation (OCBA). Our analysis leads to a surprising insight: the popular implementation of OCBA suffers from a slow convergence rate. We then propose a modification to boost its performance, with the improvement demonstrated by a theoretical bound and numerical results. In addition, we explicitly characterize the convergence rate of several simplified algorithms, showcasing some interesting findings as well as useful techniques for convergence analysis.
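For readers unfamiliar with OCBA, the following is a minimal sketch of the classical OCBA allocation ratios as they are commonly stated in the R&S literature. It is not the modified algorithm proposed in the thesis. The function name `ocba_fractions` is hypothetical, and the sketch assumes that a larger mean is better, that the best design is unique, and that all standard deviations are positive.

```python
import numpy as np

def ocba_fractions(means, stds):
    """Classical OCBA budget fractions from sample means and standard deviations."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    b = int(np.argmax(means))              # current sample-best design
    others = np.arange(len(means)) != b
    delta = means[b] - means[others]       # gaps to the best (assumed > 0)
    ratio = np.zeros_like(means)
    # Non-best designs: allocation proportional to (sigma_i / delta_i)^2.
    ratio[others] = (stds[others] / delta) ** 2
    # Best design: N_b = sigma_b * sqrt(sum_i N_i^2 / sigma_i^2) over non-best i.
    ratio[b] = stds[b] * np.sqrt(np.sum((ratio[others] / stds[others]) ** 2))
    return ratio / ratio.sum()

fractions = ocba_fractions(means=[1.0, 2.0, 3.0], stds=[1.0, 1.0, 1.0])
print(fractions)
```

In a sequential implementation these fractions would be recomputed from updated sample statistics after each batch of simulation runs. Designs whose means are close to the best receive a larger share of the budget.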
In Chapter 4, we study R&S under input uncertainty where additional data can be acquired to reduce input uncertainty. To the best of our knowledge, this setting has rarely been studied in the existing literature. Two classical formulations of R&S, i.e., fixed confidence and fixed budget, are extended to our new settings. New algorithms are developed to (i) achieve a statistical selection guarantee when new data arrive sequentially and (ii) efficiently allocate a finite budget between data collection and simulation experimentation. Theoretical guarantees are provided for our algorithms, and numerical results demonstrate their effectiveness.
2019-04-01T00:00:00Z
http://hdl.handle.net/1853/61247
SPATIO-TEMPORAL CHANGE-POINT DETECTION AND CONSTRAINED BAYESIAN OPTIMIZATION
Chen, Junzhuo
This thesis makes contributions to two research topics: spatio-temporal change-point detection and constrained Bayesian optimization. Spatio-temporal change-point detection is concerned with detecting statistical anomalies based on multiple data streams collected at different locations. In Chapters 2 and 3, we address two challenges in spatio-temporal change-point detection: (i) how to deal with high-dimensional data, and (ii) how to capture spatial and temporal correlations. Bayesian optimization is a prevalent approach for optimization problems defined by expensive-to-evaluate black-box functions. In Chapter 4, we develop a practical algorithm for optimization problems with black-box objective functions and constraints.
2019-04-02T00:00:00Z
http://hdl.handle.net/1853/61244
Assessing Self-Similarity in Redundant Complex and Quaternion Wavelet Domains: Theory and Applications
Kong, Tae Woon
Theoretical self-similar processes have been an essential tool for modeling a wide range of real-world signals and images that describe phenomena in engineering, physics, medicine, biology, economics, geology, chemistry, and other fields. However, it is often difficult for general modeling methods to quantify self-similarity because of irregularities in the signals or images. Wavelet-based spectral tools have become standard solutions for such problems in signal and image processing and have achieved outstanding performance in real applications.
This thesis proposes three novel wavelet-based spectral tools to improve the assessment of self-similarity.
First, we propose spectral tools based on non-decimated complex wavelet transforms implemented by their matrix formulation. A structural redundancy in non-decimated wavelets and a componential redundancy in complex wavelets act in synergy when extracting wavelet-based informative descriptors.
Next, we step into the quaternion domain, propose a matrix formulation for non-decimated quaternion wavelet transforms, and define spectral tools for use in machine learning tasks. We define non-decimated quaternion wavelet spectra based on the modulus and three phase-dependent statistics as low-dimensional summaries for 1-D signals or 2-D images.
Finally, we suggest dual wavelet spectra based on non-decimated wavelet transforms in the real, complex, and quaternion domains. These spectra are derived from a new perspective that links the energies of the signal with the temporal or spatial scales in the multiscale representations.
2019-03-25T00:00:00Z
http://hdl.handle.net/1853/61225
STATISTICAL MODELING AND EXPERIMENTAL DESIGN WITH CONTRIBUTIONS IN ENVIRONMENT, HEALTH CARE, AND E-COMMERCE
Zhao, Yuanshuo
Design of experiments and statistical modeling have played an increasingly important role in science and business and have received enormous attention from industry and research institutes. Motivated by real-world examples, this dissertation develops new statistical methodologies in the fields of experimental design and causal inference. The first two chapters focus on online experimental design. E-commerce companies like LinkedIn and Amazon perform hundreds of experiments each day, with the goal of testing website functions and designs in order to best serve customers and maximize profits. New experimental design and testing schemes based on multi-armed bandits and conditional main effects are developed to let companies run experiments more efficiently. In Chapter 3, we develop a new statistical model that combines information from physical experiments and computer experiments. The new method is applied to model solar irradiance data in the U.S. provided by IBM. Chapter 4 extends the linear G-formula method in causal inference to a non-linear setting to study the causal relationship between physical activity level and health outcomes.
2019-03-11T00:00:00Z