Efficient and distributed computational methods for complex systems
Many statistical inference problems for large-scale complex systems involve analytical tools such as statistics, mathematical optimization, and machine learning algorithms. We demonstrate the importance of innovation in computational statistics through rigorous analysis of large medical datasets. Every patient in any medical setting generates an invaluable data point that can contribute to understanding what works, for whom, and where. Developing analytical methods that translate these data into meaningful knowledge is crucial for understanding behavior patterns in seeking care and adherence to recommended care guidelines, and for deriving knowledge for decision support. There are several challenges in modeling complex systems such as the health care system. First, while the different components of the system are mostly heterogeneous, there is considerable homogeneity in their characteristics that can be explored. Second, interactions between different components interdependently give rise to collaborative patterns in the system. Third, the data are contaminated by unquantifiable random noise or errors. Fourth, the datasets are often extremely large in scale and dimensionality, since advanced technology allows us to collect and store detailed information about every sample. Therefore, it has become centrally important to develop scalable computational algorithms that can describe, profile, and model these systems and help make robust decisions. One remedy for the computational challenge is to utilize the power of distributed computing and distributed data storage. In this dissertation, we propose several computationally efficient methods that model complex health systems in different settings. In Chapter 2, we introduce a framework for analyzing and visualizing healthcare utilization for millions of children, with a focus on pediatric asthma.
Using individual-level claims data from the Medicaid system across 10 southeastern states, we model the heterogeneity in patients’ multi-year longitudinal utilization patterns via mixture Markov renewal processes. In Chapter 3, we introduce a regularized optimization approach to control the trade-off between optimality and sensitivity of the solution for large-scale optimization problems that have intrinsic spatial structure among decision variables. We illustrate the proposed approach using a specific application in health care access measurement, in which a smooth solution that is robust to perturbations of the model parameters leads to reliable decision-making. In Chapter 4, we propose a novel method to find a partition of decision variables for decomposing large-scale optimization problems, focusing on minimizing the number of dualized constraints. We present an improved variation of the distributed subgradient method using block dual decomposition. In Chapter 5, we develop a computationally tractable algorithm for clustering spatially dependent data using the EM algorithm, and we cluster the prevalence of chronic conditions among children with Medicaid across the entire United States at the community level. The implementation of the spatial clustering approach relies on distributed computing to meet the computational demands of the analysis.