Modeling high dimensional multi-stream data for monitoring and prediction
This dissertation addresses problems in monitoring and predicting high-dimensional streaming data using new data mining methods, focusing on three topics. For the first topic, we propose a new monitoring and diagnosis approach, based on Principal Component Analysis (PCA), for high-dimensional, multi-stream data. PCA-based methods are commonly used for monitoring in high dimensions, and most existing ones focus on the principal components (PCs) with the highest variance. We argue that this is an inappropriate approach for the purpose of monitoring and show that adaptively chosen PCs are significantly better for process monitoring. Consequently, we develop a novel monitoring method based on this principle, named Adaptive PC Selection (APC). More importantly, we integrate a novel diagnostic approach to enable streamlined statistical process control (SPC). The PC-based Signal Recovery (PCSR) diagnostic approach draws inspiration from Compressed Sensing and uses Adaptive Lasso to identify the sparse change in the process. We theoretically motivate our approach and evaluate the performance of our integrated monitoring and diagnostics method through simulation and case studies.
For the second topic, we propose a novel methodology for dynamically monitoring sparse networks, focusing on modeling the network connections among financial institutions. The interconnectedness of financial institutions can function as a mechanism for the propagation and amplification of shocks throughout the economy, thus contributing to financial crises. As such, network analysis has become a critical tool for assessing interconnectedness and systemic risk levels. Hence, we create a monitoring system to detect changes within a sequence of sparse networks constructed from an interbank lending market in the European Union.
The approach combines a state space model with the Hurdle model to capture the temporal dynamics of the edge formation process, which is modeled as a function of node and edge attributes and estimated using an extended Kalman Filter. Exponentially Weighted Moving Average (EWMA) control charts are then used to monitor the network sequence in real time, in order to distinguish the gradual change resulting from typical edge dynamics from abrupt changes in trading patterns caused by fundamental changes in market conditions. We find that the proposed methodology would have raised alarms for regulators prior to several key events and announcements by the European Central Bank during the 2007-2009 financial crisis, demonstrating the promise of the approach as an early warning system.
For the last topic, we propose a novel deep learning approach for the classification of multimedia data. The method is an extension of the Classification Restricted Boltzmann Machine (ClassRBM). Restricted Boltzmann Machines (RBMs) and Deep Boltzmann Machines (DBMs) have been successfully applied to unsupervised learning. A discriminative RBM for supervised learning, known as the Classification RBM (ClassRBM), was proposed in 2008. Due to estimation intractability, an effective deep extension of ClassRBM has not been used in the literature. In this chapter, we introduce an estimation approach for learning the weights in deep ClassRBM (ClassDBM) based on mean field inference and Gibbs sampling. In addition, for predicting the class label of new observations, we introduce a new prediction algorithm based on mean field inference. Lastly, we validate our proposed method on two benchmark data sets and on advertisement multimedia data.
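As an illustration of the EWMA monitoring step mentioned for the second topic, the following is a minimal sketch of a textbook univariate EWMA control chart. It is not the dissertation's implementation: the function name `ewma_chart`, the parameters `lam` and `L`, and the choice of a single scalar statistic with known in-control mean `mu0` and standard deviation `sigma0` are all assumptions made for illustration. In the network setting, the monitored series would presumably be derived from the fitted state space model, which is not reproduced here.

```python
import math


def ewma_chart(x, mu0, sigma0, lam=0.2, L=3.0):
    """Textbook EWMA chart (hypothetical helper, not from the dissertation).

    x      : sequence of observations of a scalar monitoring statistic
    mu0    : assumed in-control mean
    sigma0 : assumed in-control standard deviation
    lam    : smoothing weight in (0, 1]
    L      : control limit width in multiples of the EWMA standard deviation
    Returns (z, limits, alarms), where alarms holds the 0-based indices
    at which the EWMA statistic falls outside mu0 +/- limit.
    """
    z_prev = mu0  # the EWMA statistic starts at the in-control mean
    z, limits, alarms = [], [], []
    for t, x_t in enumerate(x, start=1):
        # Recursive update: recent observations get geometrically more weight.
        z_t = lam * x_t + (1.0 - lam) * z_prev
        # Exact time-varying standard deviation of z_t under independence.
        var_factor = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        limit = L * sigma0 * math.sqrt(var_factor)
        if abs(z_t - mu0) > limit:
            alarms.append(t - 1)
        z.append(z_t)
        limits.append(limit)
        z_prev = z_t
    return z, limits, alarms


# Usage: ten in-control points followed by an abrupt shift of three sigma.
series = [0.0] * 10 + [3.0] * 10
z, limits, alarms = ewma_chart(series, mu0=0.0, sigma0=1.0)
print("first alarm at index:", alarms[0] if alarms else None)
```

A smaller `lam` makes the chart more sensitive to small, persistent shifts at the cost of a slower reaction to large ones, while `L` trades false alarms against detection delay. Both values here are common defaults rather than values taken from the dissertation.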