
    Robust sparse learning and monitoring of high-dimensional data

    File
    ZHANG-DISSERTATION-2019.pdf (1.171Mb)
    Date
    2019-05-16
    Author
    Zhang, Ruizhi
    Abstract
    With the rapid development of advanced sensing technology, rich and complex real-time high-dimensional streaming data are available in many systems, such as manufacturing, wireless communication, biosurveillance, and social systems. As information is accumulated over time at a fast rate by multiple sensors, it is highly desirable to develop efficient methodologies that (1) extract informative features, (2) learn the process status and detect possible changes or faults quickly, (3) can be implemented and computed online quickly, and (4) are robust to outliers or model misspecification. Efficient, robust, and scalable schemes and algorithms that enable real-time monitoring of high-dimensional data streams are therefore in high demand. This thesis focuses on statistical modeling to extract informative and robust features, to interpret the characteristics of the system, and to develop efficient and robust monitoring schemes that can be implemented recursively and in parallel to reduce unnecessary transition costs in data fusion systems. The methodologies developed in the thesis are generic and can be applied to a variety of fields, ranging from manufacturing processes (e.g., forging, stamping, and semiconductor processes), where functional profile data are observed sequentially, to video monitoring (e.g., solar flare detection), where image data are collected for sequential decision making.

    This thesis starts with theoretical research on change-point detection and robust M-estimation. In Chapter 1, we propose a scalable robust monitoring scheme that can detect a small but systematic change in the system efficiently and in real time in the presence of random transient outliers. We construct a new robust local detection statistic, called the $L_{\alpha}$-CUSUM statistic, that reduces the effect of outliers by using the Box-Cox transformation of the likelihood function. Moreover, we propose a new concept, called the false-alarm breakdown point, to measure the robustness of online monitoring schemes, and we characterize the breakdown point of our proposed schemes. In Chapter 2, we develop several families of communication-efficient schemes for monitoring large-scale data streams. We apply shrinkage transformations such as soft-thresholding, hard-thresholding, and order-thresholding to the local monitoring statistics so as to filter out unaffected data streams and save communication costs in the data fusion networks. Moreover, we conduct a detection delay analysis of our proposed schemes in both the classical low-dimensional regime and the modern high-dimensional regime and show that, under certain conditions, our schemes are asymptotically optimal while receiving only a small proportion of the data, which can reduce the transition costs. In Chapter 3, we investigate two important properties of M-estimators, namely robustness and tractability, in the linear regression setting when the observations are contaminated by arbitrary outliers. By analyzing the landscape of the empirical risk, we show that under mild conditions, when the percentage of outliers is small, many M-estimators enjoy good robustness, meaning that the estimator is close to the true underlying parameter, and tractability, meaning that the estimator can be computed efficiently, even if the loss function is non-convex.

    Then, in Chapter 4, we turn to applied research on nonlinear profile monitoring based on the discrete wavelet transform. We propose a recursive CUSUM procedure that can learn the out-of-control parameters adaptively and detect unknown changes efficiently. In Chapter 5, we develop a functional Poisson regression model for papers’ cumulative citation data. Based on our model, we can fit and learn an individual paper’s citation characteristics well. Our proposed model is also used for clustering different citation patterns, which can provide implications for bibliometric studies and research evaluations. Finally, we summarize our original contributions and future research plans in Chapter 6.
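
    The idea of shrinking local monitoring statistics before combining them, as described for Chapter 2, can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes independent Gaussian streams with a known mean shift from mu0 to mu1, uses the classical CUSUM recursion (not the $L_{\alpha}$-CUSUM statistic) as the local statistic, and sums the soft-thresholded values as the global statistic. The function names and the parameters b and threshold are hypothetical, chosen only for illustration.

```python
def cusum_update(w_prev, x, mu0=0.0, mu1=1.0, sigma=1.0):
    """One step of the classical CUSUM recursion for a Gaussian mean shift mu0 -> mu1.

    The increment is the log-likelihood ratio of N(mu1, sigma^2) against
    N(mu0, sigma^2) evaluated at x; the statistic is reset at zero from below.
    """
    llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
    return max(w_prev + llr, 0.0)


def soft_threshold(w, b):
    """Shrink a local statistic by b; streams with w <= b contribute nothing."""
    return max(w - b, 0.0)


def monitor(observations, threshold, b=1.0, mu1=1.0):
    """Return the first time (1-based) the global statistic reaches `threshold`.

    `observations` is an iterable of equal-length lists, one value per stream
    at each time step. Returns None if no alarm is raised.
    """
    w = None
    for n, obs in enumerate(observations, start=1):
        if w is None:
            w = [0.0] * len(obs)  # all local statistics start at zero
        # Update every stream's local CUSUM statistic independently.
        w = [cusum_update(wk, xk, mu1=mu1) for wk, xk in zip(w, obs)]
        # Only streams whose statistic exceeds b would need to be transmitted.
        global_stat = sum(soft_threshold(wk, b) for wk in w)
        if global_stat >= threshold:
            return n
    return None


# Three streams: no change for 10 steps, then stream 0 shifts to the value 2.0.
data = [[0.0, 0.0, 0.0] for _ in range(10)] + [[2.0, 0.0, 0.0] for _ in range(10)]
alarm_time = monitor(data, threshold=5.0, b=1.0)
```

    In this toy setting, only the stream affected by the change ever exceeds the shrinkage level b, so the other streams contribute nothing to the global statistic, which is the sense in which thresholding filters out unaffected data streams.
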
    URI
    http://hdl.handle.net/1853/61721
    Collections
    • Georgia Tech Theses and Dissertations [23877]
    • School of Industrial and Systems Engineering Theses and Dissertations [1457]

    © 2020 Georgia Institute of Technology