• Login
    View Item 
    •   SMARTech Home
    • Georgia Tech Theses and Dissertations
    • Georgia Tech Theses and Dissertations
    • View Item
    •   SMARTech Home
    • Georgia Tech Theses and Dissertations
    • Georgia Tech Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Penalized method based on representatives and nonparametric analysis of gap data

    Thumbnail
    View/Open
    park_soyoun_201012_phd.pdf (2.945Mb)
    Date
    2010-09-14
    Author
    Park, Soyoun
    Metadata
    Show full item record
    Abstract
    When there are a large number of predictors and few observations, building a regression model to explain the behavior of a response variable such as a patient's medical condition is very challenging. This is a "p ≫n " variable selection problem encountered often in modern applied statistics and data mining. Chapter one of this thesis proposes a rigorous procedure which groups predictors into clusters of "highly-correlated" variables, selects a representative from each cluster, and uses a subset of the representatives for regression modeling. The proposed Penalized method based on Representatives (PR) extends the Lasso for the p ≫ n data and highly correlated variables, to build a sparse model practically interpretable and maintain prediction quality. Moreover, we provide the PR-Sequential Grouped Regression (PR-SGR) to make computation of the PR procedure efficient. Simulation studies show the proposed method outperforms existing methods such as the Lasso/Lars. A real-life example from a mental health diagnosis illustrates the applicability of the PR-SGR. In the second part of the thesis, we study the analysis of time-to-event data called a gap data when missing time intervals (gaps) possibly happen prior to the first observed event time. If a gap occurs prior to the first observed event, then the first observed event may or may not be the first true event. This incomplete knowledge makes the gap data different from the well-studied regular interval censored data. We propose a Non-Parametric Estimate for the Gap data (NPEG) to estimate the survival function for the first true event time, derive its analytic properties and demonstrate its performance in simulations. We also extend the Imputed Empirical Estimating method (IEE), which is an existing nonparametric method for the gap data up to one gap, to handle the gap data with multiple gaps.
    URI
    http://hdl.handle.net/1853/37307
    Collections
    • Georgia Tech Theses and Dissertations [23877]
    • School of Industrial and Systems Engineering Theses and Dissertations [1457]

    Browse

    All of SMARTechCommunities & CollectionsDatesAuthorsTitlesSubjectsTypesThis CollectionDatesAuthorsTitlesSubjectsTypes

    My SMARTech

    Login

    Statistics

    View Usage StatisticsView Google Analytics Statistics
    facebook instagram twitter youtube
    • My Account
    • Contact us
    • Directory
    • Campus Map
    • Support/Give
    • Library Accessibility
      • About SMARTech
      • SMARTech Terms of Use
    Georgia Tech Library266 4th Street NW, Atlanta, GA 30332
    404.894.4500
    • Emergency Information
    • Legal and Privacy Information
    • Human Trafficking Notice
    • Accessibility
    • Accountability
    • Accreditation
    • Employment
    © 2020 Georgia Institute of Technology