Show simple item record

dc.contributor.author: Park, Soyoun (en_US)
dc.date.accessioned: 2011-03-04T21:18:37Z
dc.date.available: 2011-03-04T21:18:37Z
dc.date.issued: 2010-09-14 (en_US)
dc.identifier.uri: http://hdl.handle.net/1853/37307
dc.description.abstract: When there are a large number of predictors and few observations, building a regression model to explain the behavior of a response variable, such as a patient's medical condition, is very challenging. This is a "p ≫ n" variable selection problem encountered often in modern applied statistics and data mining. Chapter one of this thesis proposes a rigorous procedure which groups predictors into clusters of "highly-correlated" variables, selects a representative from each cluster, and uses a subset of the representatives for regression modeling. The proposed Penalized method based on Representatives (PR) extends the Lasso to p ≫ n data with highly correlated variables, building a sparse, practically interpretable model while maintaining prediction quality. Moreover, we provide the PR-Sequential Grouped Regression (PR-SGR) to make computation of the PR procedure efficient. Simulation studies show the proposed method outperforms existing methods such as the Lasso/Lars. A real-life example from mental health diagnosis illustrates the applicability of the PR-SGR. In the second part of the thesis, we study time-to-event data, called gap data, in which missing time intervals (gaps) may occur prior to the first observed event time. If a gap occurs prior to the first observed event, then the first observed event may or may not be the first true event. This incomplete knowledge makes gap data different from the well-studied regular interval-censored data. We propose a Non-Parametric Estimate for the Gap data (NPEG) to estimate the survival function for the first true event time, derive its analytic properties, and demonstrate its performance in simulations. We also extend the Imputed Empirical Estimating method (IEE), an existing nonparametric method for gap data with at most one gap, to handle gap data with multiple gaps. (en_US)
dc.publisher: Georgia Institute of Technology (en_US)
dc.subject: Survival function (en_US)
dc.subject: Gap data (en_US)
dc.subject: Representatives (en_US)
dc.subject: Penalized method (en_US)
dc.subject.lcsh: Regression analysis
dc.subject.lcsh: Correlation (Statistics)
dc.subject.lcsh: Simulation methods
dc.title: Penalized method based on representatives and nonparametric analysis of gap data (en_US)
dc.type: Dissertation (en_US)
dc.description.degree: Ph.D. (en_US)
dc.contributor.department: Industrial and Systems Engineering (en_US)
dc.description.advisor: Committee Chair: Lu, Jye-Chyi; Committee Member: Grover, Martha; Committee Member: Huo, Xiaoming; Committee Member: Mei, Yajun; Committee Member: Serban, Nicoleta (en_US)

