dc.contributor.author: Lo, Shin-Lian [en_US]
dc.date.accessioned: 2011-03-04T20:21:52Z
dc.date.available: 2011-03-04T20:21:52Z
dc.date.issued: 2010-08-27 [en_US]
dc.identifier.uri: http://hdl.handle.net/1853/37193
dc.description.abstract: This thesis consists of two parts. The first part focuses on high-dimensional classification problems in microarray experiments. The second part deals with forecasting problems in which the predictors have a large number of categories. Classification problems in microarray experiments involve discriminating among subjects with different biological phenotypes or known tumor subtypes, as well as predicting the clinical outcomes or prognostic stages of subjects. One important characteristic of microarray data is that the number of genes is much larger than the sample size. The penalized logistic regression method is known for performing variable selection and classification simultaneously. However, its performance declines as the number of variables increases. To address this concern, in the first study we propose a new classification approach that applies penalized logistic regression iteratively to gene subsets of controlled size in order to maintain variable selection consistency and classification accuracy. The second study is motivated by a modern microarray experiment that includes two layers of replicates. This new experimental setting makes most existing classification methods, including penalized logistic regression, inappropriate to apply directly because the assumption of independent observations is violated. To solve this problem, we propose a new classification method that incorporates random effects into penalized logistic regression so that the heterogeneity among experimental subjects and the correlations arising from repeated measurements can be taken into account. An efficient hybrid algorithm is introduced to tackle the computational challenges in estimation and integration. Applications to a breast cancer study show that the proposed classification method obtains smaller models with higher prediction accuracy than the method based on the assumption of independent observations.
The second part of this thesis develops a new forecasting approach for large-scale datasets that have a large number of predictor categories and structured predictors. The new approach goes beyond conventional tree-based methods by incorporating a general linear model and hierarchical splits to make trees more comprehensive, efficient, and interpretable. In an empirical study in the air cargo industry and a simulation study covering several different settings, the new approach achieves higher forecasting accuracy and higher computational efficiency than existing tree-based methods. [en_US]
dc.publisher: Georgia Institute of Technology [en_US]
dc.subject: Classification [en_US]
dc.subject: Microarray experiments [en_US]
dc.subject: Tree-based methods [en_US]
dc.subject: Variable selection [en_US]
dc.subject: Penalized logistic regression [en_US]
dc.subject: Forecasting [en_US]
dc.subject.lcsh: Computational biology
dc.subject.lcsh: Bioinformatics
dc.subject.lcsh: Pattern recognition systems
dc.subject.lcsh: DNA microarrays
dc.subject.lcsh: Classification
dc.subject.lcsh: Logistic regression analysis
dc.title: High-dimensional classification and attribute-based forecasting [en_US]
dc.type: Dissertation [en_US]
dc.description.degree: Ph.D. [en_US]
dc.contributor.department: Industrial and Systems Engineering [en_US]
dc.description.advisor: Committee Chair: Tsui, Kwok-Leung; Committee Co-Chair: Hung, Ying; Committee Member: Abayomi, Kobi A.; Committee Member: Goldsman, David M.; Committee Member: Yuan, Ming [en_US]

