Any-way and Sparse Analyses for Multimodal Fusion and Imaging Genomics
This dissertation develops new algorithms, built upon the independent component analysis (ICA) framework, that leverage sparsity and mutual information across data modalities to improve the performance of current ICA-based multimodal fusion approaches. These algorithms are applied to both simulated data and real neuroimaging and genomic data to examine their performance. The identified neuroimaging and genomic patterns can help better delineate the pathology of mental disorders or brain development. To alleviate the signal-background separation difficulties in infomax-decomposed sources for genomic data, we propose a sparse infomax that enhances a robust sparsity measure, the Hoyer index. The Hoyer index is scale-invariant and therefore well suited to ICA frameworks, since the scale of decomposed sources is arbitrary. Simulation results demonstrate that sparse infomax increases component detection accuracy when the source signal-to-background ratio (SBR) is low, particularly for single nucleotide polymorphism (SNP) data. The proposed sparse infomax is further extended to two data modalities as a sparse parallel ICA for imaging genomics, in order to investigate the associations between brain imaging and genomics. Simulation results show that sparse parallel ICA outperforms parallel ICA, with improved accuracy for structural magnetic resonance imaging (sMRI)-SNP association detection and component spatial map recovery, as well as enhanced sparsity for sMRI and SNP components under noisy conditions. Applying the proposed sparse parallel ICA to fuse the whole-brain sMRI and whole-genome SNP data of 24,985 participants in the UK Biobank, we identify three stable and replicable sMRI-SNP pairs. The identified sMRI components highlight frontal, parietal, and temporal regions and are associated with multiple cognitive measures (with different association strengths in different age groups for the temporal component). 
Top SNPs in the identified SNP factor are enriched in inflammatory disease and inflammatory response pathways; they also regulate gene expression, isoform percentage, transcription expression, or methylation level in the frontal region, and these regulation effects are significantly enriched. Applying the proposed sparse parallel ICA to imaging genomics in attention-deficit/hyperactivity disorder (ADHD), we identify and replicate one SNP component related to gray matter volume (GMV) alterations in the superior and middle frontal gyri underlying working memory deficits in adults and adolescents with ADHD. The association is more significant in ADHD families than in controls and stronger in adults and older adolescents than in younger ones. The identified SNP component highlights SNPs in long non-coding RNAs (lncRNAs) on chromosome 5 and in several protein-coding genes that are involved in ADHD, such as MEF2C, CADM2, and CADPS2. Top SNPs are enriched in human brain neuron cells and regulate gene expression, isoform percentage, transcription expression, or methylation level in the frontal region. Moreover, to increase flexibility and robustness in mining multimodal data, we propose aNy-way ICA, which optimizes the entire correlation structure of linked components across any number of modalities via Gaussian independent vector analysis and simultaneously optimizes independence via separate (parallel) ICAs. Simulation results demonstrate that aNy-way ICA recovers sources and loadings, as well as the true covariance patterns, with improved accuracy compared to existing multimodal fusion approaches, especially under noisy conditions. 
Applying the proposed aNy-way ICA to integrate structural MRI, fractal n-back, and emotion identification task functional MRI data collected in the Philadelphia Neurodevelopmental Cohort (PNC), we identify and replicate one linked GMV-threat-2-back component, and the threat and 2-back components are related to intelligence quotient (IQ) score in both discovery and replication samples. Lastly, we extend the proposed aNy-way ICA with a reference constraint to enable prior-guided multimodal fusion. Simulation results show that aNy-way ICA with reference recovers the designed linkages between the reference and the modalities, the cross-modality correlations, and the loading and component matrices with improved accuracy compared to multiset canonical correlation analysis with reference (MCCAR)+joint ICA under noisy conditions. Applying aNy-way ICA with reference to supervise the fusion of structural MRI, fractal n-back, and emotion identification task functional MRI data in PNC, with IQ as the reference, we identify and replicate one IQ-related GMV-threat-2-back component, and this component is significantly correlated across modalities in both discovery and replication samples.
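
As an illustration of the scale invariance that motivates the Hoyer index for sparse infomax, the following minimal sketch computes the standard Hoyer sparsity measure, (sqrt(n) - L1/L2) / (sqrt(n) - 1), for a vector of n values. It is not the dissertation's implementation; the function name `hoyer_index` and the example vectors are assumptions made for this illustration only.

```python
import numpy as np


def hoyer_index(x):
    """Hoyer sparsity of a vector: 0 for a uniform-magnitude vector, 1 for a one-hot vector.

    Hypothetical helper for illustration; it follows the standard definition
    (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1).
    """
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    l2 = np.sqrt(np.sum(x ** 2))
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two entries and a non-zero vector")
    l1 = np.sum(np.abs(x))
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)


# A made-up source with a few large entries on a small background.
source = np.array([0.1, -0.05, 3.0, 0.02, -2.5, 0.08])

# Rescaling the source leaves the index unchanged, which matters because
# ICA recovers sources only up to an arbitrary scale.
same = np.isclose(hoyer_index(source), hoyer_index(10.0 * source))
```

Because the L1/L2 ratio is unchanged when the vector is multiplied by a non-zero constant, the measure depends only on the shape of the source distribution and not on its scale.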