Enabling precision medicine by integrating multi-modal biomedical data
Advances in technologies such as high-throughput sequencing generate multi-modal biomedical data at unprecedented speed and volume every year. However, extracting information and knowledge from these data remains a major challenge in research and clinical applications. Multiple computational approaches have been proposed for multi-modal biomedical data integration, aiming to combine data from disparate sources to increase the value of the data and improve data integrity. Multi-modal biomedical data were hypothesized to contain both dependent and independent information, following the consensus and complementary principles of multi-view learning. For modalities with independent information or few connections (e.g., genetic factors vs. environmental factors), the complementary principle was used: hidden features were learned independently for each modality and then concatenated, so that the unique information in each modality contributes jointly to the final decision. The proposed framework has been applied to integrate electronic health records (EHRs) with MRI and single nucleotide polymorphism (SNP) data for improved prediction of Alzheimer's disease. For modalities with dependent information (e.g., multi-omics data), the complex interactions between modalities were modeled implicitly with the consensus principle. Because features from dependent modalities are connected by associative or causal relationships, they can be integrated under the consensus principle to improve robustness and eliminate inconsistencies. Consensus regularization was achieved by requiring the features encoded from the different modalities of the same subject to agree in a common feature space.
The proposed frameworks have been applied to integrate multi-omics data (e.g., mRNA expression, DNA methylation, miRNA expression, and copy number variations) for improved prediction of overall survival in breast cancer. Generalized data integration models, such as autoencoder-based semi-supervised learning frameworks, have also been explored to improve the performance of computer-aided decision support. By integrating multi-modal biomedical data with the proposed frameworks, healthcare quality is expected to improve through a more comprehensive evaluation of each patient.
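
The two integration principles described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the toy linear encoder, the random data, the feature sizes, and all function names (`encode`, `complementary_fusion`, `consensus_penalty`) are hypothetical, and the squared-distance form of the consensus penalty is only one plausible choice, since the abstract does not specify the regularizer.

```python
import numpy as np

def encode(x, w):
    """Toy per-modality encoder (assumption): linear map followed by tanh."""
    return np.tanh(x @ w)

def complementary_fusion(hidden_list):
    """Complementary principle: concatenate per-modality hidden features
    along the feature axis so each modality keeps its own information."""
    return np.concatenate(hidden_list, axis=1)

def consensus_penalty(hidden_list):
    """Consensus principle (assumed form): mean squared distance of each
    modality's embedding from the per-subject mean embedding.
    All embeddings must share the same shape (subjects, dim)."""
    stacked = np.stack(hidden_list, axis=0)        # (modalities, subjects, dim)
    center = stacked.mean(axis=0, keepdims=True)   # per-subject mean embedding
    return float(((stacked - center) ** 2).mean())

# Synthetic stand-ins for two modalities measured on the same 8 subjects.
rng = np.random.default_rng(0)
n_subjects = 8
x_a = rng.normal(size=(n_subjects, 20))   # e.g., EHR-like features
x_b = rng.normal(size=(n_subjects, 50))   # e.g., SNP-like features
w_a = rng.normal(size=(20, 4))
w_b = rng.normal(size=(50, 4))

h_a = encode(x_a, w_a)
h_b = encode(x_b, w_b)

fused = complementary_fusion([h_a, h_b])   # features passed to a classifier
penalty = consensus_penalty([h_a, h_b])    # term added to a training loss
```

In this sketch the fused matrix would feed a downstream predictor, while the penalty would be added to the training loss so that embeddings of the same subject are pulled together. Note that the consensus term requires all modalities to be encoded into a space of the same dimension, whereas concatenation does not.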