Development of algorithms for metagenomics and applications to the study of evolutionary processes that maintain microbial biodiversity
Understanding microbial evolution lies at the heart of microbiology and environmental sciences. Numerous studies have been dedicated to elucidating the underlying mechanisms that create microbial genetic diversity and adaptation. However, due to technical limitations such as the high proportion of uncultured cells in almost every natural habitat, most current knowledge is based primarily on axenic cultures grown under laboratory conditions, which typically do not simulate the natural environment well. How well the knowledge from isolates translates to in-situ processes and natural microbial communities remains essentially speculative. The recent development of culture-independent genomic techniques (i.e., metagenomics) makes it possible to bypass some of these limitations and gain new insights into microbial evolution in-situ. To date, most metagenomic studies have focused on a few reduced-diversity model communities, e.g., acid mine drainage. Highly complex communities such as those of soil and sediment habitats remain comparatively less understood. Furthermore, a major strength of metagenomics that has not yet been fully exploited is the ability to follow the evolution of natural microbial communities over time and through environmental perturbations, i.e., time-series metagenomics. Although recent developments in DNA sequencing technologies have enabled (inexpensive) time-series studies, the bioinformatics approaches for analyzing the resulting data have clearly fallen behind. Taken together, three major challenges remain in scaling up metagenomics for studies of complex communities: 1) the difficulty of processing and analyzing massive short-read sequencing data, often at the terabyte level; 2) the difficulty of effectively assembling genomes from complex metagenomes; and 3) the lack of methods for tracking genotypes and mutational events such as horizontal gene transfer (HGT) through time.
Therefore, developing efficient bioinformatics approaches to address these challenges represents an important and timely issue. This thesis aimed to develop novel bioinformatics pipelines and algorithms for high-performance computing and, subsequently, to apply these tools to natural microbial communities to generate quantitative insights into the relative importance of the molecular mechanisms that create or maintain microbial diversity. The tools are not specific to a particular habitat or group of organisms and thus can be broadly used to advance our understanding of microbial evolution in different settings. In particular, the comparative whole-genome analysis of 24 Escherichia isolates from various habitats, including human-associated and non-human-associated habitats such as freshwater ecosystems and beaches, showed that organisms with more similar ecologies tend to exchange more genes, which has important implications for the prokaryotic species concept. To test these findings from isolates more directly and to quantify the patterns of genetic exchange among co-occurring populations, three years of time-series metagenomics data from planktonic samples from Lake Lanier (Atlanta, GA) were analyzed. For this, it was first necessary to develop bioinformatics algorithms to robustly assemble population genomes from complex community metagenomes, identify the phylogenetic affiliation of assembled genome and contig sequences, and detect horizontal gene transfer among these sequences. Using these novel algorithms, in situ bacterial lineage evolution was quantitatively assessed, especially with respect to whether ecologically distinct lineages evolve according to the recently proposed fragmented speciation model (Retchless and Lawrence, Science 2008). Evidence in support of this model was rarely observed. Instead, it appeared that rampant HGT disseminated ecologically important genes within the population, maintaining intra-population diversity.
By expanding the previous approaches to include methods for assessing differential gene abundance and selection pressure between samples, it was possible to quantify how soil microbial communities respond to a decade of warming by 2 °C, which simulated the predicted effects of climate change. The heated communities showed significant shifts in composition and predicted metabolism compared to the unheated (control) communities, reflecting the release of additional soil carbon, and these shifts were community-wide rather than attributable to a few taxa. These findings indicated that the microbial communities of temperate grassland soils play important roles in mediating the feedback responses to climate change. Collectively, the findings presented here advance our understanding of the modes and tempo of microbial community adaptation to environmental perturbations and have important implications for better modeling of microbial diversity on the planet. The bioinformatics algorithms and approaches developed as part of this thesis are expected to facilitate future genomic and metagenomic studies across the fields of microbiology, ecology, evolution, and engineering.