dc.contributor.author    Konstantinidis, Kostas
dc.contributor.author    Rodriguez, Luis M.
dc.coverage.temporal    June 2012 - July 2012
dc.date.accessioned    2014-01-31T16:17:38Z
dc.date.available    2014-01-31T16:17:38Z
dc.date.issued    2014-01-31
dc.identifier.citation    Rodriguez-R LM, Konstantinidis KT. (2013). Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics btt584. doi: 10.1093/bioinformatics/btt584    en_US
dc.identifier.uri    http://hdl.handle.net/1853/50771
dc.description    The files provide simulated metagenomic datasets, generated in silico from complete bacterial and archaeal genomes. The read length and error frequency are based on Illumina technology. These simulated datasets were used to evaluate the performance of Nonpareil, an algorithm and implementation designed to estimate the average coverage of metagenomic datasets. Nonpareil is described in the following publication, from which the abstract of this record is taken: Rodriguez-R LM, Konstantinidis KT. (2013). Nonpareil: a redundancy-based approach to assess the level of coverage in metagenomic datasets. Bioinformatics btt584. doi: 10.1093/bioinformatics/btt584. Nonpareil, which was tested using these data, can be found at https://github.com/lmrodriguezr/nonpareil/ under the Artistic License 2.0.    en_US
dc.description    These files are part of a larger collection of datasets, 10 in total, which were produced using 130 genomes from the genus Escherichia, simulating environments with extremely low phylogenetic diversity but high micro-diversity. To recreate the full collection, please also see the additional tiers of files, Escherichia Genomes -- Set 1, 2, 3, 4, and 5. Each tier contains one "README.txt" file in raw text format, as well as paired files with the same prefix. Files ending with ".fa.gz" are the sequences of the simulated dataset in gzipped FastA format, and files ending with ".genomes" are the tables of abundance per molecule. All files are packaged in a zipped file and may need to be extracted before they can be used.    en_US
dc.description.abstract    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to these limitations, central ecological questions with respect to the global distribution of microbes and the functional diversity of their communities cannot be robustly assessed. Results: We introduce Nonpareil, a method to estimate and project coverage in metagenomes. Nonpareil does not rely on high-quality assemblies, OTU calling, or comprehensive reference databases; thus, it is broadly applicable to metagenomic studies. Application of Nonpareil on available metagenomic datasets provided estimates on the relative complexity of soil, freshwater and human microbiome communities, and suggested that about 200 Gb of sequencing data are required for 95% abundance-weighted average coverage of the soil communities analyzed.    en_US
dc.description.sponsorship    United States. Department of Energy    en_US
dc.description.sponsorship    National Science Foundation (U.S.)    en_US
dc.language.iso    en_US    en_US
dc.publisher    Georgia Institute of Technology    en_US
dc.subject    Environmental and clinical microbiology    en_US
dc.subject    Bioinformatics applications    en_US
dc.subject    Sequence analysis    en_US
dc.subject    Metagenomics    en_US
dc.subject    Operational taxonomic units    en_US
dc.subject    Nonpareil    en_US
dc.title    Escherichia Genomes -- Set 6    en_US
dc.type    Dataset    en_US
dc.contributor.corporatename    Georgia Institute of Technology. School of Civil and Environmental Engineering    en_US
dc.contributor.corporatename    Georgia Institute of Technology. School of Biology    en_US
dc.embargo.terms    null    en_US
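
The description in this record states that files ending with ".fa.gz" hold the simulated sequences in gzipped FastA format. As a minimal sketch of how such a file might be inspected after extracting the archive, the Python below counts records and bases using only the standard library. The function names and the in-memory demo data are assumptions made for illustration; they are not part of the dataset or of Nonpareil, and the layout of the ".genomes" tables is not described here, so it is not parsed.

```python
import gzip
import io


def count_fasta_records(handle):
    """Count records and total bases in FASTA text read from a file-like object."""
    n_records = 0
    n_bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_records += 1  # a header line starts a new record
        else:
            n_bases += len(line)  # sequence lines may wrap across several lines
    return n_records, n_bases


def count_fasta_gz(path):
    """Open a gzipped FASTA file in text mode and count its records and bases."""
    with gzip.open(path, "rt") as handle:
        return count_fasta_records(handle)


# Self-contained demo on made-up data (not taken from the dataset):
# compress a tiny FASTA in memory and read it back through gzip.
demo_fasta = b">read1\nACGT\nAC\n>read2\nGGG\n"
with gzip.open(io.BytesIO(gzip.compress(demo_fasta)), "rt") as handle:
    demo_counts = count_fasta_records(handle)
print(demo_counts)
```

For a real file, `count_fasta_gz` would be called with the path of one of the extracted ".fa.gz" files; the actual file names are listed in each tier's "README.txt" and are not reproduced here.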

