The files in this collection are simulated metagenomic datasets, generated in silico from complete bacterial and archaeal genomes. The read length and error frequency are based on Illumina technology. These simulated datasets were created to evaluate the performance of Nonpareil, an algorithm and implementation designed to estimate the average coverage of metagenomic datasets.

These data were collected from the National Center for Biotechnology Information (NCBI) database GenBank, which was designed to provide and encourage access within the scientific community to sources of current and comprehensive information. NCBI and Georgia Tech place no restrictions on the use or distribution of the data contained in this collection. However, some of the original data may be subject to patent, copyright, or other intellectual property rights. Neither NCBI nor Georgia Tech is in a position to assess the validity of such claims, and since there is no transfer of rights from submitters to NCBI, NCBI has no rights to transfer to a third party. For more information on NCBI's copyright disclaimer, please see: http://www.ncbi.nlm.nih.gov/About/disclaimer.html.

Recent Submissions

  • Low Richness -- Set 2 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-02-03)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • All Complete Bacterial and Archeal Genomes -- Set 6 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-02-03)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • All Complete Bacterial and Archeal Genomes -- Set 7 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-02-03)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • All Complete Bacterial and Archeal Genomes -- Set 3 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-02-03)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • Low Richness -- Set 1 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-02-03)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • Escherichia Genomes -- Set 2 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-02-03)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • All Complete Bacterial and Archeal Genomes -- Set 2 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-01-31)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • Escherichia Genomes -- Set 6 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-01-31)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • Escherichia Genomes -- Set 5 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-01-31)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • Escherichia Genomes -- Set 4 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-01-31)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • Escherichia Genomes -- Set 1 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-01-31)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • All Complete Bacterial and Archeal Genomes -- Set 5 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-01-31)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • All Complete Bacterial and Archeal Genomes -- Set 4 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-01-31)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...
  • All Complete Bacterial and Archeal Genomes -- Set 1 

    Konstantinidis, Kostas; Rodriguez, Luis M. (Georgia Institute of Technology, 2014-01-31)
    Motivation: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Due to ...