    Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training

    Date
    2008-12
    Author
    Ter-Hovhannisyan, Vardges
    Lomsadze, Alexandre
    Chernoff, Yury O.
    Borodovsky, Mark
    Abstract
    We describe a new ab initio algorithm, GeneMark-ES version 2, that identifies protein-coding genes in fungal genomes. The algorithm does not require a predetermined training set to estimate parameters of the underlying hidden Markov model (HMM). Instead, the anonymous genomic sequence in question is used as an input for iterative unsupervised training. The algorithm extends our previously developed method tested on genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. To better reflect features of fungal gene organization, we enhanced the intron submodel to accommodate sequences with and without branch point sites. This design enables the algorithm to work equally well for species with the kinds of variations in splicing mechanisms seen in the fungal phyla Ascomycota, Basidiomycota, and Zygomycota. Upon self-training, the intron submodel switches on in several steps to reach its full complexity. We demonstrate that the algorithm's accuracy, at both the exon and the whole-gene level, compares favorably with the accuracy of gene finders that employ supervised training. Application of the new method to known fungal genomes indicates substantial improvement over existing annotations. By eliminating the effort necessary to build comprehensive training sets, the new algorithm can streamline and accelerate the process of annotation in a large number of fungal genome sequencing projects.
    URI
    http://hdl.handle.net/1853/49178
    Collections
    • School of Biology Faculty Publications [227]

    Georgia Tech

    © Georgia Institute of Technology
