
    New ab initio methods of small genome sequence interpretation

    mills_ryan_e_200605_phd.pdf (3.473Mb)
    Date
    2006-04-07
    Author
    Mills, Ryan Edward
    Abstract
    This thesis presents novel methods for analyzing short viral sequences and identifying biologically significant regions based on their statistical properties. The first section describes an ab initio method for identifying genes in viral genomes of varying type, shape and size. This method uses statistical models of viral protein-coding and non-coding regions. We have created an interactive database summarizing the results of applying this method to the viral genomes currently available in GenBank. This database, called VIOLIN, provides access to the genes identified for each viral genome, allows further analysis of these gene sequences and their translated proteins, and graphically displays the distribution of protein-coding potential in a viral genome. The next two sections describe individual projects on two specific viral genomes analyzed with the new method. The first project was devoted to the recently sequenced Herpes B virus from Rhesus macaque. This genome was initially thought to lack an ortholog of the gamma-34.5 gene, which encodes a neurovirulence factor necessary for viability of its two close relatives, human herpes simplex viruses 1 and 2. The genome of Rhesus macaque Herpes B virus was annotated using the new gene finding procedure, and an in-depth analysis was conducted to find a gamma-34.5 ortholog using a variety of similarity search tools. A profound similarity in codon usage between B virus and its host was also identified, despite the large difference in their GC contents (74% and 51%, respectively). The last section describes the analysis of the Mouse Cytomegalovirus (MCMV) genome by a combination of methods such as sequence segmentation, gene finding and protein identification by mass spectrometry. The MCMV genome is a challenging subject for statistical sequence analysis due to the heterogeneity of its protein-coding regions. Therefore, the MCMV genome was segmented based on its nucleotide composition, and each segment was then considered independently. A thorough analysis was conducted to identify previously unnoticed genes, incorrectly annotated genes and potential sequence errors causing frameshifts. All the findings were then corroborated by the mass spectrometry analysis.
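
    The idea of segmenting a genome by nucleotide composition can be illustrated with a minimal Python sketch. It is not the procedure used in the thesis, which the abstract does not specify. The function names, the non-overlapping window scheme, and the default window size and threshold are all assumptions chosen for illustration.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def segment_by_gc(seq, window=1000, threshold=0.1):
    """Split seq into (start, end, mean_window_gc) segments.

    Hypothetical scheme: walk non-overlapping windows and open a new segment
    whenever a window's GC content differs from the running mean of the
    current segment's windows by more than `threshold`.
    """
    segments = []
    start = 0
    values = []  # GC content of each window in the current segment
    for pos in range(0, len(seq), window):
        gc = gc_content(seq[pos:pos + window])
        if values and abs(gc - sum(values) / len(values)) > threshold:
            # Composition shifted: close the current segment at this window.
            segments.append((start, pos, sum(values) / len(values)))
            start, values = pos, []
        values.append(gc)
    if values:
        segments.append((start, len(seq), sum(values) / len(values)))
    return segments
```

    Each resulting segment could then be given its own statistical model, which is the general motivation the abstract states for treating segments independently.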
    URI
    http://hdl.handle.net/1853/10515
    Collections
    • Department of Biomedical Engineering Theses and Dissertations [550]
    • Georgia Tech Theses and Dissertations [23403]
