dc.contributor.author | Piriyapongsa, Jittima | en_US
dc.contributor.author | Rutledge, Mark T. | en_US
dc.contributor.author | Patel, Sanil | en_US
dc.contributor.author | Borodovsky, Mark | en_US
dc.contributor.author | Jordan, I. King | en_US
dc.date.accessioned | 2011-12-22T19:59:36Z
dc.date.available | 2011-12-22T19:59:36Z
dc.date.issued | 2007-11-26
dc.identifier.citation | Piriyapongsa, J., Rutledge, M.T., Patel, S., Borodovsky, M. and I.K. Jordan, 2007. Evaluating the protein coding potential of exonized transposable element sequences. Biol. Direct 2: 31 | en_US
dc.identifier.issn | 1745-6150
dc.identifier.uri | http://hdl.handle.net/1853/42111
dc.description | © 2007 Piriyapongsa et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. | en_US
dc.description | DOI: 10.1186/1745-6150-2-31 | en_US
dc.description.abstract | Background: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results: We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile-based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion: The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. | en_US
dc.language.iso | en_US | en_US
dc.publisher | Georgia Institute of Technology | en_US
dc.subject | Transposable elements | en_US
dc.subject | Post-transcriptional regulators | en_US
dc.subject | Gene expression | en_US
dc.subject | TEs | en_US
dc.subject | Gene evolution | en_US
dc.title | Evaluating the protein coding potential of exonized transposable element sequences | en_US
dc.type | Article | en_US
dc.contributor.corporatename | Georgia Institute of Technology. School of Biology | en_US
dc.contributor.corporatename | Georgia Institute of Technology. Dept. of Biomedical Engineering | en_US
dc.contributor.corporatename | Emory University. Dept. of Biomedical Engineering | en_US
dc.contributor.corporatename | Georgia Institute of Technology. Division of Computational Science and Engineering | en_US
dc.contributor.corporatename | Georgia Institute of Technology. College of Computing | en_US
dc.publisher.original | BioMed Central | en_US
dc.identifier.doi | 10.1186/1745-6150-2-31

