RNA secondary structure prediction using a combined method of thermodynamics and kinetics
RNA is now widely recognized as playing important roles in information transfer, as a structural component, in gene regulation, and in other cellular functions. Its secondary structure is key to understanding the relationship between RNA structure and function. Computational prediction of RNA secondary structure not only provides candidate structures but can also shed light on the mechanism of RNA folding. Conventional prediction programs either take an evolutionary perspective or aim to find the minimum free energy structure. In vivo, however, RNA folds during transcription, which indicates that the native structure results from both thermodynamics and kinetics. In this thesis, I first reviewed the current leading kinetic folding programs and demonstrated that they do not predict secondary structure accurately. I then proposed a new sequential folding program, GTkinetics. Given an RNA sequence, GTkinetics predicts a secondary structure together with a series of folding trajectories. It treats the RNA as a growing chain and adds stable local structures sequentially. GTkinetics uses a Z-score to evaluate the stability of local structures, which allows native local structures to be located with high confidence. Because GTkinetics captures all stable local structures, it also produces some false positives, and these prevent the native structure from forming as the chain grows. This suggests a refolding model in which the false-positive hairpins (probably intermediate structures) are melted and the RNA folds into a new structure with reliable long-range helices. By analyzing the suboptimal ensemble along the folding pathway, I proposed a refolding mechanism that can be used to evaluate whether or not refolding takes place. As another way to favor local structures over long-range structures, I introduced a distance penalty function into the free energy calculation.
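The abstract does not give the formula behind the GTkinetics Z-score. A common convention for stability Z-scores compares the folding energy of a segment with the energies of randomly shuffled copies of the same segment, Z = (E_native - mean(E_shuffled)) / sd(E_shuffled), so that more negative values mean the native segment is more stable than expected by chance. The Python sketch below illustrates only that convention and is an assumption, not the thesis's implementation. To stay self-contained it replaces a real thermodynamic model with a toy energy (minus the maximum number of nested canonical pairs, computed with the Nussinov algorithm) and uses a simple mononucleotide shuffle; the names `nussinov_pairs`, `toy_energy` and `stability_zscore` are hypothetical.

```python
import random
import statistics
from typing import Callable

# Canonical Watson-Crick and GU wobble pairs.
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested canonical base pairs (Nussinov algorithm).

    Toy stand-in for a thermodynamic model: every hairpin loop must contain
    at least `min_loop` unpaired nucleotides.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: nucleotide j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in CANONICAL_PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1]


def toy_energy(seq: str) -> float:
    """Hypothetical energy: -1 unit per base pair (lower = more stable)."""
    return -float(nussinov_pairs(seq))


def stability_zscore(seq: str, energy: Callable[[str], float] = toy_energy,
                     n_shuffles: int = 200, seed: int = 0) -> float:
    """Z = (E_native - mean(E_shuffled)) / sd(E_shuffled); negative = more stable than chance."""
    rng = random.Random(seed)
    e_native = energy(seq)
    shuffled_energies = []
    for _ in range(n_shuffles):
        chars = list(seq)
        rng.shuffle(chars)  # mononucleotide shuffle: preserves base composition only
        shuffled_energies.append(energy("".join(chars)))
    mu = statistics.mean(shuffled_energies)
    sd = statistics.pstdev(shuffled_energies)
    if sd == 0:
        return 0.0  # all shuffles equally stable, so Z is undefined; report 0
    return (e_native - mu) / sd


if __name__ == "__main__":
    hairpin = "GGGGAAAACCCC"
    print(nussinov_pairs(hairpin))
    print(stability_zscore(hairpin))
```

For the 12-nt hairpin `GGGGAAAACCCC` the toy model finds 4 pairs, the most this base composition allows, so the Z-score cannot be positive. A nearest-neighbor free energy model and a dinucleotide-preserving shuffle would be more realistic choices for actual local-structure screening.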
I used a sigmoidal function to compute the energy penalty from the distance, in the primary sequence, between the two nucleotides of a base pair. On both the training dataset and the test dataset, the distance function improves the prediction to some extent. To characterize the differences between local and long-range helices, I analyzed the standardized local nucleotide composition and base-pair composition of the two groups. The results show that adenine accumulates on the 5' side of local structures but not on that of long-range helices, and that GU base pairs occur significantly more frequently in local helices than in long-range helices. These findings indicate that local and long-range helices form by different mechanisms, and that this difference is encoded in the sequence itself. Based on all the results, I draw conclusions and suggest future directions for enhancing the current sequential folding program.
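The abstract states only that the penalty is a sigmoidal function of the sequence distance d = j - i between paired nucleotides i and j. One plausible form is P(d) = p_max / (1 + exp(-k (d - d0))), added to the free energy contribution of each base pair so that long-range pairs become less favorable. The Python sketch below assumes that form; the function names and the parameter names and defaults (`p_max`, `midpoint`, `steepness`) are hypothetical placeholders, not the values fitted in the thesis.

```python
import math


def sigmoid_distance_penalty(d: float, p_max: float = 2.0,
                             midpoint: float = 100.0,
                             steepness: float = 0.05) -> float:
    """Hypothetical sigmoidal penalty P(d) = p_max / (1 + exp(-steepness * (d - midpoint))).

    d is the primary-sequence distance j - i between the two nucleotides of a
    base pair. The penalty rises smoothly from near 0 (local pairs) towards
    p_max (long-range pairs). Units follow the free energy model (e.g. kcal/mol).
    """
    x = steepness * (d - midpoint)
    # Two algebraically equal forms, chosen so that math.exp never overflows.
    if x >= 0:
        return p_max / (1.0 + math.exp(-x))
    z = math.exp(x)
    return p_max * z / (1.0 + z)


def penalized_pair_energy(delta_g: float, i: int, j: int, **params: float) -> float:
    """Add the distance penalty to one base pair's free energy contribution (requires i < j)."""
    if not i < j:
        raise ValueError("expected i < j")
    return delta_g + sigmoid_distance_penalty(j - i, **params)


if __name__ == "__main__":
    for d in (10, 100, 300):
        print(d, round(sigmoid_distance_penalty(d), 3))
    print(penalized_pair_energy(-3.0, 10, 110))  # -> -2.0
```

With these placeholder defaults, a pair spanning 100 nt receives half of the maximum penalty, so a contribution of -3.0 becomes -2.0. A sigmoid leaves short-range pairs almost unpenalized and is bounded by `p_max` for very long-range pairs, which keeps the correction from growing without limit as the sequence gets longer.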