Bayesian models and algorithms for protein secondary structure and beta-sheet prediction
In this thesis, we developed Bayesian models and machine learning algorithms for protein secondary structure and beta-sheet prediction. For secondary structure prediction, we developed hidden semi-Markov models, N-best algorithms and training set reduction procedures for proteins in the single-sequence category. We introduced three residue dependency models (both probabilistic and heuristic) that incorporate the statistically significant amino acid correlation patterns at structural segment borders. We allowed dependencies on positions outside the segments to relax the assumption of segment independence. Another novelty of the models is the dependency on downstream positions, which is important because of the asymmetric correlation patterns observed uniformly in structural segments. Among the dataset reduction methods, we showed that composition-based reduction generated the most accurate results. To incorporate the non-local interactions characteristic of beta-sheets, we developed two N-best algorithms and a Bayesian beta-sheet model.

For beta-sheet prediction, we developed a Bayesian model to characterize the conformational organization of beta-sheets, together with efficient algorithms to compute the optimum architecture, which includes beta-strand pairings, interaction types (parallel or anti-parallel) and residue-residue interactions (contact maps). We introduced a Bayesian model for proteins with six or fewer beta-strands, which models the conformational features in a probabilistic framework by combining amino acid pairing potentials with a priori knowledge of beta-strand organizations. To select the optimum beta-sheet architecture, we explored the space of possible conformations with efficient heuristics that significantly reduce the search space by enforcing the pairing of amino acids with strong interaction potentials. For proteins with more than six beta-strands, we first computed beta-strand pairings using the BetaPro method.
Then, we computed gapped alignments of the paired beta-strands in the parallel and anti-parallel directions, and chose the interaction types and beta-residue pairings with the maximum alignment scores. Accurate prediction of secondary structure, beta-sheets and non-local contacts should improve the accuracy and quality of three-dimensional structure prediction.
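The hidden semi-Markov models mentioned above label a sequence as a series of structural segments, each with an explicit length distribution, rather than labeling residues one at a time. The sketch below illustrates this with a generic segmental Viterbi decoder. It is not the thesis's implementation, and it is not its N-best algorithm. The three segment labels (`H`, `E`, `L`), the function name `hsmm_viterbi`, and its parameters (`emit`, `length_prob`, `trans`, `init`, `max_len`) are hypothetical. Residues inside a segment are scored independently here, so the residue dependency models described in the abstract, including dependencies on positions outside the segment, are not reproduced.

```python
import math
from typing import Dict, List, Optional, Tuple

# Toy segment labels (an assumption): H = helix, E = strand, L = loop.
STATES = ("H", "E", "L")


def log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def hsmm_viterbi(seq: str,
                 emit: Dict[str, Dict[str, float]],
                 length_prob: Dict[str, Dict[int, float]],
                 trans: Dict[str, Dict[str, float]],
                 init: Dict[str, float],
                 max_len: int) -> List[Tuple[int, int, str]]:
    """Highest-scoring segmentation as (start, end, state) triples, end exclusive.

    Hypothetical sketch: residues inside a segment are scored independently,
    unlike the thesis's residue dependency models.
    """
    n = len(seq)
    # best[j][s]: best log score of seq[:j] whose last segment has type s and ends at j
    best = [{s: -math.inf for s in STATES} for _ in range(n + 1)]
    # back[j][s]: (start of that last segment, type of the previous segment or None)
    back: List[Dict[str, Optional[Tuple[int, Optional[str]]]]] = [
        {s: None for s in STATES} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for s in STATES:
            for d in range(1, min(max_len, j) + 1):
                i = j - d
                # Explicit duration term plus emissions of the segment seq[i:j]
                seg = log(length_prob[s].get(d, 0.0)) + sum(
                    log(emit[s].get(aa, 0.0)) for aa in seq[i:j])
                if i == 0:
                    candidates = [(log(init.get(s, 0.0)), None)]
                else:
                    # Adjacent segments must differ in type (no self-transitions)
                    candidates = [(best[i][r] + log(trans[r].get(s, 0.0)), r)
                                  for r in STATES if r != s]
                for prev_score, r in candidates:
                    if prev_score + seg > best[j][s]:
                        best[j][s] = prev_score + seg
                        back[j][s] = (i, r)
    last = max(STATES, key=lambda t: best[n][t])
    if best[n][last] == -math.inf:
        return []  # no segmentation with non-zero probability
    segments = []
    j, state = n, last
    while j > 0:
        i, prev = back[j][state]
        segments.append((i, j, state))
        j, state = i, prev
    return segments[::-1]


if __name__ == "__main__":
    # Toy parameters: H prefers A, L prefers G, E is uninformative
    emit = {"H": {"A": 0.9, "G": 0.1}, "E": {"A": 0.5, "G": 0.5},
            "L": {"A": 0.1, "G": 0.9}}
    length_prob = {s: {d: 1 / 8 for d in range(1, 9)} for s in STATES}
    trans = {r: {s: 0.5 for s in STATES if s != r} for r in STATES}
    init = {s: 1 / 3 for s in STATES}
    print(hsmm_viterbi("AAAAGGGG", emit, length_prob, trans, init, max_len=8))
    # [(0, 4, 'H'), (4, 8, 'L')]
```

The explicit `length_prob` term is what separates a semi-Markov model from an ordinary HMM. Each additional segment pays a duration cost and a transition cost, so the decoder only splits a region when the emission gain outweighs those costs.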
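For proteins with more than six beta-strands, the final step aligns each predicted strand pair in both directions and keeps the orientation with the higher alignment score. The sketch below illustrates that selection step under several assumptions. It uses a Smith-Waterman local alignment with a linear gap penalty, which is our choice of gapped alignment and not necessarily the one used in the thesis. The strand pairings are assumed to be given, so BetaPro itself is not reproduced. `toy_score` is a deliberately trivial stand-in for real amino acid pairing potentials. The names `local_align`, `best_orientation`, `toy_score` and `gap` are hypothetical. The antiparallel case aligns the first strand against the reversed second strand and then maps the indices back, so the result is a list of residue-residue contacts in original strand coordinates.

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, int]


def local_align(a: str, b: str, score: Callable[[str, str], float],
                gap: float) -> Tuple[float, List[Pair]]:
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns the best score and the aligned (i, j) index pairs, 0-based into a and b.
    """
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # residue pairing
                          H[i - 1][j] - gap,                            # bulge in a
                          H[i][j - 1] - gap)                            # bulge in b
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the local alignment starts (score 0)
    pairs: List[Pair] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


def best_orientation(s1: str, s2: str, score: Callable[[str, str], float],
                     gap: float) -> Tuple[str, float, List[Pair]]:
    """Align s1 with s2 (parallel) and with reversed s2 (antiparallel); keep the better."""
    par_score, par_pairs = local_align(s1, s2, score, gap)
    anti_score, anti_rev = local_align(s1, s2[::-1], score, gap)
    # Map indices in the reversed strand back to positions in the original s2
    anti_pairs = [(i, len(s2) - 1 - j) for i, j in anti_rev]
    if anti_score > par_score:
        return "antiparallel", anti_score, anti_pairs
    return "parallel", par_score, par_pairs


def toy_score(x: str, y: str) -> float:
    # Hypothetical stand-in for an amino acid pairing potential
    return 2.0 if x == y else -1.0


if __name__ == "__main__":
    print(best_orientation("VKT", "TKV", toy_score, gap=2.0))
    # ('antiparallel', 6.0, [(0, 2), (1, 1), (2, 0)])
```

Ties go to the parallel orientation here, which is an arbitrary choice. A real predictor would substitute statistically derived pairing potentials for `toy_score` and would likely use separate potentials for parallel and antiparallel pairing.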