Statistical methods for feature extraction in shape analysis and bioinformatics
Le Faucheur, Xavier Jean Maurice
MetadataShow full item record
The presented research explores two different problems of statistical data analysis. In the first part of this thesis, a method for 3D shape representation, compression and smoothing is presented. First, a technique for encoding non-spherical surfaces using second generation wavelet decomposition is described. Second, a novel model is proposed for wavelet-based surface enhancement. This part of the work aims to develop an efficient algorithm for removing irrelevant and noise-like variations from 3D shapes. Surfaces are encoded using second generation wavelets, and the proposed methodology consists of separating noise-like wavelet coefficients from those contributing to the relevant part of the signal. The empirical-based Bayesian models developed in this thesis threshold wavelet coefficients in an adaptive and robust manner. Once thresholding is performed, irrelevant coefficients are removed and the inverse wavelet transform is applied to the clean set of wavelet coefficients. Experimental results show the efficiency of the proposed technique for surface smoothing and compression. The second part of this thesis proposes using a non-parametric clustering method for studying RNA (RiboNucleic Acid) conformations. The local conformation of RNA molecules is an important factor in determining their catalytic and binding properties. RNA conformations can be characterized by a finite set of parameters that define the local arrangement of the molecule in space. Their analysis is particularly difficult due to the large number of degrees of freedom, such as torsion angles and inter-atomic distances among interacting residues. In order to understand and analyze the structural variability of RNA molecules, this work proposes a methodology for detecting repetitive conformational sub-structures along RNA strands. Clusters of similar structures in the conformational space are obtained using a nearest-neighbor search method based on the statistical mechanical Potts model. The proposed technique is a mostly automatic clustering algorithm and may be applied to problems where there is no prior knowledge on the structure of the data space, in contrast to many other clustering techniques. First, results are reported for both single residue conformations- where the parameter set of the data space includes four to seven torsional angles-, and base pair geometries. For both types of data sets, a very good match is observed between the results of the proposed clustering method and other known classifications, with only few exceptions. Second, new results are reported for base stacking geometries. In this case, the proposed classification is validated with respect to specific geometrical constraints, while the content and geometry of the new clusters are fully analyzed.