Extensions of principal components analysis
Brubaker, S. Charles
Principal Components Analysis (PCA) is a standard tool in data analysis, widely used in data-rich fields such as computer vision, data mining, bioinformatics, and econometrics. Given a set of vectors in n dimensions and a natural number k less than n, the method returns a subspace of dimension k whose average squared distance to that set is as small as possible. Besides saving computation by reducing the dimension, projecting onto this subspace can often reveal structure that was hidden in high dimension. This thesis considers several novel extensions of PCA that provably reveal hidden structure where standard PCA fails. First, we consider Robust PCA, which prevents a few points, possibly corrupted by an adversary, from having a large effect on the analysis. When applied to learning noisy logconcave mixture models, the algorithm requires only slightly more separation between component means than is required in the noiseless case. Second, we consider Isotropic PCA, which can go beyond the first two moments in identifying "interesting" directions in data. The method leads to the first affine-invariant algorithm that can provably learn mixtures of Gaussians in high dimensions, improving significantly on known results. Third, we define the "Subgraph Parity Tensor" of order r of a graph and reduce the problem of finding planted cliques in random graphs to the problem of finding the top principal component of this tensor.
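The least-squares definition of standard PCA given above can be illustrated with a minimal NumPy sketch. It assumes the vectors are the rows of a matrix X and that the subspace passes through the origin, as in that definition; the more common mean-centered variant would subtract the column means of X first. The helper names (`best_fit_subspace`, `avg_sq_distance`) and the synthetic data are hypothetical and for illustration only. This is plain PCA, not any of the thesis's extensions.

```python
import numpy as np


def best_fit_subspace(X, k):
    """Orthonormal basis (n x k) of the k-dim linear subspace minimizing the
    average squared distance to the rows of X (top-k right singular vectors)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T


def avg_sq_distance(X, V):
    """Average squared distance from the rows of X to span(V), V orthonormal."""
    residual = X - (X @ V) @ V.T  # component of each row orthogonal to span(V)
    return np.mean(np.sum(residual ** 2, axis=1))


rng = np.random.default_rng(0)
m, n, k = 500, 20, 3

# Hypothetical data: points near a hidden k-dim subspace, plus small isotropic noise.
hidden = np.linalg.qr(rng.standard_normal((n, k)))[0]
X = (5.0 * rng.standard_normal((m, k))) @ hidden.T + 0.1 * rng.standard_normal((m, n))

V = best_fit_subspace(X, k)
R = np.linalg.qr(rng.standard_normal((n, k)))[0]  # a random k-dim subspace for comparison

print("PCA subspace:   ", avg_sq_distance(X, V))
print("random subspace:", avg_sq_distance(X, R))
```

The SVD yields the optimum directly: by the Eckart-Young theorem, the span of the top k right singular vectors minimizes the total squared residual, which equals the squared Frobenius norm of X minus the sum of its k largest squared singular values. On this synthetic data the recovered subspace nearly coincides with the hidden one, which is the sense in which projection "reveals structure".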