Robust Generative Subspace Modeling: The Subspace t Distribution
Abstract
Linear latent variable models such as statistical factor analysis (SFA) and probabilistic principal component analysis (PPCA) assume that the data are distributed according to a multivariate Gaussian. A drawback of this assumption is that parameter learning in these models is sensitive to outliers in the training data. Approaches that rely on M-estimation have been introduced to render principal component analysis (PCA) more robust to outliers. M-estimation approaches assume the data are distributed according to a density with heavier tails than a Gaussian. Yet these methods are limited in that they fail to define a probability model for the data: data cannot be generated from these models, and the normalized probability of new data cannot be evaluated. To address these limitations, we describe a generative probability model that accounts for outliers. The model is a linear latent variable model in which the marginal density over the data is a multivariate t, a distribution with heavier tails than a Gaussian. We present a computationally efficient expectation maximization (EM) algorithm for estimating the model parameters, and compare our approach with that of PPCA on both synthetic and real data sets.
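To make the idea of a generative model with a multivariate t marginal concrete, the following is a minimal sketch of how a single data point could be drawn from such a model. It assumes the standard Gaussian scale mixture construction of the multivariate t and a PPCA-style isotropic noise term; the abstract does not specify the parameterization, so this construction is an assumption. The function name `sample_subspace_t` and the symbols `W`, `mu`, `sigma2`, and `nu` are hypothetical, introduced only for illustration.

```python
import math
import random


def sample_subspace_t(W, mu, sigma2, nu, rng):
    """Draw one point from a hypothetical t-distributed linear latent variable model.

    Assumed construction (not taken from the paper), a Gaussian scale mixture:
        u ~ Gamma(shape=nu/2, rate=nu/2)
        z ~ N(0, I_q / u)
        x ~ N(W z + mu, (sigma2 / u) I_d)
    Marginalizing over u and z gives x a multivariate t distribution with
    nu degrees of freedom, location mu, and scale matrix W W^T + sigma2 I.
    """
    d, q = len(W), len(W[0])
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    u = rng.gammavariate(nu / 2.0, 2.0 / nu)
    scale = 1.0 / math.sqrt(u)  # a small u inflates both z and the noise
    z = [rng.gauss(0.0, 1.0) * scale for _ in range(q)]
    noise_sd = math.sqrt(sigma2) * scale
    return [
        sum(W[i][j] * z[j] for j in range(q)) + mu[i] + rng.gauss(0.0, noise_sd)
        for i in range(d)
    ]


# Usage: a 2-dimensional latent space embedded in 3 dimensions.
rng = random.Random(0)
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # illustrative loading matrix
mu = [0.0, 1.0, 2.0]                      # illustrative mean
x = sample_subspace_t(W, mu, sigma2=0.1, nu=3.0, rng=rng)
```

Because a single scale variable `u` multiplies both the latent coordinates and the noise, an occasional small `u` produces a point far from the mean, which is the heavy-tailed behavior that a Gaussian model lacks. As `nu` grows, `u` concentrates near 1 and, under these assumptions, the sketch approaches sampling from PPCA.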