Algorithms and analysis for non-convex optimization problems in machine learning
MetadataShow full item record
In this thesis, we propose efficient algorithms and provide theoretical analysis through the angle of spectral methods for some important non-convex optimization problems in machine learning. Specifically, we focus on two types of non-convex optimization problems: learning the parameters of latent variable models and learning in deep neural networks. Learning latent variable models is traditionally framed as a non-convex optimization problem through Maximum Likelihood Estimation (MLE). For some specific models such as multi-view model, we can bypass the non-convexity by leveraging the special model structure and convert the problem into spectral decomposition through Methods of Moments (MM) estimator. In this thesis, we propose a novel algorithm that can flexibly learn a multi-view model in a non-parametric fashion. To scale the nonparametric spectral methods to large datasets, we propose an algorithm called doubly stochastic gradient descent which uses sampling to approximate two expectations in the problem, and it achieves better balance of computation and statistics by adaptively growing the model as more data arrive. Learning with neural networks is a difficult non-convex problem while simple gradient-based methods achieve great success in practice. In this part of the thesis, we try to understand the optimization landscape of learning one-hidden-layer networks with Rectified Linear (ReLU) activation functions. By directly analyzing the structure of the gradient, we can show neural networks with diverse weights have no spurious local optima. This partly explains the empirical success of gradient descent since a stationary point leads to a global optimum under diversity conditions on the neural weights.