Composite Objective Optimization and Learning for Massive Datasets
MetadataShow full item record
Composite objective optimization is concerned with the problem of minimizing a two-term objective function which consists of an empirical loss function and a regularization function. Application with massive datasets often employ a regularization term which is non-differentiable or structured, such as L1 or mixed-norm regularization. Such regularizers promote sparse solutions and special structure of the parameters of the problem, which is a desirable goal for datasets of extremely high-dimensions. In this talk, we discuss several recently developed methods for performing composite objective minimization in the online learning and stochastic optimization settings. We start with a description of extensions of the well-known forward-backward splitting method to stochastic objectives. We then generalize this paradigm to the family of mirrordescent algorithms. Our work builds on recent work which connects proximal minimization to online and stochastic optimization. We focus in the algorithmic part on a new approach, called AdaGrad, in which the proximal function is adapted throughout the course of the algorithm in a data-dependent manner. This temporal adaptation metaphorically allows us to find needles in haystacks as the algorithm is able to single out very predictive yet rarely observed features. We conclude with several experiments on large-scale datasets that demonstrate the merits of composite objective optimization and underscore superior performance of various instantiations of AdaGrad.