Frank-Wolfe Methods for Optimization and Machine Learning
Combettes, Cyrille W.
MetadataShow full item record
In Chapter 2, we present the Frank-Wolfe algorithm (FW) and all necessary background material. We explain the projection-free and sparsity properties of the algorithm, provide motivation for real-world problems, and analyze the convergence rates and a lower bound on the complexity. In Chapter 3, we review the complexity bounds of linear minimizations and projections on several sets commonly used in optimization, providing a rigorous support to the use of FW. We also propose two methods for projecting onto the lp-ball and the Birkhoff polytope respectively, and we analyze their complexity. Computational experiments for the l1-ball and the nuclear norm-ball are presented. In Chapter 4, we identify the well-known drawback in FW, a naive zig-zagging phenomenon that slows down the algorithm. In response to this issue, we propose a boosting procedure generating descent directions better aligned with the negative gradients and preserving the projection-free property. Although the method is relatively simple and intuitive, it provides significant computational speedups over the state of the art on a variety of experiments. In Chapter 5, we address the large-scale finite-sum optimization setting arising in many tasks of machine learning. Based on a sliding technique, we propose a generic template to integrate adaptive gradients into stochastic Frank-Wolfe algorithms in a practical way. Computational experiments on standard convex optimization problems and on the nonconvex training of neural networks demonstrate that the blend of the two methods is successful. Both developments in Chapters 4 and 5 are motivated by the projection-free property of FW. In Chapter 6, we leverage the natural sparsity of the iterates generated by FW and study an application to the approximate Carathéodory problem. We show that FW generates a simple solution to the problem and that with no modification of the algorithm, better cardinality bounds can be established using existing convergence analysis of FW in different scenarios. We also consider a nonsmooth variant of FW. In Chapter 7, we carry on with the sparsity property and we consider an extension of the Frank-Wolfe algorithm to the unconstrained setting. It addresses smooth convex optimization problems over the linear span of a given set and resembles the matching pursuit algorithm. We propose a blending method that combines fast convergence and high sparsity of the iterates. Computational experiments validate the purpose of our method.