Information, complexity and structure in convex optimization
Guzman Paredes, Cristobal
MetadataShow full item record
This thesis is focused on the limits of performance of large-scale convex optimization algorithms. Classical theory of oracle complexity, first proposed by Nemirovski and Yudin in 1983, successfully established the worst-case behavior of methods based on local oracles (a generalization of first-order oracle for smooth functions) for nonsmooth convex minimization, both in the large-scale and low-scale regimes; and the complexity of approximately solving linear systems of equations (equivalent to convex quadratic minimization) over Euclidean balls, under a matrix-vector multiplication oracle. Our work extends the applicability of lower bounds in two directions: Worst-Case Complexity of Large-Scale Smooth Convex Optimization: We generalize lower bounds on the complexity of first-order methods for convex optimization, considering classes of convex functions with Hölder continuous gradients. Our technique relies on the existence of a smoothing kernel, which defines a smooth approximation for any convex function via infimal convolution. As a consequence, we derive lower bounds for \ell_p/\ell_q-setups, where 1\leq p,q\leq \infty, and extend to its matrix analogue: Smooth convex minimization (with respect to the Schatten q-norm) over matrices with bounded Schatten p-norm. The major consequences of this result are the near-optimality of the Conditional Gradient method over box-type domains (p=q=\infty), and the near-optimality of Nesterov's accelerated method over the cross-polytope (p=q=1). Distributional Complexity of Nonsmooth Convex Optimization: In this work, we prove average-case lower bounds for the complexity of nonsmooth convex ptimization. We introduce an information-theoretic method to analyze the complexity of oracle-based algorithms solving a random instance, based on the reconstruction principle. Our technique shows that all known lower bounds for nonsmooth convex optimization can be derived by an emulation procedure from a common String-Guessing Problem, which is combinatorial in nature. The derived average-case lower bounds extend to hold with high probability, and for algorithms with bounded probability error, via Fano's inequality. Finally, from the proposed technique we establish the equivalence (up to constant factors) of distributional, randomized, and worst-case complexity for black-box convex optimization. In particular, there is no gain from randomization in this setup.