Convex and structured nonconvex optimization for modern machine learning: Complexity and algorithms
Abstract
In this thesis, we investigate various optimization problems motivated by applications in modern-day machine learning. In the first part, we study the computational complexity of training ReLU neural networks. We consider the following problem: given a fully connected two-hidden-layer ReLU neural network with two ReLU nodes in the first layer and one ReLU node in the second layer, do there exist edge weights such that the neural network fits the given data? We show that this problem is NP-hard. The main contribution is the design of a gadget that reduces the Separation by Two Hyperplanes problem to the ReLU neural network training problem. In the second part of the thesis, we study the design and complexity analysis of algorithms for function-constrained optimization problems in both convex and nonconvex settings. These problems are becoming increasingly popular in machine learning due to their applications in multi-objective optimization and risk-averse learning, among others. For the convex function-constrained optimization problem, we propose a novel Constraint Extrapolation (ConEx) method, which uses linear approximations of the constraint functions to define the extrapolation (or acceleration) step. We show that this method is a unified algorithm that achieves the best-known rate of convergence for solving different function-constrained convex composite problems, including convex or strongly convex, and smooth or nonsmooth problems with a stochastic objective and/or stochastic constraints. Many of these convergence rates were obtained for the first time in the literature. Moreover, ConEx is a single-loop algorithm that does not involve any penalty subproblems. In contrast to existing dual methods, it does not require the projection of Lagrange multipliers onto a (possibly unknown) bounded set.
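As a rough illustration of the constraint extrapolation idea, the sketch below runs a ConEx-style primal-dual loop on a hypothetical toy problem. The problem, function names, and step sizes are our own illustrative assumptions, and the extrapolation is applied to exact rather than linearized constraint values for readability; this is a sketch of the mechanism, not the thesis's algorithm.

```python
# Toy convex problem (illustrative, not from the thesis):
#     min (x1 - 2)^2 + (x2 - 2)^2   s.t.   x1 + x2 - 2 <= 0
# KKT solution: x* = (1, 1) with Lagrange multiplier y* = 2.

def f_grad(x):
    # gradient of the objective
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 2.0)]

def g(x):
    # single affine constraint, g(x) <= 0
    return x[0] + x[1] - 2.0

G_GRAD = [1.0, 1.0]  # gradient of the affine constraint

def conex_sketch(steps=3000, eta=0.05, tau=0.05, theta=1.0):
    x_prev = [0.0, 0.0]
    x = [0.0, 0.0]
    y = 0.0  # Lagrange multiplier
    for _ in range(steps):
        # constraint extrapolation: combine current and previous
        # constraint values (linearized in the general method)
        s = g(x) + theta * (g(x) - g(x_prev))
        # dual step: ascent on the multiplier, projected onto y >= 0 only
        y = max(0.0, y + tau * s)
        # primal step: gradient descent on the Lagrangian f + y * g
        fx = f_grad(x)
        x_prev, x = x, [x[i] - eta * (fx[i] + y * G_GRAD[i]) for i in range(2)]
    return x, y

x, y = conex_sketch()
```

Note that the dual step projects the multiplier only onto the nonnegative orthant; no a priori bound on the multipliers is needed, in the spirit of the method described above.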
Moreover, in the stochastic function-constrained setting, ConEx is the first method that requires only bounded variance of the noise, a major relaxation of the restrictive sub-Gaussian noise assumption made by existing algorithms. In the third part of this thesis, we investigate a nonconvex, nonsmooth function-constrained optimization problem, for which we introduce a new proximal-point method that transforms the initial nonconvex problem into a sequence of convex function-constrained subproblems. For this algorithm, we establish the asymptotic convergence as well as the rate of convergence to KKT points under different constraint qualifications. For practical use, we present inexact variants of this algorithm, in which approximate solutions of the subproblems are computed using the aforementioned ConEx method, and establish their associated rates of convergence under a strong feasibility constraint qualification. In the fourth part, we identify an important class of nonconvex function-constrained problems for statistical machine learning applications where sparsity is imperative. We consider various nonconvex sparsity-inducing constraints, which are tighter approximations of the $\ell_0$ norm than the convex $\ell_1$-norm relaxation. For this class of problems, we relax the strong feasibility constraint qualification to a weaker, well-known constraint qualification and still prove convergence to KKT points at the rate of gradient descent for nonconvex regularized problems. We perform a systematic study of the structure of nonconvex sparsity-inducing constraints to obtain bounds on the Lagrange multipliers and to solve certain subproblems faster, achieving a convergence rate that matches that of the nonconvex regularized version under a relaxed constraint qualification which is satisfied almost all the time. In the fifth part, we present a faster algorithm for solving mixed packing and covering (MPC) linear programs.
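The proximal-point outer loop can be sketched on a one-dimensional toy instance. The problem, the modulus mu, and the closed-form subproblem solution below are illustrative assumptions of ours, not taken from the thesis; in general each convex subproblem would be solved inexactly by an inner method such as ConEx, and nonconvex constraints would receive the same proximal shift.

```python
# Toy nonconvex problem (illustrative):
#     min -x^2   s.t.   x^2 - 1 <= 0,
# whose KKT points are x = +/-1.  f(x) = -x^2 is 2-weakly convex, so adding
# mu * (x - x_k)^2 with mu > 2 makes each proximal subproblem convex.

def prox_point(x0=0.2, mu=3.0, outer=50):
    x = x0
    for _ in range(outer):
        # Convex subproblem:  min -x^2 + mu * (x - x_k)^2  over  x in [-1, 1].
        # For this 1-D convex quadratic, the exact solution is the clamped
        # unconstrained minimizer (an inner solver would be used in general).
        x_unc = mu * x / (mu - 1.0)
        x = max(-1.0, min(1.0, x_unc))
    return x

x_kkt = prox_point()
```

Starting from x0 = 0.2, the iterates increase geometrically until they hit the boundary and settle at the KKT point x = 1.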
The proposed algorithm belongs to a family of primal-dual algorithms, similar to ConEx. Here, the main challenge comes from the feasible set of the primal variables, which is an $\ell_\infty$ ball for a general MPC. The Euclidean diameter of this ball is at least $\Omega(\sqrt{n})$, where $n$ is the dimension of the LP, and this large diameter shows up in the complexity of standard methods. We give this problem a specialized treatment and use a new regularization function that is weaker than a strongly convex one yet still yields an accelerated convergence rate. Using this regularizer, we replace the $\mathrm{poly}(n)$ term in the complexity with a logarithmic term.
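To see why the dimension enters through the diameter, note that the Euclidean diameter of the unit $\ell_\infty$ ball $B_\infty = [-1,1]^n$ is attained at opposite corners:

```latex
\max_{u,v \in B_\infty} \|u - v\|_2
  = \left\|(1,\dots,1) - (-1,\dots,-1)\right\|_2
  = 2\sqrt{n} .
```

A strongly convex (Euclidean) regularizer over this set therefore incurs a factor polynomial in $n$, which motivates the weaker regularizer described above.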
Related items
Showing items related by title, author, creator and subject.

Quadratically convergent algorithm for orbital optimization in the orbital-optimized coupled-cluster doubles method and in orbital-optimized second-order Møller-Plesset perturbation theory
Bozkaya, Uğur; Turney, Justin M.; Yamaguchi, Yukio; Schaefer, Henry F., III; Sherrill, C. David (Georgia Institute of Technology; American Institute of Physics, 2011-09) Using a Lagrangian-based approach, we present a more elegant derivation of the equations necessary for the variational optimization of the molecular orbitals (MOs) for the coupled-cluster doubles (CCD) method and second-order ...
Global optimization methods for optimal power flow and transmission switching problems in electric power systems
Kocuk, Burak (Georgia Institute of Technology, 2016-07-19) Power engineering is concerned with the generation, transmission, and distribution of electricity over the electric power network, which is arguably one of the largest engineering systems in the world. The size of electric ...
Efficient computation in stochastic optimization and simulation: Dynamic programs, risk quantification, and risk optimization
Zhu, Helin (Georgia Institute of Technology, 2016-07-26) Stochastic optimization and simulation are two of the most fundamental research areas in Operations Research. In this thesis, we develop efficient computational approaches for three important topics in the realm of stochastic ...