School of Computational Science and Engineering Theses and Dissertations
http://hdl.handle.net/1853/45980

Deep representation learning on hypersphere
http://hdl.handle.net/1853/63670 (2020-07-27)
Liu, Weiyang
How to efficiently learn discriminative deep features is arguably one of the core problems in deep learning, since it benefits many downstream tasks such as visual recognition, object detection, and semantic segmentation. In this dissertation, we present a unified deep representation learning framework on the hypersphere, which inherently introduces a novel hyperspherical inductive bias into deep neural networks. We show that our framework is well motivated by both empirical observations and theory. We discuss our framework from four distinct perspectives: (1) learning objectives on the hypersphere; (2) neural architectures on the hypersphere; (3) regularizations on the hypersphere; (4) the hyperspherical training paradigm. From the first three perspectives, we explain how the idea of hyperspherical learning can be used to revisit and reinvent the corresponding components of deep learning. From the last perspective, we propose a general neural training framework that is heavily inspired by hyperspherical learning. We conduct comprehensive experiments on many applications to demonstrate that our deep hyperspherical learning framework yields better generalization, faster convergence, and stronger adversarial robustness than the standard deep learning framework.
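The core of the hyperspherical inductive bias can be illustrated with a minimal sketch (a toy illustration, not the dissertation's actual objectives or architectures): constraining features and classifier weights to the unit hypersphere makes the classification score a pure function of the angle between them.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_logit(feature, class_weight):
    """With both vectors constrained to the unit sphere, the inner product
    equals the cosine of the angle between them, so the classification
    score depends only on angular distance."""
    f, w = normalize(feature), normalize(class_weight)
    return sum(a * b for a, b in zip(f, w))

# The logit is invariant to the magnitude of either vector: scaling a
# feature by any positive constant leaves the score unchanged.
same = cosine_logit([3.0, 4.0], [6.0, 8.0])  # parallel directions -> 1.0
orth = cosine_logit([1.0, 0.0], [0.0, 1.0])  # orthogonal directions -> 0.0
```

Because magnitudes are projected away, learning pressure acts only on directions, which is the geometric bias the framework builds on.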
Asynchronous Versions of Jacobi, Multigrid, and Chebyshev Solvers
http://hdl.handle.net/1853/63624 (2020-07-06)
Wolfson-Pou, Jordi
Iterative methods are commonly used for solving large, sparse systems of linear equations on parallel computers. Implementations of parallel iterative solvers contain kernels (e.g., parallel sparse matrix-vector products) in which parallel processes alternate between phases of computation and communication. Standard software packages use synchronous implementations where there are one or more synchronization points per iteration. These synchronization points occur during communication phases where each process sends data to other processes and idles until all data needed for the next iteration is received. Synchronization points scale poorly on massively parallel machines and may become the primary bottleneck for future exascale computers. This calls for research and development of asynchronous iterative methods, which is the subject of this dissertation.
In asynchronous iterative methods there are no synchronization points. This means that, after a phase of computation, processes immediately proceed to the next phase of computation using whatever data is currently available. Since the late 1960s, research on asynchronous methods has primarily considered basic fixed-point methods, e.g., Jacobi, where proving asymptotic convergence bounds has been the focus. However, the practical behavior of asynchronous methods is not well understood, and asynchronous versions of certain fast-converging solvers have not been developed. This dissertation focuses on studying the practical behavior of asynchronous Jacobi, developing new communication-avoiding asynchronous iterative solvers, and introducing the first asynchronous versions of multigrid and Chebyshev.
To better understand the practical behavior of asynchronous Jacobi, we examine a model of asynchronous Jacobi where communication delays are neglected. We call this model simplified asynchronous Jacobi. Simplified asynchronous Jacobi can be used to model asynchronous Jacobi implemented in shared memory or distributed memory with fast communication networks. We analyze simplified asynchronous Jacobi for linear systems where the coefficient matrix is symmetric positive-definite and compare our analysis to experimental results from shared and distributed memory implementations. We present three important results for asynchronous Jacobi: it can converge when synchronous Jacobi does not, it can reduce the residual norm when some processes are delayed, and its convergence rate can increase with increasing parallelism.
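A minimal sketch of such a simplified model follows (staleness is imitated by randomly skipping component updates within an in-place sweep; the delay probability and the small test system are illustrative assumptions, not taken from the dissertation):

```python
import random

def async_jacobi(A, b, sweeps=200, delay_prob=0.3, seed=0):
    """Simplified asynchronous Jacobi: communication delays are neglected,
    but each 'process' (here, one component) may be delayed and skip an
    update, and every update reads whatever values are currently available."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            if rng.random() < delay_prob:
                continue  # process i is delayed this sweep; keeps its old value
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]  # updated in place: no synchronization point
    return x

# Illustrative 2x2 symmetric positive-definite system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = async_jacobi(A, b)
```

Unlike synchronous Jacobi, no iterate is ever assembled from a single consistent snapshot; each component simply uses the freshest values it can see.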
We develop new asynchronous communication-avoiding methods based on the idea of the sequential Southwell method. In the sequential Southwell method, which converges faster than Gauss-Seidel, the equation whose residual component is largest in absolute value is relaxed during each iteration. We use this idea of choosing large residual values to create communication-avoiding parallel methods, in which residual values of communication neighbors are compared rather than computing a global maximum. We present three methods: Parallel Southwell, Distributed Southwell, and Stochastic Parallel Southwell. All three converge faster than Jacobi and use less communication.
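The sequential Southwell selection rule can be sketched as follows (a toy illustration on a small symmetric positive-definite system, not the dissertation's parallel implementations):

```python
def southwell(A, b, steps=100):
    """Sequential Southwell: at each step, relax only the equation whose
    residual component is largest in absolute value (a greedy variant
    of Gauss-Seidel)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        i = max(range(n), key=lambda k: abs(r[k]))  # greedy selection
        x[i] += r[i] / A[i][i]                      # relax that equation only
    return x

# Illustrative 2x2 symmetric positive-definite system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = southwell(A, b)
```

The parallel variants in the dissertation replace the global `max` over all residuals, which would require global communication, with comparisons among communication neighbors only.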
We introduce the first asynchronous multigrid methods. We use the idea of additive multigrid where smoothing on all grids is carried out concurrently. We present models of asynchronous additive multigrid and use these models to study the convergence properties of asynchronous multigrid. We also introduce algorithms for implementing asynchronous multigrid in shared and distributed memory. Our experimental results show that asynchronous multigrid can exhibit grid-size independent convergence and can be faster than classical multigrid in terms of wall-clock time.
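The additive principle, smoothing and coarse-grid correction computed from the same residual and summed, can be sketched for a two-grid 1D Poisson problem (the damping factor, discretization, and transfer operators below are illustrative textbook choices, not the dissertation's algorithm):

```python
def poisson_matvec(x):
    """y = A x for A = tridiag(-1, 2, -1), the 1D Poisson matrix."""
    n = len(x)
    return [2.0*x[i] - (x[i-1] if i > 0 else 0.0) - (x[i+1] if i < n-1 else 0.0)
            for i in range(n)]

def solve_tridiag(d):
    """Solve tridiag(-1, 2, -1) y = d exactly (Thomas algorithm)."""
    m = len(d)
    c = [0.0]*m
    y = [0.0]*m
    c[0] = -0.5
    y[0] = d[0] / 2.0
    for i in range(1, m):
        beta = 2.0 + c[i-1]
        c[i] = -1.0 / beta
        y[i] = (d[i] + y[i-1]) / beta
    for i in range(m-2, -1, -1):
        y[i] -= c[i] * y[i+1]
    return y

def additive_two_grid(b, x, omega=2.0/3.0, damp=0.5):
    """One additive two-grid cycle: the smoothing correction and the
    coarse-grid correction are both computed from the same residual and
    applied together, so the two phases could run concurrently."""
    n = len(x)                  # n = 2m + 1 interior fine-grid points
    m = (n - 1) // 2
    Ax = poisson_matvec(x)
    r = [b[i] - Ax[i] for i in range(n)]
    # Smoothing correction: one damped-Jacobi step on the residual (D = 2I).
    smooth = [omega * ri / 2.0 for ri in r]
    # Coarse correction: full-weighting restriction, exact Galerkin coarse
    # solve (the Galerkin operator here is tridiag(-1,2,-1)/4), interpolation.
    rc = [0.25*(r[2*k] + 2.0*r[2*k+1] + r[2*k+2]) for k in range(m)]
    ec = solve_tridiag([4.0*v for v in rc])
    coarse = [0.0]*n
    for k in range(m):
        coarse[2*k+1] += ec[k]        # coarse point k sits at fine index 2k+1
        coarse[2*k]   += 0.5*ec[k]    # linear interpolation to neighbors
        coarse[2*k+2] += 0.5*ec[k]
    # Additive combination; damping keeps the summed correction stable.
    return [x[i] + damp*(smooth[i] + coarse[i]) for i in range(n)]

# Illustrative run: 15 interior points, constant right-hand side.
b = [1.0]*15
x = [0.0]*15
for _ in range(100):
    x = additive_two_grid(b, x)
```

In multiplicative (classical) multigrid the coarse solve waits for smoothing to finish; in the additive form both corrections are independent given the residual, which is what makes asynchronous execution natural.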
Lastly, we present the first asynchronous Chebyshev methods. We present models of Jacobi-preconditioned asynchronous Chebyshev. We use a little-known version of the BPX multigrid preconditioner in which BPX is written as Jacobi on an extended system, which makes BPX convenient for asynchronous execution within Chebyshev. Our experimental results show that asynchronous Chebyshev is faster than its synchronous counterpart in terms of both wall-clock time and number of iterations.
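For reference, the synchronous Jacobi-preconditioned Chebyshev iteration that the asynchronous variants build on can be sketched as below; the recurrence is the classical textbook form, and the 1D Poisson test problem and its exact spectral bounds are illustrative assumptions, not the dissertation's experiments.

```python
import math

def poisson_matvec(x):
    """y = A x for A = tridiag(-1, 2, -1), the 1D Poisson matrix."""
    n = len(x)
    return [2.0*x[i] - (x[i-1] if i > 0 else 0.0) - (x[i+1] if i < n-1 else 0.0)
            for i in range(n)]

def chebyshev_jacobi(b, x, iters, lam_min, lam_max):
    """Classical Chebyshev iteration for A x = b, preconditioned with
    Jacobi (M = D = 2I for the Poisson matrix above). Requires bounds
    [lam_min, lam_max] on the spectrum of M^{-1} A."""
    theta = 0.5 * (lam_max + lam_min)   # center of the spectrum
    delta = 0.5 * (lam_max - lam_min)   # half-width of the spectrum
    sigma = theta / delta
    rho = 1.0 / sigma
    n = len(b)
    r = [b[i] - y for i, y in enumerate(poisson_matvec(x))]
    z = [ri / 2.0 for ri in r]          # z = D^{-1} r (Jacobi preconditioning)
    d = [zi / theta for zi in z]
    for _ in range(iters):
        x = [x[i] + d[i] for i in range(n)]
        Ad = poisson_matvec(d)
        r = [r[i] - Ad[i] for i in range(n)]
        z = [ri / 2.0 for ri in r]
        rho_next = 1.0 / (2.0 * sigma - rho)
        d = [rho_next * rho * d[i] + (2.0 * rho_next / delta) * z[i]
             for i in range(n)]
        rho = rho_next
    return x

# Illustrative run with the exact spectral bounds of D^{-1} A for
# 15 interior points: eigenvalues are 1 - cos(k*pi/16), k = 1..15.
n = 15
lam_min = 1.0 - math.cos(math.pi / (n + 1))
lam_max = 1.0 + math.cos(math.pi / (n + 1))
b = [1.0] * n
x = chebyshev_jacobi(b, [0.0] * n, 100, lam_min, lam_max)
```

Unlike Jacobi, Chebyshev needs the three-term recurrence coefficients to stay consistent across processes, which is part of what makes its asynchronous version nontrivial.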
LEARNING DYNAMIC PROCESSES OVER GRAPHS
http://hdl.handle.net/1853/63619 (2020-07-09)
Trivedi, Rakshit
Graphs are a versatile representation of information across domains spanning social networks, biological networks, transportation networks, molecular structures, knowledge networks, web information networks, and many more. Graphs represent heterogeneous information about real-world entities and the complex relationships between them in a very succinct manner. At the same time, graphs exhibit combinatorial, discrete, and non-Euclidean properties, in addition to being inherently sparse and incomplete, which poses several challenges to techniques that analyze and study these graph structures.
Various approaches across different fields, spanning network science, game theory, stochastic processes, and others, provide excellent theoretical and analytical tools, with interpretability benefits, for analyzing these networks. However, such approaches do not learn from data and make assumptions about the real world that capture only a subset of its properties. More importantly, they do not support the predictive capabilities critical for decision-making applications. In this thesis, we develop novel data-driven learning approaches that incorporate useful inductive biases inspired by these classical approaches. The resulting learning approaches exhibit more general properties that go beyond conventional probabilistic assumptions and allow for building transferable and interpretable modules. We build these approaches around two fundamental questions: (i) (Formation Process) How do these networks come into existence? and (ii) (Temporal Evolution Process) How do real-world networks evolve over time?
First, we focus on the challenge of learning with highly sparse and incomplete knowledge graphs, where it is important to leverage multiple input graphs to support accurate performance for a variety of downstream applications such as recommendation, search, and question-answering systems. Specifically, we develop a large-scale multi-graph deep relational learning framework that identifies entity linkage as a vital component of data fusion and learns to jointly perform representation learning and graph linkage across multiple graphs, with applications to relational reasoning and knowledge construction.
Next, we consider networks that evolve over time and propose a generative model of dynamic graphs that encodes evolving network information into low-dimensional representations, facilitating accurate downstream event prediction tasks. Our approach relies on the coevolution principle, that network structure evolution and network activities are tightly coupled processes, and develops a multi-time-scale temporal point process formulation parameterized by a recurrent architecture comprising a novel temporal attention mechanism. Representation learning is posed as a latent mediation process: observed network processes evolve the state of the nodes, while this node evolution governs the future dynamics of the observed processes. The model is applied to downstream dynamic link prediction tasks and to predicting the times of future realizations (events) of both observed processes.
Finally, we investigate the implications of adopting the optimization perspective of network formation mechanisms for building learning approaches for graph-structured data. We first focus on global mechanisms that govern the formation of links in the network and build an inverse-reinforcement-learning-based algorithm to jointly discover latent mechanisms directly from observed data; optimizing these mechanisms enables a graph construction procedure capable of producing graphs with properties similar to the observed data. Such an approach facilitates transfer and generalization, and has been applied to a variety of real-world graphs. In the last part, we consider settings where the agents forming links are strategic and build a learnable model of network emergence games that jointly discovers the underlying payoff mechanisms and strategic profiles of agents from the data. This approach enables learning interpretable and transferable payoffs, while the learned game, as a model, facilitates strategic prediction tasks; both are applied to several real-world networks.
Large scale machine learning for geospatial problems in computational sustainability
http://hdl.handle.net/1853/63583 (2020-05-14)
Robinson, David Caleb
The UN laid out 17 Sustainable Development Goals as part of “The 2030 Agenda for Sustainable Development”. Each goal consists of broad targets, such as increasing the percentage of forested land (indicator 15.1.1), for the world to work towards in order to achieve a more sustainable future. Part of achieving these goals involves determining how to actually measure the progress being made towards them. Measurements of progress are necessary to ensure accountability, determine where resources are most needed, and weigh the effectiveness of existing sustainability efforts. However, many of the targets and indicators, like “the percentage of forested land”, are prohibitively difficult to measure at global scales without algorithmic support. Tackling these, and other problems originating in the pursuit of sustainability goals, presents unique challenges in machine learning. My dissertation addresses several such challenges and explores how to enable large-scale machine learning with geospatial data in a variety of application areas:
(1) With remotely sensed imagery, in land cover mapping and human population prediction. We use convolutional neural networks to tackle the land cover mapping problem, i.e., segmenting aerial imagery into different land cover classes (water, forests, fields, etc.). We develop methods for improving the spatial generalization of land cover models, a human-in-the-loop active learning framework for fine-tuning such models to new areas, and a self-supervised training method for initializing such models when labeled data is scarce or unavailable. We use our models to generate the first 1 m resolution land cover map covering the continental United States. We also predict human population density from satellite imagery using convolutional neural networks, then aggregate model features over administrative boundaries to learn zone-level models for predicting population counts.
(2) With human migration data. We develop a new loss function for training neural networks to predict human migration between US counties. We further couple human migration and sea level rise models in a general framework for predicting population distributions under different future sea level rise scenarios. Our results highlight how sea level rise could have farther-reaching indirect effects (through changing migration patterns) in addition to its direct effects along coastlines.