
    Fair and diverse data representation in machine learning

    File
    TANTIPONGPIPAT-DISSERTATION-2020.pdf (1.283Mb)
    Date
    2020-05-17
    Author
    Tantipongpipat, Uthaipon
    Abstract
    This work contains two major lines of research: subset selection and multi-criteria dimensionality reduction with an application to fairness. Subset selection applies to the classical problem of optimal design in statistics and to many other problems in machine learning where learning is subject to a labelling budget constraint. This thesis also extends arguably the most commonly used dimensionality reduction technique, Principal Component Analysis (PCA), to satisfy a fairness criterion of choice. We model an additional fairness constraint as multi-criteria dimensionality reduction, where we are given multiple objectives that need to be optimized simultaneously. Our first contribution is to show that certain criteria for optimal design problems can be approximated by novel polynomial-time sampling algorithms, improving upon the best previous approximation ratios in the literature. We also show that the A-optimal design problem is NP-hard to approximate within a fixed constant when k = d. One of the most common heuristics used in practice to solve A- and D-optimal design problems is the local search heuristic, also known as Fedorov's exchange method, which is popular for its simplicity and its empirical performance. However, despite its wide usage, no theoretical bound had been proven for this algorithm. We bridge this gap and prove approximation guarantees for the local search algorithms for A- and D-optimal design problems. Our model of multi-criteria dimensionality reduction captures several fairness criteria for dimensionality reduction, such as the Fair-PCA problem introduced by Samadi et al. in 2018 and the Nash Social Welfare (NSW) problem. In the Fair-PCA problem, the input data is divided into k groups, and the goal is to find a single d-dimensional representation for all groups for which the maximum reconstruction error of any one group is minimized. In NSW, the goal is to maximize the product of the individual variances of the groups achieved by the common low-dimensional space. We develop algorithms for multi-criteria dimensionality reduction and show their theoretical performance and fast implementations in practice.
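The two fairness objectives described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it only evaluates the Fair-PCA and NSW objectives, as worded above, for a given projection, here the ordinary top-d PCA subspace of the pooled data. The function names, the synthetic two-group data, and the use of unnormalized squared Frobenius norms for error and variance are all assumptions made for illustration.

```python
import numpy as np

def top_d_projection(X, d):
    """Orthonormal basis (n_features x d) of the top-d principal subspace of X."""
    # Rows of vt are right singular vectors, i.e. eigenvectors of X^T X.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:d].T

def group_variance(X, V):
    """Variance of group X captured by span(V), taken as ||X V||_F^2 (assumed, unnormalized)."""
    return float(np.sum((X @ V) ** 2))

def reconstruction_error(X, V):
    """Squared error of projecting X onto span(V): ||X - X V V^T||_F^2."""
    return float(np.sum((X - X @ V @ V.T) ** 2))

def fair_pca_objective(groups, V):
    """Fair-PCA objective as worded in the abstract: worst reconstruction error over groups (minimize)."""
    return max(reconstruction_error(X, V) for X in groups)

def nsw_objective(groups, V):
    """NSW objective as worded in the abstract: product of per-group variances (maximize)."""
    return float(np.prod([group_variance(X, V) for X in groups]))

# Hypothetical data: two groups whose dominant directions differ.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(50, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
group_b = rng.normal(size=(50, 5)) * np.array([1.0, 1.0, 1.0, 1.0, 3.0])
groups = [group_a, group_b]

# Baseline: standard PCA on the pooled data, ignoring group membership.
V = top_d_projection(np.vstack(groups), d=2)

print("per-group reconstruction error:", [reconstruction_error(X, V) for X in groups])
print("Fair-PCA objective (max error):", fair_pca_objective(groups, V))
print("NSW objective (product of variances):", nsw_objective(groups, V))
```

Under these assumptions, a fairness-aware method would search over orthonormal V to lower the first objective or raise the second; the pooled-PCA projection serves only as a baseline for comparison.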
    URI
    http://hdl.handle.net/1853/63581
    Collections
    • College of Computing Theses and Dissertations [1191]
    • Georgia Tech Theses and Dissertations [23877]
