
    Geometric Methods for Mining Large and Possibly Private Datasets

    chen_keke_200608_phd.pdf (2.305Mb)
    Date
    2006-07-07
    Author
    Chen, Keke
    Abstract
    With the wide deployment of data-intensive Internet applications and continued advances in sensing technology and biotechnology, large multidimensional datasets, possibly containing privacy-sensitive information, have been emerging. Mining such datasets has become increasingly common in business integration, large-scale scientific data analysis, and national security. The proposed research aims to explore the geometric properties of the multidimensional datasets used in statistical learning and data mining, and to provide novel techniques and frameworks for mining very large datasets while protecting the desired data privacy. The first main contribution of this research is the development of the iVIBRATE interactive visualization-based approach for clustering very large datasets. The iVIBRATE framework uniquely addresses the challenges in handling irregularly shaped clusters, domain-specific cluster definition, and cluster-labeling of the data on disk. It consists of the VISTA visual cluster rendering subsystem and the Adaptive ClusterMap Labeling subsystem. The second main contribution is the development of the "Best K Plot" (BKPlot) method for determining the critical clustering structures in multidimensional categorical data. The BKPlot method uniquely addresses two challenges in clustering categorical data: how to determine the number of clusters (the best K) and how to identify the existence of significant clustering structures. The method consists of the basic theory, the sample BKPlot theory for large datasets, and the testing method for identifying no-cluster datasets. The third main contribution of this research is the development of the theory of geometric data perturbation and its application in privacy-preserving data classification involving a single party or multiparty collaboration.
The key to geometric data perturbation is to find a good randomly generated rotation matrix and an appropriate noise component that together provide a satisfactory balance between the privacy guarantee and data quality, considering possible inference attacks. When geometric perturbation is applied to collaborative multiparty data classification, it is challenging to unify the different geometric perturbations used by different parties. We study three protocols under a data-mining-service-oriented framework for unifying the perturbations: 1) the threshold-satisfied voting protocol, 2) the space adaptation protocol, and 3) the space adaptation protocol with a trusted party. We study the tradeoffs among the privacy guarantee, the model accuracy, and the cost for these protocols.
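
The rotation-plus-noise idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the Gram-Schmidt construction, the Gaussian noise model, and the `sigma` parameter are all assumptions made for illustration. The matrix produced here is a random orthogonal matrix (a rotation, possibly combined with a reflection), and the sketch makes no attempt to select a "good" matrix with respect to inference attacks.

```python
import math
import random


def random_orthogonal_matrix(d, rng):
    """Build a d x d orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def perturb(records, matrix, sigma, rng):
    """Return matrix * x + Gaussian noise for every record x (hypothetical helper)."""
    out = []
    for x in records:
        rotated = [sum(m * a for m, a in zip(row, x)) for row in matrix]
        out.append([y + rng.gauss(0.0, sigma) for y in rotated])
    return out


def distance(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)
data = [[1.0, 2.0, 3.0], [4.0, 0.0, -1.0], [0.5, 0.5, 2.0]]
R = random_orthogonal_matrix(3, rng)

exact = perturb(data, R, sigma=0.0, rng=rng)  # orthogonal transform only
noisy = perturb(data, R, sigma=0.1, rng=rng)  # transform plus additive noise

# Distances before and after the noise-free transform should agree
# up to floating-point error.
print(distance(data[0], data[1]), distance(exact[0], exact[1]))
```

With `sigma=0.0` the transform preserves pairwise Euclidean distances, which is why distance-based classifiers can still be trained on the perturbed records. Increasing `sigma` distorts those distances, trading data quality for additional privacy protection.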
    URI
    http://hdl.handle.net/1853/11561
    Collections
    • College of Computing Theses and Dissertations [1071]
    • Georgia Tech Theses and Dissertations [22401]
