Data Privacy in the Modern Machine Learning Ecosystem
The explosion of data collection and advances in artificial intelligence and machine learning have motivated a robust economy around cloud-based machine learning services. While such services provide opportunities for a broad array of individuals and companies to leverage the power of modern machine learning, they also introduce new vulnerabilities and privacy risks, such as membership inference attacks, attribute inference attacks, and data poisoning attacks. Such attacks can allow for the malicious manipulation of model outcomes and can cause serious violations of the data privacy of the individuals who have contributed to the model learning process.

Federated learning (FL) is a decentralized, collaborative machine learning paradigm developed in response to such privacy risks. In FL, machine learning models are trained via multiple rounds of communication over a distributed computing platform. FL participants share only their local model updates, allowing each participant's private training data to remain local. While FL systems protect raw data from explicit disclosure, they remain vulnerable to inference-based privacy risks, such as membership and attribute inference attacks, as well as to attacks that poison the private training data.

This dissertation research is dedicated to making original contributions towards addressing the growing public concern and legislative action surrounding data privacy in the modern machine learning ecosystem. It first takes a holistic approach to create a structured and comprehensive analysis of privacy risks in machine learning, including a characterization of privacy vulnerabilities in both centralized and decentralized settings, an in-depth study of inference-based privacy attacks, specifically membership inference, against machine learning models, and a framework for evaluating membership inference risks in machine learning.
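The core FL exchange described above can be illustrated with a minimal sketch. This is not the dissertation's system; it is a toy FedAvg-style round in which each client performs a local gradient step on a one-dimensional least-squares model and the server averages only the resulting weights, so raw data never leaves a client. All names (`local_update`, `federated_round`) and the toy objective are illustrative assumptions.

```python
def local_update(weights, data, lr=0.1):
    """One round of local training: a single gradient step on a toy
    1-D least-squares objective. The raw (x, y) pairs stay on the client."""
    grad = [0.0, 0.0]
    for x, y in data:
        err = weights[0] * x + weights[1] - y
        grad[0] += 2 * err * x / len(data)
        grad[1] += 2 * err / len(data)
    return [w - lr * g for w, g in zip(weights, grad)]

def federated_round(global_weights, client_datasets):
    """Server-side aggregation: collect only the locally updated weight
    vectors and average them coordinate-wise (FedAvg-style)."""
    updates = [local_update(list(global_weights), d) for d in client_datasets]
    return [sum(ws) / len(updates) for ws in zip(*updates)]
```

In a real deployment the update would be a full model's parameters and the average would typically be weighted by each client's dataset size.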
The second contribution is the development of privacy-preserving machine learning solutions. This includes an analysis of privacy-preserving techniques in machine learning as well as protocols for the private training and evaluation of machine learning models under formal privacy frameworks, including differential privacy and secure multiparty computation. The next contribution is an analysis of the challenges and system considerations involved in extending privacy protection to the growing domain of federated learning. The final contribution is a proposed architecture, TSC-PFed, for trust- and security-enhanced customizable private federated learning. To this end, we propose a privacy-enhanced federated learning system which incorporates both differential privacy and secure multiparty computation (SMC) to privately train accurate predictive models. The TSC-PFed system includes support for modeling trust dynamics within a federated learning system, which allows FL participants to decrease the degree of noise injected locally by a customizable trust factor $t$ while still adhering to a global differential privacy guarantee. We additionally provide support for security enhancements as well as customizable settings which allow participants to tune the type and level of privacy provided by TSC-PFed.
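The interplay between the trust factor and local noise can be sketched as follows. This is a hypothetical illustration, not TSC-PFed's actual mechanism: it assumes a Gaussian mechanism in which each of $t$ mutually trusted participants injects noise with standard deviation reduced by a factor of sqrt(t), so that the t independent noise shares, summed inside the SMC aggregation, still carry the full variance required for the global differential privacy guarantee. The function names and the specific noise-scale formula are assumptions for this sketch.

```python
import math
import random

def gaussian_sigma(epsilon, delta, sensitivity):
    """Classic Gaussian-mechanism noise scale for (epsilon, delta)-DP."""
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

def trusted_local_noise(update, epsilon, delta, sensitivity, t):
    """Hypothetical trust-scaled noising: with t trusted participants,
    each adds N(0, (sigma/sqrt(t))^2) noise per coordinate; the t
    independent shares sum (inside SMC) to the full sigma^2 variance."""
    sigma = gaussian_sigma(epsilon, delta, sensitivity) / math.sqrt(t)
    return [u + random.gauss(0.0, sigma) for u in update]
```

Note the trade-off this expresses: a larger trust factor $t$ means each participant reveals a less-noisy local update to the secure aggregator, trading local noise for trust in the other participants, while the aggregate still satisfies the stated global guarantee.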