Data analytics for networked and possibly private sources
This thesis focuses on two grand challenges facing designers and operators of data analytics systems today. First, how can information from multiple autonomous yet correlated sources be fused to provide consistent views of the underlying phenomena? Second, how can externally imposed constraints, privacy concerns in particular, be respected without compromising the efficacy of analysis? To address the first challenge, we apply a general correlation network model to capture the relationships among data sources, and propose Network-Aware Analysis (NAA), a library of novel inference models that capture (i) how the correlation of the underlying sources is reflected in the spatial and/or temporal relevance of the collected data, and (ii) how to track causality in the data that arises from dependencies among the data sources. We have also developed a set of space- and time-efficient algorithms to address (i) how to correlate relevant data and (ii) how to forecast future data. To address the second challenge, we extend the concept of the correlation network to encode the semantic (possibly virtual) dependencies and constraints among the entities in question (e.g., medical records). We show through a set of concrete cases that correlation networks convey significant utility for the intended applications, yet are also often used by adversaries as a stepping stone for inference attacks. Using correlation networks as the pivot for analyzing privacy-utility trade-offs, we propose Privacy-Aware Analysis (PAA), a general design paradigm for constructing analytical solutions with theoretical backing for both privacy and utility.
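As a rough illustration of what a correlation network over data sources might look like, the sketch below assumes each source emits an equal-length numeric time series and links two sources when the absolute Pearson correlation of their readings meets a threshold. Pearson correlation, the threshold value, and the names `pearson` and `correlation_network` are illustrative assumptions, not the thesis's NAA models; the model described above is more general and also covers temporal relevance, causality, and semantic dependencies, none of which this sketch captures.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(vx * vy)


def correlation_network(series, threshold=0.8):
    """Hypothetical helper: build an undirected weighted graph as an adjacency
    dict. Nodes are source names; an edge links two sources whose absolute
    correlation is at least `threshold`, weighted by the signed coefficient."""
    graph = {name: {} for name in series}
    for u, v in combinations(series, 2):
        r = pearson(series[u], series[v])
        if abs(r) >= threshold:
            graph[u][v] = r
            graph[v][u] = r
    return graph


# Usage: three hypothetical sensors; s2 is a scaled copy of s1.
readings = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [2, 4, 6, 8, 10],
    "s3": [5, 3, 4, 1, 2],
}
network = correlation_network(readings, threshold=0.9)
print(network)
```

Here `s1` and `s2` are perfectly correlated and become neighbors, while `s3` (correlation -0.8 with both) stays isolated at this threshold. The same adjacency structure also hints at the privacy concern raised above: an adversary who knows that two sources are strongly linked can infer one from the other.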