Fast Algorithms for Querying and Mining Large Graphs
Graphs appear in a wide range of settings and pose a wealth of fascinating problems. In this talk, I will present our recent work on (1) querying large graphs (e.g., given a social network, how do we measure the closeness between two people, and how do we track it over time?) and (2) mining large graphs (e.g., how do we identify abnormal behavior in computer networks, and in the case of a virus attack, which nodes are the best to immunize?). For the task of querying, our main finding is that many complex user-specific patterns on large graphs can be answered by means of proximity measurement. In other words, proximity allows us to query large graphs at the atomic level. I will then talk about how to adapt querying tasks to time-evolving graphs. For fast computation of proximity, we developed a family of fast solutions that cover several different scenarios. By carefully leveraging important properties shared by many real graphs (e.g., block-wise structure, linear correlation, and the skewness of real bipartite graphs), we can often achieve orders-of-magnitude speedups with little or no quality loss. For the task of mining, I will talk about immunization and anomaly detection. For immunization, we proposed a near-optimal, fast, and scalable algorithm. For anomaly detection, we proposed a family of example-based low-rank matrix approximation methods. The proposed algorithms are provably equal to or better than the best known methods in both space and time, with the same accuracy. On real data sets, they are up to 112x faster than the best competitors at the same accuracy.
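
To make the notion of proximity concrete, the following is a minimal sketch of one common proximity measure, random walk with restart, computed by plain power iteration. The abstract does not say which measure the talk uses, so the choice of measure, the function name `rwr_proximity`, the adjacency-list input, and the restart probability of 0.15 are all assumptions made for illustration. It is the naive baseline, not one of the fast solutions described above.

```python
def rwr_proximity(adj, source, restart=0.15, iters=200, tol=1e-12):
    """Random walk with restart from `source` (illustrative sketch).

    adj: dict mapping each node to a list of its neighbors
         (hypothetical input format; undirected edges appear in both lists).
    Returns a dict mapping each node to its proximity score with
    respect to `source`; the scores sum to 1.
    """
    nodes = list(adj)
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0

    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            walk_mass = (1.0 - restart) * scores[u]
            neighbors = adj[u]
            if not neighbors:
                # A node with no neighbors sends its mass back to the source.
                nxt[source] += walk_mass
                continue
            share = walk_mass / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        # With probability `restart` the walker jumps back to the source.
        # The scores sum to 1, so the total restart mass is `restart`.
        nxt[source] += restart

        delta = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Toy social network: a path a - b - c - d.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
closeness = rwr_proximity(graph, "a")
```

In this toy network, the score of each person with respect to `a` decreases with distance along the path, which is the sense in which the scores measure closeness. Each iteration touches every edge once, so this baseline costs time proportional to the number of edges per iteration.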