Numerical and streaming analyses of centrality measures on graphs
MetadataShow full item record
Graph data represent information about entities (vertices) and the relationships or connections between them (edges). In real-world networks today, new data are constantly being produced, leading to the notion of dynamic graphs. When analyzing large graphs, a common problem of interest is to identify the most important vertices in a graph, which can be done using centrality metrics. This dissertation presents novel advances in the field of graph analysis by providing numerical and streaming techniques that help us better understand how to compute centrality measures. Several centrality measures are calculated by solving a linear system but since these linear systems are large, iterative solvers are often used as an alternate method to approximate the solution. We relate the two research areas of numerical accuracy and data mining by understanding how the error in a solver affects the relative ranking of vertices in a graph. To calculate the centrality values of vertices in a dynamic graph, the most naive method is to recompute the scores from scratch every time the graph is changed, but as the graph size grows larger this recomputation is computationally infeasible. We present four dynamic algorithms for updating different centrality metrics in evolving networks. All dynamic algorithms are faster than their static counterparts while maintaining good quality of the centrality scores. This dissertation concludes by applying methods discussed for the computation of centrality metrics to community detection, and we present a new algorithm for identifying local communities in a dynamic graph using personalized centrality.