Modeling performance of internet-based services using causal reasoning
Tariq, Muhammad Mukarram Bin
The performance of Internet-based services depends on many server-side, client-side, and network-related factors. Often, the interactions among these factors, or their effects on service performance, are not known or well understood. The complexity of these services makes it difficult to develop analytical models. The lack of such models impedes network management tasks, such as predicting performance while planning changes to service infrastructure, or diagnosing the causes of poor performance. We posit that statistical causal methods can be used to model the performance of Internet-based services and to facilitate performance-related network management tasks. Internet-based services are well suited for statistical learning because the inherent variability in many of the factors that affect performance allows us to collect comprehensive datasets that cover service performance under a wide variety of conditions. The conditional distributions estimated from these datasets represent the functions that govern service performance and the dependencies that are inherent in the service infrastructure. These functions and dependencies are accurate and can be used in lieu of analytical models to reason about system performance, such as predicting the performance of a service when some factors change, finding the causes of poor performance, or isolating the contribution of individual factors to observed performance. We present three systems, What-if Scenario Evaluator (WISE), How to Improve Performance (HIP), and Network Access Neutrality Observatory (NANO), that use statistical causal methods to facilitate network management tasks. WISE predicts performance for what-if configuration and deployment questions for content distribution networks. To do so, WISE learns the causal dependency structure among the latency-causing factors; when one or more factors are changed, WISE uses the dependency structure to estimate the effect on the other factors.
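The what-if idea described for WISE can be illustrated with a minimal sketch. This is not the WISE implementation: the three-factor dependency structure, the factor names (datacenter, rtt_ms, latency_ms), the toy records, and the use of per-group means as a stand-in for learned conditional distributions are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical causal dependency structure: factor -> parents.
# Keys are listed in topological order (parents before children).
PARENTS = {
    "datacenter": [],
    "rtt_ms": ["datacenter"],
    "latency_ms": ["rtt_ms"],
}

def learn_conditionals(records):
    """For each factor with parents, map every observed tuple of parent
    values to the mean value of the factor (a crude conditional estimate)."""
    tables = {}
    for child, parents in PARENTS.items():
        if not parents:
            continue
        groups = defaultdict(list)
        for r in records:
            groups[tuple(r[p] for p in parents)].append(r[child])
        tables[child] = {key: mean(vals) for key, vals in groups.items()}
    return tables

def what_if(record, change, tables):
    """Apply `change` to one record and re-estimate every downstream factor.
    Raises KeyError if the new parent values were never observed, because
    this sketch cannot extrapolate beyond the data."""
    out = dict(record)
    out.update(change)
    dirty = set(change)  # factors whose values have been altered so far
    for child, parents in PARENTS.items():
        if child in change:
            continue  # explicitly set by the scenario, do not overwrite
        if any(p in dirty for p in parents):
            out[child] = tables[child][tuple(out[p] for p in parents)]
            dirty.add(child)
    return out

# Toy observations (invented for illustration only).
records = [
    {"datacenter": "A", "rtt_ms": 100, "latency_ms": 400},
    {"datacenter": "A", "rtt_ms": 100, "latency_ms": 420},
    {"datacenter": "B", "rtt_ms": 20, "latency_ms": 100},
    {"datacenter": "B", "rtt_ms": 20, "latency_ms": 120},
]
tables = learn_conditionals(records)

# What if the first request had been served from datacenter B instead of A?
predicted = what_if(records[0], {"datacenter": "B"}, tables)
```

In this sketch, changing the datacenter first updates the round-trip time and then the latency that depends on it, which mirrors the propagation of a change through the dependency structure.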
HIP extends WISE and uses the causal dependency structure to invert the performance function, find causes of poor performance, and help answer questions about how to improve performance or achieve performance goals. NANO uses causal inference to quantify the impact of discrimination policies of ISPs on service performance. NANO is the only tool to date for detecting destination-based discrimination techniques that ISPs may use. We have evaluated these tools by applying them to large-scale Internet-based services and through experiments on the wide-area Internet. WISE is actively used at Google for predicting network-level and browser-level response time for Web search for new datacenter deployments. We have used HIP to find causes of high-latency Web search transactions at Google, and identified many cases where high-latency transactions can be significantly mitigated with simple infrastructure changes. We have evaluated NANO using experiments on the wide-area Internet and have also made the tool publicly available to recruit users and deploy NANO at a global scale.
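As a rough illustration of quantifying an ISP's impact with causal inference, the sketch below uses stratification on confounders, a standard technique. The abstract does not state which estimator NANO uses, so the method, the field names (isp, stratum, throughput), and the sample values are all assumptions.

```python
from collections import defaultdict
from statistics import mean

def stratified_effect(samples, isp):
    """Estimate the effect of `isp` on throughput by comparing it with all
    other ISPs within each confounder stratum, then averaging the per-stratum
    gaps weighted by the number of samples in the stratum."""
    treated = defaultdict(list)   # stratum -> throughputs seen through `isp`
    baseline = defaultdict(list)  # stratum -> throughputs seen through others
    for s in samples:
        bucket = treated if s["isp"] == isp else baseline
        bucket[s["stratum"]].append(s["throughput"])

    weighted_sum, total_weight = 0.0, 0
    for stratum, values in treated.items():
        if stratum not in baseline:
            continue  # no comparable baseline, so this stratum is skipped
        gap = mean(values) - mean(baseline[stratum])
        weight = len(values) + len(baseline[stratum])
        weighted_sum += gap * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no stratum contains both the ISP and a baseline")
    return weighted_sum / total_weight

# Toy measurements (invented): throughput in Mbit/s to one service,
# stratified by a single hypothetical confounder, the access link type.
samples = [
    {"isp": "X", "stratum": "wifi", "throughput": 4.0},
    {"isp": "X", "stratum": "wifi", "throughput": 6.0},
    {"isp": "Y", "stratum": "wifi", "throughput": 9.0},
    {"isp": "Y", "stratum": "wifi", "throughput": 11.0},
    {"isp": "X", "stratum": "wired", "throughput": 15.0},
    {"isp": "Y", "stratum": "wired", "throughput": 20.0},
]
effect = stratified_effect(samples, "X")
```

Comparing only within strata keeps a difference in, for example, link type from being mistaken for an effect of the ISP itself; a negative result suggests lower throughput through the ISP under comparable conditions.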