Internet-Scale Information Monitoring: A Continual Query Approach

Show full item record

Please use this identifier to cite or link to this item:

Title: Internet-Scale Information Monitoring: A Continual Query Approach
Author: Tang, Wei
Abstract: Information monitoring systems are publish-subscribe systems that continuously track information changes and notify users (or programs acting on behalf of humans) of relevant updates according to specified thresholds. Internet-scale information monitoring presents a number of new challenges. First, automated change detection is harder when sources are autonomous and updates are performed asynchronously. Second, information source heterogeneity makes the problem of modelling and representing changes harder than ever. Third, efficient and scalable mechanisms are needed to handle a large and growing number of users and thousands or even millions of monitoring triggers fired at multiple sources. In this dissertation, we model users' monitoring requests using continual queries (CQs) and present a suite of efficient and scalable solutions to large scale information monitoring over structured or semi-structured data sources. A CQ is a standing query that monitors information sources for interesting events (triggers) and notifies users when new information changes meet specified thresholds. In this dissertation, we first present the system level facilities for building an Internet-scale continual query system, including the design and development of two operational CQ monitoring systems OpenCQ and WebCQ, the engineering issues involved, and our solutions. We then describe a number of research challenges that are specific to large-scale information monitoring and the techniques developed in the context of OpenCQ and WebCQ to address these challenges. Example issues include how to efficiently process large number of continual queries, what mechanisms are effective for building a scalable distributed trigger system that is capable of handling tens of thousands of triggers firing at hundreds of data sources, how to effectively disseminate fresh information to the right users at the right time. We have developed a suite of techniques to optimize the processing of continual queries, including an effective CQ grouping scheme, an auxiliary data structure to support group-based indexing of CQs, and a differential CQ evaluation algorithm (DRA). The third contribution is the design of an experimental evaluation model and testbed to validate the solutions. We have engaged our evaluation using both measurements on real systems (OpenCQ/WebCQ) and simulation-based approach. To our knowledge, the research documented in this dissertation is to date the first one to present a focused study of research and engineering issues in building large-scale information monitoring systems using continual queries.
Type: Dissertation
Date: 2003-12-08
Publisher: Georgia Institute of Technology
Subject: Differential re-evaluation
Continual queries
Web page monitoring
Semi-structured data
Information monitoring
Web sites Management
Information technology
Internet programming
Department: Computing
Advisor: Committee Chair: Ling Liu; Committee Member: Calton Pu; Committee Member: Constantinos Dovrolis; Committee Member: Edward Omiecinski; Committee Member: Leo Mark; Committee Member: Thomas E. Potok
Degree: Ph.D.

All materials in SMARTech are protected under U.S. Copyright Law and all rights are reserved, unless otherwise specifically indicated on or in the materials.

Files in this item

Files Size Format View
tang_wei_200312.pdf 3.051Mb PDF View/ Open

This item appears in the following Collection(s)

Show full item record