Show simple item record

dc.contributor.author Tang, Wei en_US
dc.date.accessioned 2005-03-04T16:41:32Z
dc.date.available 2005-03-04T16:41:32Z
dc.date.issued 2003-12-08 en_US
dc.identifier.uri http://hdl.handle.net/1853/5281
dc.description.abstract Information monitoring systems are publish-subscribe systems that continuously track information changes and notify users (or programs acting on behalf of humans) of relevant updates according to specified thresholds. Internet-scale information monitoring presents a number of new challenges. First, automated change detection is harder when sources are autonomous and updates are performed asynchronously. Second, information source heterogeneity makes the problem of modelling and representing changes harder than ever. Third, efficient and scalable mechanisms are needed to handle a large and growing number of users and thousands or even millions of monitoring triggers fired at multiple sources. In this dissertation, we model users' monitoring requests using continual queries (CQs) and present a suite of efficient and scalable solutions to large-scale information monitoring over structured or semi-structured data sources. A CQ is a standing query that monitors information sources for interesting events (triggers) and notifies users when new information changes meet specified thresholds. We first present the system-level facilities for building an Internet-scale continual query system, including the design and development of two operational CQ monitoring systems, OpenCQ and WebCQ, the engineering issues involved, and our solutions. We then describe a number of research challenges that are specific to large-scale information monitoring and the techniques developed in the context of OpenCQ and WebCQ to address these challenges. Example issues include how to efficiently process a large number of continual queries, what mechanisms are effective for building a scalable distributed trigger system that is capable of handling tens of thousands of triggers firing at hundreds of data sources, and how to effectively disseminate fresh information to the right users at the right time.
We have developed a suite of techniques to optimize the processing of continual queries, including an effective CQ grouping scheme, an auxiliary data structure to support group-based indexing of CQs, and a differential CQ evaluation algorithm (DRA). The third contribution is the design of an experimental evaluation model and testbed to validate the solutions. We have conducted our evaluation using both measurements on real systems (OpenCQ/WebCQ) and a simulation-based approach. To our knowledge, the research documented in this dissertation is, to date, the first to present a focused study of research and engineering issues in building large-scale information monitoring systems using continual queries. en_US
dc.format.extent 3199538 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.publisher Georgia Institute of Technology en_US
dc.subject Differential re-evaluation en_US
dc.subject Continual queries
dc.subject Web page monitoring
dc.subject Semi-structured data
dc.subject Information monitoring
dc.subject.lcsh Web sites Management en_US
dc.subject.lcsh Internet programming en_US
dc.subject.lcsh Information technology en_US
dc.title Internet-Scale Information Monitoring: A Continual Query Approach en_US
dc.type Dissertation en_US
dc.description.degree Ph.D. en_US
dc.contributor.department Computing en_US
dc.description.advisor Committee Chair: Ling Liu; Committee Member: Calton Pu; Committee Member: Constantinos Dovrolis; Committee Member: Edward Omiecinski; Committee Member: Leo Mark; Committee Member: Thomas E. Potok en_US


