Discovering and Tracking Interesting Web Services

Please use this identifier to cite or link to this item: http://hdl.handle.net/1853/4889

Title: Discovering and Tracking Interesting Web Services
Author: Rocco, Daniel J. (Daniel John)
Abstract: The World Wide Web has become the standard mechanism for information distribution and scientific collaboration on the Internet. This dissertation research explores a suite of techniques for discovering relevant dynamic sources in a specific domain of interest and for managing Web data effectively. We first explore techniques for discovery and automatic classification of dynamic Web sources. Our approach utilizes a service class model of the dynamic Web that allows the characteristics of interesting services to be specified using a service class description. To promote effective Web data management, the Page Digest Web document encoding eliminates tag redundancy and places structure, content, tags, and attributes into separate containers, each of which can be referenced in isolation or in conjunction with the other elements of the document. The Page Digest Sentinel system leverages our unique encoding to provide efficient and scalable change monitoring for arbitrary Web documents through document compartmentalization and semantic change request grouping. Finally, we present XPack, an XML document compression system that uses a containerized view of an XML document to provide both good compression and efficient querying over compressed documents. XPack's queryable XML compression format is general-purpose, does not rely on domain knowledge or particular document structural characteristics for compression, and achieves better query performance than standard query processors using text-based XML.
Our research expands the capabilities of existing dynamic Web techniques, providing superior service discovery and classification services, efficient change monitoring of Web information, and compartmentalized document handling. DynaBot is the first system to combine a service class view of the Web with a modular crawling architecture to provide automated service discovery and classification. The Page Digest Web document encoding represents Web documents efficiently by separating the individual characteristics of the document. The Page Digest Sentinel change monitoring system utilizes the Page Digest document encoding for scalable change monitoring through efficient change algorithms and intelligent request grouping. Finally, XPack is the first XML compression system that delivers compression rates similar to existing techniques while supporting better query performance than standard query processors using text-based XML.
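The container idea in the abstract (store each tag name once, and keep structure, attributes, and content apart so each can be read on its own) can be illustrated with a small sketch. This is not the Page Digest format from the dissertation; the function name `containerize`, the dictionary layout, and the sample document are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET


def containerize(xml_text):
    """Split a document into tag, structure, attribute, and content containers.

    Hypothetical layout for illustration; not the actual Page Digest encoding.
    """
    root = ET.fromstring(xml_text)
    tags = []        # each distinct tag name is stored once
    tag_index = {}   # tag name -> position in `tags`
    structure = []   # pre-order list of (tag id, number of children)
    attributes = {}  # node position in `structure` -> attribute dict
    content = {}     # node position in `structure` -> leading text

    def visit(node):
        position = len(structure)
        if node.tag not in tag_index:
            tag_index[node.tag] = len(tags)
            tags.append(node.tag)
        structure.append((tag_index[node.tag], len(node)))
        if node.attrib:
            attributes[position] = dict(node.attrib)
        # Only the text before the first child is kept; tail text is ignored
        # to keep the sketch short.
        text = (node.text or "").strip()
        if text:
            content[position] = text
        for child in node:
            visit(child)

    visit(root)
    return {
        "tags": tags,
        "structure": structure,
        "attributes": attributes,
        "content": content,
    }


doc = "<ul><li class='a'>one</li><li>two</li></ul>"
digest = containerize(doc)
print(digest["tags"])       # tag names, each listed once
print(digest["structure"])  # tree shape without any text or attributes
```

With the containers separated this way, a change monitor could compare only the `structure` list to detect layout changes, or only the `content` map to detect text changes, without re-parsing the whole document.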
Type: Dissertation
URI: http://hdl.handle.net/1853/4889
Date: 2004-12-01
Publisher: Georgia Institute of Technology
Subject: XML compression
Source discovery
Web services
Data management
Directed crawling
Document representation
XML (Document markup language)
Data compression (Computer science)
Internet searching
Query languages (Computer science)
Department: Computing
Advisor: Committee Chair: Ling Liu; Committee Member: Calton Pu; Committee Member: H. Venkateswaran; Committee Member: Sham Navathe; Committee Member: Terence Critchlow
Degree: Ph.D.

All materials in SMARTech are protected under U.S. Copyright Law and all rights are reserved, unless otherwise specifically indicated on or in the materials.

Files in this item

File: rocco_daniel_j_200412_phd.pdf
Size: 1.755Mb
Format: PDF
