Characterizing World Wide Web Ecologies
One of the fastest growing sources of information today is the World Wide Web (WWW), having grown from only fifty sources of information in January of 1993 to over a half million four years later. The exponential growth of information within the Web has created an overabundance of information and a poverty of human attention, with users citing the inability to navigate and find relevant information on the Web as one of the biggest problems facing the Web today. The primary goal of the research presented here is to put forth new techniques and models that can be used to help efficiently manage peoples attentional processes when dealing with large, unstructured, heterogeneous information environments. The primary model is based upon the desirability of items on the Web. This research searches for lawful patterns of structure, content, and use. Methods are developed to exploit these patterns to organize and optimize users' information foraging and sense-making activities. These enhancements rely on predicting, categorization and allocation of attention. Several methods are explored for inducing categorical structures for the WWW. Some of these enhancements involve clustering in a high-dimensional space of content, use, and structural features. Others derive from cocitation analysis methods used in the study of scientific communities. A user would also be aided by retrieval mechanisms that predicted and returned the most likely needed WWW pages, given that the user is attending to some given page(s). The approach of this research uses a spreading activation mechanism to predict the needed, relevant information, computed using past usage patterns, degree of shared content, and WWW hyperlink structure.