Detecting the Change of Clustering Structure in Categorical Data Streams
Clustering data streams can provide critical information for real-time decision making. We argue that detecting changes in the clustering structure of a data stream benefits many real-time monitoring applications. In this paper, we present a framework for detecting changes of clustering structure in categorical data streams. A change of clustering structure is signaled by a change in the best number of clusters in the data stream. The framework consists of two main components: the BkPlot method, which determines the best number of clusters in a categorical dataset, and a summarization structure, the Hierarchical Entropy Tree (HE-Tree), which efficiently captures the entropy property of categorical data streams. The HE-Tree lets us quickly and precisely extract from the data stream the clustering information that the BkPlot method needs to identify a change in the best number of clusters. By combining snapshots of the HE-Tree with the BkPlot method, we can observe changes of clustering structure online. Experiments show that the HE-Tree + BkPlot method detects changes of clustering structure in categorical data streams efficiently and precisely.
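As a rough illustration of the two ideas in the abstract, the sketch below computes an entropy measure for clusters of categorical records and flags the snapshots at which the best number of clusters changes. It is not the HE-Tree or the BkPlot method. The entropy definition (sum of per-attribute Shannon entropies, weighted by cluster size) is an assumption, the best-K values are assumed to come from an external estimator such as BkPlot, and the names `cluster_entropy`, `expected_entropy`, and `best_k_changes` are hypothetical.

```python
from collections import Counter
from math import log2


def cluster_entropy(records):
    """Sum of per-attribute Shannon entropies (in bits) for one cluster.

    Assumption: records are equal-length tuples of categorical values.
    """
    n = len(records)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*records):  # iterate attribute by attribute
        counts = Counter(column)
        total -= sum((c / n) * log2(c / n) for c in counts.values())
    return total


def expected_entropy(clusters):
    """Size-weighted average of cluster entropies over a partition."""
    n = sum(len(c) for c in clusters)
    if n == 0:
        return 0.0
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)


def best_k_changes(best_k_per_snapshot):
    """Return (snapshot_index, old_k, new_k) wherever the best K changes.

    Assumption: best_k_per_snapshot holds one best-K estimate per
    snapshot, produced by some external method (for example BkPlot).
    """
    changes = []
    for i in range(1, len(best_k_per_snapshot)):
        prev, cur = best_k_per_snapshot[i - 1], best_k_per_snapshot[i]
        if cur != prev:
            changes.append((i, prev, cur))
    return changes


if __name__ == "__main__":
    pure = [[("a", "x"), ("a", "x")], [("b", "y"), ("b", "y")]]
    print(expected_entropy(pure))
    print(best_k_changes([2, 2, 3, 3, 2]))
```

A partition whose clusters are internally homogeneous has low expected entropy, which is why an entropy summary of the stream can serve as input to a best-K estimator. The change detector itself only compares consecutive estimates.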