Visualization of textual content from social media and online communities
In this thesis, I explore design principles for interactive visualizations that facilitate the analysis of large collections of text documents from social media and online communities. I summarize the characteristics of such documents: their huge volume, short and informal expressions, high density of repeated language patterns, high noise-to-information ratio, and prevalence of conflicting opinions. These characteristics pose challenges for analysis beyond the inherent difficulties of processing natural language. I focus on two domains of text, consumer reviews and social media posts, and show that analytical tasks in both domains share three common steps: 1) gaining an overall impression of the dataset by learning its major topics, 2) finding interesting facets of the dataset that are worth exploring, and 3) reading the original documents to gain insights. I introduce two visualization systems that address these tasks in the two domains I study. OpinionBlocks presents a novel visualization interface for reading consumer reviews and enables crowd-correction of text analysis errors. SentenTree is a new visualization technique uniquely suited to social media text analysis: it provides the key benefits of both word-based (a variation of the word cloud) and sentence-based (as represented by Word Tree) visual metaphors while overcoming some of the limitations of each.