Understanding social media credibility
Today, social media provide the means by which billions of people experience news and events happening around the world. We hear about breaking news from people we “follow” on Twitter. We discuss unfolding news stories with our “friends” on Facebook. We read and respond to strangers sharing newsworthy information on Reddit. Simply put, individuals increasingly rely on social media to share news and information quickly, without depending on established official sources. While on one hand this empowers us with unparalleled access to information, on the other it presents a new challenge: ensuring that unfiltered information originating from unofficial sources is credible.

Indeed, there is a popular narrative that social media is rife with inaccurate information. But how much? Does information of dubious credibility have structure — temporal or linguistic? Are there systematic variations in such structures between highly credible and less credible information? This dissertation answers such questions. Despite many organized research efforts along these lines, the credibility of news and information on social media remains opaque. When you view your social media feed, you have no sense of which parts are reliable and which are not. In other words, we do not understand the basic properties separating credible and non-credible content in our social feeds.

This dissertation addresses that gap by building large-scale, generalizable science around credibility in social media. Specifically, it makes the following contributions. First, it offers an iterative framework for systematically tracking the credibility of social media information. My framework efficiently combines machine and human computation to track both lesser-known and widespread instances of newsworthy content in real time, followed by crowdsourced credibility assessments.
Next, by running the framework for several months on Twitter, a popular social networking site, I present a corpus (CREDBANK) of newsworthy topics, their associated tweets, and corresponding credibility scores. Combining the massive CREDBANK dataset with linguistic scholarship, I show that a parsimonious language model can predict the credibility of newsworthy topics with an accuracy of 68%, compared to a random baseline of 25%. A deeper look at the most predictive phrases revealed that certain classes of words, such as hedges, were associated with lower credibility, while affirmative booster words were indicative of higher credibility. Next, by investigating differences in temporal dynamics through the lens of collective attention, I demonstrate that recurring attentional bursts are correlated with less credible events. These results provide a basis for addressing the online misinformation ecosystem. They also open avenues for future research in designing interventions aimed at controlling the spread of false information or cautioning social media users to be skeptical about an evolving topic’s veracity, ultimately raising an individual’s capacity to assess the credibility of content shared on social media.
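To make the hedge/booster finding concrete, the following is a minimal illustrative sketch (not the dissertation’s actual model or lexicon): it scores a tweet-like text by counting booster words versus hedge words, the two word classes the results above associate with higher and lower credibility respectively. The word lists here are hypothetical examples chosen for illustration.

```python
# Hypothetical example word lists; CREDBANK's actual linguistic features
# are far richer than this sketch.
HEDGES = {"might", "maybe", "possibly", "allegedly", "reportedly", "unconfirmed"}
BOOSTERS = {"definitely", "certainly", "confirmed", "clearly", "undoubtedly"}

def credibility_cue_score(text: str) -> int:
    """Return (# booster words - # hedge words) in the text.

    A higher score indicates more affirmative language, which the
    dissertation's results associate with higher perceived credibility.
    """
    tokens = text.lower().split()
    boosters = sum(t in BOOSTERS for t in tokens)
    hedges = sum(t in HEDGES for t in tokens)
    return boosters - hedges

print(credibility_cue_score("officials confirmed the report"))  # 1
print(credibility_cue_score("this is allegedly maybe a hoax"))  # -2
```

In practice, such counts would serve only as two features among many in a trained classifier, rather than as a standalone score.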