Understanding and circumventing censorship on Chinese social media
Chinese Internet users not only face the most technologically advanced filtering system in the world, the Great Firewall of China, but also live under the watchful eyes of a repressive government that controls every layer of their communications. Although social networking sites such as Facebook and Twitter are blocked in China, Chinese Internet users have local replicas such as WeChat and Sina Weibo to communicate with others. However, these sites employ both advanced keyword detection algorithms and human censors to filter any content deemed inappropriate. While previous research has explored the technology behind the censorship mechanisms, little work has focused on the effects of censorship on online and offline behaviors. In this thesis, I bridge this gap by conducting a mixed-method study to gain a deeper understanding of these effects. The results of the study show that censorship has strong off-platform effects, which are not detectable from usage logs. Users deliberately self-censor their speech out of caution, because they do not have a clear understanding of what content is being censored and what risks are associated with censorship on Chinese social media. Although on-platform effects of censorship are present in social media usage logs, they wear off over time. Informed by these results, I attempt to provide social media users with a better understanding of how the censorship mechanism works and with an effective censorship circumvention technique, both of which should lead to greater freedom of expression among social media users. Digital activists have long employed homophones of censored keywords to avoid detection by keyword matching algorithms on Chinese social media. One part of this thesis demonstrates that it is possible to scale this technique up in ways that are costly and difficult to defend against, because human censors must manually read through all social media posts.
Specifically, I developed a non-deterministic algorithm for generating homophones that creates large numbers of false positives for censors. In experiments, the algorithm allows homophone-transformed posts to remain on Sina Weibo three times longer than their previously censored counterparts without confusing native Chinese speakers. Building on this work, I employed the algorithm in the development of CENSE, a real-time system that Chinese social media users can use to easily detect censored keywords and replace them with homophones. The results of a formative interview study indicate that Chinese social media users welcome the concept of a censorship circumvention tool. Overall, the contributions of my research bridge the areas of Internet censorship and censorship circumvention technologies. The mixed-method study provides a better understanding of how censorship affects social media users. Additionally, the homophone transformation algorithm and CENSE, a real-time censorship circumvention tool, help users experience greater freedom of expression on Chinese social media.
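
The idea of non-deterministic homophone substitution can be illustrated with a minimal sketch. This is not the algorithm from the thesis, whose selection strategy is not described in this abstract. It assumes a small hand-picked table (HOMOPHONES, hypothetical) that maps each character to characters with the same pronunciation, and it picks uniformly at random among them. The function names transform_keyword and transform_post are likewise invented for illustration.

```python
import random

# Hypothetical, hand-picked table for illustration only: each character maps
# to other characters that share its pinyin and tone.
HOMOPHONES = {
    "政": ["正", "证", "郑"],  # all pronounced zheng (4th tone)
    "府": ["腐", "辅", "斧"],  # all pronounced fu (3rd tone)
}


def transform_keyword(keyword, rng):
    """Replace each character with a randomly chosen homophone, if one is known."""
    return "".join(rng.choice(HOMOPHONES.get(ch, [ch])) for ch in keyword)


def transform_post(post, censored_keywords, rng=None):
    """Rewrite every occurrence of each censored keyword in the post.

    Each occurrence is transformed independently, so the same keyword can
    yield different spellings within one post and across posts.
    """
    if rng is None:
        rng = random.Random()
    for keyword in censored_keywords:
        parts = post.split(keyword)
        rebuilt = parts[0]
        for part in parts[1:]:
            rebuilt += transform_keyword(keyword, rng) + part
        post = rebuilt
    return post


if __name__ == "__main__":
    # Assumed example: treat "政府" (government) as a censored keyword.
    print(transform_post("政府发布公告", ["政府"]))
```

Because the replacement is drawn at random on each call, a fixed blocklist of known variants would have to cover every combination in the table, which is the property that makes the technique costly to defend against with keyword matching alone.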