    •   SMARTech Home
    • Georgia Tech Interdisciplinary Research Centers (IRCs)
    • Machine Learning (ML@GT)
    • Machine Learning@Georgia Tech Seminars

    The Data-Driven Analysis of Literature

    View/Open
    bamman.mp4 (491.2Mb)
    bamman_videostream.html (1.323Kb)
    transcript.txt (59.77Kb)
    thumbnail.jpg (73.72Kb)
    Date
    2019-11-15
    Author
    Bamman, David
    Abstract
    Literary novels push the limits of natural language processing. While much work in NLP has been heavily optimized toward the narrow domains of news and Wikipedia, literary novels are an entirely different animal--the long, complex sentences in novels strain the limits of syntactic parsers with super-linear computational complexity, their use of figurative language challenges representations of meaning based on neo-Davidsonian semantics, and their long length (ca. 100,000 words on average) rules out existing solutions for problems like coreference resolution that expect a small set of candidate antecedents. At the same time, fiction drives computational research questions that are uniquely interesting to that domain. In this talk, I'll outline some of the opportunities that NLP presents for research in the quantitative analysis of culture--including measuring the disparity in attention given to characters as a function of their gender over two hundred years of literary history (Underwood et al. 2018)--and describe our progress to date on two problems essential to a more complex representation of plot: recognizing the entities in literary texts, such as the characters, locations, and spaces of interest (Bamman et al. 2019) and identifying the events that are depicted as having transpired (Sims et al. 2019). Both efforts involve the creation of a new dataset of 200,000 words evenly drawn from 100 different English-language literary texts and building computational models to automatically identify each phenomenon. This is joint work with Matt Sims, Ted Underwood, Sabrina Lee, Jerry Park, Sejal Popat and Sheng Shen.
    URI
    http://hdl.handle.net/1853/62069
    Collections
    • Machine Learning@Georgia Tech Seminars [52]
