Using rationales and influential training examples to (attempt to) explain neural predictions in NLP
Abstract
Modern deep learning models for natural language processing (NLP) achieve state-of-the-art predictive performance but are notoriously opaque. I will discuss recent work that aims to address this limitation. I will focus specifically on approaches to: (i) providing snippets of text (sometimes called "rationales") that support predictions; and (ii) identifying examples from the training data that influenced a given model output.
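To make the first family of approaches concrete, below is a minimal illustrative sketch (not the speaker's method) of one crude way to surface "rationale"-like token highlights: gradient-based saliency over input embeddings. The toy vocabulary, the bag-of-embeddings classifier, and all variable names are assumptions introduced purely for illustration.

```python
# Illustrative sketch only: gradient-based token saliency as a rough proxy
# for extracting "rationales" from a toy classifier. The model and data are
# made up for demonstration; this is not the approach described in the talk.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "terrible": 4}
embed = nn.Embedding(len(vocab), 8)
classifier = nn.Linear(8, 2)  # two classes: negative / positive

tokens = ["the", "movie", "was", "great"]
ids = torch.tensor([vocab[t] for t in tokens])

vecs = embed(ids)                      # (seq_len, dim) token embeddings
vecs.retain_grad()                     # keep gradients on this non-leaf tensor
logits = classifier(vecs.mean(dim=0))  # bag-of-embeddings "model"
logits[1].backward()                   # gradient of the "positive" logit

# Saliency score per token: L2 norm of the gradient w.r.t. its embedding.
# Tokens with larger scores are highlighted as a (crude) rationale.
saliency = vecs.grad.norm(dim=1)
for tok, score in zip(tokens, saliency.tolist()):
    print(f"{tok:10s} {score:.4f}")
```

The second family of approaches instead scores training examples by how much they influenced a particular test prediction (e.g., via influence functions or gradient-similarity methods), rather than highlighting spans of the test input.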