Character Eyes: Seeing Language through Character-Level Taggers
Character-level models have been used extensively in recent years in NLP tasks as both supplements and replacements for closed-vocabulary token-level word representations. In one popular architecture, character-level RNNs, typically LSTMs, form a bottom tier creating a word representation for a sequence tagger used to predict token-level annotations such as part-of-speech (POS) tags. In this work, we examine the behavior of POS taggers from the perspective of individual hidden units within the character-level LSTM. Analysis of activation patterns on a macro scale allows us to identify the ways in which the burden of POS detection is spread across the hidden layer in different languages, as a function of their morphological properties. Using ablation tests, we show how different allocations of forward and backward units affect model arrangement and performance in different categories of languages. We use these results to offer heuristics for hyperparameter selection that are based on known linguistic traits.
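The two-tier architecture the abstract describes, character-level RNNs whose final forward and backward hidden states are concatenated into a word representation for a tagger, can be sketched in pure Python. This is a minimal toy illustration with fixed weights and a scalar character encoding; the function and parameter names (`word_representation`, `n_fwd`, `n_bwd`) are ours for exposition, not the paper's, and a real model would use learned LSTM cells rather than this simple tanh recurrence.

```python
import math

def rnn_step(x, h, w_in, w_rec):
    """One step of a toy scalar-input RNN: h' = tanh(w_in * x + w_rec * h)."""
    return [math.tanh(w_in[i] * x + w_rec[i] * h[i]) for i in range(len(h))]

def word_representation(word, n_fwd=3, n_bwd=3):
    """Run a forward and a backward character RNN over `word` and
    concatenate their final hidden states into one word vector,
    mimicking the bottom tier of a char-BiLSTM tagger."""
    # Toy fixed weights; a trained model would learn these.
    wf_in  = [0.5 + 0.1 * i for i in range(n_fwd)]
    wf_rec = [0.3] * n_fwd
    wb_in  = [0.4 + 0.1 * i for i in range(n_bwd)]
    wb_rec = [0.2] * n_bwd

    def encode(c):
        # Crude character feature: normalised code point.
        return (ord(c) - ord('a')) / 26.0

    h_f = [0.0] * n_fwd
    for c in word:                 # forward units read left to right
        h_f = rnn_step(encode(c), h_f, wf_in, wf_rec)

    h_b = [0.0] * n_bwd
    for c in reversed(word):       # backward units read right to left
        h_b = rnn_step(encode(c), h_b, wb_in, wb_rec)

    return h_f + h_b               # concatenated word representation

vec = word_representation("cats")
print(len(vec))                    # n_fwd + n_bwd hidden units
```

In this picture, the paper's unit-level analysis corresponds to inspecting individual entries of `h_f` and `h_b` across many words, and its ablation tests to varying the `n_fwd`/`n_bwd` split (or zeroing single units) and measuring the effect on tagging accuracy.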