Interpretation, grounding and imagination for machine intelligence
Vedantam, Shanmukha Ramak
Understanding how to model computer vision and natural language jointly is a long-standing challenge in artificial intelligence. In this thesis, I study how modeling vision and language using semantic and pragmatic considerations can help derive more human-like inferences from machine learning models. Specifically, I consider three related problems: interpretation, grounding and imagination. In interpretation, the goal is to get machine learning models to understand an image and describe its contents using natural language in a contextually relevant manner. In grounding, I study how to connect natural language to referents in the physical world, and examine whether this can help models learn common sense. Finally, in imagination, I study how to ‘imagine’ visual concepts completely and accurately across the full range of their visual attributes, including (potentially unseen) compositions of those attributes. This thesis analyzes these problems from computational as well as algorithmic perspectives and suggests directions for future work.