Now showing items 1-3 of 3
Building agents that can see, talk, and act
(Georgia Institute of Technology, 2020-04-25)
A long-term goal in AI is to build general-purpose intelligent agents that simultaneously possess the ability to perceive the rich visual environment around us (through vision, audition, or other sensors), reason and infer ...
Visually grounded language understanding and generation
(Georgia Institute of Technology, 2020-01-13)
The world around us involves multiple modalities -- we see objects, feel texture, hear sounds, smell odors and so on. In order for Artificial Intelligence (AI) to make progress in understanding the world around us, it needs ...
Visual question answering and beyond
(Georgia Institute of Technology, 2019-09-03)
In this dissertation, I propose and study a multi-modal Artificial Intelligence (AI) task called Visual Question Answering (VQA) -- given an image and a natural language question about the image (e.g., "What kind of store ...