Visually grounded language understanding and generation
(Georgia Institute of Technology, 2020-01-13)
The world around us involves multiple modalities -- we see objects, feel textures, hear sounds, smell odors, and so on. In order for Artificial Intelligence (AI) to make progress in understanding the world around us, it needs ...
EvalAI: Evaluating AI systems at scale
(Georgia Institute of Technology, 2018-12-06)
Artificial Intelligence research has progressed tremendously in the last few years. Several new multi-modal datasets and tasks have been introduced, making it much harder to compare new ...
Encoding 3D contextual information for dynamic scene understanding
(Georgia Institute of Technology, 2020-04-27)
This thesis aims to demonstrate how using 3D cues improves semantic labeling and object classification. Specifically, we will consider depth, surface normals, object classification, and pixel-wise semantic labeling in this ...
Explaining model decisions and fixing them via human feedback
(Georgia Institute of Technology, 2020-05-07)
Deep networks have enabled unprecedented breakthroughs in a variety of computer vision tasks. While these models enable superior performance, their increasing complexity and lack of decomposability into individually intuitive ...
Domain adaptation via data augmentation
(Georgia Institute of Technology, 2020-04-28)
Deep learning (DL) models require large labeled datasets for training. Practitioners often need to adapt an existing DL model to a different domain. For instance, a practitioner in a company developing autonomous vehicles ...
Visual question answering and beyond
(Georgia Institute of Technology, 2019-09-03)
In this dissertation, I propose and study a multi-modal Artificial Intelligence (AI) task called Visual Question Answering (VQA) -- given an image and a natural language question about the image (e.g., "What kind of store ...