Explaining model decisions and fixing them via human feedback
(Georgia Institute of Technology, 2020-05-07)
Deep networks have enabled unprecedented breakthroughs in a variety of computer vision tasks. While these models enable superior performance, their increasing complexity and lack of decomposability into individually intuitive ...
Visual attribute labeling of images
(Georgia Institute of Technology, 2019-08-12)
In this work, we analyze and apply various recent techniques in visual attribute recognition and labeling on a common benchmark dataset in order to motivate the design of a novel framework for this task. Using the large ...
Visual question answering and beyond
(Georgia Institute of Technology, 2019-09-03)
In this dissertation, I propose and study a multi-modal Artificial Intelligence (AI) task called Visual Question Answering (VQA) -- given an image and a natural language question about the image (e.g., "What kind of store ...
Evaluating visual conversational agents via cooperative human-AI games
(Georgia Institute of Technology, 2019-04-26)
As AI continues to advance, human-AI teams are inevitable. However, progress in AI is routinely measured in isolation, without a human in the loop. It is crucial to benchmark progress in AI, not just in isolation, but ...
Urban 3D scene understanding from images
(Georgia Institute of Technology, 2018-01-22)
Human vision is marvelous in obtaining a structured representation of complex dynamic scenes, such as spatial scene-layout, re-organization of the scene into its constituent objects, support of each object, etc. We also ...
Domain adaptation via data augmentation
(Georgia Institute of Technology, 2020-04-28)
Deep learning (DL) models require large labeled datasets for training. Practitioners often need to adapt an existing DL model to a different domain. For instance, a practitioner in a company developing autonomous vehicles ...
Interpretation, grounding and imagination for machine intelligence
(Georgia Institute of Technology, 2018-11-08)
Understanding how to model computer vision and natural language jointly is a long-standing challenge in artificial intelligence. In this thesis, I study how modeling vision and language using semantic and pragmatic ...
EvalAI: Evaluating AI systems at scale
(Georgia Institute of Technology, 2018-12-06)
Artificial Intelligence research has progressed tremendously in the last few years. Several new multi-modal datasets and tasks have been introduced, which makes it much harder to compare new ...
Visually grounded language understanding and generation
(Georgia Institute of Technology, 2020-01-13)
The world around us involves multiple modalities -- we see objects, feel texture, hear sounds, smell odors and so on. In order for Artificial Intelligence (AI) to make progress in understanding the world around us, it needs ...
Building agents that can see, talk, and act
(Georgia Institute of Technology, 2020-04-25)
A long-term goal in AI is to build general-purpose intelligent agents that simultaneously possess the ability to perceive the rich visual environment around us (through vision, audition, or other sensors), reason and infer ...