Building agents that can see, talk, and act
(Georgia Institute of Technology, 2020-04-25)
A long-term goal in AI is to build general-purpose intelligent agents that simultaneously possess the ability to perceive the rich visual environment around us (through vision, audition, or other sensors), reason and infer ...
Domain adaptation via data augmentation
(Georgia Institute of Technology, 2020-04-28)
Deep learning (DL) models require large labeled datasets for training. Practitioners often need to adapt an existing DL model to a different domain. For instance, a practitioner in a company developing autonomous vehicles ...
Evaluating visual conversational agents via cooperative human-AI games
(Georgia Institute of Technology, 2019-04-26)
As AI continues to advance, human-AI teams are inevitable. However, progress in AI is routinely measured in isolation, without a human in the loop. It is crucial to benchmark progress in AI, not just in isolation, but ...
EvalAI: Evaluating AI systems at scale
(Georgia Institute of Technology, 2018-12-06)
Artificial Intelligence research has progressed tremendously in the last few years. Several new multi-modal datasets and tasks have been introduced, making it much harder to compare new ...
Interpretation, grounding and imagination for machine intelligence
(Georgia Institute of Technology, 2018-11-08)
Understanding how to model computer vision and natural language jointly is a long-standing challenge in artificial intelligence. In this thesis, I study how modeling vision and language using semantic and pragmatic ...
Encoding 3D contextual information for dynamic scene understanding
(Georgia Institute of Technology, 2020-04-27)
This thesis aims to demonstrate how using 3D cues improves semantic labeling and object classification. Specifically, we will consider depth, surface normals, object classification, and pixel-wise semantic labeling in this ...
Detecting Mosquitoes with Convolutional Neural Networks
Mosquitoes are directly responsible for the death of more than a million people each year. Yet the ability to mitigate their deadly impact or even monitor them in the wild to better understand their behavior remains ...
Visual attribute labeling of images
(Georgia Institute of Technology, 2019-08-12)
In this work, we analyze and apply various recent techniques in visual attribute recognition and labeling on a common benchmark dataset in order to motivate the design of a novel framework for this task. Using the large ...
Visual question answering and beyond
(Georgia Institute of Technology, 2019-09-03)
In this dissertation, I propose and study a multi-modal Artificial Intelligence (AI) task called Visual Question Answering (VQA) -- given an image and a natural language question about the image (e.g., "What kind of store ...
Segmental discriminative analysis for American Sign Language recognition and verification
(Georgia Institute of Technology, 2010-04-06)
This dissertation presents segmental discriminative analysis techniques for American Sign Language (ASL) recognition and verification. ASL recognition is a sequence classification problem. One of the most successful ...