Learning to Form Coalitions in Heterogeneous Teams from Suboptimal Demonstrations
Multi-agent systems (MAS) have proven effective in a wide range of domains, including warehouse automation, defense, agriculture, and environmental modeling. Heterogeneous MAS, composed of different types of agents with complementary capabilities, can be particularly effective in complex scenarios that demand a variety of skills. However, coordinating such teams presents significant challenges, typically requiring either experts with near-perfect domain knowledge or learning approaches that consume vast computational resources. This thesis explores the possibility of learning to coordinate heterogeneous MAS from humans who might not be experts and, as a result, might not act optimally. Specifically, we develop a framework that learns to form effective coalitions (an instance of the task allocation problem) capable of solving multiple concurrent tasks from suboptimal demonstrations. To this end, we first learn to predict the reward associated with a given allocation using supervised learning, and subsequently optimize over the space of allocations to identify one that maximizes the predicted reward. As such, we effectively use non-expert data to bootstrap learning instead of attempting to learn from scratch. Consequently, our framework neither requires considerable domain knowledge nor incurs an unsustainable computational burden. To implement and evaluate our framework, we also contribute a user study interface capable of collecting demonstrations from remote users as they play a virtual multi-agent game built on the StarCraft II simulator. Our experimental results demonstrate that we can learn the reward functions associated with the tasks with over 70% accuracy given access to only the suboptimal demonstrations. Reward functions learned from such data can then be used to predict allocations that outperform those seen in the training data.
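The learn-then-optimize pipeline described above can be illustrated with a minimal sketch. The feature encoding (a one-hot agent-to-task assignment), the linear reward model, and the exhaustive search are all illustrative assumptions for this toy example, not the thesis's actual models:

```python
import itertools
import numpy as np

def allocation_features(allocation, n_agents, n_tasks):
    # Assumed encoding: one-hot of which task each agent is assigned to.
    x = np.zeros(n_agents * n_tasks)
    for agent, task in enumerate(allocation):
        x[agent * n_tasks + task] = 1.0
    return x

def fit_reward_model(demos, n_agents, n_tasks):
    # demos: list of (allocation, observed_reward) pairs gathered from
    # (possibly suboptimal) demonstrations. Here a simple least-squares
    # fit of a linear reward model stands in for supervised learning.
    X = np.array([allocation_features(a, n_agents, n_tasks) for a, _ in demos])
    y = np.array([r for _, r in demos])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def best_allocation(w, n_agents, n_tasks):
    # Optimize over the (small) allocation space by exhaustive search,
    # returning the allocation with the highest predicted reward.
    candidates = itertools.product(range(n_tasks), repeat=n_agents)
    return max(candidates,
               key=lambda a: w @ allocation_features(a, n_agents, n_tasks))
```

With, say, two agents and two tasks, demonstrations that each cover only part of a good assignment can still let the model identify an allocation better than any single demonstrated one, which is the bootstrapping effect the abstract describes.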