Reward and Diversity in Multirobot Foraging
This research seeks to quantify the impact of the choice of reward function on behavioral diversity in learning robot teams. The methodology developed for this work has been applied to multirobot foraging, soccer, and cooperative movement; this paper focuses on the results in multirobot foraging. In these experiments, three types of reward are used with Q-learning to train a multirobot team to forage: a local performance-based reward, a global performance-based reward, and a heuristic strategy referred to as shaped reinforcement. Local rewards are given to each agent individually according to its own behavior, while global rewards give every agent on the team the same reward simultaneously. Shaped reinforcement provides a heuristic reward for an agent's action given its situation. The experiments indicate that local performance-based rewards and shaped reinforcement generate statistically similar results: both provide the best performance and the least diversity. Finally, learned policies are demonstrated on a team of Nomadic Technologies' Nomad-150 robots.
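The distinction between local and global rewards can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the reward values (1.0 for a delivery, 0.0 otherwise), the function names, the state and action labels, and the learning parameters `alpha` and `gamma` are all assumptions chosen for illustration. Shaped reinforcement is omitted because its heuristic is not specified here. The update rule is the standard tabular Q-learning update.

```python
from collections import defaultdict


def local_rewards(deliveries):
    """Local reward (assumed values): each agent is rewarded only
    for its own delivery on this time step."""
    return [1.0 if delivered else 0.0 for delivered in deliveries]


def global_rewards(deliveries):
    """Global reward (assumed values): every agent receives the same
    reward whenever any teammate makes a delivery."""
    team_reward = 1.0 if any(deliveries) else 0.0
    return [team_reward] * len(deliveries)


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for a single agent.

    q is a mapping from (state, action) pairs to values; alpha and
    gamma are illustrative defaults, not values from the paper.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]
```

With three agents where only the first delivers an object, `local_rewards([True, False, False])` rewards the first agent alone, while `global_rewards([True, False, False])` rewards all three equally. Each agent would then pass its own reward to `q_update` on its own table, for example one created with `defaultdict(float)`.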