Code-Upload AI Challenges on EvalAI
MetadataShow full item record
Artificial intelligence develops techniques and systems whose performance must be evaluated on a regular basis in order to certify and foster progress in the discipline. We have developed several tools such as EvalAI which helps us in evaluating the performance of these systems and to push the frontiers of machine learning and artificial intelligence. Initially, the AI community focussed on simple and traditional methods of evaluating these systems in the form of prediction upload challenges but with the advent of deep learning, larger datasets, and complex AI agents, etc. these methods are not sufficient for evaluation. A technique to evaluate these AI agents is by uploading their code, running it on the sequestered test dataset, and reporting the results on the leaderboard. In this work, we introduced code upload evaluation of AI agents on EvalAI for all kinds of AI tasks, i.e.reinforcement learning, supervised learning, and unsupervised learning. We offer features such as scalable backend, prioritized submission evaluation, secure test environment, and running AI agents code in an isolated sanitized environment. The end-to-end pipeline is extremely flexible, modular, and portable which can later be extended to multi-agents setups and evaluation on dynamic datasets. We also proposed a procedure using GitHub for AI challenge creation to version, maintain, and reduce the friction in this conglomerate process. Finally, we focused on providing analytics to all the users of the platform along with easing the hosting of EvalAI on private servers as an internal evaluation platform.