Meta-Learning with Few-Shot Models: Final Project Analysis

This project focuses on understanding the components of meta-learning and few-shot models and on how effective different implementation choices are in practice. Using the default RobustQA project as a baseline, we explored several implementations of the LEOPARD meta-learning algorithm and evaluated their impact on prediction accuracy. We also experimented with the eval-every parameter to measure how quickly each implementation learns when first presented with out-of-domain questions.

We found that the multiple-datasets implementation of the LEOPARD algorithm yields the best few-shot result: at the first evaluation at step 0 (after a single batch of training data), it already achieves an EM score of 34.55 on the validation set, compared to the roughly 32 EM that the other implementations and the baseline reach. However, with longer training the baseline ultimately achieves the better overall EM score of 42.202 on the test set. Although the differences in overall test-set accuracy across implementations are small, the simpler implementation yields better accuracy in the long run. Our key finding is that designing a few-shot learning algorithm or model involves a trade-off between few-shot accuracy and the highest overall accuracy achievable.
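For readers unfamiliar with the episodic training pattern the project builds on, the sketch below illustrates a minimal LEOPARD/MAML-style inner/outer loop in which each meta-training step samples a task from one of several source datasets, and evaluation runs every `eval_every` steps. This is only an illustrative assumption of the general technique, not the project's actual code: the names (`SimpleEncoder`, `sample_task`, the synthetic datasets) are hypothetical stand-ins for the DistilBERT QA model and RobustQA data used in the real experiments.

```python
# Hypothetical sketch of a LEOPARD/MAML-style episodic loop with tasks drawn
# from multiple datasets. Not the project's implementation; a toy model and
# synthetic tasks are used so the example is self-contained and runnable.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class SimpleEncoder(nn.Module):
    """Stand-in for the QA model; maps feature vectors to class logits."""
    def __init__(self, in_dim=16, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, n_classes))

    def forward(self, x):
        return self.net(x)

def sample_task(dataset_id, n_support=8, n_query=8, in_dim=16):
    """Synthetic few-shot task: each 'dataset' shifts the feature distribution."""
    shift = float(dataset_id)
    x = torch.randn(n_support + n_query, in_dim) + shift
    y = (x.sum(dim=1) > shift * in_dim).long()
    return (x[:n_support], y[:n_support]), (x[n_support:], y[n_support:])

def inner_adapt(model, support, inner_lr=0.05, inner_steps=3):
    """Take a few gradient steps on the support set; return adapted parameters."""
    params = dict(model.named_parameters())
    x_s, y_s = support
    for _ in range(inner_steps):
        logits = torch.func.functional_call(model, params, (x_s,))
        loss = F.cross_entropy(logits, y_s)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {name: p - inner_lr * g for (name, p), g in zip(params.items(), grads)}
    return params

model = SimpleEncoder()
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
datasets = [0, 1, 2]   # multiple source datasets; one task sampled per meta step
eval_every = 50        # mirrors the project's eval-every parameter

for step in range(200):
    meta_opt.zero_grad()
    support, query = sample_task(random.choice(datasets))
    adapted = inner_adapt(model, support)
    x_q, y_q = query
    query_logits = torch.func.functional_call(model, adapted, (x_q,))
    meta_loss = F.cross_entropy(query_logits, y_q)
    meta_loss.backward()  # gradients flow through the inner loop to the meta-parameters
    meta_opt.step()
    if step % eval_every == 0:
        acc = (query_logits.argmax(dim=1) == y_q).float().mean().item()
        print(f"step {step}: query loss {meta_loss.item():.3f}, acc {acc:.2f}")
```

The multi-dataset aspect reduces, in this sketch, to sampling each episode's task from a randomly chosen source dataset, which is the general idea behind training on a mixture of in-domain datasets before evaluating on out-of-domain questions.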