Robust QA with Model Agnostic Meta Learning

*Figure: distinguishability of examples from the out-of-domain datasets (RACE, RelationExtraction, DuoRC).*
One model, BERT (Bidirectional Encoder Representations from Transformers), has achieved state-of-the-art results on benchmarks such as GLUE score, MultiNLI accuracy, and F1 score on the SQuAD v1.1 and v2.0 question answering datasets. BERT is pre-trained on unlabeled natural language data using a masked language model (MLM) objective together with next-sentence prediction, and is then fine-tuned on downstream tasks such as question answering. Successfully adapting BERT to low-resource natural language domains remains an open problem.

Previous approaches have included multitask and meta-learning fine-tuning procedures. Using a variant of the Model-Agnostic Meta-Learning (MAML) algorithm, researchers showed that meta-learning procedures hold a slight advantage over multitask models in low-resource domain adaptation. However, those experiments covered only a few task distributions p(T) for the MAML algorithm, and while the results did show an improvement over multitask models, performance for certain task distributions on specific tasks was somewhat counterintuitive.

In this paper, suggestions from a recent paper at the International Conference on Learning Representations (ICLR) are implemented to stabilize training of a MAML-type algorithm on DistilBERT, a distilled, pre-trained variant of BERT. Several task distributions and other MAML-specific hyperparameter initializations are implemented and analyzed, and a classifier is trained to predict the out-of-domain dataset type so that task-specific fine-tuning can be better leveraged.

The figure above indicates that certain tasks, such as predicting on the RACE and RelationExtraction datasets, are distinguishable from one another, so a MAML algorithm may not be able to leverage data from one to help the other. In contrast, the DuoRC dataset is shown to be fairly indistinguishable from the other two, so it might help both of those tasks during training.
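As a concrete illustration of the meta-learning fine-tuning procedure discussed above, the sketch below shows a first-order MAML update on DistilBERT with HuggingFace `transformers`. The dataset names in `task_distribution`, the learning rates, and the number of inner steps are illustrative assumptions rather than the settings used in this work, and the dataloaders are assumed to yield SQuAD-style batches.

```python
# Minimal first-order MAML sketch for QA fine-tuning (illustrative, not the
# exact procedure from this paper). Assumes a dict of task dataloaders keyed
# by dataset name; hyperparameters below are placeholders.
import copy
import random
import torch
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
meta_optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# p(T): illustrative sampling weights over in-domain QA tasks.
task_distribution = {"squad": 0.5, "newsqa": 0.3, "natural_questions": 0.2}

def sample_task(dataloaders):
    """Sample one task's dataloader according to p(T)."""
    name = random.choices(list(task_distribution),
                          weights=list(task_distribution.values()))[0]
    return dataloaders[name]

def maml_step(dataloaders, inner_steps=3, inner_lr=1e-4):
    """One meta-update. Batches are assumed to be dicts of tensors that
    include start_positions/end_positions and live on the model's device."""
    meta_optimizer.zero_grad()
    loader_iter = iter(sample_task(dataloaders))

    # Inner loop: adapt a copy of the model on the sampled task's support batches.
    fast_model = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(fast_model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        batch = next(loader_iter)
        loss = fast_model(**batch).loss
        inner_opt.zero_grad()
        loss.backward()
        inner_opt.step()

    # Outer loop (first-order approximation): the adapted model's gradient on a
    # held-out query batch is copied back onto the original parameters.
    inner_opt.zero_grad()
    query_loss = fast_model(**next(loader_iter)).loss
    query_loss.backward()
    for p, fast_p in zip(model.parameters(), fast_model.parameters()):
        p.grad = fast_p.grad.clone()
    meta_optimizer.step()
    return query_loss.item()
```

The first-order approximation is used here only to keep the sketch short; a full MAML update would backpropagate through the inner-loop adaptation itself.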
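The out-of-domain dataset-type classifier can likewise be sketched as a standard sequence-classification head on DistilBERT. The label set, model choice, and routing logic below are assumptions for illustration; the training loop (cross-entropy over labeled examples from each out-of-domain dataset) is omitted.

```python
# Sketch of a dataset-type classifier that routes an incoming question/context
# pair to one of the out-of-domain datasets so the matching task-specific
# fine-tuned weights can be applied. Labels and checkpoint are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["race", "relation_extraction", "duorc"]
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
classifier = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS)
)

def predict_dataset_type(question: str, context: str) -> str:
    """Return the most likely source dataset for a QA example."""
    inputs = tokenizer(question, context, truncation=True, max_length=384,
                       return_tensors="pt")
    with torch.no_grad():
        logits = classifier(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# At inference time, the predicted label would select which task-specific
# fine-tuned QA checkpoint answers the question.
```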