Task-Adaptive Pretraining, Domain Sampling, and Data Augmentation Improve Generalized Question Answering

To build a deep-learning question answering (QA) system that generalizes to unseen domains, we investigate three techniques: task-adaptive pretraining (TAPT), domain sampling, and data augmentation. We train a single DistilBERT model in three phases (shown in the flowchart). First, during TAPT, we pretrain with masked-language modeling (MLM) on our QA datasets. Second, we fine-tune on our QA data. During both pretraining and fine-tuning we employ domain sampling, which preferentially samples data that lead to better downstream performance. Finally, we use synonym replacement and random deletion to increase the size and variety of our out-domain data, and then fine-tune further on these augmented data. In evaluation, fine-tuning on augmented out-domain data yielded significant EM/F1 improvements, while TAPT and domain sampling yielded modest yet non-trivial improvements. Using all three techniques, our model achieved EM/F1 scores of 37.44/51.37 on the development set and 40.12/58.05 on the test set.
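
The two augmentations named above, synonym replacement and random deletion, can be illustrated with a minimal sketch. This is not the implementation used in this work: the `SYNONYMS` table, the function names, and the parameters `n_replace` and `p_delete` are all hypothetical, and a real system would likely draw synonyms from a thesaurus rather than a hand-written dictionary.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration only).
SYNONYMS = {
    "big": ["large", "huge"],
    "quick": ["fast", "rapid"],
    "movie": ["film"],
}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)
    # Positions of tokens we know a synonym for.
    candidates = [i for i, t in enumerate(out) if t.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i].lower()])
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p.

    Always keeps at least one token so the example is never empty.
    """
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, n_replace=1, p_delete=0.1, seed=0):
    """Return two augmented variants of text: one per technique."""
    rng = random.Random(seed)
    tokens = text.split()  # whitespace tokenization, assumed for simplicity
    return [
        " ".join(synonym_replacement(tokens, n_replace, rng)),
        " ".join(random_deletion(tokens, p_delete, rng)),
    ]


if __name__ == "__main__":
    for variant in augment("the quick fox watched a big movie"):
        print(variant)
```

For extractive QA, editing a context passage shifts character offsets, so answer spans would need to be recomputed (or the answer tokens protected from edits) after augmentation; the sketch omits that step.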