Comparing the Impact of Model Size and Attention Layer Design on Question Answering Tasks
In this project, we explore the use of various neural language models applied to question answering tasks from the SQuAD dataset. We are specifically interested in the transition from RNN-based models to transformer-based models. RNN language models dominated language tasks for many years, but the introduction of the transformer demonstrated that the drawbacks of RNN models could be overcome with architectures that favor larger, more parallelizable models. In this work, we compare the impact of increasing model size with the impact of changing the attention layer implementation, using a Bi-Directional Attention Flow (BiDAF) baseline model. We find that model size has a significantly greater effect on performance on the SQuAD dataset, but that larger models fail to improve performance on unanswerable questions.
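Since the attention layer implementation is one of the axes we vary, the following is a minimal sketch of the kind of BiDAF attention layer such a baseline uses: context-to-query and query-to-context attention computed from a trainable similarity matrix. The function name, shapes, and NumPy-only formulation are illustrative assumptions for exposition, not the project's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bidaf_attention(C, Q, w):
    """Illustrative BiDAF-style attention layer (assumed formulation).

    C: context encodings, shape (T, d)
    Q: question encodings, shape (J, d)
    w: similarity weight vector, shape (3d,)
    Returns G, shape (T, 4d): context vectors augmented with
    context-to-query and query-to-context attention.
    """
    T, d = C.shape
    J, _ = Q.shape

    # Similarity matrix S[t, j] = w^T [c_t; q_j; c_t * q_j]
    S = np.empty((T, J))
    for t in range(T):
        for j in range(J):
            S[t, j] = w @ np.concatenate([C[t], Q[j], C[t] * Q[j]])

    # Context-to-query attention: each context word attends over the question.
    a = softmax(S, axis=1)              # (T, J)
    U_tilde = a @ Q                     # (T, d)

    # Query-to-context attention: which context words matter most to the question.
    b = softmax(S.max(axis=1))          # (T,)
    h_tilde = b @ C                     # (d,)
    H_tilde = np.tile(h_tilde, (T, 1))  # (T, d)

    # Final query-aware context representation.
    return np.concatenate([C, U_tilde, C * U_tilde, C * H_tilde], axis=1)

# Toy usage with random encodings (hypothetical sizes).
rng = np.random.default_rng(0)
d = 8
C = rng.standard_normal((5, d))   # 5 context tokens
Q = rng.standard_normal((3, d))   # 3 question tokens
w = rng.standard_normal(3 * d)
print(bidaf_attention(C, Q, w).shape)  # (5, 32)
```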