CS224N Default Final Project Report: Building a QA System Using BiDAF and Subword Modeling Techniques

In our project, we attempted to answer the question: How can we best adapt a baseline Bi-Directional Attention Flow (BiDAF) network to answer questions in the SQuAD dataset? Our baseline model achieved 57.54 EM and 60.90 F1 on the dev set. Building on this baseline, we experimented with concatenating character embeddings with word embeddings and with other forms of subword modeling, such as manually constructing a subword vocabulary of size 10,000 using the Byte-Pair Encoding algorithm and splitting words into subwords. We found that using our subword embedding layer actually decreased performance, likely due to confusion generated when encountering out-of-vocabulary words. Our final system and best-performing model is the BiDAF network with the character embedding layer, where character and word embeddings are concatenated in equal parts (50/50). This model achieved 60.595 EM and 63.587 F1 on the dev set and 59.222 EM and 62.662 F1 on the test set.
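To make the subword vocabulary construction concrete, the following is a minimal sketch of Byte-Pair Encoding on a toy word-frequency table. It is not the implementation used in this project: the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy corpus are assumptions made for illustration. In this sketch, the number of merges controls the vocabulary size, so a vocabulary of roughly 10,000 subwords would correspond to a much larger merge count on a real corpus.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} table (hypothetical helper)."""
    # Start from characters, plus an assumed end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    rules = learn_bpe(toy_corpus, num_merges=10)
    print(segment("lowest", rules))  # an unseen word is split into learned subwords
```

A word that never appeared when the merges were learned is still segmented, but possibly into short or uninformative pieces, which is one plausible way the out-of-vocabulary behavior described above could arise.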