Question Answering with Co-attention and Transformer
In this project, we implemented several improvements to a question answering system on the SQuAD dataset, including (1) QANet, (2) co-attention, and (3) RNet. We built the models from scratch and evaluated them using exact match (EM) and F1 scores. Our main goal was to explore various techniques for question answering. In the process, we practiced implementing complex models from their descriptions in the literature.
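For readers unfamiliar with the two metrics, the following is a minimal sketch of how EM and F1 are commonly computed for a single prediction against a single reference answer. It assumes SQuAD-style answer normalization (lowercasing, removing punctuation and the articles "a", "an", "the"); the function names are illustrative and are not taken from our code. The full benchmark evaluation also takes the maximum over several reference answers per question, which is omitted here.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    if not pred_tokens or not true_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == true_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM rewards only an exact span, while F1 gives partial credit for overlapping tokens, which is why the F1 scores reported here are always at least as high as the EM scores.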
We first implemented the co-attention layer, which did not improve model performance. We then added character-level embeddings to the baseline model, which improved the EM score to 60.59 and the F1 score to 64.17.
We then implemented QANet, which uses convolutions to capture the local structure of the context and a self-attention mechanism to model global interactions across the text. We built QANet incrementally, implementing several model components along the way. We eventually saw major improvements in both EM and F1 scores (64.49 and 69.62, respectively) over both the baseline BiDAF model and BiDAF with character-level embeddings.
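The combination of local convolution and global self-attention can be illustrated with a deliberately simplified sketch. It is not our implementation: it uses plain Python lists, a single scalar kernel shared across feature dimensions (a stand-in for a depthwise separable convolution), identity query/key/value projections, and a single attention head. Layer normalization, positional encodings, and the feed-forward sublayer of a full QANet encoder block are left out, and all names are hypothetical.

```python
import math


def local_conv(seq, kernel):
    """1D convolution over positions, zero-padded so the length is kept.

    seq: list of feature vectors; kernel: odd-length list with one scalar
    weight per offset (assumption: shared across feature dimensions).
    """
    half = len(kernel) // 2
    dim = len(seq[0])
    out = []
    for i in range(len(seq)):
        acc = [0.0] * dim
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(seq):  # positions outside the sequence act as zeros
                for d in range(dim):
                    acc[d] += w * seq[j][d]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity projections."""
    dim = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in seq]
        weights = softmax(scores)
        # Each output is a weighted average over ALL positions (global view).
        out.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(dim)])
    return out


def encoder_block(seq, kernel):
    """Convolution, then self-attention, each with a residual connection."""
    conv = local_conv(seq, kernel)
    x = [[a + b for a, b in zip(s, c)] for s, c in zip(seq, conv)]
    attn = self_attention(x)
    return [[a + b for a, b in zip(s, c)] for s, c in zip(x, attn)]
```

In this sketch each output of `local_conv` depends only on a window of `len(kernel)` neighbouring positions, whereas each output of `self_attention` mixes information from every position in the sequence.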
In parallel, we implemented the self-matching layer and the pointer network described in the RNet paper. The self-matching mechanism refines the attention representation by matching the passage against itself, which effectively encodes information from the whole passage. We implemented it on top of the baseline with character-level embeddings. We tested several modifications of the RNet architecture, including different gated attention-based recurrent networks and output layers. While self-matching improved performance, the pointer network caused vanishing gradients. The self-matching layer combined with character-level embeddings improved performance to 62.06 (EM) and 65.53 (F1).
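To make "matching the passage against itself" concrete, here is a small sketch that loosely follows the additive attention and gating described in the RNet paper. It is an illustration under stated assumptions, not our implementation: the weight matrices `W_key`, `W_query`, `W_gate` and the vector `v` are hypothetical parameters passed in by the caller, and the recurrent network that normally consumes the gated vectors is omitted.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_matching(passage, W_key, W_query, v, W_gate):
    """Gated self-matching over a passage (sketch, no recurrent layer).

    passage: T vectors of size d; W_key, W_query: h x d; v: size h;
    W_gate: 2d x 2d. Returns T gated vectors of size 2d.
    """
    dim = len(passage[0])
    keys = [matvec(W_key, p) for p in passage]
    outputs = []
    for p_t in passage:
        query = matvec(W_query, p_t)
        # Additive attention score of position t against every position j.
        scores = [
            sum(vi * math.tanh(ki + qi) for vi, ki, qi in zip(v, key, query))
            for key in keys
        ]
        weights = softmax(scores)
        # Context vector: attention-weighted summary of the whole passage.
        context = [sum(w * p[d] for w, p in zip(weights, passage)) for d in range(dim)]
        joined = p_t + context  # concatenate the word with its context
        # Sigmoid gate decides how much of each component passes through.
        gate = [1.0 / (1.0 + math.exp(-z)) for z in matvec(W_gate, joined)]
        outputs.append([g * x for g, x in zip(gate, joined)])
    return outputs
```

With all weights set to zero, the attention is uniform and the gate is 0.5 everywhere, so each output is half of the word vector concatenated with half of the passage mean. This makes a convenient sanity check before training.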
Among all the techniques, QANet gives the best performance. To our understanding, this is because QANet captures local and global interactions at the same time, with an architecture that contains both convolutions and attention mechanisms.