Evaluating BERT on Question Exhaustivity Inference

As pragmatic listeners, we are able to infer the intended exhaustivity of a question, even when the question is semantically underspecified. For example, if somebody were to ask you "Where can I get coffee around here?", you would probably answer by providing the names of a few nearby coffee shops. Even though it wasn't specified anywhere in the question, you would know that the questioner most likely doesn't want an exhaustive list of every coffee shop in the area. In this work, we explore the extent to which BERT can learn this kind of exhaustivity judgement. We first show that BERT, fine-tuned on a small dataset of questions and human judgements of exhaustivity, can predict these judgements with high accuracy (r = 0.65, r = 0.59, r = 0.76, and r = 0.70 across the four categories of interest). We also provide evidence that the model learns associations between the linguistic features that previous work has identified as influencing people's exhaustivity judgements and the magnitude of those judgements.
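
To make the setup concrete, here is a minimal sketch of fine-tuning BERT to predict a scalar exhaustivity rating. It assumes a HuggingFace `bert-base-uncased` model with a single regression head and made-up question–rating pairs; the model name, example data, and training step are illustrative assumptions, not the actual configuration used in this work.

```python
# Minimal sketch: BERT fine-tuned to regress human exhaustivity ratings.
# All data and hyperparameter choices here are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=1,               # single scalar output per question
    problem_type="regression",  # use MSE loss instead of cross-entropy
)

# Hypothetical questions paired with mean human exhaustivity ratings,
# e.g. normalized to [0, 1].
questions = [
    "Where can I get coffee around here?",
    "Which employees attended the meeting?",
]
ratings = torch.tensor([[0.2], [0.9]])  # made-up values for illustration

batch = tokenizer(questions, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=ratings)

loss = outputs.loss           # MSE between predicted and human ratings
predictions = outputs.logits  # predicted exhaustivity scores
loss.backward()               # one gradient step of a standard training loop
```

In this kind of setup, the predicted scores can then be compared against held-out human ratings with a Pearson correlation, which is presumably how correlations like the r values above would be computed.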