Question answering as an automatic evaluation metric for news article summarization

M Eyal, T Baumel, M Elhadad - arXiv preprint arXiv:1906.00318, 2019 - arxiv.org

Recent work in the field of automatic summarization and headline generation focuses on maximizing ROUGE scores for various news datasets. We present an alternative, extrinsic evaluation metric for this task, Answering Performance for Evaluation of Summaries (APES). APES utilizes recent progress in the field of reading comprehension to quantify the ability of a summary to answer a set of manually created questions regarding central entities in the source article. We first analyze the strength of this metric by comparing it to known manual evaluation metrics. We then present an end-to-end neural abstractive model that maximizes APES, while increasing ROUGE scores to competitive results.
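
The idea of scoring a summary by how well it answers questions about the source article can be illustrated with a minimal sketch. This is not the authors' implementation: the function `apes_like_score`, the `Answerer` type, the exact-match comparison, and the `toy_answerer` stand-in (which replaces a trained reading-comprehension model) are all hypothetical names and simplifications chosen here for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a QA model maps (context, question) to a predicted answer.
Answerer = Callable[[str, str], str]


def apes_like_score(
    summary: str,
    qa_pairs: Sequence[Tuple[str, str]],
    answer: Answerer,
) -> float:
    """Fraction of questions answered correctly using only the summary as context.

    Assumption: correctness is a case-insensitive exact match on the answer string.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        answer(summary, question).strip().lower() == gold.strip().lower()
        for question, gold in qa_pairs
    )
    return correct / len(qa_pairs)


def toy_answerer(context: str, question: str) -> str:
    """Stand-in for a reading-comprehension model (assumption, for illustration only).

    Returns the first capitalized word in the context that is not already
    mentioned in the question, or an empty string if there is none.
    """
    for token in context.split():
        word = token.strip(".,;:!?\"'")
        if word[:1].isupper() and word.lower() not in question.lower():
            return word
    return ""


# Made-up example data: questions about entities, with gold answers.
summary = "Obama visited Paris on Monday."
questions = [
    ("Who visited Paris on Monday?", "Obama"),
    ("Which city did Obama visit?", "Paris"),
    ("On which day did the visit to Paris by Obama end?", "Friday"),
]
score = apes_like_score(summary, questions, toy_answerer)
```

In this toy setup the summary supports two of the three questions, so the score reflects how much of the questioned information the summary retains. A realistic setup would swap `toy_answerer` for a trained question-answering model.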
