1st Eval4NLP 2020: Online
- Steffen Eger, Yang Gao, Maxime Peyrard, Wei Zhao, Eduard H. Hovy:
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Eval4NLP 2020, Online, November 20, 2020. Association for Computational Linguistics 2020, ISBN 978-1-952148-82-8
- Klaus-Michael Lux, Maya Sappelli, Martha A. Larson:
Truth or Error? Towards systematic analysis of factual errors in abstractive summaries. 1-10
- Oleg V. Vasilyev, Vedant Dharnidharka, John Bohannon:
Fill in the BLANC: Human-free quality estimation of document summaries. 11-20
- João Sedoc, Lyle H. Ungar:
Item Response Theory for Efficient Human Evaluation of Chatbots. 21-33
- Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung:
ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT. 34-39
- Kawin Ethayarajh, Dorsa Sadigh:
BLEU Neighbors: A Reference-less Approach to Automatic Evaluation. 40-50
- Xi Chen, Nan Ding, Tomer Levinboim, Radu Soricut:
Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance. 51-59
- Jacob Bremerman, Huda Khayrallah, Douglas W. Oard, Matt Post:
On the Evaluation of Machine Translation n-best Lists. 60-68
- Rahul Jha, Keping Bi, Yang Li, Mahdi Pakdaman, Asli Celikyilmaz, Ivan Zhiboedov, Kieran McDonald:
Artemis: A Novel Annotation Methodology for Indicative Single Document Summarization. 69-78
- Reda Yacouby, Dustin Axman:
Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models. 79-91
- Adam Poliak:
A survey on Recognizing Textual Entailment as an NLP Evaluation. 92-109
- Jingcheng Niu, Gerald Penn:
Grammaticality and Language Modelling. 110-119
- Jesper Brink Andersen, Mikkel Bak Bertelsen, Mikkel Hørby Schou, Manuel R. Ciosici, Ira Assent:
One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations. 120-130
- Shiran Dudy, Steven Bedrick:
Are Some Words Worth More than Others? 131-142
- Kiril Gashteovski, Rainer Gemulla, Bhushan Kotnis, Sven Hertling, Christian Meilicke:
On Aligning OpenIE Extractions with Knowledge Bases: A Case Study. 143-154
- Hanna Wecker, Annemarie Friedrich, Heike Adel:
ClusterDataSplit: Exploring Challenging Clustering-Based Data Splits for Model Performance Evaluation. 155-163
- Neslihan Iskender, Tim Polzehl, Sebastian Möller:
Best Practices for Crowd-based Evaluation of German Summarization: Comparing Crowd, Expert and Automatic Evaluation. 164-175
- Nathan Stringham, Mike Izbicki:
Evaluating Word Embeddings on Low-Resource Languages. 176-186