ExplainaBoard For Natural Language Processing

Top-scoring System Outputs for Summarization

What are the advantages of these datasets?

SOTAs: Covers more than 30 top-scoring summarization systems (e.g., GSum, BART, T5, UniLM) on two popular datasets (CNNDM, XSum).

Well-aligned: The generated summaries of all systems have been aligned with their corresponding references and source documents.

Unified Pre-processing: All texts have been pre-processed in the same way, which allows users to compare systems fairly.
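As a rough illustration of why alignment and shared pre-processing matter, the sketch below scores two systems on the same examples against the same references. The record layout, the system names, and the texts are hypothetical and are not the actual file format of these datasets. The scoring function is a simplified unigram-overlap F1 used as a stand-in, not the official ROUGE implementation.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1: a simplified stand-in for ROUGE-1, not the official scorer."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical aligned records: one source document, one reference,
# and every system's summary for that same document.
aligned = [
    {
        "source": "the cat sat on the mat while the dog slept nearby",
        "reference": "a cat sat on a mat",
        "outputs": {
            "system_a": "the cat sat on the mat",
            "system_b": "the dog slept",
        },
    },
    {
        "source": "stocks rose sharply after the central bank cut rates",
        "reference": "stocks rose after rate cut",
        "outputs": {
            "system_a": "stocks rose after the bank cut rates",
            "system_b": "the bank cut rates",
        },
    },
]


def mean_scores(records):
    """Average each system's score over the shared set of examples."""
    per_system = {}
    for record in records:
        for name, summary in record["outputs"].items():
            score = unigram_f1(summary, record["reference"])
            per_system.setdefault(name, []).append(score)
    return {name: sum(vals) / len(vals) for name, vals in per_system.items()}


scores = mean_scores(aligned)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.3f}")
```

Because every system is scored on identical, identically tokenized inputs, differences in the averages reflect the systems rather than differences in data preparation.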

What can these datasets be used for?

Human Judgment & Metric Meta-Evaluation: Researchers can collect human judgments on these top-performing systems and re-evaluate the reliability of existing evaluation metrics (e.g., ROUGE, BERTScore).

Factuality Analysis: Researchers can systematically investigate how well current top-scoring systems generate factually correct summaries.

System Combination: Researchers can explore the potential complementarity among different systems.
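For the metric meta-evaluation use case, one common approach is to check how well a metric's ranking of systems agrees with the ranking given by human judgments. The sketch below computes Kendall's tau (the tau-a variant, with ties counted as neither concordant nor discordant) in plain Python. All scores are made-up placeholders for illustration, and the names `human`, `metric_x`, and `metric_y` are hypothetical; none of these numbers come from the datasets.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both scores order this pair the same way
        elif direction < 0:
            discordant += 1  # the two scores disagree on this pair
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Placeholder system-level scores for four hypothetical systems.
human = [3.1, 3.8, 2.5, 4.2]         # mean human rating per system
metric_x = [0.40, 0.44, 0.38, 0.47]  # ranks the systems exactly as humans do
metric_y = [0.45, 0.40, 0.38, 0.47]  # swaps the first two systems

tau_x = kendall_tau(human, metric_x)
tau_y = kendall_tau(human, metric_y)
print(f"metric_x tau: {tau_x:.3f}")
print(f"metric_y tau: {tau_y:.3f}")
```

A higher tau means the metric orders systems more like human annotators do. In practice a library routine such as `scipy.stats.kendalltau` would usually be preferred, since it also handles ties and reports a p-value.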