Top-Scoring System Outputs for Summarization

Covers more than 30 top-scoring summarization systems (e.g., GSum, BART, T5, UniLM) on two popular datasets (CNNDM, XSum).
The generated summaries of all systems have been aligned to the corresponding references and source documents.
All texts have been pre-processed in the same way, which allows users to make fair comparisons across systems.
Researchers can collect human judgments on these top-performing systems and re-evaluate the reliability of existing evaluation metrics (ROUGE, BERTScore).
Researchers can systematically investigate how well current top-scoring systems generate factually correct summaries.
Researchers can explore the potential complementarity among different systems.
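Re-evaluating a metric usually means correlating its scores with human judgments collected on the same aligned summaries. The sketch below computes Kendall's tau-a between two parallel score lists. It is a minimal illustration, not a method prescribed by this release: the function name `kendall_tau` is our own, and the scores are made-up toy values, not results from this data. (`scipy.stats.kendalltau` computes the tau-b variant, which coincides with tau-a when there are no ties.)

```python
from itertools import combinations


def kendall_tau(xs: list[float], ys: list[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Pairs tied in either list count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two parallel lists with at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # Same sign of difference in both lists -> the pair is ranked consistently.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy values: one metric score and one human rating per system.
metric_scores = [0.44, 0.41, 0.45, 0.38]
human_scores = [3.9, 3.5, 4.1, 3.6]

print(round(kendall_tau(metric_scores, human_scores), 3))  # → 0.667
```

Here five of the six system pairs are ranked the same way by the metric and by the human ratings, giving (5 - 1) / 6 ≈ 0.667.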
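Because every system's outputs are aligned to the same source documents, one simple way to probe complementarity is a per-document oracle: for each document, keep the best score any system achieved and compare the average with that of the best single system. A large gap suggests that different systems do well on different documents. This is a minimal sketch under stated assumptions: the function name, the system names, and the scores are hypothetical, and the per-summary scores could come from any metric (e.g., ROUGE).

```python
def best_single_and_oracle(scores: dict[str, list[float]]) -> tuple[str, float, float]:
    """Compare the best single system with a per-document oracle.

    `scores[system][i]` is the score of that system's summary for document i;
    this assumes all systems' outputs are aligned to the same documents.
    """
    lengths = {len(v) for v in scores.values()}
    if len(lengths) != 1:
        raise ValueError("all systems must score the same aligned documents")
    n = lengths.pop()
    means = {name: sum(v) / n for name, v in scores.items()}
    best = max(means, key=means.get)
    # Oracle: for each document, take the best score any system achieved.
    oracle = sum(max(v[i] for v in scores.values()) for i in range(n)) / n
    return best, means[best], oracle


# Hypothetical toy scores for two systems on three aligned documents.
toy_scores = {
    "system_a": [0.40, 0.30, 0.50],
    "system_b": [0.35, 0.45, 0.30],
}
best, best_mean, oracle_mean = best_single_and_oracle(toy_scores)
print(best, round(best_mean, 3), round(oracle_mean, 3))  # → system_a 0.4 0.45
```

In this toy case `system_b` wins on the second document only, yet choosing it there lifts the oracle average above the best single system.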