Dataset:

mteb/summeval

Languages:

en

SummEval

The annotations include summaries generated by 16 models from 100 source news articles (1600 examples in total). Each summary was annotated by 5 independent crowdsourced workers and 3 independent experts (8 annotations in total). Summaries were evaluated across 4 dimensions: coherence, consistency, fluency, and relevance. Each source news article comes with the original reference from the CNN/DailyMail dataset and 10 additional crowdsourced reference summaries.

For this dataset, we averaged the 3 expert annotations to get the human scores.
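The averaging step can be sketched as follows. This is a minimal illustration only; the dict layout and field names below are hypothetical and do not reflect the actual dataset schema.

```python
# Hypothetical example: three expert annotations for one summary,
# each scoring the four SummEval dimensions on a 1-5 scale.
expert_annotations = [
    {"coherence": 4, "consistency": 5, "fluency": 4, "relevance": 3},
    {"coherence": 5, "consistency": 5, "fluency": 4, "relevance": 4},
    {"coherence": 4, "consistency": 4, "fluency": 5, "relevance": 4},
]

def average_scores(annotations):
    """Average each dimension across the expert annotations."""
    dims = annotations[0].keys()
    return {d: sum(a[d] for a in annotations) / len(annotations) for d in dims}

human_scores = average_scores(expert_annotations)
```

Here `human_scores` holds the per-dimension mean of the three expert ratings, which is the quantity used as the human score for each summary.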

source: https://github.com/Yale-LILY/SummEval