Dataset: clue
Language: zh
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: other
Annotation creators: other
Source datasets: original
License: unknown
CLUE, a Chinese Language Understanding Evaluation Benchmark (https://www.cluebenchmarks.com/), is a collection of resources for training, evaluating, and analyzing Chinese language understanding systems.
afqmc
An example of 'validation' looks as follows.
{ "idx": 0, "label": 0, "sentence1": "双十一花呗提额在哪", "sentence2": "里可以提花呗额度" }
c3
An example of 'train' looks as follows.
This example was too long and was cropped:
{ "answer": "比人的灵敏", "choice": ["没有人的灵敏", "和人的差不多", "和人的一样好", "比人的灵敏"], "context": "[\"许多动物的某些器官感觉特别灵敏,它们能比人类提前知道一些灾害事件的发生,例如,海洋中的水母能预报风暴,老鼠能事先躲避矿井崩塌或有害气体,等等。地震往往能使一些动物的某些感觉器官受到刺激而发生异常反应。如一个地区的重力发生变异,某些动物可能通过它们的平衡...", "id": 1, "question": "动物的器官感觉与人的相比有什么不同?" }
chid
An example of 'train' looks as follows.
This example was too long and was cropped:
{ "answers": { "candidate_id": [3, 5, 6, 1, 7, 4, 0], "text": ["碌碌无为", "无所作为", "苦口婆心", "得过且过", "未雨绸缪", "软硬兼施", "传宗接代"] }, "candidates": "[\"传宗接代\", \"得过且过\", \"咄咄逼人\", \"碌碌无为\", \"软硬兼施\", \"无所作为\", \"苦口婆心\", \"未雨绸缪\", \"和衷共济\", \"人老珠黄\"]...", "content": "[\"谈到巴萨目前的成就,瓜迪奥拉用了“坚持”两个字来形容。自从上世纪90年代克鲁伊夫带队以来,巴萨就坚持每年都有拉玛西亚球员进入一队的传统。即便是范加尔时代,巴萨强力推出的“巴萨五鹰”德拉·佩纳、哈维、莫雷罗、罗杰·加西亚和贝拉乌桑几乎#idiom0000...", "idx": 0 }
cluewsc2020
An example of 'train' looks as follows.
{ "idx": 0, "label": 1, "target": { "span1_index": 3, "span1_text": "伤口", "span2_index": 27, "span2_text": "它们" }, "text": "裂开的伤口涂满尘土,里面有碎石子和木头刺,我小心翼翼把它们剔除出去。" }
cmnli
An example of 'train' looks as follows.
{ "idx": 0, "label": 0, "sentence1": "从概念上讲,奶油略读有两个基本维度-产品和地理。", "sentence2": "产品和地理位置是使奶油撇油起作用的原因。" }
The data fields are the same among all splits.
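The example records above suggest how some of the index fields relate to the text fields. The following Python sketch checks two readings that are inferred from those examples and not stated in the card: that `span1_index` and `span2_index` in cluewsc2020 are character offsets into `text`, and that `candidate_id` in chid indexes into `candidates`. The helper names `spans_match` and `answers_match` are hypothetical, and the chid candidate list uses only the ten entries visible before the card crops the record.

```python
# cluewsc2020 record copied from the example above.
wsc_record = {
    "idx": 0,
    "label": 1,
    "target": {
        "span1_index": 3,
        "span1_text": "伤口",
        "span2_index": 27,
        "span2_text": "它们",
    },
    "text": "裂开的伤口涂满尘土,里面有碎石子和木头刺,我小心翼翼把它们剔除出去。",
}

# chid fields copied from the example above. The card crops the candidate
# list after these ten entries, so this is only the visible part.
chid_candidates = [
    "传宗接代", "得过且过", "咄咄逼人", "碌碌无为", "软硬兼施",
    "无所作为", "苦口婆心", "未雨绸缪", "和衷共济", "人老珠黄",
]
chid_answers = {
    "candidate_id": [3, 5, 6, 1, 7, 4, 0],
    "text": ["碌碌无为", "无所作为", "苦口婆心", "得过且过",
             "未雨绸缪", "软硬兼施", "传宗接代"],
}


def spans_match(record):
    """Return True if each span text starts at its span index in the text.

    Assumption: the indices are character offsets (not token offsets).
    """
    text = record["text"]
    target = record["target"]
    return all(
        text.startswith(target[f"span{i}_text"], target[f"span{i}_index"])
        for i in (1, 2)
    )


def answers_match(candidates, answers):
    """Return True if looking up each candidate_id yields the answer text.

    Assumption: candidate_id values are positions in the candidates list.
    """
    looked_up = [candidates[i] for i in answers["candidate_id"]]
    return looked_up == answers["text"]


print("cluewsc2020 spans are character offsets:", spans_match(wsc_record))
print("chid candidate_id indexes candidates:", answers_match(chid_candidates, chid_answers))
```

Both checks only confirm that the readings hold for the single records shown here; other records may need separate verification.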
name | train | validation | test |
---|---|---|---|
afqmc | 34334 | 4316 | 3861 |
c3 | 11869 | 3816 | 3892 |
chid | 84709 | 3218 | 3231 |
cluewsc2020 | 1244 | 304 | 290 |
cmnli | 391783 | 12241 | 13880 |
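The split sizes vary widely across configurations, which is easier to see as totals and proportions. The sketch below copies the numbers from the table and computes, for each configuration, the total example count and the share that falls in the train split. The names `SPLIT_SIZES` and `train_fraction` are illustrative and not part of any library.

```python
# Split sizes copied from the table above: (train, validation, test).
SPLIT_SIZES = {
    "afqmc": (34334, 4316, 3861),
    "c3": (11869, 3816, 3892),
    "chid": (84709, 3218, 3231),
    "cluewsc2020": (1244, 304, 290),
    "cmnli": (391783, 12241, 13880),
}


def train_fraction(name):
    """Return the fraction of a configuration's examples in the train split."""
    train, validation, test = SPLIT_SIZES[name]
    return train / (train + validation + test)


for name, sizes in SPLIT_SIZES.items():
    # Print the total example count and the train share for each configuration.
    print(f"{name:12s} total={sum(sizes):>6d} train={train_fraction(name):.1%}")
```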
@inproceedings{xu-etal-2020-clue,
  title = "{CLUE}: A {C}hinese Language Understanding Evaluation Benchmark",
  author = "Xu, Liang and Hu, Hai and Zhang, Xuanwei and Li, Lu and Cao, Chenjie and Li, Yudong and Xu, Yechen and Sun, Kai and Yu, Dian and Yu, Cong and Tian, Yin and Dong, Qianqian and Liu, Weitang and Shi, Bo and Cui, Yiming and Li, Junyi and Zeng, Jun and Wang, Rongzhao and Xie, Weijian and Li, Yanting and Patterson, Yina and Tian, Zuoyu and Zhang, Yiwen and Zhou, He and Liu, Shaoweihua and Zhao, Zhe and Zhao, Qipeng and Yue, Cong and Zhang, Xinrui and Yang, Zhengliang and Richardson, Kyle and Lan, Zhenzhong",
  booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
  month = dec,
  year = "2020",
  address = "Barcelona, Spain (Online)",
  publisher = "International Committee on Computational Linguistics",
  url = "https://aclanthology.org/2020.coling-main.419",
  doi = "10.18653/v1/2020.coling-main.419",
  pages = "4762--4772",
}
Thanks to @thomwolf and @JetRunner for adding this dataset.