Dataset: juletxara/xstory_cloze
Multilinguality: multilingual
Size: 1K<n<10K
Annotation creators: found
Source datasets: extended|story_cloze
Preprint: arxiv:2112.10668
License: cc-by-sa-4.0

XStoryCloze consists of professional translations of the English StoryCloze dataset (Spring 2016 version) into 10 non-English languages. This dataset is released by Meta AI.
The supported task is commonsense reasoning (choosing the correct ending for a four-sentence story). The dataset covers the following languages: en, ru, zh (Simplified), es (Latin America), ar, hi, id, te, sw, eu, my.
An example of 'train' looks as follows.
{'answer_right_ending': 1, 'input_sentence_1': 'Rick grew up in a troubled household.', 'input_sentence_2': 'He never found good support in family, and turned to gangs.', 'input_sentence_3': "It wasn't long before Rick got shot in a robbery.", 'input_sentence_4': 'The incident caused him to turn a new leaf.', 'sentence_quiz1': 'He is happy now.', 'sentence_quiz2': 'He joined a gang.', 'story_id': '138d5bfb-05cc-41e3-bf2c-fa85ebad14e2'}
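For quick inspection, the snippet below is a minimal loading sketch using the Hugging Face `datasets` library; it assumes each language is exposed as a configuration named by its code from the splits table further down (e.g. `en`).

```python
# Minimal loading sketch (assumption: configurations are named by language code).
from datasets import load_dataset

ds = load_dataset("juletxara/xstory_cloze", "en")
example = ds["train"][0]

# Reconstruct the four-sentence context and the two candidate endings.
context = " ".join(example[f"input_sentence_{i}"] for i in range(1, 5))
endings = [example["sentence_quiz1"], example["sentence_quiz2"]]
print(context)
print("candidates:", endings)
print("correct ending:", example["answer_right_ending"])  # 1 or 2
```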
The data fields are the same among all splits.
This dataset is intended to be used for evaluating the zero- and few-shot learning capabilities of multilingual language models. We split the data for each language into train and test (360 vs. 1510 examples, respectively). The released data files for different languages maintain a line-by-line alignment. A minimal zero-shot scoring sketch follows the table below.
name | train | test |
---|---|---|
en | 360 | 1510 |
ru | 360 | 1510 |
zh | 360 | 1510 |
es | 360 | 1510 |
ar | 360 | 1510 |
hi | 360 | 1510 |
id | 360 | 1510 |
te | 360 | 1510 |
sw | 360 | 1510 |
eu | 360 | 1510 |
my | 360 | 1510 |
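As a rough illustration of zero-shot evaluation (not the exact protocol from the paper), the sketch below scores each candidate ending by its log-likelihood under a causal language model and picks the higher-scoring one. The model name `gpt2` is only a placeholder for a multilingual model, and the configuration/split names are assumed from the table above.

```python
# Zero-shot scoring sketch: pick the ending with the higher log-likelihood
# given the four-sentence context. All names below are illustrative.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; swap in any multilingual causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def ending_logprob(context: str, ending: str) -> float:
    """Sum of token log-probabilities of `ending` conditioned on `context`.

    The slice boundary assumes the context tokenizes the same way inside the
    concatenation, which is an approximation for BPE tokenizers.
    """
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + " " + ending, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # prediction for each next token
    targets = full_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    return token_lp[ctx_ids.size(1) - 1 :].sum().item()  # keep only the ending's tokens

ds = load_dataset("juletxara/xstory_cloze", "en")["train"]
subset = ds.select(range(50))  # small slice to keep the sketch fast

correct = 0
for ex in subset:
    context = " ".join(ex[f"input_sentence_{i}"] for i in range(1, 5))
    scores = [ending_logprob(context, ex["sentence_quiz1"]),
              ending_logprob(context, ex["sentence_quiz2"])]
    pred = 1 if scores[0] >= scores[1] else 2
    correct += int(pred == ex["answer_right_ending"])

print(f"accuracy on {len(subset)} examples: {correct / len(subset):.3f}")
```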
XStoryCloze is open-sourced under CC BY-SA 4.0, the same license as the original English StoryCloze.
@article{DBLP:journals/corr/abs-2112-10668,
  author     = {Xi Victoria Lin and Todor Mihaylov and Mikel Artetxe and Tianlu Wang and Shuohui Chen and Daniel Simig and Myle Ott and Naman Goyal and Shruti Bhosale and Jingfei Du and Ramakanth Pasunuru and Sam Shleifer and Punit Singh Koura and Vishrav Chaudhary and Brian O'Horo and Jeff Wang and Luke Zettlemoyer and Zornitsa Kozareva and Mona T. Diab and Veselin Stoyanov and Xian Li},
  title      = {Few-shot Learning with Multilingual Language Models},
  journal    = {CoRR},
  volume     = {abs/2112.10668},
  year       = {2021},
  url        = {https://arxiv.org/abs/2112.10668},
  eprinttype = {arXiv},
  eprint     = {2112.10668},
  timestamp  = {Tue, 04 Jan 2022 15:59:27 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2112-10668.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
Thanks to @juletx.