Dataset: EleutherAI/lambada_openai
Sub-tasks: language-modeling
Multilinguality: translation
Size: 1K<n<10K
Language creators: machine-generated
Source datasets: lambada
License: mit
This dataset comprises the LAMBADA test split as pre-processed by OpenAI (see relevant discussions here and here). It also contains machine-translated versions of the split in German, Spanish, French, and Italian.
LAMBADA is used to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative texts sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole text, but not if they only see the last sentence preceding the target word. To succeed on LAMBADA, computational models cannot simply rely on local context, but must be able to keep track of information in the broader discourse.
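The word prediction setup described above can be sketched as follows. This is a minimal illustration, not the evaluation code used by any particular harness: it assumes each record is a JSON object with a `text` field, and that the target is the final whitespace-delimited token. The helper names `split_last_word` and `last_word_accuracy`, the sample passage, and the `predict` callable are all hypothetical.

```python
import json
from typing import Callable, Iterable, Tuple


def split_last_word(text: str) -> Tuple[str, str]:
    """Split a passage into (context, target).

    Assumption: the target is the last whitespace-delimited token.
    """
    context, _, target = text.rstrip().rpartition(" ")
    return context, target


def last_word_accuracy(texts: Iterable[str], predict: Callable[[str], str]) -> float:
    """Fraction of passages whose final word is predicted exactly.

    `predict` is a hypothetical callable mapping a context string to one word.
    """
    total = 0
    correct = 0
    for text in texts:
        context, target = split_last_word(text)
        total += 1
        correct += int(predict(context) == target)
    return correct / total if total else 0.0


# Made-up record for illustration only; the "text" field name is an assumption.
record = json.loads('{"text": "he handed her the keys and she started the car"}')
context, target = split_last_word(record["text"])
print(repr(context), "->", repr(target))
```

A real evaluation would typically score the target at the token level under the model's own tokenizer rather than by exact string match on a whitespace split.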
Languages: English, German, Spanish, French, and Italian.
For non-English languages, the data splits were produced by Google Translate. See translation_script.py for more details.
For data integrity checks, the SHA-256 checksums of the files in this dataset are listed below:
File Name | Checksum (SHA-256) |
---|---|
lambada_test_de.jsonl | 51c6c1795894c46e88e4c104b5667f488efe79081fb34d746b82b8caa663865e |
openai/lambada_test.jsonl | 4aa8d02cd17c719165fc8a7887fddd641f43fcafa4b1c806ca8abc31fabdb226 |
lambada_test_en.jsonl | 4aa8d02cd17c719165fc8a7887fddd641f43fcafa4b1c806ca8abc31fabdb226 |
lambada_test_es.jsonl | ffd760026c647fb43c67ce1bc56fd527937304b348712dce33190ea6caba6f9c |
lambada_test_fr.jsonl | 941ec6a73dba7dc91c860bf493eb66a527cd430148827a4753a4535a046bf362 |
lambada_test_it.jsonl | 86654237716702ab74f42855ae5a78455c1b0e50054a4593fb9c6fcf7fad0850 |
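The checksums in the table can be checked locally with a short script. The sketch below uses only Python's standard `hashlib` module; the function names `sha256_of` and `verify` and the `data_dir` layout (files stored under the relative paths shown in the table) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Checksums copied from the table above. The English file and
# openai/lambada_test.jsonl share the same checksum.
EXPECTED_SHA256 = {
    "lambada_test_de.jsonl": "51c6c1795894c46e88e4c104b5667f488efe79081fb34d746b82b8caa663865e",
    "openai/lambada_test.jsonl": "4aa8d02cd17c719165fc8a7887fddd641f43fcafa4b1c806ca8abc31fabdb226",
    "lambada_test_en.jsonl": "4aa8d02cd17c719165fc8a7887fddd641f43fcafa4b1c806ca8abc31fabdb226",
    "lambada_test_es.jsonl": "ffd760026c647fb43c67ce1bc56fd527937304b348712dce33190ea6caba6f9c",
    "lambada_test_fr.jsonl": "941ec6a73dba7dc91c860bf493eb66a527cd430148827a4753a4535a046bf362",
    "lambada_test_it.jsonl": "86654237716702ab74f42855ae5a78455c1b0e50054a4593fb9c6fcf7fad0850",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(data_dir: str) -> Dict[str, bool]:
    """Map each file found under data_dir to whether its checksum matches.

    Files that are not present locally are skipped rather than reported.
    """
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(data_dir) / name
        if path.is_file():
            results[name] = sha256_of(path) == expected
    return results


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory holding the files.
    for name, ok in verify(".").items():
        print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

A mismatch usually indicates an incomplete download or a file that was modified after retrieval.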
License: Modified MIT
@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}
@misc{
  author={Paperno, Denis and Kruszewski, Germán and Lazaridou, Angeliki and Pham, Quan Ngoc and Bernardi, Raffaella and Pezzelle, Sandro and Baroni, Marco and Boleda, Gemma and Fernández, Raquel},
  title={The LAMBADA dataset},
  DOI={10.5281/zenodo.2630551},
  publisher={Zenodo},
  year={2016},
  month={Aug}
}
Thanks to Sid Black ( @sdtblck ) for translating the lambada_openai dataset into the non-English languages.
Thanks to Jonathan Tow ( @jon-tow ) for adding this dataset.