数据集:
ptb_text_only
语言:
en计算机处理:
monolingual大小:
10K<n<100K语言创建人:
found批注创建人:
expert-generated源数据集:
original许可:
otherThis is the Penn Treebank Project: Release 2 CDROM, featuring a million words of 1989 Wall Street Journal material. The rare words in this version are already replaced with token. The numbers are replaced with token.
Language Modelling
The text in the dataset is in American English
[Needs More Information]
[Needs More Information]
[Needs More Information]
[Needs More Information]
[Needs More Information]
Who are the source language producers?[Needs More Information]
[Needs More Information]
Who are the annotators?[Needs More Information]
[Needs More Information]
[Needs More Information]
[Needs More Information]
[Needs More Information]
[Needs More Information]
Dataset provided for research purposes only. Please check dataset license for additional information.
@article{marcus-etal-1993-building, title = "Building a Large Annotated Corpus of {E}nglish: The {P}enn {T}reebank", author = "Marcus, Mitchell P. and Santorini, Beatrice and Marcinkiewicz, Mary Ann", journal = "Computational Linguistics", volume = "19", number = "2", year = "1993", url = " https://www.aclweb.org/anthology/J93-2004" , pages = "313--330", }
Thanks to @harshalmittal4 for adding this dataset.