Datasets:
ghomasHudson/vlsp
Languages:
en
Dataset following the methodology of the scientific_papers dataset, but specifically designed for very long documents (>10,000 words). The documents were gathered from arxiv.org by searching for theses.
The dataset has 2 features:
Summarization
English
Only a test set is provided.
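A minimal sketch of how the test split might be loaded and checked against the ">10,000 words" criterion. It assumes the Hugging Face `datasets` library is installed and that the dataset is still reachable under the ID shown above. The helper names (`count_words`, `is_very_long`, `load_test_split`) are hypothetical, and counting words by whitespace splitting is an assumption, not necessarily how the dataset author measured length.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def is_very_long(text: str, min_words: int = 10_000) -> bool:
    """Return True if the text has more than `min_words` words."""
    return count_words(text) > min_words


def load_test_split():
    """Load the only split the card mentions (test).

    Needs network access and `pip install datasets`; depending on the
    library version, extra arguments may be required.
    """
    from datasets import load_dataset  # imported lazily so the helpers work offline

    return load_dataset("ghomasHudson/vlsp", split="test")
```

Calling `load_test_split()` and inspecting the first record (for example `print(ds[0].keys())`) is one way to see which features the dataset actually exposes.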
Who are the source language producers? [Needs More Information]
[Needs More Information]
Who are the annotators? [Needs More Information]