Dataset:
SZTAKI-HLT/HunSum-1
HunSum-1 is a Hungarian-language dataset of over 1.1M unique news articles, each with its lead and other metadata. The articles come from 9 major Hungarian news websites.
The HunSum-1 dataset has 3 splits: `train`, `validation`, and `test`.
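A minimal sketch of loading one of these splits, assuming the Hugging Face `datasets` library is installed and that the Hub repository `SZTAKI-HLT/HunSum-1` can be read with the standard `load_dataset` loader (some library versions or repository layouts may need extra arguments). The helper name `load_hunsum_split` is hypothetical and not part of any published API.

```python
# Hugging Face Hub identifier of the dataset, as given on this card.
DATASET_ID = "SZTAKI-HLT/HunSum-1"
# The three split names listed on this card.
SPLITS = ("train", "validation", "test")


def load_hunsum_split(split: str):
    """Load one HunSum-1 split with the Hugging Face `datasets` library.

    Hypothetical helper. Assumes `datasets` is installed and the Hub
    repository loads with the standard loader. The import is deferred so
    this module can be imported (and the split name validated) without
    the library present.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(DATASET_ID, split=split)
```

For example, `load_hunsum_split("validation")` downloads only the validation split, which is a quick way to inspect the available columns before fetching the much larger training split.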
| Dataset Split | Number of Instances in Split |
|---|---|
| Train | 1,144,255 |
| Validation | 1,996 |
| Test | 1,996 |
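The split sizes in the table make the partition very lopsided: the validation and test splits together hold well under 1% of the data. A short sketch that computes each split's share from the table's counts (the dictionary below is copied from the table; nothing is downloaded):

```python
# Instance counts per split, copied from the split-size table on this card.
SPLIT_SIZES = {"train": 1_144_255, "validation": 1_996, "test": 1_996}


def split_shares(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the full dataset as a percentage."""
    total = sum(sizes.values())
    return {name: 100 * count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
shares = split_shares(SPLIT_SIZES)
for name, pct in shares.items():
    # Right-align counts with thousands separators; percentages to 2 decimals.
    print(f"{name:<10} {SPLIT_SIZES[name]:>9,}  {pct:6.2f}%")
print(f"{'total':<10} {total:>9,}")
```

This gives 1,148,247 articles in total, with about 99.65% in `train` and about 0.17% each in `validation` and `test`.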
If you use our dataset, please cite the following paper:
@inproceedings{HunSum-1,
  title = {{HunSum-1: an Abstractive Summarization Dataset for Hungarian}},
  booktitle = {XIX. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2023)},
  year = {2023},
  publisher = {Szegedi Tudományegyetem, Informatikai Intézet},
  address = {Szeged, Magyarország},
  author = {Barta, Botond and Lakatos, Dorina and Nagy, Attila and Nyist, Mil{\'{a}}n Konor and {\'{A}}cs, Judit},
  pages = {231--243}
}