Dataset:
SZTAKI-HLT/HunSum-1
HunSum-1 is a Hungarian-language abstractive summarization dataset containing over 1.1M unique news articles, each with its lead and other metadata. The articles come from 9 major Hungarian news websites.
The HunSum-1 dataset has 3 splits: train, validation, and test.
| Dataset Split | Number of Instances in Split | 
|---|---|
| Train | 1,144,255 | 
| Validation | 1,996 | 
| Test | 1,996 | 
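The split sizes in the table can be checked against a downloaded copy. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository exposes its splits under the keys `train`, `validation`, and `test`. The helper names `split_size_mismatches` and `load_hunsum_split_sizes` are hypothetical and are not part of the dataset or of any library.

```python
# Split sizes as reported in the table above.
EXPECTED_SPLIT_SIZES = {"train": 1_144_255, "validation": 1_996, "test": 1_996}


def split_size_mismatches(actual_sizes, expected_sizes=EXPECTED_SPLIT_SIZES):
    """Return {split: (expected, actual)} for every split whose size differs.

    A split missing from `actual_sizes` is reported with an actual value of None.
    """
    return {
        split: (expected, actual_sizes.get(split))
        for split, expected in expected_sizes.items()
        if actual_sizes.get(split) != expected
    }


def load_hunsum_split_sizes(repo_id="SZTAKI-HLT/HunSum-1"):
    """Download the dataset from the Hugging Face Hub and count rows per split.

    Assumption: `load_dataset` without a `split` argument returns a mapping
    from split name to dataset, which is the usual behaviour for Hub datasets.
    """
    # Imported lazily so the pure helper above works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)
    return {name: split.num_rows for name, split in dataset.items()}
```

Calling `split_size_mismatches(load_hunsum_split_sizes())` downloads the full dataset and should return an empty dictionary if the hosted version still matches the table. A non-empty result lists each differing split with its expected and actual size.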
If you use our dataset, please cite the following paper:
@inproceedings{HunSum-1,
    title = {{HunSum-1: an Abstractive Summarization Dataset for Hungarian}},
    booktitle = {XIX. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2023)},
    year = {2023},
    publisher = {Szegedi Tudományegyetem, Informatikai Intézet},
    address = {Szeged, Magyarország},
    author = {Barta, Botond and Lakatos, Dorina and Nagy, Attila and Nyist, Mil{\'{a}}n Konor and {\'{A}}cs, Judit},
    pages = {231--243}
}