Dataset: kinnews_kirnews
Tasks: text classification
Multilinguality: monolingual
Language creators: found
Annotation creators: expert-generated
Source datasets: original
Preprint: arxiv:2010.12174
License: mit
Kinyarwanda and Kirundi news classification datasets (KINNEWS and KIRNEWS, respectively), both collected from Rwanda and Burundi news websites and newspapers, for low-resource monolingual and cross-lingual multiclass classification tasks.
This dataset can be used for text classification of news articles in the Kinyarwanda and Kirundi languages. Each news article is assigned to one of 14 possible classes.
Languages: Kinyarwanda and Kirundi
Here is an example from the dataset:
Field | Value |
---|---|
label | 1 |
kin_label/kir_label | 'inkino' |
url | ' https://nawe.bi/Primus-Ligue-Imirwi-igiye-guhura-gute-ku-ndwi-ya-6-y-ihiganwa.html' |
title | 'Primus Ligue\xa0: Imirwi igiye guhura gute ku ndwi ya 6 y’ihiganwa\xa0?' |
content | ' Inkino zitegekanijwe kuruno wa gatandatu igenekerezo rya 14 Nyakanga umwaka wa 2019...' |
en_label | 'sport' |
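The example record carries some scraping residue: a leading space in `url` and non-breaking spaces (`\xa0`) in `title`. A minimal sketch of how such a record might be tidied before use is shown here. The function name `normalize_record` is hypothetical, the record is assumed to use the `kir_label` key, and the truncated `content` field is left out rather than guessed.

```python
def normalize_record(record: dict) -> dict:
    """Return a copy with string fields stripped and non-breaking spaces replaced."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            # Replace non-breaking spaces with plain spaces, then trim the ends.
            value = value.replace("\xa0", " ").strip()
        cleaned[key] = value
    return cleaned


# Values copied from the example above; the "kir_label" key is an assumption.
example = {
    "label": 1,
    "kir_label": "inkino",
    "url": " https://nawe.bi/Primus-Ligue-Imirwi-igiye-guhura-gute-ku-ndwi-ya-6-y-ihiganwa.html",
    "title": "Primus Ligue\xa0: Imirwi igiye guhura gute ku ndwi ya 6 y’ihiganwa\xa0?",
    "en_label": "sport",
}

normalized = normalize_record(example)
print(normalized["url"])
print(normalized["title"])
```

Non-string fields such as `label` pass through unchanged, so the integer class id stays usable as a training target.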
The raw version of the data for the Kinyarwanda language consists of the fields shown in the example above, with the language-specific label stored as kin_label.
The cleaned version contains only the label, title, and content fields.
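The relationship between the two versions can be sketched as a simple field projection. This illustrates only the field selection described here, not any text preprocessing the dataset authors may have applied. The helper `to_cleaned` and all record values are hypothetical placeholders.

```python
# Fields kept in the cleaned version, as listed in this card.
CLEANED_FIELDS = ("label", "title", "content")


def to_cleaned(raw_record: dict) -> dict:
    """Keep only the fields present in the cleaned version."""
    return {field: raw_record[field] for field in CLEANED_FIELDS}


# Placeholder raw record; the values are illustrative, not taken from the dataset.
raw = {
    "label": 1,
    "kin_label": "placeholder",
    "en_label": "sport",
    "url": "https://example.org/article",
    "title": "placeholder title",
    "content": "placeholder content",
}

cleaned = to_cleaned(raw)
print(sorted(cleaned))
```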
Lang | Train | Test |
---|---|---|
Kinyarwanda Raw | 17014 | 4254 |
Kinyarwanda Clean | 17014 | 4254 |
Kirundi Raw | 3689 | 923 |
Kirundi Clean | 3689 | 923 |
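The split sizes in the table correspond to roughly an 80/20 train/test division for both languages. The short calculation below reproduces that from the table's counts; the names `SPLITS` and `train_fraction` are illustrative only.

```python
# Train/test counts copied from the table (raw and clean versions share them).
SPLITS = {
    "Kinyarwanda": {"train": 17014, "test": 4254},
    "Kirundi": {"train": 3689, "test": 923},
}


def train_fraction(counts: dict) -> float:
    """Share of all articles that fall in the training split."""
    total = counts["train"] + counts["test"]
    return counts["train"] / total


for lang, counts in SPLITS.items():
    total = counts["train"] + counts["test"]
    print(f"{lang}: {total} articles, {train_fraction(counts):.1%} train")
```

The Kirundi set is less than a quarter the size of the Kinyarwanda set, which is worth keeping in mind when comparing monolingual results across the two languages.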
@article{niyongabo2020kinnews,
  title={KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi},
  author={Niyongabo, Rubungo Andre and Qu, Hong and Kreutzer, Julia and Huang, Li},
  journal={arXiv preprint arXiv:2010.12174},
  year={2020}
}
Thanks to @saradhix for adding this dataset.