数据集:
id_clickbait
任务:
文本分类子任务:
fact-checking语言:
id计算机处理:
monolingual大小:
10K<n<100K语言创建人:
expert-generated批注创建人:
expert-generated源数据集:
original许可:
cc-by-4.0The CLICK-ID dataset is a collection of Indonesian news headlines that was collected from 12 local online news publishers; detikNews, Fimela, Kapanlagi, Kompas, Liputan6, Okezone, Posmetro-Medan, Republika, Sindonews, Tempo, Tribunnews, and Wowkeren. This dataset is comprised of mainly two parts; (i) 46,119 raw article data, and (ii) 15,000 clickbait annotated sample headlines. Annotation was conducted with 3 annotator examining each headline. Judgment were based only on the headline. The majority then is considered as the ground truth. In the annotated sample, our annotation shows 6,290 clickbait and 8,710 non-clickbait.
[More Information Needed]
Indonesian
An example of the annotated article:
{ 'id': '100', 'label': 1, 'title': "SAH! Ini Daftar Nama Menteri Kabinet Jokowi - Ma'ruf Amin" } >
The dataset contains train set.
[More Information Needed]
[More Information Needed]
Who are the source language producers?[More Information Needed]
[More Information Needed]
Who are the annotators?[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Creative Commons Attribution 4.0 International license
@article{WILLIAM2020106231, title = "CLICK-ID: A novel dataset for Indonesian clickbait headlines", journal = "Data in Brief", volume = "32", pages = "106231", year = "2020", issn = "2352-3409", doi = "https://doi.org/10.1016/j.dib.2020.106231", url = "http://www.sciencedirect.com/science/article/pii/S2352340920311252", author = "Andika William and Yunita Sari", keywords = "Indonesian, Natural Language Processing, News articles, Clickbait, Text-classification", abstract = "News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news analysis has gained attention in recent years [1, 2]. However, the majority of the tasks has been focused on English news, in which there is already a rich representative resource. For other languages, such as Indonesian, there is still a lack of resource for clickbait tasks. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It is comprised of 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model achieving favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection or other experiments in NLP areas." }
Thanks to @cahya-wirawan for adding this dataset.