Dataset: fake_news_filipino
Tasks: text-classification
Sub-tasks: fact-checking
Languages: tl
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: expert-generated
Source datasets: original
License: unknown
Low-Resource Fake News Detection Corpora in Filipino, the first of its kind. It contains 3,206 expertly labeled news samples, half of which are real and half of which are fake.
The dataset is primarily in Filipino, with the addition of some English words commonly used in Filipino vernacular.
Sample data:
{ "label": "0", "article": "Sa 8-pahinang desisyon, pinaboran ng Sandiganbayan First Division ang petition for Writ of Preliminary Attachment/Garnishment na inihain ng prosekusyon laban sa mambabatas." }
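A record in this shape can be parsed with the standard library alone. The following is a minimal sketch, not the dataset's official loader. The helper `parse_record` and the mapping `LABEL_NAMES` are hypothetical names introduced here, and the reading of "0" as real and "1" as fake is an assumption that the card does not state and that should be checked against the data.

```python
import json

# ASSUMPTION: "0" marks a real article and "1" a fake one.
# The card does not document the label meaning; verify before relying on it.
LABEL_NAMES = {"0": "real", "1": "fake"}

# The sample record from the card, stored as one JSON line.
SAMPLE = (
    '{ "label": "0", "article": "Sa 8-pahinang desisyon, pinaboran ng '
    'Sandiganbayan First Division ang petition for Writ of Preliminary '
    'Attachment/Garnishment na inihain ng prosekusyon laban sa mambabatas." }'
)


def parse_record(line: str) -> dict:
    """Parse one JSON record and attach a readable label name (hypothetical helper)."""
    record = json.loads(line)
    label = record["label"]
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected label: {label!r}")
    return {
        "label": int(label),                # labels are stored as strings
        "label_name": LABEL_NAMES[label],   # assumed meaning, see above
        "article": record["article"],       # text is kept exactly as stored
    }


parsed = parse_record(SAMPLE)
print(parsed["label"], parsed["label_name"], len(parsed["article"]))
```

Note that the label arrives as a string, so it is converted to an integer before being handed to a classifier.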
Fake news articles were sourced from online sites that were tagged as fake news sites by the non-profit independent media fact-checking organization Verafiles and the National Union of Journalists of the Philippines (NUJP). Real news articles were sourced from mainstream news websites in the Philippines, including Pilipino Star Ngayon, Abante, and Bandera.
We remedy the lack of a proper, curated benchmark dataset for fake news detection in Filipino by constructing and producing what we call “Fake News Filipino.”
We construct the dataset by scraping our source websites, encoding all characters into UTF-8. Preprocessing was light to keep information intact: we retain capitalization and punctuation, and do not correct any misspelled words.
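The light preprocessing described above might look like the sketch below. It is an illustration under stated assumptions, not the authors' scraping code. The function name `to_utf8_text`, the default source encoding `latin-1`, and the trimming of surrounding whitespace are all assumptions introduced here.

```python
def to_utf8_text(raw: bytes, source_encoding: str = "latin-1") -> str:
    """Decode scraped bytes into text that can be stored as UTF-8.

    ASSUMPTION: the page's encoding is known (here defaulting to latin-1);
    a real scraper would read it from the HTTP headers or the HTML itself.
    """
    text = raw.decode(source_encoding, errors="replace")
    # Deliberately no lowercasing, punctuation stripping, or spelling
    # correction: capitalization, punctuation, and misspellings are retained.
    # ASSUMPTION: trimming surrounding whitespace is harmless.
    return text.strip()


# Simulate a scraped page fragment served in a non-UTF-8 encoding.
raw_bytes = "  Nagsalita ang Malacañang: HINDI totoo ang balita!  ".encode("latin-1")

clean = to_utf8_text(raw_bytes)
utf8_bytes = clean.encode("utf-8")  # what would be written to disk

print(clean)
```

Leaving case and punctuation untouched matters for this task, since stylistic cues such as all-caps words or repeated punctuation may themselves be signals a fake news classifier can learn from.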
Who are the source language producers?
Jan Christian Blaise Cruz, Julianne Agatha Tan, and Charibeth Cheng
Who are the annotators?
[More Information Needed]
Jan Christian Cruz, Julianne Agatha Tan, and Charibeth Cheng
@inproceedings{cruz2020localization,
  title={Localization of Fake News Detection via Multitask Transfer Learning},
  author={Cruz, Jan Christian Blaise and Tan, Julianne Agatha and Cheng, Charibeth},
  booktitle={Proceedings of The 12th Language Resources and Evaluation Conference},
  pages={2596--2604},
  year={2020}
}
Thanks to @anaerobeth for adding this dataset.