This model is a fine-tuned version of PlanTL-GOB-ES/roberta-large-bne on the Spanish Fake News Dataset.
It achieves the following results on the evaluation set:
Based on the FakeDeS leaderboard, this model outperforms the best submitted system (F1 = 0.7666).
RoBERTa-large-bne is a transformer-based masked language model for the Spanish language. It is based on the RoBERTa large model and has been pre-trained using the largest Spanish corpus known to date, with a total of 570GB of clean and deduplicated text processed for this work, compiled from the web crawlings performed by the National Library of Spain (Biblioteca Nacional de España) from 2009 to 2019.
The objective of this task is to decide if a news item is fake or real by analyzing its textual representation.
FakeDeS: Fake News Detection in Spanish Shared Task
Fake news provides information that aims to manipulate people for different purposes: terrorism, political elections, advertising, satire, among others. On social networks, misinformation spreads in seconds among thousands of people, so it is necessary to develop tools that help control the amount of false information on the web. Related tasks include detecting the popularity of content on social networks and detecting the subjectivity of messages on these platforms. A fake news detection system aims to help users detect and filter out potentially deceptive news. The prediction of intentionally misleading news is based on the analysis of previously reviewed truthful and fraudulent news, i.e., annotated corpora.
The Spanish Fake News Corpus is a collection of news compiled from several web sources: established newspapers' websites, media companies' websites, special websites dedicated to validating fake news, and websites designated by different journalists as sites that regularly publish fake news. The news items were collected from January to July 2018, and all of them were written in Mexican Spanish.
The corpus contains 971 news items collected from January to July 2018 from different sources:
The corpus was tagged considering only two classes (true or fake), following a manual labeling process:
The training corpus contains the following information:
- Category: Fake / True
- Topic: Science / Sport / Economy / Education / Entertainment / Politics / Health / Security / Society
- Headline: the title of the news item.
- Text: the complete text of the news item.
- Link: the URL where the news item was published.
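Assuming the fields listed above, a single corpus record can be sketched as a plain Python dataclass (the class and field names here are illustrative, not part of an official loader):

```python
from dataclasses import dataclass

@dataclass
class NewsItem:
    """One record of the Spanish Fake News Corpus.
    Field names are assumed from the corpus description above."""
    category: str  # "Fake" or "True"
    topic: str     # e.g. "Politics", "Health", ...
    headline: str  # title of the news item
    text: str      # complete text of the news item
    link: str      # URL where the item was published

# Example record (content is illustrative only)
item = NewsItem(
    category="True",
    topic="Science",
    headline="Example headline",
    text="Full article text...",
    link="https://example.com/article",
)
print(item.category)
```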
More information needed
TBA
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | F1 | Accuracy |
|---|---|---|---|---|---|
| No log | 1.0 | 243 | 0.6282 | 0.7513 | 0.7500 |
| No log | 2.0 | 486 | 0.9600 | 0.7346 | 0.7587 |
| 0.5099 | 3.0 | 729 | 1.2128 | 0.7656 | 0.7570 |
| 0.5099 | 4.0 | 972 | 1.4001 | 0.7606 | 0.7622 |
| 0.1949 | 5.0 | 1215 | 1.9748 | 0.6475 | 0.7220 |
| 0.1949 | 6.0 | 1458 | 1.7386 | 0.7706 | 0.7710 |
| 0.0263 | 7.0 | 1701 | 1.7474 | 0.7717 | 0.7797 |
| 0.0263 | 8.0 | 1944 | 1.8114 | 0.7695 | 0.7780 |
| 0.0046 | 9.0 | 2187 | 1.8444 | 0.7709 | 0.7797 |
| 0.0046 | 10.0 | 2430 | 1.8552 | 0.7709 | 0.7797 |
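As a quick sanity check on the table above, the best validation F1 is reached at epoch 7; this can be verified in plain Python with the (epoch, F1) pairs copied from the table:

```python
# (epoch, validation F1) pairs copied from the results table above
results = [
    (1, 0.7513), (2, 0.7346), (3, 0.7656), (4, 0.7606), (5, 0.6475),
    (6, 0.7706), (7, 0.7717), (8, 0.7695), (9, 0.7709), (10, 0.7709),
]

# Pick the epoch with the highest validation F1
best_epoch, best_f1 = max(results, key=lambda pair: pair[1])
print(best_epoch, best_f1)  # -> 7 0.7717
```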
```python
from transformers import pipeline

ckpt = "Narrativaai/fake-news-detection-spanish"
classifier = pipeline("text-classification", model=ckpt)

headline = "Your headline"
text = "Your article text here..."

# Headline and body are joined with the [SEP] token, as during fine-tuning
classifier(headline + " [SEP] " + text)
```
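The pipeline call returns a list of dicts with `label` and `score` keys. A minimal post-processing sketch is shown below; the label string `"FAKE"` is an assumption, so check the model's `id2label` mapping in its `config.json` for the exact names:

```python
def is_fake(predictions, fake_label="FAKE", threshold=0.5):
    """Return True if the top prediction is the fake label with at least
    the given confidence. `predictions` is the list-of-dicts output of a
    transformers text-classification pipeline."""
    top = max(predictions, key=lambda p: p["score"])
    return top["label"] == fake_label and top["score"] >= threshold

# Example with a mocked pipeline output (scores are illustrative)
mock_output = [{"label": "FAKE", "score": 0.91}]
print(is_fake(mock_output))  # -> True
```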
Created by: Narrativa
About Narrativa: Natural Language Generation (NLG) | Gabriele, our machine learning-based platform, builds and deploys natural language solutions. #NLG #AI