Dataset: tweets_hate_speech_detection
Task: text classification
Languages: en
Multilinguality: monolingual
Size categories: 10K<n<100K
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
License: gpl-3.0

The objective of this task is to detect hate speech in tweets. For simplicity, we say a tweet contains hate speech if it has a racist or sexist sentiment associated with it. The task is therefore to distinguish racist or sexist tweets from all other tweets.
Formally, given a training sample of tweets and labels, where label ‘1’ denotes the tweet is racist/sexist and label ‘0’ denotes the tweet is not racist/sexist, your objective is to predict the labels on the given test dataset.
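To make the formal setup concrete, here is a minimal sketch of a baseline that is fit on (tweet, label) pairs and predicts 0 or 1 for a new tweet. It is a plain multinomial Naive Bayes with add-one smoothing and lowercase whitespace tokenization, chosen here purely for illustration. The card does not say which model the dataset author used, and the class and function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, tweets: list[str], labels: list[int]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # Per-class token frequencies.
        self.word_counts = {c: Counter() for c in self.classes}
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(tokenize(tweet))
        # Shared vocabulary and total token count per class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, tweet: str) -> int:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of every token.
            score = math.log(self.class_counts[c] / n_docs)
            for tok in tokenize(tweet):
                score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label
```

Working in log space avoids numeric underflow on longer tweets, and add-one smoothing keeps a single unseen word from driving a class score to zero. A stronger model would likely need the kind of preprocessing and feature extraction discussed later in this card.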
[More Information Needed]
The tweets are primarily in English.
Each example contains the tweet text and a label indicating whether or not the tweet is hate speech:
{'label': 0,  # not hate speech
 'tweet': ' @user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. #run'}
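The instance above has two fields. A minimal sketch of that record shape as a typed structure, with a validation helper, might look like this. `TweetExample`, `LABEL_NAMES`, and `validate_example` are hypothetical names, and the human-readable label strings are our own wording. The card itself defines only the integer values 0 and 1.

```python
from typing import TypedDict


class TweetExample(TypedDict):
    label: int  # 1 = racist/sexist (hate speech), 0 = not racist/sexist
    tweet: str  # raw tweet text (the example shows mentions as "@user")


# Hypothetical display names for the two integer labels.
LABEL_NAMES = {0: "not hate speech", 1: "hate speech"}


def validate_example(raw: dict) -> TweetExample:
    """Check that a raw record has both expected fields with valid values."""
    label, tweet = raw.get("label"), raw.get("tweet")
    if not (isinstance(label, int) and label in LABEL_NAMES):
        raise ValueError(f"unexpected label: {label!r}")
    if not isinstance(tweet, str):
        raise ValueError("tweet must be a string")
    return TweetExample(label=label, tweet=tweet)
```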
The data contains a training split with 31,962 entries.
[More Information Needed]
Crowdsourced from users' tweets.
Who are the source language producers? Crowdsourced from Twitter.
The data has been preprocessed, and a model has been trained to assign the relevant label to each tweet.
Who are the annotators? The data has been provided by Roshan Sharma.
[More Information Needed]
This dataset can help researchers better understand human sentiment and analyze situations in which a person intends to use hateful or racist comments.
The data could be cleaned further for additional purposes, such as applying better feature extraction techniques.
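As one illustration of such further cleanup, the hypothetical `clean_tweet` function below removes links and `@` mentions, keeps hashtag words while dropping the `#`, lowercases the text, and collapses whitespace. These rules are assumptions chosen for the sketch, not the preprocessing actually applied to this dataset, and whether dropping mentions or hashtag markers helps depends on the downstream features.

```python
import re

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#(\w+)")
WHITESPACE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Hypothetical normalization of tweet text before feature extraction."""
    text = URL.sub(" ", text)        # drop links
    text = MENTION.sub(" ", text)    # drop user mentions such as "@user"
    text = HASHTAG.sub(r"\1", text)  # keep the hashtag word, drop the '#'
    text = text.lower()
    return WHITESPACE.sub(" ", text).strip()  # collapse runs of whitespace
```

Applied to the example instance earlier in this card, this yields `"when a father is dysfunctional and is so selfish he drags his kids into his dysfunction. run"`.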
[More Information Needed]
[More Information Needed]
Roshan Sharma
Thanks to @darshan-gandhi for adding this dataset.