Dataset:

tweets_hate_speech_detection

Task:

Text classification

Subtasks:

sentiment-classification

Language:

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

crowdsourced

Annotation creators:

crowdsourced

Source datasets:

original

License:

gpl-3.0

Dataset Card for Tweets Hate Speech Detection

Dataset Summary

The objective of this task is to detect hate speech in tweets. For the sake of simplicity, a tweet is said to contain hate speech if it has a racist or sexist sentiment associated with it. The task is therefore to distinguish racist or sexist tweets from other tweets.

Formally, given a training sample of tweets and labels, where label ‘1’ denotes the tweet is racist/sexist and label ‘0’ denotes the tweet is not racist/sexist, your objective is to predict the labels on the given test dataset.
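The card does not name an evaluation metric. As a minimal sketch, assuming the F1 score on the positive class (label 1) is an acceptable way to compare predicted labels against reference labels, the scoring could look like the following. The function name `binary_f1` is hypothetical and not part of the dataset.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (label 1 = racist/sexist).

    Assumption: F1 is used here only for illustration; the dataset card
    itself does not specify a metric.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # With no true positives, precision and recall are both zero
    # (or undefined), so report 0.0 rather than divide by zero.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A call such as `binary_f1(reference_labels, predicted_labels)` takes two equal-length sequences of 0/1 labels and returns a float between 0 and 1.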

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The tweets are primarily in English.

Dataset Structure

Data Instances

Each instance contains a tweet and a label denoting whether the tweet is hate speech or not:

{'label': 0,  # not a hate speech
 'tweet': ' @user when a father is dysfunctional and is so selfish he drags his kids into his dysfunction.   #run'}

Data Fields

label: 1 if the tweet is hate speech, 0 if it is not.
tweet: content of the tweet as a string.
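The two fields above suggest a simple record check. The following is a sketch only, assuming each record is a mapping with exactly the `label` and `tweet` keys shown in the data instance; the helper name `is_valid_record` is hypothetical.

```python
from typing import Any, Mapping


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Return True if the record matches the documented fields.

    Assumed schema (taken from the card): 'label' is the integer 0 or 1,
    'tweet' is a string.
    """
    label = record.get("label")
    tweet = record.get("tweet")
    # bool is a subclass of int in Python, so exclude it explicitly.
    label_ok = (
        isinstance(label, int)
        and not isinstance(label, bool)
        and label in (0, 1)
    )
    return label_ok and isinstance(tweet, str)


# The data instance shown in the card.
example = {
    "label": 0,  # not hate speech
    "tweet": " @user when a father is dysfunctional and is so selfish "
             "he drags his kids into his dysfunction.   #run",
}
```

Running `is_valid_record(example)` on the instance from the card returns `True`, while a record with a missing tweet or a label outside 0 and 1 is rejected.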

Data Splits

The dataset contains a single training split with 31,962 entries.
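Because only training data is described, a held-out validation set has to be carved out by the user. The sketch below shows one way to do that with the standard library alone. The 10% fraction, the fixed seed, and the function name `holdout_split` are all assumptions for illustration, not something the dataset prescribes.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    rows: Sequence[T],
    validation_fraction: float = 0.1,  # assumed fraction, not from the card
    seed: int = 0,                     # fixed seed for a reproducible split
) -> Tuple[List[T], List[T]]:
    """Randomly split rows into (train, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")

    # Shuffle the indices with a private RNG so global state is untouched.
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)

    n_validation = int(len(rows) * validation_fraction)
    validation_indices = set(indices[:n_validation])

    # Preserve the original row order inside each part.
    train = [row for i, row in enumerate(rows) if i not in validation_indices]
    validation = [row for i, row in enumerate(rows) if i in validation_indices]
    return train, validation
```

With 31,962 rows and the assumed 10% fraction, this yields 28,766 training rows and 3,196 validation rows. Since the card does not report the class balance, a stratified split may be preferable if label 1 turns out to be rare.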

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

Crowdsourced from users' tweets.

Who are the source language producers?

Crowdsourced from Twitter.

Annotations

Annotation process

The data has been preprocessed, and a model has been trained to assign the relevant label to each tweet.

Who are the annotators?

The data was provided by Roshan Sharma.

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

With the help of this dataset, one can better understand human sentiment and analyze the situations in which a person resorts to hateful or racist comments.

Discussion of Biases

The data could be cleaned up further for additional purposes, such as applying better feature extraction techniques.

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Roshan Sharma

Licensing Information

Information

Citation Information

Citation

Contributions

Thanks to @darshan-gandhi for adding this dataset.

Author:

Anonymous

Dataset size:

11.47 KB