Dataset: ucberkeley-dlab/measuring-hate-speech
Tasks: Text Classification
Languages: en
Multilinguality: monolingual
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:2009.10277
License: cc-by-4.0

This is a public release of the dataset described in Kennedy et al. (2020) and Sachdeva et al. (2022). It consists of 39,565 comments annotated by 7,912 annotators, for 135,556 combined rows (comment-annotator pairs).

The primary outcome variable is the "hate speech score", but the 10 constituent ordinal labels (sentiment, (dis)respect, insult, humiliation, inferior status, violence, dehumanization, genocide, attack/defense, hate speech benchmark) can also be treated as outcomes. The dataset includes 8 target identity groups (race/ethnicity, religion, national origin/citizenship, gender, sexual orientation, age, disability, political ideology) and 42 target identity subgroups, as well as 6 annotator demographics and 40 subgroups.

The hate speech score incorporates an item response theory (IRT) adjustment, which estimates variation in how annotators interpret the labeling guidelines.
This dataset card is a work in progress and will be improved over time.
The dataset can be downloaded using the following Python code:

    import datasets

    dataset = datasets.load_dataset('ucberkeley-dlab/measuring-hate-speech', 'binary')
    df = dataset['train'].to_pandas()
    df.describe()
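Because each row pairs one comment with one annotator, a comment annotated several times appears on several rows. The sketch below shows one possible way to collapse such a frame to one row per comment. The column names `comment_id`, `annotator_id`, `text`, and `hate_speech_score` are assumptions about the schema and should be checked against `df.columns`. The helper `comment_level` is hypothetical, and the toy frame stands in for the real data so the example runs offline.

```python
import pandas as pd


def comment_level(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse annotation-level rows to one row per comment.

    Assumes columns named comment_id, annotator_id, text and
    hate_speech_score (verify against the real dataset's columns).
    """
    return df.groupby("comment_id", as_index=False).agg(
        text=("text", "first"),
        # The mean is a no-op if the score is already constant within
        # a comment, and a simple summary if it is not.
        hate_speech_score=("hate_speech_score", "mean"),
        n_annotations=("annotator_id", "nunique"),
    )


# Toy stand-in for the downloaded frame: comment 1 has two annotators.
toy = pd.DataFrame(
    {
        "comment_id": [1, 1, 2],
        "annotator_id": [10, 11, 10],
        "text": ["first comment", "first comment", "second comment"],
        "hate_speech_score": [0.5, 0.5, -1.25],
    }
)

comments = comment_level(toy)
print(comments)
```

With the real data, `comment_level(df)` would be called on the frame returned by `to_pandas()`. The row count of the result can then be compared with the 39,565 comments reported above.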
    @article{kennedy2020constructing,
      title={Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application},
      author={Kennedy, Chris J and Bacon, Geoff and Sahn, Alexander and von Vacano, Claudia},
      journal={arXiv preprint arXiv:2009.10277},
      year={2020}
    }
Dataset curated by @ck37, @pssachdeva, et al.
Kennedy, C. J., Bacon, G., Sahn, A., & von Vacano, C. (2020). Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application. arXiv preprint arXiv:2009.10277.
Pratik Sachdeva, Renata Barreto, Geoff Bacon, Alexander Sahn, Claudia von Vacano, and Chris Kennedy. 2022. The Measuring Hate Speech Corpus: Leveraging Rasch Measurement Theory for Data Perspectivism. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 83–94, Marseille, France. European Language Resources Association.