Dataset: offenseval2020_tr
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: found
Source datasets: original
The file offenseval-tr-training-v1.tsv contains 31,756 annotated tweets.
The file offenseval-annotation.txt contains a short summary of the annotation guidelines.
Twitter user mentions have been replaced by @USER and URLs have been replaced by URL.
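The substitution described above can be sketched as follows, for readers who want to apply the same convention to new tweets. This is a minimal illustration under stated assumptions: the two regular expressions and the `anonymize` helper are hypothetical, not the rules used to build the corpus, which are not given here.

```python
import re

# Hypothetical patterns: the exact rules used for the corpus are not documented here.
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")


def anonymize(tweet: str) -> str:
    """Replace links with URL and user mentions with @USER."""
    tweet = URL_RE.sub("URL", tweet)
    return MENTION_RE.sub("@USER", tweet)


example = anonymize("@ali_veli şuna bak https://t.co/abc123")
print(example)
```

Links are replaced before mentions so that an `@` inside a URL is not rewritten as a mention.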
Each instance carries at most one label, for Sub-task A (offensive language identification).
The dataset is described in Çöltekin (2020), cited below.
The tweets are in Turkish.
A binary dataset with (NOT) Not Offensive and (OFF) Offensive tweets.
Instances are included in TSV format as follows:
ID INSTANCE SUBA
The column names in the file are the following:
id tweet subtask_a
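A minimal sketch of reading instances in this layout with Python's standard `csv` module. The sample rows are invented for illustration and are not taken from the corpus. The `read_instances` helper is hypothetical, and the code assumes the file starts with a header row holding the column names listed above.

```python
import csv
import io
from collections import Counter

# Made-up rows in the documented layout (id, tweet, subtask_a); not real corpus data.
SAMPLE_TSV = (
    "id\ttweet\tsubtask_a\n"
    "1\t@USER günaydın\tNOT\n"
    "2\t@USER örnek metin URL\tOFF\n"
    "3\tbugün hava güzel\tNOT\n"
)


def read_instances(handle):
    """Yield one dict per row, keyed by the header column names."""
    # QUOTE_NONE keeps quote characters inside tweets as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["subtask_a"] not in {"NOT", "OFF"}:
            raise ValueError(f"unexpected label: {row['subtask_a']!r}")
        yield row


rows = list(read_instances(io.StringIO(SAMPLE_TSV)))
label_counts = Counter(row["subtask_a"] for row in rows)
print(label_counts)
```

To read the distributed file instead of the in-memory sample, pass a handle opened with `open("offenseval-tr-training-v1.tsv", encoding="utf-8", newline="")`. The UTF-8 encoding is an assumption.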
The labels used in the annotation are listed below.
Task and Labels
(A) Sub-task A: Offensive language identification
In our annotation, we label a post as offensive (OFF) if it contains any form of non-acceptable language (profanity) or a targeted offense, which can be veiled or direct.
| train | test | 
|---|---|
| 31756 | 3528 | 
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
From Twitter.
Annotation process
We describe the labels above in a “flat” manner. However, the annotation process we follow is hierarchical. The following QA pairs give a more flowchart-like procedure to follow.
The annotations are distributed under the terms of the Creative Commons Attribution License (CC-BY). Please cite the following paper if you use this resource.
@inproceedings{coltekin2020lrec,
 author  = {\c{C}\"{o}ltekin, \c{C}a\u{g}r{\i}},
 year  = {2020},
 title  = {A Corpus of Turkish Offensive Language on Social Media},
 booktitle  = {Proceedings of The 12th Language Resources and Evaluation Conference},
 pages  = {6174--6184},
 address  = {Marseille, France},
 url  = {https://www.aclweb.org/anthology/2020.lrec-1.758},
}
 Thanks to @yavuzKomecoglu for adding this dataset.