Dataset: kan_hope
Tasks: text classification
Multilinguality: multilingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: expert-generated
Source datasets: original
Preprint: arxiv:2108.04616
License: cc-by-4.0

KanHope is a code-mixed Kannada-English dataset for hope speech detection. It consists of 6,176 user-generated comments scraped from the comments section of YouTube and manually annotated as hope speech or not-hope speech.
This task aims to detect hope speech in code-mixed comments/posts in a Dravidian language (Kannada-English) collected from social media. A comment/post may contain more than one sentence, but the average number of sentences per comment/post in the corpus is 1. Each comment/post is annotated at the comment/post level. The dataset is also class-imbalanced, reflecting real-world scenarios.
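Class imbalance is often handled by weighting each class inversely to its frequency during training. The following is a minimal sketch of that idea, not the authors' method; the function name `inverse_frequency_weights` and the toy label list are assumptions made for illustration, and the toy labels do not reflect the real KanHope label distribution.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight}, where weight = n_samples / (n_classes * count).

    Rare classes get weights above 1, frequent classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical toy labels (NOT the real KanHope distribution):
# 0 = not-hope speech, 1 = hope speech.
toy_labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(toy_labels)
print(weights)
```

Weights of this kind can be passed to a weighted loss function so that errors on the minority class count for more.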
Code-mixed text in Dravidian languages (Kannada-English).
An example from the Kannada dataset looks as follows:
text | label |
---|---|
[...] heartly heltidini... plz avrigella namma nimmellara supprt beku | 0 (Not-hope speech) |
Next song gu kuda alru andre evaga yar comment madidera alla alrru like madi share madi nam industry na next level ge togond hogaona. | 1 (Hope Speech) |
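Each record in the table above pairs a comment with an integer label. As a rough sketch of how such a record might be represented in code, the snippet below maps the integer ids to readable names. The `Comment` class and the `ID2LABEL` dictionary are hypothetical names introduced here for illustration, and the label strings follow this card's wording, not necessarily the identifiers used in the released files.

```python
from dataclasses import dataclass

# Assumed mapping, taken from the label column of the example table.
ID2LABEL = {0: "Not-hope speech", 1: "Hope speech"}


@dataclass
class Comment:
    """One annotated comment: raw text plus its integer class id."""

    text: str
    label: int

    @property
    def label_name(self) -> str:
        # Translate the integer id into its readable class name.
        return ID2LABEL[self.label]


example = Comment(
    text=(
        "Next song gu kuda alru andre evaga yar comment madidera alla "
        "alrru like madi share madi nam industry na next level ge "
        "togond hogaona."
    ),
    label=1,
)
print(example.label_name)
```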
Kannada
language | train | validation | test |
---|---|---|---|
Kannada | 4941 | 618 | 617 |
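As a quick arithmetic check, the split sizes above can be summed and converted to proportions. This sketch uses only the three counts from the table; the variable names are illustrative.

```python
# Split sizes copied from the table above.
SPLITS = {"train": 4941, "validation": 618, "test": 617}

# The total should match the 6,176 comments stated in the description.
total = sum(SPLITS.values())

# Share of the corpus in each split, rounded to three decimals.
fractions = {name: round(count / total, 3) for name, count in SPLITS.items()}

print(total)
print(fractions)
```

The counts add up to 6,176 and correspond to roughly an 80/10/10 train/validation/test split.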
In recent years, numerous methods have been developed to curb the spread of negativity by removing vulgar, offensive, and aggressive comments from social media platforms. However, relatively little research has focused on embracing positivity and reinforcing supportive and reassuring content in online forums.
Who are the source language producers? YouTube users
Who are the annotators? [Needs More Information]
@misc{hande2021hope,
  title={Hope Speech detection in under-resourced Kannada language},
  author={Adeep Hande and Ruba Priyadharshini and Anbukkarasi Sampath and Kingston Pal Thamburaj and Prabakaran Chandran and Bharathi Raja Chakravarthi},
  year={2021},
  eprint={2108.04616},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @adeepH for adding this dataset.