Dataset:
kor_sarcasm
License:
mit
Source datasets:
original
Annotations creators:
expert-generated
Language creators:
found
Size:
1K<n<10K
Multilinguality:
monolingual
Language:
ko
Task:
text-classification
The Korean Sarcasm Dataset was created to detect sarcasm in text, which can significantly alter the original meaning of a sentence. A total of 9319 tweets were collected from Twitter and labeled as sarcasm or not_sarcasm. These tweets were gathered by querying for: 역설, 아무말, 운수좋은날, 笑, 뭐래 아닙니다, 그럴리없다, 어그로, irony sarcastic, and sarcasm. The dataset was pre-processed by removing the keyword hashtag, URLs, and user mentions to maintain anonymity.
The text in the dataset is in Korean, and the associated BCP-47 code is ko-KR.
An example data instance contains a Korean tweet and a label indicating whether it is sarcastic. 1 maps to sarcasm and 0 maps to no sarcasm.
{ "tokens": "[ 수도권 노선 아이템 ] 17 . 신분당선의 #딸기 : 그의 이미지 컬러 혹은 머리 색에서 유래한 아이템이다 . #메트로라이프", "label": 0 }
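As a minimal sketch of how an instance like the one above can be read, the snippet below maps the integer label to a readable name. The field names `tokens` and `label` come from the example; the `LABEL_NAMES` table and the `describe` helper are hypothetical names introduced only for illustration.

```python
# Label names follow the mapping stated above: 0 = no sarcasm, 1 = sarcasm.
LABEL_NAMES = {0: "no sarcasm", 1: "sarcasm"}  # hypothetical lookup table


def describe(instance: dict) -> str:
    """Return the readable label name for one data instance."""
    return LABEL_NAMES[instance["label"]]


example = {
    "tokens": "[ 수도권 노선 아이템 ] 17 . 신분당선의 #딸기 : 그의 이미지 컬러 혹은 머리 색에서 유래한 아이템이다 . #메트로라이프",
    "label": 0,
}
print(describe(example))  # prints "no sarcasm"
```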
The data is split into a training set comprising 9018 tweets and a test set of 301 tweets.
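The reported split sizes can be checked against the 9319 collected tweets with a few lines of arithmetic. This is only a sanity check on the numbers given in this card; the `SPLIT_SIZES` name is a hypothetical one chosen for the sketch.

```python
# Split sizes as reported in this card.
SPLIT_SIZES = {"train": 9018, "test": 301}

total = sum(SPLIT_SIZES.values())
test_share = SPLIT_SIZES["test"] / total

print(total)                # 9319, matching the number of collected tweets
print(f"{test_share:.1%}")  # 3.2%
```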
The dataset was created by gathering HTML data from Twitter. Tweets were retrieved by querying for hashtags that include sarcasm and its variants. The data was preprocessed by removing the keyword hashtag, URLs, and user mentions to preserve anonymity.
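The preprocessing described above could look roughly like the following sketch. The regular expressions and the `clean_tweet` function are assumptions made for illustration; the exact rules used by the dataset authors are not documented here, and the sample tweet is invented.

```python
import re

# Assumed patterns; the original cleaning rules are not specified in detail.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text: str, keyword: str) -> str:
    """Remove the query-keyword hashtag, URLs, and user mentions."""
    # Drop only the hashtag that matches the search keyword; keep other hashtags.
    text = re.sub(rf"#{re.escape(keyword)}(?!\w)", " ", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())  # collapse leftover whitespace


# Invented sample tweet, not taken from the dataset.
print(clean_tweet("@user 오늘도 야근 #어그로 https://t.co/abc 최고다", "어그로"))
# prints "오늘도 야근 최고다"
```

Only the keyword hashtag is removed in this sketch, which is consistent with the example instance above still containing other hashtags.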
Who are the source language producers?
The source language producers are Korean Twitter users.
Tweets were labeled 1 for sarcasm and 0 for no sarcasm.
Who are the annotators?
[More Information Needed]
Mentions of the user in a tweet were removed to keep them anonymous.
This dataset was curated by Dionne Kim.
This dataset is licensed under the MIT License.
@misc{kim2019kocasm,
  author = {Kim, Jiwon and Cho, Won Ik},
  title = {Kocasm: Korean Automatic Sarcasm Detection},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/SpellOnYou/korean-sarcasm}}
}
Thanks to @stevhliu for adding this dataset.