Dataset:

persiannlp/parsinlu_sentiment

Subtasks:

sentiment-analysis

Languages:

fa

Multilinguality:

monolingual

Size:

1K<n<10K

Language creators:

expert-generated

Annotation creators:

expert-generated

Source datasets:

extended|translated|mnli

Preprint:

arxiv:2012.06154

License:

cc-by-nc-sa-4.0

Dataset Card for ParsiNLU (Sentiment Analysis)

Dataset Summary

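A Persian aspect-based sentiment analysis dataset. Each example pairs a review with a question about one aspect of the reviewed item, and the label gives the sentiment expressed toward that aspect.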

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The text dataset is in Persian (fa).

Dataset Structure

Data Instances

Here is an example from the dataset:

{
  "review": "خوب بود ولی خیلی گرون شده دیگه...فک نکنم به این قیمت ارزش خرید داشته باشد",
  "review_id": "1538",
  "example_id": "4",
  "excel_id": "food_194",
  "question": "نظر شما در مورد بسته بندی و نگهداری این حلوا شکری، ارده و کنجد چیست؟",
  "category": "حلوا شکری، ارده و کنجد",
  "aspect": "بسته بندی",
  "label": "-3",
  "guid": "food-dev-r1538-e4"
}
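Note that every value in the record, including the ids and the label, is stored as a JSON string. The sketch below parses the example above into a typed Python object and converts the ids and label to integers. The class name SentimentExample and the function parse_example are hypothetical, and it assumes review_id and example_id are always integer-valued strings, which holds for this example but is not stated in the card.

```python
import json
from dataclasses import dataclass

# The example record from the card, copied verbatim (all values are JSON strings).
RAW_RECORD = """{
  "review": "خوب بود ولی خیلی گرون شده دیگه...فک نکنم به این قیمت ارزش خرید داشته باشد",
  "review_id": "1538",
  "example_id": "4",
  "excel_id": "food_194",
  "question": "نظر شما در مورد بسته بندی و نگهداری این حلوا شکری، ارده و کنجد چیست؟",
  "category": "حلوا شکری، ارده و کنجد",
  "aspect": "بسته بندی",
  "label": "-3",
  "guid": "food-dev-r1538-e4"
}"""


@dataclass(frozen=True)
class SentimentExample:  # hypothetical container type, not part of the dataset
    review: str
    review_id: int  # assumption: always an integer-valued string in the data
    example_id: int  # assumption: always an integer-valued string in the data
    excel_id: str
    question: str
    category: str
    aspect: str
    label: int  # one of -3..3 per the card's label definitions
    guid: str


def parse_example(record: dict) -> SentimentExample:
    """Build a typed example, converting the string ids and label to int."""
    return SentimentExample(
        review=record["review"],
        review_id=int(record["review_id"]),
        example_id=int(record["example_id"]),
        excel_id=record["excel_id"],
        question=record["question"],
        category=record["category"],
        aspect=record["aspect"],
        label=int(record["label"]),
        guid=record["guid"],
    )


example = parse_example(json.loads(RAW_RECORD))
print(example.review_id, example.example_id, example.label)  # 1538 4 -3
```

Converting at load time means downstream code can compare labels numerically instead of comparing strings such as "-3" and "3".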

Data Fields

review : the review text.
review_id : a unique id associated with the review.
example_id : an id for one question asked about the review; each question addresses a particular attribute of the reviewed item.
question : a natural language question about a particular attribute.
category : the subject discussed in the review.
aspect : the aspect mentioned in the input question.
label : the overall sentiment toward this particular subject in the context of the mentioned aspect. The labels are defined as follows:

    '-3': 'no sentiment expressed',
    '-2': 'very negative',
    '-1': 'negative',
    '0': 'neutral',
    '1': 'positive',
    '2': 'very positive',
    '3': 'mixed',
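The label values are strings in the data (the example above has "label": "-3"). The sketch below turns the definitions above into a lookup table. It also encodes one reading of the scheme that the card does not state explicitly: only "-2" through "2" form an ordered negative-to-positive scale, while "-3" (no sentiment expressed) and "3" (mixed) are separate categories that should not be averaged or compared numerically with the others. The names LABEL_NAMES, ORDINAL_LABELS, label_name, and is_ordinal are hypothetical.

```python
# Label definitions from the card, keyed by the string values used in the data.
LABEL_NAMES = {
    "-3": "no sentiment expressed",
    "-2": "very negative",
    "-1": "negative",
    "0": "neutral",
    "1": "positive",
    "2": "very positive",
    "3": "mixed",
}

# Assumption: only these labels lie on an ordered polarity scale;
# "-3" and "3" are treated as standalone categories.
ORDINAL_LABELS = {"-2", "-1", "0", "1", "2"}


def label_name(label: str) -> str:
    """Return the human-readable name for a raw label string."""
    return LABEL_NAMES[label]


def is_ordinal(label: str) -> bool:
    """True if the label sits on the very negative .. very positive scale."""
    return label in ORDINAL_LABELS


print(label_name("-3"), is_ordinal("-3"))  # no sentiment expressed False
```

Keeping the keys as strings mirrors the raw data, so no conversion is needed before the lookup.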

Data Splits

See the dataset files for the available splits and their sizes.

Dataset Creation

Curation Rationale

For details, see the corresponding draft (arXiv:2012.06154).

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

CC BY-NC-SA 4.0 License

Citation Information

@article{huggingface:dataset,
    title = {ParsiNLU: A Suite of Language Understanding Challenges for Persian},
    author = {Khashabi, Daniel and Cohan, Arman and Shakeri, Siamak and Hosseini, Pedram and Pezeshkpour, Pouya and Alikhani, Malihe and Aminnaseri, Moin and Bitaab, Marzieh and Brahman, Faeze and Ghazarian, Sarik and others},
    year = {2020},
    journal = {arXiv e-prints},
    eprint = {2012.06154},
}

Contributions

Thanks to @danyaljj for adding this dataset.

Author:

persiannlp

Dataset size:

14.39 KB