Dataset:

allegro_reviews

Task:

Text classification

Subtasks:

sentiment-scoring text-scoring

Languages:

Polish

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

found

Annotation creators:

found

Source datasets:

original

License:

cc-by-sa-4.0

Dataset Card for Allegro Reviews

Dataset Summary

Allegro Reviews is a sentiment analysis dataset, consisting of 11,588 product reviews written in Polish and extracted from Allegro.pl - a popular e-commerce marketplace. Each review contains at least 50 words and has a rating on a scale from one (negative review) to five (positive review).

We recommend using the provided train/dev/test split. The ratings for the test set reviews are kept hidden. You can evaluate your model using the online evaluation tool available on klejbenchmark.com.
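Because the test ratings are hidden, local model selection has to rely on the dev split. The sketch below scores predictions against dev ratings. The card does not say which metric the online tool uses; mean absolute error and its per-class (macro) average are assumptions here, chosen only because the ratings are ordinal. The function names are illustrative.

```python
from collections import defaultdict
from typing import Sequence


def _check_lengths(gold: Sequence[float], pred: Sequence[float]) -> None:
    # Both sequences must be non-empty and aligned one-to-one.
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")


def mean_absolute_error(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Average absolute distance between gold and predicted ratings."""
    _check_lengths(gold, pred)
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def macro_mean_absolute_error(gold: Sequence[float], pred: Sequence[float]) -> float:
    """MAE computed per gold rating, then averaged over the ratings present.

    This gives rare ratings the same weight as frequent ones.
    """
    _check_lengths(gold, pred)
    errors_by_rating = defaultdict(list)
    for g, p in zip(gold, pred):
        errors_by_rating[g].append(abs(g - p))
    per_rating = [sum(errs) / len(errs) for errs in errors_by_rating.values()]
    return sum(per_rating) / len(per_rating)
```

Lower is better for both functions. Scores computed this way on the dev split are not guaranteed to match the numbers reported by the online tool.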

Supported Tasks and Leaderboards

Sentiment analysis of product reviews. Leaderboard: https://klejbenchmark.com/leaderboard/

Languages

Polish

Dataset Structure

Data Instances

Two TSV files (train, dev) with two columns (text, rating) and one TSV file (test) with a single column (text).
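A minimal sketch of reading these files with Python's standard `csv` module. It assumes that each file starts with a header row naming its columns (`text`, `rating`) and that ratings are written as numbers; neither detail is stated in the card. `Review` and `read_reviews` are illustrative names.

```python
import csv
from typing import Iterable, Iterator, NamedTuple, Optional


class Review(NamedTuple):
    text: str
    rating: Optional[float]  # None when the file has no rating column (test set)


def read_reviews(lines: Iterable[str]) -> Iterator[Review]:
    """Yield reviews from tab-separated lines with a header row (assumed)."""
    # QUOTE_NONE keeps quotation marks inside review text as literal characters.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        raw_rating = row.get("rating")
        rating = float(raw_rating) if raw_rating not in (None, "") else None
        yield Review(text=row["text"], rating=rating)
```

With a file on disk, this could be used as `with open("train.tsv", encoding="utf-8", newline="") as f: reviews = list(read_reviews(f))`, where `train.tsv` is a hypothetical local path.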

Data Fields

text: a product review of at least 50 words
rating: product rating on a scale from one (negative review) to five (positive review)
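The two field constraints above can be checked with a small helper. This is a sketch under assumptions: words are counted by splitting on whitespace, which may differ from how the dataset authors counted them, and `is_valid_review` is an illustrative name.

```python
from typing import Optional

MIN_WORDS = 50                    # "at least 50 words"
MIN_RATING, MAX_RATING = 1, 5     # "one (negative) to five (positive)"


def is_valid_review(text: str, rating: Optional[float] = None) -> bool:
    """Check a record against the field descriptions in this card.

    Pass rating=None for test set records, whose ratings are hidden.
    """
    # Whitespace tokenization is an assumption, not the authors' definition.
    if len(text.split()) < MIN_WORDS:
        return False
    if rating is None:
        return True
    return MIN_RATING <= rating <= MAX_RATING
```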

Data Splits

The data is split into train, dev, and test sets.

Dataset Creation

Curation Rationale

This dataset is one of the nine evaluation tasks of the KLEJ benchmark, which was created to improve Polish language processing.

Source Data

Initial Data Collection and Normalization

Allegro Reviews is a set of product reviews from Allegro.pl, a popular e-commerce marketplace.

Who are the source language producers?

Customers of an e-commerce marketplace.

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Allegro Machine Learning Research team (klejbenchmark@allegro.pl)

Licensing Information

The dataset is licensed under CC BY-SA 4.0.

Citation Information

@inproceedings{rybak-etal-2020-klej,
    title = "{KLEJ}: Comprehensive Benchmark for Polish Language Understanding",
    author = "Rybak, Piotr and Mroczkowski, Robert and Tracz, Janusz and Gawlik, Ireneusz",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.111",
    pages = "1191--1201",
}

Contributions

Thanks to @abecadel for adding this dataset.

Author:

Anonymous

Dataset size:

11.82 KB