Dataset:

bprec

Languages:

pl

Multilinguality:

monolingual

Size categories:

1K<n<10K

Language creators:

expert-generated

Annotation creators:

expert-generated

Source datasets:

original

Dataset Card for bprec

Dataset Summary

Brand-Product Relation Extraction Corpora in Polish

Supported Tasks and Leaderboards

NER, Entity linking

Languages

Polish

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

  • id: int, identifier of the text
  • text: string, the text itself, for example a consumer comment on social media
  • ner: extracted entities and the relations between them
    • source and target: a pair of related entities identified in the text, each described by:
      • from: int, position of the entity's starting character
      • text: string, the entity text
      • to: int, position of the entity's end character
      • type: one of the predefined entity types:
        • PRODUCT_NAME
        • PRODUCT_NAME_IMP
        • PRODUCT_NO_BRAND
        • BRAND_NAME
        • BRAND_NAME_IMP
        • VERSION
        • PRODUCT_ADJ
        • BRAND_ADJ
        • LOCATION
        • LOCATION_IMP
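
The field layout above can be illustrated with a minimal Python sketch. The record below is invented for illustration and is not taken from the corpus. The sketch assumes that `ner` is a list of source/target pairs and that `to` is an exclusive end offset; neither is confirmed by this card. The helpers `relation_pairs` and `span_matches` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical record, invented for illustration only (not real corpus data).
# Assumptions: "ner" is a list of source/target pairs; "to" is an exclusive offset.
record = {
    "id": 0,
    "text": "Telefon Alfa firmy Beta działa dobrze.",
    "ner": [
        {
            "source": {"from": 8, "text": "Alfa", "to": 12, "type": "PRODUCT_NAME"},
            "target": {"from": 19, "text": "Beta", "to": 23, "type": "BRAND_NAME"},
        }
    ],
}

Entity = Tuple[str, str]  # (entity text, entity type)


def relation_pairs(rec: Dict) -> List[Tuple[Entity, Entity]]:
    """Return each annotated relation as ((source text, type), (target text, type))."""
    pairs = []
    for relation in rec["ner"]:
        src, tgt = relation["source"], relation["target"]
        pairs.append(((src["text"], src["type"]), (tgt["text"], tgt["type"])))
    return pairs


def span_matches(text: str, entity: Dict) -> bool:
    """Check that the from/to offsets select the entity text (exclusive end assumed)."""
    return text[entity["from"]:entity["to"]] == entity["text"]


for source, target in relation_pairs(record):
    print(source, "->", target)
```

If the real data turns out to use an inclusive end offset, `span_matches` would need `entity["to"] + 1` as the slice end; checking this on a few records before relying on the offsets is a cheap safeguard.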

Data Splits

No train/validation/test split is provided. The current dataset configurations correspond to four domain categories of texts:

  • tele
  • electro
  • cosmetics
  • banking
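
Because no official split exists, users who need one have to create it themselves. The following is one possible sketch of a reproducible split over record indices, using only the Python standard library. The function name `split_indices`, the 80/10/10 proportions, and the seed are assumptions chosen for illustration and are not part of the dataset.

```python
import random
from typing import Dict, List


def split_indices(
    n: int, train_frac: float = 0.8, valid_frac: float = 0.1, seed: int = 0
) -> Dict[str, List[int]]:
    """Shuffle the indices 0..n-1 with a fixed seed and cut them into three parts.

    The proportions and the seed are illustrative assumptions, not dataset metadata.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so global random state is untouched
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    return {
        "train": indices[:n_train],
        "validation": indices[n_train:n_train + n_valid],
        "test": indices[n_train + n_valid:],  # remainder goes to test
    }


# Example: split 100 records of a single domain configuration.
splits = split_indices(100)
print({name: len(ids) for name, ids in splits.items()})
```

Applying the function separately to each domain configuration (tele, electro, cosmetics, banking) keeps the domain proportions the same across the resulting parts.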

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

@inproceedings{inproceedings,
author = {Janz, Arkadiusz and Kopociński, Łukasz and Piasecki, Maciej and Pluwak, Agnieszka},
year = {2020},
month = {05},
pages = {},
title = {Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations}
}

Contributions

Thanks to @kldarek for adding this dataset.