Dataset:

wrbsc

Task:

text-classification

Subtask:

semantic-similarity-classification

Language:

Multilinguality:

monolingual

Size:

1K<n<10K

Language creators:

found

Annotation creators:

expert-generated

Source datasets:

original

License:

cc-by-sa-3.0

Dataset Card for wrbsc

Dataset Summary

The WUT Relations Between Sentences Corpus contains 2827 pairs of related sentences. The relationships are derived from Cross-document Structure Theory (CST), which supports multi-document summarization by identifying cross-document rhetorical relationships within a cluster of related documents. Every relation was marked by at least 3 annotators.

Supported Tasks and Leaderboards

[Needs More Information]

Languages

Polish

Dataset Structure

Data Instances

An example contains two related sentences and a class representing the type of relationship between those sentences.

{'relationship': 0,
 'sentence1': 'Znajdujące się w Biurze Bezpieczeństwa Narodowego akta Komisji Weryfikacyjnej WSI zostały przewiezione do siedziby Służby Kontrwywiadu Wojskowego.',
 'sentence2': '2008-07-03: Wywiezienie akt dotyczących WSI – sprawa dla prokuratury?'}

Data Fields

sentence1: the first sentence being compared (string)
sentence2: the second sentence being compared (string)
relationship: the type of relationship between the two sentences. It can be one of the 16 classes listed below:
- Krzyżowanie_się : crossing
- Tło_historyczne : historical background
- Źródło : source
- Dalsze_informacje : additional information
- Zawieranie : inclusion
- Opis : description
- Uszczegółowienie : further detail
- Parafraza : paraphrase
- Spełnienie : fulfillment
- Mowa_zależna : indirect speech
- Zmiana_poglądu : change of opinion
- Streszczenie : summarization
- Tożsamość : identity
- Sprzeczność : conflict
- Modalność : modality
- Cytowanie : quotation
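The data instance above stores relationship as an integer, while the classes are listed by name. A minimal sketch of decoding that integer follows. It assumes the integer ids follow the order in which the classes are listed here, which this card does not state explicitly; RELATIONSHIP_NAMES and relationship_name are hypothetical helper names, not part of the dataset.

```python
# Class names copied from the list above.
# ASSUMPTION: integer label ids follow this listing order (0 = first entry).
RELATIONSHIP_NAMES = [
    "Krzyżowanie_się", "Tło_historyczne", "Źródło", "Dalsze_informacje",
    "Zawieranie", "Opis", "Uszczegółowienie", "Parafraza",
    "Spełnienie", "Mowa_zależna", "Zmiana_poglądu", "Streszczenie",
    "Tożsamość", "Sprzeczność", "Modalność", "Cytowanie",
]


def relationship_name(label_id: int) -> str:
    """Return the class name for an integer label, under the assumed ordering."""
    if not 0 <= label_id < len(RELATIONSHIP_NAMES):
        raise ValueError(f"label id out of range: {label_id}")
    return RELATIONSHIP_NAMES[label_id]


# The instance shown under Data Instances, with the sentences shortened.
example = {"relationship": 0, "sentence1": "...", "sentence2": "..."}
print(relationship_name(example["relationship"]))
```

If the loaded dataset exposes its own label names, those should be preferred over this hand-copied list.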

Data Splits

Single train split
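Because only a train split is provided, users who need an evaluation set have to hold one out themselves. The sketch below shows one simple way to do that with a seeded shuffle. The function name holdout_split, the 10% fraction, and the placeholder rows are illustrative assumptions, not part of the dataset or a recommended protocol.

```python
import random


def holdout_split(rows, eval_fraction=0.1, seed=0):
    """Shuffle a copy of rows with a fixed seed and hold out a fraction for evaluation."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    # First n_eval shuffled rows become the evaluation set, the rest stay in train.
    return shuffled[n_eval:], shuffled[:n_eval]


# Synthetic placeholder rows that mimic the three fields; not real corpus data.
rows = [
    {"sentence1": f"a{i}", "sentence2": f"b{i}", "relationship": i % 16}
    for i in range(100)
]
train_rows, eval_rows = holdout_split(rows)
print(len(train_rows), len(eval_rows))
```

A plain random split like this does not preserve class proportions; with 16 classes and fewer than 3000 pairs, a stratified split may be the better choice.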

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)

Citation Information

@misc{11321/305,	
 title = {{WUT} Relations Between Sentences Corpus},	
 author = {Oleksy, Marcin and Fikus, Dominika and Wolski, Micha{\l} and Podbielska, Ma{\l}gorzata and Turek, Agnieszka and Kędzia, Pawe{\l}},	
 url = {http://hdl.handle.net/11321/305},	
 note = {{CLARIN}-{PL} digital repository},	
 copyright = {Attribution-{ShareAlike} 3.0 Unported ({CC} {BY}-{SA} 3.0)},	
 year = {2016}	
}

Contributions

Thanks to @kldarek for adding this dataset.

Author:

Anonymous

Dataset size:

14.75 KB