Datasets:

d0rj/samsum-ru

Languages:

ru

Multilinguality:

monolingual

Size categories:

10K<n<100K

Language creators:

translated

Annotations creators:

expert-generated

Source datasets:

samsum

ArXiv:

arxiv:1911.12237

Dataset Card for SAMSum Corpus (ru)

Dataset Description

A Russian translation of the SAMSum dataset.

Notes

The row with ID 13828807 was removed from the dataset.

Links

Languages

Russian (translated from the English SAMSum dataset using Google Translator)

Dataset Structure

Data Fields

  • dialogue: text of the dialogue.
  • summary: human-written summary of the dialogue.
  • id: unique file id of an example.
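As a rough illustration of the three fields listed above, the sketch below checks that a record carries a non-empty `id`, `dialogue`, and `summary`. The helper name `missing_or_empty_fields`, the sample record, and the choice to treat `id` as a string are assumptions made for this example; they are not part of the dataset itself.

```python
from typing import Any, List, Mapping

# Field names taken from the "Data Fields" list of this card.
REQUIRED_FIELDS = ("id", "dialogue", "summary")


def missing_or_empty_fields(record: Mapping[str, Any]) -> List[str]:
    """Return the required field names that are absent or blank in `record`."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # A field counts as a problem if it is missing or only whitespace.
        if value is None or not str(value).strip():
            problems.append(name)
    return problems


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "id": "00000000",
    "dialogue": "Аня: Привет!\nБорис: Привет!",
    "summary": "Аня и Борис здороваются.",
}

print(missing_or_empty_fields(example))
```

A check like this can be run over each split before training to catch rows whose translation came back empty.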

Data Splits

  • train: 14731
  • val: 818
  • test: 819
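The split sizes above can be compared against a loaded copy of the dataset. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the hub is reachable. The helper names are hypothetical, and the split keys returned by the hub may differ from the labels used in this card (for example `validation` instead of `val`), so the comparison is keyed on whatever names the caller passes in.

```python
from typing import Dict

# Sizes as stated in the "Data Splits" section of this card.
EXPECTED_SPLIT_SIZES = {"train": 14731, "val": 818, "test": 819}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def size_mismatches(actual: Dict[str, int], expected: Dict[str, int]) -> Dict[str, tuple]:
    """Map split name -> (expected, actual) for every split whose size differs."""
    return {
        name: (count, actual.get(name))
        for name, count in expected.items()
        if actual.get(name) != count
    }


def load_split_sizes(repo_id: str = "d0rj/samsum-ru") -> Dict[str, int]:
    """Download the dataset and return its row count per split (needs network)."""
    from datasets import load_dataset  # imported lazily: optional dependency

    dataset = load_dataset(repo_id)
    return {name: split.num_rows for name, split in dataset.items()}


# Usage (not executed here because it downloads data):
#   actual = load_split_sizes()
#   print(size_mismatches(actual, EXPECTED_SPLIT_SIZES))
```

By the card's numbers, the training split holds roughly 90% of the 16,368 examples, with the remainder divided almost evenly between validation and test.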

Licensing Information

non-commercial licence: CC BY-NC-ND 4.0

Citation Information

@inproceedings{gliwa-etal-2019-samsum,
    title = "{SAMS}um Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization",
    author = "Gliwa, Bogdan  and
      Mochol, Iwona  and
      Biesek, Maciej  and
      Wawer, Aleksander",
    booktitle = "Proceedings of the 2nd Workshop on New Frontiers in Summarization",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-5409",
    doi = "10.18653/v1/D19-5409",
    pages = "70--79"
}