Dataset:

allenai/peer_read

Tasks:

Text classification

Languages:

English

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

found

Annotation creators:

expert-generated

Source datasets:

original

Preprint archive:

arxiv:1804.09635

Other:

acceptability-classification

License:

license:unknown

Dataset Card for peer_read

Dataset Summary

PeerRead is a dataset of scientific peer reviews, made available to help researchers study this important artifact. It consists of over 14K paper drafts and the corresponding accept/reject decisions from top-tier venues including ACL, NIPS and ICLR, as well as over 10K textual peer reviews written by experts for a subset of the papers.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

English (en)

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

parsed_pdfs

name : string Filename in the dataset
metadata : dict Paper metadata
- source : string Paper source
- authors : list<string> List of paper authors
- title : string Paper title
- sections : list<dict> List of section heading and corresponding description
  - heading : string Section heading
  - text : string Section description
- references : list<dict> List of references
  - title : string Title of reference paper
  - author : list<string> List of reference paper authors
  - venue : string Reference venue
  - citeRegEx : string Reference citeRegEx
  - shortCiteRegEx : string Reference shortCiteRegEx
  - year : int Reference publish year
- referenceMentions : list<dict> List of reference mentions
  - referenceID : int Reference mention ID
  - context : string Reference mention context
  - startOffset : int Reference startOffset
  - endOffset : int Reference endOffset
- year : int Paper publish year
- abstractText : string Paper abstract
- creator : string Paper creator
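
To make the nesting above concrete, here is a minimal Python sketch of a record shaped like the parsed_pdfs field list. It assumes, as the sub-field lists suggest, that sections, references and referenceMentions are lists of dicts, and that referenceID is an index into references; the card states neither, and a loader may deliver these fields in another layout, such as a dict of parallel lists. Every value in the record, and the helper names full_text and cited_titles, are hypothetical and invented for illustration.

```python
from typing import Any, Dict, List

# Hypothetical record: the keys follow the field list above,
# but every value is invented for illustration.
example_paper: Dict[str, Any] = {
    "name": "example.pdf",
    "metadata": {
        "source": "placeholder",
        "authors": ["A. Author", "B. Author"],
        "title": "A Hypothetical Paper Title",
        "sections": [
            {"heading": "Introduction", "text": "We study peer review."},
            {"heading": "Method", "text": "We train a classifier."},
        ],
        "references": [
            {
                "title": "An Earlier Paper",
                "author": ["C. Author"],
                "venue": "placeholder venue",
                "citeRegEx": "Author,? 2017",
                "shortCiteRegEx": "Author",
                "year": 2017,
            }
        ],
        "referenceMentions": [
            {
                "referenceID": 0,
                "context": "as shown by Author (2017)",
                "startOffset": 12,
                "endOffset": 25,
            }
        ],
        "year": 2018,
        "abstractText": "A short placeholder abstract.",
        "creator": "placeholder",
    },
}


def full_text(paper: Dict[str, Any]) -> str:
    """Join each section heading with its text, separated by blank lines."""
    sections = paper["metadata"]["sections"]
    return "\n\n".join(f"{s['heading']}\n{s['text']}" for s in sections)


def cited_titles(paper: Dict[str, Any]) -> List[str]:
    """Look up the title of the reference behind each mention.

    Assumes referenceID indexes into the references list; mentions whose
    ID is out of range are skipped.
    """
    meta = paper["metadata"]
    refs = meta["references"]
    return [
        refs[m["referenceID"]]["title"]
        for m in meta["referenceMentions"]
        if 0 <= m["referenceID"] < len(refs)
    ]
```

Calling full_text(example_paper) yields the section headings and bodies as one string, and cited_titles(example_paper) lists the titles that the mentions point to.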

reviews

id : int Review ID
conference : string Conference name
comments : string Review comments
subjects : string Review subjects
version : string Review version
date_of_submission : string Submission date
title : string Paper title
authors : list<string> List of paper authors
accepted : bool Paper accepted flag
abstract : string Paper abstract
histories : list<string> Paper details with link
reviews : dict Paper reviews
- date : string Date of review
- title : string Paper title
- other_keys : string Reviewer other details
- originality : string Originality score
- comments : string Reviewer comments
- is_meta_review : bool Review type flag
- recommendation : string Reviewer recommendation
- replicability : string Replicability score
- presentation_format : string Presentation type
- clarity : string Clarity score
- meaningful_comparison : string Meaningful comparison score
- substance : string Substance score
- reviewer_confidence : string Reviewer confidence score
- soundness_correctness : string Soundness correctness score
- appropriateness : string Appropriateness score
- impact : string Impact score
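
Since the card tags this dataset for acceptability classification, the following Python sketch shows one way the reviews fields might be turned into a (text, label) pair, and how a score stored as a string could be read as a number. It is an illustration under assumptions, not the authors' pipeline: the record values are invented, the 1/0 label encoding is a choice made here, and the idea that scores are integer-like strings is a guess the card does not confirm. The names to_classification_example and parse_score are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical top-level record: keys follow the field list above,
# values are invented for illustration.
example_submission: Dict[str, Any] = {
    "id": 1,
    "conference": "ICLR",
    "title": "A Hypothetical Paper Title",
    "authors": ["A. Author"],
    "accepted": True,
    "abstract": "A short placeholder abstract.",
}

# Hypothetical single review: score fields are strings, as in the field list.
example_review: Dict[str, Any] = {
    "recommendation": "4",
    "clarity": " 3 ",
    "impact": "",
    "is_meta_review": False,
}


def to_classification_example(record: Dict[str, Any]) -> Tuple[str, int]:
    """Build a (text, label) pair from title, abstract and the accepted flag.

    The label encoding (1 = accepted, 0 = rejected) is an assumption.
    """
    text = f"{record['title']}\n\n{record['abstract']}"
    label = 1 if record["accepted"] else 0
    return text, label


def parse_score(value: Optional[str]) -> Optional[int]:
    """Read a score string as an int; return None if it is missing or not an integer."""
    try:
        return int(value.strip())
    except (ValueError, AttributeError):
        return None
```

Returning None for empty or non-numeric scores keeps missing values distinct from a real score of zero, which matters if the scores are later averaged.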

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Dongyeop Kang, Waleed Ammar, Bhavana Dalvi Mishra, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz

Licensing Information

[More Information Needed]

Citation Information

@inproceedings{kang18naacl,
  title = {A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications},
  author = {Dongyeop Kang and Waleed Ammar and Bhavana Dalvi and Madeleine van Zuylen and Sebastian Kohlmeier and Eduard Hovy and Roy Schwartz},
  booktitle = {Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)},
  address = {New Orleans, USA},
  month = {June},
  url = {https://arxiv.org/abs/1804.09635},
  year = {2018}
}

Contributions

Thanks to @vinaykudari for adding this dataset.

Author:

allenai

Dataset size:

31.27 KB