数据集:

allenai/peer_read

语言:

en

计算机处理:

monolingual

大小:

10K<n<100K

语言创建人:

found

批注创建人:

expert-generated

源数据集:

original

预印本库:

arxiv:1804.09635
中文

Dataset Card for peer_read

Dataset Summary

PearRead is a dataset of scientific peer reviews available to help researchers study this important artifact. The dataset consists of over 14K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR, as well as over 10K textual peer reviews written by experts for a subset of the papers.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

en-English

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

parsed_pdfs
  • name : string Filename in the dataset
  • metadata : dict Paper metadata
    • source : string Paper source
    • authors : list<string> List of paper authors
    • title : string Paper title
    • sections : list<dict> List of section heading and corresponding description
      • heading : string Section heading
      • text : string Section description
    • references : string List of references
      • title : string Title of reference paper
      • author : list<string> List of reference paper authors
      • venue : string Reference venue
      • citeRegEx : string Reference citeRegEx
      • shortCiteRegEx : string Reference shortCiteRegEx
      • year : int Reference publish year
    • referenceMentions : list<string> List of reference mentions
      • referenceID : int Reference mention ID
      • context : string Reference mention context
      • startOffset : int Reference startOffset
      • endOffset : int Reference endOffset
    • year : int Paper publish year
    • abstractText : string Paper abstract
    • creator : string Paper creator
reviews
  • id : int Review ID
  • conference : string Conference name
  • comments : string Review comments
  • subjects : string Review subjects
  • version : string Review version
  • date_of_submission : string Submission date
  • title : string Paper title
  • authors : list<string> List of paper authors
  • accepted : bool Paper accepted flag
  • abstract : string Paper abstract
  • histories : list<string> Paper details with link
  • reviews : dict Paper reviews
    • date : string Date of review
    • title : string Paper title
    • other_keys : string Reviewer other details
    • originality : string Originality score
    • comments : string Reviewer comments
    • is_meta_review : bool Review type flag
    • recommendation : string Reviewer recommendation
    • replicability : string Replicability score
    • presentation_format : string Presentation type
    • clarity : string Clarity score
    • meaningful_comparison : string Meaningful comparison score
    • substance : string Substance score
    • reviewer_confidence : string Reviewer confidence score
    • soundness_correctness : string Soundness correctness score
    • appropriateness : string Appropriateness score
    • impact : string Impact score

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Dongyeop Kang, Waleed Ammar, Bhavana Dalvi Mishra, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz

Licensing Information

[More Information Needed]

Citation Information

@inproceedings{kang18naacl, title = {A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications}, author = {Dongyeop Kang and Waleed Ammar and Bhavana Dalvi and Madeleine van Zuylen and Sebastian Kohlmeier and Eduard Hovy and Roy Schwartz}, booktitle = {Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)}, address = {New Orleans, USA}, month = {June}, url = { https://arxiv.org/abs/1804.09635} , year = {2018} }

Contributions

Thanks to @vinaykudari for adding this dataset.