Dataset:
jfleg
JFLEG (JHU FLuency-Extended GUG) is an English grammatical error correction (GEC) corpus. It is a gold-standard benchmark for developing and evaluating GEC systems with respect to fluency (the extent to which a text sounds native) as well as grammaticality. Each source sentence comes with four human-written corrections.
Grammatical error correction.
English (native as well as L2 writers)
Each instance contains a source sentence and four corrections. For example:
{
  'sentence': "They are moved by solar energy .",
  'corrections': [
    "They are moving by solar energy .",
    "They are moved by solar energy .",
    "They are moved by solar energy .",
    "They are propelled by solar energy ."
  ]
}
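Because annotators may leave a sentence untouched or agree with one another, the four corrections are not always distinct. The sketch below reuses the example instance to count how many corrections equal the source and to list the distinct rewrites. The helper name `summarize_corrections` is hypothetical and not part of the dataset or any library.

```python
# The instance is copied from the example above (with the dict made valid Python).
instance = {
    "sentence": "They are moved by solar energy .",
    "corrections": [
        "They are moving by solar energy .",
        "They are moved by solar energy .",
        "They are moved by solar energy .",
        "They are propelled by solar energy .",
    ],
}


def summarize_corrections(instance):
    """Hypothetical helper: return (unchanged_count, distinct_rewrites)."""
    source = instance["sentence"]
    corrections = instance["corrections"]
    # Corrections identical to the source mean the annotator changed nothing.
    unchanged = sum(1 for c in corrections if c == source)
    # dict.fromkeys drops duplicates while preserving first-seen order.
    rewrites = list(dict.fromkeys(c for c in corrections if c != source))
    return unchanged, rewrites


unchanged, rewrites = summarize_corrections(instance)
print(unchanged)  # 2
print(rewrites)
```

For this instance, two annotators kept the source as is and the other two produced different rewrites, which illustrates why systems are typically scored against all four references rather than one.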
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This benchmark was proposed by Napoles et al., 2017.
@InProceedings{napoles-sakaguchi-tetreault:2017:EACLshort,
  author    = {Napoles, Courtney and Sakaguchi, Keisuke and Tetreault, Joel},
  title     = {JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction},
  booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
  month     = {April},
  year      = {2017},
  address   = {Valencia, Spain},
  publisher = {Association for Computational Linguistics},
  pages     = {229--234},
  url       = {http://www.aclweb.org/anthology/E17-2037}
}

@InProceedings{heilman-EtAl:2014:P14-2,
  author    = {Heilman, Michael and Cahill, Aoife and Madnani, Nitin and Lopez, Melissa and Mulholland, Matthew and Tetreault, Joel},
  title     = {Predicting Grammaticality on an Ordinal Scale},
  booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  month     = {June},
  year      = {2014},
  address   = {Baltimore, Maryland},
  publisher = {Association for Computational Linguistics},
  pages     = {174--180},
  url       = {http://www.aclweb.org/anthology/P14-2029}
}
Thanks to @j-chim for adding this dataset.