数据集:
DFKI-SLT/brat
Brat is an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. BRAT has been developed for rich structured annota- tion for a variety of NLP tasks and aims to support manual curation efforts and increase annotator productivity using NLP techniques. brat is designed in particular for structured annotation, where the notes are not free form text but have a fixed form that can be automatically processed and interpreted by a computer.
Dataset annotated with brat format is processed using this script. Annotations created in brat are stored on disk in a standoff format: annotations are stored separately from the annotated document text, which is never modified by the tool. For each text document in the system, there is a corresponding annotation file. The two are associatied by the file naming convention that their base name (file name without suffix) is the same: for example, the file DOC-1000.ann contains annotations for the file DOC-1000.txt. More information can be found here .
[Needs More Information]
-context: html content of data file as string -file_name: a string name of file -spans: a sequence containing id, type, location and text of a span -relations: a sequence containing id, type and arguments of a relation -equivalence_relations: -events: -attributions: -normalizations: -notes:
brat script can be used by calling load_dataset() method and passing kwargs (arguments to the BuilderConfig ) which should include at least url of the dataset prepared using brat. We provide an example of SciArg dataset below,
from datasets import load_dataset kwargs = { "description" : """This dataset is an extension of the Dr. Inventor corpus (Fisas et al., 2015, 2016) with an annotation layer containing fine-grained argumentative components and relations. It is the first argument-annotated corpus of scientific publications (in English), which allows for joint analyses of argumentation and other rhetorical dimensions of scientific writing.""", "citation" : """@inproceedings{lauscher2018b, title = {An argument-annotated corpus of scientific publications}, booktitle = {Proceedings of the 5th Workshop on Mining Argumentation}, publisher = {Association for Computational Linguistics}, author = {Lauscher, Anne and Glava\v{s}, Goran and Ponzetto, Simone Paolo}, address = {Brussels, Belgium}, year = {2018}, pages = {40–46} }""", "homepage": "https://github.com/anlausch/ArguminSci", "url": "http://data.dws.informatik.uni-mannheim.de/sci-arg/compiled_corpus.zip", "file_name_blacklist": ['A28'], } dataset = load_dataset('dfki-nlp/brat', **kwargs)
[Needs More Information]
@inproceedings{stenetorp-etal-2012-brat, title = "brat: a Web-based Tool for {NLP}-Assisted Text Annotation", author = "Stenetorp, Pontus and Pyysalo, Sampo and Topi{\'c}, Goran and Ohta, Tomoko and Ananiadou, Sophia and Tsujii, Jun{'}ichi", booktitle = "Proceedings of the Demonstrations at the 13th Conference of the {E}uropean Chapter of the Association for Computational Linguistics", month = apr, year = "2012", address = "Avignon, France", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/E12-2021", pages = "102--107", }