Dataset: nkjp-ner
Task: token classification
Language: pl
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: other
Annotation creators: expert-generated
Source datasets: original
License: gpl-3.0

A linguistic corpus is a collection of texts in which one can find the typical use of a single word or phrase, as well as its meaning and grammatical function. Today, linguistic research is practically impossible without access to a language corpus. The same holds for writing dictionaries, grammars and language-teaching books, and for building search engines sensitive to Polish inflection, machine translation engines and other advanced language technology. Language corpora have become an essential tool for linguists, but they are also useful to software engineers, scholars of literature and culture, historians, librarians and other specialists in the arts and computer science. The manually annotated 1-million-word subcorpus of the NKJP (Narodowy Korpus Języka Polskiego, the National Corpus of Polish) is available under GNU GPL v.3.
Named entity recognition
[More Information Needed]
Polish
There are two TSV files (train, dev) with two columns (sentence, target) and one TSV file (test) with a single column (sentence).
The data is split into train, dev and test sets.
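The file layout described above can be read with the Python standard library alone. The following is a minimal sketch, not an official loader: it assumes each TSV file starts with a header row naming its columns (`sentence`, `target` for train/dev; `sentence` for test), and the function name `read_nkjp_tsv` and the example labels are hypothetical placeholders, not the dataset's real tag set.

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple

# Hypothetical record type: (sentence, target); target is None for the unlabeled test split.
Example = Tuple[str, Optional[str]]


def read_nkjp_tsv(lines: Iterable[str], has_target: bool = True) -> List[Example]:
    """Parse one split of the dataset from TSV lines.

    Assumption: the first row is a header naming the columns,
    i.e. "sentence<TAB>target" for train/dev and "sentence" for test.
    """
    # QUOTE_NONE treats quote characters inside sentences as ordinary text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    examples: List[Example] = []
    for row in reader:
        # The test split has no "target" column, so its label is left as None.
        target = row["target"] if has_target else None
        examples.append((row["sentence"], target))
    return examples


if __name__ == "__main__":
    # In-memory demo with placeholder labels ("label_a" is not a real NKJP tag).
    demo = io.StringIO("sentence\ttarget\nZdanie pierwsze.\tlabel_a\n")
    print(read_nkjp_tsv(demo))  # [('Zdanie pierwsze.', 'label_a')]
```

To read real files, pass an open file handle instead of the `StringIO` demo, e.g. `open("train.tsv", encoding="utf-8", newline="")` (the file name is an assumption), and call `read_nkjp_tsv(f, has_target=False)` for the test split.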
This dataset is one of nine evaluation tasks intended to improve Polish language processing.
[More Information Needed]
Who are the source language producers?
[More Information Needed]
[More Information Needed]
Who are the annotators?
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
GNU GPL v.3
@book{przepiorkowski2012narodowy,
  title={Narodowy korpus j{\k{e}}zyka polskiego},
  author={Przepi{\'o}rkowski, Adam},
  year={2012},
  publisher={Naukowe PWN}
}
Thanks to @abecadel for adding this dataset.