Datasets: universal_morphologies
Multilinguality: monolingual
Language creators: found
Annotation creators: expert-generated
Source datasets: original
Other: morphology
License: cc-by-sa-3.0

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology in the world’s languages. The goal of UniMorph is to annotate morphological data in a universal schema that allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and by its inflectional form, rendered as a bundle of morphological features from our schema. The schema is specified in Sylak-Glassman (2016).
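The following minimal Python sketch shows the "lemma plus feature bundle" idea in code. It assumes the plain-text layout of the UniMorph data files: one line per inflected form, holding the lemma, the form and the feature bundle separated by tabs, with the individual features joined by `;`. The `InflectedForm` class, the `parse_unimorph_line` helper and the sample line are hypothetical illustrations, not part of this dataset's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InflectedForm:
    """An inflected word: lexical meaning via the lemma, inflection via features."""
    lemma: str
    form: str
    features: frozenset[str]  # the feature bundle is treated as unordered


def parse_unimorph_line(line: str) -> InflectedForm:
    """Parse one 'lemma<TAB>form<TAB>F1;F2;...' line (assumed file layout)."""
    lemma, form, tags = line.rstrip("\n").split("\t")
    return InflectedForm(lemma, form, frozenset(tags.split(";")))


# Hypothetical sample line, written for illustration only.
entry = parse_unimorph_line("ablate\tablates\tV;PRS;3;SG")
print(entry.lemma, entry.form, sorted(entry.features))
```

Storing the features as a `frozenset` reflects that a bundle is a set rather than a sequence: `V;3;SG;PRS` and `V;PRS;3;SG` parse to equal entries.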
[More Information Needed]
The current version of the UniMorph dataset covers 110 languages.
Each data instance comprises a lemma and a set of possible realizations with morphological and meaning annotations. For example:
{'forms': {'Aktionsart': [[], [], [], [], []],
           'Animacy': [[], [], [], [], []],
           ...
           'Finiteness': [[], [], [], [1], []],
           ...
           'Number': [[], [], [0], [], []],
           'Other': [[], [], [], [], []],
           'Part_Of_Speech': [[7], [10], [7], [7], [10]],
           ...
           'Tense': [[1], [1], [0], [], [0]],
           ...
           'word': ['ablated', 'ablated', 'ablates', 'ablate', 'ablating']},
 'lemma': 'ablate'}
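In this example `forms` is column-oriented: each feature dimension holds one list per inflected form, aligned by position with `word`. The minimal sketch below turns such an instance into one record per form. It keeps only the dimensions visible in the example; the ones hidden behind `...` are omitted, not assumed to be empty. `to_records` is a hypothetical helper name, and the integer codes are left as-is because the mapping from codes to tag names is not given here.

```python
# Trimmed copy of the example instance: only the dimensions with values that are
# visible above are kept; dimensions elided by "..." are simply left out.
instance = {
    "lemma": "ablate",
    "forms": {
        "Finiteness": [[], [], [], [1], []],
        "Number": [[], [], [0], [], []],
        "Part_Of_Speech": [[7], [10], [7], [7], [10]],
        "Tense": [[1], [1], [0], [], [0]],
        "word": ["ablated", "ablated", "ablates", "ablate", "ablating"],
    },
}


def to_records(instance: dict) -> list[dict]:
    """Transpose the column-oriented `forms` dict into one record per word."""
    forms = instance["forms"]
    records = []
    for i, word in enumerate(forms["word"]):
        # Collect the non-empty annotations that sit at position i in each dimension.
        features = {
            dim: values[i]
            for dim, values in forms.items()
            if dim != "word" and values[i]
        }
        records.append({"lemma": instance["lemma"], "word": word, "features": features})
    return records


for record in to_records(instance):
    print(record["word"], record["features"])
```

On the trimmed instance, the fourth record is `ablate` with `{'Finiteness': [1], 'Part_Of_Speech': [7]}`, and the two `ablated` entries are told apart only by their `Part_Of_Speech` codes.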
Each instance in the dataset has the following fields:
[More Information Needed]
Who are the source language producers?
[More Information Needed]
[More Information Needed]
Who are the annotators?
[More Information Needed]
[More Information Needed]
Thanks to @yjernite for adding this dataset.