Dataset:
classla/janes_tag
Languages:
si
License:
cc-by-sa-4.0

The dataset contains 6273 training samples, 762 validation samples, and 749 test samples. Each sample represents a sentence and includes the following features: sentence ID ('sent_id'), list of tokens ('tokens'), list of normalised word forms ('norms'), list of lemmas ('lemmas'), list of Multext-East tags ('xpos_tags'), list of morphological features ('feats'), and list of UPOS tags ('upos_tags'), which are encoded as class labels.
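As a rough illustration of the structure described above, the following sketch loads the dataset with the Hugging Face `datasets` library and decodes the UPOS class labels of the first training sentence. Several things here are assumptions rather than facts from this card: the split names (`train`, `validation`, `test`), that `upos_tags` is stored as a sequence of `ClassLabel` values, and that the dataset can be loaded directly by its identifier in your installed version of `datasets`. The helper names `decode_labels` and `load_and_check` are hypothetical.

```python
from typing import List, Sequence

# Split sizes as stated in the description; the split names are an assumption.
EXPECTED_SPLIT_SIZES = {"train": 6273, "validation": 762, "test": 749}


def decode_labels(label_ids: Sequence[int], label_names: Sequence[str]) -> List[str]:
    """Map integer class-label ids back to their string names."""
    return [label_names[i] for i in label_ids]


def load_and_check():
    """Load the dataset, print split sizes, and decode one sentence's UPOS tags.

    Requires network access and the `datasets` package.
    """
    # Imported lazily so decode_labels stays usable without the library installed.
    from datasets import load_dataset

    dataset = load_dataset("classla/janes_tag")

    # Compare the actual split sizes with the numbers given in the description.
    for split, expected in EXPECTED_SPLIT_SIZES.items():
        print(split, len(dataset[split]), "expected", expected)

    first = dataset["train"][0]
    # Assumption: 'upos_tags' is a Sequence of ClassLabel, so `.feature.names`
    # holds the list of tag strings indexed by label id.
    upos_feature = dataset["train"].features["upos_tags"].feature
    print(first["sent_id"], first["tokens"])
    print(decode_labels(first["upos_tags"], upos_feature.names))
    return dataset
```

Calling `load_and_check()` downloads the data, so it needs network access. `decode_labels` can be used on its own with any list of label names.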