Datasets:
classla/ssj500k
The dataset contains 7432 training samples, 1164 validation samples, and 893 test samples. Each sample represents one sentence and includes the following features: sentence ID ('sent_id'), list of tokens ('tokens'), list of lemmas ('lemmas'), list of Multext-East tags ('xpos_tags'), list of UPOS tags ('upos_tags'), list of morphological features ('feats'), list of IOB tags ('iob_tags'), and list of universal dependency tags ('uds'). Three dataset configurations are available, each encoding the corresponding feature as class labels: 'ner', 'upos', and 'ud'.
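
As a rough illustration of the sample layout described above, the following Python sketch checks that the per-token lists of a sample line up with its tokens. It assumes that each list-valued feature has one entry per token, which the description suggests but does not state outright. The helper names (`misaligned_features`, `load_ssj500k`) and the toy sample are hypothetical and not taken from the dataset; the toy values are placeholders, and in the real configurations the target feature is encoded as class labels rather than strings. The loader assumes the Hugging Face `datasets` library is installed and that the configuration name is passed as the second argument; depending on the library version, extra arguments may be needed.

```python
# Features from the description that are lists, assumed here to hold one entry per token.
TOKEN_LEVEL_FEATURES = [
    "tokens", "lemmas", "xpos_tags", "upos_tags", "feats", "iob_tags", "uds",
]


def misaligned_features(sample):
    """Return the names of list features whose length differs from the token count."""
    n_tokens = len(sample["tokens"])
    return [
        name
        for name in TOKEN_LEVEL_FEATURES
        if name in sample and len(sample[name]) != n_tokens
    ]


def load_ssj500k(config="ner"):
    """Hypothetical helper: load one configuration ('ner', 'upos' or 'ud').

    Needs network access and the `datasets` package, so it is only defined
    here, not called.
    """
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset("classla/ssj500k", config)


# Made-up toy sample with a subset of the features; NOT real dataset content.
toy_sample = {
    "sent_id": "toy.1",
    "tokens": ["Ana", "bere", "."],
    "lemmas": ["Ana", "brati", "."],
    "upos_tags": ["PROPN", "VERB", "PUNCT"],
    "iob_tags": ["B-PER", "O", "O"],
}

problems = misaligned_features(toy_sample)
```

With a real split, the same check could be applied to each sample, for example `misaligned_features(load_ssj500k("ner")["train"][0])`, assuming the splits are exposed under the usual `train`, `validation`, and `test` keys.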