
The dataset contains 6273 training samples, 762 validation samples and 749 test samples. Each sample represents one sentence and includes the following features: sentence ID ('sent_id'), list of tokens ('tokens'), list of normalised word forms ('norms'), list of lemmas ('lemmas'), list of Multext-East tags ('xpos_tags'), list of morphological features ('feats'), and list of UPOS tags ('upos_tags'). The UPOS tags are encoded as integer class labels.
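The following is a minimal sketch of how a sample with these features might be checked and decoded. It assumes each sample is available as a plain Python dict keyed by the feature names above. The `UPOS_NAMES` order and the toy sample are hypothetical illustrations, and the helpers `check_sample` and `decode_upos` are not part of the dataset. The real label order is whatever the dataset's own class-label definition specifies.

```python
from typing import Dict, List

# Split sizes as stated above; together they give 7784 sentences.
SPLIT_SIZES = {"train": 6273, "validation": 762, "test": 749}

# HYPOTHETICAL label order: the 17 Universal POS tags, alphabetically.
# The dataset defines its own class-label names and order; check those instead.
UPOS_NAMES: List[str] = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]

FIELDS = ("sent_id", "tokens", "norms", "lemmas", "xpos_tags", "feats", "upos_tags")
TOKEN_LEVEL = FIELDS[1:]  # every field except sent_id holds one item per token


def check_sample(sample: Dict) -> None:
    """Raise ValueError if fields are missing or per-token lists differ in length."""
    missing = [f for f in FIELDS if f not in sample]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    n = len(sample["tokens"])
    for f in TOKEN_LEVEL:
        if len(sample[f]) != n:
            raise ValueError(f"{f!r} has {len(sample[f])} items, expected {n}")


def decode_upos(ids: List[int]) -> List[str]:
    """Map integer UPOS class labels back to tag strings (hypothetical order)."""
    return [UPOS_NAMES[i] for i in ids]


# HYPOTHETICAL toy sample, for illustration only (not taken from the dataset).
sample = {
    "sent_id": "toy.1",
    "tokens": ["Hi", "!"],
    "norms": ["hi", "!"],
    "lemmas": ["hi", "!"],
    "xpos_tags": ["I", "Z"],
    "feats": ["", ""],
    "upos_tags": [6, 12],
}

check_sample(sample)
print(decode_upos(sample["upos_tags"]))  # ['INTJ', 'PUNCT']
print(sum(SPLIT_SIZES.values()))         # 7784
```

Checking that all per-token lists have the same length is a cheap sanity check before training a tagger. A misaligned sample would otherwise silently shift labels relative to tokens.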