Dataset:

pasinit/xlwic

Multilinguality:

multilingual

Size:

10K<n<100K

Language creators:

found

Annotation creators:

expert-generated

Source datasets:

original

XL-WiC

Hugging Face dataset for the XL-WiC paper (https://www.aclweb.org/anthology/2020.emnlp-main.584.pdf). Please refer to the official website for more information.

Configurations

When loading one of the XL-WiC configurations, you must specify both the training language and the target language (the language of the dev and test sets). See the Languages section for the languages in which training data is available. For example, the configuration with English as the training language and Italian as the target language can be loaded as follows:

from datasets import load_dataset
dataset = load_dataset('pasinit/xlwic', 'en_it')
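The configuration name combines the two language codes. The sketch below is a hypothetical helper, not part of the dataset or the `datasets` library: it assumes every configuration follows the `<train>_<target>` pattern of the `en_it` example, and it checks the codes against the language lists in the Languages section. Check the dataset page for the configurations that actually exist.

```python
# Hypothetical helper for building XL-WiC configuration names like "en_it".
# Assumption: all configurations follow the "<train>_<target>" pattern.

# Language codes taken from the Languages section of this card.
TRAIN_LANGS = {"en", "fr", "de", "it"}
TARGET_LANGS = {"fr", "de", "it", "bg", "zh", "hr", "da", "nl", "et", "fa", "ja", "ko"}


def xlwic_config(train_lang: str, target_lang: str) -> str:
    """Return the configuration name for a training/target language pair."""
    if train_lang not in TRAIN_LANGS:
        raise ValueError(f"no training data for {train_lang!r}")
    if target_lang not in TARGET_LANGS:
        raise ValueError(f"no dev/test data for {target_lang!r}")
    return f"{train_lang}_{target_lang}"


# Usage (requires the `datasets` library and network access):
# from datasets import load_dataset
# dataset = load_dataset("pasinit/xlwic", xlwic_config("en", "it"))
```

Validating the codes up front gives a clear error for unsupported pairs, such as Korean as a training language, before any download is attempted.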

Languages

Training data

  • en (English)
  • fr (French)
  • de (German)
  • it (Italian)

Dev & Test data

  • fr (French)
  • de (German)
  • it (Italian)
  • bg (Bulgarian)
  • zh (Chinese)
  • hr (Croatian)
  • da (Danish)
  • nl (Dutch)
  • et (Estonian)
  • fa (Farsi)
  • ja (Japanese)
  • ko (Korean)
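The two lists overlap only in French, German and Italian. The other nine target languages therefore have no training data of their own and can only be evaluated cross-lingually, that is, with training data in another language. A small sketch that derives this split directly from the lists above (the dictionary and function names are illustrative):

```python
# Language codes copied from the Training data and Dev & Test data lists.
TRAIN = {"en": "English", "fr": "French", "de": "German", "it": "Italian"}
DEV_TEST = {
    "fr": "French", "de": "German", "it": "Italian", "bg": "Bulgarian",
    "zh": "Chinese", "hr": "Croatian", "da": "Danish", "nl": "Dutch",
    "et": "Estonian", "fa": "Farsi", "ja": "Japanese", "ko": "Korean",
}


def languages_without_training_data():
    """Target languages that only have dev/test data, sorted by code."""
    return sorted(code for code in DEV_TEST if code not in TRAIN)


print(languages_without_training_data())
# ['bg', 'da', 'et', 'fa', 'hr', 'ja', 'ko', 'nl', 'zh']
```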