数据集:

neulab/odex

中文

ODEX is an Open-Domain EXecution-based NL-to-Code generation data benchmark. It contains 945 samples with a total of 1,707 human-written test cases, covering intents in four different natural languages -- 439 in English, 90 in Spanish, 164 in Japanese, and 252 in Russian.

You can load the dataset by specifying a subset from en, es, ja, ru (by default the english subset en is loaded):

from datasets import load_dataset

ds = load_dataset("neulab/odex", "ja", split="test")

If you find our dataset useful, please cite the paper

@article{wang2022execution,
  title={Execution-Based Evaluation for Open-Domain Code Generation},
  author={Zhiruo Wang, Shuyan Zhou, Daniel Fried, Graham Neubig},
  journal={arXiv preprint arXiv:2212.10481},
  year={2022}
}