Dataset:

lucasmccabe-lmi/CodeAlpaca-20k


Dataset Card for "CodeAlpaca-20k"

This is a lightly modified version of the CodeAlpaca-20k dataset: whenever a prompt does not explicitly state the intended programming language, we add the phrase "Write corresponding code in Python." to it.
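The rule above can be sketched in Python as shown here. The card does not say how the intended language was detected or where the phrase is placed. This sketch therefore makes several assumptions: it uses a crude keyword match over a hypothetical `LANGUAGE_NAMES` list, and it appends the phrase to the end of the instruction. The function names `states_language` and `add_python_hint` are illustrative only. This is not the authors' code.

```python
import re

# The exact phrase quoted in the dataset card.
PYTHON_SUFFIX = "Write corresponding code in Python."

# Hypothetical keyword list (assumption: the card does not describe the detector).
# Ambiguous single words such as "C", "R" and "Go" are left out because they
# collide with variable names and ordinary English ("Go through the list").
LANGUAGE_NAMES = [
    "python", "java", "javascript", "typescript", "c++", "c#", "rust",
    "ruby", "php", "swift", "kotlin", "sql", "html", "css", "bash",
    "scala", "haskell", "perl", "golang",
]


def states_language(instruction: str) -> bool:
    """Return True if a known language name appears as a standalone token."""
    text = instruction.lower()
    for name in LANGUAGE_NAMES:
        # Lookarounds instead of \b so that "c++" and "c#" still match.
        pattern = r"(?<![\w+#])" + re.escape(name) + r"(?![\w+#])"
        if re.search(pattern, text):
            return True
    return False


def add_python_hint(instruction: str) -> str:
    """Append the Python phrase only when no language is explicitly stated."""
    if states_language(instruction):
        return instruction
    # Assumption: the phrase is appended after a single space.
    return f"{instruction.rstrip()} {PYTHON_SUFFIX}"
```

For example, `add_python_hint("Reverse a string.")` gains the Python phrase, while an instruction that already mentions Java or C++ is returned unchanged. A real filter would need more care with language names that are also common English words.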

Statistics:

Prompts: 20,022

Tokens: 1,561,716 (instruction + input + output, counted with the EleutherAI/gpt-neox-20b tokenizer)
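A token count like the one above can be computed with a small helper. The sketch below sums token counts over the `instruction`, `input` and `output` fields of each record. It assumes that each field is tokenized separately and the counts are added. The card only says "instruction+input+output", so that choice is an assumption. The helper name `count_tokens` is also hypothetical. The tokenizer is passed in as a function so that the sketch runs without downloading a model.

```python
from collections.abc import Callable, Iterable, Mapping, Sequence

# Fields counted according to the card (instruction + input + output).
FIELDS = ("instruction", "input", "output")


def count_tokens(
    records: Iterable[Mapping[str, str]],
    tokenize: Callable[[str], Sequence[int]],
) -> int:
    """Sum token counts over the instruction, input and output fields.

    Assumption: each field is tokenized on its own and the counts are added,
    rather than tokenizing one concatenated string.
    """
    total = 0
    for record in records:
        for field in FIELDS:
            text = record.get(field) or ""  # missing or empty fields add nothing
            if text:
                total += len(tokenize(text))
    return total
```

With the Hugging Face `transformers` library, a suitable `tokenize` function could be built from `AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")` by passing its `encode` method. The exact total depends on tokenizer settings and on how the fields are joined, so this sketch is not guaranteed to reproduce 1,561,716 exactly.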