Dataset:
lucasmccabe-lmi/CodeAlpaca-20k
We provide a lightly modified version of the CodeAlpaca-20k dataset. Specifically, we add the phrase "Write corresponding code in Python." whenever the intended language is not explicitly stated.
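The modification described above can be sketched as follows. This is only an illustration: the keyword list `LANGUAGE_NAMES`, the helper names, and the choice to append the phrase to the end of the instruction are all assumptions, since the exact rule used to decide whether a language is "explicitly stated" is not given here.

```python
import re

# The phrase quoted in the dataset description.
SUFFIX = "Write corresponding code in Python."

# Hypothetical keyword list: the dataset's actual detection rule is not documented here.
LANGUAGE_NAMES = ("python", "javascript", "java", "c++", "c#", "sql",
                  "html", "css", "ruby", "php", "bash")

# Match a language name only as a standalone token, so "javadoc" does not count as "java".
_LANGUAGE_RE = re.compile(
    r"(?<![\w+#])(?:" + "|".join(re.escape(n) for n in LANGUAGE_NAMES) + r")(?![\w+#])",
    re.IGNORECASE,
)


def mentions_language(instruction: str) -> bool:
    """Return True if the instruction names one of the assumed languages."""
    return _LANGUAGE_RE.search(instruction) is not None


def add_python_hint(instruction: str) -> str:
    """Append the Python phrase when no language is named (assumed placement: end of text)."""
    if mentions_language(instruction):
        return instruction
    return instruction.rstrip() + " " + SUFFIX
```

For example, `add_python_hint("Reverse a string.")` gains the extra phrase, while an instruction such as "Write a SQL query to list users." is returned unchanged.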
Prompts: 20022
Tokens: 1561716 using the EleutherAI/gpt-neox-20b tokenizer (counting instruction+input+output)
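A token count of this kind could be computed roughly as shown below. This is a sketch under stated assumptions: each of the three fields is tokenized separately and the lengths are summed, and `count_tokens` and `load_neox_encode` are hypothetical helper names. Whether this procedure reproduces the figure above exactly depends on details (field concatenation, special tokens) that are not stated here.

```python
from typing import Callable, Iterable, Mapping, Sequence


def count_tokens(
    examples: Iterable[Mapping[str, str]],
    encode: Callable[[str], Sequence[int]],
) -> int:
    """Sum token counts over the instruction, input, and output fields."""
    total = 0
    for example in examples:
        for field in ("instruction", "input", "output"):
            text = example.get(field) or ""
            if text:  # skip empty or missing fields
                total += len(encode(text))
    return total


def load_neox_encode() -> Callable[[str], Sequence[int]]:
    """Build an encode function from the tokenizer named in the description.

    Requires the `transformers` package and a one-time download of the tokenizer.
    """
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    return lambda text: tokenizer.encode(text)
```

Passing the tokenizer in as a plain callable keeps `count_tokens` easy to check with a toy encoder such as `str.split` before running it with `load_neox_encode()` over the full dataset.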