This is a filtered set of prompts for evaluating code instruction models. It contains a variety of languages and task types. Currently, we used ChatGPT (GPT-3.5-turbo) to generate these, so we encourage using them only for qualitative evaluation and not for training your models.
The generation of this data is similar to something like CodeAlpaca, which you can download here, but we intend to make these tasks both a) more challenging, and b) more curated.
These two things hopefully give a meaningful evaluation, but this is not enough data to train an entire model.
The data corresponds to the following:
Or, on a per language basis:
Or, per instruction type:
To get the current information on the tasks, you can use the following snippet:
```python
from datasets import load_dataset

d = load_dataset("HuggingFaceH4/code_evaluation_prompts")
language_list = d['train']['language']
language_count = {ele: language_list.count(ele) for ele in language_list}
```
Similar code can be run for the type of instruction (code generation vs. bug advice).
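As a sketch of that same counting pattern, the snippet below uses `collections.Counter` on a small hypothetical list of instruction types; with the real dataset you would substitute the appropriate column (the column name for instruction type is an assumption, so check the dataset features first):

```python
from collections import Counter

# Hypothetical sample standing in for the dataset's instruction-type column;
# with the real dataset you would pass that column (field name not confirmed here).
instruction_types = [
    "code generation", "bug advice", "code generation",
    "bug advice", "code generation",
]

# Counter tallies occurrences of each instruction type in one pass
type_count = Counter(instruction_types)
print(type_count)  # Counter({'code generation': 3, 'bug advice': 2})
```

`Counter` is equivalent to the dict comprehension above but avoids the repeated `list.count` calls, so it scales better as the dataset grows.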
Interested in contributing? Open a PR with a specific language and question content.
Here are the ChatGPT prompts used to initiate the responses (which are then filtered), as of the May 3rd, 2023 version: