Dataset:

HuggingFaceH4/code_evaluation_prompts

Language:

en

Size:

n<1K

Dataset Card for H4 Code Evaluation Prompts

This is a filtered set of prompts for evaluating code instruction models. It covers a variety of languages and task types. The prompts were generated with ChatGPT (GPT-3.5-turbo), so we encourage using them only for qualitative evaluation and not for training your models.

The data was generated in a way similar to CodeAlpaca, which you can download here, but we intend to make these tasks both a) more challenging, and b) more curated.

These two properties should make for a meaningful evaluation, but the dataset is not large enough to train an entire model.

The data corresponds to the following:

  • 20 simple Python instruction following,
  • 20 intermediate Python instruction following,
  • 10 advanced Python instruction following,
  • 15 Python machine learning questions,
  • 20 C++ instruction following,
  • 10 HTML instruction following,
  • 20 misc language code feedback questions.

Or, on a per language basis:

  • Python: 81
  • C++: 21
  • html: 10
  • Ruby: 1
  • Bash: 1
  • MATLAB: 1
  • React: 1
  • Scala: 1
  • JavaScript: 1
  • Java: 1
  • PHP: 1

Or, per instruction type:

  • Code completion / instruction following: 95
  • Bug fixing: 20

To get the current information on the tasks, you can use the following snippet:

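from collections import Counter
from datasets import load_dataset

# Load the prompts and read the language column of the train split.
d = load_dataset("HuggingFaceH4/code_evaluation_prompts")
language_list = d["train"]["language"]
# Count how many prompts exist per language.
language_count = Counter(language_list)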

Similar code can be run for the type of instruction (code generation vs. bug advice).
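
As a minimal sketch of that per-type count, the helper below tallies any field across a collection of records. The helper name `count_by` and the field name `"type"` are assumptions for illustration; check the dataset's actual column names (for example via `d["train"].column_names`) before relying on them. The sample rows are made up so the block runs without downloading anything.

```python
from collections import Counter


def count_by(rows, field):
    """Count how often each value of `field` occurs across dict-like rows."""
    return Counter(row[field] for row in rows)


# Hypothetical rows standing in for dataset records;
# the "type" field name and its values are assumptions.
sample_rows = [
    {"language": "Python", "type": "completion"},
    {"language": "Python", "type": "bugs"},
    {"language": "C++", "type": "completion"},
]

type_count = count_by(sample_rows, "type")
language_count = count_by(sample_rows, "language")
print(type_count)
print(language_count)
```

With the real data, iterating over `d["train"]` yields one dict per row, so `count_by(d["train"], "type")` should work once the field name is confirmed.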

Interested in contributing? Open a PR with a specific language and question content.

Here are the ChatGPT prompts used to initiate the responses (which are then filtered), May 3rd 2023 version:

  • Generate a bunch of instructions for coding questions in python (in the format of {"prompt": instruction})
  • These have been useful, can you generate the last few that are the hardest and most Pythonic that you can think of?
  • Taking a step back, can you generate 20 for me that don't need to be as hard, but are machine learning focused (e.g. a mix of PyTorch and Jax).
  • Generate a bunch of instructions for coding questions in C++ (in the format of {"prompt": instruction})
  • Can you generate 5 examples of instructions, with the same format {"prompt": text}, where the instruction has a piece of code with a bug, and you're asking for feedback on your code as if you wrote it?
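
Responses to prompts like these come back as text containing `{"prompt": ...}` objects, which have to be parsed before they can be filtered. The sketch below shows one way that step could look, assuming one JSON object per line. The function name `parse_prompts`, the sample text, and the policy of skipping unparseable lines are illustrative assumptions, not the pipeline actually used to build this dataset.

```python
import json


def parse_prompts(raw_text):
    """Extract prompt strings from text with one {"prompt": ...} JSON object per line."""
    prompts = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON (e.g. commentary around the objects).
            continue
        # Keep only objects that carry a string under the "prompt" key.
        if isinstance(record, dict) and isinstance(record.get("prompt"), str):
            prompts.append(record["prompt"])
    return prompts


# Made-up model output, for illustration only.
sample_output = """
Here are some instructions:
{"prompt": "Write a function that reverses a string."}
{"prompt": "Implement binary search on a sorted list."}
{"note": "not a prompt"}
"""

parsed = parse_prompts(sample_output)
print(parsed)
```

Manual filtering for difficulty and quality would then operate on the returned list of strings.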