This is a filtered set of prompts for evaluating code instruction models. It contains a variety of languages and task types. Currently, we used ChatGPT (GPT-3.5-turbo) to generate these, so we encourage using them only for qualitative evaluation and not for training your models.
The generation of this data is similar to something like CodeAlpaca, which you can download here, but we intend to make these tasks both a) more challenging, and b) more curated.
These two things hopefully give a meaningful evaluation, but this is not enough data to train an entire model.
The data corresponds to the following:
Or, on a per language basis:
Or, per instruction type:
To get the current information on the tasks, you can use the following snippet:
```python
from datasets import load_dataset

d = load_dataset("HuggingFaceH4/code_evaluation_prompts")
language_list = d['train']['language']
language_count = {ele: language_list.count(ele) for ele in language_list}
```
Similar code can be run for the type of instruction (code generation vs. bug advice).
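As a sketch of that same counting pattern, the snippet below uses `collections.Counter` on a small hypothetical list of instruction types; with the real dataset you would substitute the appropriate column (the column name for instruction type is an assumption, so check the dataset features first):

```python
from collections import Counter

# Hypothetical sample standing in for the dataset's instruction-type column;
# with the real dataset you would pass that column (field name not confirmed here).
instruction_types = [
    "code generation", "bug advice", "code generation",
    "bug advice", "code generation",
]

# Counter tallies occurrences of each instruction type in one pass
type_count = Counter(instruction_types)
print(type_count)  # Counter({'code generation': 3, 'bug advice': 2})
```

`Counter` is equivalent to the dict comprehension above but avoids the repeated `list.count` calls, so it scales better as the dataset grows.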
Interested in contributing? Open a PR with a specific language and question content.
Here are the ChatGPT prompts used to initiate the responses (which are then filtered), as of the May 3rd, 2023 version: