
Dataset Card for "instructional_code-search-net-python"

Dataset Summary

This is an instructional dataset for Python. The dataset contains two kinds of tasks:

  • Given a piece of code, generate a description of what it does.
  • Given a description, generate a piece of code that fulfils the description.
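A single (code, description) pair can be turned into one record of each task type by swapping which side is the instruction and which is the response. The sketch below shows this. The template strings and the `make_examples` helper are hypothetical illustrations: the card does not publish the templates that were actually used.

```python
from __future__ import annotations

# Hypothetical instruction templates; the real ones used for this dataset are not listed in the card.
CODE_TO_TEXT = "Explain what the following Python function does:\n{code}"
TEXT_TO_CODE = "Write a Python function that does the following: {description}"


def make_examples(code: str, description: str) -> list[dict[str, str]]:
    """Turn one (code, description) pair into two instruction/response records."""
    return [
        # Task 1: given code, produce a description.
        {"instruction": CODE_TO_TEXT.format(code=code), "response": description},
        # Task 2: given a description, produce code.
        {"instruction": TEXT_TO_CODE.format(description=description), "response": code},
    ]


if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a + b"
    for record in make_examples(snippet, "Return the sum of a and b."):
        print(record["instruction"])
        print("->", record["response"])
        print()
```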

Languages

The dataset is in English.

Data Splits

There are no splits.
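Because no held-out partition is provided, users who need one have to create it themselves. A minimal pure-Python sketch, assuming the rows have already been loaded as a list of dictionaries; the `train_test_split` function name and its parameters are hypothetical choices for this example.

```python
from __future__ import annotations

import random


def train_test_split(
    records: list[dict], test_fraction: float = 0.1, seed: int = 0
) -> tuple[list[dict], list[dict]]:
    """Shuffle a copy of the records and hold out the last test_fraction as a test set."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded RNG gives a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    data = [{"instruction": f"q{i}", "response": f"a{i}"} for i in range(100)]
    train, test = train_test_split(data, test_fraction=0.1, seed=42)
    print(len(train), len(test))
```

If the data is loaded with the Hugging Face `datasets` library instead, `Dataset.train_test_split` provides equivalent functionality.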

Dataset Creation

The dataset was created in May 2023.

Curation Rationale

This dataset was created to improve the coding capabilities of large language models (LLMs).

Source Data

The source data is the summarized version of the code-search-net dataset, available at https://huggingface.co/datasets/Nan-Do/code-search-net-python

Annotations

The dataset includes two columns: instruction and response.
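For supervised fine-tuning, each row is commonly rendered into a single training string. A minimal sketch, assuming the columns are named `instruction` and `response` exactly as stated above; the `### Instruction:` / `### Response:` markers and the `format_record` helper are a hypothetical prompt format, not one prescribed by this dataset.

```python
from __future__ import annotations


def format_record(record: dict[str, str]) -> str:
    """Render one instruction/response row as a single training string."""
    # Strip stray leading/trailing whitespace so the markers line up cleanly.
    instruction = record["instruction"].strip()
    response = record["response"].strip()
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"


if __name__ == "__main__":
    row = {
        "instruction": "Write a Python function that returns the sum of a and b.",
        "response": "def add(a, b):\n    return a + b",
    }
    print(format_record(row))
```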

Annotation process

The annotations were generated using templates and NLP techniques to produce human-like instructions and responses. A sample notebook of the process is available at https://github.com/Nan-Do/OpenAssistantInstructionResponsePython. The annotations have been cleaned to remove repetitions and meaningless summaries.
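The card does not say how the cleaning step was implemented. Under that caveat, the sketch below shows one plausible approach: drop (instruction, response) pairs that duplicate an earlier pair after whitespace normalization, and discard responses with too few words to be a meaningful summary. The `clean` helper and the `min_words` threshold are hypothetical.

```python
from __future__ import annotations


def _normalize(text: str) -> str:
    # Collapse runs of whitespace so trivially different copies compare equal.
    return " ".join(text.split())


def clean(records: list[dict[str, str]], min_words: int = 3) -> list[dict[str, str]]:
    """Remove duplicate pairs and responses shorter than min_words words."""
    seen: set[tuple[str, str]] = set()
    kept: list[dict[str, str]] = []
    for record in records:
        key = (_normalize(record["instruction"]), _normalize(record["response"]))
        if key in seen:
            continue  # repetition: an equivalent pair was already kept
        if len(key[1].split()) < min_words:
            continue  # heuristic stand-in for a "meaningless" (too short) response
        seen.add(key)
        kept.append(record)
    return kept


if __name__ == "__main__":
    rows = [
        {"instruction": "Describe f", "response": "Returns the sum of two numbers."},
        {"instruction": "Describe  f", "response": "Returns the sum of two  numbers."},
        {"instruction": "Describe g", "response": "None"},
    ]
    print(clean(rows))
```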

Licensing Information

Apache 2.0