Dataset:
Nan-Do/instructional_code-search-net-python
Language:
en
License:
apache-2.0
This is an instructional dataset for Python. The dataset contains two different kinds of tasks.
The dataset is in English.
There are no splits.
Created in May 2023.
This dataset was created to improve the coding capabilities of LLMs.
The summarized version of the code-search-net dataset can be found at https://huggingface.co/datasets/Nan-Do/code-search-net-python
The dataset includes an instruction column and a response column.
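As a rough illustration of how the two columns might be consumed, the sketch below turns one row into a single prompt string. The column names `INSTRUCTION` and `RESPONSE`, the prompt layout, and the `train` split name passed to `load_dataset` are assumptions not confirmed by this card; adjust them to match the actual data.

```python
def format_example(row, instruction_key="INSTRUCTION", response_key="RESPONSE"):
    """Join one instruction/response row into a single training prompt.

    The key names are assumptions; pass different ones if the columns differ.
    """
    instruction = row[instruction_key].strip()
    response = row[response_key].strip()
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"


def load_rows(name="Nan-Do/instructional_code-search-net-python"):
    """Load the dataset from the Hugging Face Hub (needs network access).

    Assumes the single unsplit dataset is exposed under the default
    "train" split name, which is the usual Hub behaviour.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(name, split="train")
```

Calling `format_example(load_rows()[0])` would then give one formatted sample, provided the assumed column names are correct.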
Annotation process
The annotation procedure used templates and NLP techniques to generate human-like instructions and responses. A sample notebook of the process can be found at https://github.com/Nan-Do/OpenAssistantInstructionResponsePython
The annotations have been cleaned to remove repetitions and meaningless summaries.
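The general idea of template-based generation followed by cleaning can be sketched as below. This is not the author's pipeline (see the linked notebook for that): the template wording, the `build_pairs` helper, and the three-word threshold for a "meaningless" summary are all hypothetical choices made for illustration.

```python
# Hypothetical templates; the wording actually used for the dataset is not given here.
TEMPLATES = [
    "Explain what the following Python 3 code does:\n{code}",
    "Write a short summary of this Python function:\n{code}",
]


def build_pairs(samples, min_words=3):
    """Build instruction/response pairs from (code, summary) tuples.

    Exact repetitions and summaries shorter than `min_words` words are
    dropped, as a crude stand-in for the cleaning step described above.
    """
    seen = set()
    pairs = []
    for code, summary in samples:
        summary = summary.strip()
        key = (code.strip(), summary.lower())
        if key in seen or len(summary.split()) < min_words:
            continue
        seen.add(key)
        # Rotate through the templates so the instructions vary.
        template = TEMPLATES[len(pairs) % len(TEMPLATES)]
        pairs.append({"instruction": template.format(code=code), "response": summary})
    return pairs
```

Rotating templates by index keeps the output deterministic; a real pipeline might sample templates at random instead.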
Apache 2.0