Dataset: neural_code_search
License: cc-by-nc-4.0
Preprint: arxiv:1908.09804
Language creators: crowdsourced
Annotation creators: expert-generated
Source datasets: original
Task: question answering
Subtask: extractive-qa
Language: en
Multilinguality: monolingual
Neural-Code-Search-Evaluation-Dataset presents an evaluation dataset consisting of natural language query and code snippet pairs, with the hope that future work in this area can use this dataset as a common benchmark. We also provide the results of two code search models (NCS, UNIF) from recent work.
EN - English
The search corpus is indexed using all method bodies parsed from the 24,549 GitHub repositories. In total, there are 4,716,814 methods in this corpus. The code search model will find relevant code snippets (i.e. method bodies) from this corpus given a natural language query. In this data release, we will provide the following information for each method in the corpus:
Evaluation Dataset
The evaluation dataset is composed of 287 Stack Overflow question and answer pairs.
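As an illustration of how a code search model can be scored against query and snippet pairs like these, the sketch below ranks a toy corpus of method bodies by token overlap with each query and reports the fraction of queries whose expected method appears in the top k results. The token-overlap ranker is a stand-in chosen for brevity, not NCS or UNIF, and the corpus, the queries, and the names `tokenize`, `rank_methods`, and `answered_at_k` are hypothetical rather than part of the data release.

```python
import re
from collections import Counter

def tokenize(text):
    """Split natural language or code into lowercase word tokens, breaking camelCase."""
    # Insert a space at lower-to-upper boundaries so "readFile" becomes "read File".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return re.findall(r"[a-z0-9]+", spaced.lower())

def rank_methods(query, corpus):
    """Return method ids sorted by descending token overlap with the query."""
    query_counts = Counter(tokenize(query))
    scores = {}
    for method_id, body in corpus.items():
        body_counts = Counter(tokenize(body))
        # Size of the multiset intersection between query and method tokens.
        scores[method_id] = sum((query_counts & body_counts).values())
    # Sort by score (high first), then by id so ties are deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))

def answered_at_k(pairs, corpus, k):
    """Fraction of (query, expected_id) pairs with expected_id in the top k results."""
    hits = sum(1 for query, expected in pairs
               if expected in rank_methods(query, corpus)[:k])
    return hits / len(pairs)

# Toy corpus of method bodies keyed by a made-up id (hypothetical data).
corpus = {
    "m1": "void closeKeyboard(View v) { imm.hideSoftInputFromWindow(v.getWindowToken(), 0); }",
    "m2": "Bitmap loadBitmap(String path) { return BitmapFactory.decodeFile(path); }",
    "m3": "void showToast(String message) { Toast.makeText(ctx, message, Toast.LENGTH_SHORT).show(); }",
}
# Toy query and expected-method pairs (hypothetical data).
pairs = [
    ("how to hide soft keyboard", "m1"),
    ("load bitmap from file path", "m2"),
    ("show a toast message", "m3"),
]

print(answered_at_k(pairs, corpus, k=1))
```

A real evaluation would swap `rank_methods` for the model under test and draw the pairs from the 287 Stack Overflow questions; the scoring loop itself would stay the same shape.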
The most popular Android repositories on GitHub (ranked by the number of stars) are used to create the search corpus. For each repository that we indexed, we provide the link, specific to the commit that was used. In total, there are 24,549 repositories.
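Pinning each link to a commit keeps the indexed code reproducible even after a repository changes. Below is a minimal sketch of building such a link, assuming GitHub's usual `blob/<commit>/<path>#L<start>-L<end>` permalink layout. The function name, its parameters, and the example values are hypothetical and not taken from the data release.

```python
def commit_permalink(owner, repo, commit_sha, filepath, start_line=None, end_line=None):
    """Build a GitHub URL that points at a file as of one specific commit."""
    url = f"https://github.com/{owner}/{repo}/blob/{commit_sha}/{filepath.lstrip('/')}"
    if start_line is not None:
        # Optional line anchor, e.g. #L10 or #L10-L24.
        url += f"#L{start_line}"
        if end_line is not None and end_line != start_line:
            url += f"-L{end_line}"
    return url

# Placeholder owner, repository, commit, and path (all hypothetical).
print(commit_permalink("example-owner", "example-app", "0123abc", "app/src/Main.java", 10, 24))
```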
Who are the source language producers?
[More Information Needed]
Who are the annotators?
[More Information Needed]
The dataset is provided for research purposes only. Please check the dataset license for additional information.
Hongyu Li, Seohyun Kim and Satish Chandra
CC BY-NC 4.0 (Attribution-NonCommercial 4.0 International)
arXiv:1908.09804 [cs.SE]
Thanks to @vinaykudari for adding this dataset.