Dataset: code_x_glue_cc_clone_detection_poj104
Tasks: text-retrieval
Sub-tasks: document-retrieval
Languages: code
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotations creators: found
Source datasets: original
License: c-uda

CodeXGLUE Clone-detection-POJ-104 dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-POJ-104
Given a code snippet and a collection of candidates as input, the task is to return the top K candidates that have the same semantics. Models are evaluated by mean average precision (MAP). The POJ-104 dataset is used for this task.
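To make the metric concrete, here is a minimal sketch of a MAP computation in Python. It is not the official CodeXGLUE evaluator. The function names are hypothetical, and it assumes each query comes with a ranked list of candidate labels and that a candidate counts as relevant when its label equals the query's label. It also normalizes by the number of relevant items actually found in the ranked list, which may differ from the exact variant used on the benchmark.

```python
from typing import Hashable, Sequence


def average_precision(query_label: Hashable, ranked_labels: Sequence[Hashable]) -> float:
    """Average precision for one query.

    Assumption: a candidate is relevant when its label equals the query label.
    The score is the mean of precision@k taken at each rank k holding a relevant item.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    query_labels: Sequence[Hashable],
    rankings: Sequence[Sequence[Hashable]],
) -> float:
    """Mean of the per-query average precision values."""
    scores = [average_precision(q, r) for q, r in zip(query_labels, rankings)]
    return sum(scores) / len(scores) if scores else 0.0
```

For a query labelled "a" with ranked candidate labels ["a", "b", "a"], the relevant items sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 = 5/6.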
An example of 'train' looks as follows.
{ "code": "\nint f(int shu,int min)\n{ \n int k=1;\n if(shu < min)\n { \n k= 0; \n return k;\n } \n else\n {\n for(int i = min;i<shu;i++)\n { \n if(shu%i == 0)\n { \n k=k+ f(shu/i,i); \n } \n \n \n } \n return k; \n}\n} \n\nmain()\n{\n int n,i,a;\n scanf(\"%d\",&n);\n \n for(i=0;i<n;i++)\n {\n scanf(\"%d\",&a);\n \n if(i!=n-1) \n printf(\"%d\\n\",f(a,2));\n else\n printf(\"%d\",f(a,2)); \n \n \n \n } \n \n \n }", "id": 0, "label": "home" }
The data fields are explained below for each config. They are the same across all splits.
default

field name | type | description |
---|---|---|
id | int32 | Index of the sample |
code | string | The full text of the function |
label | string | The ID of the problem that the source code solves |
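Since `label` identifies the problem a program solves, records that share a label can be treated as semantic clones of one another for retrieval. The sketch below illustrates that grouping in Python. The helper names and the toy records are hypothetical, made up for illustration, and are not taken from the dataset.

```python
from collections import defaultdict


def group_ids_by_label(records):
    """Map each `label` value to the list of `id`s that carry it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["label"]].append(record["id"])
    return dict(groups)


def clone_ids(records, query_id):
    """Return the ids that share the query's label, excluding the query itself."""
    label_of = {record["id"]: record["label"] for record in records}
    groups = group_ids_by_label(records)
    return [i for i in groups[label_of[query_id]] if i != query_id]


# Toy records with the three documented fields (values are invented).
toy_records = [
    {"id": 0, "code": "int main(){return 0;}", "label": "1"},
    {"id": 1, "code": "int main(){return 1-1;}", "label": "1"},
    {"id": 2, "code": "int main(){return 2;}", "label": "2"},
]
```

With these toy records, `clone_ids(toy_records, 0)` yields the single id 1, because only record 1 shares label "1" with the query.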
name | train | validation | test |
---|---|---|---|
default | 32000 | 8000 | 12000 |
https://github.com/microsoft, https://github.com/madlag
Computational Use of Data Agreement (C-UDA) License.
@inproceedings{mou2016convolutional,
  title={Convolutional neural networks over tree structures for programming language processing},
  author={Mou, Lili and Li, Ge and Zhang, Lu and Wang, Tao and Jin, Zhi},
  booktitle={Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence},
  pages={1287--1293},
  year={2016}
}
Thanks to @madlag (and partly also @ncoop57) for adding this dataset.