Dataset:
notional/notional-python
Subtasks:
language-modeling
Languages:
py
Multilinguality:
monolingual
Size:
10K<n<100K
Language creators:
found
Annotations creators:
no-annotation
Source datasets:
original
License:
unknown

The Notional-python dataset contains Python code files from 100 well-known repositories gathered from the Google BigQuery GitHub dataset. The dataset was created to test the ability of programming language models. Follow our repo to run the model evaluation using the notional-python dataset.
Python
Notional-python was built to provide a dataset for testing the ability of machines to generate Python code.
The data was obtained by filtering code from the Google BigQuery GitHub data. To improve the quality of the dataset, only Python code files that meet the conditions below are added to the dataset:
The producers are users of GitHub.
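
The filtering step described above can be illustrated with a small sketch. The actual conditions used to build the dataset are not reproduced here, so the three checks below (non-empty, under a size limit, parses as valid Python) are assumptions chosen only to show the shape of such a filter; `keep_python_file` and `MAX_BYTES` are hypothetical names, not part of the dataset's tooling.

```python
import ast

# Hypothetical size limit, chosen only for illustration.
MAX_BYTES = 100_000


def keep_python_file(source: str, max_bytes: int = MAX_BYTES) -> bool:
    """Return True if `source` passes the illustrative quality checks."""
    # Assumed check 1: skip empty or whitespace-only files.
    if not source.strip():
        return False
    # Assumed check 2: skip very large files.
    if len(source.encode("utf-8")) > max_bytes:
        return False
    # Assumed check 3: the file must parse as valid Python 3.
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


# Toy stand-ins for files pulled from a code corpus.
candidates = {
    "ok.py": "def add(a, b):\n    return a + b\n",
    "broken.py": "def add(a, b:\n    return a + b\n",
    "empty.py": "   \n",
}

kept = sorted(name for name, src in candidates.items() if keep_python_file(src))
print(kept)  # ['ok.py']
```

A predicate of this kind keeps each condition independent, so individual checks can be swapped for the real criteria without changing the surrounding loop.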