Dataset:
gabeorlanski/bc-humaneval
To use this dataset, use either the original BabelCode repo or the bc_eval metric.
The BabelCode-HumanEval (BC-HumanEval) dataset converts the HumanEval dataset released by OpenAI to 16 programming languages.
BC-HumanEval supports:
>>> from datasets import load_dataset
>>> load_dataset("gabeorlanski/bc-humaneval")
DatasetDict({
    test: Dataset({
        features: ['qid', 'title', 'language', 'text', 'signature_with_docstring', 'signature', 'arguments', 'solution', 'question_info'],
        num_rows: 2576
    })
})
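Since every language shares the single test split, it is often convenient to group rows by their language field before evaluating. The following is a minimal sketch, not part of BabelCode: group_by_language is a hypothetical helper, and the stand-in rows (including the language strings "Python" and "Go" and the qid values) are assumptions made for illustration rather than values read from the dataset.

```python
from collections import defaultdict


def group_by_language(rows):
    """Group row dicts by their 'language' field (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["language"]].append(row)
    return dict(grouped)


# Stand-in rows with only two of the dataset's features.
# The language names and qids below are assumed, not taken from the dataset.
rows = [
    {"qid": "0", "language": "Python"},
    {"qid": "0", "language": "Go"},
    {"qid": "1", "language": "Python"},
]

by_language = group_by_language(rows)
for language, items in sorted(by_language.items()):
    print(language, len(items))
```

The same function accepts any iterable of row dicts, so it should also work when iterating over the loaded test split directly.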
NOTE: If you want to use a different function name (or class name, for languages that require one) for the prediction, you must update entry_fn_name and entry_cls_name accordingly. For example, if the original question has an entry_fn_name of add but you want to change it to f, you must update ds["question_info"]["entry_fn_name"] to f:
>>> from datasets import load_dataset
>>> ds = load_dataset("gabeorlanski/bc-humaneval")['test']
>>> # Keep a reference to the row so the update below is not lost
>>> example = ds[0]
>>> # The original entry_fn_name
>>> example['question_info']['entry_fn_name']
'hasCloseElements'
>>> # You MUST update the corresponding entry_fn_name
>>> example['question_info']['entry_fn_name'] = 'f'
>>> example['question_info']['entry_fn_name']
'f'
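To apply the rename to more than one row, the update can be wrapped in a small function. This is a sketch under stated assumptions, not BabelCode's own API: rename_entry_fn is a hypothetical helper, and the stand-in row below carries an assumed qid and an assumed entry_cls_name value ("Solution") purely for illustration.

```python
import copy


def rename_entry_fn(example, new_fn_name, new_cls_name=None):
    """Return a copy of a row with updated entry names (hypothetical helper).

    The input row is left untouched, so the original names remain available.
    """
    updated = copy.deepcopy(example)
    updated["question_info"]["entry_fn_name"] = new_fn_name
    if new_cls_name is not None:
        # Only relevant for languages that require a class name.
        updated["question_info"]["entry_cls_name"] = new_cls_name
    return updated


# Stand-in row; the qid and "Solution" are assumed values for illustration.
example = {
    "qid": "0",
    "question_info": {
        "entry_fn_name": "hasCloseElements",
        "entry_cls_name": "Solution",
    },
}

renamed = rename_entry_fn(example, "f")
print(renamed["question_info"]["entry_fn_name"])
```

With the datasets library, a function like this could be passed to Dataset.map, for example ds.map(lambda ex: rename_entry_fn(ex, "f")), which returns a new dataset with the updated rows instead of modifying rows in place.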
See section 2 of the BabelCode Paper to learn more about how the datasets are translated.
For information on how the original HumanEval was curated, see the paper Evaluating Large Language Models Trained on Code.
Google Research
CC-BY-4.0
@article{orlanski2023measuring,
  title={Measuring The Impact Of Programming Language Distribution},
  author={Orlanski, Gabriel and Xiao, Kefan and Garcia, Xavier and Hui, Jeffrey and Howland, Joshua and Malmaud, Jonathan and Austin, Jacob and Singh, Rishabh and Catasta, Michele},
  journal={arXiv preprint arXiv:2302.01973},
  year={2023}
}

@article{chen2021codex,
  title={Evaluating Large Language Models Trained on Code},
  author={Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harri Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and Mikhail Pavlov and Alethea Power and Lukasz Kaiser and Mohammad Bavarian and Clemens Winter and Philippe Tillet and Felipe Petroski Such and Dave Cummings and Matthias Plappert and Fotios Chantzis and Elizabeth Barnes and Ariel Herbert-Voss and William Hebgen Guss and Alex Nichol and Alex Paino and Nikolas Tezak and Jie Tang and Igor Babuschkin and Suchir Balaji and Shantanu Jain and William Saunders and Christopher Hesse and Andrew N. Carr and Jan Leike and Josh Achiam and Vedant Misra and Evan Morikawa and Alec Radford and Matthew Knight and Miles Brundage and Mira Murati and Katie Mayer and Peter Welinder and Bob McGrew and Dario Amodei and Sam McCandlish and Ilya Sutskever and Wojciech Zaremba},
  year={2021},
  eprint={2107.03374},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}