Dataset:
theblackcat102/codex-math-qa
Solve math_qa using code-davinci-002 (Codex) via Python programming.
Since OpenAI decided to move code-davinci-002 behind Azure, this dataset aims to share the generation results of code-davinci-002, OpenAI's 176B code generation model.
name | train | validation | test |
---|---|---|---|
main | 25065 | 4133 | 2985 |
rational | - | 4151 | 2985 |
from datasets import load_dataset
dataset = load_dataset("theblackcat102/codex-math-qa", "main")
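After loading, it can be useful to confirm that the split sizes match the table above. The following is a minimal sketch, not part of the dataset tooling: `EXPECTED_SIZES` and `size_mismatches` are hypothetical names introduced here, and the numbers are copied from the table.

```python
# Split sizes copied from the table in this card (assumed to still be current).
EXPECTED_SIZES = {
    "main": {"train": 25065, "validation": 4133, "test": 2985},
    "rational": {"validation": 4151, "test": 2985},
}


def size_mismatches(config, splits):
    """Return {split: (expected, actual)} for every split whose size differs.

    `splits` is any mapping from split name to a sized object, for example
    the DatasetDict returned by `load_dataset`. A missing split is reported
    with an actual size of None.
    """
    mismatches = {}
    for split, expected in EXPECTED_SIZES[config].items():
        actual = len(splits[split]) if split in splits else None
        if actual != expected:
            mismatches[split] = (expected, actual)
    return mismatches


# Usage (requires network access and the `datasets` library):
# from datasets import load_dataset
# dataset = load_dataset("theblackcat102/codex-math-qa", "main")
# print(size_mismatches("main", dataset))
```

An empty dictionary means every split listed in the table has the expected number of rows.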
Example:
the average weight of 8 person ' s increases by 1.5 kg when a new person comes in place of one of them weighing 75 kg. what might be the weight of the new person ? Write a short snippet of python function which solve this problem. No need to explain the answer, let the code output the answer.
output
def average_weight(weights):
    total_weight = sum(weights)
    return total_weight / len(weights)

def new_person_weight(weights):
    average_weight = average_weight(weights)
    new_person_weight = average_weight + 1.5
    return new_person_weight

weights = [75, 80, 85, 90, 95]
print(new_person_weight(weights))
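Note that this sample is one of the incorrect generations: it ignores the group size given in the question, and as written it raises `UnboundLocalError`, because `average_weight` is assigned inside `new_person_weight` before it is called. For comparison, here is a minimal sketch of the intended arithmetic. The function signature is hypothetical and is not part of the dataset. Replacing one person raises the group total by 8 × 1.5 = 12 kg, so the newcomer weighs 12 kg more than the person replaced.

```python
def new_person_weight(group_size, average_increase, replaced_weight):
    """Weight of the newcomer, given how much the group average rose."""
    # The total weight rises by group_size * average_increase, and that
    # whole difference comes from swapping one person for another.
    return replaced_weight + group_size * average_increase


print(new_person_weight(8, 1.5, 75))  # prints 87.0
```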
dataset = load_dataset("theblackcat102/codex-math-qa", "rational")
Example:
one copy machine can make 20 copies a minute, and a second copy machine makes 10 copies a minute. if the two copiers work together, how long would it take them to make 900 copies ? Rationale: "total work done by both machines in a minute = 20 + 10 = 30 copies total number of copies required = 900 time = 900 / 30 = 30 mins answer d" Write a short snippet of python function which solve this problem within 30 lines of code. Add comments in the function block for explaination.
output
def copy_machine(copies, copy_speed):
    """
    :param copies: number of copies required
    :param copy_speed: speed of the copy machine
    :return: time in minutes
    """
    total_work = sum(copy_speed)
    time = copies / total_work
    return time


if __name__ == "__main__":
    print(copy_machine(900, [20, 10]))
The generated results are unvalidated and are reproduced exactly as code-davinci-002 produced them, so a majority of the answers are incorrect and the code may contain syntax errors. This dataset is groundwork for future study: its aim is to provide a reference source for code-based math question answering with code-davinci-002.
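Because the outputs are unvalidated, a first filtering pass could separate snippets that fail to parse from those that run. The sketch below is one possible approach, not the procedure used to build this dataset: `check_sample` is a hypothetical helper, and it executes untrusted generated code, so it should only be used inside a sandbox.

```python
import ast
import subprocess
import sys


def check_sample(code, timeout=5.0):
    """Classify a generated snippet as syntax_error, runtime_error, timeout, or ok."""
    # Step 1: a cheap static check that catches syntax errors without running anything.
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return {"status": "syntax_error", "detail": str(exc), "stdout": ""}

    # Step 2: run the snippet in a separate interpreter process.
    # WARNING: this executes untrusted code; isolate it (container, VM) in practice.
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "detail": "", "stdout": ""}

    if proc.returncode != 0:
        return {"status": "runtime_error", "detail": proc.stderr.strip(), "stdout": proc.stdout}
    return {"status": "ok", "detail": "", "stdout": proc.stdout}
```

A status of `ok` only means the snippet ran without raising. Whether the printed value matches the math_qa answer still has to be checked separately.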
The dataset was sourced from math_qa, with a prompt appended to the end of each problem asking for a Python solution. The aim is to provide data for the kind of work offloading seen in Galactica.
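The prompt construction can be sketched as follows. The two suffix strings are copied verbatim from the examples in this card, including their original spelling. Everything else is an assumption: `build_prompt` is a hypothetical name, the single-space separator is a guess, and the dataset may contain other prompt variants.

```python
# Suffixes copied verbatim from the two examples shown in this card.
MAIN_SUFFIX = (
    "Write a short snippet of python function which solve this problem. "
    "No need to explain the answer, let the code output the answer."
)
RATIONAL_SUFFIX = (
    "Write a short snippet of python function which solve this problem "
    "within 30 lines of code. Add comments in the function block for explaination."
)


def build_prompt(problem, rationale=None):
    """Append the instruction suffix to a math_qa problem.

    With a rationale, the layout follows the "rational" example:
    problem, then the quoted rationale, then the suffix.
    """
    if rationale is None:
        return f"{problem} {MAIN_SUFFIX}"
    return f'{problem} Rationale: "{rationale}" {RATIONAL_SUFFIX}'
```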
The generation config for code-davinci-002 is as follows:
name | value |
---|---|
max_tokens | 2048 |
temperature | 0.5 |
top_p | 0.7 |
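Expressed as request parameters, the configuration above might look like the sketch below. The values come from the table. The payload shape, using the `model`, `prompt`, `max_tokens`, `temperature` and `top_p` fields of a completions-style request, is an assumption about how the generations were requested, and `build_request` is a hypothetical helper.

```python
# Sampling parameters copied from the table above.
GENERATION_CONFIG = {
    "max_tokens": 2048,
    "temperature": 0.5,
    "top_p": 0.7,
}


def build_request(prompt, model="code-davinci-002"):
    """Combine a prompt with the sampling parameters into one request payload."""
    return {"model": model, "prompt": prompt, **GENERATION_CONFIG}
```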
@inproceedings{amini-etal-2019-mathqa,
    title = "{M}ath{QA}: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms",
    author = "Amini, Aida and Gabriel, Saadia and Lin, Shanchuan and Koncel-Kedziorski, Rik and Choi, Yejin and Hajishirzi, Hannaneh",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1245",
    doi = "10.18653/v1/N19-1245",
    pages = "2357--2367",
}