Codex Math QA

Solve math_qa with code-davinci-002 by generating Python programs.

Since OpenAI decided to restrict access to code-davinci-002 to Azure, this dataset shares the generation results from code-davinci-002, OpenAI's 176B code generation model.

Data Splits

name	train	validation	test
main	25065	4133	2985
rational	-	4151	2985

Basic prompts: question only

from datasets import load_dataset

dataset = load_dataset("theblackcat102/codex-math-qa", "main")

Example :

the average weight of 8 person ' s increases by 1.5 kg when a new person comes in place of one of them weighing 75 kg. what might be the weight of the new person ?
Write a short snippet of python function which solve this problem. No need to explain the answer, let the code output the answer.

output

def average_weight(weights):
    total_weight = sum(weights)
    return total_weight / len(weights)

def new_person_weight(weights):
    average_weight = average_weight(weights)
    new_person_weight = average_weight + 1.5
    return new_person_weight

weights = [75, 80, 85, 90, 95]
print(new_person_weight(weights))
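
The output above is reproduced as generated and does not solve the problem: it never uses the group size of 8, and `new_person_weight` rebinds the name `average_weight` locally before calling it, which raises an error at run time. For comparison, here is a minimal sketch of a solution that follows the question's arithmetic. The function and parameter names are illustrative and are not part of the dataset.

```python
def new_person_weight(group_size, average_increase, replaced_weight):
    # Replacing one member raises the group's total weight by
    # group_size * average_increase, so the newcomer must outweigh
    # the replaced person by exactly that amount.
    return replaced_weight + group_size * average_increase


print(new_person_weight(8, 1.5, 75))  # prints 87.0
```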

Prompts with rationale: include the Rationale text, which should help with planning the program

from datasets import load_dataset

dataset = load_dataset("theblackcat102/codex-math-qa", "rational")

Example :

one copy machine can make 20 copies a minute, and a second copy machine makes 10 copies a minute. if the two copiers work together, how long would it take them to make 900 copies ?
Rationale: "total work done by both machines in a minute = 20 + 10 = 30 copies total number of copies required = 900 time = 900 / 30 = 30 mins answer d"
Write a short snippet of python function which solve this problem within 30 lines of code. Add comments in the function block for explaination.

output

def copy_machine(copies, copy_speed):
    """
    :param copies: number of copies required
    :param copy_speed: speed of the copy machine
    :return: time in minutes
    """
    total_work = sum(copy_speed)
    time = copies / total_work
    return time


if __name__ == "__main__":
    print(copy_machine(900, [20, 10]))

Notes:

The generated results are unvalidated and are provided exactly as returned by code-davinci-002. A majority of the answers are therefore incorrect, and some of the code contains syntax errors. This dataset is intended as groundwork for future study: its aim is to provide a reference for code-based math question answering with code-davinci-002.
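
Because the outputs are unvalidated, a first filtering pass might simply check whether each snippet parses. The sketch below uses Python's standard `ast` module and never executes the generated code. The helper name `is_valid_python` and the sample strings are illustrative, not part of the dataset.

```python
import ast


def is_valid_python(source):
    """Return True if `source` parses as Python, without executing it."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError covers inputs
        # such as strings containing null bytes on some Python versions.
        return False
    return True


samples = [
    "def f(x):\n    return x + 1\n",
    "def f(x) return x + 1",  # missing colon and newline
]
print([is_valid_python(s) for s in samples])  # prints [True, False]
```

A snippet that parses can still be wrong or fail at run time, so this check only removes the syntax-error cases mentioned above.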

Dataset Creation

The questions were sourced from math_qa, and a prompt was appended to the end of each question to request a Python solution. The aim is to provide data for the kind of work offloading seen in Galactica.
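
Judging from the two examples shown earlier, each prompt is the question, an optional quoted Rationale line, and then an instruction line. A minimal sketch of that assembly follows. The function `build_prompt` is hypothetical, and the exact separators used when the dataset was built are an assumption.

```python
def build_prompt(question, instruction, rationale=None):
    """Join question, optional rationale, and instruction with newlines."""
    lines = [question]
    if rationale is not None:
        # The rationale example wraps the text in double quotes.
        lines.append(f'Rationale: "{rationale}"')
    lines.append(instruction)
    return "\n".join(lines)


prompt = build_prompt(
    "if the two copiers work together, how long would it take them to make 900 copies ?",
    "Write a short snippet of python function which solve this problem.",
    rationale="time = 900 / 30 = 30 mins answer d",
)
print(prompt)
```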

The generation config for code-davinci-002 is as follows:

name	value
max_tokens	2048
temperature	0.5
top_p	0.7
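
As a rough illustration, the values in the table could be combined with a prompt into a completion request payload. This is only a sketch: the three sampling values come from the table, while the `model` and `prompt` keys and the helper `completion_request` are assumptions about how such a request might be shaped. No API call is made.

```python
# Sampling values taken from the generation config table.
GENERATION_CONFIG = {
    "max_tokens": 2048,
    "temperature": 0.5,
    "top_p": 0.7,
}


def completion_request(prompt, model="code-davinci-002"):
    """Build a request payload (hypothetical helper; nothing is sent)."""
    return {"model": model, "prompt": prompt, **GENERATION_CONFIG}


print(completion_request("what is 2 + 2 ?"))
```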

Citation Information

@inproceedings{amini-etal-2019-mathqa,
    title = "{M}ath{QA}: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms",
    author = "Amini, Aida  and
      Gabriel, Saadia  and
      Lin, Shanchuan  and
      Koncel-Kedziorski, Rik  and
      Choi, Yejin  and
      Hajishirzi, Hannaneh",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1245",
    doi = "10.18653/v1/N19-1245",
    pages = "2357--2367",
}

Author:

theblackcat102

Dataset size:

39.44 MB