Dataset Card for air_dialogue

Dataset Summary

AirDialogue is a large dataset that contains 402,038 goal-oriented conversations. To collect this dataset, we create a context-generator which provides travel and flight restrictions. Human annotators are then asked to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions.

Supported Tasks and Leaderboards

We use perplexity and BLEU score to evaluate the quality of the language generated by the model. We also compare the dialogue state s generated by the model with the ground-truth state s0. Two categories of metrics are used: exact match scores and scaled scores.
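As a rough illustration of the exact match idea, the sketch below compares predicted states with ground-truth states field by field. The field names come from the sample instance in this card; the helper name `exact_match_scores`, the per-field averaging, and the second (mismatching) prediction are assumptions made for illustration, not the official evaluation code.

```python
from typing import Dict, List

# Fields of the action state, taken from the sample instance in this card.
STATE_FIELDS = ("status", "name", "flight")


def exact_match_scores(predicted: List[Dict], expected: List[Dict]) -> Dict[str, float]:
    """Return, per field, the fraction of examples where prediction equals ground truth.

    Hypothetical helper: the averaging scheme is an assumption, not the
    official AirDialogue metric implementation.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one example")
    totals = {field: 0 for field in STATE_FIELDS}
    for s, s0 in zip(predicted, expected):
        for field in STATE_FIELDS:
            totals[field] += int(s.get(field) == s0.get(field))
    return {field: count / len(expected) for field, count in totals.items()}


# Illustrative inputs: the ground truth repeats the sample's expected_action;
# the second prediction deliberately picks a different flight.
expected = [
    {"status": "book", "name": "Emily Edwards", "flight": [1027]},
    {"status": "book", "name": "Emily Edwards", "flight": [1027]},
]
predicted = [
    {"status": "book", "name": "Emily Edwards", "flight": [1027]},
    {"status": "book", "name": "Emily Edwards", "flight": [1005]},
]
scores = exact_match_scores(predicted, expected)
print(scores)
```

The scaled scores are not defined in this card, so they are left out of the sketch.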

The inference competition & leaderboard can be found here: https://worksheets.codalab.org/worksheets/0xa79833f4b3c24f4188cee7131b120a59

Languages

The text in the dataset is in English. The BCP 47 code is en.

Dataset Structure

Data Instances

The data is provided in two sets of files: the dialogues (air_dialogue_data) and the knowledge base (air_dialogue_kb).

BuilderConfig: air_dialogue_data

{"action": {"status": "book", "name": "Emily Edwards", "flight": [1027]}, "intent": {"return_month": "June", "return_day": "14", "max_price": 200, "departure_airport": "DFW", "return_time": "afternoon", "max_connections": 1, "departure_day": "12", "goal": "book", "departure_month": "June", "name": "Emily Edwards", "return_airport": "IAD"}, "timestamps": [1519233239, 1519233244, 1519233249, 1519233252, 1519233333, 1519233374, 1519233392, 1519233416, 1519233443, 1519233448, 1519233464, 1519233513, 1519233525, 1519233540, 1519233626, 1519233628, 1519233638], "dialogue": ["customer: Hello.", "agent: Hello.", "customer: My name is Emily Edwards.", "agent: How may I help you out?", "customer: I need some help in my flight ticket reservation to attend a convocation meeting, can you please help me?", "agent: Sure, I will help you out. May I know your travelling dates please?", "customer: Thank you and my dates are 06/12 and back on 06/14.", "agent: Can I know your airport codes?", "customer: The airport codes are from DFW to IAD.", "agent: Ok, please wait a moment.", "customer: Sure.", "agent: There is a flight with connection 1 and price 200, can I proceed with this flight?", "customer: Yes, do proceed with booking.", "agent: Ok, your ticket has been booked.", "customer: Thank you for your assistance in my flight ticket reservation.", "agent: Thank you for choosing us.", "customer: You are welcome."], "expected_action": {"status": "book", "name": "Emily Edwards", "flight": [1027]}, "correct_sample": true}

BuilderConfig: air_dialogue_kb

{"kb": [{"return_airport": "DTW", "airline": "Spirit", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1000, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 2, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Frontier", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1001, "departure_month": "June", "departure_time_num": 0, "class": "business", "return_time_num": 15, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 500}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1002, "departure_month": "June", "departure_time_num": 0, "class": "business", "return_time_num": 13, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 600}, {"return_airport": "IAD", "airline": "Hawaiian", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1003, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 5, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "AA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1004, "departure_month": "June", "departure_time_num": 9, "class": "economy", "return_time_num": 11, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "IAD", "airline": "AA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1005, "departure_month": "June", "departure_time_num": 3, "class": "economy", "return_time_num": 17, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Frontier", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1006, "departure_month": "June", "departure_time_num": 10, "class": "economy", 
"return_time_num": 10, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "IAD", "airline": "UA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1007, "departure_month": "June", "departure_time_num": 14, "class": "economy", "return_time_num": 20, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "AA", "departure_day": "13", "departure_airport": "DTW", "flight_number": 1008, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 8, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 400}, {"return_airport": "DFW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1009, "departure_month": "June", "departure_time_num": 18, "class": "economy", "return_time_num": 6, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "Frontier", "departure_day": "13", "departure_airport": "DTW", "flight_number": 1010, "departure_month": "June", "departure_time_num": 4, "class": "economy", "return_time_num": 2, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Southwest", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1011, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 22, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 100}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "11", "departure_airport": "DFW", "flight_number": 1012, "departure_month": "June", "departure_time_num": 13, "class": "economy", "return_time_num": 22, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Southwest", "departure_day": "12", "departure_airport": "IAD", 
"flight_number": 1013, "departure_month": "June", "departure_time_num": 16, "class": "economy", "return_time_num": 13, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1014, "departure_month": "June", "departure_time_num": 0, "class": "economy", "return_time_num": 8, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Southwest", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1015, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 1, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 300}, {"return_airport": "DTW", "airline": "UA", "departure_day": "11", "departure_airport": "DFW", "flight_number": 1016, "departure_month": "June", "departure_time_num": 10, "class": "economy", "return_time_num": 4, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 200}, {"return_airport": "DFW", "airline": "AA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1017, "departure_month": "June", "departure_time_num": 14, "class": "economy", "return_time_num": 23, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 400}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1018, "departure_month": "June", "departure_time_num": 3, "class": "economy", "return_time_num": 1, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Hawaiian", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1019, "departure_month": "June", "departure_time_num": 7, "class": "economy", "return_time_num": 18, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": 
"DFW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1020, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 18, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 200}, {"return_airport": "IAD", "airline": "Delta", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1021, "departure_month": "June", "departure_time_num": 11, "class": "business", "return_time_num": 8, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 1000}, {"return_airport": "IAD", "airline": "JetBlue", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1022, "departure_month": "June", "departure_time_num": 4, "class": "economy", "return_time_num": 14, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 200}, {"return_airport": "IAD", "airline": "Frontier", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1023, "departure_month": "June", "departure_time_num": 19, "class": "economy", "return_time_num": 23, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "UA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1024, "departure_month": "June", "departure_time_num": 11, "class": "economy", "return_time_num": 19, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Hawaiian", "departure_day": "11", "departure_airport": "IAD", "flight_number": 1025, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 10, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "UA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1026, "departure_month": "June", "departure_time_num": 0, "class": "economy", "return_time_num": 18, "return_month": 
"June", "return_day": "14", "num_connections": 1, "price": 300}, {"return_airport": "IAD", "airline": "Delta", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1027, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 15, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "IAD", "airline": "Southwest", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1028, "departure_month": "June", "departure_time_num": 23, "class": "economy", "return_time_num": 13, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Spirit", "departure_day": "11", "departure_airport": "DTW", "flight_number": 1029, "departure_month": "June", "departure_time_num": 22, "class": "business", "return_time_num": 4, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 800}], "reservation": 0}

Data Fields

BuilderConfig: air_dialogue_data: provides the customer context, dialogue states and environment

Key name            Description
'search_action'     Search action performed by the customer
'action'            Action taken by the agent
'intent'            Intents from the conversation
'timestamps'        Timestamp for each dialogue turn
'dialogue'          Dialogue recorded between agent and customer
'expected_action'   Expected action from the agent (human-annotated)
'correct_sample'    Whether the action performed by the agent matches expected_action
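The 'dialogue' and 'timestamps' fields line up one-to-one in the sample instance, so a turn can be split into speaker and utterance and paired with its timestamp. The sketch below assumes every turn has the form "speaker: text", as in the sample; `split_turns` is a hypothetical helper, and the JSON string is a shortened copy of the sample instance.

```python
import json
from typing import List, Tuple

# Shortened copy of the sample instance in this card (first four turns only).
raw = '''{"timestamps": [1519233239, 1519233244, 1519233249, 1519233252],
 "dialogue": ["customer: Hello.", "agent: Hello.",
              "customer: My name is Emily Edwards.",
              "agent: How may I help you out?"]}'''


def split_turns(sample: dict) -> List[Tuple[int, str, str]]:
    """Pair each timestamp with (speaker, utterance).

    Assumes each dialogue line looks like "speaker: text".
    """
    turns = []
    for ts, line in zip(sample["timestamps"], sample["dialogue"]):
        speaker, _, text = line.partition(": ")
        turns.append((ts, speaker, text))
    return turns


turns = split_turns(json.loads(raw))
# Seconds elapsed between consecutive turns.
gaps = [later[0] - earlier[0] for earlier, later in zip(turns, turns[1:])]
for ts, speaker, text in turns:
    print(ts, speaker, text)
```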

BuilderConfig: air_dialogue_kb: provides the agent context c_a = (db, r)

Key name            Description
'kb'                Available flights in the database
'reservation'       Whether the customer has an existing reservation
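To show how the 'kb' entries relate to a customer intent, the sketch below filters flights by the constraints in the sample intent. The flights are three entries copied from the sample knowledge base, reduced to the fields used here. `matching_flights` is a hypothetical helper; the time-of-day constraint ("return_time": "afternoon") is skipped because this card does not say how it maps to the hour values, and the actual rule used to derive the ground-truth action may differ.

```python
from typing import Dict, List

# Intent copied from the sample air_dialogue_data instance (time of day omitted).
intent = {
    "departure_airport": "DFW", "return_airport": "IAD",
    "departure_month": "June", "departure_day": "12",
    "return_month": "June", "return_day": "14",
    "max_price": 200, "max_connections": 1,
}

# Three flights copied from the sample air_dialogue_kb instance.
kb = [
    {"flight_number": 1005, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "13",
     "num_connections": 1, "price": 100},
    {"flight_number": 1021, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "14",
     "num_connections": 0, "price": 1000},
    {"flight_number": 1027, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "14",
     "num_connections": 1, "price": 200},
]

# Keys that must be equal in the intent and the flight record.
EQUAL_KEYS = ("departure_airport", "return_airport", "departure_month",
              "departure_day", "return_month", "return_day")


def matching_flights(intent: Dict, kb: List[Dict]) -> List[int]:
    """Return the flight numbers that satisfy route, date, price and connection limits."""
    result = []
    for flight in kb:
        same_trip = all(flight[key] == intent[key] for key in EQUAL_KEYS)
        within_limits = (flight["price"] <= intent["max_price"]
                         and flight["num_connections"] <= intent["max_connections"])
        if same_trip and within_limits:
            result.append(flight["flight_number"])
    return result


candidates = matching_flights(intent, kb)
print(candidates)
```

Under these assumptions the only remaining candidate is the flight that appears in the sample's 'action' and 'expected_action'.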

Data Splits

The data is split into train, dev and test sets in the ratio 80%, 10% and 10%.
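For readers who want to reproduce a split with the same proportions on their own copy of the data, a minimal sketch follows. The card does not describe how the original split was drawn, so the shuffling, the seed, and the helper name `split_indices` are all assumptions.

```python
import random
from typing import List, Tuple


def split_indices(n: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them 80/10/10 into train, dev and test.

    Hypothetical helper: the original split procedure is not documented here.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding
    n_dev = n // 10
    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]  # remainder goes to test
    return train, dev, test


train, dev, test = split_indices(1000)
print(len(train), len(dev), len(test))
```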

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

To collect this dataset, we create a context-generator which provides travel and flight restrictions. We then ask human annotators to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions. Key to our environment is the ease of evaluating the success of the dialogue, which is achieved by using ground-truth states (e.g., the flight being booked) generated by the restrictions. Any dialogue agent that does not generate the correct states is considered to fail.

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

No personal or sensitive information is stored.

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

AirDialogue team

For issues regarding the HuggingFace Dataset Hub implementation, contact Aakash Gupta.

Licensing Information

cc-by-nc-4.0

Citation Information

@inproceedings{wei-etal-2018-airdialogue,
    title = "{A}ir{D}ialogue: An Environment for Goal-Oriented Dialogue Research",
    author = "Wei, Wei and Le, Quoc and Dai, Andrew and Li, Jia",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1419",
    doi = "10.18653/v1/D18-1419",
    pages = "3844--3854",
    abstract = "Recent progress in dialogue generation has inspired a number of studies on dialogue systems that are capable of accomplishing tasks through natural language interactions. A promising direction among these studies is the use of reinforcement learning techniques, such as self-play, for training dialogue agents. However, current datasets are limited in size, and the environment for training agents and evaluating progress is relatively unsophisticated. We present AirDialogue, a large dataset that contains 301,427 goal-oriented conversations. To collect this dataset, we create a context-generator which provides travel and flight restrictions. We then ask human annotators to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions. Key to our environment is the ease of evaluating the success of the dialogue, which is achieved by using ground-truth states (e.g., the flight being booked) generated by the restrictions. Any dialogue agent that does not generate the correct states is considered to fail. Our experimental results indicate that state-of-the-art dialogue models can only achieve a score of 0.17 while humans can reach a score of 0.91, which suggests significant opportunities for future improvement.",
}

Contributions

Thanks to @skyprince999 for adding this dataset.