Dataset: air_dialogue
Languages: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: machine-generated
Annotation creators: crowdsourced
Source datasets: original
License: cc-by-nc-4.0

AirDialogue is a large dataset that contains 402,038 goal-oriented conversations. To collect this dataset, we create a context-generator which provides travel and flight restrictions. Human annotators are then asked to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions.
We use perplexity and BLEU score to evaluate the quality of the language generated by the model. We also compare the dialogue state s generated by the model with the ground-truth state s0. Two categories of metrics are used: exact match scores and scaled scores.
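As a rough illustration of the exact-match category, the sketch below compares predicted states with ground-truth states field by field. The field names (`status`, `name`, `flight`) come from the `action` records shown in this card. The function name `exact_match_scores` and the per-field averaging are assumptions made for illustration, not the official evaluation code, and scaled scores are not covered.

```python
from typing import Any, Dict, List

FIELDS = ("status", "name", "flight")  # keys seen in this card's `action` records


def exact_match_scores(
    predicted: List[Dict[str, Any]], truth: List[Dict[str, Any]]
) -> Dict[str, float]:
    """Return, per field, the fraction of samples where prediction equals truth."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        return {field: 0.0 for field in FIELDS}
    scores = {}
    for field in FIELDS:
        hits = sum(p.get(field) == t.get(field) for p, t in zip(predicted, truth))
        scores[field] = hits / len(truth)
    return scores


# Ground truth copied from the card's sample; the second prediction is a
# hypothetical model error (wrong flight number) added for illustration.
truth = [
    {"status": "book", "name": "Emily Edwards", "flight": [1027]},
    {"status": "book", "name": "Emily Edwards", "flight": [1027]},
]
predicted = [
    {"status": "book", "name": "Emily Edwards", "flight": [1027]},
    {"status": "book", "name": "Emily Edwards", "flight": [1005]},
]
print(exact_match_scores(predicted, truth))
```

Reporting one score per field makes it easier to see whether a model fails on the booking status, the customer name, or the chosen flight.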
The inference competition & leaderboard can be found here: https://worksheets.codalab.org/worksheets/0xa79833f4b3c24f4188cee7131b120a59
The text in the dataset is in English. The BCP 47 code is en.
The data is provided in two sets of files. The first contains the dialogues (air_dialogue_data) and the second contains the knowledge base (air_dialogue_kb).
BuilderConfig: air_dialogue_data
{"action": {"status": "book", "name": "Emily Edwards", "flight": [1027]}, "intent": {"return_month": "June", "return_day": "14", "max_price": 200, "departure_airport": "DFW", "return_time": "afternoon", "max_connections": 1, "departure_day": "12", "goal": "book", "departure_month": "June", "name": "Emily Edwards", "return_airport": "IAD"}, "timestamps": [1519233239, 1519233244, 1519233249, 1519233252, 1519233333, 1519233374, 1519233392, 1519233416, 1519233443, 1519233448, 1519233464, 1519233513, 1519233525, 1519233540, 1519233626, 1519233628, 1519233638], "dialogue": ["customer: Hello.", "agent: Hello.", "customer: My name is Emily Edwards.", "agent: How may I help you out?", "customer: I need some help in my flight ticket reservation to attend a convocation meeting, can you please help me?", "agent: Sure, I will help you out. May I know your travelling dates please?", "customer: Thank you and my dates are 06/12 and back on 06/14.", "agent: Can I know your airport codes?", "customer: The airport codes are from DFW to IAD.", "agent: Ok, please wait a moment.", "customer: Sure.", "agent: There is a flight with connection 1 and price 200, can I proceed with this flight?", "customer: Yes, do proceed with booking.", "agent: Ok, your ticket has been booked.", "customer: Thank you for your assistance in my flight ticket reservation.", "agent: Thank you for choosing us.", "customer: You are welcome."], "expected_action": {"status": "book", "name": "Emily Edwards", "flight": [1027]}, "correct_sample": true}
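A record like the one above can be unpacked into per-turn tuples with the standard library. The sketch below assumes that every utterance starts with a `customer: ` or `agent: ` prefix and that `timestamps` and `dialogue` line up one-to-one, as they do in this sample; `split_turns` is a hypothetical helper name.

```python
import json
from typing import List, Tuple

# Shortened copy of the sample record above (only the first three turns are kept).
record_json = """
{"timestamps": [1519233239, 1519233244, 1519233249],
 "dialogue": ["customer: Hello.", "agent: Hello.",
              "customer: My name is Emily Edwards."]}
"""


def split_turns(record: dict) -> List[Tuple[int, str, str]]:
    """Pair each utterance with its timestamp and split off the speaker prefix."""
    turns = []
    for ts, line in zip(record["timestamps"], record["dialogue"]):
        # partition splits on the first ": " only, so colons inside the
        # utterance text are left untouched.
        speaker, _, text = line.partition(": ")
        turns.append((ts, speaker, text))
    return turns


turns = split_turns(json.loads(record_json))
for ts, speaker, text in turns:
    print(ts, speaker, text)
```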
BuilderConfig: air_dialogue_kb
{"kb": [{"return_airport": "DTW", "airline": "Spirit", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1000, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 2, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Frontier", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1001, "departure_month": "June", "departure_time_num": 0, "class": "business", "return_time_num": 15, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 500}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1002, "departure_month": "June", "departure_time_num": 0, "class": "business", "return_time_num": 13, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 600}, {"return_airport": "IAD", "airline": "Hawaiian", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1003, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 5, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "AA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1004, "departure_month": "June", "departure_time_num": 9, "class": "economy", "return_time_num": 11, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "IAD", "airline": "AA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1005, "departure_month": "June", "departure_time_num": 3, "class": "economy", "return_time_num": 17, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Frontier", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1006, "departure_month": "June", "departure_time_num": 10, "class": "economy", 
"return_time_num": 10, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "IAD", "airline": "UA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1007, "departure_month": "June", "departure_time_num": 14, "class": "economy", "return_time_num": 20, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "AA", "departure_day": "13", "departure_airport": "DTW", "flight_number": 1008, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 8, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 400}, {"return_airport": "DFW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1009, "departure_month": "June", "departure_time_num": 18, "class": "economy", "return_time_num": 6, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "Frontier", "departure_day": "13", "departure_airport": "DTW", "flight_number": 1010, "departure_month": "June", "departure_time_num": 4, "class": "economy", "return_time_num": 2, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Southwest", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1011, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 22, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 100}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "11", "departure_airport": "DFW", "flight_number": 1012, "departure_month": "June", "departure_time_num": 13, "class": "economy", "return_time_num": 22, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Southwest", "departure_day": "12", "departure_airport": "IAD", 
"flight_number": 1013, "departure_month": "June", "departure_time_num": 16, "class": "economy", "return_time_num": 13, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1014, "departure_month": "June", "departure_time_num": 0, "class": "economy", "return_time_num": 8, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Southwest", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1015, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 1, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 300}, {"return_airport": "DTW", "airline": "UA", "departure_day": "11", "departure_airport": "DFW", "flight_number": 1016, "departure_month": "June", "departure_time_num": 10, "class": "economy", "return_time_num": 4, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 200}, {"return_airport": "DFW", "airline": "AA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1017, "departure_month": "June", "departure_time_num": 14, "class": "economy", "return_time_num": 23, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 400}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1018, "departure_month": "June", "departure_time_num": 3, "class": "economy", "return_time_num": 1, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Hawaiian", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1019, "departure_month": "June", "departure_time_num": 7, "class": "economy", "return_time_num": 18, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": 
"DFW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1020, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 18, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 200}, {"return_airport": "IAD", "airline": "Delta", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1021, "departure_month": "June", "departure_time_num": 11, "class": "business", "return_time_num": 8, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 1000}, {"return_airport": "IAD", "airline": "JetBlue", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1022, "departure_month": "June", "departure_time_num": 4, "class": "economy", "return_time_num": 14, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 200}, {"return_airport": "IAD", "airline": "Frontier", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1023, "departure_month": "June", "departure_time_num": 19, "class": "economy", "return_time_num": 23, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "UA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1024, "departure_month": "June", "departure_time_num": 11, "class": "economy", "return_time_num": 19, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Hawaiian", "departure_day": "11", "departure_airport": "IAD", "flight_number": 1025, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 10, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "UA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1026, "departure_month": "June", "departure_time_num": 0, "class": "economy", "return_time_num": 18, "return_month": 
"June", "return_day": "14", "num_connections": 1, "price": 300}, {"return_airport": "IAD", "airline": "Delta", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1027, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 15, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "IAD", "airline": "Southwest", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1028, "departure_month": "June", "departure_time_num": 23, "class": "economy", "return_time_num": 13, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Spirit", "departure_day": "11", "departure_airport": "DTW", "flight_number": 1029, "departure_month": "June", "departure_time_num": 22, "class": "business", "return_time_num": 4, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 800}], "reservation": 0}
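To show how the knowledge base relates to a customer intent, the sketch below filters flights against the constraints of the sample intent shown earlier. It is a minimal illustration, not the dataset's own context-generator logic: `matching_flights` is a hypothetical helper, the intent is assumed to contain every key used here, and the time-of-day constraint (`return_time` versus `return_time_num`) is ignored because this card does not state how the two map onto each other.

```python
from typing import Any, Dict, List

# Four flights copied from the sample knowledge base above (airline, class and
# time fields dropped for brevity).
kb = [
    {"flight_number": 1005, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "13",
     "num_connections": 1, "price": 100},
    {"flight_number": 1015, "departure_airport": "DFW", "return_airport": "DTW",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "15",
     "num_connections": 1, "price": 300},
    {"flight_number": 1021, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "14",
     "num_connections": 0, "price": 1000},
    {"flight_number": 1027, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "14",
     "num_connections": 1, "price": 200},
]

# Constraints copied from the sample `intent` record.
intent = {
    "departure_airport": "DFW", "return_airport": "IAD",
    "departure_month": "June", "departure_day": "12",
    "return_month": "June", "return_day": "14",
    "max_price": 200, "max_connections": 1,
}

EXACT_KEYS = ("departure_airport", "return_airport", "departure_month",
              "departure_day", "return_month", "return_day")


def matching_flights(intent: Dict[str, Any], kb: List[Dict[str, Any]]) -> List[int]:
    """Return flight numbers that satisfy the route, date, price and connection limits."""
    matches = []
    for flight in kb:
        if any(flight[key] != intent[key] for key in EXACT_KEYS):
            continue
        if flight["price"] > intent["max_price"]:
            continue
        if flight["num_connections"] > intent["max_connections"]:
            continue
        matches.append(flight["flight_number"])
    return matches


print(matching_flights(intent, kb))
```

On this subset the only surviving flight is 1027, which agrees with the `expected_action` in the sample dialogue record.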
BuilderConfig: air_dialogue_data : Provides the customer context, dialogue states, and environment
key name | Description |
---|---|
'search_action' | search action performed by customer |
'action' | Action taken by the agent |
'intent' | Intents from the conversation |
'timestamps' | Timestamp for each of the dialogues |
'dialogue' | Dialogue recorded between agent & customer |
'expected_action' | Expected action from agent (human-annotated) |
'correct_sample' | Whether the action performed by the agent matches expected_action |
BuilderConfig: air_dialogue_kb : Provides the agent context c_a = (db, r)
key name | Description |
---|---|
'kb' | Available flights in the database |
'reservation' | Whether the customer has an existing reservation |
Data is split into Train, Dev, and Test in the ratio of 80%, 10%, and 10%.
[Needs More Information]
[Needs More Information]
Who are the source language producers? [Needs More Information]
To collect this dataset, we create a context-generator which provides travel and flight restrictions. We then ask human annotators to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions. Key to our environment is the ease of evaluating the success of the dialogue, which is achieved by using ground-truth states (e.g., the flight being booked) generated by the restrictions. Any dialogue agent that does not generate the correct states is considered to fail.
Who are the annotators? [Needs More Information]
No personal or sensitive information is stored.
[Needs More Information]
[Needs More Information]
[Needs More Information]
AirDialogue team
For issues regarding the HuggingFace Dataset Hub implementation: Aakash Gupta
cc-by-nc-4.0
@inproceedings{wei-etal-2018-airdialogue,
  title = "{A}ir{D}ialogue: An Environment for Goal-Oriented Dialogue Research",
  author = "Wei, Wei and Le, Quoc and Dai, Andrew and Li, Jia",
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  month = oct # "-" # nov,
  year = "2018",
  address = "Brussels, Belgium",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/D18-1419",
  doi = "10.18653/v1/D18-1419",
  pages = "3844--3854",
  abstract = "Recent progress in dialogue generation has inspired a number of studies on dialogue systems that are capable of accomplishing tasks through natural language interactions. A promising direction among these studies is the use of reinforcement learning techniques, such as self-play, for training dialogue agents. However, current datasets are limited in size, and the environment for training agents and evaluating progress is relatively unsophisticated. We present AirDialogue, a large dataset that contains 301,427 goal-oriented conversations. To collect this dataset, we create a context-generator which provides travel and flight restrictions. We then ask human annotators to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions. Key to our environment is the ease of evaluating the success of the dialogue, which is achieved by using ground-truth states (e.g., the flight being booked) generated by the restrictions. Any dialogue agent that does not generate the correct states is considered to fail. Our experimental results indicate that state-of-the-art dialogue models can only achieve a score of 0.17 while humans can reach a score of 0.91, which suggests significant opportunities for future improvement.",
}
Thanks to @skyprince999 for adding this dataset.