Dataset: GEM/conversational_weather
Tasks: table-to-text
Languages: en
Multilinguality: unknown
Language creators: unknown
Annotations creators: none
Source datasets: original
Other: data-to-text
License: cc-by-nc-4.0
You can find the main data card on the GEM Website.
The purpose of this dataset is to assess how well a model can learn a template-like structure in a very low-data setting. The task is to produce a response to a weather-related query. The reply is further specified through the data attributes and discourse structure in the input. The output contains both the lexicalized text and discourse markers for attributes (e.g., __ARG_TEMP__ 34).
You can load the dataset via:
import datasets
data = datasets.load_dataset('GEM/conversational_weather')
The data loader can be found here.
Paper authors: Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani, Michael White, Rajen Subba (Facebook Conversational AI)
@inproceedings{balakrishnan-etal-2019-constrained,
    title = "Constrained Decoding for Neural {NLG} from Compositional Representations in Task-Oriented Dialogue",
    author = "Balakrishnan, Anusha and Rao, Jinfeng and Upasani, Kartikeya and White, Michael and Subba, Rajen",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1080",
    doi = "10.18653/v1/P19-1080",
    pages = "831--844"
}
Contact Name: Kartikeya Upasani
Contact Email: kart@fb.com
Has a Leaderboard? no
no
Covered Languages: English
License: cc-by-nc-4.0: Creative Commons Attribution Non Commercial 4.0 International
Intended Use: This dataset is intended to help develop conversational agents that exhibit human-like properties, such as matching the framing of the response with the query or contrasting relevant data attributes.
Primary Task: Data-to-Text
Communicative Goal: Producing a text that is a response to a weather query as per the discourse structure and data attributes specified in the input meaning representation.
industry
Curation Organization(s): Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani, Michael White, Rajen Subba (Facebook Conversational AI)
Funding: Vipul Raheja (Grammarly)
{'gem_id': 'weather-train-11',
 'id': '1108963',
 'synthetic_user_context': '[__DG_INFORM__ [__ARG_TASK__ get_forecast ] '
                           '[__ARG_TEMP__ 37 ] [__ARG_TEMP_UNIT__ fahrenheit ] '
                           '[__ARG_CLOUD_COVERAGE__ partly cloudy ] '
                           '[__ARG_DATE_TIME__ [__ARG_COLLOQUIAL__ currently ] '
                           '] [__ARG_LOCATION__ [__ARG_CITY__ Oakland ] '
                           '[__ARG_COUNTRY__ United States ] [__ARG_REGION__ '
                           'California ] ] ] [__DG_INFORM__ [__ARG_TASK__ '
                           'get_forecast ] [__ARG_TEMP_SUMMARY__ mid 40s ] '
                           '[__ARG_DATE_TIME_RANGE__ [__ARG_COLLOQUIAL__ This '
                           'afternoon ] ] [__ARG_LOCATION__ [__ARG_CITY__ '
                           'Oakland ] [__ARG_COUNTRY__ United States ] '
                           '[__ARG_REGION__ California ] ] ] [__DG_INFORM__ '
                           '[__ARG_TASK__ get_forecast ] '
                           '[__ARG_CLOUD_COVERAGE__ mostly sunny ] '
                           '[__ARG_DATE_TIME_RANGE__ [__ARG_COLLOQUIAL__ This '
                           'afternoon ] ] [__ARG_LOCATION__ [__ARG_CITY__ '
                           'Oakland ] [__ARG_COUNTRY__ United States ] '
                           '[__ARG_REGION__ California ] ] ]',
 'tree_str_mr': "[__DG_INFORM__ It's [__ARG_DATE_TIME__ [__ARG_COLLOQUIAL__ "
                'currently ] ] [__ARG_CLOUD_COVERAGE__ partly cloudy ] and '
                '[__ARG_TEMP__ __ARG_TEMP__ ] [__ARG_TEMP_UNIT__ '
                '__ARG_TEMP_UNIT__ ] [__ARG_LOCATION__ in [__ARG_CITY__ '
                '__ARG_CITY__ ] , [__ARG_REGION__ __ARG_REGION__ ] , '
                '[__ARG_COUNTRY__ __ARG_COUNTRY__ ] ] . ] [__DG_INFORM__ '
                '[__ARG_DATE_TIME_RANGE__ [__ARG_COLLOQUIAL__ This afternoon ] '
                "] , it'll be [__ARG_CLOUD_COVERAGE__ mostly sunny ] ] "
                '[__DG_INFORM__ with temperatures in the [__ARG_TEMP_SUMMARY__ '
                'mid <number> ] ]',
 'user_query': 'Show weather forecast for Oakland, CA. '}
Data Splits
The test set contains 3,121 examples, of which 1.1K (35%) have unique MRs that have never been seen in the training set.
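As a rough illustration of how such an overlap figure could be computed, the sketch below counts test examples whose MR string never occurs in the training split. The function name `unseen_mr_stats` and the toy MRs are hypothetical, and exact string matching is an assumption; the card does not say how MRs were normalized before comparison.

```python
def unseen_mr_stats(train_mrs, test_mrs):
    """Return (count, fraction) of test MRs that never occur in training.

    Assumption: two MRs are "the same" only if their strings are identical.
    """
    if not test_mrs:
        return 0, 0.0
    seen = set(train_mrs)
    unseen = [mr for mr in test_mrs if mr not in seen]
    return len(unseen), len(unseen) / len(test_mrs)


# Toy, made-up MRs used purely for illustration (not taken from the dataset).
train_mrs = [
    "[__DG_INFORM__ [__ARG_TEMP__ __ARG_TEMP__ ] ]",
    "[__DG_INFORM__ [__ARG_CLOUD_COVERAGE__ __ARG_CLOUD_COVERAGE__ ] ]",
]
test_mrs = [
    "[__DG_INFORM__ [__ARG_TEMP__ __ARG_TEMP__ ] ]",
    "[__DG_INFORM__ [__ARG_TIME__ __ARG_TIME__ ] ]",
]

count, fraction = unseen_mr_stats(train_mrs, test_mrs)
print(count, fraction)
```

On the real splits, the lists would be filled from the MR field of each example instead of the toy strings.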
{'gem_id': 'weather-train-13333',
 'data_id': '1260610',
 'user_query': 'Sundown',
 'tree_str_mr': '[__DG_INFORM__ [__ARG_TASK__ get_weather_attribute ] [__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ 05:04 PM ] ] ]',
 'response': '[__DG_INFORM__ The sun will go down at [__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ __ARG_TIME__ ] ] ]'}
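In the instance above, the response keeps the placeholder `__ARG_TIME__` where the MR carries the value `05:04 PM`. The following is a minimal sketch of one way to fill such placeholders and strip the bracket markup to obtain plain text. The helpers `relexicalize` and `strip_markup` are hypothetical and not part of the dataset tooling, and the sketch assumes that repeated arguments appear in the same order in the MR and in the response.

```python
import re
from collections import defaultdict, deque

# Matches a leaf argument such as "[__ARG_TIME__ 05:04 PM ]".
LEAF = re.compile(r"\[(__ARG_\w+?__) ([^\[\]]+?) \]")


def relexicalize(mr, response):
    """Replace placeholders like "[__ARG_X__ __ARG_X__ ]" with values from the MR."""
    values = defaultdict(deque)
    for name, value in LEAF.findall(mr):
        values[name].append(value)

    def fill(match):
        name, text = match.group(1), match.group(2)
        # A placeholder repeats its own argument name as its content.
        if text == name and values[name]:
            return f"[{name} {values[name].popleft()} ]"
        return match.group(0)

    return LEAF.sub(fill, response)


def strip_markup(tree):
    """Drop opening labels and closing brackets, keeping only the words."""
    words = [t for t in tree.split() if t != "]" and not t.startswith("[__")]
    return " ".join(words)


mr = ("[__DG_INFORM__ [__ARG_TASK__ get_weather_attribute ] "
      "[__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ 05:04 PM ] ] ]")
response = ("[__DG_INFORM__ The sun will go down at "
            "[__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ __ARG_TIME__ ] ] ]")

filled = relexicalize(mr, response)
print(strip_markup(filled))  # The sun will go down at 05:04 PM
```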
The dataset was curated to develop a weather bot that exhibits human-like properties such as matching the framing of the response with the query or contrasting relevant data attributes.
The dataset provides rich tree-based meaning representations that give fine-grained control over the response, e.g. by specifying which two attributes are to be contrasted. The natural language input queries are also provided so that the coherence of the response with the input can be modeled. The output response is annotated with the input meaning components using special bracketing tokens, which enables new techniques such as constrained decoding to improve the quality of output responses.
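To make the bracketing scheme concrete, here is a minimal sketch of a parser that turns a bracketed string into nested `(label, children)` tuples. It assumes whitespace-separated tokens, opening tokens of the form `[__LABEL__`, and a standalone `]` as the closing token, as in the instances shown in this card. The name `parse_tree` is hypothetical.

```python
def parse_tree(text):
    """Parse a bracketed string into a list of nodes.

    Each node is either a plain word (str) or a (label, children) tuple.
    """
    root = []
    stack = [root]
    for token in text.split():
        if token.startswith("[__"):
            node = (token[1:], [])       # drop the leading "["
            stack[-1].append(node)
            stack.append(node[1])        # descend into the new node
        elif token == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            stack.pop()
        else:
            stack[-1].append(token)      # a word or an argument value
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root


tree = parse_tree(
    "[__DG_INFORM__ [__ARG_TEMP__ 37 ] [__ARG_TEMP_UNIT__ fahrenheit ] ]"
)
print(tree)
```

A structure like this makes it straightforward to check, for example, which arguments sit under which dialogue act before or during decoding.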
Similar Datasets: no
Ability that the Dataset measures: Adequately expressing CONTRAST and JUSTIFY discourse relations with appropriate grouping of arguments; adequately generalizing to many combinations of arguments.
yes
GEM Modifications: data points removed
Modification Details: The original repo contained a challenge set, disc_test.tsv, a subset of the test set intended to cover discourse relations (CONTRAST and JUSTIFY) that also contained JOIN relations. This discrepancy has been rectified in the GEM version, and the rectified version has been added to the challenge_sets.
Additional Splits? no
Metrics: BLEU, Other: Other Metrics
Other Metrics: Tree accuracy measures whether the tree structure in the prediction matches that of the input MR exactly (modulo repeated arguments that need only appear once).
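The sketch below is one possible reading of that definition, not the authors' evaluation script. It reduces each tree to its nested labels, collapses repeated sibling arguments, and compares the results. Two choices here are assumptions: sibling order is kept, and `__ARG_TASK__` is ignored because in the instances shown in this card it appears in the MR but not in the response. All function names are hypothetical.

```python
def skeleton(text, ignore=("__ARG_TASK__",)):
    """Return the nested label structure of a bracketed tree, or None if malformed."""
    root = ("ROOT", [])
    stack = [root]
    for token in text.split():
        if token.startswith("[__"):
            node = (token[1:], [])
            stack[-1][1].append(node)
            stack.append(node)
        elif token == "]":
            if len(stack) == 1:
                return None              # unbalanced closing bracket
            stack.pop()
        # any other token is a word or a value and does not affect structure
    if len(stack) != 1:
        return None                      # unclosed bracket
    return _freeze(root, set(ignore))


def _freeze(node, ignore):
    label, children = node
    frozen = []
    for child in children:
        if child[0] in ignore:
            continue
        item = _freeze(child, ignore)
        if item not in frozen:           # repeated arguments count once
            frozen.append(item)
    return (label, tuple(frozen))


def tree_accuracy(mrs, predictions):
    """Fraction of predictions whose label skeleton equals that of the MR."""
    correct = 0
    for mr, pred in zip(mrs, predictions):
        expected = skeleton(mr)
        correct += expected is not None and expected == skeleton(pred)
    return correct / len(mrs)


mr = ("[__DG_INFORM__ [__ARG_TASK__ get_weather_attribute ] "
      "[__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ 05:04 PM ] ] ]")
good = ("[__DG_INFORM__ The sun will go down at "
        "[__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ __ARG_TIME__ ] ] ]")
bad = "[__DG_INFORM__ The sun will go down soon ]"

print(tree_accuracy([mr, mr], [good, bad]))  # 0.5
```

Malformed predictions (unbalanced brackets) are counted as incorrect in this sketch.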
Proposed Evaluation: Automatic metrics (BLEU and tree accuracy) are evaluated on the raw model predictions, which have de-lexicalized fields.
The authors also performed human evaluation studies by asking annotators to evaluate the quality of responses produced by different models. Annotators provided binary ratings on the following dimensions:
• Grammaticality: Measures fluency of the responses.
• Correctness: Measures semantic correctness of the responses.
Previous results available? no
The dataset was curated to develop a weather bot that exhibits human-like properties such as matching the framing of the response with the query or contrasting relevant data attributes. To achieve this, the dataset contains rich tree-structured meaning representations that are specified using several data arguments and discourse acts, the input natural language queries, and annotations for the responses.
Sourced from Different Sources: no
Crowdsourced, Machine-generated
Where was it crowdsourced? Other crowdworker platform
Topics Covered: The dataset is focused on the weather domain. Weather was the first successful case of NLG put into production, back in the 80s (Reiter & Dale, 1997). This domain offers significant complexity for NLG. Weather forecast summaries in particular can be very long and require reasoning over several disjoint pieces of information.
Data Validation: validated by crowdworker
Data Preprocessing: Please refer to Appendix D of the original paper for details.
Was Data Filtered? hybrid
Filter Criteria: Please refer to Appendix C of the original paper for details.
none
Annotation Service? no
no
Justification for Using the Data: Annotation was done as work for hire and contains no PII.
no PII
Justification for no PII: Data is simulated and not specific to annotator.
no
no
no
unsure
Are the Language Producers Representative of the Language? Grammatical evaluations performed with the data to date have used norms from informal Standard American English. These prescriptive notions of grammaticality potentially serve to perpetuate systemic power imbalances as they’re conveyed by language.
Since the data only contains informal Standard American English, its use to train a model may not be appropriate depending on the potential use case.
Annotation was done as work for hire and contains no PII. Annotated data is simulated and not specific to annotator.
An imperfect model used to convey actual weather data could mislead users about weather conditions.