Dataset:

shibing624/AdvertiseGen

Chinese

Dataset Card for AdvertiseGen

Dataset Description

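Introduction

AdvertiseGen is a dataset for generating e-commerce advertising copy.

AdvertiseGen is built from the correspondence between the tags on product web pages and the advertising copy on those pages. It is a typical open-ended generation task: when a model generates free-form copy from key-value input, the factual consistency of the output with the input information needs particular attention.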

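  • Task description: given a list of product keywords and attributes (kv-list), generate advertising copy (adv) suited to that product;
  • Data size: 114k training examples, 1k validation examples, 3k test examples;
  • Data source: the CoAI group at Tsinghua University;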
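Because factual consistency with the input is the main concern for this task, a rough sanity check can be useful. The sketch below is not part of the dataset or its official evaluation; `value_coverage` is a hypothetical helper that only measures how many attribute values from the kv-list appear verbatim in the generated copy. Verbatim matching is a crude proxy: a paraphrased attribute counts as missing, so treat the score as a hint rather than a metric.

```python
def value_coverage(kv_list, adv):
    """Return (coverage, missing) for a kv-list and a generated ad text.

    kv_list: list of (key, value) string pairs (assumed input shape).
    adv: the generated advertising copy.
    coverage is the fraction of non-empty values found verbatim in adv.
    """
    values = [value for _, value in kv_list if value]
    if not values:
        # Nothing to check, so nothing is missing.
        return 1.0, []
    missing = [value for value in values if value not in adv]
    coverage = 1 - len(missing) / len(values)
    return coverage, missing


# Illustrative input only: two attributes, one of which the text mentions.
kv_list = [("颜色", "白色"), ("图案", "刺绣")]
adv = "白色的衣身十分百搭。"
coverage, missing = value_coverage(kv_list, adv)
print(coverage, missing)
```

A low score flags outputs worth inspecting by hand; it does not detect hallucinated attributes that were never in the input.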

Supported Tasks and Leaderboards

The dataset is designed for generating e-commerce advertising copy.

Languages

The data in AdvertiseGen are in Chinese.

Dataset Structure

Data Instances

An example of "train" looks as follows:

{
  "content": "类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞",
  "summary": "简约而不简单的牛仔外套,白色的衣身十分百搭。衣身多处有做旧破洞设计,打破单调乏味,增加一丝造型看点。衣身后背处有趣味刺绣装饰,丰富层次感,彰显别样时尚。"
}
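In the instance above, the "content" field appears to encode the kv-list as `key#value` pairs joined by `*`. A minimal sketch of parsing it, assuming that separator convention holds across the dataset (it is inferred from this one example, and `parse_content` is a hypothetical helper name):

```python
def parse_content(content):
    """Split a "content" string into a list of (key, value) pairs.

    Assumes pairs are separated by "*" and each pair is "key#value".
    A list is returned rather than a dict so that repeated keys,
    if any exist, are preserved in order.
    """
    pairs = []
    for item in content.split("*"):
        if not item:
            continue  # skip empty segments, e.g. from a trailing "*"
        key, _, value = item.partition("#")
        pairs.append((key, value))
    return pairs


example = "类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞"
pairs = parse_content(example)
for key, value in pairs:
    print(key, value)
```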

Citation Information

Citing the dataset

If you use this dataset in an academic paper, please cite it as follows:

Shao, Zhihong, et al. "Long and Diverse Text Generation with Planning-based Hierarchical Variational Model." Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.