Dataset: shunk031/CAMERA
Tasks: Text generation
Languages: ja
Multilinguality: monolingual
Language creators: found
Annotations creators: crowdsourced
Source datasets: original
License: cc-by-nc-sa-4.0

From the official README.md:
CAMERA (CyberAgent Multimodal Evaluation for Ad Text GeneRAtion) is the Japanese ad text generation dataset. We hope that our dataset will be useful in research for realizing more advanced ad text generation models.
Supported Tasks: [More Information Needed]
Leaderboard: [More Information Needed]
The language data in CAMERA is in Japanese (BCP-47 ja-JP).
To load a specific configuration, pass its name (without-lp-images or with-lp-images) through the name argument of load_dataset:
without-lp-images

    from datasets import load_dataset

    dataset = load_dataset("shunk031/CAMERA", name="without-lp-images")

    print(dataset)
    # DatasetDict({
    #     train: Dataset({
    #         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation'],
    #         num_rows: 12395
    #     })
    #     validation: Dataset({
    #         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation'],
    #         num_rows: 3098
    #     })
    #     test: Dataset({
    #         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation'],
    #         num_rows: 872
    #     })
    # })

An example of the CAMERA (w/o LP images) dataset looks as follows:

    {
        "asset_id": 13861,
        "kw": "仙台 ホテル",
        "lp_meta_description": "仙台のホテルや旅館をお探しなら楽天トラベルへ!楽天ポイントが使えて、貯まって、とってもお得な宿泊予約サイトです。さらに割引クーポンも使える!国内ツアー・航空券・レンタカー・バス予約も!",
        "title_org": "仙台市のホテル",
        "title_ne1": "",
        "title_ne2": "",
        "title_ne3": "",
        "domain": "",
        "parsed_full_text_annotation": {
            "text": [
                "trivago",
                "Oops...AccessDenied 可",
                "Youarenotallowedtoviewthispage!Ifyouthinkthisisanerror,pleasecontacttrivago.",
                "Errorcode:0.3c99e86e.1672026945.25ba640YourIP:240d:1a:4d8:2800:b9b0:ea86:2087:d141AffectedURL:https://www.trivago.jp/ja/odr/%E8%BB%92",
                "%E4%BB%99%E5%8F%B0-%E5%9B%BD%E5%86%85?search=20072325",
                "Backtotrivago"
            ],
            "xmax": [653, 838, 765, 773, 815, 649],
            "xmin": [547, 357, 433, 420, 378, 550],
            "ymax": [47, 390, 475, 558, 598, 663],
            "ymin": [18, 198, 439, 504, 566, 651]
        }
    }

with-lp-images

    from datasets import load_dataset

    dataset = load_dataset("shunk031/CAMERA", name="with-lp-images")

    print(dataset)
    # DatasetDict({
    #     train: Dataset({
    #         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation', 'lp_image'],
    #         num_rows: 12395
    #     })
    #     validation: Dataset({
    #         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation', 'lp_image'],
    #         num_rows: 3098
    #     })
    #     test: Dataset({
    #         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation', 'lp_image'],
    #         num_rows: 872
    #     })
    # })

An example of the CAMERA (w/ LP images) dataset looks as follows:

    {
        "asset_id": 13861,
        "kw": "仙台 ホテル",
        "lp_meta_description": "仙台のホテルや旅館をお探しなら楽天トラベルへ!楽天ポイントが使えて、貯まって、とってもお得な宿泊予約サイトです。さらに割引クーポンも使える!国内ツアー・航空券・レンタカー・バス予約も!",
        "title_org": "仙台市のホテル",
        "title_ne1": "",
        "title_ne2": "",
        "title_ne3": "",
        "domain": "",
        "parsed_full_text_annotation": {
            "text": [
                "trivago",
                "Oops...AccessDenied 可",
                "Youarenotallowedtoviewthispage!Ifyouthinkthisisanerror,pleasecontacttrivago.",
                "Errorcode:0.3c99e86e.1672026945.25ba640YourIP:240d:1a:4d8:2800:b9b0:ea86:2087:d141AffectedURL:https://www.trivago.jp/ja/odr/%E8%BB%92",
                "%E4%BB%99%E5%8F%B0-%E5%9B%BD%E5%86%85?search=20072325",
                "Backtotrivago"
            ],
            "xmax": [653, 838, 765, 773, 815, 649],
            "xmin": [547, 357, 433, 420, 378, 550],
            "ymax": [47, 390, 475, 558, 598, 663],
            "ymin": [18, 198, 439, 504, 566, 651]
        },
        "lp_image": <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1200x680 at 0x7F8513446B20>
    }

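The parsed_full_text_annotation field in the example stores OCR output as parallel lists. The sketch below shows one way to turn those lists into one record per text region. It assumes that entries at the same index of text, xmin, ymin, xmax, and ymax describe the same region and that the coordinates are pixel positions in the landing page image; the card does not state either point. The helper name to_boxes is hypothetical, and the literal annotation is a shortened copy (first and last entries) of the example record, so no download is needed.

```python
from typing import Any, Dict, List

# Shortened copy of the example annotation: first and last entries only.
annotation = {
    "text": ["trivago", "Backtotrivago"],
    "xmin": [547, 550],
    "xmax": [653, 649],
    "ymin": [18, 651],
    "ymax": [47, 663],
}

BOX_KEYS = ("text", "xmin", "ymin", "xmax", "ymax")


def to_boxes(ann: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert parallel lists into a list of per-region dictionaries.

    Assumption: index i of every list refers to the same text region.
    """
    lengths = {len(ann[key]) for key in BOX_KEYS}
    if len(lengths) != 1:
        raise ValueError("annotation lists differ in length")
    columns = [ann[key] for key in BOX_KEYS]
    return [dict(zip(BOX_KEYS, row)) for row in zip(*columns)]


boxes = to_boxes(annotation)
for box in boxes:
    width = box["xmax"] - box["xmin"]
    height = box["ymax"] - box["ymin"]
    print(box["text"], width, height)
```

With a record loaded through load_dataset, the same helper could be applied to example["parsed_full_text_annotation"], provided the field arrives as a dictionary of lists as in the example.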
From the official paper:
Split | # of data | # of reference ad text | industry domain label |
---|---|---|---|
Train | 12,395 | 1 | - |
Valid | 3,098 | 1 | - |
Test | 869 | 4 | ✔ |
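
The table lists one reference ad text for the train and validation splits and four for the test split. A minimal sketch of gathering the references of a record follows. It assumes that title_org holds the original ad text and that title_ne1 to title_ne3 hold the additional references, which the field names and the empty values in the example record suggest but this card does not state. The helper collect_references and the sample dictionary are hypothetical.

```python
from typing import Dict, List

# Assumed reference fields; the card does not document their roles.
REFERENCE_FIELDS = ("title_org", "title_ne1", "title_ne2", "title_ne3")


def collect_references(example: Dict[str, str]) -> List[str]:
    """Return the non-empty reference ad texts of one record, in field order."""
    return [example[field] for field in REFERENCE_FIELDS if example.get(field)]


# Title fields copied from the example record shown in this card.
sample = {
    "title_org": "仙台市のホテル",
    "title_ne1": "",
    "title_ne2": "",
    "title_ne3": "",
}

references = collect_references(sample)
print(len(references), references)
```

Skipping empty strings lets the same function serve records with one reference and records with several, which is convenient when computing multi-reference metrics on the test split.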
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    @inproceedings{mita-et-al:nlp2023,
        author = "三田 雅人 and 村上 聡一朗 and 張 培楠",
        title = "広告文生成タスクの規定とベンチマーク構築",
        booktitle = "言語処理学会 第 29 回年次大会",
        year = 2023,
    }

Thanks to Masato Mita, Soichiro Murakami, and Peinan Zhang for creating this dataset.