Dataset: bsd_ja_en
Task: Translation
Multilinguality: translation
Size: 10K<n<100K
Language creators: expert-generated
Annotation creators: expert-generated
Source datasets: original
License: cc-by-nc-sa-4.0

This is the Business Scene Dialogue (BSD) dataset, a Japanese-English parallel corpus containing written conversations in various business scenarios.
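The dataset was constructed in 3 steps: selecting business scenes, writing monolingual conversation scenarios according to the selected scenes, and translating the scenarios into the other language.
Half of the monolingual scenarios were written in Japanese and the other half were written in English.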
Languages: English, Japanese.
Each instance contains a conversation identifier, a sentence number that indicates its position within the conversation, speaker name in English and Japanese, text in English and Japanese, original language, scene of the scenario (tag), and title of the scenario (title).
{
  "id": "190315_E004_13",
  "no": 14,
  "speaker": "Mr. Sam Lee",
  "ja_speaker": "サム リーさん",
  "en_sentence": "Would you guys consider a different scheme?",
  "ja_sentence": "別の事業案も考慮されますか?",
  "original_language": "en",
  "tag": "phone call",
  "title": "Phone: Review spec and scheme"
}
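As a rough illustration of how the fields above might be used, the sketch below parses the example instance and orders its two sentences as (source, target) according to `original_language`. The helper name `to_translation_pair` is hypothetical and not part of the dataset or any library; the sketch also assumes that `original_language` only takes the values "en" and "ja", which the card does not state explicitly.

```python
import json

# Example instance copied from the dataset card above.
RAW_INSTANCE = """
{
  "id": "190315_E004_13",
  "no": 14,
  "speaker": "Mr. Sam Lee",
  "ja_speaker": "サム リーさん",
  "en_sentence": "Would you guys consider a different scheme?",
  "ja_sentence": "別の事業案も考慮されますか?",
  "original_language": "en",
  "tag": "phone call",
  "title": "Phone: Review spec and scheme"
}
"""


def to_translation_pair(instance):
    """Return (source_text, target_text), with the originally written text first.

    Hypothetical helper: assumes original_language is either "en" or "ja".
    """
    texts = {"en": instance["en_sentence"], "ja": instance["ja_sentence"]}
    source_lang = instance["original_language"]
    if source_lang not in texts:
        raise ValueError(f"unexpected original_language: {source_lang!r}")
    target_lang = "ja" if source_lang == "en" else "en"
    return texts[source_lang], texts[target_lang]


instance = json.loads(RAW_INSTANCE)
source, target = to_translation_pair(instance)
print(f"[{instance['tag']}] {instance['speaker']}: {source} -> {target}")
```

Ordering the pair by `original_language` keeps the human-written text on the source side, which may matter when evaluating translation direction separately.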
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
The dataset is provided for research purposes only. Please check the dataset license for additional information.
This dataset was released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license.
@inproceedings{rikters-etal-2019-designing,
    title = "Designing the Business Conversation Corpus",
    author = "Rikters, Mat{\=\i}ss and Ri, Ryokan and Li, Tong and Nakazawa, Toshiaki",
    booktitle = "Proceedings of the 6th Workshop on Asian Translation",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-5204",
    doi = "10.18653/v1/D19-5204",
    pages = "54--61"
}
Thanks to @j-chim for adding this dataset.