Dataset:

a6kme/minds14-mirror

Tasks:

Automatic Speech Recognition

Sub-tasks:

keyword-spotting

Languages:

Multilinguality:

multilingual

Size:

10K<n<100K

Language creators:

crowdsourced expert-generated

Annotation creators:

expert-generated crowdsourced machine-generated

Preprint:

arxiv:2104.08524

License:

cc-by-4.0

MInDS-14

MInDS-14 is a training and evaluation resource for the intent detection task with spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, associated with spoken examples in 14 diverse language varieties.

Example

MInDS-14 can be downloaded and used as follows:

from datasets import load_dataset

minds_14 = load_dataset("PolyAI/minds14", "fr-FR") # for French
# to download all data for multilingual fine-tuning, uncomment the following line
# minds_14 = load_dataset("PolyAI/minds14", "all")

# see structure
print(minds_14)

# load audio sample on the fly
audio_input = minds_14["train"][0]["audio"]  # first decoded audio sample
intent_class = minds_14["train"][0]["intent_class"]  # integer class id of the first sample's intent
intent = minds_14["train"].features["intent_class"].names[intent_class]

# use audio_input and intent_class to fine-tune your model for audio classification

Dataset Structure

We show detailed information for the example configuration fr-FR of the dataset. All other configurations have the same structure.

Data Instances

fr-FR

Size of downloaded dataset files: 471 MB
Size of the generated dataset: 300 KB
Total amount of disk used: 471 MB

An example of a data instance of the config fr-FR looks as follows:

{
    "path": "/home/patrick/.cache/huggingface/datasets/downloads/extracted/3ebe2265b2f102203be5e64fa8e533e0c6742e72268772c8ac1834c5a1a921e3/fr-FR~ADDRESS/response_4.wav",
    "audio": {
        "path": "/home/patrick/.cache/huggingface/datasets/downloads/extracted/3ebe2265b2f102203be5e64fa8e533e0c6742e72268772c8ac1834c5a1a921e3/fr-FR~ADDRESS/response_4.wav",
        "array": array(
            [0.0, 0.0, 0.0, ..., 0.0, 0.00048828, -0.00024414], dtype=float32
        ),
        "sampling_rate": 8000,
    },
    "transcription": "je souhaite changer mon adresse",
    "english_transcription": "I want to change my address",
    "intent_class": 1,
    "lang_id": 6,
}

Data Fields

The data fields are the same among all splits.

path (str): Path to the audio file
audio (dict): Audio object including the loaded audio array, the sampling rate and the path to the audio file
transcription (str): Transcription of the audio file
english_transcription (str): English translation of the transcription
intent_class (int): Class id of intent
lang_id (int): Id of language
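
As a rough illustration of how the fields listed above fit together, the sketch below summarizes a single record using only plain Python. The record is a hand-made stand-in, not real data: the file name is a shortened placeholder, the audio array is 16000 zeros, and `intent_names` is a hypothetical label list. With the real dataset, the label names would presumably come from `features["intent_class"].names`, as in the usage example earlier in this card. The helper `summarize_record` is likewise an illustrative name, not part of any library.

```python
def summarize_record(record, intent_names):
    """Return a small summary of one MInDS-14-style record."""
    audio = record["audio"]
    # Duration in seconds = number of samples / samples per second.
    duration_s = len(audio["array"]) / audio["sampling_rate"]
    return {
        "intent": intent_names[record["intent_class"]],
        "duration_s": duration_s,
        "transcription": record["transcription"],
    }


# Stand-in record: field names follow the list above, values are made up.
record = {
    "path": "response_4.wav",  # placeholder path, not the real cache location
    "audio": {
        "path": "response_4.wav",
        "array": [0.0] * 16000,  # synthetic silence instead of decoded audio
        "sampling_rate": 8000,
    },
    "transcription": "je souhaite changer mon adresse",
    "english_transcription": "I want to change my address",
    "intent_class": 1,
    "lang_id": 6,
}

# Hypothetical label list; the real names are stored in the dataset features.
intent_names = ["intent_0", "intent_1"]

summary = summarize_record(record, intent_names)
print(summary)
```

Because the sampling rate is stored next to the array, the duration can be computed per record without assuming that every file uses the same rate.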

Data Splits

Every config has only the "train" split, containing ca. 600 examples.
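
Since only a "train" split is provided, a held-out set has to be carved out by the user. The sketch below shows one possible way to do this: a stratified split over the integer intent labels, written in plain Python so that it runs without downloading anything. The function name `stratified_split`, the 20% test fraction and the synthetic label list are all assumptions made for illustration. The `datasets` library also offers `Dataset.train_test_split`, which may be the more convenient route when working with the loaded dataset directly.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split example indices into train/test, class by class.

    Returns two sorted lists of indices into `labels`.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one test example per class when the class has 2+ items.
        n_test = max(1, round(len(indices) * test_fraction)) if len(indices) > 1 else 0
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Synthetic labels standing in for the "intent_class" column:
# 600 examples spread over 14 intent ids.
labels = [i % 14 for i in range(600)]
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps every intent represented in the held-out set, which matters with only a few dozen examples per intent.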

Dataset Creation

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

All datasets are licensed under the Creative Commons Attribution license (CC-BY).

Citation Information

@article{DBLP:journals/corr/abs-2104-08524,
  author    = {Daniela Gerz and
               Pei{-}Hao Su and
               Razvan Kusztos and
               Avishek Mondal and
               Michal Lis and
               Eshan Singhal and
               Nikola Mrksic and
               Tsung{-}Hsien Wen and
               Ivan Vulic},
  title     = {Multilingual and Cross-Lingual Intent Detection from Spoken Data},
  journal   = {CoRR},
  volume    = {abs/2104.08524},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.08524},
  eprinttype = {arXiv},
  eprint    = {2104.08524},
  timestamp = {Mon, 26 Apr 2021 17:25:10 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-08524.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contributions

Thanks to @patrickvonplaten for adding this dataset.

Author:

a6kme

Dataset size:

12.57 KB