数据集:

LanceaKing/asvspoof2019

语言:

en

计算机处理:

monolingual

大小:

100K<n<1M

语言创建人:

other

批注创建人:

other

源数据集:

extended|vctk

预印本库:

arxiv:1911.01601

许可:

odc-by
中文

Dataset Card for asvspoof2019

Dataset Summary

This is a database used for the Third Automatic Speaker Verification Spoofing and Countermeasuers Challenge, for short, ASVspoof 2019 ( http://www.asvspoof.org ) organized by Junichi Yamagishi, Massimiliano Todisco, Md Sahidullah, Héctor Delgado, Xin Wang, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Ville Vestman, and Andreas Nautsch in 2019.

Supported Tasks and Leaderboards

[Needs More Information]

Languages

English

Dataset Structure

Data Instances

{'speaker_id': 'LA_0091',
 'audio_file_name': 'LA_T_8529430',
 'audio': {'path': 'D:/Users/80304531/.cache/huggingface/datasets/downloads/extracted/8cabb6d5c283b0ed94b2219a8d459fea8e972ce098ef14d8e5a97b181f850502/LA/ASVspoof2019_LA_train/flac/LA_T_8529430.flac',
  'array': array([-0.00201416, -0.00234985, -0.0022583 , ...,  0.01309204,
          0.01339722,  0.01461792], dtype=float32),
  'sampling_rate': 16000},
 'system_id': 'A01',
 'key': 1}

Data Fields

Logical access (LA):

  • speaker_id : LA_**** , a 4-digit speaker ID
  • audio_file_name : name of the audio file
  • audio : A dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. Note that when accessing the audio column: dataset[0]["audio"] the audio file is automatically decoded and resampled to dataset.features["audio"].sampling_rate . Decoding and resampling of a large number of audio files might take a significant amount of time. Thus it is important to first query the sample index before the "audio" column, i.e. dataset[0]["audio"] should always be preferred over dataset["audio"][0] .
  • system_id : ID of the speech spoofing system (A01 - A19), or, for bonafide speech SYSTEM-ID is left blank ('-')
  • key : 'bonafide' for genuine speech, or, 'spoof' for spoofing speech

Physical access (PA):

  • speaker_id : PA_**** , a 4-digit speaker ID

  • audio_file_name : name of the audio file

  • audio : A dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. Note that when accessing the audio column: dataset[0]["audio"] the audio file is automatically decoded and resampled to dataset.features["audio"].sampling_rate . Decoding and resampling of a large number of audio files might take a significant amount of time. Thus it is important to first query the sample index before the "audio" column, i.e. dataset[0]["audio"] should always be preferred over dataset["audio"][0] .

  • environment_id : a triplet (S,R,D_s), which take one letter in the set {a,b,c} as categorical value, defined as

    a b c
    S: Room size (square meters) 2-5 5-10 10-20
    R: T60 (ms) 50-200 200-600 600-1000
    D_s: Talker-to-ASV distance (cm) 10-50 50-100 100-150
  • attack_id : a duple (D_a,Q), which take one letter in the set {A,B,C} as categorical value, defined as

    A B C
    Z: Attacker-to-talker distance (cm) 10-50 50-100 > 100
    Q: Replay device quality perfect high low

    for bonafide speech, attack_id is left blank ('-')

  • key : 'bonafide' for genuine speech, or, 'spoof' for spoofing speech

Data Splits

Training set Development set Evaluation set
Bonafide 2580 2548 7355
Spoof 22800 22296 63882
Total 25380 24844 71237

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

This ASVspoof 2019 dataset is made available under the Open Data Commons Attribution License: http://opendatacommons.org/licenses/by/1.0/

Citation Information

@InProceedings{Todisco2019,
  Title     = {{ASV}spoof 2019: {F}uture {H}orizons in {S}poofed and {F}ake {A}udio {D}etection},
  Author    = {Todisco, Massimiliano and
               Wang, Xin and
               Sahidullah, Md and
               Delgado, H ́ector and
               Nautsch, Andreas and
               Yamagishi, Junichi and
               Evans, Nicholas and
               Kinnunen, Tomi and
               Lee, Kong Aik},
  booktitle = {Proc. of Interspeech 2019},
  Year      = {2019}
}