Dataset: cvssp/WavCaps

Language: en

Preprint: arxiv:2303.17395

License: cc-by-4.0

Size: 100B<n<1T

WavCaps

WavCaps is a ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research. The audio clips are sourced from three websites (FreeSound, BBC Sound Effects, and SoundBible) and a sound event detection dataset (AudioSet Strongly-labelled Subset).

Statistics

Data Source          # audio   avg. audio duration (s)   avg. text length
FreeSound             262300                     85.98               6.77
BBC Sound Effects      31201                    115.04               9.67
SoundBible              1232                     13.12               5.87
AudioSet SL subset    108317                     10.00               9.79
WavCaps               403050                     67.59               7.80
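The WavCaps row is consistent with a clip-count-weighted combination of the four sources. The sketch below recomputes it from the per-source rows; the variable and function names are illustrative, and the weighting scheme is an assumption inferred from the numbers rather than something the dataset card states.

```python
# Per-source rows copied from the statistics table:
# (number of clips, avg. audio duration in seconds, avg. text length)
SOURCES = {
    "FreeSound": (262300, 85.98, 6.77),
    "BBC Sound Effects": (31201, 115.04, 9.67),
    "SoundBible": (1232, 13.12, 5.87),
    "AudioSet SL subset": (108317, 10.00, 9.79),
}


def combined_stats(sources):
    """Return (total clips, weighted avg. duration, weighted avg. text length)."""
    total = sum(n for n, _, _ in sources.values())
    # Weight each per-source average by its number of clips.
    avg_duration = sum(n * d for n, d, _ in sources.values()) / total
    avg_length = sum(n * t for n, _, t in sources.values()) / total
    return total, avg_duration, avg_length


total_clips, avg_duration, avg_text_length = combined_stats(SOURCES)
print(f"{total_clips} clips, {avg_duration:.2f} s, {avg_text_length:.2f}")
```

Because the per-source averages are already rounded to two decimals, the recomputed values can differ from the published row in the last digit.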

Download

We provide a JSON file for each data source. For audio clips sourced from websites, we provide the processed caption, the raw description, and other metadata. For audio clips from AudioSet, we use the version from PANNs, where each file name is prefixed with 'Y'. For start times, please refer to the original metadata of the AudioSet SL subset.
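As a minimal sketch of working with these files, the helpers below load one per-source JSON file and strip the leading 'Y' from a PANNs-style file name to recover the original AudioSet clip ID, which is what you would look up in the AudioSet SL metadata. The JSON layout (a top-level list, or an object with a `data` key), the function names, and the example file name are assumptions for illustration; check the actual files before relying on them.

```python
import json
from pathlib import Path


def load_entries(json_path):
    """Load one per-source metadata file and return its list of entries.

    Assumption: the file is either a JSON list of entries or a JSON object
    holding that list under a "data" key. Adjust to the real layout.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        payload = json.load(f)
    if isinstance(payload, dict):
        return payload.get("data", [])
    return payload


def audioset_id_from_panns_name(file_name):
    """Recover the original AudioSet clip ID from a 'Y'-prefixed file name."""
    stem = Path(file_name).stem  # drop the directory and the extension
    if not stem.startswith("Y"):
        raise ValueError(f"expected a 'Y'-prefixed file name, got {file_name!r}")
    return stem[1:]


# Hypothetical file name, used only to show the prefix handling.
example_id = audioset_id_from_panns_name("Yabc123DEF_-.flac")
print(example_id)
```

Raising on a missing prefix, rather than returning the name unchanged, keeps files from the other sources from being silently treated as AudioSet clips.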

Waveforms in FLAC format can be downloaded from the Zip_files directory.

Pretrained models can be downloaded here.

If you get "error: invalid zip file with overlapped components (possible zip bomb)" when unzipping, please try the following commands:

zip -F AudioSet_SL.zip --out AS.zip

unzip AS.zip

License

The WavCaps dataset may be used for academic purposes only. By downloading audio clips through the links provided in the JSON files, you agree to use the audio for research purposes only. For credits for audio clips from FreeSound, please refer to its own page.

For detailed license information, please refer to: FreeSound, BBC Sound Effects, SoundBible

The models we provide were created under a UK data copyright exemption for non-commercial research.

Code for related tasks

We provide code and pre-trained models for audio-language retrieval, automated audio captioning, and zero-shot audio classification.

Citation

Please cite the following if you make use of the dataset.

@article{mei2023wavcaps,
  title={WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research},
  author={Mei, Xinhao and Meng, Chutong and Liu, Haohe and Kong, Qiuqiang and Ko, Tom and Zhao, Chengqi and Plumbley, Mark D and Zou, Yuexian and Wang, Wenwu},
  journal={arXiv preprint arXiv:2303.17395},
  year={2023}
}