数据集:
Jzuluaga/atco2_corpus_1h
ATCO2 project aims at developing a unique platform allowing to collect, organize and pre-process air-traffic control (voice communication) data from air space. This project has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 864702. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and the Clean Sky 2 JU members other than the Union.
The project collected the real-time voice communication between air-traffic controllers and pilots available either directly through publicly accessible radio frequency channels or indirectly from air-navigation service providers (ANSPs). In addition to the voice communication data, contextual information is available in a form of metadata (i.e. surveillance data). The dataset consists of two distinct packages:
The text and the recordings are in English. For more information see Table 3 and Table 4 of ATCO2 corpus paper
The licensing status of the ATCO2-test-set-1h corpus is in the file ATCO2-ASRdataset-v1_beta - End-User Data Agreement in the data folder. Download the data in ATCO2 project homepage
Contributors who prepared, processed, normalized and uploaded the dataset in HuggingFace:
@article{zuluaga2022how, title={How Does Pre-trained Wav2Vec2. 0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications}, author={Zuluaga-Gomez, Juan and Prasad, Amrutha and Nigmatulina, Iuliia and Sarfjoo, Saeed and others}, journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar}, year={2022} } @article{zuluaga2022bertraffic, title={BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications}, author={Zuluaga-Gomez, Juan and Sarfjoo, Seyyed Saeed and Prasad, Amrutha and others}, journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar}, year={2022} } @article{zuluaga2022atco2, title={ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications}, author={Zuluaga-Gomez, Juan and Vesel{\`y}, Karel and Sz{\"o}ke, Igor and Motlicek, Petr and others}, journal={arXiv preprint arXiv:2211.04054}, year={2022} }