Dataset:

sander-wood/massive_abcnotation_dataset

Language:

en

Size:

100K<n<1M

Other:

music

License:

mit

Dataset Summary

The Massive ABC notation Dataset (MABCD), used to train and evaluate TunesFormer, is collected from two sources: The Session and ABCnotation.com. The Session is a community website focused on Irish traditional music, while ABCnotation.com is a website that provides a standard for folk and traditional music notation in the form of ASCII text files. Both provide a platform for sharing folk and traditional music. The combined dataset consists of 285,449 ABC tunes, with 99% (282,595) of the tunes used as the training set and the remaining 1% (2,854) used as the evaluation set.
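The split sizes quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `split_sizes` and `split_tunes` are hypothetical, and the shuffle-then-slice procedure is an assumption, since the card does not say how the evaluation tunes were selected.

```python
import random

TOTAL_TUNES = 285_449  # total size stated in the dataset summary


def split_sizes(total, eval_fraction=0.01):
    """Return (train_size, eval_size) for a given evaluation fraction."""
    eval_size = round(total * eval_fraction)
    return total - eval_size, eval_size


def split_tunes(tunes, eval_fraction=0.01, seed=0):
    """Shuffle and split a list of tunes.

    Assumption: a seeded random split. The actual selection
    procedure used for MABCD is not documented here.
    """
    shuffled = list(tunes)
    random.Random(seed).shuffle(shuffled)
    train_size, _ = split_sizes(len(shuffled), eval_fraction)
    return shuffled[:train_size], shuffled[train_size:]


train_size, eval_size = split_sizes(TOTAL_TUNES)
print(train_size, eval_size)  # 282595 2854
```

Rounding 1% of 285,449 gives 2,854 evaluation tunes, which leaves 282,595 for training, matching the figures above.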

Control codes are symbols that are added to the ABC notation representation to indicate the desired musical form of the generated melodies. We add the following control codes to each ABC tune in the dataset through an automated process to indicate its musical form:

  • Number of Bars (NB): controls the number of bars in a section of the melody. For example, users could specify that they want a section to contain 8 bars, and TunesFormer would generate a section that fits within that structure. It is determined by counting the bar symbol |.
  • Number of Sections (NS): controls the number of sections in the entire melody. This can be used to create a sense of structure and coherence within the melody, as different sections can be used to create musical themes or motifs. It is determined by counting several symbols that are commonly used in ABC notation to represent section boundaries: [|, ||, |], |:, ::, and :|.
  • Edit Distance Similarity (EDS): controls the similarity level between the current section and a previous section in the melody.
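As a rough illustration of how the three control codes could be derived from a tune body, the following sketch splits a melody at the section-boundary symbols listed above, counts bars within each section, and scores the similarity of two sections. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, bars are counted as non-empty segments between | symbols, and the similarity is assumed to be a normalized Levenshtein distance, which the card does not specify.

```python
import re

# Section-boundary symbols named in the list above: [| || |] |: :: :|
SECTION_BOUNDARY = re.compile(r"\[\||\|\||\|\]|\|:|::|:\|")


def split_sections(body):
    """Split a tune body at section boundaries, dropping empty pieces."""
    parts = SECTION_BOUNDARY.split(body)
    return [part.strip() for part in parts if part.strip()]


def count_bars(section):
    """Count bars as non-empty segments separated by '|' (assumption)."""
    return sum(1 for bar in section.split("|") if bar.strip())


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; an assumed definition of EDS."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Hypothetical two-section melody used only for illustration.
body = "|:GABc dedB|dedB dedB:|c2ec B2dB|A2F2 G4|]"
sections = split_sections(body)
print(len(sections))                      # 2 sections
print([count_bars(s) for s in sections])  # [2, 2]
print(similarity(sections[1], sections[0]))
```

Real ABC bodies contain constructs this sketch ignores, such as numbered endings and inline fields, so the counts it produces may differ from the control codes stored in the dataset.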

To ensure consistency and standardization among the ABC tunes in the dataset, we first converted them all into MusicXML format and then converted them back into ABC notation. In order to focus solely on the musical content, we removed any natural language elements (such as titles, composers, and lyrics) and unnecessary information (such as reference numbers and sources).
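The metadata removal described above can be approximated by dropping the corresponding ABC information fields line by line. This is only a sketch: the function name `strip_metadata` is hypothetical, the field selection (X: reference number, T: title, C: composer, S: source, w: and W: lyrics) is an assumed mapping of the elements named above onto standard ABC fields, and the MusicXML round trip is not reproduced.

```python
import re

# ABC information fields assumed to match the removed elements:
# X: reference number, T: title, C: composer, S: source, w:/W: lyrics.
# The lookahead keeps music lines such as "C:|" (note C followed by
# a repeat sign) from being mistaken for a composer field.
REMOVED_FIELDS = re.compile(r"^\s*[XTCSwW]:(?![|:])")


def strip_metadata(abc_text):
    """Return the tune with the selected information fields removed."""
    kept = [line for line in abc_text.splitlines()
            if not REMOVED_FIELDS.match(line)]
    return "\n".join(kept)


# Hypothetical tune used only for illustration.
example = "\n".join([
    "X:1",
    "T:Example Tune",
    "C:Trad.",
    "S:hypothetical source",
    "M:4/4",
    "L:1/8",
    "K:D",
    "DEFG A2A2|B2d2 A4|]",
    "w:la la la la",
])

print(strip_metadata(example))
```

Only the meter, unit note length, key, and music lines remain after this step, which is the kind of purely musical content the dataset is meant to keep.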

ABC notation can be converted to sheet music or audio using this website or this software.

Copyright Disclaimer

The dataset is provided solely for research purposes and is not intended for commercial use. While most of the tunes in the dataset are freely shared, some may be protected by copyright. It is the responsibility of the user to determine the copyright status of each tune and obtain any necessary permissions before using the data.

If you are the copyright owner of any tune included in the MABCD and have concerns about its inclusion, please contact us at shangda@mail.ccom.edu.cn to have it removed.

Special Thanks

We would like to extend a special thanks to abcnotation.com and thesession.org for their contributions to the development and promotion of ABC notation, as well as their contributions to the field of music information retrieval. Their platforms have provided invaluable resources for the traditional and folk music community, and have made it possible for researchers like us to create and study large datasets like the Massive ABC notation Dataset.