The Massive ABC notation Dataset (MABCD), used to train and evaluate TunesFormer, is collected from two sources: The Session and ABCnotation.com. The Session is a community website focused on Irish traditional music, while ABCnotation.com is a website that provides a standard for folk and traditional music notation in the form of ASCII text files. Both provide a platform for sharing folk and traditional music. The combined dataset consists of 285,449 ABC tunes, with 99% (282,595) of the tunes used as the training set and the remaining 1% (2,854) used as the evaluation set.
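As a quick check on the split sizes quoted above, the following minimal sketch reproduces a 99%/1% partition of 285,449 items. The function name `split_tunes`, the random shuffle, and the seed are assumptions made for illustration. The actual procedure used to split MABCD is not described here and may differ.

```python
import random

def split_tunes(tunes, eval_fraction=0.01, seed=0):
    """Shuffle a list of tunes and split it into (train, eval) lists.

    Illustrative only: a seeded random shuffle is an assumption,
    not necessarily how MABCD was actually partitioned.
    """
    n_eval = int(len(tunes) * eval_fraction)  # rounds down
    shuffled = list(tunes)                    # copy; leave the input untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_eval:], shuffled[:n_eval]

# Placeholder identifiers standing in for the real ABC tunes.
tunes = [f"tune_{i}" for i in range(285_449)]
train_set, eval_set = split_tunes(tunes)
print(len(train_set), len(eval_set))  # 282595 2854
```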
Control codes are symbols added to the ABC notation representation to indicate the desired musical form of the generated melodies. Through an automated process, we add the following control codes to each ABC tune in the dataset to indicate its musical form:
To ensure consistency and standardization among the ABC tunes in the dataset, we first converted them all into MusicXML format and then converted them back into ABC notation. To focus solely on the musical content, we removed any natural language elements (such as titles, composers, and lyrics) and unnecessary information (such as reference numbers and sources).
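The metadata removal described above can be sketched as a simple line filter. This is only an illustration, not the pipeline used to build the dataset. The choice of ABC information fields to drop (`X:` reference number, `T:` title, `C:` composer, `S:` source, `w:`/`W:` lyrics) and the helper name `strip_metadata` are assumptions. The sketch does not perform the MusicXML round trip and does not handle inline fields.

```python
import re

# Assumed set of ABC information fields to drop:
# X (reference number), T (title), C (composer), S (source), w/W (lyrics).
DROPPED_FIELDS = re.compile(r"^\s*[XTCSwW]:")

def strip_metadata(abc_text):
    """Return the tune with the selected information-field lines removed."""
    kept = [line for line in abc_text.splitlines()
            if not DROPPED_FIELDS.match(line)]
    return "\n".join(kept)

# A made-up tune used purely as an example input.
example = """X:1
T:Example Tune
C:Trad.
S:A hypothetical source
M:4/4
L:1/8
K:D
DEFG A2A2|
w:la la la la la la
d2d2 A4|]"""

print(strip_metadata(example))
```

Only the meter, unit note length, key, and music lines remain in the printed result. A line-based filter like this is deliberately conservative, and a real pipeline would need a proper ABC parser to handle edge cases.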
ABC notation can be converted to sheet music or audio using this website or this software.
The dataset is provided solely for research purposes and is not intended for commercial use. While most of the tunes in the dataset are freely shared, some may be protected by copyright. It is the responsibility of the user to determine the copyright status of each tune and obtain any necessary permissions before using the data.
If you are the copyright owner of any tune included in the MABCD and have concerns about its inclusion, please contact us at shangda@mail.ccom.edu.cn to have it removed.
We would like to extend a special thanks to abcnotation.com and thesession.org for their contributions to the development and promotion of ABC notation, as well as their contributions to the field of music information retrieval. Their platforms have provided invaluable resources for the traditional and folk music community, and have made it possible for researchers like us to create and study large datasets like the Massive ABC notation Dataset.