Note: This model has been moved to linhd-postdata/alberti-bert-base-multilingual-cased
ALBERTI is a set of two BERT-based multilingual models for poetry: one for verses and another for stanzas. This model has been further trained on the verses of the PULPO corpus using Flax, and the training scripts are included.
This is part of the Flax/Jax Community Week, organised by HuggingFace, with TPU usage sponsored by Google.
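Since this is a BERT-based model, one natural way to try it is masked-word prediction on a verse. The following is a minimal sketch, not an official usage recipe: it assumes the checkpoint at the new location named in the note above loads through the `transformers` `fill-mask` pipeline with your installed backend, and that the mask token is the standard BERT `[MASK]`. The helper names `mask_word` and `predict_masked` are hypothetical and exist only for this illustration.

```python
# Model ID taken from the relocation note at the top of this card.
MODEL_ID = "linhd-postdata/alberti-bert-base-multilingual-cased"

# Assumption: the tokenizer uses the standard BERT mask token.
MASK_TOKEN = "[MASK]"


def mask_word(verse: str, index: int, mask_token: str = MASK_TOKEN) -> str:
    """Replace the whitespace-separated word at `index` with the mask token.

    Negative indices count from the end of the verse, as with Python lists.
    An out-of-range index raises IndexError.
    """
    words = verse.split()
    words[index] = mask_token
    return " ".join(words)


def predict_masked(verse_with_mask: str, top_k: int = 5):
    """Return (token, score) pairs for a verse containing exactly one mask token.

    Hypothetical helper: it downloads the model on first use, so it needs
    network access plus `transformers` and a backend such as PyTorch or Flax.
    """
    # Imported lazily so that mask_word works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    predictions = fill(verse_with_mask, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

For example, `mask_word("En tanto que de rosa y azucena", -1)` masks the last word of the verse, and passing the result to `predict_masked` would list the model's candidate completions with their scores.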
PULPO, the Prodigious Unannotated Literary Poetry Corpus, is a set of multilingual corpora of verses and stanzas with over 95M words.
The following corpora have been downloaded using the Averell tool, developed by the POSTDATA team:
Also, we obtained the following corpora from these sources:
This project would not have been possible without the infrastructure and resources provided by HuggingFace and Google Cloud. Moreover, we want to thank the POSTDATA Project (ERC-StG-679528) and the Computational Literary Studies Infrastructure (CLS INFRA No. 101004984) of the European Union's Horizon 2020 research and innovation programme for their support and time allowance.