RoBERTa Tagalog Large

This is a Tagalog RoBERTa model trained as an improvement over our previous Tagalog pretrained Transformers. It was trained on TLUnified, a newer, larger, and more topically varied pretraining corpus for Filipino. The model is part of a larger research project, and we open-source it to allow greater use within the Filipino NLP community.

This model is cased, so input text should keep its original capitalization. We do not release uncased RoBERTa models.
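Because the model is cased, input should be passed through without lowercasing. The following is a minimal sketch of masked-token prediction with the Hugging Face `transformers` library. The Hub identifier `jcblaise/roberta-tagalog-large` is an assumption (it is not stated in this card), and `make_prompt` and `fill_mask` are hypothetical helper names used only for illustration.

```python
# Minimal sketch, not the authors' reference usage.
# ASSUMPTION: the Hub identifier below is not given in this card; replace it
# with the identifier of the page you downloaded the model from if it differs.
MODEL_ID = "jcblaise/roberta-tagalog-large"


def make_prompt(template: str, mask_token: str) -> str:
    """Insert the tokenizer's mask token into a template containing {mask}.

    The text is left untouched otherwise: the model is cased, so we do not
    lowercase or otherwise normalize capitalization.
    """
    return template.format(mask=mask_token)


def fill_mask(template: str, model_id: str = MODEL_ID, top_k: int = 5):
    """Return (token, score) pairs for the masked position in `template`."""
    # Imported lazily so the helpers above can be used without transformers.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    # Read the mask token from the tokenizer instead of hard-coding it.
    text = make_prompt(template, unmasker.tokenizer.mask_token)
    predictions = unmasker(text, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]


if __name__ == "__main__":
    # Downloads the model weights on first run.
    for token, score in fill_mask("Magandang {mask} po sa inyong lahat."):
        print(f"{token!r}: {score:.4f}")
```

Reading the mask token from the tokenizer avoids relying on a particular special-token string, which may differ between checkpoints.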

Citations

All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:

@article{cruz2021improving,
  title={Improving Large-scale Language Models and Resources for Filipino},
  author={Jan Christian Blaise Cruz and Charibeth Cheng},
  journal={arXiv preprint arXiv:2111.06053},
  year={2021}
}

Data and Other Resources

Data used to train this model, as well as other benchmark datasets in Filipino, can be found on my website at https://blaisecruz.com

Contact

If you have questions or concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at me@blaisecruz.com

Author:

Jan Christian Blaise Cruz

Dataset size:

2.61 GB