Model:
pysentimiento/robertuito-pos
Repository: https://github.com/pysentimiento/pysentimiento/
Model trained with the Spanish/English split of the LinCE POS corpus, a code-switched benchmark. The base model is RoBERTuito, a RoBERTa model trained on Spanish tweets.
If you want to use this model, we suggest using it directly from the pysentimiento library, as it does not work properly with the pipeline due to tokenization issues.
```python
from pysentimiento import create_analyzer

# Build the POS tagger for Spanish
pos_analyzer = create_analyzer("pos", lang="es")

pos_analyzer.predict("Quiero que esto funcione correctamente! @perezjotaeme")
# >[{'type': 'PROPN', 'text': 'Quiero', 'start': 0, 'end': 6},
# > {'type': 'SCONJ', 'text': 'que', 'start': 7, 'end': 10},
# > {'type': 'PRON', 'text': 'esto', 'start': 11, 'end': 15},
# > {'type': 'VERB', 'text': 'funcione', 'start': 16, 'end': 24},
# > {'type': 'ADV', 'text': 'correctamente', 'start': 25, 'end': 38},
# > {'type': 'PUNCT', 'text': '!', 'start': 38, 'end': 39},
# > {'type': 'NOUN', 'text': '@perezjotaeme', 'start': 40, 'end': 53}]
```
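The prediction shown above is a list of dictionaries with `type`, `text`, `start`, and `end` keys. As a minimal sketch of how such output could be post-processed, the snippet below groups tokens by tag and checks that the character offsets line up with the input sentence. It assumes the result has exactly the shape printed above; `group_by_tag` and `check_offsets` are hypothetical helper names, not part of pysentimiento, and the sample data is copied from the example rather than produced by running the model.

```python
from collections import defaultdict

# Sample output copied from the example above (assumption: predict()
# returns a list of dicts with exactly these keys).
sentence = "Quiero que esto funcione correctamente! @perezjotaeme"
tagged = [
    {"type": "PROPN", "text": "Quiero", "start": 0, "end": 6},
    {"type": "SCONJ", "text": "que", "start": 7, "end": 10},
    {"type": "PRON", "text": "esto", "start": 11, "end": 15},
    {"type": "VERB", "text": "funcione", "start": 16, "end": 24},
    {"type": "ADV", "text": "correctamente", "start": 25, "end": 38},
    {"type": "PUNCT", "text": "!", "start": 38, "end": 39},
    {"type": "NOUN", "text": "@perezjotaeme", "start": 40, "end": 53},
]


def group_by_tag(tokens):
    """Map each POS tag to the list of token texts carrying it (hypothetical helper)."""
    groups = defaultdict(list)
    for tok in tokens:
        groups[tok["type"]].append(tok["text"])
    return dict(groups)


def check_offsets(text, tokens):
    """Return True if every start/end span slices back to the token text (hypothetical helper)."""
    return all(text[t["start"]:t["end"]] == t["text"] for t in tokens)


groups = group_by_tag(tagged)
print(groups["VERB"])
print(check_offsets(sentence, tagged))
```

Checking the offsets against the original string is a cheap way to confirm that the spans refer to the raw input rather than to a preprocessed version of it.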
Results are taken from the LinCE leaderboard.
| Model | Sentiment | NER | POS |
|---|---|---|---|
| RoBERTuito | 60.6 | 68.5 | 97.2 |
| XLM Large | -- | 69.5 | 97.2 |
| XLM Base | -- | 64.9 | 97.0 |
| C2S mBERT | 59.1 | 64.6 | 96.9 |
| mBERT | 56.4 | 64.0 | 97.1 |
| BERT | 58.4 | 61.1 | 96.9 |
| BETO | 56.5 | -- | -- |
If you use this model in your research, please cite the pysentimiento, RoBERTuito, and LinCE papers:
```bibtex
@misc{perez2021pysentimiento,
  title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
  author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
  year={2021},
  eprint={2106.09462},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@inproceedings{ortega2019overview,
  title={Overview of the task on irony detection in Spanish variants},
  author={Ortega-Bueno, Reynier and Rangel, Francisco and Hern{\'a}ndez Far{\i}as, D and Rosso, Paolo and Montes-y-G{\'o}mez, Manuel and Medina Pagola, Jos{\'e} E},
  booktitle={Proceedings of the Iberian languages evaluation forum (IberLEF 2019), co-located with 34th conference of the Spanish Society for natural language processing (SEPLN 2019). CEUR-WS. org},
  volume={2421},
  pages={229--256},
  year={2019}
}

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}
```