模型:
cardiffnlp/tweet-topic-21-multi
此模型基于 TimeLMs 语言模型,该模型是根据2018年1月至2021年12月的大约124M条推文进行训练的(参见 here ),并在11267个语料库上进行了多标签主题分类的微调。此模型适用于英语。
标签:
0: arts_&_culture | 5: fashion_&_style | 10: learning_&_educational | 15: science_&_technology |
---|---|---|---|
1: business_&_entrepreneurs | 6: film_tv_&_video | 11: music | 16: sports |
2: celebrity_&_pop_culture | 7: fitness_&_health | 12: news_&_social_concern | 17: travel_&_adventure |
3: diaries_&_daily_life | 8: food_&_dining | 13: other_hobbies | 18: youth_&_student_life |
4: family | 9: gaming | 14: relationships |
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification from transformers import AutoTokenizer import numpy as np from scipy.special import expit MODEL = f"cardiffnlp/tweet-topic-21-multi" tokenizer = AutoTokenizer.from_pretrained(MODEL) # PT model = AutoModelForSequenceClassification.from_pretrained(MODEL) class_mapping = model.config.id2label text = "It is great to see athletes promoting awareness for climate change." tokens = tokenizer(text, return_tensors='pt') output = model(**tokens) scores = output[0][0].detach().numpy() scores = expit(scores) predictions = (scores >= 0.5) * 1 # TF #tf_model = TFAutoModelForSequenceClassification.from_pretrained(MODEL) #class_mapping = tf_model.config.id2label #text = "It is great to see athletes promoting awareness for climate change." #tokens = tokenizer(text, return_tensors='tf') #output = tf_model(**tokens) #scores = output[0][0] #scores = expit(scores) #predictions = (scores >= 0.5) * 1 # Map to classes for i in range(len(predictions)): if predictions[i]: print(class_mapping[i])
输出:
news_&_social_concern sports
如果您使用此模型,请引用 reference paper 。
@inproceedings{antypas-etal-2022-twitter, title = "{T}witter Topic Classification", author = "Antypas, Dimosthenis and Ushio, Asahi and Camacho-Collados, Jose and Silva, Vitor and Neves, Leonardo and Barbieri, Francesco", booktitle = "Proceedings of the 29th International Conference on Computational Linguistics", month = oct, year = "2022", address = "Gyeongju, Republic of Korea", publisher = "International Committee on Computational Linguistics", url = "https://aclanthology.org/2022.coling-1.299", pages = "3386--3400" }