Model:
zhayunduo/roberta-base-stocktwits-finetuned
This model is roberta-base fine-tuned on 3,200,000 comments from StockTwits, each carrying the user-labeled tag 'Bullish' or 'Bearish'.
In the Inference API, try something an individual investor might say on an investment forum; for example, try 'red' and 'green'.
Epoch | Train loss | Validation loss | Validation accuracy |
---|---|---|---|
epoch1 | 0.3495 | 0.2956 | 0.8679 |
epoch2 | 0.2717 | 0.2235 | 0.9021 |
epoch3 | 0.2360 | 0.1875 | 0.9210 |
epoch4 | 0.2106 | 0.1603 | 0.9343 |
    import re

    import emoji
    import pandas as pd
    from transformers import RobertaForSequenceClassification, RobertaTokenizer
    from transformers import pipeline


    # The model was trained on text preprocessed as below.
    def process_text(texts):
        # remove URLs
        texts = re.sub(r'https?://\S+', "", texts)
        texts = re.sub(r'www.\S+', "", texts)
        # replace the HTML entity &#39; with a plain apostrophe
        texts = texts.replace('&#39;', "'")
        # replace hashtags and cashtags with hashtag_/cashtag_ tokens
        texts = re.sub(r'(\#)(\S+)', r'hashtag_\2', texts)
        texts = re.sub(r'(\$)([A-Za-z]+)', r'cashtag_\2', texts)
        # replace @usernames with mention_ tokens
        texts = re.sub(r'(\@)(\S+)', r'mention_\2', texts)
        # convert emoji to their text names
        texts = emoji.demojize(texts, delimiters=("", " "))
        return texts.strip()


    tokenizer_loaded = RobertaTokenizer.from_pretrained('zhayunduo/roberta-base-stocktwits-finetuned')
    model_loaded = RobertaForSequenceClassification.from_pretrained('zhayunduo/roberta-base-stocktwits-finetuned')

    nlp = pipeline("text-classification", model=model_loaded, tokenizer=tokenizer_loaded)

    sentences = pd.Series(['just buy',
                           'just sell it',
                           'entity rocket to the sky!',
                           'go down',
                           'even though it is going up, I still think it will not keep this trend in the near future'])

    # If the input text contains URLs or the @, # or $ symbols,
    # apply the preprocessing first to get more accurate results:
    # sentences = list(sentences.apply(process_text))
    sentences = list(sentences)

    results = nlp(sentences)
    print(results)  # 2 labels: label 0 is bearish, label 1 is bullish
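The final comment in the usage code says that label 0 is bearish and label 1 is bullish. As a rough sketch of how that mapping might be applied to the pipeline output, the helper below turns each result into a readable tag and a signed score in [-1, 1]. The names `LABEL_NAMES` and `to_signed_scores` are hypothetical, the example results are made up for illustration, and the label strings `LABEL_0` / `LABEL_1` are an assumption (they are the Transformers defaults when a model config defines no custom names), so it is worth checking `model_loaded.config.id2label` first.

```python
# Assumption: the pipeline emits the default label strings "LABEL_0" / "LABEL_1".
# If the model config defines other names, adjust this mapping accordingly.
LABEL_NAMES = {"LABEL_0": "Bearish", "LABEL_1": "Bullish"}


def to_signed_scores(sentences, results):
    """Pair each sentence with a readable tag and a score in [-1, 1]."""
    rows = []
    for text, res in zip(sentences, results):
        # Labels missing from the mapping are passed through unchanged.
        name = LABEL_NAMES.get(res["label"], res["label"])
        if name == "Bullish":
            sign = 1.0
        elif name == "Bearish":
            sign = -1.0
        else:
            sign = 0.0  # unrecognized label: treat as neutral
        rows.append({"text": text,
                     "sentiment": name,
                     "signed_score": sign * res["score"]})
    return rows


# Hypothetical pipeline output, for illustration only (not real model scores).
example_sentences = ["just buy", "just sell it"]
example_results = [{"label": "LABEL_1", "score": 0.9},
                   {"label": "LABEL_0", "score": 0.8}]

for row in to_signed_scores(example_sentences, example_results):
    print(row)
```

With the real model, `to_signed_scores(sentences, results)` could be called directly on the `sentences` and `results` variables from the usage code, which makes it easier to sort or average sentiment across many comments.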