reddit_finance_43_250k is a collection of 250k post/comment pairs from 43 financial, investing, and crypto subreddits. All posts had to be text posts with a length of 250 characters and a positive score. Each subreddit is narrowed down to the 70th quantile before being merged with its top 3 comments and then with the other subreddits. Further score-based methods are used to select the top 250k post/comment pairs.
The code to recreate the dataset is here: https://github.com/getorca/ProfitsBot_V0_OLLM/tree/main/ds_builder
The trained LoRA model is here: https://huggingface.co/winddude/pb_lora_7b_v0.1
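
For readers who want a feel for the selection steps described above, here is a minimal sketch in plain Python. It is not the ds_builder code linked above. The field names (`id`, `subreddit`, `selftext`, `score`, `post_id`, `body`) and the helper names are hypothetical. The sketch also assumes that the 250-character rule is a minimum length, that the 70th quantile is computed on post score within each subreddit, and that the final "score-based" selection is a simple sort by post score and comment score. The actual pipeline may differ on all of these points.

```python
from collections import defaultdict

MIN_CHARS = 250         # assumption: treated as a minimum post length
QUANTILE = 0.70         # assumption: applied to post score per subreddit
TOP_COMMENTS = 3        # top comments kept per post
TARGET_PAIRS = 250_000  # final number of post/comment pairs


def quantile(values, q):
    """Return the q-quantile of a non-empty list using linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def build_pairs(posts, comments, limit=TARGET_PAIRS):
    """Filter posts, attach top comments, and keep the highest-scoring pairs."""
    # 1. Keep text posts that are long enough and have a positive score.
    eligible = [
        p for p in posts
        if p["selftext"] and len(p["selftext"]) >= MIN_CHARS and p["score"] > 0
    ]

    # 2. Within each subreddit, keep posts at or above the quantile cutoff.
    by_sub = defaultdict(list)
    for p in eligible:
        by_sub[p["subreddit"]].append(p)
    kept = []
    for sub_posts in by_sub.values():
        cutoff = quantile([p["score"] for p in sub_posts], QUANTILE)
        kept.extend(p for p in sub_posts if p["score"] >= cutoff)

    # 3. Pair each kept post with its highest-scoring comments.
    comments_by_post = defaultdict(list)
    for c in comments:
        comments_by_post[c["post_id"]].append(c)
    pairs = []
    for p in kept:
        ranked = sorted(
            comments_by_post.get(p["id"], []),
            key=lambda c: c["score"],
            reverse=True,
        )
        for c in ranked[:TOP_COMMENTS]:
            pairs.append({
                "subreddit": p["subreddit"],
                "post": p["selftext"],
                "comment": c["body"],
                "post_score": p["score"],
                "comment_score": c["score"],
            })

    # 4. Stand-in for the unspecified score-based selection: sort and truncate.
    pairs.sort(key=lambda r: (r["post_score"], r["comment_score"]), reverse=True)
    return pairs[:limit]
```

Calling `build_pairs(posts, comments)` on two lists of dictionaries returns a list of pair dictionaries, merged across subreddits and capped at `limit`.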