Dataset:
blinoff/kinopoisk
Kinopoisk movie reviews dataset (TOP250 & BOTTOM100 rank lists).
In total it contains 36,591 reviews from July 2004 to November 2012.
The reviews have the following distribution along the 3-point sentiment scale:
Each sample contains the following fields:
import pandas as pd
df = pd.read_json('kinopoisk.jsonl', lines=True)
df.sample(5)
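Once the file is loaded, the distribution along the sentiment scale can be tallied with a small helper. The following is a minimal sketch, not part of the dataset's own tooling: the function name `sentiment_distribution` and the column name `grade3` are assumptions made for illustration, so substitute whichever field holds the 3-point label. The demo frame is a made-up stand-in, not real data.

```python
import pandas as pd


def sentiment_distribution(df: pd.DataFrame, column: str = "grade3") -> pd.DataFrame:
    """Return the count and share of rows per label in `column`.

    `column` defaults to "grade3", which is an assumed field name.
    """
    counts = df[column].value_counts()
    shares = (counts / counts.sum()).round(4)
    return pd.DataFrame({"count": counts, "share": shares})


# Tiny in-memory stand-in for the loaded reviews (illustrative values only).
demo = pd.DataFrame({"grade3": ["Good", "Good", "Bad", "Neutral"]})
print(sentiment_distribution(demo))
```

With the real data, the call would be `sentiment_distribution(df)` on the frame returned by `pd.read_json`, assuming the label column is named as above.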
@article{blinov2013research,
  title={Research of lexical approach and machine learning methods for sentiment analysis},
  author={Blinov, PD and Klekovkina, Maria and Kotelnikov, Eugeny and Pestov, Oleg},
  journal={Computational Linguistics and Intellectual Technologies},
  volume={2},
  number={12},
  pages={48--58},
  year={2013}
}