Dataset:
Abirate/french_book_reviews
Most review datasets are in English. Datasets exist in other languages, but not many. Through this work, I would like to enrich the datasets available in French (my mother tongue, along with Arabic). The data was retrieved from two French websites: Babelio and Critiques Libres. Like Wikipedia, these two sites are made possible by the contributions of volunteers who use the Internet to share their knowledge and reading experiences. French Book Reviews is a dataset of a large number of reader reviews of French books that will be updated regularly over time.
The texts in the dataset are in French (fr).
A JSON-formatted example of a typical instance in the dataset:
{
  "book_title": "La belle histoire des maths",
  "author": "Michel Rousselet",
  "reader_review": "C’est un livre impressionnant, qui inspire le respect par la qualité de sa reliure et son contenu. Je le feuillette et je découvre à chaque tour de page un thème distinct magnifiquement illustré. Très beau livre !",
  "rating": 4.0,
  "label": 1
}
Data Fields
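As a minimal sketch, the example instance can be parsed and its fields type-checked with the standard library. The expected types below are inferred from that single example and are an assumption, not a documented schema; `check_instance` is a hypothetical helper name.

```python
import json

# The example instance from this card, as a JSON string.
raw = """{
  "book_title": "La belle histoire des maths",
  "author": "Michel Rousselet",
  "reader_review": "C’est un livre impressionnant, qui inspire le respect par la qualité de sa reliure et son contenu. Je le feuillette et je découvre à chaque tour de page un thème distinct magnifiquement illustré. Très beau livre !",
  "rating": 4.0,
  "label": 1
}"""

# Assumption: types inferred from the one example, not from an official schema.
EXPECTED_TYPES = {
    "book_title": str,
    "author": str,
    "reader_review": str,
    "rating": float,
    "label": int,
}


def check_instance(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems


instance = json.loads(raw)
print(check_instance(instance))
```

A check like this may be useful before training, since scraped records can have missing or oddly typed values.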
I kept the dataset as a single block (train), so users can shuffle and split it later with the Hugging Face datasets library, for example with the .train_test_split() method.
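The split described above might look like the sketch below. `hub_split` uses `load_dataset` and `Dataset.train_test_split` from the datasets library and needs the package plus network access, so its import is kept local and it is not called here. `shuffle_split` is a hypothetical plain-Python illustration of the same shuffle-then-cut idea, run on placeholder records rather than real reviews. The 80/20 ratio and the seed are arbitrary choices.

```python
import random


def hub_split(test_size=0.2, seed=42):
    """Load the single 'train' block from the Hub and split it.

    Requires the `datasets` package and network access.
    """
    from datasets import load_dataset

    reviews = load_dataset("Abirate/french_book_reviews", split="train")
    # Returns a DatasetDict with "train" and "test" keys.
    return reviews.train_test_split(test_size=test_size, seed=seed)


def shuffle_split(records, test_size=0.2, seed=42):
    """Plain-Python illustration: shuffle a copy, then cut it in two."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Offline demo on ten placeholder records (not real reviews).
toy = [{"reader_review": f"avis {i}", "label": 1} for i in range(10)]
train, test = shuffle_split(toy)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, which matters once the dataset is updated and models need to be compared across versions.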
Most review datasets are in English. Datasets exist in other languages, but not many. Through this work, I would like to enrich the datasets available in French (my mother tongue, along with Arabic) and make a small contribution to advancing data science and AI, not only for English NLP tasks but also for other languages around the world.
French is an international language and it is gaining ground. In addition, it is the language of love. The richness of the French language, so appreciated around the world, is largely related to the richness of its culture. The most telling example is French literature, which has many world-famous writers, such as Gustave Flaubert, Albert Camus, Victor Hugo, Molière, Simone de Beauvoir, and Antoine de Saint-Exupéry, the author of "Le Petit Prince" (The Little Prince), which is still among the most translated books in literary history. One of the world-famous quotes from this book is: "Voici mon secret. Il est très simple: on ne voit bien qu'avec le coeur. L'essentiel est invisible pour les yeux." ("Here is my secret. It is very simple: one sees clearly only with the heart. What is essential is invisible to the eye.")
Source Data
The data comes from two French websites: Babelio and Critiques Libres.
Initial Data Collection and Normalization
The data was collected by web scraping (with the Scrapy framework) and then processed further. For more details, see the notebook of the dataset creation, which documents the process.
Note: This dataset will be updated regularly to include the most recent reviews of French books. Each update aggregates the earlier data with the newly collected reviews, so the dataset grows over time.
Who are the source data producers?
I created the dataset by web scraping, building a spider and a crawler to scrape the two French websites Babelio and Critiques Libres.
Annotations
Annotations are part of the initial data collection (see the script above).
Abir ELTAIEF
Licensing Information
This work is licensed under CC0: Public Domain.
Contributions
Thanks to @Abirate for creating and adding this dataset.