Dataset:
embedding-data/Amazon-QA
This dataset contains Question and Answer data from Amazon.
Disclaimer: The team releasing Amazon-QA did not upload the dataset to the Hub and did not write a dataset card. These steps were done by the Hugging Face team.
Each example in the dataset contains pairs of query and answer sentences and is formatted as a dictionary:
{"query": [sentence_1], "pos": [sentence_2]}
{"query": [sentence_1], "pos": [sentence_2]}
...
{"query": [sentence_1], "pos": [sentence_2]}
This dataset is useful for training Sentence Transformers models. Refer to the following post on how to train models using similar sentences.
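As a rough illustration of how records in this format could be prepared for pair-based training, the sketch below flattens one record into (query, answer) tuples. The helper name to_pairs and the toy record are hypothetical and not part of the dataset or of any library; the helper also assumes that a field may hold either a single string or a list of strings, since the card only shows the bracketed placeholder form.

```python
def to_pairs(record):
    """Flatten one record into (query, positive answer) tuples.

    Assumption: each field is either a single string or a list of strings.
    """
    def as_list(value):
        # Wrap a bare string so both shapes can be handled the same way.
        return [value] if isinstance(value, str) else list(value)

    return [
        (query, answer)
        for query in as_list(record["query"])
        for answer in as_list(record["pos"])
    ]


# Hypothetical record that follows the documented dictionary layout.
toy_record = {
    "query": ["Will this case fit a Kindle?"],
    "pos": ["Yes, it fits the Kindle well."],
}

pairs = to_pairs(toy_record)
print(pairs)
```

The resulting tuples can then be passed to whichever pair-based training setup is in use; the exact input type depends on the training library and version.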
Install the 🤗 Datasets library with pip install datasets and load the dataset from the Hub with:
from datasets import load_dataset
dataset = load_dataset("embedding-data/Amazon-QA")
The dataset is loaded as a DatasetDict and has the format:
DatasetDict({
    train: Dataset({
        features: ['query', 'pos'],
        num_rows: 1095290
    })
})
Review the example at index i (here i = 0) with:
dataset["train"][0]
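To look at more than one example at a time, a small helper along these lines may be convenient. It is only a sketch: preview, its parameters n and width, and toy_split are hypothetical names, and the toy list stands in for dataset["train"] so that the snippet runs without downloading anything. It assumes that iterating over the split yields one dictionary per example.

```python
from itertools import islice


def preview(split, n=3, width=60):
    """Return the first n records with each sentence cut to `width` characters."""
    def clip(value):
        # A field may be a bare string or a list of strings (assumption).
        if isinstance(value, str):
            return value[:width]
        return [str(item)[:width] for item in value]

    return [
        {key: clip(value) for key, value in record.items()}
        for record in islice(split, n)
    ]


# Hypothetical stand-in for dataset["train"].
toy_split = [
    {"query": ["Does it fit a 15 inch laptop?"], "pos": ["Yes, with room to spare."]},
    {"query": ["Is the cable included?"], "pos": ["No, it is sold separately."]},
]

for record in preview(toy_split, n=2, width=20):
    print(record)
```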