Dataset:
embedding-data/SPECTER
A dataset of triplets, each made up of three sentences: an anchor, a positive, and a negative. The sentences are titles of papers.
Disclaimer: The team releasing SPECTER did not upload the dataset to the Hub and did not write a dataset card. These steps were done by the Hugging Face team.
Each example is a dictionary with a single key, "set", whose value is a list of three sentences (anchor, positive, and negative):
{"set": [anchor, positive, negative]} {"set": [anchor, positive, negative]} ... {"set": [anchor, positive, negative]}
This dataset is useful for training Sentence Transformers models. Refer to the following post on how to train models using triplets.
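To illustrate why the anchor/positive/negative layout suits this kind of training, here is a rough sketch of a standard triplet margin loss in plain Python. It is not the Sentence Transformers implementation: the function names, the margin value, and the 2-D vectors standing in for sentence embeddings are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value, chosen only for this example
) -> float:
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


# Toy 2-D "embeddings"; a real model would produce these from the sentences.
anchor = [0.0, 0.0]
near = [0.0, 1.0]   # distance 1 from the anchor
far = [3.0, 4.0]    # distance 5 from the anchor

# Positive closer than negative by more than the margin: no penalty.
print(triplet_loss(anchor, positive=near, negative=far))
# Positive farther than negative: the loss is positive.
print(triplet_loss(anchor, positive=far, negative=near))
```

The loss is zero once the positive sits closer to the anchor than the negative by at least the margin, which is the behavior a model trained on these triplets is pushed toward.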
Install the 🤗 Datasets library with pip install datasets, then load the dataset from the Hub with:
from datasets import load_dataset

dataset = load_dataset("embedding-data/SPECTER")
The dataset is loaded as a DatasetDict and has the format:
DatasetDict({
    train: Dataset({
        features: ['set'],
        num_rows: 684100
    })
})
View the example at index i with:
dataset["train"][i]["set"]