Dataset: wiki_dpr
Languages: en
Multilinguality: multilingual
Size: 10M<n<100M
Language creators: crowdsourced
Annotations creators: no-annotation
Source datasets: original
Preprint: arxiv:2004.04906
Other: text-search

This is the Wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model. It contains 21M passages from Wikipedia along with their DPR embeddings. The Wikipedia articles were split into multiple disjoint text blocks of 100 words, which serve as the passages.
The Wikipedia dump is the one from Dec. 20, 2018.
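The splitting into disjoint 100-word blocks can be sketched as follows. This is a minimal illustration, not the DPR preprocessing code: the function name `split_into_passages` is hypothetical, and tokenizing on whitespace is an assumption about how words are counted.

```python
from typing import List


def split_into_passages(text: str, words_per_passage: int = 100) -> List[str]:
    """Split text into consecutive, non-overlapping blocks of at most
    `words_per_passage` words (whitespace tokenization is an assumption)."""
    if words_per_passage <= 0:
        raise ValueError("words_per_passage must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + words_per_passage])
        for start in range(0, len(words), words_per_passage)
    ]


# Toy article of 250 synthetic "words": w0 w1 ... w249
article = " ".join(f"w{i}" for i in range(250))
passages = split_into_passages(article)
print([len(p.split()) for p in passages])  # [100, 100, 50]
```

Because the blocks are disjoint, joining them back together reproduces the original word sequence, and only the last block of an article can be shorter than 100 words.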
Each instance contains a passage of at most 100 words, the title of the Wikipedia page it comes from, and the DPR embedding of the passage (a 768-dimensional vector).
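To show how such records are typically used, the sketch below ranks passages by the inner product between a question vector and each passage embedding, which is the similarity DPR uses. Everything in it is illustrative: the records are toy data with one-hot 768-d vectors, the helper names (`toy_vector`, `inner_product`, `top_k`) are hypothetical, and the question vector is assumed to come from a DPR question encoder that is not shown. A brute-force scan like this would not scale to 21M passages.

```python
from typing import Dict, List, Sequence, Tuple

EMBEDDING_DIM = 768  # dimensionality stated in the dataset description


def toy_vector(hot_index: int, value: float = 1.0) -> List[float]:
    """Build a toy 768-d vector with a single non-zero entry."""
    vec = [0.0] * EMBEDDING_DIM
    vec[hot_index] = value
    return vec


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k(question_vec: Sequence[float], records: List[Dict], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs with the highest inner product."""
    scored = [(r["id"], inner_product(question_vec, r["embeddings"])) for r in records]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy records with the same fields as the dataset: id, text, title, embeddings.
passages = [
    {"id": "1", "title": "Aaron", "text": "toy passage 1", "embeddings": toy_vector(0)},
    {"id": "2", "title": "Toy page B", "text": "toy passage 2", "embeddings": toy_vector(1)},
    {"id": "3", "title": "Toy page C", "text": "toy passage 3", "embeddings": toy_vector(2)},
]

# Hypothetical question embedding (would come from a question encoder).
question = toy_vector(1, 2.0)
question[0] = 0.5

print(top_k(question, passages, k=2))  # [('2', 2.0), ('1', 0.5)]
```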
psgs_w100.multiset.compressed
An example of 'train' looks as follows.
This example was too long and was cropped: {'id': '1', 'text': 'Aaron Aaron ( or ; "Ahärôn") is a prophet, high priest, and the brother of Moses in the Abrahamic religions. Knowledge of Aaron, along with his brother Moses, comes exclusively from religious texts, such as the Bible and Quran. The Hebrew Bible relates that, unlike Moses, who grew up in the Egyptian royal court, Aaron and his elder sister Miriam remained with their kinsmen in the eastern border-land of Egypt (Goshen). When Moses first confronted the Egyptian king about the Israelites, Aaron served as his brother\'s spokesman ("prophet") to the Pharaoh. Part of the Law (Torah) that Moses received from', 'title': 'Aaron', 'embeddings': [-0.07233893871307373, 0.48035329580307007, 0.18650995194911957, -0.5287084579467773, -0.37329429388046265, 0.37622880935668945, 0.25524479150772095, ... -0.336689829826355, 0.6313082575798035, -0.7025573253631592]}

psgs_w100.multiset.exact
An example of 'train' looks as follows.
This example was too long and was cropped: {'id': '1', 'text': 'Aaron Aaron ( or ; "Ahärôn") is a prophet, high priest, and the brother of Moses in the Abrahamic religions. Knowledge of Aaron, along with his brother Moses, comes exclusively from religious texts, such as the Bible and Quran. The Hebrew Bible relates that, unlike Moses, who grew up in the Egyptian royal court, Aaron and his elder sister Miriam remained with their kinsmen in the eastern border-land of Egypt (Goshen). When Moses first confronted the Egyptian king about the Israelites, Aaron served as his brother\'s spokesman ("prophet") to the Pharaoh. Part of the Law (Torah) that Moses received from', 'title': 'Aaron', 'embeddings': [-0.07233893871307373, 0.48035329580307007, 0.18650995194911957, -0.5287084579467773, -0.37329429388046265, 0.37622880935668945, 0.25524479150772095, ... -0.336689829826355, 0.6313082575798035, -0.7025573253631592]}

psgs_w100.multiset.no_index
An example of 'train' looks as follows.
This example was too long and was cropped: {'id': '1', 'text': 'Aaron Aaron ( or ; "Ahärôn") is a prophet, high priest, and the brother of Moses in the Abrahamic religions. Knowledge of Aaron, along with his brother Moses, comes exclusively from religious texts, such as the Bible and Quran. The Hebrew Bible relates that, unlike Moses, who grew up in the Egyptian royal court, Aaron and his elder sister Miriam remained with their kinsmen in the eastern border-land of Egypt (Goshen). When Moses first confronted the Egyptian king about the Israelites, Aaron served as his brother\'s spokesman ("prophet") to the Pharaoh. Part of the Law (Torah) that Moses received from', 'title': 'Aaron', 'embeddings': [-0.07233893871307373, 0.48035329580307007, 0.18650995194911957, -0.5287084579467773, -0.37329429388046265, 0.37622880935668945, 0.25524479150772095, ... -0.336689829826355, 0.6313082575798035, -0.7025573253631592]}

psgs_w100.nq.compressed
An example of 'train' looks as follows.
This example was too long and was cropped: {'id': '1', 'text': 'Aaron Aaron ( or ; "Ahärôn") is a prophet, high priest, and the brother of Moses in the Abrahamic religions. Knowledge of Aaron, along with his brother Moses, comes exclusively from religious texts, such as the Bible and Quran. The Hebrew Bible relates that, unlike Moses, who grew up in the Egyptian royal court, Aaron and his elder sister Miriam remained with their kinsmen in the eastern border-land of Egypt (Goshen). When Moses first confronted the Egyptian king about the Israelites, Aaron served as his brother\'s spokesman ("prophet") to the Pharaoh. Part of the Law (Torah) that Moses received from', 'title': 'Aaron', 'embeddings': [0.013342111371457577, 0.582173764705658, -0.31309744715690613, -0.6991612911224365, -0.5583199858665466, 0.5187504887580872, 0.7152731418609619, ... -0.5385938286781311, 0.8093984127044678, -0.4741983711719513]}

psgs_w100.nq.exact
An example of 'train' looks as follows.
This example was too long and was cropped: {'id': '1', 'text': 'Aaron Aaron ( or ; "Ahärôn") is a prophet, high priest, and the brother of Moses in the Abrahamic religions. Knowledge of Aaron, along with his brother Moses, comes exclusively from religious texts, such as the Bible and Quran. The Hebrew Bible relates that, unlike Moses, who grew up in the Egyptian royal court, Aaron and his elder sister Miriam remained with their kinsmen in the eastern border-land of Egypt (Goshen). When Moses first confronted the Egyptian king about the Israelites, Aaron served as his brother\'s spokesman ("prophet") to the Pharaoh. Part of the Law (Torah) that Moses received from', 'title': 'Aaron', 'embeddings': [0.013342111371457577, 0.582173764705658, -0.31309744715690613, -0.6991612911224365, -0.5583199858665466, 0.5187504887580872, 0.7152731418609619, ... -0.5385938286781311, 0.8093984127044678, -0.4741983711719513]}
The data fields (id, text, title, embeddings) are the same across all configurations.
name | train |
---|---|
psgs_w100.multiset.compressed | 21015300 |
psgs_w100.multiset.exact | 21015300 |
psgs_w100.multiset.no_index | 21015300 |
psgs_w100.nq.compressed | 21015300 |
psgs_w100.nq.exact | 21015300 |
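As a rough sense of scale, the row count above and the 768-d embedding size imply the following raw storage for the embeddings alone. This is back-of-the-envelope arithmetic that assumes 4-byte (float32) values, which the card does not state; it ignores the passage text, titles, and any index overhead.

```python
NUM_PASSAGES = 21_015_300  # train rows listed in the table for every configuration
EMBEDDING_DIM = 768        # embedding size from the dataset description
BYTES_PER_FLOAT = 4        # assumption: embeddings stored as float32

total_bytes = NUM_PASSAGES * EMBEDDING_DIM * BYTES_PER_FLOAT

print(f"{total_bytes:,} bytes")          # 64,559,001,600 bytes
print(f"{total_bytes / 1e9:.1f} GB")     # 64.6 GB
print(f"{total_bytes / 2**30:.1f} GiB")  # 60.1 GiB
```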
@misc{karpukhin2020dense,
    title={Dense Passage Retrieval for Open-Domain Question Answering},
    author={Vladimir Karpukhin and Barlas Oğuz and Sewon Min and Patrick Lewis and Ledell Wu and Sergey Edunov and Danqi Chen and Wen-tau Yih},
    year={2020},
    eprint={2004.04906},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Thanks to @thomwolf, @lewtun, @lhoestq for adding this dataset.