Dataset:
flaviagiammarino/path-vqa
PathVQA is a dataset of question-answer pairs on pathology images, intended for training and testing Medical Visual Question Answering (VQA) systems. It includes both open-ended questions and binary "yes/no" questions. The dataset is built from two publicly available pathology textbooks, "Textbook of Pathology" and "Basic Pathology", and a publicly available digital library, the "Pathology Education Informational Resource" (PEIR). The copyrights of the images and captions belong to the publishers and authors of these two books and to the owners of the PEIR digital library.
- Repository: PathVQA Official GitHub Repository
- Paper: PathVQA: 30000+ Questions for Medical Visual Question Answering
- Leaderboard: Papers with Code Leaderboard
The dataset was obtained from the updated Google Drive link shared by the authors on Feb 15, 2023 (see the commit in the GitHub repository). This version contains 5,004 images and 32,795 question-answer pairs. Of the 5,004 images, 4,289 are referenced by at least one question-answer pair, while 715 are not used. A few image-question-answer triplets occur more than once within the same split (training, validation or test). After dropping these duplicates, the dataset contains 32,632 question-answer pairs on 4,289 images.
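The deduplication step described above can be illustrated with a minimal sketch. It is not the procedure actually used to build this version of the dataset; it only shows one way to drop repeated triplets within a single split. The `image_id` field is a hypothetical stand-in for whatever identifies an image (for example a file name or a hash of the image bytes), and the toy records are invented for illustration.

```python
def drop_duplicate_triplets(records):
    """Keep the first occurrence of each (image_id, question, answer) triplet.

    `records` is a list of dicts for ONE split; `image_id` is a hypothetical
    identifier, not a column of the published dataset.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["image_id"], rec["question"], rec["answer"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Toy split (invented data): the second record repeats the first one exactly.
toy_split = [
    {"image_id": "img_001", "question": "is this liver?", "answer": "yes"},
    {"image_id": "img_001", "question": "is this liver?", "answer": "yes"},
    {"image_id": "img_002", "question": "is this liver?", "answer": "no"},
]

deduplicated = drop_duplicate_triplets(toy_split)
print(len(toy_split), "->", len(deduplicated))
```

Applying the function separately to each split matches the description above, where duplicates are counted within a split rather than across splits.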
Supported Tasks and Leaderboards

The PathVQA dataset has an active leaderboard on Papers with Code where models are ranked based on three metrics: "Yes/No Accuracy", "Free-form accuracy" and "Overall accuracy". "Yes/No Accuracy" is the accuracy of a model's generated answers for the subset of binary "yes/no" questions. "Free-form accuracy" is the accuracy of a model's generated answers for the subset of open-ended questions. "Overall accuracy" is the accuracy of a model's generated answers across all questions.
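As a rough illustration of how the three metrics relate, the sketch below computes them from pairs of reference and generated answers. Two things here are assumptions and are not taken from the leaderboard: answers are scored by exact match after lowercasing and whitespace normalization, and a question is treated as binary when its reference answer is "yes" or "no". The leaderboard's actual scoring protocol may differ.

```python
def normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def vqa_accuracies(pairs):
    """Compute yes/no, free-form and overall accuracy.

    `pairs` is an iterable of (reference_answer, generated_answer) strings.
    Exact-match scoring is an assumption made for this sketch.
    """
    correct = {"yes_no": 0, "free_form": 0}
    total = {"yes_no": 0, "free_form": 0}
    for reference, generated in pairs:
        ref, gen = normalize(reference), normalize(generated)
        # Assumption: binary questions are those whose reference is yes/no.
        subset = "yes_no" if ref in {"yes", "no"} else "free_form"
        total[subset] += 1
        correct[subset] += int(ref == gen)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "yes_no": ratio(correct["yes_no"], total["yes_no"]),
        "free_form": ratio(correct["free_form"], total["free_form"]),
        "overall": ratio(sum(correct.values()), sum(total.values())),
    }


# Invented predictions, for illustration only.
toy_pairs = [
    ("yes", "yes"),
    ("yes", "Yes"),
    ("no", "yes"),
    ("in the canals of hering", "In the canals of Hering"),
    ("liver", "kidney"),
]
scores = vqa_accuracies(toy_pairs)
print(scores)
```

Note that overall accuracy is computed over all questions pooled together, so it is weighted by the size of each subset rather than being the mean of the other two numbers.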
Languages

The question-answer pairs are in English.
Each instance consists of an image-question-answer triplet.
{ 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=CMYK size=309x272>, 'question': 'where are liver stem cells (oval cells) located?', 'answer': 'in the canals of hering' }
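The instance above can be inspected with the Hugging Face `datasets` library. The following is a sketch under stated assumptions rather than official usage: the split names passed to `load_dataset` (`"train"`, `"validation"`, `"test"`) are assumed, and `to_rgb_instance` is a hypothetical helper. Because the example image is in CMYK mode, the helper converts images to RGB with Pillow's `Image.convert`, which many vision pipelines expect.

```python
def load_path_vqa(split="train"):
    """Load one split from the Hugging Face Hub (requires network access).

    The split name is an assumption; `datasets` is imported lazily so the
    helper below can be used without the package installed.
    """
    from datasets import load_dataset

    return load_dataset("flaviagiammarino/path-vqa", split=split)


def to_rgb_instance(instance):
    """Return a copy of an instance whose image, if present, is in RGB mode."""
    image = instance.get("image")
    if image is not None and image.mode != "RGB":
        image = image.convert("RGB")  # e.g. CMYK -> RGB
    return {
        "image": image,
        "question": instance["question"],
        "answer": instance["answer"],
    }


# Usage (not executed here because it downloads the dataset):
#   train = load_path_vqa("train")
#   example = to_rgb_instance(train[0])
#   print(example["question"], "->", example["answer"])
```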
The dataset is split into training, validation and test. The split is provided directly by the authors.
| | Training Set | Validation Set | Test Set |
|---|---|---|---|
| QAs | 19,654 | 6,259 | 6,719 |
| Images | 2,599 | 832 | 858 |
The authors have released the dataset under the MIT License.
@article{he2020pathvqa,
  title={PathVQA: 30000+ Questions for Medical Visual Question Answering},
  author={He, Xuehai and Zhang, Yichen and Mou, Luntian and Xing, Eric and Xie, Pengtao},
  journal={arXiv preprint arXiv:2003.10286},
  year={2020}
}