Dataset Card for PathVQA

Dataset Description

PathVQA is a dataset of question-answer pairs on pathology images, intended for training and testing medical Visual Question Answering (VQA) systems. It includes both open-ended questions and binary "yes/no" questions. The dataset is built from two publicly available pathology textbooks, "Textbook of Pathology" and "Basic Pathology", and a publicly available digital library, the "Pathology Education Informational Resource" (PEIR). The copyrights of the images and captions belong to the publishers and authors of these two books and to the owners of the PEIR digital library.

Repository: PathVQA Official GitHub Repository
Paper: PathVQA: 30000+ Questions for Medical Visual Question Answering
Leaderboard: Papers with Code Leaderboard

Dataset Summary

The dataset was obtained from the updated Google Drive link shared by the authors on Feb 15, 2023 (see the corresponding commit in the GitHub repository). This version contains a total of 5,004 images and 32,795 question-answer pairs. Of the 5,004 images, 4,289 are referenced by at least one question-answer pair, while the remaining 715 are not used. A few image-question-answer triplets occur more than once in the same split (training, validation, or test). After dropping these duplicate triplets, the dataset contains 32,632 question-answer pairs on 4,289 images.
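The duplicate removal described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to prepare the dataset: the `split` and `image_id` keys and the sample records are hypothetical, and a duplicate is assumed to mean an exact match on image, question, and answer within the same split.

```python
def drop_duplicate_triplets(records):
    """Keep the first occurrence of each (split, image_id, question, answer)."""
    seen = set()
    unique = []
    for record in records:
        # Hypothetical keys: "split" and "image_id" are illustrative names only.
        key = (record["split"], record["image_id"], record["question"], record["answer"])
        if key in seen:
            continue  # exact repeat within the same split, drop it
        seen.add(key)
        unique.append(record)
    return unique


# Made-up records, for illustration only.
records = [
    {"split": "train", "image_id": "img_0", "question": "is this liver?", "answer": "yes"},
    {"split": "train", "image_id": "img_0", "question": "is this liver?", "answer": "yes"},  # exact repeat
    {"split": "train", "image_id": "img_0", "question": "what is shown?", "answer": "liver"},  # new question, kept
    {"split": "test", "image_id": "img_1", "question": "is this liver?", "answer": "yes"},  # other image, kept
]
deduplicated = drop_duplicate_triplets(records)
print(len(records), "->", len(deduplicated))  # prints: 4 -> 3
```

Keeping the first occurrence preserves the original record order, which makes the result easy to compare against the raw files.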

Supported Tasks and Leaderboards

The PathVQA dataset has an active leaderboard on Papers with Code where models are ranked based on three metrics: "Yes/No Accuracy", "Free-form accuracy" and "Overall accuracy". "Yes/No Accuracy" is the accuracy of a model's generated answers for the subset of binary "yes/no" questions. "Free-form accuracy" is the accuracy of a model's generated answers for the subset of open-ended questions. "Overall accuracy" is the accuracy of a model's generated answers across all questions.
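As a rough sketch of how these three metrics relate, the snippet below scores a list of predictions against reference answers. It assumes that a question counts as "yes/no" when its reference answer is "yes" or "no", and that correctness is an exact string match after lowercasing and whitespace normalization. Both are assumptions made for illustration; the evaluation protocols used by leaderboard entries may differ. The function names are hypothetical.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def vqa_accuracies(references, predictions):
    """Return yes/no, free-form, and overall exact-match accuracy."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")

    hits = {"yes_no": 0, "free_form": 0}
    totals = {"yes_no": 0, "free_form": 0}
    for reference, prediction in zip(references, predictions):
        ref = normalize(reference)
        # Assumption: the reference answer decides which subset a question belongs to.
        subset = "yes_no" if ref in ("yes", "no") else "free_form"
        totals[subset] += 1
        hits[subset] += int(normalize(prediction) == ref)

    def ratio(numerator, denominator):
        # None signals an empty subset instead of dividing by zero.
        return numerator / denominator if denominator else None

    return {
        "yes_no": ratio(hits["yes_no"], totals["yes_no"]),
        "free_form": ratio(hits["free_form"], totals["free_form"]),
        "overall": ratio(sum(hits.values()), sum(totals.values())),
    }


# Made-up answers, for illustration only.
references = ["yes", "no", "yes", "in the canals of hering"]
predictions = ["yes", "no", "no", "In the canals of Hering "]
scores = vqa_accuracies(references, predictions)
print(scores)
```

In this toy example two of the three yes/no questions and the single open-ended question are answered correctly, so the overall accuracy is 3 out of 4.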

Languages

The question-answer pairs are in English.

Dataset Structure

Data Instances

Each instance consists of an image-question-answer triplet.

{
  'image': <PIL.JpegImagePlugin.JpegImageFile image mode=CMYK size=309x272>,
  'question': 'where are liver stem cells (oval cells) located?',
  'answer': 'in the canals of hering'
}

Data Fields

'image': the image referenced by the question-answer pair.
'question': the question about the image.
'answer': the expected answer.

Data Splits

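The dataset is split into training, validation, and test sets. The split is provided directly by the authors. The counts in the following table are those obtained after dropping the duplicate image-question-answer triplets.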

Count	Training Set	Validation Set	Test Set
Question-answer pairs	19,654	6,259	6,719
Images	2,599	832	858
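The split counts can be cross-checked against the totals given in the Dataset Summary with a few lines of arithmetic. The snippet below is only a consistency check on the numbers stated in this card; the dictionary layout and variable names are illustrative.

```python
# Per-split counts as stated in the table above.
splits = {
    "train": {"qas": 19654, "images": 2599},
    "validation": {"qas": 6259, "images": 832},
    "test": {"qas": 6719, "images": 858},
}

total_qas = sum(counts["qas"] for counts in splits.values())
total_images = sum(counts["images"] for counts in splits.values())

# Totals stated in the Dataset Summary.
raw_qas, raw_images = 32795, 5004
removed_duplicates = raw_qas - total_qas
unused_images = raw_images - total_images

print(total_qas, total_images)            # prints: 32632 4289
print(removed_duplicates, unused_images)  # prints: 163 715

# Share of question-answer pairs in each split.
for name, counts in splits.items():
    print(f"{name}: {counts['qas'] / total_qas:.1%}")
```

The per-split sums match the deduplicated totals, which implies that 163 duplicate question-answer pairs were removed and that the three splits account for all 4,289 referenced images.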

Additional Information

Licensing Information

The authors have released the dataset under the MIT License.

Citation Information

@article{he2020pathvqa,
    title={PathVQA: 30000+ Questions for Medical Visual Question Answering},
    author={He, Xuehai and Zhang, Yichen and Mou, Luntian and Xing, Eric and Xie, Pengtao},
    journal={arXiv preprint arXiv:2003.10286},
    year={2020}
}

Author:

flaviagiammarino

Dataset size:

749.04 MB