Dataset: dali-does/clevr-math
Task: visual question answering
Language: en
Multilinguality: monolingual
Language creators: machine-generated
Annotation creators: machine-generated
Source dataset: clevr
Preprint: arxiv:2208.05358
License: cc-by-4.0

Dataset for compositional multimodal mathematical reasoning based on CLEVR.
Loading the data, preprocessing text with CLIP

    from datasets import DownloadConfig, load_dataset
    from transformers import CLIPProcessor

    dl_config = DownloadConfig(resume_download=True, num_proc=8, force_download=True)

    # Load 'general' instance of dataset
    dataset = load_dataset('dali-does/clevr-math', download_config=dl_config)

    # Load version with only multihop in test data
    dataset_multihop = load_dataset('dali-does/clevr-math', 'multihop', download_config=dl_config)

    model_path = "openai/clip-vit-base-patch32"
    extractor = CLIPProcessor.from_pretrained(model_path)

    def transform_tokenize(e):
        # Convert every image to RGB, then tokenize questions and preprocess images together
        e['image'] = [image.convert('RGB') for image in e['image']]
        # Padding is an argument of the processor, not of Dataset.map
        return extractor(text=e['question'], images=e['image'], padding='max_length')

    dataset = dataset.map(transform_tokenize, batched=True, num_proc=8)

    # Keep only examples generated from subtraction templates
    dataset_subtraction = dataset.filter(lambda e: e['template'].startswith('subtraction'), num_proc=4)
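The filter on the template field can be tried without downloading anything. The following is a minimal sketch on hypothetical in-memory records: the ids, labels, and template names other than the 'subtraction' prefix are made up for illustration, and filter_by_template_prefix is a helper introduced here, not part of the datasets library.

```python
from collections import Counter

# Hypothetical records that mimic the dataset's fields; all values are made up.
records = [
    {"id": "a", "template": "subtraction", "label": 2},
    {"id": "b", "template": "subtraction-multihop", "label": 1},
    {"id": "c", "template": "addition", "label": 5},
]


def filter_by_template_prefix(rows, prefix):
    """Keep the rows whose 'template' field starts with the given prefix."""
    return [row for row in rows if row["template"].startswith(prefix)]


# Same predicate as the lambda passed to dataset.filter in the loading code.
subtraction_rows = filter_by_template_prefix(records, "subtraction")

# Count how many rows each template contributes.
template_counts = Counter(row["template"] for row in records)
```

Note that a prefix match on 'subtraction' also keeps any template whose name merely begins with that string, which may or may not be what is wanted for a given experiment.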
Leaderboard will be announced at a later date.
The dataset is currently only available in English. To extend the dataset to other languages, the CLEVR templates must be rewritten in the target language.
    import datasets

    features = datasets.Features(
        {
            "template": datasets.Value("string"),
            "id": datasets.Value("string"),
            "question": datasets.Value("string"),
            "image": datasets.Image(),
            "label": datasets.Value("int64")
        }
    )
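As a rough illustration of what a record with these features looks like, the sketch below checks a plain Python dict against the field names and types listed above. It is an assumption-laden stand-in, not how the datasets library validates data: check_record and EXPECTED_TYPES are hypothetical names, the sample values are invented, and the image field is only checked for presence.

```python
# Python types assumed to correspond to the declared string and int64 features.
EXPECTED_TYPES = {"template": str, "id": str, "question": str, "label": int}


def check_record(record):
    """Return a list of problems found in the record (empty if none)."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # The image is only checked for presence; its type depends on decoding.
    if "image" not in record:
        problems.append("missing field: image")
    return problems


# Hypothetical sample; object() stands in for a decoded image.
sample = {
    "template": "addition",
    "id": "example-0",
    "question": "Add 2 red cubes. How many objects exist?",
    "image": object(),
    "label": 5,
}
```

Calling check_record(sample) returns an empty list, while a record whose label is a string would be reported as a type mismatch.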
train/val/test
Data is generated using the code provided with the CLEVR dataset, using Blender and templates constructed by the dataset curators.
[More Information Needed]
Adam Dahlgren Lindström - dali@cs.umu.se
Licensed under Creative Commons Attribution Share Alike 4.0 International (CC-by 4.0).
[More Information Needed]
@misc{https://doi.org/10.48550/arxiv.2208.05358,
  doi = {10.48550/ARXIV.2208.05358},
  url = {https://arxiv.org/abs/2208.05358},
  author = {Lindström, Adam Dahlgren and Abraham, Savitha Sam},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences, I.2.7; I.2.10; I.2.6; I.4.8; I.1.4},
  title = {CLEVR-Math: A Dataset for Compositional Language, Visual, and Mathematical Reasoning},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution Share Alike 4.0 International}
}
Thanks to @dali-does for adding this dataset.