Dataset: biglam/brill_iconclass
Tasks: other-iconclass-metadata
Size: 10K<n<100K
Language creators: expert-generated
Annotations creators: expert-generated
License: cc0-1.0

A test dataset and challenge to apply machine learning to collections described with the Iconclass classification system.
This dataset contains 87749 images, each with Iconclass metadata assigned to it. The Iconclass classification system is intended to provide 'the comprehensive classification system for the content of images.'
Iconclass was developed in the Netherlands as a standard classification for recording collections, with the idea of assembling huge databases that will allow the retrieval of images featuring particular details, subjects or other common factors. It was developed in the 1970s and was loosely based on the Dewey Decimal System because it was meant to be used in art library card catalogs. source
The Iconclass view of the world is subdivided in 10 main categories... An Iconclass concept consists of an alphanumeric class number (“notation”) and a corresponding content definition (“textual correlate”). An object can be tagged with as many concepts as the user sees fit. source
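These ten divisions are as follows:
0 Abstract, Non-representational Art
1 Religion and Magic
2 Nature
3 Human Being, Man in General
4 Society, Civilization, Culture
5 Abstract Ideas and Concepts
6 History
7 Bible
8 Literature
9 Classical Mythology and Ancient History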
Within each of these divisions, further subdivisions are possible (9 or 10 subdivisions). For example, under 4 Society, Civilization, Culture, one can find:
See https://iconclass.org/4 for the full list.
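To illustrate, we can look at some example Iconclass classifications.
41A12 represents castle. This classification is built up from the 'base' division 4, with the following attributes:
4 Society, Civilization, Culture
41 material aspects of daily life
41A housing
41A1 civic architecture; edifices; dwellings
41A12 castle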
The way Iconclass notations are constructed from parts makes this dataset particularly interesting (and challenging) to tackle via machine learning. Whilst one could treat this dataset as a multi-label image classification problem, this is only one way of tackling it. For example, for the label castle above, giving the model the 'freedom' to predict only a partial label could result in the prediction 41A, i.e. housing. Whilst a castle is a very particular form of housing, this prediction is not 'wrong' so much as less precise than the label a human cataloguer might provide.
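The idea of a partial label can be made concrete by listing every prefix of a notation. The following is a minimal sketch, not part of the dataset or of any Iconclass tooling: the helper names `notation_prefixes` and `shared_depth` are hypothetical, and the code assumes that each character of a basic notation adds one level of specificity, ignoring bracketed keys such as (+1) and anything after a colon.

```python
from typing import List


def notation_prefixes(notation: str) -> List[str]:
    """List the hierarchical prefixes of a basic Iconclass notation.

    Assumption: each character adds one level of specificity. Bracketed
    keys such as "(+1)" and anything after ":" are dropped first.
    """
    base = notation.split("(", 1)[0].split(":", 1)[0]
    return [base[:i] for i in range(1, len(base) + 1)]


def shared_depth(predicted: str, target: str) -> int:
    """Count how many leading levels two notations have in common."""
    depth = 0
    for p, t in zip(notation_prefixes(predicted), notation_prefixes(target)):
        if p != t:
            break
        depth += 1
    return depth


print(notation_prefixes("41A12"))    # ['4', '41', '41A', '41A1', '41A12']
print(shared_depth("41A", "41A12"))  # 3
```

Under these assumptions, a prediction of 41A for a castle image agrees with the cataloguer's label on three of its five levels, which could be scored as partial credit rather than as a miss.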
As discussed above, this dataset could be tackled in various ways:
This list is not exhaustive.
This dataset doesn't have a natural language. The labels themselves can be treated as a form of language, i.e. each label can be thought of as a sequence of tokens that construct a 'sentence'.
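One way to treat a label as a token sequence is sketched below. The tokenization scheme is an assumption made for illustration, not something the dataset prescribes: a bracketed key such as (+54) is kept as one token, a colon is its own token, and every other letter or digit becomes a single token. The name `tokenize_label` is hypothetical.

```python
import re
from typing import List

# Assumed token types: a bracketed key such as "(+54)", a colon,
# or a single letter or digit. Any other character is skipped.
TOKEN_PATTERN = re.compile(r"\([^)]*\)|:|[A-Za-z]|\d")


def tokenize_label(label: str) -> List[str]:
    """Split an Iconclass notation into a flat list of tokens."""
    return TOKEN_PATTERN.findall(label)


tokens = tokenize_label("61B:31A2212(+1)")
print(tokens)
# ['6', '1', 'B', ':', '3', '1', 'A', '2', '2', '1', '2', '(+1)']
```

For the notations shown in this card, joining the tokens reproduces the original label, so a sequence model's output can be mapped back to a notation without loss.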
The dataset contains a single configuration.
An example instance of the dataset is as follows:
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=390x500 at 0x7FC7FFBBD2D0>, 'label': ['31A235', '31A24(+1)', '61B(+54)', '61B:31A2212(+1)', '61B:31D14']}
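Because each instance carries a list of notations, a multi-label setup needs a fixed label vocabulary and one multi-hot target vector per image. The sketch below shows one way to build both in plain Python. The function names are hypothetical, the rows are plain dictionaries standing in for dataset instances, and the second row is invented for illustration (it reuses the castle notation 41A12 from the text above).

```python
from typing import Dict, Iterable, List


def build_vocabulary(label_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Map each distinct notation to a column index, in sorted order."""
    distinct = sorted({label for labels in label_lists for label in labels})
    return {label: index for index, label in enumerate(distinct)}


def multi_hot(labels: List[str], vocabulary: Dict[str, int]) -> List[int]:
    """Encode one instance's labels as a 0/1 vector over the vocabulary."""
    vector = [0] * len(vocabulary)
    for label in labels:
        vector[vocabulary[label]] = 1
    return vector


rows = [
    # Labels of the example instance shown above.
    {"label": ["31A235", "31A24(+1)", "61B(+54)", "61B:31A2212(+1)", "61B:31D14"]},
    # Hypothetical second instance, invented for this illustration.
    {"label": ["41A12"]},
]

vocabulary = build_vocabulary(row["label"] for row in rows)
targets = [multi_hot(row["label"], vocabulary) for row in rows]
print(targets)  # [[1, 1, 0, 1, 1, 1], [0, 0, 1, 0, 0, 0]]
```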
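The dataset is made up of two fields:
image: the image, loaded as a PIL image
label: a list of the Iconclass notations assigned to the image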
The dataset doesn't provide any predefined train, validation or test splits.
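Since no splits are predefined, anyone reporting results has to create their own, ideally in a reproducible way. The following is a minimal sketch of a seeded random split over instance indices. The function name `split_indices`, the 20% test fraction, and the seed are all assumptions chosen for illustration. If the data is loaded with the Hugging Face `datasets` library, its `Dataset.train_test_split` method offers a comparable shortcut.

```python
import random
from typing import List, Tuple


def split_indices(
    n_items: int, test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle indices with a fixed seed and cut them into train and test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = int(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]


# 87749 is the number of images stated in this card.
train_idx, test_idx = split_indices(87749)
print(len(train_idx), len(test_idx))  # 70200 17549
```

A purely random split ignores how rare some notations are, so infrequent labels may end up only in the training set or only in the test set.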
To facilitate the creation of better models in the cultural heritage domain, and promote the research on tools and techniques using Iconclass, we are making this dataset freely available. All that we ask is that any use is acknowledged and results be shared so that we can all benefit. The content is sampled from the Arkyves database. source
The images are samples from the Arkyves database. This collection includes images from libraries and museums in many countries, including the Rijksmuseum in Amsterdam, the Netherlands Institute for Art History (RKD), the Herzog August Bibliothek in Wolfenbüttel, and the university libraries of Milan, Utrecht and Glasgow. source
Who are the source language producers?
[More Information Needed]
The annotations are derived from the source dataset (see above). Most annotations were likely created by staff experienced with the Iconclass metadata schema.
Who are the annotators?
[More Information Needed]
Iconclass as a metadata standard absorbs biases from the time and place of its creation (1940s Netherlands). In particular, '32B human races, peoples; nationalities' has been subject to criticism. '32B36 'primitive', 'pre-modern' peoples' is one example of a category which we may not wish to adopt. In general, there are components of the subdivisions of 32B which reflect a belief that race is a scientific category rather than socially constructed.
The Iconclass community is actively exploring these limitations; for example, see Revising Iconclass section 32B human races, peoples; nationalities.
One should be aware of these limitations of Iconclass, particularly before deploying a model trained on this data in any production setting.
Etienne Posthumus
@MISC{iconclass,
  title = {Brill Iconclass AI Test Set},
  author = {Etienne Posthumus},
  year = {2020}
}
Thanks to @davanstrien for adding this dataset.