Dataset:
oeg/CelebA_Sent2Vect_Sp
This corpus contains 192050 entries consisting of descriptive sentences about the faces in the CelebA dataset. The corpus was prepared by translating the captions of the CelebA dataset into Spanish with the algorithm used in Text2FaceGAN. In particular, all sentences are combined to generate a larger corpus. Additionally, preprocessing was applied to remove stopwords, separator symbols, and other elements that are not useful for training. Finally, the Sent2vec library was trained on the corpus to obtain a sentence encoder model for Spanish, specifically for captions from the CelebA dataset.
Training Sent2vec on the present corpus resulted in the new model Sent2vec-CelebA-Sp.
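The cleaning step described above (removing stopwords and separator symbols) can be illustrated with a minimal Python sketch. This is not the authors' actual pipeline: the function name `preprocess_caption` is hypothetical, the stopword list is a tiny illustrative sample, and the lowercasing step is an assumption. The Sent2vec training step itself is not shown.

```python
import re

# Assumption: a small illustrative stopword sample. The stopword list
# actually used to build the corpus is not specified in this card.
SPANISH_STOPWORDS = {
    "a", "con", "de", "el", "en", "la", "las", "los", "que", "un", "una", "y",
}


def preprocess_caption(caption: str) -> str:
    """Lowercase a caption, strip separator symbols, and drop stopwords."""
    text = caption.lower()  # assumption: case is normalized
    # Replace every character that is neither a word character nor
    # whitespace with a space. Accented letters such as "í" are kept,
    # because \w is Unicode-aware for Python str patterns.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in SPANISH_STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess_caption("La mujer tiene el pelo rubio, y una sonrisa."))
```

A production pipeline would likely use a complete Spanish stopword list from an NLP library rather than the hand-written sample used here.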
Each corpus entry is composed of:
You can download the file with a .txt or .csv extension as appropriate.
Citing: If you use the CelebA_Sent2vec_Sp corpus in your work, please cite the ????:
This corpus is available under the Apache License 2.0.
Universidad Nacional de Ingeniería, Ontology Engineering Group, Universidad Politécnica de Madrid.
See the full list of contributors here.