oeg/CelebA_Sent2Vect_Sp

Dataset:

oeg/CelebA_Sent2Vect_Sp

Tasks:

Table question answering

Question answering

Translation

Languages:

Size:

100M<n<1B

Preprint:

arxiv:1911.11378

Other:

CelebA Spanish celebFaces attributes celebFaces+attributes

DOI:

10.57967/hf/0446

License:

apache-2.0

Corpus Summary

This corpus contains 192,050 entries, each a descriptive sentence about a face in the CelebA dataset. To build it, the captions of the CelebA dataset, obtained with the algorithm used in Text2FaceGAN, were translated into Spanish, and all sentences were combined into a single, larger corpus. The text was then preprocessed to remove stopwords, separator symbols, and other elements that are not useful for training. Finally, the corpus was used with the Sent2vec library to train a sentence encoder for Spanish, specifically for captions from the CelebA dataset.
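The preprocessing step can be illustrated with a minimal Python sketch. The stopword list, the lowercasing, and the function name `preprocess` are assumptions made for illustration; the corpus description does not state the exact rules or tools the authors used.

```python
import re

# Small illustrative stopword list (assumption): the list actually used
# to build the corpus is not given in the description.
SPANISH_STOPWORDS = {
    "el", "la", "los", "las", "un", "una", "unos", "unas",
    "y", "de", "con", "en", "es", "que",
}


def preprocess(sentence: str) -> str:
    """Lowercase a caption, replace punctuation with spaces, drop stopwords."""
    lowered = sentence.lower()  # lowercasing is an assumption
    # \w is Unicode-aware for str patterns, so accented letters are kept.
    no_symbols = re.sub(r"[^\w\s]", " ", lowered)
    tokens = [tok for tok in no_symbols.split() if tok not in SPANISH_STOPWORDS]
    return " ".join(tokens)


print(preprocess("La mujer tiene el pelo negro, y una sonrisa."))
# prints: mujer tiene pelo negro sonrisa
```

A production pipeline would more likely rely on a full stopword list from an NLP library; the short set above only keeps the example self-contained.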

Training Sent2vec on this corpus produced the new model Sent2vec-CelebA-Sp.
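As a rough sketch of how a trained Sent2vec model might be used to encode captions, the following uses the Python bindings of the Sent2vec library (`Sent2vecModel`, `load_model`, `embed_sentences`). The model file name is hypothetical, and the helper names `load_encoder` and `embed_captions` are introduced here for illustration only.

```python
from typing import Any, List, Sequence


def load_encoder(model_path: str) -> Any:
    """Load a Sent2vec model from a .bin file (requires the sent2vec package)."""
    import sent2vec  # third-party; imported lazily so the module loads without it

    model = sent2vec.Sent2vecModel()
    model.load_model(model_path)
    return model


def embed_captions(model: Any, captions: Sequence[str]) -> List[Any]:
    """Return one embedding vector per caption."""
    # Captions should be preprocessed the same way as the training corpus.
    return list(model.embed_sentences(list(captions)))


# Example usage (hypothetical file name, not taken from the dataset card):
# encoder = load_encoder("sent2vec_celeba_sp.bin")
# vectors = embed_captions(encoder, ["mujer pelo negro sonrisa"])
```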

Corpus Fields

Each corpus entry is composed of:

A descriptive sentence of a face from the CelebA dataset, with the corresponding preprocessing applied.

You can download the file with a .txt or .csv extension as appropriate.
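A downloaded file could be read along these lines. This is a sketch under stated assumptions: UTF-8 encoding, one sentence per line in the .txt file, and the sentence in the first column of a header-less .csv file. None of these details are confirmed by the corpus description, and `read_corpus` is a hypothetical helper name.

```python
import csv
from pathlib import Path
from typing import List


def read_corpus(path: str) -> List[str]:
    """Read corpus sentences from a .txt or .csv file, skipping empty entries."""
    file_path = Path(path)
    if file_path.suffix.lower() == ".csv":
        # Assumption: the sentence is in the first column and there is no header.
        with file_path.open(newline="", encoding="utf-8") as handle:
            return [
                row[0].strip()
                for row in csv.reader(handle)
                if row and row[0].strip()
            ]
    # Assumption: plain text with one sentence per line.
    with file_path.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

If the assumptions hold, the length of the returned list should match the number of entries stated in the corpus summary.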

Citation information

Citing: If you used the CelebA_Sent2vec_Sp corpus in your work, please cite the ????:

License

This corpus is available under the Apache License 2.0.

Authors

Universidad Nacional de Ingeniería, Ontology Engineering Group, Universidad Politécnica de Madrid.

Contributors

See the full list of contributors here.

Author:

oeg

Dataset size:

57.74 MB