Dataset:
Finnish-NLP/mc4_fi_cleaned
mC4 Finnish cleaned is a cleaned version of the original mC4 Finnish split.
mC4 Finnish is mainly intended for pretraining Finnish language models and word representations.
Finnish
[Needs More Information]
The data have several fields:
Train Validation
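As a rough sketch of how the splits named above might be inspected, the snippet below streams a split with the Hugging Face `datasets` library and prints the field names of the first few records. The split name `"train"` is an assumption inferred from this card, and the helper names `open_split` and `peek` are hypothetical, introduced only for illustration. No field names are assumed, since the card does not list them.

```python
from itertools import islice

# Dataset identifier as given on this card.
DATASET_ID = "Finnish-NLP/mc4_fi_cleaned"


def open_split(split="train"):
    """Open one split lazily (hypothetical helper; the split name is an assumption)."""
    # Imported here so that peek() below stays usable without the library installed.
    from datasets import load_dataset

    # streaming=True iterates over records instead of downloading the whole corpus first.
    return load_dataset(DATASET_ID, split=split, streaming=True)


def peek(records, n=3):
    """Return the first n records of any iterable as a list (hypothetical helper)."""
    return list(islice(records, n))


if __name__ == "__main__":
    # Requires network access and the `datasets` package.
    for record in peek(open_split("train")):
        # Print the field names rather than assuming what they are.
        print(sorted(record.keys()))
```

Streaming is used here because a web-crawl corpus is likely to be large; loading without `streaming=True` would download the full split before returning.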
[Needs More Information]
[Needs More Information]
Who are the source language producers? [Needs More Information]
[Needs More Information]
Who are the annotators? [Needs More Information]
[Needs More Information]