Dataset:

musabg/wikipedia-oscar-tr

Wikipedia and OSCAR Turkish Dataset

Welcome to the "Wikipedia and OSCAR Turkish" Hugging Face repo!

This repo contains a Turkish-language dataset generated by merging Wikipedia with the cleaned Common Crawl data from OSCAR. The dataset contains over 13 million examples with a single feature, text.

This dataset can be useful for natural language processing tasks in Turkish.

To download the dataset, you can use the Hugging Face Datasets library. Here's some sample code to get started:

from datasets import load_dataset

dataset = load_dataset("musabg/wikipedia-oscar-tr")
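With over 13 million examples, a full download can take a while, so it may be worth peeking at a few records first. The sketch below is one way to do that using the library's streaming mode. The helper names `preview_texts` and `load_streaming` are hypothetical, and the split name "train" is an assumption, not something this card states; check the dataset page for the actual splits.

```python
from itertools import islice


def preview_texts(examples, n=3, width=80):
    """Return the `text` field of the first `n` examples, cut to `width` characters."""
    return [example["text"][:width] for example in islice(examples, n)]


def load_streaming(repo_id="musabg/wikipedia-oscar-tr", split="train"):
    """Open the dataset in streaming mode (hypothetical helper)."""
    # Imported here so preview_texts stays usable without the library installed.
    from datasets import load_dataset

    # streaming=True yields examples lazily instead of downloading everything first.
    # ASSUMPTION: the split is named "train".
    return load_dataset(repo_id, split=split, streaming=True)


# Usage (needs network access):
# for snippet in preview_texts(load_streaming()):
#     print(snippet)
```

`preview_texts` accepts any iterable of dictionaries with a text key, so it works the same on a streamed dataset or on a small in-memory list.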

Have fun exploring this dataset and training language models on it!