Dataset:
Vipitis/Shadertoys-fine
Fine variant of the Shadertoys dataset (still WIP), in which individual functions are available as data points.
language-modeling: The dataset can be used to train language models for programming languages (here, shader code).
A data point consists of the function string, its name, and some metadata such as the author and source URL. (A variant of the function string without comments may be added in the future.)
{ 'name': '<type> <name>', 'code': '<type> <name>(<inputs>) { <body> return <outputs>; }\n', 'source': 'https://shadertoy.com/view/<shaderID>', 'author': '<username>' }
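To illustrate how the 'name' field relates to 'code', the sketch below derives '<type> <name>' from a function string by taking the text before the first opening parenthesis. The helper `name_from_code` and the example function are hypothetical and not part of the dataset tooling; the sketch assumes the signature has no leading comments or qualifiers.

```python
import re

def name_from_code(code: str) -> str:
    """Hypothetical helper: return '<type> <name>' from a shader function string."""
    # Assumption: everything before the first '(' is the signature head.
    head = code.split("(", 1)[0]
    # Collapse any run of whitespace (spaces, tabs, newlines) into one space.
    return re.sub(r"\s+", " ", head).strip()

example = {
    "name": "float sdCircle",
    "code": "float sdCircle(vec2 p, float r) { return length(p) - r; }\n",
    "source": "https://shadertoy.com/view/<shaderID>",
    "author": "<username>",
}
print(name_from_code(example["code"]))  # float sdCircle
```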
A data point in the return_completion subset for the return-completion task in ShaderEval includes just two features:
{ 'body': '<type> <name> <type> <name>(<inputs>) { <body> return', 'return_statment': ' <outputs>; }\n', }
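In a return_completion data point, 'body' ends with the `return` keyword and 'return_statment' holds the remainder of the function. The following is a minimal sketch of producing such a pair by splitting a comment-free function at its last `return` keyword. The helper `split_return` is hypothetical; the dataset's actual preprocessing may differ.

```python
import re

def split_return(code: str) -> dict:
    """Hypothetical sketch: split a comment-free function at its last 'return'."""
    # Find every standalone 'return' keyword (word boundaries skip e.g. 'returned').
    matches = list(re.finditer(r"\breturn\b", code))
    if not matches:
        raise ValueError("function has no return statement")
    cut = matches[-1].end()  # index just after the last 'return'
    return {"body": code[:cut], "return_statment": code[cut:]}

fn = "float sdCircle(vec2 p, float r) { return length(p) - r; }\n"
pair = split_return(fn)
print(repr(pair["body"]))             # 'float sdCircle(vec2 p, float r) { return'
print(repr(pair["return_statment"]))  # ' length(p) - r; }\n'
```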
'name' function identifier, composed of the return type and the name of the function
'code' the raw code of the function (including comments)
'source' URL of the shader. The function might be located in a different render pass.
'author' username of the shader author
'body' the body of the function without the return statement (no comments)
'return_statment' the return statement of the function. Everything in front of the semicolon is kept, and whitespace is stripped by the custom evaluator.
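A minimal sketch of the normalization described for 'return_statment' follows: keep the text before the first semicolon and strip whitespace. Two points are assumptions rather than documented behavior of the custom evaluator: only leading and trailing whitespace is removed, and scoring is an exact string match. Both helpers are hypothetical.

```python
def normalize_return(text: str) -> str:
    """Hypothetical sketch: keep text before the first ';' and trim whitespace."""
    # Assumption: only leading/trailing whitespace is removed, not inner spaces.
    return text.split(";", 1)[0].strip()

def exact_match(prediction: str, reference: str) -> bool:
    # Assumption: the score is an exact string match after normalization.
    return normalize_return(prediction) == normalize_return(reference)

reference = " length(p) - r; }\n"
print(exact_match("length(p) - r;\n}", reference))  # True
print(exact_match(" length(p)-r; }", reference))    # False (inner spaces differ)
```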
Currently available (shuffled):
The splits are intended to be indexed identically across both subsets, so fine-tuning on the fine subset should not expose a model to the return_completion test split. However, there are many duplicates across both subsets and splits.
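Because duplicates exist across subsets and splits, it can be worth measuring the overlap before evaluating. This sketch counts test records whose 'code' string also appears in the training records; it assumes duplicates are exact string matches and uses small toy lists in place of real splits (records loaded by other means, such as the Hugging Face `datasets` library, could be passed in the same way). The helper `overlap` is hypothetical.

```python
def overlap(train: list[dict], test: list[dict], key: str = "code") -> int:
    """Count test records whose `key` value also occurs in the train records."""
    seen = {record[key] for record in train}  # set for fast membership checks
    return sum(1 for record in test if record[key] in seen)

train = [{"code": "float f() { return 1.0; }\n"},
         {"code": "vec2 g(vec2 p) { return p; }\n"}]
test = [{"code": "vec2 g(vec2 p) { return p; }\n"},
        {"code": "int h() { return 0; }\n"}]
print(overlap(train, test))  # 1
```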
Data was retrieved starting 2022-07-20.
All data was collected via the Shadertoy.com API; functions were then extracted by searching for keywords and counting curly brackets to determine which code belongs to a function and which does not.
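The extraction approach (keyword search plus curly-bracket counting) can be sketched as follows. The keyword list, the rule that a signature starts at the beginning of a line, and the helpers `extract_function` and `extract_functions` are all assumptions for illustration, not the actual collection code; the sketch also assumes no braces appear inside comments.

```python
import re

# Assumption: a function starts at a line beginning with a common GLSL type keyword,
# followed by an identifier and an opening parenthesis.
SIGNATURE = re.compile(
    r"^(?:void|float|int|bool|[biu]?vec[234]|mat[234])\s+\w+\s*\(", re.MULTILINE
)

def extract_function(source: str, start: int) -> str:
    """Hypothetical sketch: return the function text beginning at index `start`."""
    open_idx = source.index("{", start)  # raises ValueError if there is no body
    depth = 0
    for i in range(open_idx, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:  # brackets balanced: the function body is complete
                return source[start:i + 1]
    raise ValueError("unbalanced curly brackets")

def extract_functions(source: str) -> list[str]:
    # Locate each signature by keyword, then cut out its body by bracket counting.
    return [extract_function(source, m.start()) for m in SIGNATURE.finditer(source)]

shader = (
    "float sq(float x) { return x * x; }\n"
    "void mainImage(out vec4 c, in vec2 uv) "
    "{ if (uv.x > 0.5) { c = vec4(1.0); } else { c = vec4(0.0); } }\n"
)
print(len(extract_functions(shader)))  # 2
```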
Who are the source language producers? Shadertoy.com contributors who publish shaders as 'public+API'.
The default license for each shader is CC BY-NC-SA 3.0. However, some shaders may have a different license attached. The dataset currently does not filter by license.