Model:
naclbit/gpt-j-japanese-6.8b
This pre-trained model is a work in progress! Model weights will be available for download in the future.
A pre-trained Japanese language model with about 6.87 billion parameters, built on EleutherAI's Mesh Transformer JAX codebase, with a model structure similar to their GPT-J-6B pre-trained model.
Hyperparameter | Value |
---|---|
n_parameters | 6,876,450,080 |
n_layers | 32 |
d_model | 4,096 |
d_ff | 16,384 |
n_heads | 16 |
d_head | 256 |
n_ctx | 2,048 |
n_vocab | 52,512 |
position encoding | Rotary position encodings (RoPE) |
RoPE dimensions | 64 |
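
As a rough sanity check on the table above, the sketch below recomputes the parameter count from the listed hyperparameters. It assumes a GPT-J-like block (four `d_model × d_model` attention projections and two feed-forward matrices per layer) and untied input and output embeddings; neither assumption is confirmed by this model card. Biases and layer-norm parameters are ignored, so the estimate is expected to land slightly below the reported `n_parameters`.

```python
# Hyperparameters copied from the table above.
HPARAMS = {
    "n_parameters": 6_876_450_080,
    "n_layers": 32,
    "d_model": 4096,
    "d_ff": 16384,
    "n_heads": 16,
    "d_head": 256,
    "n_ctx": 2048,
    "n_vocab": 52512,
}


def estimate_weight_matrix_params(hp):
    """Count only the large weight matrices (assumed GPT-J-like layout)."""
    d, ff, vocab = hp["d_model"], hp["d_ff"], hp["n_vocab"]
    attention = 4 * d * d        # query, key, value and output projections
    feed_forward = 2 * d * ff    # up- and down-projection
    embeddings = 2 * vocab * d   # assumption: untied input/output embeddings
    return hp["n_layers"] * (attention + feed_forward) + embeddings


estimate = estimate_weight_matrix_params(HPARAMS)
relative_gap = abs(estimate - HPARAMS["n_parameters"]) / HPARAMS["n_parameters"]

if __name__ == "__main__":
    # The head dimensions should tile the model dimension exactly.
    print("n_heads * d_head == d_model:",
          HPARAMS["n_heads"] * HPARAMS["d_head"] == HPARAMS["d_model"])
    print(f"estimated weight-matrix parameters: {estimate:,}")
    print(f"relative gap to reported n_parameters: {relative_gap:.4%}")
```

The small remaining gap would come from the terms this estimate leaves out, so it should be read as a plausibility check rather than an exact accounting.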
We recommend using finetuneanon's fork of the Transformers codebase for inference, because split checkpoints load much faster than the monolithic checkpoints supported by the Hugging Face Transformers repository.
The tokenizer still uses token ID 50256 as the `<|endoftext|>` substitute, so 50256 should be excluded at inference time.
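One way to exclude token 50256 is to mask its logit before sampling. The following is a minimal, library-independent sketch of that idea; `mask_excluded` and `sample_token` are hypothetical helper names, not part of this model's codebase, and the logits are assumed to be a plain sequence of floats indexed by token ID.

```python
import math
import random

# Token ID named in the note above as the <|endoftext|> substitute.
EOT_SUBSTITUTE_ID = 50256


def mask_excluded(logits, excluded_ids=(EOT_SUBSTITUTE_ID,)):
    """Return a copy of `logits` with the excluded token IDs set to -inf."""
    masked = list(logits)
    for token_id in excluded_ids:
        if 0 <= token_id < len(masked):
            masked[token_id] = -math.inf
    return masked


def sample_token(logits, rng=random):
    """Sample one token ID from softmax(logits), never an excluded ID."""
    masked = mask_excluded(logits)
    peak = max(masked)
    # exp(-inf) is 0.0, so excluded tokens receive zero sampling weight.
    weights = [math.exp(value - peak) for value in masked]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

With Hugging Face Transformers, passing `bad_words_ids=[[50256]]` to `generate` should have a similar effect, although that has not been verified against this model or against finetuneanon's fork.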
The lack of a high-quality Japanese corpus was one of the major challenges in training the model. We aimed to compile well-formatted corpora outside of Common Crawl.
The dataset is normalized and sanitized to remove leading and trailing spaces and excessive CR/LF repetitions.
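The normalization described above can be sketched as follows. This is only an illustration, not the pipeline actually used: the function name `normalize_text`, the per-line trimming, and the limit of two consecutive line breaks are all assumptions.

```python
import re


def normalize_text(text, max_newlines=2):
    """Trim whitespace and collapse excessive line breaks (illustrative only)."""
    # Unify CR/LF and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Strip leading and trailing whitespace on every line. str.strip() also
    # removes the full-width space (U+3000) that is common in Japanese text.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Collapse runs of more than `max_newlines` line breaks (assumed threshold).
    pattern = r"\n{%d,}" % (max_newlines + 1)
    text = re.sub(pattern, "\n" * max_newlines, text)
    # Finally, trim the document as a whole.
    return text.strip()
```

For example, `normalize_text("  こんにちは \r\n\r\n\r\n\r\n 世界\u3000\n")` returns `"こんにちは\n\n世界"`.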
The whole dataset is about 400GB (as of October 2021) and 106B tokens (compared to 825GB/300B tokens for The Pile).
* Common Crawl
* Books
* News
* Wikipedia
* Other corpora