Model:
naclbit/gpt-j-japanese-6.8b
This pre-trained model is a work in progress! The model weights will be made available for download in the future.
A 6.8 billion parameter pre-trained model for the Japanese language, based on EleutherAI's Mesh Transformer JAX codebase, with a model structure similar to their GPT-J-6B pre-trained model.
| Hyperparameter | Value | 
|---|---|
| n_parameters | 6,876,450,080 | 
| n_layers | 32 | 
| d_model | 4,096 | 
| d_ff | 16,384 | 
| n_heads | 16 | 
| d_head | 256 | 
| n_ctx | 2,048 | 
| n_vocab | 52,512 | 
| position encoding | Rotary position encodings (RoPE) | 
| RoPE dimensions | 64 | 
We recommend using finetuneanon's forked transformers codebase for inference, as the split checkpoint loads much faster than the monolithic checkpoint supported by the Hugging Face Transformers repository.
The tokenizer still uses 50256 as the <|endoftext|> substitute. Therefore, token ID 50256 should be excluded during inference.
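As an illustration only, here is a minimal generation sketch using the standard Hugging Face Transformers API that keeps token ID 50256 out of the sampled output via `bad_words_ids`. The checkpoint path is a placeholder, and the forked codebase recommended above may expose a slightly different loading interface.

```python
# Minimal sketch, not the official inference code for this model.
# "./gpt-j-japanese-6.8b" is a placeholder path to a locally downloaded checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./gpt-j-japanese-6.8b"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

prompt = "日本で一番高い山は"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Keep token ID 50256 (the <|endoftext|> substitute) out of the generated text.
output_ids = model.generate(
    input_ids,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    bad_words_ids=[[50256]],
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```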
The lack of a quality Japanese corpus was one of the major challenges when we trained the model. We aimed to compile well-formatted corpora beyond Common Crawl.
The dataset is normalized and sanitized: leading and trailing spaces are stripped and excessive CR/LF repetitions are removed.
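For illustration only (this is not the actual preprocessing pipeline used for the dataset), the kind of normalization described above might look like the following:

```python
# Hypothetical sanitization sketch: strip leading/trailing spaces on each line
# and collapse excessive CR/LF repetitions.
import re

def sanitize(text: str) -> str:
    # Normalize CRLF/CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Strip leading and trailing spaces on each line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Collapse three or more consecutive newlines down to two.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

print(sanitize("  こんにちは  \r\n\r\n\r\n\r\n世界  "))
```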
The whole dataset is about 400GB (as of October 2021) and contains roughly 106B tokens (compared to 825GB/300B tokens for The Pile).
The dataset sources include:
* Common Crawl
* Books
* News
* Wikipedia
* Other Corpora