Model:
EMBO/BioMegatron345mCased
BioMegatron is a transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model was trained on top of the Megatron-LM model, adding a PubMed corpus to the Megatron-LM corpora (Wikipedia, RealNews, OpenWebText, and CC-Stories). BioMegatron follows an architecture similar (albeit not identical) to that of BERT and has 345 million parameters.
More information is available in the NVIDIA NGC Catalog.
In this implementation we followed the commands of the nvidia/megatron-bert-uncased-345m repository to make BioMegatron available in Hugging Face Transformers.
However, the file convert_megatron_bert_checkpoint.py needed a modification. The reason is that the Megatron model shown in nvidia/megatron-bert-uncased-345m includes head layers, while the weights of the BioMegatron model that we upload to this repository do not contain a head.
The code below is a modification of the original convert_megatron_bert_checkpoint.py.
    import json
    import os

    import torch

    # recursive_print is assumed to be exported by the modified script,
    # as it is defined in the original convert_megatron_bert_checkpoint.py.
    from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print

    print_checkpoint_structure = True
    path_to_checkpoint = "/path/to/BioMegatron345mUncased/"

    # Extract the basename.
    basename = os.path.dirname(path_to_checkpoint).split('/')[-1]

    # Load the model.
    input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")

    # Convert. head_model=False because the BioMegatron weights contain no head.
    print("Converting")
    output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)

    # Print the structure of converted state dict.
    if print_checkpoint_structure:
        recursive_print(None, output_state_dict)

    # Store the config to file.
    output_config_file = os.path.join(path_to_checkpoint, "config.json")
    print(f'Saving config to "{output_config_file}"')
    with open(output_config_file, "w") as f:
        json.dump(output_config, f)

    # Store the state_dict to file.
    output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
    print(f'Saving checkpoint to "{output_checkpoint_file}"')
    torch.save(output_state_dict, output_checkpoint_file)
We provide an alternative version of the Python script in the repository so that any user can cross-check the validity of the model replicated here.
BioMegatron can be run with the standard Hugging Face Transformers script for loading models. Here we show an example identical to that of nvidia/megatron-bert-uncased-345m.
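One simple cross-check is to confirm whether a converted state dict contains any head layers. The snippet below is a minimal sketch, not part of the repository's script: it assumes that head parameters use the `cls.` key prefix, as in Hugging Face BERT-style masked-LM models. The helper name `split_head_keys` and the toy dictionary are hypothetical and only stand in for a real `output_state_dict`.

```python
# Assumption: head layers use the "cls." prefix, as in Hugging Face
# BERT-style masked-LM state dicts.
HEAD_PREFIX = "cls."


def split_head_keys(state_dict):
    """Split a state dict into (backbone, head) dictionaries by key prefix."""
    backbone = {k: v for k, v in state_dict.items() if not k.startswith(HEAD_PREFIX)}
    head = {k: v for k, v in state_dict.items() if k.startswith(HEAD_PREFIX)}
    return backbone, head


# Toy stand-in for a converted state dict; the values would normally be tensors.
toy_state_dict = {
    "bert.embeddings.word_embeddings.weight": 0,
    "bert.encoder.layer.0.toy_parameter": 1,  # hypothetical key, for illustration only
    "cls.predictions.bias": 2,
}

backbone, head = split_head_keys(toy_state_dict)
print("backbone keys:", sorted(backbone))
print("head keys:", sorted(head))
print("contains a head:", bool(head))
```

For a checkpoint converted with `head_model=False`, one would expect the head dictionary to be empty.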
    import torch
    from transformers import AutoModelForMaskedLM, BertTokenizer

    checkpoint = "EMBO/BioMegatron345mCased"

    # The tokenizer. Megatron was trained with standard tokenizer(s).
    tokenizer = BertTokenizer.from_pretrained(checkpoint)

    # Load the model from the EMBO/BioMegatron345mCased checkpoint.
    model = AutoModelForMaskedLM.from_pretrained(checkpoint)
    device = torch.device("cpu")

    # Create inputs (from the BERT example page).
    # Named `inputs` to avoid shadowing the built-in `input`.
    inputs = tokenizer("The capital of France is [MASK]", return_tensors="pt").to(device)
    label = tokenizer("The capital of France is Paris", return_tensors="pt")["input_ids"].to(device)

    # Run the model.
    with torch.no_grad():
        output = model(**inputs, labels=label)
    print(output)
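The example above prints the raw model output. To read a prediction, one would typically take the row of logits at the [MASK] position and look up the highest-scoring token ids. The snippet below is a minimal sketch of that step using plain Python lists in place of tensors; the helper `top_k_tokens`, the toy vocabulary, and the toy scores are hypothetical.

```python
import heapq


def top_k_tokens(mask_logits, id_to_token, k=3):
    """Return the k highest-scoring (token, score) pairs for one logits row."""
    top_ids = heapq.nlargest(k, range(len(mask_logits)), key=mask_logits.__getitem__)
    return [(id_to_token[i], mask_logits[i]) for i in top_ids]


# Toy stand-ins: a five-token vocabulary and one row of scores for the mask position.
toy_vocab = ["[PAD]", "paris", "london", "the", "rome"]
toy_logits = [-5.0, 7.5, 3.0, 0.5, 4.0]

predictions = top_k_tokens(toy_logits, toy_vocab, k=3)
for token, score in predictions:
    print(f"{token}\t{score:.1f}")
```

With the real model, the logits row could be obtained from `output.logits` at the index where the input ids equal `tokenizer.mask_token_id`, and ids could be mapped back with `tokenizer.convert_ids_to_tokens`. Since the uploaded weights contain no head, such predictions are not expected to be meaningful before training.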
This implementation has not been fine-tuned on any task. It contains only the weights of the official NVIDIA checkpoint and needs to be trained before it can perform any downstream task.
The original code for Megatron can be found here: https://github.com/NVIDIA/Megatron-LM.