Model:
akdeniz27/bert-base-turkish-cased-ner
This model is a fine-tuned version of "dbmdz/bert-base-turkish-cased", trained on a reviewed version of a well-known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt).
task = "ner"
model_checkpoint = "dbmdz/bert-base-turkish-cased"
batch_size = 8
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512
learning_rate = 2e-5
num_train_epochs = 3
weight_decay = 0.01
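The label list above follows the BIO scheme (B- opens an entity, I- continues it, O marks non-entity tokens). As a minimal sketch of how such a list is typically turned into id/label mappings for a token classification head, assuming class ids simply follow list order (the mapping stored in the published model config may differ, and `split_tag` is a hypothetical helper, not part of any library):

```python
# Label list as given in the fine-tuning parameters.
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Assumption: class ids follow list order (0 -> 'O', 1 -> 'B-PER', ...).
id2label = dict(enumerate(label_list))
label2id = {label: idx for idx, label in id2label.items()}


def split_tag(label):
    """Split a BIO label into (prefix, entity type); 'O' has no entity type."""
    if label == 'O':
        return 'O', None
    prefix, entity_type = label.split('-', 1)
    return prefix, entity_type


# Distinct entity types covered by the label list.
entity_types = sorted({split_tag(label)[1] for label in label_list if label != 'O'})
```

With seven labels, the classification head has seven output classes covering three entity types: person (PER), organization (ORG), and location (LOC).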
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="first")
ner("your text here")
Please refer to "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for details on entity grouping with the aggregation_strategy parameter.
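To give a rough idea of what word-level grouping with the "first" strategy does, here is a simplified pure-Python sketch. It is not the library's implementation: the function name `aggregate_first`, the input keys (`word`, `label`, `score`, `is_subword`), and the example tokenization and scores are all hypothetical. The idea it illustrates is that each word takes the label of its first subword token, and adjacent words of the same entity type are then merged into one group.

```python
def aggregate_first(tokens):
    """Simplified sketch of word-level 'first' aggregation (illustrative only).

    tokens: list of dicts with hypothetical keys
      'word' (str), 'label' (BIO tag), 'score' (float), 'is_subword' (bool).
    """
    # Step 1: merge subword pieces into whole words. Each word keeps the
    # label and score of its FIRST token; labels of later pieces are ignored.
    words = []
    for tok in tokens:
        if tok['is_subword'] and words:
            piece = tok['word'][2:] if tok['word'].startswith('##') else tok['word']
            words[-1]['word'] += piece
        else:
            words.append({'word': tok['word'], 'label': tok['label'], 'score': tok['score']})

    # Step 2: group adjacent words that share an entity type. A 'B-' prefix
    # or a change of type starts a new group; 'O' words close the open group.
    groups, current = [], None
    for w in words:
        if w['label'] == 'O':
            current = None
            continue
        prefix, tag = w['label'].split('-', 1)
        if current is not None and current['entity_group'] == tag and prefix != 'B':
            current['word'] += ' ' + w['word']
            current['scores'].append(w['score'])
        else:
            current = {'entity_group': tag, 'word': w['word'], 'scores': [w['score']]}
            groups.append(current)

    # The group score is the mean of its word scores.
    return [
        {'entity_group': g['entity_group'], 'word': g['word'],
         'score': sum(g['scores']) / len(g['scores'])}
        for g in groups
    ]


# Hypothetical token-level predictions; the '##türk' piece is tagged 'O'
# on purpose to show that only the first piece of a word decides its label.
example_tokens = [
    {'word': 'Mustafa', 'label': 'B-PER', 'score': 0.9, 'is_subword': False},
    {'word': 'Kemal',   'label': 'I-PER', 'score': 0.8, 'is_subword': False},
    {'word': 'Ata',     'label': 'I-PER', 'score': 0.7, 'is_subword': False},
    {'word': '##türk',  'label': 'O',     'score': 0.6, 'is_subword': True},
    {'word': 'Ankara',  'label': 'B-LOC', 'score': 0.95, 'is_subword': False},
    {'word': 'gitti',   'label': 'O',     'score': 0.99, 'is_subword': False},
]

result = aggregate_first(example_tokens)
```

In this toy input, the three name tokens end up in a single PER group and "Ankara" forms a separate LOC group, which is the kind of grouped output the pipeline returns instead of one entry per subword token.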
Evaluation results on the test sets proposed in the paper "Küçük, D., Küçük, D., Arıcı, N. 2016. Türkçe Varlık İsmi Tanıma için bir Veri Kümesi ("A Named Entity Recognition Dataset for Turkish"). IEEE Sinyal İşleme, İletişim ve Uygulamaları Kurultayı. Zonguldak, Türkiye."