Model:
sileod/deberta-v3-base-tasksource-nli
Task:
Zero-Shot Classification
Datasets:
glue super_glue anli tasksource/babi_nli sick snli scitail OpenAssistant/oasst1 universal_dependencies hans qbao775/PARARULE-Plus alisawuffles/WANLI metaeval/recast sileod/probability_words_nli joey234/nan-nli pietrolesci/nli_fever pietrolesci/breaking_nli pietrolesci/conj_nli pietrolesci/fracas pietrolesci/dialogue_nli pietrolesci/mpe pietrolesci/dnc pietrolesci/gpt3_nli pietrolesci/recast_white pietrolesci/joci martn-nguyen/contrast_nli pietrolesci/robust_nli pietrolesci/robust_nli_is_sd pietrolesci/robust_nli_li_ts pietrolesci/gen_debiased_nli pietrolesci/add_one_rte metaeval/imppres pietrolesci/glue_diagnostics hlgd PolyAI/banking77 paws quora medical_questions_pairs conll2003 Anthropic/hh-rlhf Anthropic/model-written-evals truthful_qa nightingal3/fig-qa tasksource/bigbench blimp cos_e cosmos_qa dream openbookqa qasc quartz quail head_qa sciq social_i_qa wiki_hop wiqa piqa hellaswag pkavumba/balanced-copa 12ml/e-CARE art tasksource/mmlu winogrande codah ai2_arc definite_pronoun_resolution swag math_qa metaeval/utilitarianism mteb/amazon_counterfactual SetFit/insincere-questions SetFit/toxic_conversations turingbench/TuringBench trec tals/vitaminc hope_edi strombergnlp/rumoureval_2019 ethos tweet_eval discovery pragmeval silicone lex_glue papluca/language-identification imdb rotten_tomatoes ag_news yelp_review_full financial_phrasebank poem_sentiment dbpedia_14 amazon_polarity app_reviews hate_speech18 sms_spam humicroedit snips_built_in_intents banking77 hate_speech_offensive yahoo_answers_topics pacovaldez/stackoverflow-questions zapsdcn/hyperpartisan_news zapsdcn/sciie zapsdcn/citation_intent go_emotions scicite liar relbert/lexical_relation_classification metaeval/linguisticprobing tasksource/crowdflower metaeval/ethics emo google_wellformed_query tweets_hate_speech_detection has_part wnut_17 ncbi_disease acronym_identification jnlpba species_800 SpeedOfMagic/ontonotes_english blog_authorship_corpus launch/open_question_type health_fact commonsense_qa 
mc_taco ade_corpus_v2 prajjwal1/discosense circa YaHi/EffectiveFeedbackStudentWriting Ericwang/promptSentiment Ericwang/promptNLI Ericwang/promptSpoke Ericwang/promptProficiency Ericwang/promptGrammar Ericwang/promptCoherence PiC/phrase_similarity copenlu/scientific-exaggeration-detection quarel mwong/fever-evidence-related numer_sense dynabench/dynasent raquiba/Sarcasm_News_Headline sem_eval_2010_task_8 demo-org/auditor_review medmcqa aqua_rat RuyuanWan/Dynasent_Disagreement RuyuanWan/Politeness_Disagreement RuyuanWan/SBIC_Disagreement RuyuanWan/SChem_Disagreement RuyuanWan/Dilemmas_Disagreement lucasmccabe/logiqa wiki_qa metaeval/cycic_classification metaeval/cycic_multiplechoice metaeval/sts-companion metaeval/commonsense_qa_2.0 metaeval/lingnli metaeval/monotonicity-entailment metaeval/arct metaeval/scinli metaeval/naturallogic onestop_qa demelin/moral_stories corypaik/prost aps/dynahate metaeval/syntactic-augmentation-nli metaeval/autotnli lasha-nlp/CONDAQA openai/webgpt_comparisons Dahoas/synthetic-instruct-gptj-pairwise metaeval/scruples metaeval/wouldyourather sileod/attempto-nli metaeval/defeasible-nli metaeval/help-nli metaeval/nli-veridicality-transitivity metaeval/natural-language-satisfiability metaeval/lonli metaeval/dadc-limit-nli ColumbiaNLP/FLUTE metaeval/strategy-qa openai/summarize_from_feedback metaeval/folio metaeval/tomi-nli metaeval/avicenna stanfordnlp/SHP GBaker/MedQA-USMLE-4-options-hf sileod/wikimedqa declare-lab/cicero amydeng2000/CREAK metaeval/mutual inverse-scaling/NeQA inverse-scaling/quote-repetition inverse-scaling/redefine-math metaeval/puzzte metaeval/implicatures race metaeval/spartqa-yn metaeval/spartqa-mchoice metaeval/temporal-nli metaeval/ScienceQA_text_only AndyChiang/cloth metaeval/logiqa-2.0-nli tasksource/oasst1_dense_flat metaeval/boolq-natural-perturbations metaeval/path-naturalness-prediction riddle_sense Jiangjie/ekar_english metaeval/implicit-hate-stg1 metaeval/chaos-mnli-ambiguity IlyaGusev/headline_cause 
metaeval/race-c metaeval/equate metaeval/ambient AndyChiang/dgen metaeval/clcd-english civil_comments metaeval/acceptability-prediction maximedb/twentyquestions metaeval/counterfactually-augmented-snli tasksource/I2D2 sileod/mindgames metaeval/counterfactually-augmented-imdb metaeval/cnli metaeval/reclor tasksource/oasst1_pairwise_rlhf_reward tasksource/zero-shot-label-nli nlpaueb/finer-139
Languages:
en
Tags:
deberta-v2 Text Classification deberta-v3-base deberta-v3 deberta nli natural-language-inference multitask multi-task pipeline extreme-multi-task extreme-mtl tasksource zero-shot rlhf Eval Results
Arxiv:
arxiv:2301.05948
License:
apache-2.0

This is DeBERTa-v3-base fine-tuned with multi-task learning on 600 tasks of the tasksource collection. This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI) and can be used for zero-shot classification:
```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="sileod/deberta-v3-base-tasksource-nli")

text = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(text, candidate_labels)
```
The NLI training data of this model includes label-nli, an NLI dataset specially constructed to improve this kind of zero-shot classification.
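Under the hood, the zero-shot pipeline turns each candidate label into an NLI hypothesis and ranks labels by their entailment probability. Below is a minimal, self-contained sketch of that reduction, with a word-overlap stub in place of the real model; the hypothesis template and function names are illustrative, not the pipeline's actual internals:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def entailment_logit(premise, hypothesis):
    # Stub: a real implementation would run the NLI model on the
    # (premise, hypothesis) pair and return its entailment logit.
    # Word overlap keeps the sketch runnable without the model.
    return float(len(set(premise.lower().split()) & set(hypothesis.lower().split())))

def zero_shot(text, labels, template="this example is about {}"):
    # One (premise, hypothesis) pair per candidate label.
    hypotheses = [template.format(label) for label in labels]
    probs = softmax([entailment_logit(text, h) for h in hypotheses])
    return sorted(zip(labels, probs), key=lambda pair: -pair[1])

print(zero_shot("I love cooking pasta", ["cooking", "travel", "dancing"]))
```

Note that this scheme needs one forward pass per candidate label, which is what the adapter-based approach below avoids.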
```python
!pip install tasknet tasksource

import tasknet as tn

pipe = tn.load_pipeline('sileod/deberta-v3-base-tasksource-nli', 'glue/sst2')  # works for 500+ tasksource tasks
pipe(['That movie was great !', 'Awful movie.'])
# [{'label': 'positive', 'score': 0.9956}, {'label': 'negative', 'score': 0.9967}]
```
The list of supported tasks is available in the model's config.json. This approach is more efficient than the zero-shot pipeline since it requires only one forward pass per example, but it is less flexible.
This model ranked 1st among all models with the microsoft/deberta-v3-base architecture according to the IBM model recycling evaluation. https://ibm.github.io/model-recycling/
Tasksource: https://github.com/sileod/tasksource/
Tasknet: https://github.com/sileod/tasknet/
Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
This is the shared model with the MNLI classifier on top. Each task had a task-specific CLS embedding, which was dropped 10% of the time to facilitate using the model without it. All multiple-choice tasks used the same classification layers. Classification tasks shared weights if their label sets matched. The number of examples per task was capped at 64k. The model was trained for 120k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 10 days on an RTX 6000 24GB GPU.
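The per-task example cap and the head-sharing rule described above can be sketched as follows. This is a toy illustration under assumed names, not the actual training code (see the tasknet repository for that):

```python
import random

MAX_EXAMPLES_PER_TASK = 64_000

def cap_task(examples, cap=MAX_EXAMPLES_PER_TASK, seed=0):
    """Subsample one task's training set to at most `cap` examples."""
    examples = list(examples)
    if len(examples) <= cap:
        return examples
    return random.Random(seed).sample(examples, cap)

# Classification tasks share a head whenever their label sets match,
# so heads are cached under a canonical (sorted) label-set key.
_heads = {}

def get_head(labels, make_head):
    key = tuple(sorted(labels))
    if key not in _heads:
        _heads[key] = make_head(len(key))  # build a new head of size |labels|
    return _heads[key]
```

Two NLI tasks with labels `{entailment, neutral, contradiction}` would thus resolve to the same classification head, regardless of the order in which their labels are listed.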
More details in this article:
```bibtex
@article{sileo2023tasksource,
  title   = {tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
  author  = {Sileo, Damien},
  url     = {https://arxiv.org/abs/2301.05948},
  journal = {arXiv preprint arXiv:2301.05948},
  year    = {2023}
}
```
Contact: damien.sileo@inria.fr