Model:

sileod/deberta-v3-large-tasksource-nli

Datasets:

glue super_glue anli metaeval/babi_nli sick snli scitail hans alisawuffles/WANLI metaeval/recast sileod/probability_words_nli joey234/nan-nli pietrolesci/nli_fever pietrolesci/breaking_nli pietrolesci/conj_nli pietrolesci/fracas pietrolesci/dialogue_nli pietrolesci/mpe pietrolesci/dnc pietrolesci/gpt3_nli pietrolesci/recast_white pietrolesci/joci martn-nguyen/contrast_nli pietrolesci/robust_nli pietrolesci/robust_nli_is_sd pietrolesci/robust_nli_li_ts pietrolesci/gen_debiased_nli pietrolesci/add_one_rte metaeval/imppres pietrolesci/glue_diagnostics hlgd paws quora medical_questions_pairs conll2003 Anthropic/hh-rlhf Anthropic/model-written-evals truthful_qa nightingal3/fig-qa tasksource/bigbench bigbench blimp cos_e cosmos_qa dream openbookqa qasc quartz quail head_qa sciq social_i_qa wiki_hop wiqa piqa hellaswag pkavumba/balanced-copa 12ml/e-CARE art tasksource/mmlu winogrande codah ai2_arc definite_pronoun_resolution swag math_qa metaeval/utilitarianism mteb/amazon_counterfactual SetFit/insincere-questions SetFit/toxic_conversations turingbench/TuringBench trec tals/vitaminc hope_edi strombergnlp/rumoureval_2019 ethos tweet_eval discovery pragmeval silicone lex_glue papluca/language-identification imdb rotten_tomatoes ag_news yelp_review_full financial_phrasebank poem_sentiment dbpedia_14 amazon_polarity app_reviews hate_speech18 sms_spam humicroedit snips_built_in_intents banking77 hate_speech_offensive yahoo_answers_topics pacovaldez/stackoverflow-questions zapsdcn/hyperpartisan_news zapsdcn/sciie zapsdcn/citation_intent go_emotions scicite liar relbert/lexical_relation_classification metaeval/linguisticprobing metaeval/crowdflower metaeval/ethics emo google_wellformed_query tweets_hate_speech_detection has_part wnut_17 ncbi_disease acronym_identification jnlpba species_800 SpeedOfMagic/ontonotes_english blog_authorship_corpus launch/open_question_type health_fact commonsense_qa mc_taco ade_corpus_v2 prajjwal1/discosense circa 
YaHi/EffectiveFeedbackStudentWriting Ericwang/promptSentiment Ericwang/promptNLI Ericwang/promptSpoke Ericwang/promptProficiency Ericwang/promptGrammar Ericwang/promptCoherence PiC/phrase_similarity copenlu/scientific-exaggeration-detection quarel mwong/fever-evidence-related numer_sense dynabench/dynasent raquiba/Sarcasm_News_Headline sem_eval_2010_task_8 demo-org/auditor_review medmcqa aqua_rat RuyuanWan/Dynasent_Disagreement RuyuanWan/Politeness_Disagreement RuyuanWan/SBIC_Disagreement RuyuanWan/SChem_Disagreement RuyuanWan/Dilemmas_Disagreement lucasmccabe/logiqa wiki_qa metaeval/cycic_classification metaeval/cycic_multiplechoice metaeval/sts-companion metaeval/commonsense_qa_2.0 metaeval/lingnli metaeval/monotonicity-entailment metaeval/arct metaeval/scinli metaeval/naturallogic onestop_qa demelin/moral_stories corypaik/prost aps/dynahate metaeval/syntactic-augmentation-nli metaeval/autotnli lasha-nlp/CONDAQA openai/webgpt_comparisons Dahoas/synthetic-instruct-gptj-pairwise metaeval/scruples metaeval/wouldyourather sileod/attempto-nli metaeval/defeasible-nli metaeval/help-nli metaeval/nli-veridicality-transitivity metaeval/natural-language-satisfiability metaeval/lonli metaeval/dadc-limit-nli ColumbiaNLP/FLUTE metaeval/strategy-qa openai/summarize_from_feedback metaeval/folio metaeval/tomi-nli metaeval/avicenna stanfordnlp/SHP GBaker/MedQA-USMLE-4-options-hf sileod/wikimedqa declare-lab/cicero amydeng2000/CREAK metaeval/mutual inverse-scaling/NeQA inverse-scaling/quote-repetition inverse-scaling/redefine-math metaeval/puzzte metaeval/implicatures race metaeval/spartqa-yn metaeval/spartqa-mchoice metaeval/temporal-nli

Language:

en

Preprint:

arxiv:2301.05948

License:

apache-2.0

Model Card for DeBERTa-v3-large-tasksource-nli

DeBERTa-v3-large fine-tuned with multi-task learning on 520 tasks from the tasksource collection. You can further fine-tune this model for any classification or multiple-choice task. This checkpoint has strong zero-shot validation performance on many tasks (e.g. 77% on WNLI). Thanks to the multi-task training, the CLS embedding of the untuned model also has strong linear-probing performance (90% on MNLI).
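A minimal sketch of zero-shot NLI inference with the MNLI classifier on top, using the standard transformers text-classification pipeline (the premise/hypothesis pair is an illustrative example, not from the training data):

```python
# Zero-shot NLI with the MNLI head via the standard transformers pipeline.
from transformers import pipeline

nli = pipeline("text-classification",
               model="sileod/deberta-v3-large-tasksource-nli")

# Pass premise/hypothesis as a text/text_pair dict.
result = nli(dict(text="A man is walking his dog in the park.",
                  text_pair="An animal is outside."))
print(result)  # e.g. [{'label': 'entailment', 'score': ...}]
```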

This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets (including bigbench, Anthropic RLHF, ANLI, and many other NLI and classification tasks) with SequenceClassification heads on a single shared encoder. Each task had a task-specific CLS embedding, which was dropped 10% of the time during training to facilitate using the model without it. All multiple-choice tasks used the same classification layers; classification tasks shared weights whenever their label sets matched. The number of examples per task was capped at 64k. The model was trained for 30k steps with a batch size of 384 and a peak learning rate of 2e-5.
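The linear-probing use of the untuned CLS embedding can be sketched as follows. This is generic transformers usage, not code from the tasksource training pipeline; loading with AutoModel keeps only the shared encoder and drops the classification head:

```python
# Extract the CLS embedding as a sentence representation for linear probing.
import torch
from transformers import AutoModel, AutoTokenizer

name = "sileod/deberta-v3-large-tasksource-nli"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)  # encoder only, no classification head

inputs = tokenizer("A man inspects the uniform of a figure.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# First token of the last hidden state; DeBERTa-v3-large has hidden size 1024.
cls_embedding = outputs.last_hidden_state[:, 0]
print(cls_embedding.shape)  # torch.Size([1, 1024])
```

A linear classifier (e.g. logistic regression) trained on these embeddings is what the 90% MNLI linear-probing number above refers to.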

tasksource training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing

Software

https://github.com/sileod/tasksource/ https://github.com/sileod/tasknet/ Training took 6 days on Nvidia A100 40GB GPU.

Citation

More details in this article:

@article{sileo2023tasksource,
  title={tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
  author={Sileo, Damien},
  url={https://arxiv.org/abs/2301.05948},
  journal={arXiv preprint arXiv:2301.05948},
  year={2023}
}

Loading a specific classifier

Classifiers for all tasks are available; see https://huggingface.co/sileod/deberta-v3-large-tasksource-adapters

Model Card Contact

damien.sileo@inria.fr