Dataset:

PolyAI/banking77

Dataset Card for BANKING77

Dataset Summary

Dataset composed of online banking queries annotated with their corresponding intents.

The BANKING77 dataset provides a very fine-grained set of intents in the banking domain. It comprises 13,083 customer service queries labeled with 77 intents, and it focuses on fine-grained single-domain intent detection.

Supported Tasks and Leaderboards

Intent classification, intent detection

Languages

English

Dataset Structure

Data Instances

An example from the 'train' split looks as follows:

{
  'label': 11, # integer label corresponding to "card_arrival" intent
  'text': 'I am still waiting on my card?'
}

Data Fields

  • text: a string feature.
  • label: one of the classification labels (0-76), each corresponding to a unique intent.

Intent names are mapped to labels as follows:

label intent (category)
0 activate_my_card
1 age_limit
2 apple_pay_or_google_pay
3 atm_support
4 automatic_top_up
5 balance_not_updated_after_bank_transfer
6 balance_not_updated_after_cheque_or_cash_deposit
7 beneficiary_not_allowed
8 cancel_transfer
9 card_about_to_expire
10 card_acceptance
11 card_arrival
12 card_delivery_estimate
13 card_linking
14 card_not_working
15 card_payment_fee_charged
16 card_payment_not_recognised
17 card_payment_wrong_exchange_rate
18 card_swallowed
19 cash_withdrawal_charge
20 cash_withdrawal_not_recognised
21 change_pin
22 compromised_card
23 contactless_not_working
24 country_support
25 declined_card_payment
26 declined_cash_withdrawal
27 declined_transfer
28 direct_debit_payment_not_recognised
29 disposable_card_limits
30 edit_personal_details
31 exchange_charge
32 exchange_rate
33 exchange_via_app
34 extra_charge_on_statement
35 failed_transfer
36 fiat_currency_support
37 get_disposable_virtual_card
38 get_physical_card
39 getting_spare_card
40 getting_virtual_card
41 lost_or_stolen_card
42 lost_or_stolen_phone
43 order_physical_card
44 passcode_forgotten
45 pending_card_payment
46 pending_cash_withdrawal
47 pending_top_up
48 pending_transfer
49 pin_blocked
50 receiving_money
51 Refund_not_showing_up
52 request_refund
53 reverted_card_payment?
54 supported_cards_and_currencies
55 terminate_account
56 top_up_by_bank_transfer_charge
57 top_up_by_card_charge
58 top_up_by_cash_or_cheque
59 top_up_failed
60 top_up_limits
61 top_up_reverted
62 topping_up_by_card
63 transaction_charged_twice
64 transfer_fee_charged
65 transfer_into_account
66 transfer_not_received_by_recipient
67 transfer_timing
68 unable_to_verify_identity
69 verify_my_identity
70 verify_source_of_funds
71 verify_top_up
72 virtual_card_not_working
73 visa_or_mastercard
74 why_verify_identity
75 wrong_amount_of_cash_received
76 wrong_exchange_rate_for_cash_withdrawal
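
As an illustration of how the mapping above can be used, here is a minimal sketch that turns an integer label back into its intent name. The dictionary copies only a few rows of the table for brevity, and the names ID_TO_INTENT, INTENT_TO_ID and intent_name are hypothetical helpers, not part of the dataset or of any library.

```python
# Partial copy of the label table above (assumption: only 4 of the 77 rows
# are reproduced here to keep the sketch short).
ID_TO_INTENT = {
    0: "activate_my_card",
    11: "card_arrival",
    41: "lost_or_stolen_card",
    76: "wrong_exchange_rate_for_cash_withdrawal",
}

# Reverse lookup: intent name -> integer label.
INTENT_TO_ID = {name: idx for idx, name in ID_TO_INTENT.items()}

NUM_INTENTS = 77  # labels range from 0 to 76


def intent_name(label: int) -> str:
    """Return the intent name for a label, or a placeholder if the row
    was not copied into this partial table."""
    if not 0 <= label < NUM_INTENTS:
        raise ValueError(f"label must be in 0..{NUM_INTENTS - 1}, got {label}")
    return ID_TO_INTENT.get(label, f"<label {label} not in partial table>")


# The instance shown under "Data Instances".
example = {"label": 11, "text": "I am still waiting on my card?"}
print(intent_name(example["label"]))  # card_arrival
```

In practice the full 77-row table would replace the partial dictionary, so that every valid label resolves to a name.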

Data Splits

Dataset statistics          Train    Test
Number of examples          10,003   3,080
Average character length    59.5     54.2
Number of intents           77       77
Number of domains           1        1
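
The split sizes can be cross-checked against the 13,083 total quoted in the summary. The sketch below does that arithmetic and derives a few convenience figures. The per-intent numbers are plain averages (the card does not state that classes are balanced), and the pooled character length is only approximate because the per-split averages are rounded. All variable names are illustrative.

```python
# Figures taken from the Data Splits table.
TRAIN_EXAMPLES = 10_003
TEST_EXAMPLES = 3_080
NUM_INTENTS = 77
TRAIN_AVG_CHARS = 59.5
TEST_AVG_CHARS = 54.2

total = TRAIN_EXAMPLES + TEST_EXAMPLES          # 13083, matches the summary
train_share = TRAIN_EXAMPLES / total            # fraction of data in train

# Mean number of examples per intent (not a claim about class balance).
train_per_intent = TRAIN_EXAMPLES / NUM_INTENTS
test_per_intent = TEST_EXAMPLES / NUM_INTENTS

# Size-weighted average query length over both splits (approximate).
pooled_avg_chars = (
    TRAIN_EXAMPLES * TRAIN_AVG_CHARS + TEST_EXAMPLES * TEST_AVG_CHARS
) / total

print(f"total examples:       {total}")
print(f"train share:          {train_share:.1%}")
print(f"train per intent:     {train_per_intent:.1f}")
print(f"test per intent:      {test_per_intent:.1f}")
print(f"pooled avg. length:   {pooled_avg_chars:.1f} characters")
```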

Dataset Creation

Curation Rationale

Previous intent detection datasets such as Web Apps, Ask Ubuntu, the Chatbot Corpus or SNIPS are limited to a small number of classes (<10), which oversimplifies the intent detection task and does not emulate the true environment of commercial systems. Although large-scale multi-domain datasets exist (HWU64 and CLINC150), the examples per domain may not sufficiently capture the full complexity of each domain as encountered "in the wild". This dataset tries to fill the gap by providing a very fine-grained set of intents in a single domain, i.e. banking. Its focus on fine-grained single-domain intent detection makes it complementary to the other two multi-domain datasets.

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

The dataset does not contain any additional annotations.

Who are the annotators?

[N/A]

Personal and Sensitive Information

[N/A]

Considerations for Using the Data

Social Impact of Dataset

The purpose of this dataset is to help develop better intent detection systems.

Any comprehensive intent detection evaluation should involve both coarser-grained multi-domain datasets and a fine-grained single-domain dataset such as BANKING77.

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

PolyAI

Licensing Information

Creative Commons Attribution 4.0 International

Citation Information

@inproceedings{Casanueva2020,
    author      = {I{\~{n}}igo Casanueva and Tadas Temcinas and Daniela Gerz and Matthew Henderson and Ivan Vulic},
    title       = {Efficient Intent Detection with Dual Sentence Encoders},
    year        = {2020},
    month       = {mar},
    note        = {Data available at https://github.com/PolyAI-LDN/task-specific-datasets},
    url         = {https://arxiv.org/abs/2003.04807},
    booktitle   = {Proceedings of the 2nd Workshop on NLP for ConvAI - ACL 2020}
}

Contributions

Thanks to @dkajtoch for adding this dataset.