Dataset: swda
Tasks: text classification
Languages: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
Annotations creators: found
Source datasets: extended|other-Switchboard-1 Telephone Speech Corpus, Release 2
License: cc-by-nc-sa-3.0

The Switchboard Dialog Act Corpus (SwDA) extends the Switchboard-1 Telephone Speech Corpus, Release 2 with turn/utterance-level dialog-act tags. The tags summarize syntactic, semantic, and pragmatic information about the associated turn. The SwDA project was undertaken at the University of Colorado Boulder in the late 1990s. The SwDA is not inherently linked to the Penn Treebank 3 parses of Switchboard, and it is far from straightforward to align the two resources. In addition, the SwDA is not distributed with Switchboard's tables of metadata about the conversations and their participants.
The language supported is English.
Utterances are tagged with SWBD-DAMSL dialog act (DA) labels.
An example from the dataset is:
{'act_tag': 115, 'caller': 'A', 'conversation_no': 4325, 'damsl_act_tag': 26, 'from_caller': 1632, 'from_caller_birth_year': 1962, 'from_caller_dialect_area': 'WESTERN', 'from_caller_education': 2, 'from_caller_sex': 'FEMALE', 'length': 5, 'pos': 'Okay/UH ./.', 'prompt': 'FIND OUT WHAT CRITERIA THE OTHER CALLER WOULD USE IN SELECTING CHILD CARE SERVICES FOR A PRESCHOOLER. IS IT EASY OR DIFFICULT TO FIND SUCH CARE?', 'ptb_basename': '4/sw4325', 'ptb_treenumbers': '1', 'subutterance_index': 1, 'swda_filename': 'sw00utt/sw_0001_4325.utt', 'talk_day': '03/23/1992', 'text': 'Okay. /', 'to_caller': 1519, 'to_caller_birth_year': 1971, 'to_caller_dialect_area': 'SOUTH MIDLAND', 'to_caller_education': 1, 'to_caller_sex': 'FEMALE', 'topic_description': 'CHILD CARE', 'transcript_index': 0, 'trees': '(INTJ (UH Okay) (. .) (-DFL- E_S))', 'utterance_index': 1}
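The `pos` field in the example above stores the utterance as space-separated `token/TAG` items. As a minimal sketch, assuming every instance follows that layout, the helper below splits the field into (token, tag) pairs. The function name `split_pos` and the trimmed `record` dict are illustrative and not part of the dataset or any library.

```python
# A trimmed copy of the example instance above; only a few fields are kept.
record = {
    "caller": "A",
    "conversation_no": 4325,
    "pos": "Okay/UH ./.",
    "text": "Okay. /",
}


def split_pos(pos_field):
    """Split a 'token/TAG token/TAG' string into (token, tag) pairs.

    rpartition splits on the last slash, so a token that itself
    contains '/' keeps everything before the final slash as the token.
    """
    pairs = []
    for item in pos_field.split():
        token, _, tag = item.rpartition("/")
        pairs.append((token, tag))
    return pairs


pos_pairs = split_pos(record["pos"])
print(pos_pairs)
```

Splitting on the last slash rather than the first is a design choice for robustness; whether any token in the corpus actually contains a slash is not something this card states.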
index | name | act_tag | example | train_count | full_count |
---|---|---|---|---|---|
1 | Statement-non-opinion | sd | Me, I'm in the legal department. | 72824 | 75145 |
2 | Acknowledge (Backchannel) | b | Uh-huh. | 37096 | 38298 |
3 | Statement-opinion | sv | I think it's great | 25197 | 26428 |
4 | Agree/Accept | aa | That's exactly it. | 10820 | 11133 |
5 | Abandoned or Turn-Exit | % | So, - | 10569 | 15550 |
6 | Appreciation | ba | I can imagine. | 4633 | 4765 |
7 | Yes-No-Question | qy | Do you have to have any special training? | 4624 | 4727 |
8 | Non-verbal | x | [Laughter], [Throat_clearing] | 3548 | 3630 |
9 | Yes answers | ny | Yes. | 2934 | 3034 |
10 | Conventional-closing | fc | Well, it's been nice talking to you. | 2486 | 2582 |
11 | Uninterpretable | % | But, uh, yeah | 2158 | 15550 |
12 | Wh-Question | qw | Well, how old are you? | 1911 | 1979 |
13 | No answers | nn | No. | 1340 | 1377 |
14 | Response Acknowledgement | bk | Oh, okay. | 1277 | 1306 |
15 | Hedge | h | I don't know if I'm making any sense or not. | 1182 | 1226 |
16 | Declarative Yes-No-Question | qy^d | So you can afford to get a house? | 1174 | 1219 |
17 | Other | fo_o_fw_by_bc | Well give me a break, you know. | 1074 | 883 |
18 | Backchannel in question form | bh | Is that right? | 1019 | 1053 |
19 | Quotation | ^q | You can't be pregnant and have cats | 934 | 983 |
20 | Summarize/reformulate | bf | Oh, you mean you switched schools for the kids. | 919 | 952 |
21 | Affirmative non-yes answers | na | It is. | 836 | 847 |
22 | Action-directive | ad | Why don't you go first | 719 | 746 |
23 | Collaborative Completion | ^2 | Who aren't contributing. | 699 | 723 |
24 | Repeat-phrase | b^m | Oh, fajitas | 660 | 688 |
25 | Open-Question | qo | How about you? | 632 | 656 |
26 | Rhetorical-Questions | qh | Who would steal a newspaper? | 557 | 575 |
27 | Hold before answer/agreement | ^h | I'm drawing a blank. | 540 | 556 |
28 | Reject | ar | Well, no | 338 | 346 |
29 | Negative non-no answers | ng | Uh, not a whole lot. | 292 | 302 |
30 | Signal-non-understanding | br | Excuse me? | 288 | 298 |
31 | Other answers | no | I don't know | 279 | 286 |
32 | Conventional-opening | fp | How are you? | 220 | 225 |
33 | Or-Clause | qrr | or is it more of a company? | 207 | 209 |
34 | Dispreferred answers | arp_nd | Well, not so much that. | 205 | 207 |
35 | 3rd-party-talk | t3 | My goodness, Diane, get down from there. | 115 | 117 |
36 | Offers, Options, Commits | oo_co_cc | I'll have to check that out | 109 | 110 |
37 | Self-talk | t1 | What's the word I'm looking for | 102 | 103 |
38 | Downplayer | bd | That's all right. | 100 | 103 |
39 | Maybe/Accept-part | aap_am | Something like that | 98 | 105 |
40 | Tag-Question | ^g | Right? | 93 | 92 |
41 | Declarative Wh-Question | qw^d | You are what kind of buff? | 80 | 80 |
42 | Apology | fa | I'm sorry. | 76 | 79 |
43 | Thanking | ft | Hey thanks a lot | 67 | 78 |
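The `train_count` column gives a quick view of how imbalanced the tag set is. As a rough sketch, the snippet below copies the name and `train_count` values from the table and computes the share held by the most frequent class, which is what a classifier would score by always predicting that class on data with the same distribution. The variable names are illustrative, and the total is simply the sum of that column.

```python
# (name, train_count) pairs copied from the table above.
train_counts = [
    ("Statement-non-opinion", 72824),
    ("Acknowledge (Backchannel)", 37096),
    ("Statement-opinion", 25197),
    ("Agree/Accept", 10820),
    ("Abandoned or Turn-Exit", 10569),
    ("Appreciation", 4633),
    ("Yes-No-Question", 4624),
    ("Non-verbal", 3548),
    ("Yes answers", 2934),
    ("Conventional-closing", 2486),
    ("Uninterpretable", 2158),
    ("Wh-Question", 1911),
    ("No answers", 1340),
    ("Response Acknowledgement", 1277),
    ("Hedge", 1182),
    ("Declarative Yes-No-Question", 1174),
    ("Other", 1074),
    ("Backchannel in question form", 1019),
    ("Quotation", 934),
    ("Summarize/reformulate", 919),
    ("Affirmative non-yes answers", 836),
    ("Action-directive", 719),
    ("Collaborative Completion", 699),
    ("Repeat-phrase", 660),
    ("Open-Question", 632),
    ("Rhetorical-Questions", 557),
    ("Hold before answer/agreement", 540),
    ("Reject", 338),
    ("Negative non-no answers", 292),
    ("Signal-non-understanding", 288),
    ("Other answers", 279),
    ("Conventional-opening", 220),
    ("Or-Clause", 207),
    ("Dispreferred answers", 205),
    ("3rd-party-talk", 115),
    ("Offers, Options, Commits", 109),
    ("Self-talk", 102),
    ("Downplayer", 100),
    ("Maybe/Accept-part", 98),
    ("Tag-Question", 93),
    ("Declarative Wh-Question", 80),
    ("Apology", 76),
    ("Thanking", 67),
]

# Sum of the train_count column and the largest single class.
total = sum(count for _, count in train_counts)
majority_name, majority_count = max(train_counts, key=lambda row: row[1])
majority_share = majority_count / total

print(f"{len(train_counts)} classes, {total} training utterances in the table")
print(f"Majority class: {majority_name} ({majority_share:.1%})")
```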
The splits follow the Probabilistic-RNN-DA-Classifier repo, which uses the same training and test splits as Stolcke et al. (2000). The development set is a subset of the training set, intended to speed up development and testing, as used in the paper "Probabilistic Word Association for Dialogue Act Classification with Recurrent Neural Networks".
Dataset | # Transcripts | # Utterances |
---|---|---|
Training | 1115 | 192,768 |
Validation | 21 | 3,196 |
Test | 19 | 4,088 |
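To put the split sizes in proportion, the short sketch below takes the numbers from the table above and derives each split's share of all utterances and its average number of utterances per transcript. The dictionary layout and variable names are illustrative only.

```python
# Transcript and utterance counts copied from the splits table above.
splits = {
    "Training": {"transcripts": 1115, "utterances": 192_768},
    "Validation": {"transcripts": 21, "utterances": 3_196},
    "Test": {"transcripts": 19, "utterances": 4_088},
}

total_transcripts = sum(s["transcripts"] for s in splits.values())
total_utterances = sum(s["utterances"] for s in splits.values())

for name, s in splits.items():
    share = s["utterances"] / total_utterances
    per_transcript = s["utterances"] / s["transcripts"]
    print(
        f"{name:<10} {share:6.2%} of utterances, "
        f"{per_transcript:6.1f} utterances per transcript"
    )

print(f"Total: {total_transcripts} transcripts, {total_utterances} utterances")
```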
The SwDA is not inherently linked to the Penn Treebank 3 parses of Switchboard, and it is far from straightforward to align the two resources (Calhoun et al. 2010, §2.4). In addition, the SwDA is not distributed with Switchboard's tables of metadata about the conversations and their participants.
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
Christopher Potts, Stanford Linguistics.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
@techreport{Jurafsky-etal:1997,
  Address = {Boulder, CO},
  Author = {Jurafsky, Daniel and Shriberg, Elizabeth and Biasca, Debra},
  Institution = {University of Colorado, Boulder Institute of Cognitive Science},
  Number = {97-02},
  Title = {Switchboard {SWBD}-{DAMSL} Shallow-Discourse-Function Annotation Coders Manual, Draft 13},
  Year = {1997}
}

@article{Shriberg-etal:1998,
  Author = {Shriberg, Elizabeth and Bates, Rebecca and Taylor, Paul and Stolcke, Andreas and Jurafsky, Daniel and Ries, Klaus and Coccaro, Noah and Martin, Rachel and Meteer, Marie and Van Ess-Dykema, Carol},
  Journal = {Language and Speech},
  Number = {3--4},
  Pages = {439--487},
  Title = {Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?},
  Volume = {41},
  Year = {1998}
}

@article{Stolcke-etal:2000,
  Author = {Stolcke, Andreas and Ries, Klaus and Coccaro, Noah and Shriberg, Elizabeth and Bates, Rebecca and Jurafsky, Daniel and Taylor, Paul and Martin, Rachel and Meteer, Marie and Van Ess-Dykema, Carol},
  Journal = {Computational Linguistics},
  Number = {3},
  Pages = {339--371},
  Title = {Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech},
  Volume = {26},
  Year = {2000}
}
Thanks to @gmihaila for adding this dataset.