Dataset:
TalTechNLP/AMIsum
AMIsum is a meeting summarization dataset based on the AMI Meeting Corpus (https://groups.inf.ed.ac.uk/ami/corpus/). The dataset uses the meeting transcripts as the source data and the abstractive summaries as the target data.
Language: English
{'transcript': '<PM> Okay. <PM> Right. <PM> Um well this is the kick-off meeting for our our project. <PM> Um and um this is just what we're gonna be doing over the next twenty five minutes. <ME> Mm-hmm. <PM> Um so first of all, just to kind of make sure that we all know each other, I'm Laura and I'm the project manager. <PM> Do you want to introduce yourself again? <ME> Great. [...]', 'summary': 'The project manager introduced the upcoming project to the team members and then the team members participated in an exercise in which they drew their favorite animal and discussed what they liked about the animal. The project manager talked about the project finances and selling prices. The team then discussed various features to consider in making the remote.', 'id': 'ES2002a'}
transcript: Expert-generated transcript.
summary: Expert-generated summary.
id: Meeting ID.
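
The transcript field is a single string in which each utterance is prefixed by a speaker tag such as `<PM>` or `<ME>`. For turn-level processing it can be convenient to split that string into (speaker, utterance) pairs. The following is a minimal sketch, assuming every tag consists of uppercase letters in angle brackets and that the utterance text itself never contains `<`; the helper names `split_turns` and `utterances_per_speaker` are hypothetical and not part of the dataset.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumption: a speaker tag is uppercase letters in angle brackets, e.g. <PM>,
# and the utterance is everything up to the next "<".
TURN_PATTERN = re.compile(r"<([A-Z]+)>\s*([^<]*)")


def split_turns(transcript: str) -> List[Tuple[str, str]]:
    """Split a tagged transcript string into (speaker, utterance) pairs."""
    return [
        (speaker, text.strip())
        for speaker, text in TURN_PATTERN.findall(transcript)
    ]


def utterances_per_speaker(turns: List[Tuple[str, str]]) -> Counter:
    """Count how many utterances each speaker tag contributes."""
    return Counter(speaker for speaker, _ in turns)


# Short excerpt in the same format as the example record.
sample = (
    "<PM> Okay. <PM> Right. <ME> Mm-hmm. "
    "<PM> Do you want to introduce yourself again? <ME> Great."
)

turns = split_turns(sample)
counts = utterances_per_speaker(turns)

for speaker, text in turns:
    print(f"{speaker}: {text}")
```

Consecutive utterances by the same speaker are kept separate here, mirroring the way the transcript repeats the tag; merging them into longer turns would be a small extension if a model benefits from it.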
train | validation | test |
---|---|---|
97 | 20 | 20 |