数据集:

sms_spam

中文

Dataset Card for [Dataset Name]

Dataset Summary

The SMS Spam Collection v.1 is a public set of SMS labeled messages that have been collected for mobile phone spam research. It has one collection composed by 5,574 English, real and non-enconded messages, tagged according being legitimate (ham) or spam.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

English

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

  • sms: the sms message
  • label: indicating if the sms message is ham or spam, ham means it is not spam

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

@inproceedings{Almeida2011SpamFiltering, title={Contributions to the Study of SMS Spam Filtering: New Collection and Results}, author={Tiago A. Almeida and Jose Maria Gomez Hidalgo and Akebo Yamakami}, year={2011}, booktitle = "Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG'11)", }

Contributions

Thanks to @czabo for adding this dataset.