Dataset:
harpomaxx/dga-detection
License:
cc-by-2.0
A dataset containing both DGA and normal domain names. The normal domain names were taken from the Alexa top one million domains. An additional 3,161 normal domains, provided by the Bambenek Consulting feed, were also included. This latter group is particularly interesting because it consists of suspicious domain names that were not generated by a DGA. The total number of normal domains in the dataset is therefore 1,003,161. DGA domains were obtained from the DGA domain repositories of Andrey Abakumov and John Bambenek. The total number of DGA domains is 1,915,335, and they correspond to 51 different malware families. About 55% of the DGA portion of the dataset is composed of samples from the Banjori, Post, Timba, Cryptolocker, Ramdo and Conficker malware families.
The generation schemes followed by the malware families include the simple arithmetic (A) scheme and the more recent word-based (W) scheme. Under the arithmetic scheme, the algorithm usually calculates a sequence of values that have a direct ASCII representation usable in a domain name. The word-based scheme, by contrast, concatenates a sequence of words drawn from one or more wordlists.
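The difference between the two schemes can be illustrated with a toy sketch. This is not the algorithm of any malware family in the dataset: the function names, the linear congruential generator constants, the wordlists, and the `.com` TLD are all assumptions chosen only to show the idea.

```python
import random


def arithmetic_domain(seed: int, length: int = 12, tld: str = ".com") -> str:
    """Toy arithmetic (A) scheme: compute a sequence of values and map
    each one directly to an ASCII lowercase letter."""
    state = seed
    chars = []
    for _ in range(length):
        # Hypothetical linear congruential step; real families differ.
        state = (state * 1103515245 + 12345) % (2 ** 31)
        chars.append(chr(ord("a") + state % 26))
    return "".join(chars) + tld


def word_based_domain(seed: int, wordlists, tld: str = ".com") -> str:
    """Toy word-based (W) scheme: pick one word from each wordlist
    and concatenate them."""
    rng = random.Random(seed)
    return "".join(rng.choice(words) for words in wordlists) + tld


# Hypothetical wordlists, for illustration only.
WORDLISTS = [
    ["green", "silver", "quiet", "rapid"],
    ["river", "table", "garden", "window"],
]

if __name__ == "__main__":
    print(arithmetic_domain(42))
    print(word_based_domain(42, WORDLISTS))
```

Both functions are deterministic for a given seed, which is the property that lets malware and its operator derive the same domain independently. Arithmetic output tends to look like random letters, while word-based output resembles a legitimate name and is typically harder to tell apart from normal domains.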