SHED: Spam Ham Email Dataset

Upasana Sharma, Surinder Singh Khurana

doi:10.17762/ijritcc.v5i6.903

Published: Jun 30, 2017

Abstract

Automatic filtering of spam emails has become an essential feature of a good email service provider. Organizations and individuals send large volumes of spam email to gain direct or indirect benefits. Such email activity not only distracts the user but also consumes considerable resources, including processing power, memory, and network bandwidth. These unwanted emails also raise security issues, since they may contain malicious content and/or links. Content-based spam filtering is one of the effective approaches to filtering; however, its efficiency depends on the training set. Most of the existing datasets were collected and prepared long ago, and spammers have been changing their content to evade filters trained on these datasets. In this paper, we introduce the Spam Ham Email Dataset (SHED), a dataset consisting of spam and ham emails. We evaluated the performance of filtering techniques trained on previous datasets and of filtering techniques trained on SHED. We observed that the filtering techniques trained on SHED outperformed those trained on other datasets. Furthermore, we classified the spam emails into various categories.
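
To illustrate why a content-based filter depends so heavily on its training set, the following is a minimal sketch of one common content-based technique, a multinomial naive Bayes classifier with Laplace smoothing. It is not the filtering method evaluated in the paper, which the abstract does not specify. The class name `NaiveBayesFilter`, the `tokenize` helper, and the four training messages are all hypothetical and are not drawn from SHED.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into simple word tokens.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Toy multinomial naive Bayes filter (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, examples):
        # examples: iterable of (text, label) pairs, label is "spam" or "ham".
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def classify(self, text):
        # Assumes at least one training example exists for each class.
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior of the class.
            score = math.log(self.doc_counts[label] / total_docs)
            # Laplace-smoothed log likelihood of each token.
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy training data, not taken from any real dataset.
training_set = [
    ("win a free prize now", "spam"),
    ("cheap loans click here", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
]

spam_filter = NaiveBayesFilter()
spam_filter.train(training_set)
print(spam_filter.classify("free prize click now"))
print(spam_filter.classify("monday meeting report"))
```

The classifier only knows the vocabulary it saw during training, so a message built from words absent from the training set gets no evidence either way. This is one way an outdated dataset can leave a filter blind to newer spam wording.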

How to Cite

Sharma, U., & Khurana, S. S. (2017). SHED: Spam Ham Email Dataset. International Journal on Recent and Innovation Trends in Computing and Communication, 5(6), 1078 –. https://doi.org/10.17762/ijritcc.v5i6.903