On feature extraction for spam e-mail detection

Günal S., Ergin S., Gülmezoğlu M. B., Gerek Ö. N.

International Workshop on Multimedia Content Representation, Classification and Security, MRCS 2006, İstanbul, Türkiye, 11-13 September 2006, vol. 4105 LNCS, pp. 635-642 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume: 4105 LNCS
DOI: 10.1007/11848035_84
City of Publication: İstanbul
Country of Publication: Türkiye
Pages: 635-642
Anadolu University Affiliation: Yes

Abstract

Electronic mail is an important communication method for most computer users. Spam e-mails, however, consume bandwidth, fill up server storage, and waste the time of the users who must deal with them. The usual way to label an e-mail as spam or non-spam is to define a finite set of discriminative features and use a classifier for detection. In most cases, the choice of such features is verified empirically. In this paper, two methods are proposed for selecting the most discriminative features from a set of fairly arbitrary features for spam e-mail detection. Both selection methods are built on the Common Vector Approach (CVA), a subspace-based pattern classifier. Experimental results indicate that the proposed feature selection methods considerably reduce the number of features without affecting recognition rates. © Springer-Verlag Berlin Heidelberg 2006.
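The abstract does not describe the two selection methods themselves. As background only, here is a minimal Python sketch of a Common Vector Approach classifier. It assumes the formulation in which each class's "indifference subspace" is spanned by the eigenvectors of its within-class scatter matrix that have the smallest eigenvalues. A test vector is assigned to the class whose common vector lies nearest after projection onto that class's subspace. The class name `CommonVectorClassifier` and the parameter `n_indiff` are hypothetical. This is not the authors' code or their feature selection procedure.

```python
import numpy as np


class CommonVectorClassifier:
    """Minimal CVA sketch (assumed 'sufficient data' formulation).

    Each class c gets an orthonormal basis U_c of its indifference subspace:
    the eigenvectors of the class scatter matrix with the smallest
    eigenvalues. `n_indiff` (hypothetical) sets that subspace's dimension.
    """

    def __init__(self, n_indiff):
        self.n_indiff = n_indiff  # dimension of each indifference subspace

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.bases_ = []  # U_c, shape (n_features, n_indiff)
        self.means_ = []  # class mean m_c; its projection is the common vector
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            centered = Xc - mean
            scatter = centered.T @ centered
            # eigh returns eigenvalues in ascending order, so the first columns
            # are the directions with the least within-class variation.
            _, eigvecs = np.linalg.eigh(scatter)
            self.bases_.append(eigvecs[:, : self.n_indiff])
            self.means_.append(mean)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # ||U U^T x - U U^T m|| equals ||U^T (x - m)|| because U has orthonormal columns.
        dists = np.stack(
            [np.linalg.norm((X - m) @ U, axis=1) for U, m in zip(self.bases_, self.means_)],
            axis=1,
        )
        # Pick the class whose common vector is closest.
        return self.classes_[np.argmin(dists, axis=1)]


if __name__ == "__main__":
    # Toy data: both classes vary only along the first feature.
    X_train = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0],
               [0, 5, 0], [1, 5, 0], [2, 5, 0], [3, 5, 0]]
    y_train = [0, 0, 0, 0, 1, 1, 1, 1]
    clf = CommonVectorClassifier(n_indiff=2).fit(X_train, y_train)
    print(clf.predict([[10, 0.1, 0], [-7, 4.8, 0]]))  # prints [0 1]
```

In the toy data, the first feature varies freely within each class, so it falls outside both indifference subspaces and is ignored when classifying. Only the features that stay nearly constant inside a class decide the label, which is the intuition behind using CVA to judge how discriminative a feature is.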