Abstract: Today, the Internet is full of harmful and wasteful elements, such as phishing and spam messages, which must
be properly classified before reaching end-users. This issue has attracted the pattern recognition community’s
attention and motivated research into which strategies achieve the best classification results. Several methods use
as many features as the data set has content-based properties, which leads to a high-dimensional classification
problem. In this context, this paper presents a feature selection approach that simultaneously determines a nonlinear
classification function with minimal error and minimizes the number of features by penalizing their use
in the dual formulation of binary Support Vector Machines (SVM). The method optimizes the widths of an
anisotropic RBF kernel via successive gradient descent steps, eliminating features that have low relevance
for the model. Experiments with two real-world spam and phishing data sets demonstrate that our approach
achieves the best performance compared to well-known feature selection methods while consistently using a
small number of features.
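
As an illustration of the kernel idea only, the following minimal sketch assumes one common parameterization of an anisotropic RBF kernel, K(x, z) = exp(-sum_j w_j^2 (x_j - z_j)^2), with one width w_j per feature. The abstract does not state the exact form used, so this parameterization, the function names, and the pruning threshold are all assumptions. The sketch shows why driving a width to zero removes the corresponding feature from the classifier; it does not implement the penalized dual SVM or the gradient descent updates.

```python
import numpy as np

def anisotropic_rbf_kernel(X, Z, widths):
    """Assumed form: K[i, k] = exp(-sum_j widths[j]**2 * (X[i, j] - Z[k, j])**2)."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    widths = np.asarray(widths, dtype=float)
    # Scale each feature by its width, then take squared Euclidean distances.
    diff = (X * widths)[:, None, :] - (Z * widths)[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=2))

def prune_widths(widths, threshold):
    """Zero out widths below a hypothetical threshold, which drops those features."""
    widths = np.asarray(widths, dtype=float)
    return np.where(np.abs(widths) < threshold, 0.0, widths)

# Toy data: 5 samples, 3 features; the second feature has a near-zero width.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
widths = prune_widths([1.0, 0.01, 0.5], threshold=0.05)

# A zero width makes the kernel independent of that feature, so the kernel
# matrix equals the one computed on the remaining features alone.
K_full = anisotropic_rbf_kernel(X, X, widths)
K_reduced = anisotropic_rbf_kernel(X[:, [0, 2]], X[:, [0, 2]], widths[[0, 2]])
```

Under this assumed form, a feature whose width reaches zero contributes nothing to any kernel evaluation, so it can be discarded without changing the resulting classifier.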