Multimedia Tools and Applications, vol. 85, no. 4, 2026 (Scopus)
The distribution of text data across classes is often imbalanced, which degrades classifier performance in the text classification process. Many studies in the literature address imbalanced text classification, and it remains an active research field. Feature selection, one of the crucial stages of the text classification process, is also critical for the imbalanced text classification problem. Two novel feature selection methods (EFS_IMP1 and EFS_IMP2) are proposed for imbalanced text classification. These methods are derived from a recent feature selection method called the Extensive Feature Selector (EFS). The performance of EFS_IMP1 and EFS_IMP2 is compared with that of filter-based feature selection methods, namely GSS, the Class-Index Corpus-Index measure, Max–Min Ratio, Comprehensively Measure Feature Selection, Discriminative and Semantic Feature Selection, and EFS itself. Three benchmark imbalanced text datasets were employed with Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and k-Nearest Neighbor (kNN) classifiers. Experimental results demonstrate that the greatest macro-F1 scores are 94.441 on the Enron1 dataset with EFS_IMP2 and the RF classifier, 95.376 on the Spam SMS dataset with EFS_IMP1 and the SVM classifier, and 99.695 on the Reuters 21578 TOP8 MM dataset with EFS_IMP1 and the RF classifier. Consequently, EFS_IMP1 and EFS_IMP2 offer superior or comparable performance relative to the other feature selection methods in terms of macro-F1 for imbalanced text classification.
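The abstract does not reproduce the EFS_IMP1/EFS_IMP2 formulas, so no implementation of the proposed methods is sketched here. Two standard ingredients it mentions can, however, be illustrated from their textbook definitions: the GSS coefficient (a filter-based feature selection score computed from a term/class contingency table) and the macro-F1 evaluation measure (per-class F1 averaged with equal weight, which is why it suits imbalanced classification). The function names and the 2x2-table variable names below are the author's illustrative choices, not identifiers from the paper.

```python
def gss_score(a, b, c, d):
    """GSS coefficient for a term/class pair from a 2x2 contingency table:
    a = docs in the class containing the term,
    b = docs outside the class containing the term,
    c = docs in the class lacking the term,
    d = docs outside the class lacking the term.
    GSS(t, c) = P(t, c) * P(not t, not c) - P(t, not c) * P(not t, c)."""
    n = a + b + c + d
    return (a * d - b * c) / (n * n)


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with equal
    class weight, so minority classes count as much as majority classes."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

In a filter-based pipeline of the kind the abstract compares, each vocabulary term would be scored (here with `gss_score`), the top-k terms kept as features, a classifier such as SVM or RF trained on the reduced representation, and the result reported as macro-F1 on the held-out set.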