On feature weighting and selection for medical document classification


Parlak B., Uysal A. K.

Studies in Computational Intelligence, vol.718, pp.269-282, 2018 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 718
  • Publication Date: 2018
  • DOI: 10.1007/978-3-319-58965-7_19
  • Journal Name: Studies in Computational Intelligence
  • Journal Indexes: Scopus
  • Pages: pp.269-282
  • Keywords: Disease classification, Medical documents, MeSH, Text classification
  • Anadolu University Affiliated: Yes

Abstract

© Springer International Publishing AG 2018. Medical document classification remains a popular research problem within the text classification domain. This study analyzes the impact of feature selection and feature weighting on medical document classification using two datasets of MEDLINE documents. Two feature selection methods, the Gini index and the distinguishing feature selector, and two term weighting methods, term frequency (TF) and term frequency-inverse document frequency (TF-IDF), are evaluated with two pattern classifiers: a Bayesian network and a C4.5 decision tree. Because the study deals with single-label classification, a subset of the OHSUMED documents and a self-compiled dataset are used to assess these methods. Because some categories in the self-compiled dataset contain few documents, only documents belonging to 10 disease categories are used in the experiments on both datasets. Experimental results show that the best result is obtained with the combination of the distinguishing feature selector, TF feature weighting, and the Bayesian network classifier.
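
As a rough illustration of two of the techniques named in the abstract, the sketch below scores terms with the distinguishing feature selector (DFS) and computes TF-IDF weights. It is not the authors' implementation. The DFS formula is the commonly cited one, DFS(t) = Σ_i P(C_i|t) / (P(t̄|C_i) + P(t|C̄_i) + 1). The TF-IDF variant (raw count times log(N/df)) is an assumption, since the abstract does not state which variant was used. The function names `dfs_scores` and `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def dfs_scores(docs, labels):
    """Return a DFS score per term (assumed formula, see lead-in).

    docs: list of token lists; labels: one class label per document.
    """
    classes = sorted(set(labels))
    n_docs = len(docs)
    class_size = Counter(labels)
    # df[term][cls] = number of documents of class cls that contain term
    df = {}
    for tokens, cls in zip(docs, labels):
        for term in set(tokens):
            df.setdefault(term, Counter())[cls] += 1

    scores = {}
    for term, per_class in df.items():
        total = sum(per_class.values())  # documents containing the term
        score = 0.0
        for cls in classes:
            in_cls = per_class[cls]
            p_c_given_t = in_cls / total
            p_not_t_given_c = 1 - in_cls / class_size[cls]
            n_other = n_docs - class_size[cls]
            p_t_given_not_c = (total - in_cls) / n_other if n_other else 0.0
            score += p_c_given_t / (p_not_t_given_c + p_t_given_not_c + 1)
        scores[term] = score
    return scores


def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs for term in set(tokens))
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(tokens).items()}
        for tokens in docs
    ]


# Toy, made-up documents for illustration only (not from either dataset).
docs = [["asthma", "lung"], ["asthma", "patient"],
        ["tumor", "patient"], ["tumor", "cell"]]
labels = ["respiratory", "respiratory", "neoplasm", "neoplasm"]
scores = dfs_scores(docs, labels)
weights = tf_idf(docs)
```

In this toy setting, a term that occurs in every document of exactly one class ("asthma") receives the highest DFS score, while a term spread evenly across both classes ("patient") receives the lowest. A selector would keep the top-scoring terms before the TF or TF-IDF weights are passed to a classifier.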