Classification of Medical Documents According to Diseases

23nd Signal Processing and Communications Applications Conference (SIU), Malatya, Türkiye, 16 - 19 Mayıs 2015, ss.1635-1638, (Tam Metin Bildiri)

Yayın Türü: Bildiri / Tam Metin Bildiri
Doi Numarası: 10.1109/siu.2015.7130164
Basıldığı Şehir: Malatya
Basıldığı Ülke: Türkiye
Sayfa Sayıları: ss.1635-1638
Anahtar Kelimeler: Text classification, medical documents, disease classification, MeSH headings, TEXT CLASSIFICATION, ALGORITHM
Anadolu Üniversitesi Adresli: Evet

Özet

Medical text classification is still one of the popular research problems inside text classification domain. Apart from some text data compiled from hospital records, most of the researchers in this field evaluate their classification methodologies on documents from MEDLINE database. When whole documents in the database are taken into consideration, MEDLINE is a multi-class and multi-label database. A dataset, containing a small subset of MEDLINE documents belonging to disease categories, is constructed in this study. It is a multi-class but single-label dataset. Due to the highly unbalanced distribution of this dataset, only documents belonging to top-10 disease categories are used in the experiments. The performances of three different pattern classifiers are analyzed on disease classification problem using this dataset. These three pattern classifiers are Bayesian network, C4.5 decision tree, and Random Forest trees. Experiments are realized for the two different cases where the stemming preprocessing step is applied or not. Experimental results show that the most successful classifier among three classifiers is Bayesian network classifier. Also, the best performance is obtained without applying stemming.