Classification of medical documents according to diseases [Tibbi dokümanlarin hastaliklara göre siniflandirilmasi]

Parlak, Bekir; Uysal, Alper Kurşat

Access

info:eu-repo/semantics/closedAccess

Date

2015

Authors

Parlak, Bekir
Uysal, Alper Kurşat

Abstract

Medical text classification remains a popular research problem within the text classification domain. Apart from some text data compiled from hospital records, most researchers in this field evaluate their classification methods on documents from the MEDLINE database. Taken as a whole, MEDLINE is a multi-class, multi-label database. In this study, a dataset is constructed from a small subset of MEDLINE documents belonging to disease categories; it is multi-class but single-label. Because the class distribution of this dataset is highly unbalanced, only documents belonging to the top 10 disease categories are used in the experiments. The performance of three pattern classifiers is analyzed on the disease classification problem using this dataset: Bayesian network, C4.5 decision tree, and Random Forest. Experiments are conducted for two cases, with and without the stemming preprocessing step. The experimental results show that the Bayesian network is the most successful of the three classifiers, and that the best performance is obtained without stemming.
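
The dataset preparation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes "top 10" means the ten most frequent categories, and the names `keep_top_k_categories`, `toy_stem`, and `tokenize` are hypothetical. The suffix stripper is a deliberately crude stand-in for a real stemmer, included only to show the with/without stemming switch.

```python
from collections import Counter


def keep_top_k_categories(docs, k=10):
    """Keep only documents whose label is among the k most frequent labels.

    docs is a list of (text, label) pairs; each document carries exactly
    one label (single-label, multi-class), as in the dataset described.
    Assumption: "top-k" means most frequent by document count.
    """
    counts = Counter(label for _, label in docs)
    top = {label for label, _ in counts.most_common(k)}
    return [(text, label) for text, label in docs if label in top]


def toy_stem(token):
    """Hypothetical crude suffix stripper; NOT the stemmer used in the paper."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters of the token remain.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def tokenize(text, stem=False):
    """Lowercase, split on whitespace, and optionally apply the toy stemmer."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens] if stem else tokens
```

Running both `tokenize(text, stem=True)` and `tokenize(text, stem=False)` over the filtered documents yields the two feature sets that a classifier comparison of this kind would be trained on.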

Source

2015 23rd Signal Processing and Communications Applications Conference, SIU 2015 - Proceedings

Links

https://dx.doi.org/10.1109/SIU.2015.7130164
https://hdl.handle.net/11421/20019
