
Simple item record

dc.contributor.author: Parlak, Bekir
dc.contributor.author: Uysal, Alper Kurşat
dc.date.accessioned: 2019-10-21T19:44:30Z
dc.date.available: 2019-10-21T19:44:30Z
dc.date.issued: 2019
dc.identifier.issn: 0165-5515
dc.identifier.issn: 1741-6485
dc.identifier.uri: https://dx.doi.org/10.1177/0165551519860982
dc.identifier.uri: https://hdl.handle.net/11421/19892
dc.description: WOS: 000474950600001 [en_US]
dc.description.abstract: Classification of medical documents has mostly been carried out on English data sets, and these studies were performed on hospital records rather than academic texts. The main reasons for this are the lack of publicly available data sets and the costly, time-consuming nature of the task. As the first contribution of this study, two data sets were constructed containing the Turkish and English versions of the same abstracts published in Turkish medical journals. Turkish is one of the most widely used agglutinative languages, and English is a good example of a non-agglutinative language. The English abstracts were obtained automatically from the MEDLINE database with a computer program, whereas their Turkish counterparts were collected manually from the Internet. As the second contribution, an extensive comparison of the classification of abstracts from Turkish medical journals was carried out using these two equivalent data sets. Features were extracted from the documents with three approaches: unigram, bigram and hybrid, where the hybrid approach combines unigram and bigram features. Three feature selection methods and seven classifiers were used in the experiments. On both data sets, classification performance was higher for the English abstracts than for their Turkish counterparts. The highest accuracies on both data sets were obtained by combining unigram features, the distinguishing feature selector (DFS) and the multinomial naive Bayes (MNB) classifier. Unigram features were generally more effective than bigram and hybrid features. However, analysis of the top-10 features showed that nearly half of them were translations of each other across the Turkish and English data sets. [en_US]
dc.description.sponsorship: Anadolu University [1503F136] [en_US]
dc.description.sponsorship: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Anadolu University, Fund of Scientific Research Projects under grant number 1503F136. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: SAGE Publications LTD [en_US]
dc.relation.isversionof: 10.1177/0165551519860982 [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Feature Selection [en_US]
dc.subject: Medical Documents [en_US]
dc.subject: Preprocessing [en_US]
dc.subject: Text Classification [en_US]
dc.subject: Text Representation [en_US]
dc.title: On classification of abstracts obtained from medical journals [en_US]
dc.type: article [en_US]
dc.relation.journal: Journal of Information Science [en_US]
dc.contributor.department: Anadolu University, Faculty of Engineering, Department of Computer Engineering [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.contributor.institutionauthor: Uysal, Alper Kurşat
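The abstract describes a pipeline that extracts n-gram features (unigram, bigram, or a hybrid of both) and then classifies the documents. Multinomial naive Bayes (MNB) is among the best-performing classifiers. The Python sketch below illustrates that idea on a hypothetical toy corpus and is not the authors' implementation. Several choices are assumptions made only for illustration:

- whitespace tokenization with lowercasing
- add-one (Laplace) smoothing
- the class labels and example sentences

The sketch also omits the study's feature selection step (e.g. DFS).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split only; the study's real
    # preprocessing for Turkish and English is not reproduced here.
    return text.lower().split()


def extract_features(text, mode="unigram"):
    """Bag of n-gram counts for mode 'unigram', 'bigram' or 'hybrid'."""
    tokens = tokenize(text)
    unigrams = Counter(tokens)
    # Bigrams are stored as space-joined strings, so they can never
    # collide with unigram keys (tokens contain no whitespace).
    bigrams = Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    if mode == "unigram":
        return unigrams
    if mode == "bigram":
        return bigrams
    if mode == "hybrid":
        return unigrams + bigrams  # union of both feature sets
    raise ValueError(f"unknown mode: {mode!r}")


class MultinomialNB:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # class -> feature -> count
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feature_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def predict(self, feats):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior
            for f, count in feats.items():
                if f not in self.vocab:
                    continue  # skip features never seen in training
                p = (self.feature_counts[c][f] + 1) / (self.totals[c] + v)
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy corpus; not drawn from the paper's data sets.
TRAIN = [
    ("myocardial infarction and chest pain in elderly patients", "cardiology"),
    ("coronary artery disease and heart failure outcomes", "cardiology"),
    ("insulin resistance and type 2 diabetes in adults", "endocrinology"),
    ("thyroid hormone levels and diabetes management", "endocrinology"),
]

if __name__ == "__main__":
    for mode in ("unigram", "bigram", "hybrid"):
        X = [extract_features(text, mode) for text, _ in TRAIN]
        y = [label for _, label in TRAIN]
        model = MultinomialNB().fit(X, y)
        print(mode, model.predict(extract_features("chest pain and heart failure", mode)))
```

The sketch stores bigrams as space-joined strings. This lets a single `Counter` hold both unigram and bigram features without key collisions, which is one simple way to build the hybrid representation. A real experiment would add a feature selection step between extraction and training. That step would keep only the top-scoring terms before fitting the classifier.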