dc.contributor.author: Uysal, Alper Kurşat
dc.date.accessioned: 2019-10-21T19:44:41Z
dc.date.available: 2019-10-21T19:44:41Z
dc.date.issued: 2018
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://dx.doi.org/10.1109/ACCESS.2018.2863547
dc.identifier.uri: https://hdl.handle.net/11421/19928
dc.description: WOS: 000443760300001 [en_US]
dc.description.abstract: Text classification is a high-dimensional pattern recognition problem in which feature selection is an important step. Although researchers still propose new feature selection methods, there exist many two-stage feature selection methods that combine existing filter-based feature selection methods with feature transformation and wrapper-based feature selection methods in different ways. The main focus of the study is to extensively analyze two-stage feature selection methods for text classification from a different point of view. Two-stage feature selection methods that are constituted by combining filter-based local feature selection methods with feature transformation and wrapper-based feature selection methods were investigated in this paper. In the first stage, four different filter-based local feature selection methods and three different feature set construction methods were employed. Feature sets were constructed either by using the maximum globalization policy (MAX), by using the weighted averaging globalization policy (AVG), or by selecting an equal number of features for each class (EQ). In the second stage, principal component analysis (PCA), latent semantic indexing (LSI), or genetic algorithms were utilized. Various settings were evaluated with a linear support vector machine classifier on two benchmark data sets, namely Reuters and Ohsumed, using Micro-F1 and Macro-F1 scores. According to the findings, the AVG and EQ feature set construction methods are usually more successful than the MAX method for two-stage feature selection methods. Most of the highest accuracies were obtained by employing PCA feature transformation in the second stage. However, there is a strong linear correlation between PCA and LSI for all settings, and the degree of correlation is slightly higher for the Ohsumed data set than for the Reuters data set. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: IEEE-Inst Electrical Electronics Engineers Inc [en_US]
dc.relation.isversionof: 10.1109/ACCESS.2018.2863547 [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Feature Selection [en_US]
dc.subject: Genetic Algorithms [en_US]
dc.subject: Lsi [en_US]
dc.subject: Pca [en_US]
dc.subject: Text Classification [en_US]
dc.title: On Two-Stage Feature Selection Methods for Text Classification [en_US]
dc.type: article [en_US]
dc.relation.journal: IEEE Access [en_US]
dc.contributor.department: Anadolu Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü [en_US]
dc.identifier.volume: 6 [en_US]
dc.identifier.startpage: 43233 [en_US]
dc.identifier.endpage: 43251 [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.contributor.institutionauthor: Uysal, Alper Kurşat
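
The three feature set construction policies named in the abstract (MAX, AVG, EQ) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy score matrix, the use of class priors as the AVG weights, and the rule that EQ skips terms already picked for another class are all assumptions made for illustration. Each row of `scores` holds the class-local scores that some filter-based method assigned to every term.

```python
def top_k(values, k):
    """Indices of the k largest values; ties are broken by lower index."""
    order = sorted(range(len(values)), key=lambda t: (-values[t], t))
    return order[:k]


def select_max(scores, k):
    """MAX policy: a term's global score is its highest class-local score."""
    n_terms = len(scores[0])
    global_scores = [max(row[t] for row in scores) for t in range(n_terms)]
    return top_k(global_scores, k)


def select_avg(scores, priors, k):
    """AVG policy: a term's global score is a weighted average of its
    class-local scores (weights assumed here to be class priors)."""
    n_terms = len(scores[0])
    global_scores = [
        sum(p * row[t] for p, row in zip(priors, scores))
        for t in range(n_terms)
    ]
    return top_k(global_scores, k)


def select_eq(scores, k):
    """EQ policy: every class contributes the same number of terms.
    Assumption: a term already chosen for an earlier class is skipped."""
    per_class = k // len(scores)
    chosen = []
    for row in scores:
        added = 0
        for t in top_k(row, len(row)):  # terms ranked for this class
            if added == per_class:
                break
            if t not in chosen:
                chosen.append(t)
                added += 1
    return chosen


# Hypothetical local scores: 2 classes (rows) x 5 terms (columns).
scores = [
    [0.9, 0.1, 0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2, 0.3, 0.6],
]
priors = [0.3, 0.7]  # hypothetical class priors

print(select_max(scores, 2))
print(select_avg(scores, priors, 2))
print(select_eq(scores, 2))
```

On this toy input the three policies pick different term subsets, which is the kind of difference the study compares before applying PCA, LSI, or a genetic algorithm in the second stage.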

