On Two-Stage Feature Selection Methods for Text Classification

Uysal, Alper Kurşat

dc.contributor.author	Uysal, Alper Kurşat
dc.date.accessioned	2019-10-21T19:44:41Z
dc.date.available	2019-10-21T19:44:41Z
dc.date.issued	2018
dc.identifier.issn	2169-3536
dc.identifier.uri	https://dx.doi.org/10.1109/ACCESS.2018.2863547
dc.identifier.uri	https://hdl.handle.net/11421/19928
dc.description	WOS: 000443760300001	en_US
dc.description.abstract	Text classification is a high-dimensional pattern recognition problem in which feature selection is an important step. Although researchers still propose new feature selection methods, there exist many two-stage feature selection methods that combine existing filter-based feature selection methods with feature transformation and wrapper-based feature selection methods in different ways. The main focus of the study is to extensively analyze two-stage feature selection methods for text classification from a different point of view. Two-stage feature selection methods that are constituted by combining filter-based local feature selection methods with feature transformation and wrapper-based feature selection methods were investigated in this paper. In the first stage, four different filter-based local feature selection methods and three different feature set construction methods were employed. Feature sets were constructed either by using the maximum globalization policy (MAX), by using the weighted averaging globalization policy (AVG), or by selecting an equal number of features for each class (EQ). In the second stage, principal component analysis (PCA), latent semantic indexing (LSI), or genetic algorithms were utilized. Various settings were evaluated with a linear support vector machine classifier on two benchmark data sets, namely Reuters and Ohsumed, using Micro-F1 and Macro-F1 scores. According to the findings, the AVG and EQ feature set construction methods are usually more successful than the MAX method for two-stage feature selection. Most of the highest accuracies were obtained by employing PCA feature transformation in the second stage. However, there is a strong linear correlation between PCA and LSI for all settings, although the degree of correlation is slightly higher for the Ohsumed data set than for the Reuters data set.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE-Inst Electrical Electronics Engineers Inc	en_US
dc.relation.isversionof	10.1109/ACCESS.2018.2863547	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Feature Selection	en_US
dc.subject	Genetic Algorithms	en_US
dc.subject	LSI	en_US
dc.subject	PCA	en_US
dc.subject	Text Classification	en_US
dc.title	On Two-Stage Feature Selection Methods for Text Classification	en_US
dc.type	article	en_US
dc.relation.journal	IEEE Access	en_US
dc.contributor.department	Anadolu Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü	en_US
dc.identifier.volume	6	en_US
dc.identifier.startpage	43233	en_US
dc.identifier.endpage	43251	en_US
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.contributor.institutionauthor	Uysal, Alper Kurşat
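
The first-stage feature set construction policies named in the abstract (MAX, AVG, EQ) and a PCA second stage can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy score matrix, the use of class priors as the AVG weights, and the round-robin reading of EQ are all assumptions made for illustration, and the per-class scores would in practice come from a filter-based local feature selection method.

```python
import numpy as np

def select_max(local_scores, k):
    """MAX policy (assumed form): rank features by their highest per-class score."""
    global_scores = local_scores.max(axis=0)
    return np.argsort(-global_scores, kind="stable")[:k]

def select_avg(local_scores, class_priors, k):
    """AVG policy (assumed form): rank features by the prior-weighted mean score."""
    global_scores = class_priors @ local_scores
    return np.argsort(-global_scores, kind="stable")[:k]

def select_eq(local_scores, k):
    """EQ policy (assumed form): take top features from each class in turn."""
    n_classes, n_features = local_scores.shape
    rankings = np.argsort(-local_scores, axis=1, kind="stable")
    chosen, seen = [], set()
    for rank in range(n_features):
        for c in range(n_classes):
            f = int(rankings[c, rank])
            if f not in seen:
                seen.add(f)
                chosen.append(f)
                if len(chosen) == k:
                    return np.array(chosen)
    return np.array(chosen)

def pca_transform(X, n_components):
    """Second stage: project centered data onto its top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Hypothetical local scores: rows are classes, columns are features.
local_scores = np.array([[0.9, 0.1, 0.5, 0.0],
                         [0.1, 0.8, 0.7, 0.1]])
class_priors = np.array([0.5, 0.5])

max_idx = select_max(local_scores, 2)                # features 0 and 1
avg_idx = select_avg(local_scores, class_priors, 2)  # features 2 and 0
eq_idx = select_eq(local_scores, 2)                  # features 0 and 1

# Toy document-term matrix (6 documents, 4 features), reduced in two stages.
X = np.random.default_rng(0).random((6, 4))
X_reduced = pca_transform(X[:, avg_idx], n_components=1)
```

In this toy case AVG prefers feature 2, which scores moderately well for both classes, while MAX prefers features that score highly for a single class. The reduced matrix would then be passed to a classifier such as a linear SVM.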

Files in this item:

Name: 19928.-2pdf
Size: 479.6Kb
Format: Unknown
Description: Tam Metin / Full Text

This item appears in the following collection(s).

Article Collection [100]
Scopus Indexed Publications Collection [8325]
WoS Indexed Publications Collection [7605]
