Title: Feature selection for clustering categorical data with an embedded modeling approach
Authors: Silvestre, C.
Cardoso, M. G. M. S.
Figueiredo, M. A. T.
Keywords: Cluster analysis
Finite mixture models
EM algorithm
Feature selection
Categorical features
Issue Date: 2015
Publisher: Wiley-Blackwell
Abstract: Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, drawn from official statistics, shows its usefulness.
Description: WOS:000355958900009 (Web of Science accession number)
Peer reviewed: Yes
ISSN: 0266-4720
Publisher version: The definitive version is available at:
Appears in Collections: BRU-RI - Article in an international peer-reviewed scientific journal
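
Illustrative sketch: the abstract describes an expectation-maximization procedure built on a finite mixture of multinomial distributions. The following is a minimal, simplified sketch of plain EM for a mixture of independent categorical features, written for this record and not taken from the article. It deliberately omits the latent feature-relevance variables and the minimum message length criterion that the authors add. The function name `fit_categorical_mixture`, its parameters, and the small additive smoothing constant are all assumptions made for illustration.

```python
import math
import random


def fit_categorical_mixture(data, n_clusters, n_levels,
                            n_iter=50, seed=0, smooth=1e-3):
    """Plain EM for a mixture of independent categorical features.

    data     : list of rows, each a list of integer category codes
    n_levels : number of categories of each feature
    smooth   : additive smoothing (an assumption, keeps probabilities > 0)
    Returns (weights, theta, resp, loglik_trace).
    """
    rng = random.Random(seed)
    n, d = len(data), len(n_levels)

    # Start from random soft assignments (responsibilities).
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-6 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    trace = []
    for _ in range(n_iter):
        # M-step: smoothed mixing weights and category probabilities.
        nk = [sum(resp[i][k] for i in range(n)) for k in range(n_clusters)]
        weights = [(x + smooth) / (n + n_clusters * smooth) for x in nk]
        theta = []
        for k in range(n_clusters):
            per_feature = []
            for j in range(d):
                counts = [smooth] * n_levels[j]
                for i in range(n):
                    counts[data[i][j]] += resp[i][k]
                total = sum(counts)
                per_feature.append([c / total for c in counts])
            theta.append(per_feature)

        # E-step: posterior cluster probabilities, computed in the log domain.
        loglik = 0.0
        for i in range(n):
            logp = [
                math.log(weights[k])
                + sum(math.log(theta[k][j][data[i][j]]) for j in range(d))
                for k in range(n_clusters)
            ]
            m = max(logp)
            log_norm = m + math.log(sum(math.exp(v - m) for v in logp))
            resp[i] = [math.exp(v - log_norm) for v in logp]
            loglik += log_norm
        trace.append(loglik)

    return weights, theta, resp, trace
```

As a usage example under the same assumptions, calling `fit_categorical_mixture(data, 2, [2, 2, 2])` on rows with three binary features returns the mixing weights, the per-cluster category probabilities, the per-row cluster responsibilities, and the log-likelihood recorded at each iteration. In this plain version every feature contributes to every cluster; the approach described in the abstract differs precisely in letting latent variables switch irrelevant features off.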

Files in This Item:
File: publisher_version_exsy12082_pdf
Size: 529.56 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.