Repositório ISCTE-IUL

Latent Segments Models (LSM) are commonly used as an approach for market segmentation. When using LSM, several criteria are available to determine the number of segments. However, it is not established which criteria are more adequate when dealing with a specific application. Since most market segmentation problems involve the simultaneous use of categorical and continuous base variables, it is particularly useful to select the best criteria when dealing with LSM with mixed type base variables. We first present an empirical test, which provides the ranking of several information criteria for model selection based on ten mixed data sets. As a result, the ICL-BIC, BIC, CAIC and L criteria are selected as the best performing criteria in the estimation of mixed mixture models. We then present an application concerning the segmentation of a retail chain's clients. The best information criteria yield two segments: Preferential Clients and Occasional Clients.


Introduction
Market segmentation is the division of a heterogeneous market into homogeneous sub-markets of consumers, clusters or segments, with similar behaviour within segments and different behaviour across segments. The first application of market segmentation emerged in 1956 [38].
Segmentation is an essential instrument of marketing [28], [39]. It provides a better market understanding and, consequently, means to develop more successful business strategies.
A market segmentation solution is a function of the market segmentation variables and of a specific segmentation (clustering) procedure. As regards base variables for segmentation, product-specific variables (see table 1) should be considered [39]. Other attributes may help in profiling the segments' structures (for an overview of previous works on the use of demographics, psychographics, and other variables in segmentation studies, see [21], [28], [42]). Along with the selection of a set of potential segmentation variables, a segmentation procedure must be chosen, which delivers a segmentation solution. In this paper we present the segmentation of clients of a retail chain, which is based on product-specific variables and results from the estimation of a Latent Segments Model (LSM) [14], [42]. This approach enables the simultaneous use of categorical and continuous segmentation base variables. It is a probabilistic clustering approach which assumes that the variables' observations in a sample arise from different segments of unknown proportions. Estimation of the LSM is typically based on maximum likelihood.

Latent Segments Models
The aim of Latent Segments Models [14] (or Finite Mixture Models) is to identify the latent segments required to explain the associations among a set of observed variables (segmentation base variables) and to allocate observations to these segments.
The use of LSM has become increasingly popular in the marketing literature [18], [42]. This approach to segmentation offers some advantages when compared with other techniques: it identifies market segments [19]; it provides means to select the number of segments [30]; it is able to deal with diverse types of data (different measurement levels) [40]; it outperforms more traditional approaches [41].
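LSM provide a model-based approach to clustering: a statistical model is postulated for the population from which the sample under study is drawn, and the data are assumed to be generated by a mixture of underlying probability distributions. Let $y_i = (y_{ip})$ be the vector of scores of the $i$th case on the segmentation base variables ($i = 1,\dots,n$; $p = 1,\dots,P$). Several types of segmentation variables may be considered, provided they have a conditional (within-cluster) distribution in the exponential family (such as the Bernoulli, Poisson, Multinomial or Normal distribution) [24], [25], [32], [40]. Considering $S$ as the number of segments and $s = 1,\dots,S$, we define $\lambda_s$ as the mixing probability (relative size) of segment $s$. Under the conditional independence assumption, the LSM takes the standard finite mixture form
$$f(y_i \mid \psi) = \sum_{s=1}^{S} \lambda_s \prod_{p=1}^{P} f_s(y_{ip} \mid \theta_{sp}), \qquad \sum_{s=1}^{S} \lambda_s = 1, \quad \lambda_s > 0,$$
where $\psi = \{\lambda, \theta\}$ collects the mixing probabilities and the distributional parameters. When considering mixed type variables for segmentation we may additionally specify that $f_s(y_{ip} \mid \theta_{sp})$ is Normal, $N(\mu_{sp}, \sigma_{sp}^2)$, for each one of the continuous attributes, and Multinomial for the categorical attributes, with $C_p$ categories (e.g. see [25]).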
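The mixture density described above can be illustrated with a minimal sketch, assuming one continuous variable (Normal within each segment) and one categorical variable (Multinomial within each segment) under conditional independence. The variable names `amount` and `visit`, the dictionary layout and all parameter values are hypothetical and serve only as an illustration; they are not estimates from the retail data.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a univariate Normal(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def segment_density(case, segment):
    """Within-segment density of one case under conditional independence:
    Normal density of the continuous variable times the multinomial
    probability of the observed category."""
    return (normal_pdf(case["amount"], segment["mean"], segment["sd"])
            * segment["cat_probs"][case["visit"]])

def mixture_density(case, segments):
    """Finite mixture density: sum over segments of lambda_s * f_s(y_i)."""
    return sum(seg["lambda"] * segment_density(case, seg) for seg in segments)

# Hypothetical two-segment model; every number below is illustrative only.
segments = [
    {"lambda": 0.6, "mean": 100.0, "sd": 20.0,
     "cat_probs": {"weekly": 0.8, "monthly": 0.2}},
    {"lambda": 0.4, "mean": 40.0, "sd": 15.0,
     "cat_probs": {"weekly": 0.3, "monthly": 0.7}},
]
case = {"amount": 90.0, "visit": "weekly"}
density = mixture_density(case, segments)
```

The log-likelihood of a sample would then be the sum of `math.log(mixture_density(case, segments))` over all cases.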
Although continuous attributes could be categorized and also modelled by the multinomial distribution, this may result in considerable loss of information [13]. Furthermore it is difficult to establish an adequate number of categories [12]; however, discretization may be very useful in particular when continuous variables which do not belong to the exponential family are considered.
The LSM assumption of conditional independence can be relaxed by using appropriate multivariate rather than univariate distributions for sets of locally dependent variables: a multivariate normal distribution for sets of continuous variables, and a joint multinomial distribution for sets of categorical variables.
The LSM estimation problem simultaneously addresses the estimation of distributional parameters and classification of cases into segments, yielding mixing probabilities.
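Finally, modal allocation provides a means of constituting a partition, assigning each case to the segment with the highest posterior probability, which is given by
$$\tau_{is} = \frac{\lambda_s f_s(y_i \mid \theta_s)}{\sum_{j=1}^{S} \lambda_j f_j(y_i \mid \theta_j)},$$
where $\lambda_s$ is the mixing probability of segment $s$ and $f_s(y_i \mid \theta_s)$ is the within-segment density of case $i$.
Maximum likelihood estimates of the parameter vector $\psi$ can be obtained by treating the unobserved segment labels as missing data and using the EM algorithm [22], [30], [34].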
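As a hedged illustration of modal allocation, the sketch below computes posterior segment membership probabilities by Bayes' rule and assigns a case to the segment with the largest posterior. It assumes a toy model with one Normal and one Multinomial variable; the names `amount` and `visit` and all parameter values are hypothetical, and the parameters are taken as given rather than estimated by EM.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a univariate Normal(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_probabilities(case, segments):
    """Bayes' rule: lambda_s * f_s(y_i) divided by the sum over all segments."""
    weighted = [
        seg["lambda"]
        * normal_pdf(case["amount"], seg["mean"], seg["sd"])
        * seg["cat_probs"][case["visit"]]
        for seg in segments
    ]
    total = sum(weighted)
    return [w / total for w in weighted]

def modal_allocation(case, segments):
    """Index of the segment with the highest posterior probability."""
    tau = posterior_probabilities(case, segments)
    return max(range(len(tau)), key=tau.__getitem__)

# Hypothetical two-segment model; every number below is illustrative only.
segments = [
    {"lambda": 0.6, "mean": 100.0, "sd": 20.0,
     "cat_probs": {"weekly": 0.8, "monthly": 0.2}},
    {"lambda": 0.4, "mean": 40.0, "sd": 15.0,
     "cat_probs": {"weekly": 0.3, "monthly": 0.7}},
]
frequent_case = {"amount": 90.0, "visit": "weekly"}
rare_case = {"amount": 35.0, "visit": "monthly"}
allocations = [modal_allocation(c, segments) for c in (frequent_case, rare_case)]
```

In an EM iteration, the posterior computation corresponds to the E-step; the M-step would re-estimate the mixing probabilities and distributional parameters from these posteriors.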

Introduction
Several criteria may be considered for the selection of Latent Segments Models (LSM). In the present work we consider theoretical information based criteria. These criteria are generally based on the likelihood function (which we want to maximize) and a measure of model complexity (which we want to minimize). Thus, all theoretical information criteria balance parsimony (fitting a model with a large number of components requires the estimation of a very large number of parameters and a potential loss of precision in these estimates [29]), and model complexity (which tends to improve the model fit to the data).
The general form of information criteria is as follows:
$$-2\,LL + C \qquad (3)$$
where $LL$ is the maximised log-likelihood, so that the first term measures the lack of fit, and the second term $C$ includes a measure of model complexity and a penalization factor. Some information criteria are shown in table 2. The emphasis on information criteria begins with the pioneering work of Akaike [2]. Akaike's Information Criterion (AIC) chooses a model with $S$ segments that minimises (3) with $C = 2 n_\psi$, where $n_\psi$ is the number of model parameters.
Later, Bozdogan [8] suggested the modified AIC criterion (AIC3) in the context of mixture models, using 3 instead of 2 in the penalizing term; so, it chooses a model with $S$ segments that minimises (3) with $C = 3 n_\psi$.
Another variant of AIC, the corrected AIC (AICc), is proposed in [26], focusing on the small-sample bias adjustment (AIC may perform poorly if there are too many parameters in relation to the sample size); AICc thus selects a model with $S$ segments that minimises (3) with $C = 2 n_\psi \, n / (n - n_\psi - 1)$. A new criterion, AICu, is then proposed because AICc still tends to overfit as the sample size increases [31].
With the consistent AIC criterion (CAIC), Bozdogan [9] noted that the term $n_\psi (1 + \log n)$ has the effect of increasing the penalty term and, as a result, minimization of CAIC leads in general to models with fewer parameters than AIC does.
The Bayesian information criterion (BIC) was proposed by Schwarz [36], and chooses a model with $S$ segments that minimises (3) with $C = n_\psi \log n$; in a different way, from the notion of stochastic complexity, Rissanen [35] proposed a criterion of equivalent form, the minimum description length (MDL).
The Complete Likelihood Classification criterion (CLC) [30] is proposed as an approximation of the classification likelihood criterion [7]. It chooses a model with $S$ segments that minimises (3) with $C = 2\,EN(S)$, where $EN(S)$ is the entropy of the posterior segment membership probabilities.
The normalised entropy criterion (NEC) was introduced by Celeux and Soromenho [11]; an improvement is due to Biernacki, Celeux, and Govaert [6]. This improved NEC chooses a model with $s$ segments if $NEC(s) \le 1$ ($2 \le s \le S$), and sets $NEC(1) = 1$; otherwise NEC declares that there is no clustering structure in the data.
An approximate Bayesian solution, which is a crude approximation to twice the log Bayes factor for $S$ segments [30], is the approximate weight of evidence (AWE) proposed by Banfield and Raftery [3]. It uses the classification likelihood, and chooses a model with $S$ segments that minimises (3) with $C = 2\,EN(S) + 2 n_\psi (3/2 + \log n)$, where $EN(S)$ is the entropy of the posterior segment membership probabilities.
Finally, the L criterion [22] depends on the sample size, $n$, the number of model parameters, $n_\psi$, and the mixing probabilities, $\lambda_s$, and chooses a model with $S$ segments that minimises
$$-LL + \frac{n_\psi}{2} \sum_{s=1}^{S} \log\frac{n \lambda_s}{12} + \frac{S}{2} \log\frac{n}{12} + \frac{S (n_\psi + 1)}{2}.$$
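The penalised-likelihood criteria above can be sketched in a few lines. The example below is a minimal illustration, assuming the maximised log-likelihood, the number of parameters and the sample size are already available from a fitted model. The ICL-BIC term is written here as BIC plus twice the entropy of the posterior probabilities, which is the usual definition in the mixture literature but is an assumption with respect to this text; the function names and the numerical inputs are hypothetical.

```python
import math

def entropy(posteriors):
    """Entropy of posterior membership probabilities:
    minus the sum over cases and segments of tau * log(tau), with 0*log(0) = 0."""
    return -sum(t * math.log(t) for row in posteriors for t in row if t > 0.0)

def information_criteria(ll, n_params, n, posteriors=None):
    """Criteria of the form -2*LL + C for one fitted model.

    ll         maximised log-likelihood
    n_params   number of model parameters
    n          sample size
    posteriors optional list of per-case posterior probability lists
    """
    lack_of_fit = -2.0 * ll
    crit = {
        "AIC": lack_of_fit + 2.0 * n_params,
        "AIC3": lack_of_fit + 3.0 * n_params,
        "CAIC": lack_of_fit + n_params * (1.0 + math.log(n)),
        "BIC": lack_of_fit + n_params * math.log(n),
    }
    if posteriors is not None:
        # Assumed form: ICL-BIC = BIC + 2 * entropy.
        crit["ICL-BIC"] = crit["BIC"] + 2.0 * entropy(posteriors)
    return crit

def best_number_of_segments(values_by_s):
    """Number of segments whose criterion value is smallest."""
    return min(values_by_s, key=values_by_s.get)

# Hypothetical fits: number of segments -> (log-likelihood, number of parameters).
fits = {1: (-520.0, 4), 2: (-480.0, 9), 3: (-474.0, 14)}
sample_size = 200
bic_by_s = {s: information_criteria(ll, k, sample_size)["BIC"]
            for s, (ll, k) in fits.items()}
chosen_s = best_number_of_segments(bic_by_s)
```

With these illustrative inputs the gain in fit from two to three segments does not offset the extra penalty, so BIC retains the two-segment model.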

Information criteria selection
In order to select one particular criterion for determining the number of segments in a LSM based on mixed type variables, we run some auxiliary clustering analyses. We analyse ten real data sets (table 3) with mixed variables (continuous and categorical) and known structure (the clusters in the data set are previously known) and use all the criteria presented in table 2 for the LSM estimation. Table 4 presents the proportion of data sets in which the theoretical information criteria were able to recover the original cluster structure (in particular the true number of segments), as well as the corresponding criteria ranking. According to the obtained results, ICL-BIC is the best performing criterion. It is able to recover the original data set structure (that is, to detect the underlying true number of clusters or segments in the data set) in 8 of the 10 data sets (regardless of the number of variables and sample size).
It is followed by CAIC and BIC (2nd place, ex-aequo) and L (3rd place). As a consequence, we opt for the use of the ICL-BIC criterion in the retail segmentation application (we also present results for the BIC, CAIC, and L criteria).
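The ranking procedure described above can be sketched as follows: for each criterion, compute the proportion of data sets in which the selected number of segments equals the known one, then rank criteria by that proportion, allowing ties. This is a minimal sketch; the criterion labels `crit_A` to `crit_D` and all numbers are placeholders, not the results reported in table 4.

```python
def recovery_rate(true_s, selected_s):
    """Proportion of data sets where the selected number of segments
    equals the known (true) number of segments."""
    hits = sum(1 for t, s in zip(true_s, selected_s) if t == s)
    return hits / len(true_s)

def dense_ranking(rates):
    """Rank criteria by decreasing recovery rate; tied criteria share a
    place and the next distinct rate takes the next place."""
    distinct = sorted(set(rates.values()), reverse=True)
    place = {rate: i + 1 for i, rate in enumerate(distinct)}
    return {name: place[rate] for name, rate in rates.items()}

# Placeholder inputs: known number of segments in four toy data sets and
# the number selected by four unnamed criteria (illustrative values only).
true_s = [2, 3, 2, 4]
selections = {
    "crit_A": [2, 3, 2, 4],
    "crit_B": [2, 3, 3, 4],
    "crit_C": [2, 3, 2, 5],
    "crit_D": [3, 2, 2, 5],
}
rates = {name: recovery_rate(true_s, sel) for name, sel in selections.items()}
ranks = dense_ranking(rates)
```

Dense ranking is used here so that two criteria tied in second place are followed by a third place, matching the way ex-aequo positions are reported in the text.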

Data set description
The retail data set includes attributes referring to 1504 supermarket clients. Data originate from questionnaire responses and include several characteristics ranging from attitudes to demographics.
As already referred, product-specific base variables are preferable for segmentation purposes. In order to segment retail clients we thus select some attitudinal and behavioural variables, such as reasons to do the purchase, purchasing habits, usage frequency, visit pattern, travel time, amount spent and proportion of expenditure in the retail chain (the proportion of monthly expenditures which refers to the specific supermarket chain). These variables illustrate the relationship between consumers and retail stores. Demographics such as gender, age, income, occupation, and education are available for identifying the individuals in the segments, making them more accessible. Table 5 presents the segmentation base variables: 2 continuous and 6 categorical. Travel time, for instance, is categorical, with the categories: 2 minutes walking (mw), 2 to 5 mw, 5 to 10 mw, more than 10 mw, ..., more than 15 m by car, 10 to 15 m by car.

Segment Structure
Results from the estimation of a LSM using the referred segmentation variables (see Table 5) yield a two-segment structure. The ICL-BIC values corresponding to this and alternative solutions are displayed in table 6. Since we found interactions between some base variables, we included the following pairs in the adopted model: usage frequency and visit pattern; transportation and travel time; amount spent and proportion of expenditure in the retail chain.
As we can see, the ICL-BIC criterion attains its minimum for S=2, and L yields the same conclusion. We thus select a LSM with two segments (of sizes 917 for segment 1 and 587 for segment 2, by modal allocation), which we characterize in table 7.
As a result from the segments' profiling we name Seg.1 as Preferential clients and Seg.2 as Occasional clients.
Preferential clients often go to the retail chain's supermarkets; they live nearby and walk to the supermarket. These clients allocate 60% of their monthly household expenditure to the retail chain.
Occasional clients also include some clients who often go to the supermarket, but this segment clearly differs from Seg.1 in that it includes occasional purchasers.
Location (home proximity) is an important reason for purchasing in both segments; however, for Occasional clients, job proximity is also relevant. This segment also includes more clients who go to the supermarket by car.
These results agree, in general, with those obtained in a previous segmentation based on a larger sample and on a similar inquiry conducted two years before [10]. This suggests that the segment structure is stable.

Conclusion and future work
In this article we discuss the use of Latent Segments Models for market segmentation. We focus on the use of theoretical information criteria to recover clustering structures. In particular, we discuss the use of these criteria for clustering based on mixed type variables, since segmentation is typically based on attributes with diverse measurement levels. The discussion is motivated by an application: the segmentation of clients of a retail chain.
We first present the analysis of ten data sets with known clustering structure and rank several criteria according to their ability to recover the original structure, indicating the correct number of clusters. According to the obtained results we rank the best criteria as follows: ICL-BIC (1st place), BIC and CAIC (2nd place, ex-aequo) and L (3rd place). Using the results of this empirical test we select the ICL-BIC criterion (which was specifically designed for clustering applications) as an indicator of the correct number of latent segments of retail clients.
We finally estimate a Latent Segments Model to obtain a segment structure for the clients of a supermarket retail chain. We use product-specific variables as a base for segmentation (e.g. reasons for purchase). As a result (and using the ICL-BIC criterion) two segments are constituted: the Preferential clients segment and the Occasional clients segment.
In addition to these substantive conclusions, we consider that the selection of specific information criteria to estimate Latent Segments Models based on mixed type data should be further discussed. In fact, mixed type variables are commonly considered in segmentation studies and thus the performance of information criteria which is empirically observed in the present work deserves future research. In the present work empirical results provide a criteria ranking. Naturally, a larger number of data sets with diverse characteristics (which may be obtained via simulation procedures) should be considered in order to further test the consistency of the present conclusions.