Cluster analysis (in marketing)
From Academic Kids

Cluster analysis is a class of statistical techniques that can be applied to data that exhibits “natural” groupings. Cluster analysis sorts through the raw data and groups them into clusters. A cluster is a group of relatively homogeneous cases or observations. Objects in a cluster are similar to each other. They are also dissimilar to objects outside the cluster, particularly objects in other clusters.
The diagram below illustrates the results of a survey that studied drinkers’ perceptions of spirits (alcohol). Each point represents the results from one respondent. The research indicates there are four clusters in this market.
[Figure: PerceptualMap2.png. Illustration of clusters]
Another example is the vacation travel market. Recent research has identified three clusters or market segments: 1) the demanders, who want exceptional service and expect to be pampered; 2) the escapists, who want to get away and just relax; 3) the educationalists, who want to see new things, go to museums, go on a safari, or experience new cultures.
Cluster analysis, like factor analysis and multidimensional scaling, is an interdependence technique: it makes no distinction between dependent and independent variables, and the entire set of interdependent relationships is examined. It resembles multidimensional scaling in that both examine inter-object similarity across the complete set of interdependent relationships. The difference is that multidimensional scaling identifies underlying dimensions, while cluster analysis identifies clusters. Cluster analysis is the obverse of factor analysis: whereas factor analysis reduces the number of variables by grouping them into a smaller set of factors, cluster analysis reduces the number of observations or cases by grouping them into a smaller set of clusters.
In marketing, cluster analysis is used for:
 Segmenting the market and determining target markets
 Product positioning and new product development
 Selecting test markets (see: experimental techniques)
The basic procedure is:
 Formulate the problem: select the variables that you wish to apply the clustering technique to
 Select a distance measure; there are various ways of computing distance:
   Euclidean distance: the square root of the sum of the squared differences in value for each variable (the squared Euclidean distance is the same sum without the square root)
   Manhattan distance: the sum of the absolute differences in value across all variables
   Chebychev distance: the maximum absolute difference in value for any variable
 Select a clustering procedure (see below)
 Decide on the number of clusters
 Map and interpret clusters and draw conclusions: illustrative techniques like perceptual maps, icicle plots, and dendrograms are useful
 Assess reliability and validity; various methods:
   repeat the analysis but use a different distance measure
   repeat the analysis but use a different clustering technique
   split the data randomly into two halves and analyze each half separately
   repeat the analysis several times, deleting one variable each time
   repeat the analysis several times, using a different order of cases each time
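The distance measures named in the procedure can be sketched in a few lines of Python. This is only an illustration: the function names and the two respondents' ratings are made up for the example, and each observation is assumed to be a tuple of numeric values, one per variable.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared differences across variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    """Sum of squared differences across variables (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    """Sum of absolute differences across variables."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebychev(a, b):
    """Largest absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical ratings from two respondents on three attributes
r1 = (1, 5, 3)
r2 = (4, 1, 3)
print(euclidean(r1, r2), squared_euclidean(r1, r2),
      manhattan(r1, r2), chebychev(r1, r2))
# prints: 5.0 25 7 4
```

The measures can rank pairs of observations differently, which is one reason repeating the analysis with a different distance measure is a useful reliability check.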
Clustering procedures
There are several types of clustering methods:
 Non-hierarchical clustering (also called k-means clustering)
   first determine a cluster center, then group all objects that are within a certain distance
   examples:
     Sequential threshold method: first determine a cluster center, then group all objects that are within a predetermined threshold from the center; one cluster is created at a time
     Parallel threshold method: several cluster centers are determined simultaneously, then objects that are within a predetermined threshold from the centers are grouped
     Optimizing partitioning method: first a non-hierarchical procedure is run, then objects are reassigned so as to optimize an overall criterion
 Hierarchical clustering
   objects are organized into a hierarchical structure as part of the procedure
   examples:
     Divisive clustering: start by treating all objects as if they are part of a single large cluster, then divide the cluster into smaller and smaller clusters
     Agglomerative clustering: start by treating each object as a separate cluster, then group them into bigger and bigger clusters
       examples:
         Centroid methods: clusters are merged according to the distance between their centers (a centroid is the mean value of all the objects in the cluster)
         Variance methods: clusters are generated that minimize the within-cluster variance
           example:
             Ward’s procedure: clusters are generated that minimize the squared Euclidean distance to the cluster mean
         Linkage methods: cluster objects based on the distance between them
           examples:
             Single linkage method: cluster objects based on the minimum distance between them (also called the nearest neighbour rule)
             Complete linkage method: cluster objects based on the maximum distance between them (also called the furthest neighbour rule)
             Average linkage method: cluster objects based on the average distance between all pairs of objects, where the two members of each pair come from different clusters
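Two of the procedures listed above, the sequential threshold method and agglomerative clustering with the three linkage rules, can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names and sample points are hypothetical, Euclidean distance is assumed throughout, and the sequential threshold sketch assumes the first unassigned object serves as the next cluster center, since the description does not say how centers are chosen.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sequential_threshold(points, threshold):
    """Create one cluster at a time around a chosen center."""
    remaining = list(points)
    clusters = []
    while remaining:
        center = remaining[0]  # assumption: first unassigned object is the center
        clusters.append([p for p in remaining if euclidean(center, p) <= threshold])
        remaining = [p for p in remaining if euclidean(center, p) > threshold]
    return clusters

def cluster_distance(c1, c2, linkage):
    """Distance between two clusters under the given linkage rule."""
    pair_distances = [euclidean(p, q) for p in c1 for q in c2]
    if linkage == "single":      # nearest neighbour rule
        return min(pair_distances)
    if linkage == "complete":    # furthest neighbour rule
        return max(pair_distances)
    if linkage == "average":     # mean over all between-cluster pairs
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerative(points, n_clusters, linkage="single"):
    """Start with one cluster per object and merge the closest pair repeatedly."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

# Hypothetical respondents positioned on two perceptual dimensions
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
print(sequential_threshold(points, threshold=2.0))
print(agglomerative(points, n_clusters=2, linkage="average"))
```

The agglomerative sketch recomputes every between-cluster distance at each merge, which keeps the code short but is slow for large samples; statistical packages typically update a distance matrix instead.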
See also: marketing, marketing research, factor analysis, multidimensional scaling, quantitative marketing research, positioning, perceptual mapping