Determining the Best K for Clustering Transactional Datasets: A Coverage Density-based Approach

Title: Determining the Best K for Clustering Transactional Datasets: A Coverage Density-based Approach
Publication Type: Journal Article
Year of Publication: 2008
Authors: Hua Yan, Keke Chen, Ling Liu
Abstract

The problem of determining the optimal number of clusters is important but mysterious in cluster analysis. In this paper, we propose a novel method for finding a set of candidate optimal numbers of clusters (Ks) in transactional datasets. Concretely, we propose Transactional-cluster-modes Dissimilarity, based on the concept of coverage density, as an intuitive inter-cluster dissimilarity measure for transactional data. Based on this measure, we develop an agglomerative hierarchical clustering algorithm and use the Merge Dissimilarity Indexes, which are generated during hierarchical cluster merging, to find the candidate optimal Ks for transactional data. Our experimental results on both synthetic and real data show that the new method often estimates the number of clusters of transactional data effectively.
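
The pipeline the abstract outlines (a coverage-density measure, agglomerative merging, and a scan of the per-merge dissimilarities for candidate Ks) can be illustrated with a minimal Python sketch. It rests on stated assumptions and is not the paper's algorithm: coverage density is taken to be the fraction of filled cells in a cluster's transaction-by-item grid, and `merge_cost` is a hypothetical stand-in for the Transactional-cluster-modes Dissimilarity, which the abstract does not define. All function names are illustrative.

```python
from itertools import combinations

def coverage_density(cluster):
    """Fraction of filled cells in the cluster's transaction-by-item grid
    (assumed definition; transactions must be non-empty)."""
    items = set().union(*cluster)
    filled = sum(len(t) for t in cluster)
    return filled / (len(items) * len(cluster))

def merge_cost(a, b):
    """Stand-in dissimilarity (an assumption, not the paper's measure):
    size-weighted coverage density lost when clusters a and b are merged."""
    merged = a + b
    return (len(a) * coverage_density(a) + len(b) * coverage_density(b)
            - len(merged) * coverage_density(merged))

def merge_history(transactions):
    """Agglomerate from singletons with a naive pairwise search.
    Returns {k: cost of the merge that reduces k clusters to k - 1}."""
    clusters = [[frozenset(t)] for t in transactions]
    history = {}
    while len(clusters) > 1:
        # Pick the cheapest pair to merge at this step.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        history[len(clusters)] = merge_cost(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return history

def candidate_ks(history, top=2):
    """Rank K by how sharply the merge cost rises at K compared with
    the preceding merge (the one that produced K clusters)."""
    jumps = {k: history[k] - history[k + 1]
             for k in history if k + 1 in history}
    return sorted(jumps, key=jumps.get, reverse=True)[:top]

if __name__ == "__main__":
    data = [{"a", "b"}] * 3 + [{"x", "y"}] * 3
    print(candidate_ks(merge_history(data), top=1))
```

A sharp rise in merge cost at K means that going below K clusters forces dissimilar groups together, which is why such K values are reported as candidates. The pairwise search here is cubic in the number of transactions and is meant only for small illustrative inputs.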

Full Text

Hua Yan, Keke Chen and Ling Liu, 'Determining the Best K for Clustering Transactional Datasets: A Coverage Density-based Approach', in Journal of Data and Knowledge Engineering (DKE), 2008. [pdf]
publisher: Journal of Data and Knowledge Engineering
year: 2008