Proposed by Thi-Bich-Hanh Dao
Cluster Analysis is a Data Mining task that aims at partitioning a given set of objects into clusters, such that the objects inside the same cluster are similar to each other, while being different from the objects belonging to other clusters. We consider a dataset of objects and a dissimilarity measure between any two objects. The homogeneity of the clusters is usually expressed by an optimization criterion, which can be, among others:
Prior user knowledge can be integrated into clustering, which leads to Constrained Clustering. User constraints can be instance-level constraints or cluster-level constraints. Instance-level constraints are must-link or cannot-link constraints, which state that two objects must be or cannot be in the same cluster. Cluster-level constraints state requirements on the size, the diameter, the density, etc. of the clusters. Optimizing any of the criteria except the split one is NP-Hard. The split criterion, which can be optimized in polynomial time, becomes NP-Hard in the presence of user constraints.
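
To make the notions above concrete, here is a minimal Python sketch that evaluates a given partition. It assumes the usual definitions, which are not spelled out in this text: the diameter of a partition is the largest dissimilarity between two objects in the same cluster, and the split is the smallest dissimilarity between two objects in different clusters. The function names, the toy data, and the use of absolute difference as the dissimilarity measure are illustrative assumptions only. The sketch checks a candidate partition; it does not search for an optimal one.

```python
from itertools import combinations

def diameter(labels, d):
    """Largest dissimilarity between two objects in the same cluster (assumed definition)."""
    pairs = combinations(range(len(labels)), 2)
    return max((d[i][j] for i, j in pairs if labels[i] == labels[j]), default=0.0)

def split(labels, d):
    """Smallest dissimilarity between two objects in different clusters (assumed definition)."""
    pairs = combinations(range(len(labels)), 2)
    return min((d[i][j] for i, j in pairs if labels[i] != labels[j]), default=float("inf"))

def satisfies(labels, must_link, cannot_link):
    """Check the instance-level constraints against a cluster assignment."""
    same = all(labels[i] == labels[j] for i, j in must_link)
    apart = all(labels[i] != labels[j] for i, j in cannot_link)
    return same and apart

# Toy data (hypothetical): four objects on a line, dissimilarity = absolute difference.
points = [0.0, 1.0, 5.0, 6.0]
d = [[abs(a - b) for b in points] for a in points]

# Candidate partition: objects 0 and 1 in cluster 0, objects 2 and 3 in cluster 1.
labels = [0, 0, 1, 1]

print(diameter(labels, d))                                           # 1.0
print(split(labels, d))                                              # 4.0
print(satisfies(labels, must_link=[(0, 1)], cannot_link=[(1, 2)]))   # True
print(satisfies(labels, must_link=[(1, 2)], cannot_link=[]))         # False
```

A cluster-level constraint such as an upper bound on the diameter could be checked in the same way, by comparing `diameter(labels, d)` with the bound.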