gap statistic

“A method of heuristically selecting the number of means (clusters) to use when clustering data. The number of clusters begins at K=1 and the total within-cluster variance is computed. As K is increased, this value drops. Plotting the total value against K often reveals a break point, presumably indicating the natural number of clusters to be used. Various gap statistics have been defined to formalize this change and highlight the number of clusters to choose for the K-means process.”
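The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy data set `blobs`, the helper names `kmeans_wss` and `gap_statistic`, the farthest-point initialisation, and the "largest relative drop" break detector are all hypothetical choices made for this example. The gap computed here follows the widely used definition by Tibshirani, Walther and Hastie, which compares log(W_K) on the data with its average over uniform reference samples drawn from the data's bounding box; other gap variants exist.

```python
import math
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_wss(points, k, iters=50):
    """Run Lloyd's algorithm; return the total within-cluster sum of squares W_K."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Each center moves to the mean of its group; an empty group keeps its center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def gap_statistic(points, k_max, n_refs=20, seed=0):
    """Return (gaps, errs) for K = 1..k_max using uniform bounding-box references."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    refs = [
        [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
        for _ in range(n_refs)
    ]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_ref = [math.log(kmeans_wss(ref, k)) for ref in refs]
        mean_ref = sum(log_ref) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in log_ref) / n_refs)
        gaps.append(mean_ref - math.log(kmeans_wss(points, k)))  # Gap(K)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))              # s_K
    return gaps, errs

# Toy data (hypothetical): three tight, well-separated groups of four points each.
blobs = [(cx + dx, cy + dy)
         for cx, cy in [(0, 0), (10, 0), (5, 10)]
         for dx in (-0.5, 0.5) for dy in (-0.5, 0.5)]

# Total within-cluster sum of squares for K = 1..5; it drops as K grows.
wss = [kmeans_wss(blobs, k) for k in range(1, 6)]

# Crude break detector: the K with the largest relative drop W(K-1) / W(K).
elbow_k = max(range(2, 6), key=lambda k: wss[k - 2] / wss[k - 1])  # 3 for this data

gaps, errs = gap_statistic(blobs, k_max=5)
print("W_K:", [round(w, 3) for w in wss])
print("elbow at K =", elbow_k)
print("Gap(K):", [round(g, 3) for g in gaps])
```

In the Tibshirani, Walther and Hastie formulation, the suggested choice is the smallest K for which Gap(K) >= Gap(K+1) - s_(K+1), where s_K is the value stored in `errs`. On small samples that rule can be sensitive to the reference draws, so the gap values are best read alongside the W_K curve rather than in place of it.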

