Numerical Computing with Python
The elbow method
The elbow method is a heuristic for choosing the number of clusters in k-means clustering. It plots the value of the cost function (the average distortion) for a range of values of k. As k increases, average distortion decreases: each cluster has fewer constituent instances, and the instances lie closer to their respective centroids. However, the improvement in average distortion shrinks as k increases. The value of k at which the improvement in distortion declines the most is called the elbow; beyond it, dividing the data into further clusters yields little benefit.
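The procedure can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the `kmeans`, `average_distortion`, and `elbow_k` helpers and the synthetic three-blob data are hypothetical names and values chosen for this example, and distortion is assumed here to be the mean Euclidean distance from each instance to its nearest centroid.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on a list of coordinate tuples; returns the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k sampled data points
    for _ in range(n_iter):
        # Assignment step: group each point under its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def average_distortion(points, centroids):
    """Mean distance from each point to its nearest centroid (assumed definition)."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


def elbow_k(ks, distortions):
    """Return the k at which the improvement in distortion declines the most."""
    # gains[i] is the improvement obtained by going from ks[i] to ks[i + 1].
    gains = [distortions[i] - distortions[i + 1] for i in range(len(distortions) - 1)]
    # drops[i] is how much the improvement shrinks once we pass ks[i + 1].
    drops = [gains[i] - gains[i + 1] for i in range(len(gains) - 1)]
    best = max(range(len(drops)), key=drops.__getitem__)
    return ks[best + 1]


# Synthetic data (an assumption for illustration): three Gaussian blobs in 2D.
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
    for cx, cy in blob_centers
    for _ in range(30)
]

ks = list(range(1, 7))
distortions = [average_distortion(points, kmeans(points, k)) for k in ks]

for k, d in zip(ks, distortions):
    print(f"k={k}: average distortion = {d:.3f}")
print("elbow at k =", elbow_k(ks, distortions))
```

Because k-means can settle in a local optimum, the curve from a single run may be noisy; in practice one would typically rerun with several seeds, or rely on a library implementation, before reading off the elbow.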
Evaluation of clusters with silhouette coefficient
The silhouette coefficient is a measure of the compactness and separation of the clusters. It is higher for compact clusters that are well separated and lower for overlapping clusters. Its values range from -1 to +1, and higher values indicate better-quality clusters.
The silhouette coefficient is calculated per instance. For a set of instances, it is calculated as the mean of the individual instances' scores.
$$s = \frac{b - a}{\max(a, b)}$$
For a given instance, a is the mean distance between that instance and the other instances in its own cluster, and b is the mean distance between that instance and the instances in the next closest cluster.
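The per-instance calculation and its mean can be sketched as follows. This is a minimal pure-Python illustration, assuming Euclidean distance; the function names `silhouette_samples` and `silhouette_mean` and the toy points are hypothetical, and scoring an instance in a single-member cluster as 0 is a common convention adopted here rather than something stated in the text.

```python
import math


def silhouette_samples(points, labels):
    """Return the silhouette coefficient (b - a) / max(a, b) for every instance."""
    cluster_ids = set(labels)
    if len(cluster_ids) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        # Distances from p to every other instance, grouped by cluster label.
        dists = {c: [] for c in cluster_ids}
        for j, (q, label) in enumerate(zip(points, labels)):
            if j != i:
                dists[label].append(math.dist(p, q))

        if not dists[own_label]:
            scores.append(0.0)  # convention (assumption): singleton cluster scores 0
            continue

        # a: mean distance to the other instances in the same cluster.
        a = sum(dists[own_label]) / len(dists[own_label])
        # b: mean distance to the instances of the next closest cluster.
        b = min(sum(d) / len(d) for c, d in dists.items() if c != own_label and d)

        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_mean(points, labels):
    """Silhouette coefficient of a set of instances: the mean of the per-instance scores."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)


# Toy data (an assumption for illustration): two tight, well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # matches the visible structure
bad_labels = [0, 1, 0, 1]   # mixes the two groups

print("good clustering:", round(silhouette_mean(points, good_labels), 3))
print("bad clustering: ", round(silhouette_mean(points, bad_labels), 3))
```

The labeling that matches the two visible groups scores close to +1, while the labeling that mixes them scores below zero, which is the behaviour the definition describes for compact versus overlapping clusters.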