

YeeKal

Training data without labels is just a set of points. Clustering algorithms group these points, and they fall into three broad families:


  • partitional algorithms
    • k-means clustering
    • mixture-model based clustering
  • hierarchical algorithms
    • bottom-up agglomerative
    • top-down divisive
  • density-based algorithms
    • DBSCAN

hierarchical clustering

Build a tree-based hierarchical clustering of a set of documents.
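As a rough illustration of the bottom-up (agglomerative) variant, the sketch below starts with one cluster per document and repeatedly merges the two closest clusters, recording the merges as a nested tuple. The document names, the 2-D feature vectors, and the choice of single linkage are assumptions made for the example, not part of the notes above.

```python
import math
from itertools import combinations


def single_link(a, b):
    """Single-linkage distance: smallest distance between any two members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(vectors):
    """Merge the two closest clusters until one remains; return the merge tree."""
    # Each cluster is (tree, member_vectors); a leaf's tree is the document name.
    clusters = [(name, [vec]) for name, vec in vectors.items()]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_link(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical documents, already converted to 2-D feature vectors.
docs = {
    "doc_a": (0.0, 0.0),
    "doc_b": (0.0, 1.0),
    "doc_c": (5.0, 5.0),
    "doc_d": (5.0, 6.5),
}
tree = agglomerate(docs)
print(tree)
```

Cutting the resulting tree at different depths gives clusterings of different granularity, which is the main practical difference from a flat partition.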

partitioning algorithms

Construct a partition of n objects into a set of k clusters.

k-means algorithm


  • K (number of clusters)
  • training set $\{x^{(1)}, \cdots, x^{(m)}\}$

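Optimization objective (the standard k-means distortion, minimized over the assignments and the centroids):

$$J = \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^2$$

where $c^{(i)}$ is the index of the cluster that $x^{(i)}$ is assigned to and $\mu_k$ is the centroid of cluster $k$ (written c and u in the pseudocode below).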


randomly initialize K cluster centroids u_1, ..., u_K
repeat until the assignments stop changing {
    for i = 1 to m
        c_i := index of the cluster centroid closest to x_i
    for k = 1 to K
        u_k := mean of the points assigned to cluster k
        // if no points are assigned to cluster k, either drop it (decrease K) or re-initialize its centroid
}
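The loop above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names, the `seed` and `max_iter` parameters, and the toy data are assumptions, and an empty cluster is handled by re-initializing its centroid (one of the two options mentioned above).

```python
import math
import random


def closest(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (centroids, assignments) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Random initialization: pick k training examples as the starting centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: c_i := index of the closest centroid.
        new_assignments = [closest(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: u_k := mean of the points assigned to cluster k.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            # Empty cluster: re-initialize its centroid from a random example.
            centroids[j] = mean(members) if members else rng.choice(points)
    return centroids, assignments


# Toy data (assumed): two small groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

Stopping when the assignments no longer change is one common convergence test; a cap on the number of iterations is kept as a safety net.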

In random initialization, randomly pick K training examples and set the centroids equal to them. Because the result depends on the starting centroids, k-means can be run several times with different random initializations, and the run with the smallest J is chosen.
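A minimal sketch of this restart strategy is shown below, assuming J is the mean squared distance from each point to its nearest centroid. The helper name `kmeans_once`, the number of restarts, and the toy data are illustrative choices; for brevity an empty cluster simply keeps its previous centroid.

```python
import math
import random


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (cost J, centroids)."""
    centroids = rng.sample(points, k)  # K training examples as initial centroids
    for _ in range(max_iter):
        # Assignment step: group every point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    cost = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points) / len(points)
    return cost, centroids


# Toy data (assumed): three loose groups of 2-D points.
points = [
    (1.0, 1.0), (1.5, 1.2), (0.8, 1.4),
    (5.0, 5.0), (5.3, 4.8), (4.9, 5.4),
    (9.0, 1.0), (9.2, 1.3), (8.7, 0.9),
]
rng = random.Random(0)
runs = [kmeans_once(points, 3, rng) for _ in range(10)]
# Keep the run with the smallest cost J.
best_cost, best_centroids = min(runs, key=lambda run: run[0])
print(best_cost, best_centroids)
```

Each run may settle in a different local optimum, so comparing J across runs is what makes the restarts useful.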

To choose the value of K, the elbow method plots J against K and picks the K at the corner (the "elbow") of the curve, where increasing K further no longer reduces J by much.
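The J-K curve can be tabulated with a short script like the one below. It is only a sketch: the `distortion` helper, the restart count, the iteration cap, and the toy data (built with three visible groups) are assumptions, and reading the elbow off the printed values is left to the reader.

```python
import math
import random


def distortion(points, k, restarts=5, seed=0):
    """Smallest cost J found by a few random-restart k-means runs."""
    rng = random.Random(seed)
    best = math.inf
    for _run in range(restarts):
        centroids = rng.sample(points, k)
        for _step in range(100):  # iteration cap (assumed)
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
                groups[nearest].append(p)
            # An empty group keeps its previous centroid (a simplification).
            updated = [
                tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
                for j, g in enumerate(groups)
            ]
            if updated == centroids:
                break
            centroids = updated
        # J: mean squared distance from each point to its nearest centroid.
        cost = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points) / len(points)
        best = min(best, cost)
    return best


points = [
    (1.0, 1.0), (1.5, 1.2), (0.8, 1.4),
    (5.0, 5.0), (5.3, 4.8), (4.9, 5.4),
    (9.0, 1.0), (9.2, 1.3), (8.7, 0.9),
]
costs = {k: distortion(points, k) for k in range(1, 7)}
for k, cost in costs.items():
    print(f"K={k}  J={cost:.3f}")
```

On data with a clear group structure, J typically drops sharply until K reaches the number of groups and flattens afterwards; on real data the corner is often less obvious.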