Operation | Details
public DataClustering()
|
public HierarchicalClustering( IClusterable[] data, int endClusters, int distance):void |
Notes: Hierarchical clustering builds (agglomerative) or breaks up (divisive) a hierarchy of clusters. The traditional representation of this hierarchy is a tree, with individual elements at one end and a single cluster containing every element at the other. Agglomerative algorithms begin at the leaves of the tree (the individual elements) and merge upward, whereas divisive algorithms begin at the root (the single all-inclusive cluster) and split downward. Cutting the tree at a given height gives a clustering at a selected precision.

This method builds the hierarchy from the individual elements by progressively merging clusters. Suppose we have six elements {a}, {b}, {c}, {d}, {e} and {f}. The first step is to determine which elements to merge into a cluster. Usually we want to take the two closest elements, so we must define a distance d(element1, element2) between elements. Suppose we have merged the two closest elements b and c: we now have the clusters {a}, {b, c}, {d}, {e} and {f}, and want to merge them further. To do that, we need the distance between {a} and {b, c}, and therefore must define the distance between two clusters. The distance between two clusters A and B is usually one of the following (a sketch of all three follows below):
- the maximum distance between elements of each cluster (complete linkage)
- the minimum distance between elements of each cluster (single linkage)
- the mean distance between elements of each cluster (average linkage)
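The three cluster-to-cluster distances can be written directly in terms of the element-level distance. The following C# sketch is illustrative only: the `Linkage` class and `Euclidean` helper are hypothetical names, not part of this library, and cluster members are assumed to be `double[]` feature vectors compared with Euclidean distance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Linkage
{
    // Element-level distance d(x, y): Euclidean distance between two vectors.
    static double Euclidean(double[] x, double[] y) =>
        Math.Sqrt(x.Zip(y, (a, b) => (a - b) * (a - b)).Sum());

    // Maximum distance between elements of each cluster (complete linkage).
    public static double Max(List<double[]> a, List<double[]> b) =>
        a.SelectMany(x => b.Select(y => Euclidean(x, y))).Max();

    // Minimum distance between elements of each cluster (single linkage).
    public static double Min(List<double[]> a, List<double[]> b) =>
        a.SelectMany(x => b.Select(y => Euclidean(x, y))).Min();

    // Mean distance between elements of each cluster (average linkage).
    public static double Mean(List<double[]> a, List<double[]> b) =>
        a.SelectMany(x => b.Select(y => Euclidean(x, y))).Average();
}
```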
public KMeanClustering( IClusterable[] data, int endClusters):void |
Notes: The k-means algorithm assigns each point to the cluster whose center (or centroid) is nearest. The centroid is the point obtained by computing the arithmetic mean for each dimension separately over all the points in the cluster.

Example: the data set has three dimensions and the cluster has two points, X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid Z becomes Z = (z1, z2, z3), where z1 = (x1 + y1)/2, z2 = (x2 + y2)/2 and z3 = (x3 + y3)/2.

This is the basic structure of the algorithm (a sketch follows below):
- Randomly generate k clusters and determine the cluster centers, or directly generate k seed points as cluster centers.
- Assign each point to the nearest cluster center.
- Recompute the new cluster centers.
- Repeat until some convergence criterion is met (usually that the assignments no longer change).

The main advantages of this algorithm are its simplicity and speed, which allow it to run on large data sets. However, it does not necessarily yield the same result on each run: the resulting clusters depend on the initial assignments. The algorithm minimizes intra-cluster variance (equivalently, it maximizes inter-cluster variance), but it does not ensure that the result is a global rather than a local minimum of the variance.
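A minimal C# sketch of the loop described above, assuming points are `double[]` vectors; the names (`KMeans`, `Run`, `SquaredDistance`) are illustrative and not this library's actual API.

```csharp
using System;
using System.Linq;

static class KMeans
{
    // Returns, for each point, the index of the cluster it was assigned to.
    public static int[] Run(double[][] points, int k, int maxIterations = 100)
    {
        var rng = new Random();
        // Seed the centers with k randomly chosen points.
        double[][] centers = points.OrderBy(_ => rng.Next()).Take(k)
                                   .Select(p => (double[])p.Clone()).ToArray();
        var assignment = new int[points.Length];

        for (int iter = 0; iter < maxIterations; iter++)
        {
            bool changed = false;

            // Step 1: assign each point to the nearest cluster center.
            for (int i = 0; i < points.Length; i++)
            {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (SquaredDistance(points[i], centers[c]) <
                        SquaredDistance(points[i], centers[best]))
                        best = c;
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // convergence: no assignment changed

            // Step 2: recompute each center as the per-dimension mean of its members.
            for (int c = 0; c < k; c++)
            {
                var members = Enumerable.Range(0, points.Length)
                                        .Where(i => assignment[i] == c).ToArray();
                if (members.Length == 0) continue; // keep the old center if empty
                var mean = new double[points[0].Length];
                foreach (int i in members)
                    for (int d = 0; d < mean.Length; d++) mean[d] += points[i][d];
                for (int d = 0; d < mean.Length; d++) mean[d] /= members.Length;
                centers[c] = mean;
            }
        }
        return assignment;
    }

    // Squared Euclidean distance; the square root is not needed for comparisons.
    static double SquaredDistance(double[] a, double[] b)
    {
        double s = 0;
        for (int d = 0; d < a.Length; d++) { double t = a[d] - b[d]; s += t * t; }
        return s;
    }
}
```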
private Distance( int first, int second, ArrayList clusters, int distance):double
Notes: Calculates the distance between the two clusters at the given indices, using the linkage criterion selected by the distance argument.
@returns the distance between the two clusters
private VectorDistance( double[] a, double[] b):double
Notes: Calculates the Euclidean distance between two vectors.
@returns the distance between the vectors
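For reference, a minimal implementation of the Euclidean distance d(a, b) = sqrt(Σ(a_d − b_d)²) might look like the following; this is an illustrative sketch, not the actual implementation, and it assumes both vectors have the same length.

```csharp
static double VectorDistance(double[] a, double[] b)
{
    double sum = 0.0;
    for (int d = 0; d < a.Length; d++)
    {
        double diff = a[d] - b[d];   // per-dimension difference
        sum += diff * diff;          // accumulate squared differences
    }
    return System.Math.Sqrt(sum);    // square root of the sum of squares
}
```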
private Center( ArrayList vectors):double[]
Notes: Calculates the center (centroid) of a cluster as the per-dimension arithmetic mean of its vectors.
@returns the position of the cluster center
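A minimal sketch of the centroid computation described under KMeanClustering, using double[][] in place of the non-generic ArrayList for clarity; this is illustrative only, not the actual implementation.

```csharp
static double[] Center(double[][] vectors)
{
    int dim = vectors[0].Length;
    var center = new double[dim];       // starts zeroed, as ZeroVector would ensure
    foreach (var v in vectors)
        for (int d = 0; d < dim; d++)
            center[d] += v[d];          // sum each dimension separately
    for (int d = 0; d < dim; d++)
        center[d] /= vectors.Length;    // arithmetic mean per dimension
    return center;
}
```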
private ZeroVector( double[] vector):void
Notes: Sets all elements of the vector to zero.
|