class: DataClustering

public class: DataClustering
Project: Phase: 1.0; Status: Proposed; Version: 1.0; Complexity: 1
Dates: Created: 14.12.2005 23:04:09; Modified: 14.12.2005 23:06:19;
Flags: Active: false; IsRoot: false; IsLeaf: false;
Extension Points:
UUID: {F4EDE0E5-DBBC-43b6-A0F7-CC9EBC8BF327}
DataClustering is a class that provides clustering algorithms.



See also: IClusterable

Appears in: CD Clustering

  • Dependency link to interface [CD Clustering].IClusterable. The DataClustering class clusters only elements that implement the IClusterable interface.
DataClustering Attributes
Attribute Details
  • public static int (Initial: 1)
  • public static int (Initial: 2)
  • public static int (Initial: 3)
  • public static int (Initial: 4)
  • public static int (Initial: 5)
DataClustering Methods
Operation Details
Notes: Constructor
   IClusterable[] data,
   int endClusters,
   int distance):void
Notes: Hierarchical clustering builds (agglomerative) or breaks up (divisive) a hierarchy of clusters. The traditional representation of this hierarchy is a tree, with the individual elements at one end and a single cluster containing every element at the other. Agglomerative algorithms begin with the individual elements and merge upward; divisive algorithms begin with the single all-inclusive cluster and split downward. Cutting the tree at a given height gives a clustering at a selected precision.

This method builds the hierarchy from the individual elements by progressively merging clusters. Suppose we have six elements {a}, {b}, {c}, {d}, {e} and {f}. The first step is to determine which elements to merge into a cluster. Usually we take the two closest elements, so we must define a distance d(element1, element2) between elements. Suppose we have merged the two closest elements b and c; we now have the clusters {a}, {b, c}, {d}, {e} and {f} and want to merge them further. To do that we need the distance between {a} and {b, c}, and therefore a distance between two clusters. Usually the distance between two clusters A and B is one of the following:
  • the maximum distance between elements of each cluster
  • the minimum distance between elements of each cluster
  • the mean distance between elements of each cluster
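The merging process described above can be sketched as follows. This is illustrative code, not the DataClustering implementation: it uses single-linkage (minimum) distance on 1-D points for brevity, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of agglomerative clustering: start with one cluster per element,
// repeatedly merge the two closest clusters until endClusters remain.
public class AgglomerativeSketch {
    public static List<List<Double>> cluster(double[] data, int endClusters) {
        List<List<Double>> clusters = new ArrayList<>();
        for (double d : data) {
            List<Double> c = new ArrayList<>();
            c.add(d);
            clusters.add(c);
        }
        while (clusters.size() > endClusters) {
            int bestA = 0, bestB = 1;
            double best = Double.MAX_VALUE;
            // Find the pair of clusters with the smallest linkage distance.
            for (int i = 0; i < clusters.size(); i++)
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = minDistance(clusters.get(i), clusters.get(j));
                    if (d < best) { best = d; bestA = i; bestB = j; }
                }
            // Merge the closest pair into one cluster.
            clusters.get(bestA).addAll(clusters.remove(bestB));
        }
        return clusters;
    }

    // Single linkage: the minimum distance between any pair of elements.
    static double minDistance(List<Double> a, List<Double> b) {
        double min = Double.MAX_VALUE;
        for (double x : a)
            for (double y : b)
                min = Math.min(min, Math.abs(x - y));
        return min;
    }
}
```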
   IClusterable[] data,
   int endClusters):void
Notes: K-means clustering. The k-means algorithm assigns each point to the cluster whose center (centroid) is nearest. The centroid is the point obtained by computing the arithmetic mean, for each dimension separately, over all the points in the cluster.

Example: the data set has three dimensions and the cluster has two points, X = (x1, x2, x3) and Y = (y1, y2, y3). The centroid Z = (z1, z2, z3) is then given by z1 = (x1 + y1)/2, z2 = (x2 + y2)/2 and z3 = (x3 + y3)/2.

This is the basic structure of the algorithm:
  • Randomly generate k clusters and determine the cluster centers, or directly generate k seed points as cluster centers.
  • Assign each point to the nearest cluster center.
  • Recompute the new cluster centers.
  • Repeat until some convergence criterion is met (usually that the assignments no longer change).

The main advantages of this algorithm are its simplicity and speed, which allow it to run on large data sets. However, it does not necessarily yield the same result on each run: the resulting clusters depend on the initial assignments. The k-means algorithm minimizes intra-cluster variance, but does not guarantee a global minimum; the solution found may be only a local minimum of the variance.
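The loop described in these steps can be sketched as below. The names are illustrative, not the DataClustering API, and for determinism this sketch seeds the k centers with the first k points rather than randomly.

```java
// Sketch of k-means: alternate an assignment step (each point to its nearest
// center) and an update step (each center becomes the per-dimension mean of
// its assigned points) until no assignment changes.
public class KMeansSketch {
    public static int[] cluster(double[][] points, int k) {
        int dims = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++)
            centers[c] = points[c].clone();   // deterministic seeding for the sketch
        int[] assign = new int[points.length];
        java.util.Arrays.fill(assign, -1);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Assignment step: each point goes to its nearest center.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (squaredDistance(points[p], centers[c])
                            < squaredDistance(points[p], centers[best]))
                        best = c;
                if (assign[p] != best) { assign[p] = best; changed = true; }
            }
            // Update step: recompute each center as the mean of its points.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[dims];
                int n = 0;
                for (int p = 0; p < points.length; p++)
                    if (assign[p] == c) {
                        for (int d = 0; d < dims; d++) sum[d] += points[p][d];
                        n++;
                    }
                if (n > 0)
                    for (int d = 0; d < dims; d++) centers[c][d] = sum[d] / n;
            }
        }
        return assign;
    }

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }
}
```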
   int first,
   int second,
   ArrayList clusters,
   int distance):double
Notes: Calculates the distance between two clusters

@returns distance
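The three usual cluster-distance (linkage) choices listed in the hierarchical-clustering notes could look like this. The method name is hypothetical and the clusters are 1-D for brevity; which values the `distance` selector actually takes in DataClustering is an assumption.

```java
// Sketch of the three common linkage distances between two clusters:
// minimum (single), maximum (complete), and mean (average) pairwise distance.
public class LinkageSketch {
    public static double linkage(double[] a, double[] b, int mode) {
        double min = Double.MAX_VALUE, max = 0, sum = 0;
        for (double x : a)
            for (double y : b) {
                double d = Math.abs(x - y);
                min = Math.min(min, d);
                max = Math.max(max, d);
                sum += d;
            }
        switch (mode) {
            case 0:  return min;                          // minimum distance
            case 1:  return max;                          // maximum distance
            default: return sum / (a.length * b.length);  // mean distance
        }
    }
}
```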
   double[] a,
   double[] b):double
Notes: Calculates the Euclidean distance between two vectors

@returns distance
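A minimal sketch of this calculation (class and method names are illustrative): the Euclidean distance is the square root of the sum of squared per-dimension differences.

```java
// Euclidean distance between two equal-length vectors.
public class EuclideanSketch {
    public static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```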
   ArrayList vectors):double
Notes: Calculates the center of a cluster

@returns position of cluster center
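A sketch of the centroid computation, assuming the cluster is an ArrayList of double[] vectors (the names are illustrative, not the DataClustering API): the center is the per-dimension arithmetic mean of all vectors in the cluster.

```java
import java.util.ArrayList;

// Compute the centroid of a cluster of equal-length vectors.
public class CentroidSketch {
    public static double[] center(ArrayList<double[]> vectors) {
        double[] c = new double[vectors.get(0).length];
        for (double[] v : vectors)
            for (int i = 0; i < c.length; i++)
                c[i] += v[i];
        for (int i = 0; i < c.length; i++)
            c[i] /= vectors.size();
        return c;
    }
}
```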
   double[] vector):void
Notes: Sets all elements of the vector to zero