Operation | Details
public DataClustering()
|
public HierarchicalClustering( IClusterable[] data, int endClusters, int distance):void |
Notes: Hierarchical clustering builds (agglomerative) or breaks up (divisive) a hierarchy of clusters. The traditional representation of this hierarchy is a tree, with individual elements at one end and a single cluster containing every element at the other. Agglomerative algorithms begin at the leaves of the tree (the individual elements) and merge upward, whereas divisive algorithms begin at the root (the single all-inclusive cluster) and split downward. Cutting the tree at a given height gives a clustering at a selected precision.

This method builds the hierarchy from the individual elements by progressively merging clusters. Suppose we have six elements {a}, {b}, {c}, {d}, {e} and {f}. The first step is to determine which elements to merge into a cluster. Usually we want to take the two closest elements, so we must define a distance d(element1, element2) between elements. Suppose we have merged the two closest elements b and c: we now have the clusters {a}, {b, c}, {d}, {e} and {f}, and want to merge them further. To do that, we need the distance between {a} and {b, c}, and therefore must define the distance between two clusters. The distance between two clusters A and B is usually one of the following (a sketch of all three follows below):
- the maximum distance between elements of each cluster (complete linkage)
- the minimum distance between elements of each cluster (single linkage)
- the mean distance between elements of each cluster (average linkage)
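The three cluster-to-cluster distances can be written directly in terms of the element-level distance. The following C# sketch is illustrative only: the `Linkage` class and `Euclidean` helper are hypothetical names, not part of this library, and cluster members are assumed to be `double[]` feature vectors compared with Euclidean distance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Linkage
{
    // Element-level distance d(x, y): Euclidean distance between two vectors.
    static double Euclidean(double[] x, double[] y) =>
        Math.Sqrt(x.Zip(y, (a, b) => (a - b) * (a - b)).Sum());

    // Maximum distance between elements of each cluster (complete linkage).
    public static double Max(List<double[]> a, List<double[]> b) =>
        a.SelectMany(x => b.Select(y => Euclidean(x, y))).Max();

    // Minimum distance between elements of each cluster (single linkage).
    public static double Min(List<double[]> a, List<double[]> b) =>
        a.SelectMany(x => b.Select(y => Euclidean(x, y))).Min();

    // Mean distance between elements of each cluster (average linkage).
    public static double Mean(List<double[]> a, List<double[]> b) =>
        a.SelectMany(x => b.Select(y => Euclidean(x, y))).Average();
}
```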
public KMeanClustering( IClusterable[] data, int endClusters):void |
Notes: The k-means algorithm assigns each point to the cluster whose center (or centroid) is nearest. The centroid is the point obtained by computing the arithmetic mean for each dimension separately over all the points in the cluster.

Example: the data set has three dimensions and the cluster has two points, X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid Z becomes Z = (z1, z2, z3), where z1 = (x1 + y1)/2, z2 = (x2 + y2)/2 and z3 = (x3 + y3)/2.

This is the basic structure of the algorithm (a sketch follows below):
- Randomly generate k clusters and determine the cluster centers, or directly generate k seed points as cluster centers.
- Assign each point to the nearest cluster center.
- Recompute the new cluster centers.
- Repeat until some convergence criterion is met (usually that the assignments no longer change).

The main advantages of this algorithm are its simplicity and speed, which allow it to run on large data sets. However, it does not necessarily yield the same result on each run: the resulting clusters depend on the initial assignments. The algorithm minimizes intra-cluster variance (equivalently, it maximizes inter-cluster variance), but it does not ensure that the result is a global rather than a local minimum of the variance.
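A minimal C# sketch of the loop described above, assuming points are `double[]` vectors; the names (`KMeans`, `Run`, `SquaredDistance`) are illustrative and not this library's actual API.

```csharp
using System;
using System.Linq;

static class KMeans
{
    // Returns, for each point, the index of the cluster it was assigned to.
    public static int[] Run(double[][] points, int k, int maxIterations = 100)
    {
        var rng = new Random();
        // Seed the centers with k randomly chosen points.
        double[][] centers = points.OrderBy(_ => rng.Next()).Take(k)
                                   .Select(p => (double[])p.Clone()).ToArray();
        var assignment = new int[points.Length];

        for (int iter = 0; iter < maxIterations; iter++)
        {
            bool changed = false;

            // Step 1: assign each point to the nearest cluster center.
            for (int i = 0; i < points.Length; i++)
            {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (SquaredDistance(points[i], centers[c]) <
                        SquaredDistance(points[i], centers[best]))
                        best = c;
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // convergence: no assignment changed

            // Step 2: recompute each center as the per-dimension mean of its members.
            for (int c = 0; c < k; c++)
            {
                var members = Enumerable.Range(0, points.Length)
                                        .Where(i => assignment[i] == c).ToArray();
                if (members.Length == 0) continue; // keep the old center if empty
                var mean = new double[points[0].Length];
                foreach (int i in members)
                    for (int d = 0; d < mean.Length; d++) mean[d] += points[i][d];
                for (int d = 0; d < mean.Length; d++) mean[d] /= members.Length;
                centers[c] = mean;
            }
        }
        return assignment;
    }

    // Squared Euclidean distance; the square root is not needed for comparisons.
    static double SquaredDistance(double[] a, double[] b)
    {
        double s = 0;
        for (int d = 0; d < a.Length; d++) { double t = a[d] - b[d]; s += t * t; }
        return s;
    }
}
```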
private Distance( int first, int second, ArrayList clusters, int distance):double
Notes: Calculates the distance between the two clusters at the given indices, using the linkage criterion selected by the distance argument.
@returns the distance between the two clusters
private VectorDistance( double[] a, double[] b):double
Notes: Calculates the Euclidean distance between two vectors.
@returns the distance between the vectors
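For reference, a minimal implementation of the Euclidean distance d(a, b) = sqrt(Σ(a_d − b_d)²) might look like the following; this is an illustrative sketch, not the actual implementation, and it assumes both vectors have the same length.

```csharp
static double VectorDistance(double[] a, double[] b)
{
    double sum = 0.0;
    for (int d = 0; d < a.Length; d++)
    {
        double diff = a[d] - b[d];   // per-dimension difference
        sum += diff * diff;          // accumulate squared differences
    }
    return System.Math.Sqrt(sum);    // square root of the sum of squares
}
```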
private Center( ArrayList vectors):double[]
Notes: Calculates the center (centroid) of a cluster as the per-dimension arithmetic mean of its vectors.
@returns the position of the cluster center
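A minimal sketch of the centroid computation described under KMeanClustering, using double[][] in place of the non-generic ArrayList for clarity; this is illustrative only, not the actual implementation.

```csharp
static double[] Center(double[][] vectors)
{
    int dim = vectors[0].Length;
    var center = new double[dim];       // starts zeroed, as ZeroVector would ensure
    foreach (var v in vectors)
        for (int d = 0; d < dim; d++)
            center[d] += v[d];          // sum each dimension separately
    for (int d = 0; d < dim; d++)
        center[d] /= vectors.Length;    // arithmetic mean per dimension
    return center;
}
```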
private ZeroVector( double[] vector):void
Notes: Sets all elements of the vector to zero.
|