This class provides an implementation of the CLARA (Clustering for Large Applications) algorithm. More...

Public Member Functions
_.Library.Boolean	IsPrepared ()
	Checks whether the model is ready for an analysis to be executed. More...

Public Member Functions inherited from PAM
_.Library.Double	ClusterCost (_.Library.Integer k)
	This class provides an implementation of the Partitioning Around Medoids (PAM) algorithm, a.k.a. More...

Public Member Functions inherited from AbstractModel
_.Library.Integer	ById (_.Library.RawString id)
	Returns the ordinal number of the point with the given ID id. More...

_.Library.Double	Distance (_.Library.Integer i, _.Library.Integer j, _.Library.Double p, _.Library.Boolean normalize)
	Returns the dissimilarity measure between two data points of the model. More...

_.Library.Double	Distance1 (_.Library.Integer i, z, _.Library.Double p, _.Library.Boolean normalize)
	Returns the dissimilarity measure between a data point of the model and a point with given coordinates. More...

_.Library.Double	Distance12 (z1, z2, _.Library.Double p, _.Library.Boolean normalize)
	Returns the dissimilarity measure between two points with given coordinates. More...

_.DeepSee.extensions.clusters.ASW	GetASWIndex ()
	Returns an object that can calculate an index used in Cluster Validation. More...

_.DeepSee.extensions.clusters.CalinskiHarabasz	GetCalinskiHarabaszIndex (_.Library.Integer normalize)
	Returns an object that can calculate an index used in Cluster Validation. More...

	GetCentroid (_.Library.Integer k, z)
	Returns the coordinates for the centroid for a given cluster. More...

_.Library.Integer	GetCluster (_.Library.Integer point)
	Returns the cluster ordinal for a given point. More...

	GetClusterSize (_.Library.Integer k)
	Returns the number of data points assigned to a given cluster. More...

_.Library.Integer	GetCost (_.Library.Integer i, _.Library.Integer j)
	Returns the dissimilarity measure as used by this clustering algorithm. More...

_.Library.Integer	GetCount ()
	Returns the number of all data points in the model.

_.Library.Integer	GetDimensions ()
	Returns the dimensionality of the model.

_.Library.String	GetId (_.Library.Integer i)
	Returns the unique Id of the point with the ordinal number specified by i. More...

_.Library.Integer	GetNumberOfClusters ()
	Returns the number of clusters in the model.

_.DeepSee.extensions.clusters.PearsonGamma	GetPearsonGammaIndex ()
	Returns an object that can calculate an index used in Cluster Validation. More...

	GlobalCentroid (z)
	Returns the coordinates for the centroid for the whole dataset. More...

_.Library.Double	RelativeClusterCost (_.Library.Integer k, _.Library.Integer m)
	Returns the relative cost of a given cluster with respect to a medoid point m. More...

	Reset ()
	Kills all the data associated with this model.

_.Library.Status	SetData (_.Library.IResultSet rs, _.Library.Integer dim, _.Library.Double nullReplacement)
	Sets the data to be associated with this model. More...

	iterateCluster (_.Library.Integer k, _.Library.Integer i, _.Library.String id, coordinates)
	Iterates over all the data points assigned to a given cluster. More...

	printAll ()
	Convenience method. More...

	printCluster (_.Library.Integer k)
	Convenience method. More...

Public Member Functions inherited from RegisteredObject
_.Library.Status	OnAddToSaveSet (_.Library.Integer depth, _.Library.Integer insert, _.Library.Integer callcount)
	This callback method is invoked when the current object is added to the SaveSet,. More...

_.Library.Status	OnClose ()
	This callback method is invoked by the Close method to. More...

_.Library.Status	OnConstructClone (_.Library.RegisteredObject object, _.Library.Boolean deep, _.Library.String cloned)
	This callback method is invoked by the ConstructClone method to. More...

_.Library.Status	OnNew ()
	This callback method is invoked by the New method to. More...

_.Library.Status	OnValidateObject ()
	This callback method is invoked by the ValidateObject method to. More...

Public Attributes
	CacheCost
	Unused in current implementation. More...

	NIdle
	A minimum number of idle iterations (i.e. iterations that do not improve the total cost). More...

	SampleSize
	Sample Size to use for one PAM run. More...

	Treshold
	Threshold to determine when to stop. More...

	UseSA
	Whether to use Simulated Annealing in each PAM run for a sample (not recommended). More...

Public Attributes inherited from PAM
	K
	The number of clusters to create. More...

Public Attributes inherited from AbstractModel
	DSName
	More...

	Dim
	More...

	Normalize
	Whether to normalize distance across multiple dimensions. More...

	P
	The power to use in calculation of dissimilarity. More...

	Verbose
	More...

Additional Inherited Members
Static Public Member Functions inherited from AbstractModel
_.Library.Status	Delete (_.Library.String dataset)
	Deletes a model for a dataset with the name given by dataset argument.

_.Library.Boolean	Exists (_.Library.String dataset)
	Checks whether a model for a dataset with the name given by dataset argument already exists.

Static Public Attributes inherited from RegisteredObject
	CAPTION = None
	Optional name used by the Form Wizard for a class when generating forms. More...

	JAVATYPE = None
	The Java type to be used when exported.

	PROPERTYVALIDATION = None
	This parameter controls the default validation behavior for the object. More...

Detailed Description

This class provides an implementation of the CLARA (Clustering for Large Applications) algorithm.

An obvious way of clustering larger datasets is to extend existing methods so that they can cope with a larger number of objects. The focus is on clustering large numbers of objects rather than a small number of objects in high dimensions. Kaufman and Rousseeuw (1990) suggested the CLARA (Clustering for Large Applications) algorithm for tackling large applications. CLARA extends their k-medoids approach to a large number of objects. It works by clustering a sample from the dataset and then assigning all objects in the dataset to these clusters.

CLARA relies on sampling to handle large data sets. Instead of finding medoids for the entire data set, CLARA draws a small sample from the data set and applies the PAM algorithm to generate an optimal set of medoids for the sample. The quality of the resulting medoids is measured by the average dissimilarity between every object in the entire data set D and the medoid of its cluster.
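
The quality measure described above can be written out as follows. The notation is an assumption introduced here for illustration and does not come from the class itself: M is the set of medoids found on the sample, d is the dissimilarity measure, and n is the number of objects in D. Each object is assumed to belong to the cluster of its nearest medoid.

```latex
\mathrm{Cost}(M, D) = \frac{1}{n} \sum_{o \in D} \min_{m \in M} d(o, m), \qquad n = |D|
```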

To alleviate sampling bias, CLARA repeats the sampling and clustering process a pre-defined number of times and subsequently selects as the final clustering result the set of medoids with the minimal cost.
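
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration of the general CLARA idea, not the code of this class: the function names (`minkowski`, `average_cost`, `pam`, `clara`) and parameters (`sample_size`, `n_samples`, `seed`) are hypothetical, and the `pam` helper is a simplified swap search rather than the full PAM algorithm.

```python
import random


def minkowski(a, b, p=2.0):
    """Minkowski dissimilarity between two coordinate tuples (assumed measure)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def average_cost(points, medoids, dist):
    """Average dissimilarity between every point and its nearest medoid."""
    return sum(min(dist(pt, m) for m in medoids) for pt in points) / len(points)


def pam(points, k, dist, rng):
    """Simplified PAM-style search: start from random medoids, swap while the cost drops."""
    medoids = rng.sample(points, k)
    best = average_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = average_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, sample_size, n_samples, seed=0, dist=minkowski):
    """Repeat sample -> PAM -> full-data cost; keep the medoids with minimal cost."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, dist, rng)
        # The cost is evaluated on the entire data set, not only on the sample.
        cost = average_cost(points, medoids, dist)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Assign every object in the data set to its nearest final medoid.
    labels = [min(range(k), key=lambda j: dist(pt, best_medoids[j])) for pt in points]
    return best_medoids, labels, best_cost
```

Calling `clara(points, k=2, sample_size=6, n_samples=5)` on a list of coordinate tuples returns the chosen medoids, a cluster ordinal for each point, and the average cost over the full data set. The sample size must be at least k for the sketch to work.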

Member Function Documentation

◆ IsPrepared()

_.Library.Boolean IsPrepared ( )

Checks whether the model is ready for an analysis to be executed.

This depends on the specific algorithm, so the method is overridden by subclasses.

Reimplemented from PAM.

Member Data Documentation

◆ CacheCost

CacheCost

Unused in current implementation.

◆ NIdle

NIdle

A minimum number of idle iterations (i.e. iterations that do not improve the total cost).
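
One plausible reading of an idle-iteration count is a stopping rule that gives up once a given number of consecutive iterations fail to improve the best cost seen so far. The sketch below illustrates that reading only; the function `run_until_idle` and its parameters are hypothetical, and the class may combine NIdle with Treshold in a way not shown here.

```python
def run_until_idle(candidate_costs, n_idle):
    """Consume costs in order; stop after n_idle consecutive non-improving iterations.

    Returns (best_cost, iterations_used). Assumed semantics, for illustration only.
    """
    best = float("inf")
    idle = 0
    used = 0
    for cost in candidate_costs:
        used += 1
        if cost < best:
            # An improving iteration resets the idle counter.
            best, idle = cost, 0
        else:
            idle += 1
            if idle >= n_idle:
                break
    return best, used
```

With `n_idle=3`, the sequence of costs 5, 4, 4.5, 4.2, 3, 3.1, 3.2, 3.3, 1 stops after the eighth value, so the final cost of 1 is never examined. A larger idle count trades run time for a better chance of finding a lower cost.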

◆ SampleSize

SampleSize

Sample Size to use for one PAM run.

◆ Treshold

Treshold

Threshold to determine when to stop.

◆ UseSA

UseSA

Whether to use Simulated Annealing in each PAM run for a sample (not recommended).
