IRISLIB database
CLARA Class Reference

This class provides an implementation of the CLARA (Clustering for Large Applications) algorithm.


Public Member Functions

_.Library.Boolean IsPrepared ()
 Checks whether the model is ready for an analysis to be executed.
 
- Public Member Functions inherited from PAM
_.Library.Double ClusterCost (_.Library.Integer k)
 This class provides an implementation of the Partitioning Around Medoids (PAM) algorithm, a.k.a. ...
 
- Public Member Functions inherited from AbstractModel
_.Library.Integer ById (_.Library.RawString id)
 Returns the ordinal number of the point with the given ID id.
 
_.Library.Double Distance (_.Library.Integer i, _.Library.Integer j, _.Library.Double p, _.Library.Boolean normalize)
 Returns the dissimilarity measure between two data points of the model.
 
_.Library.Double Distance1 (_.Library.Integer i, z, _.Library.Double p, _.Library.Boolean normalize)
 Returns the dissimilarity measure between a data point of the model and a point with the given coordinates.
 
_.Library.Double Distance12 (z1, z2, _.Library.Double p, _.Library.Boolean normalize)
 Returns the dissimilarity measure between two points with the given coordinates.
 
_.DeepSee.extensions.clusters.ASW GetASWIndex ()
 Returns an object that can calculate an index used in Cluster Validation.
 
_.DeepSee.extensions.clusters.CalinskiHarabasz GetCalinskiHarabaszIndex (_.Library.Integer normalize)
 Returns an object that can calculate an index used in Cluster Validation.
 
 GetCentroid (_.Library.Integer k, z)
 Returns the coordinates of the centroid of a given cluster.
 
_.Library.Integer GetCluster (_.Library.Integer point)
 Returns the cluster ordinal for a given point.
 
 GetClusterSize (_.Library.Integer k)
 Returns the number of data points assigned to a given cluster.
 
_.Library.Integer GetCost (_.Library.Integer i, _.Library.Integer j)
 Returns the dissimilarity measure as used by this clustering algorithm.
 
_.Library.Integer GetCount ()
 Returns the number of all data points in the model.
 
_.Library.Integer GetDimensions ()
 Returns the dimensionality of the model.
 
_.Library.String GetId (_.Library.Integer i)
 Returns the unique ID of the point with the ordinal number specified by i.
 
_.Library.Integer GetNumberOfClusters ()
 Returns the number of clusters in the model.
 
_.DeepSee.extensions.clusters.PearsonGamma GetPearsonGammaIndex ()
 Returns an object that can calculate an index used in Cluster Validation.
 
 GlobalCentroid (z)
 Returns the coordinates of the centroid of the whole dataset.
 
_.Library.Double RelativeClusterCost (_.Library.Integer k, _.Library.Integer m)
 Returns the cost of a given cluster relative to a medoid point m.
 
 Reset ()
 Kills all the data associated with this model.
 
_.Library.Status SetData (_.Library.IResultSet rs, _.Library.Integer dim, _.Library.Double nullReplacement)
 Sets the data to be associated with this model.
 
 iterateCluster (_.Library.Integer k, _.Library.Integer i, _.Library.String id, coordinates)
 Iterates over all the data points assigned to a given cluster.
 
 printAll ()
 Convenience method.
 
 printCluster (_.Library.Integer k)
 Convenience method.
 
- Public Member Functions inherited from RegisteredObject
_.Library.Status OnAddToSaveSet (_.Library.Integer depth, _.Library.Integer insert, _.Library.Integer callcount)
 This callback method is invoked when the current object is added to the SaveSet, ...
 
_.Library.Status OnClose ()
 This callback method is invoked by the Close method to ...
 
_.Library.Status OnConstructClone (_.Library.RegisteredObject object, _.Library.Boolean deep, _.Library.String cloned)
 This callback method is invoked by the ConstructClone method to ...
 
_.Library.Status OnNew ()
 This callback method is invoked by the New method to ...
 
_.Library.Status OnValidateObject ()
 This callback method is invoked by the ValidateObject method to ...
 

Public Attributes

 CacheCost
 Unused in the current implementation.
 
 NIdle
 A minimum number of idle iterations (i.e. iterations that do not improve the total cost).
 
 SampleSize
 Sample size to use for one PAM run.
 
 Treshold
 Threshold to determine when to stop.
 
 UseSA
 Whether to use Simulated Annealing in each PAM run for a sample (not recommended).
 
- Public Attributes inherited from PAM
 K
 The number of clusters to create.
 
- Public Attributes inherited from AbstractModel
 DSName
 
 Dim
 
 Normalize
 Whether to normalize distance across multiple dimensions.
 
 P
 The power to use in the calculation of dissimilarity.
 
 Verbose
 

Additional Inherited Members

- Static Public Member Functions inherited from AbstractModel
_.Library.Status Delete (_.Library.String dataset)
 Deletes a model for a dataset with the name given by dataset argument.
 
_.Library.Boolean Exists (_.Library.String dataset)
 Checks whether a model for a dataset with the name given by dataset argument already exists.
 
- Static Public Attributes inherited from RegisteredObject
 CAPTION = None
 Optional name used by the Form Wizard for a class when generating forms.
 
 JAVATYPE = None
 The Java type to be used when exported.
 
 PROPERTYVALIDATION = None
 This parameter controls the default validation behavior for the object.
 

Detailed Description

This class provides an implementation of the CLARA (Clustering for Large Applications) algorithm.

An obvious way of clustering larger datasets is to extend existing methods so that they can cope with a larger number of objects. The focus is on clustering large numbers of objects rather than a small number of objects in high dimensions. Kaufman and Rousseeuw (1990) suggested the CLARA algorithm for tackling large applications. CLARA extends their k-medoids approach to a large number of objects. It works by clustering a sample from the dataset and then assigning all objects in the dataset to these clusters.

CLARA relies on sampling to handle large data sets. Instead of finding medoids for the entire data set, CLARA draws a small sample from the data set and applies the PAM algorithm to generate an optimal set of medoids for the sample. The quality of the resulting medoids is measured by the average dissimilarity between every object in the entire data set D and the medoid of its cluster.

To alleviate sampling bias, CLARA repeats the sampling and clustering process a pre-defined number of times and subsequently selects as the final clustering result the set of medoids with the minimal cost.
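The sampling scheme described above can be illustrated with a short, language-neutral sketch in Python. This is not the IRISLIB implementation: the function names (minkowski, total_cost, pam, clara) and the parameters n_samples and seed are hypothetical and chosen only for illustration, the inner PAM step is a simple greedy swap search, and a fixed number of samples stands in for whatever stopping rule the class applies through NIdle and Treshold.

```python
import random


def minkowski(a, b, p=2.0):
    """Minkowski dissimilarity of power p between two coordinate tuples."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def total_cost(points, medoids, dist):
    """Average dissimilarity between every point and its nearest medoid."""
    return sum(min(dist(pt, points[m]) for m in medoids) for pt in points) / len(points)


def pam(points, indices, k, dist):
    """Greedy swap-based k-medoids restricted to the sampled indices."""
    medoids = list(indices[:k])

    def cost(meds):
        return sum(min(dist(points[i], points[m]) for m in meds) for i in indices)

    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in indices:
                if cand in medoids:
                    continue
                # Try replacing one medoid with a non-medoid sample point.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(trial)
                if c < best:
                    medoids, best, improved = trial, c, True
    return medoids


def clara(points, k, sample_size, n_samples=5, seed=0, dist=minkowski):
    """Run PAM on several random samples; keep the medoids with minimal cost
    measured over the entire data set, then assign every point to a cluster."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(range(len(points)), min(sample_size, len(points)))
        medoids = pam(points, sample, k, dist)
        c = total_cost(points, medoids, dist)  # quality judged on all points
        if c < best_cost:
            best_medoids, best_cost = medoids, c
    labels = [
        min(range(k), key=lambda j: dist(pt, points[best_medoids[j]]))
        for pt in points
    ]
    return best_medoids, labels, best_cost
```

In this sketch, sample_size plays the role of the SampleSize attribute and k the role of K. Each PAM run only ever compares points within the sample, which is what keeps the cost manageable for large data sets, while the candidate medoids are always scored against the full data set.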

Member Function Documentation

◆ IsPrepared()

_.Library.Boolean IsPrepared ( )

Checks whether the model is ready for an analysis to be executed.

This depends on the specific algorithm, and therefore this method is overridden by subclasses.

Reimplemented from PAM.

Member Data Documentation

◆ CacheCost

CacheCost

Unused in the current implementation.

 

◆ NIdle

NIdle

A minimum number of idle iterations (i.e. iterations that do not improve the total cost).

 

◆ SampleSize

SampleSize

Sample size to use for one PAM run.

 

◆ Treshold

Treshold

Threshold to determine when to stop.

 

◆ UseSA

UseSA

Whether to use Simulated Annealing in each PAM run for a sample (not recommended).