This class provides an implementation of the CLARA (Clustering for Large Applications) algorithm. More...
Public Member Functions | |
_.Library.Boolean | IsPrepared () |
Checks whether the model is ready for an analysis to be executed. More... | |
_.Library.Double | ClusterCost (_.Library.Integer k) |
This class provides an implementation of the Partitioning Around Medoids (PAM) algorithm, a.k.a. More... | |
_.Library.Integer | ById (_.Library.RawString id) |
Returns the ordinal number of the point with the given ID id. More... | |
_.Library.Double | Distance (_.Library.Integer i, _.Library.Integer j, _.Library.Double p, _.Library.Boolean normalize) |
Returns the dissimilarity measure between two data points of the model. More... | |
_.Library.Double | Distance1 (_.Library.Integer i, z, _.Library.Double p, _.Library.Boolean normalize) |
Returns the dissimilarity measure between a data point of the model and a point with given coordinates. More... | |
_.Library.Double | Distance12 (z1, z2, _.Library.Double p, _.Library.Boolean normalize) |
Returns the dissimilarity measure between two points with given coordinates. More... | |
_.DeepSee.extensions.clusters.ASW | GetASWIndex () |
Returns an object that can calculate an index used in Cluster Validation. More... | |
_.DeepSee.extensions.clusters.CalinskiHarabasz | GetCalinskiHarabaszIndex (_.Library.Integer normalize) |
Returns an object that can calculate an index used in Cluster Validation. More... | |
GetCentroid (_.Library.Integer k, z) | |
Returns the coordinates for the centroid for a given cluster. More... | |
_.Library.Integer | GetCluster (_.Library.Integer point) |
Returns the cluster ordinal for a given point. More... | |
GetClusterSize (_.Library.Integer k) | |
Returns the number of data points assigned to a given cluster. More... | |
_.Library.Integer | GetCost (_.Library.Integer i, _.Library.Integer j) |
Returns the dissimilarity measure as used by this clustering algorithm. More... | |
_.Library.Integer | GetCount () |
Returns the number of all data points in the model. | |
_.Library.Integer | GetDimensions () |
Returns the dimensionality of the model. | |
_.Library.String | GetId (_.Library.Integer i) |
Returns the unique ID of the point with the ordinal number specified by i. More... | |
_.Library.Integer | GetNumberOfClusters () |
Returns the number of clusters in the model. | |
_.DeepSee.extensions.clusters.PearsonGamma | GetPearsonGammaIndex () |
Returns an object that can calculate an index used in Cluster Validation. More... | |
GlobalCentroid (z) | |
Returns the coordinates for the centroid for the whole dataset. More... | |
_.Library.Double | RelativeClusterCost (_.Library.Integer k, _.Library.Integer m) |
Returns the cost of a given cluster relative to a medoid point m. More... | |
Reset () | |
Kills all the data associated with this model. | |
_.Library.Status | SetData (_.Library.IResultSet rs, _.Library.Integer dim, _.Library.Double nullReplacement) |
Sets the data to be associated with this model. More... | |
iterateCluster (_.Library.Integer k, _.Library.Integer i, _.Library.String id, coordinates) | |
Iterates over all the data points assigned to a given cluster. More... | |
printAll () | |
Convenience method. More... | |
printCluster (_.Library.Integer k) | |
Convenience method. More... | |
_.Library.Status | OnAddToSaveSet (_.Library.Integer depth, _.Library.Integer insert, _.Library.Integer callcount) |
This callback method is invoked when the current object is added to the SaveSet,. More... | |
_.Library.Status | OnClose () |
This callback method is invoked by the Close method to. More... | |
_.Library.Status | OnConstructClone (_.Library.RegisteredObject object, _.Library.Boolean deep, _.Library.String cloned) |
This callback method is invoked by the ConstructClone method to. More... | |
_.Library.Status | OnNew () |
This callback method is invoked by the New method to. More... | |
_.Library.Status | OnValidateObject () |
This callback method is invoked by the ValidateObject method to. More... | |
Public Attributes | |
CacheCost | |
Unused in current implementation. More... | |
NIdle | |
A minimum number of idle iterations (i.e. iterations that do not improve the total cost). More... | |
SampleSize | |
Sample size to use for one PAM run. More... | |
Treshold | |
Threshold to determine when to stop. More... | |
UseSA | |
Whether to use Simulated Annealing in each PAM run for a sample (not recommended). More... | |
K | |
The number of clusters to create. More... | |
DSName | |
More... | |
Dim | |
More... | |
Normalize | |
Whether to normalize distance across multiple dimensions. More... | |
P | |
The power to use in calculation of dissimilarity. More... | |
Verbose | |
More... | |
Additional Inherited Members | |
_.Library.Status | Delete (_.Library.String dataset) |
Deletes a model for a dataset with the name given by dataset argument. | |
_.Library.Boolean | Exists (_.Library.String dataset) |
Checks whether a model for a dataset with the name given by dataset argument already exists. | |
CAPTION = None | |
Optional name used by the Form Wizard for a class when generating forms. More... | |
JAVATYPE = None | |
The Java type to be used when exported. | |
PROPERTYVALIDATION = None | |
This parameter controls the default validation behavior for the object. More... | |
This class provides an implementation of the CLARA (Clustering for Large Applications) algorithm.
An obvious way of clustering larger datasets is to extend existing methods so that they can cope with a larger number of objects. The focus is on clustering large numbers of objects rather than a small number of objects in high dimensions. Kaufman and Rousseeuw (1990) suggested the CLARA (Clustering for Large Applications) algorithm for tackling large applications. CLARA extends their k-medoids approach to a large number of objects. It works by clustering a sample from the dataset and then assigning all objects in the dataset to these clusters.
CLARA relies on sampling to handle large data sets. Instead of finding medoids for the entire data set, CLARA draws a small sample from the data set and applies the PAM algorithm to generate an optimal set of medoids for the sample. The quality of the resulting medoids is measured by the average dissimilarity between every object in the entire data set D and the medoid of its cluster.
To alleviate sampling bias, CLARA repeats the sampling and clustering process a pre-defined number of times and subsequently selects as the final clustering result the set of medoids with the minimal cost.
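The procedure described above can be illustrated with a minimal Python sketch. This is not the implementation used by this class: the function names (`minkowski`, `average_cost`, `pam`, `clara`), the fixed `n_samples` repetition count, and the greedy swap loop standing in for PAM are all assumptions made for illustration. The sketch draws several random samples, runs a simplified PAM on each, scores every candidate set of medoids by the average dissimilarity over the entire data set, and keeps the cheapest one.

```python
import random


def minkowski(a, b, p=2.0):
    """Minkowski dissimilarity of order p between two coordinate tuples."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def average_cost(points, medoids, p=2.0):
    """Average dissimilarity between each point and its nearest medoid."""
    total = sum(min(minkowski(pt, m, p) for m in medoids) for pt in points)
    return total / len(points)


def pam(sample, k, p=2.0):
    """Simplified PAM (illustrative): swap medoids with non-medoids
    until no single swap lowers the cost on the sample."""
    medoids = list(sample[:k])  # the sample is already in random order
    best = average_cost(sample, medoids, p)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = average_cost(sample, trial, p)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, sample_size, n_samples=5, p=2.0, seed=0):
    """Hypothetical CLARA driver: repeat sampling + PAM, keep the best medoids."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, p)
        # Quality is measured on the entire data set, not only on the sample.
        cost = average_cost(points, medoids, p)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Assign every object in the data set to the cluster of its nearest medoid.
    labels = [
        min(range(k), key=lambda c: minkowski(pt, best_medoids[c], p))
        for pt in points
    ]
    return best_medoids, labels, best_cost


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    medoids, labels, cost = clara(data, k=2, sample_size=6)
    print(medoids, labels, cost)
```

In this sketch the `sample_size` and `p` arguments play roles loosely analogous to the SampleSize and P attributes listed on this page. The fixed repetition count is a simplification; a stopping rule based on idle iterations and a threshold, as the NIdle and Treshold attributes suggest, would replace the `for` loop in `clara`.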
_.Library.Boolean IsPrepared | ( | ) |
Checks whether the model is ready for an analysis to be executed.
This depends on the specific algorithm, and therefore this method is overridden by subclasses.
Reimplemented from PAM.
CacheCost |
Unused in current implementation.
NIdle |
A minimum number of idle iterations (i.e. iterations that do not improve the total cost).
SampleSize |
Sample size to use for one PAM run.
Treshold |
Threshold to determine when to stop.
UseSA |
Whether to use Simulated Annealing in each PAM run for a sample (not recommended).