IRISLIB database
Optimizer Class Reference

Public Member Functions

_.Library.Status OnClose ()
 This callback method is invoked by the <METHOD>Close</METHOD> method to provide notification that the current object is being closed. More...
 
_.Library.Status AddTerms (_.Library.Integer pCount, _.Library.Boolean pAtEnd)
 
_.Library.Status Cleanup ()
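 This method clears the temporary artifacts the optimizer has created while optimizing, such as the <property>CurrentClassifier</property> class and <property>CurrentTestId</property> test results. More...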
 
_.Library.Status Initialize ()
 Initializes this Optimizer instance. More...
 
_.Library.Status LoadTermsArray (pTerms, _.Library.Integer pListIndex)
 Loads all terms from the supplied array. More...
 
_.Library.Status LoadTermsSQL (_.Library.String pSQL)
 Loads a list of candidate terms based on a SQL query. More...
 
_.Library.Status Optimize (_.Library.Integer pMaxSteps)
 
_.Library.Status RemoveTerms (_.Library.Integer pCount)
 
_.Library.Status SaveClassifier (_.Library.String pClassName, _.Library.Boolean pOverwrite)
 Saves the <property>CurrentClassifier</property> class to the desired pClassName, so it will not be removed after this Optimizer instance is dropped. More...
 
- Public Member Functions inherited from RegisteredObject
_.Library.Status OnAddToSaveSet (_.Library.Integer depth, _.Library.Integer insert, _.Library.Integer callcount)
 This callback method is invoked when the current object is added to the SaveSet,. More...
 
_.Library.Status OnConstructClone (_.Library.RegisteredObject object, _.Library.Boolean deep, _.Library.String cloned)
 This callback method is invoked by the <METHOD>ConstructClone</METHOD> method to. More...
 
_.Library.Status OnNew ()
 This callback method is invoked by the <METHOD>New</METHOD> method to. More...
 
_.Library.Status OnValidateObject ()
 This callback method is invoked by the <METHOD>ValidateObject</METHOD> method to. More...
 

Public Attributes

 AddCount
 The number of terms to add during an <method>AddTerms</method> cycle. More...
 
 AddWindowSize
 The number of terms to test in each round. More...
 
 Builder
 The builder object to be optimized. More...
 
 CategoryWeights
 If <property>ScoreMetric</property> is set to a 'Weighted*' value, the weights for each category are retrieved from this array, indexed by category name. More...
 
 CurrentClassifier
 The class name of the current "best" classifier. More...
 
 CurrentScore
 The score of the current classifier. More...
 
 CurrentTestId
 The key to <class>DeepSee.PMML.Utils.TempResult</class> for the test results of <property>CurrentClassifier</property>. More...
 
 DomainId
 The domain using which the categorization model is being trained and tested. More...
 
 MaximalScoreDecrease
 The maximal decrease in performance the optimizer should accept when trying to remove terms. More...
 
 MetadataField
 The metadata field containing the actual category values to compare predictions against. More...
 
 MinimalScoreIncrease
 The minimal score increase % a term should ensure to be retained for further testing. More...
 
 RemoveCount
 The number of terms to remove in a "remove" cycle. More...
 
 RemoveStepRatio
 This should be a value between 0 and 1 (inclusive). More...
 
 ScoreMetric
 The default accuracy metric to use for evaluating test results, as used by <method>RankScores</method>. More...
 
 TestSet
 The test set to validate model accuracy increases/decreases against. More...
 
 Verbose
 If set to a boolean value, defines whether or not to write output to the current device during the <method>Optimize</method> method. More...
 

Private Member Functions

_.Library.Status __ClearTestInfo (_.Library.Integer pJobNumber, _.Library.Boolean pDropTestResults, _.Library.Boolean pDropTestClass)
 Clears internal and generated artifacts for one particular test.
 
_.Library.Status __RankScores (pJobInfo, pRanked, pNoScore)
 

Additional Inherited Members

- Static Public Attributes inherited from RegisteredObject
 CAPTION = None
 Optional name used by the Form Wizard for a class when generating forms. More...
 
 JAVATYPE = None
 The Java type to be used when exported.
 
 PROPERTYVALIDATION = None
 This parameter controls the default validation behavior for the object. More...
 

Detailed Description

This class automates selecting "appropriate" terms for a <class>iKnow.Classification.Builder</class>. After pointing an Optimizer instance to the Builder object that needs optimization, use the <method>LoadTermsArray</method> and <method>LoadTermsSQL</method> methods to queue a large number of potentially interesting terms the Optimizer should test. Then invoke its <method>Optimize</method> method to let the Optimizer loop through the list of suggested terms automatically and add those terms having the highest positive impact on model accuracy (as measured according to <property>ScoreMetric</property>), removing terms that were already added to the model but turn out to have no significant positive impact on the model's accuracy.

See the individual property descriptions for their impact on the optimization process.

Member Function Documentation

◆ OnClose()

_.Library.Status OnClose ( )

This callback method is invoked by the <METHOD>Close</METHOD> method to provide notification that the current object is being closed.

The return value of this method is ignored.

Reimplemented from RegisteredObject.

◆ AddTerms()

_.Library.Status AddTerms ( _.Library.Integer  pCount,
_.Library.Boolean  pAtEnd 
)

This method does one round of processing, testing <property>AddWindowSize</property> candidate terms and selecting the best pCount terms according to <method>RankScores</method>, unless a term would not meet the <property>MinimalScoreIncrease</property> threshold.

If pCount < 0, it defaults to <property>AddCount</property>.
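
The selection step of one add round can be modeled in a few lines of Python. This is only a sketch of the logic described above, not the ObjectScript implementation: the function name select_terms_to_add, the dict of per-term scores, and the reading of the threshold as percentage points on a 0..1 score scale are all assumptions.

```python
def select_terms_to_add(window_scores, current_score, count, min_increase_pct):
    """Pick up to `count` best-scoring candidates from one test window.

    window_scores maps each candidate term to the model score obtained
    when that term is added. Candidates whose gain over current_score is
    below the threshold are skipped.
    """
    # Assumption: a threshold of 1 means 0.01 on a 0..1 score scale.
    threshold = min_increase_pct / 100.0
    # Best score first, so slicing keeps the top candidates.
    best_first = sorted(window_scores.items(), key=lambda kv: kv[1], reverse=True)
    selected = [term for term, score in best_first
                if score - current_score >= threshold]
    return selected[:count]


window = {"a": 0.75, "b": 0.705, "c": 0.72}
chosen = select_terms_to_add(window, current_score=0.70, count=2, min_increase_pct=1)
```

Here term "b" is dropped because its gain stays below the threshold, while "a" and "c" are kept in best-first order.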

◆ Cleanup()

_.Library.Status Cleanup ( )

This method clears the temporary artifacts the optimizer has created while optimizing, such as the <property>CurrentClassifier</property> class and <property>CurrentTestId</property> test results.

◆ Initialize()

_.Library.Status Initialize ( )

Initializes this Optimizer instance.

This method is called automatically as part of <method>Optimize</method>.

◆ LoadTermsArray()

_.Library.Status LoadTermsArray (   pTerms,
_.Library.Integer  pListIndex 
)

Loads all terms from the supplied array.

If pListIndex is non-zero, the term info is read from that index at each array position. If the term info itself is a list structure as well, it is interpreted as follows: pTerms(n) = $lb(term, type, negationpolicy, matchpolicy)

◆ LoadTermsSQL()

_.Library.Status LoadTermsSQL ( _.Library.String  pSQL)

Loads a list of candidate terms based on a SQL query.

The query should return a column named "term" containing the term's value and may return columns named "type", "negation" and "match" to configure the type, negation and count policy for each term being retrieved, respectively.

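The column naming convention can be illustrated with a stand-in database. The sketch below uses Python's sqlite3 module purely to show the shape of such a query; the table candidate_terms, its columns, and the sample values are hypothetical, and the real query would be written in the SQL dialect of the target system.

```python
import sqlite3

# Hypothetical source table holding candidate terms.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidate_terms (value TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO candidate_terms VALUES (?, ?)",
    [("engine failure", "entity"), ("runway", "entity")],
)

# The column aliases are what matter: "term" is required, "type" is optional.
sql = 'SELECT value AS "term", kind AS "type" FROM candidate_terms ORDER BY value'
cursor = conn.execute(sql)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

A string like the one held in sql is what would be passed as pSQL; the optional "negation" and "match" columns are simply left out here.
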
◆ Optimize()

_.Library.Status Optimize ( _.Library.Integer  pMaxSteps)

In at most pMaxSteps steps, the current <property>Builder</property> will be optimized by testing, one at a time, the terms added through <method>LoadTermsSQL</method> and <method>LoadTermsArray</method>, judging which term works best for each test window by the results of <method>RankScores</method> (see also <method>AddTerms</method>). Every (1/<property>RemoveStepRatio</property>) rounds, all terms in the dictionary so far will be tested for their contribution to the current model score and the lowest <property>RemoveCount</property> terms will be removed (see also <method>RemoveTerms</method>).

At the end of the optimization process, in addition to <property>Builder</property> being updated, <property>CurrentClassifier</property> will contain the class name of the last test class used to achieve the best result and <property>CurrentTestId</property> will point to the test results for that class.
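
The interleaving of add and remove cycles can be pictured with a small Python model. This is a sketch under stated assumptions rather than the actual scheduling code: the function name plan_cycles is hypothetical, a remove cycle is assumed to count as one step, and a ratio of 0 is assumed to disable remove cycles altogether.

```python
def plan_cycles(max_steps, remove_step_ratio):
    """Return one 'add' or 'remove' label per optimization step."""
    if remove_step_ratio <= 0:
        # Assumption: a ratio of 0 means no remove cycles at all.
        return ["add"] * max_steps
    # Number of add rounds between two remove rounds.
    interval = max(1, round(1 / remove_step_ratio))
    plan = []
    adds_since_remove = 0
    for _ in range(max_steps):
        if adds_since_remove >= interval:
            plan.append("remove")
            adds_since_remove = 0
        else:
            plan.append("add")
            adds_since_remove += 1
    return plan


schedule = plan_cycles(6, 0.5)
```

With a ratio of 0.5 the model runs two add rounds before each remove round; with a ratio of 1 the two kinds of round alternate.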

◆ __RankScores()

_.Library.Status __RankScores (   pJobInfo,
  pRanked,
  pNoScore 
)
private

This method ranks the test results in pJobInfo according to the desired "score".

By default, it will just look at the value of the metric identified by <property>ScoreMetric</property>, but this method can be overridden to calculate in more detail. When this method returns, pRanked is an ordered array containing the job IDs and score in ASCENDING order (pRanked(1) is the worst job):

pRanked([position]) = $lb([jobID], [score])

pJobInfo should contain the following information:
pJobInfo([jobID], "scores", [metric]) = [value]
pJobInfo([jobID], "testid") = test ID
pJobInfo([jobID], "term") = [term ID] (not for initial evaluation)

See also <method>GetScore</method>
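
The ranking contract described above can be modeled in Python as follows. This is an illustrative sketch, not the ObjectScript implementation: the function name rank_scores, the nested dicts standing in for pJobInfo, the metric name "ExampleMetric", and the idea that jobs without a value for the metric end up in a separate list (mirroring pNoScore) are assumptions.

```python
def rank_scores(job_info, metric):
    """Return (ranked, no_score) for a dict shaped like pJobInfo.

    ranked is a list of (job_id, score) tuples in ascending score order,
    so ranked[0] is the worst job, mirroring pRanked(1).
    """
    ranked, no_score = [], []
    for job_id, info in job_info.items():
        score = info.get("scores", {}).get(metric)
        if score is None:
            # Assumption: jobs lacking the metric are reported separately.
            no_score.append(job_id)
        else:
            ranked.append((job_id, score))
    ranked.sort(key=lambda pair: pair[1])
    return ranked, no_score


job_info = {
    1: {"scores": {"ExampleMetric": 0.71}, "testid": 101, "term": 7},
    2: {"scores": {"ExampleMetric": 0.64}, "testid": 102, "term": 8},
    3: {"scores": {}, "testid": 103, "term": 9},
}
ranked, no_score = rank_scores(job_info, "ExampleMetric")
```

Because the order is ascending, the best job is the last entry of ranked, which is the one an add round would look at first.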

◆ RemoveTerms()

_.Library.Status RemoveTerms ( _.Library.Integer  pCount)

Test the impact of removing each term in the current model's TermDictionary individually.

The pCount terms whose removal still yields the best score according to <method>RankScores</method> (which suggests their contribution was minimal) will be removed from the TermDictionary, unless the decrease in performance surpasses <property>MaximalScoreDecrease</property>.

If pCount < 0, it defaults to <property>RemoveCount</property>.
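
The removal rule can be sketched in Python in the same spirit. The function name select_terms_to_remove and the dict of scores measured after each individual removal are hypothetical, and the limit is again assumed to be expressed in percentage points on a 0..1 score scale.

```python
def select_terms_to_remove(scores_without, current_score, count, max_decrease_pct):
    """Pick up to `count` terms whose removal hurts the score least.

    scores_without maps each term to the model score measured after
    removing that term alone.
    """
    # Assumption: a limit of 1 means 0.01 on a 0..1 score scale.
    limit = max_decrease_pct / 100.0
    # Highest remaining score first: these terms contributed least.
    best_first = sorted(scores_without.items(), key=lambda kv: kv[1], reverse=True)
    removable = [term for term, score in best_first
                 if current_score - score <= limit]
    return removable[:count]


after_removal = {"x": 0.80, "y": 0.795, "z": 0.70}
dropped = select_terms_to_remove(after_removal, current_score=0.80,
                                 count=2, max_decrease_pct=1)
```

Term "z" is never removed in this example because dropping it would cost far more than the accepted decrease.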

◆ SaveClassifier()

_.Library.Status SaveClassifier ( _.Library.String  pClassName,
_.Library.Boolean  pOverwrite 
)

Saves the <property>CurrentClassifier</property> class to the desired pClassName, so it will not be removed after this Optimizer instance is dropped. If <property>CurrentClassifier</property> is not set or if the class no longer exists for other reasons, the current builder object will create a classifier class based on its current state.

Member Data Documentation

◆ AddCount

AddCount

The number of terms to add during an <method>AddTerms</method> cycle.

The top results according to <method>RankScores</method> will be added, as selected from the <property>AddWindowSize</property> terms tested in the cycle.

◆ AddWindowSize

AddWindowSize

The number of terms to test in each round.

If left at 0, this defaults to the number of cores the system has available, which should be most efficient.

◆ Builder

The builder object to be optimized.

 

◆ CategoryWeights

CategoryWeights

If <property>ScoreMetric</property> is set to a 'Weighted*' value, the weights for each category are retrieved from this array, indexed by category name.

If no category weight is set, it is assumed to be 0.

Note: Weights don't need to add up to 1.
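
One way to picture why the weights need not add up to 1 is a weighted average that divides by the total weight. The Python sketch below is an assumption about how a 'Weighted*' metric could be normalized, not a statement of the actual formula; the function name weighted_score and the sample categories are hypothetical.

```python
def weighted_score(per_category, weights):
    """Weighted average of per-category metric values.

    Categories missing from `weights` count with weight 0. Dividing by
    the total weight makes the result independent of the weights' scale.
    """
    total = sum(weights.get(category, 0) for category in per_category)
    if total == 0:
        return 0.0
    weighted_sum = sum(value * weights.get(category, 0)
                       for category, value in per_category.items())
    return weighted_sum / total


per_category = {"sports": 0.9, "politics": 0.5, "other": 0.1}
score = weighted_score(per_category, {"sports": 3, "politics": 1})
```

The category "other" has no weight and therefore does not influence the result at all.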

 

◆ CurrentClassifier

CurrentClassifier

The class name of the current "best" classifier.

This value is set during <method>Optimize</method>, or as part of the <method>AddTerms</method> and <method>RemoveTerms</method> methods.  

◆ CurrentScore

CurrentScore

The score of the current classifier.

This value is updated by <method>AddTerms</method> and <method>RemoveTerms</method>.

◆ CurrentTestId

CurrentTestId

The key to <class>DeepSee.PMML.Utils.TempResult</class> for the test results of <property>CurrentClassifier</property>.

◆ DomainId

DomainId

The domain with which the categorization model is being trained and tested.

This assumes the value of the Builder's DomainId property when registering an IKnowBuilder instance as <property>Builder</property>, if not set explicitly.  

◆ MaximalScoreDecrease

MaximalScoreDecrease

The maximal decrease in performance the optimizer should accept when trying to remove terms.

If removing a term would imply a decrease larger than this figure, it will not be removed. A value of 1 means the maximal score decrease is 1%.

◆ MetadataField

MetadataField

The metadata field containing the actual category values to compare predictions against.

This assumes the value of the Builder's MetadataField property when registering an IKnowBuilder instance as <property>Builder</property>, if not set explicitly.  

◆ MinimalScoreIncrease

MinimalScoreIncrease

The minimal score increase % a term should ensure to be retained for further testing.

If the score does not increase by at least this figure, the term will be discarded from the list of terms to test. A value of 1 means the minimal score increase should be 1%.

◆ RemoveCount

RemoveCount

The number of terms to remove in a "remove" cycle.

Setting this value > 1 assumes the terms deemed irrelevant (and scheduled to be removed) don't influence one another much and removing more in a single cycle will not worsen performance much more than the individual performance changes of each term removal alone.

◆ RemoveStepRatio

RemoveStepRatio

The ratio of <method>RemoveTerms</method> cycles vs <method>AddTerms</method> cycles.

This should be a value between 0 and 1 (inclusive).

Note: Remove cycles take significantly longer than add cycles.

 

◆ ScoreMetric

ScoreMetric

The default accuracy metric to use for evaluating test results, as used by <method>RankScores</method>.

If set to a 'Weighted*' value, the weights are retrieved from <property>CategoryWeights</property>.  

◆ TestSet

TestSet

The test set to validate model accuracy increases/decreases against.

 

◆ Verbose

Verbose

If set to a boolean value, defines whether or not to write output to the current device during the <method>Optimize</method> method. If set to a string, it is treated as a global reference to which output needs to be written.