See <CLASS>Text.Text</CLASS>
Static Public Member Functions | |
_.Library.Status | ExcludeCommonTerms (nWords) |
Classifies the most common nWords words in the current language as noise words. | |
_.Library.String | SeparateWords (_.Library.String rawText) |
Separates individual terms with whitespace, for languages such as Japanese. | |
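The SeparateWords entry does not say how term boundaries are found in unspaced Japanese text. The Python sketch below uses a deliberately simple assumption: a boundary falls wherever the character script changes (kanji, hiragana, katakana, Latin/digits), and whitespace and punctuation act as separators. `script_of` and `separate_words` are hypothetical names; this is not the IRIS algorithm.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Coarse script class of one character, derived from its Unicode name."""
    if ch.isspace():
        return "space"
    name = unicodedata.name(ch, "")
    for script in ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA"):
        # "KATAKANA-HIRAGANA PROLONGED SOUND MARK" (ー) falls into KATAKANA here.
        if name.startswith(script):
            return script
    return "alnum" if ch.isalnum() else "other"

def separate_words(raw_text: str) -> str:
    """Insert spaces wherever the script changes; drop whitespace and punctuation."""
    tokens, current, current_script = [], [], None
    for ch in raw_text:
        script = script_of(ch)
        if script in ("space", "other"):
            # Separators end the current run and are not kept.
            if current:
                tokens.append("".join(current))
            current, current_script = [], None
            continue
        if current and script != current_script:
            tokens.append("".join(current))
            current = []
        current.append(ch)
        current_script = script
    if current:
        tokens.append("".join(current))
    return " ".join(tokens)

print(separate_words("東京タワーへ行きました"))  # 東京 タワー へ 行 きました
```

The output shows the limits of a pure script-boundary rule: 行き is split into 行 and き. Production Japanese segmenters rely on dictionaries or morphological analysis instead.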
Static Public Member Functions inherited from Text.Text | |
_.Library.Status | AddDocToDictionary (_.Library.String document, _.Library.String category) |
Add words of the specified document to the ^SYSDict global. | |
_.Library.Status | AddToDictionary (_.Library.String word, _.Library.Integer wordType, _.Library.String category, _.Library.Integer wCount) |
Add the specified word or phrase to the current dictionary. | |
_.Library.Status | BuildValueArray (_.Library.Binary document, _.Library.Binary valueArray) |
The <METHOD>BuildValueArray</METHOD> method tokenizes a text string into a collection of … | |
_.Library.String | ChooseSearchKey (_.Library.String document) |
If we must choose exactly one indexable search string from a pattern that … | |
_.Library.List | Classify (_.Library.String document, _.Library.Integer topN, maxDocFreq) |
Classify document into one of the known categories using a semi-naive Bayesian classification algorithm. | |
_.Library.List | CreateQList (_.Library.String document, _.Library.String coll) |
Internal method used by the <METHOD>Similarity</METHOD> and <METHOD>SimilarityIdx</METHOD> … | |
_.Library.String | DecompressOffsets (_.Library.String compressed) |
Converts the offsets from compressed to uncompressed form. | |
DropDictionary () | |
Deletes all of the words, noise words, etc. | |
_.Library.List | MakeSearchTerms (_.Library.String searchPattern, _.Library.Integer ngramlen) |
Convert a string into a list of search terms, such that each search term contains no … | |
_.Library.Numeric | Similarity (_.Library.String document, _.Library.List qList) |
See also <METHOD>SimilarityIdx</METHOD> | |
_.Library.Numeric | SimilarityIdx (_.Library.String ID, _.Library.String textIndex, _.Library.List qList) |
_.Library.String | Standardize (_.Library.String document, _.Library.Boolean origtext) |
Returns the specified string in standardized form, that is: stemmed, filtered, translated, … | |
setto (_.Library.String b, _.Library.String s, _.Library.Integer j, _.Library.Integer k) | |
setto(s) sets characters j+1 through k of b to the characters in the string s, readjusting k. | |
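The Classify entry above names a semi-naive Bayesian classifier over known categories, and AddDocToDictionary records document words under a category. Neither the semi-naive variant nor the layout of ^SYSDict is documented here, so the Python sketch below swaps in a plain multinomial naive Bayes with Laplace smoothing to illustrate the general idea of ranking categories and returning the top N. The `NaiveBayes` class and its methods are hypothetical, not IRIS API.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

class NaiveBayes:
    """Hypothetical multinomial naive Bayes classifier over tokenized documents."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # category -> word counts
        self.doc_counts: Counter = Counter()  # category -> number of training documents
        self.vocab: set = set()

    def add_document(self, words: List[str], category: str) -> None:
        """Record one training document under a category."""
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, words: List[str], top_n: int = 1) -> List[str]:
        """Return the top_n categories ranked by log posterior (highest first)."""
        total_docs = sum(self.doc_counts.values())
        scored = []
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus the Laplace-smoothed log likelihood of each word.
            score = math.log(self.doc_counts[category] / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            scored.append((score, category))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [category for _, category in scored[:top_n]]

nb = NaiveBayes()
nb.add_document(["goal", "team", "match"], "sports")
nb.add_document(["team", "win"], "sports")
nb.add_document(["stock", "market", "price"], "finance")
nb.add_document(["market", "bank"], "finance")
print(nb.classify(["team", "goal"]))  # ['sports']
```

A "semi-naive" classifier typically relaxes the full word-independence assumption used here, so treat this only as the baseline it builds on.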
Static Public Member Functions inherited from _.Library.String | |
_.Library.String | DisplayToLogical (_, _.Library.String val) |
Converts the input value val, which is a string, into the logical string format. | |
_.Library.Status | IsValid (_, _.Library.RawString val) |
Tests if the logical value val, which is a string, is valid. | |
_.Library.String | JSONToLogical (_, _.Library.String val) |
If JSONLISTPARAMETER is specified, JSONToLogical is generated so that it imports values using the list specified by JSONLISTPARAMETER. | |
_.Library.String | LogicalToDisplay (_, _.Library.String val) |
Converts the value of val, which is in logical format, into a display string. | |
_.Library.String | LogicalToJSON (_, _.Library.String val) |
If JSONLISTPARAMETER is specified, LogicalToJSON is generated so that it exports values using the list specified by JSONLISTPARAMETER. | |
_.Library.String | LogicalToXSD (_, _.Library.String val) |
If XMLLISTPARAMETER is specified, LogicalToXSD is generated so that it exports values using the list specified by XMLLISTPARAMETER. | |
_.Library.String | Normalize (_, _.Library.RawString val) |
Truncates value val to MAXLEN characters. | |
_.Library.String | XSDToLogical (_, _.Library.String val) |
If XMLLISTPARAMETER is specified, XSDToLogical is generated which imports using the list specified by XMLLISTPARAMETER. | |
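Normalize, per its entry above, truncates a value to MAXLEN characters, and IsValid tests whether a logical value is valid. The Python sketch below shows only the length side of that behavior, assuming MINLEN/MAXLEN semantics as described in the parameter list. Other checks IsValid may perform (PATTERN, VALUELIST) are not modeled. `normalize` and `is_valid_length` are hypothetical names.

```python
from typing import Optional

def normalize(val: str, maxlen: Optional[int]) -> str:
    """Truncate val to at most maxlen characters; leave it unchanged if maxlen is None."""
    return val if maxlen is None else val[:maxlen]

def is_valid_length(val: str, minlen: Optional[int] = None, maxlen: Optional[int] = None) -> bool:
    """Length-only validity check in the spirit of MINLEN/MAXLEN."""
    if minlen is not None and len(val) < minlen:
        return False
    if maxlen is not None and len(val) > maxlen:
        return False
    return True

print(normalize("full text search", 9))  # full text
```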
Static Public Attributes | |
CASEINSENSITIVE = None | |
See <CLASS>Text.Text</CLASS> | |
Static Public Attributes inherited from Text.Text | |
CASEINSENSITIVE = None | |
The Text.Text data type class implements the methods used by InterSystems IRIS for full text indexing, text search, similarity scoring, automatic classification, dictionary management, word stemming, n-gram key creation, and noise word filtering. | |
DICTIONARY = None | |
The default dictionary for properties of this class. | |
FILTERNOISEWORDS = None | |
<PARAMETER>FILTERNOISEWORDS</PARAMETER> controls whether common-word filtering is enabled. | |
IGNOREMARKUP = None | |
<PARAMETER>IGNOREMARKUP</PARAMETER> is a Boolean (0/1) flag. | |
MAXLEN = None | |
There is no default MAXLEN; that is, it must be specified wherever a Text.Text … | |
MAXOCCURS = None | |
Text search applications sometimes need to highlight the matching terms found. | |
MAXWORDLEN = None | |
<PARAMETER>MAXWORDLEN</PARAMETER> specifies the maximum word length that will be retained. | |
MINWORDLEN = None | |
<PARAMETER>MINWORDLEN</PARAMETER> specifies the minimum word length that will be retained. | |
NGRAMLEN = None | |
<PARAMETER>NGRAMLEN</PARAMETER> is the maximum number of words that will be regarded as a single … | |
NOISEWORDS100 = None | |
NOISEWORDSnnn lists the most common words in the language, in order of their frequency of occurrence. | |
NUMCHARS = None | |
<PARAMETER>NUMCHARS</PARAMETER> specifies the characters other than digits that may appear … | |
NUMERIC = None | |
<PARAMETER>NUMERIC</PARAMETER> specifies whether numeric terms will be retained (1) or ignored (0). | |
OKAPIBM25B = None | |
See <METHOD>SimilarityIdx</METHOD> | |
OKAPIBM25K1 = None | |
See <METHOD>SimilarityIdx</METHOD> | |
OKAPIBM25K3 = None | |
See <METHOD>SimilarityIdx</METHOD> | |
SEPARATEWORDS = None | |
Languages such as Japanese require the raw document text to be parsed and … | |
SOURCELANGUAGE = None | |
<PARAMETER>SOURCELANGUAGE</PARAMETER> specifies the default source language to translate … | |
STEMMING = None | |
<PARAMETER>STEMMING</PARAMETER> replaces each word by its language-specific stem to improve the … | |
TARGETLANGUAGE = None | |
<PARAMETER>TARGETLANGUAGE</PARAMETER> specifies the default target language to translate … | |
TARGETLANGUAGECLASS = None | |
<PARAMETER>TARGETLANGUAGECLASS</PARAMETER> specifies the class to use when <PARAMETER>TARGETLANGUAGE</PARAMETER> … | |
THESAURUS = None | |
<PARAMETER>THESAURUS</PARAMETER> specifies that a language-specific thesaurus is to be used in place of … | |
WORDCHARS = None | |
<PARAMETER>WORDCHARS</PARAMETER> specifies the characters other than alphabetic that may … | |
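The OKAPIBM25B, OKAPIBM25K1, and OKAPIBM25K3 parameters all point to SimilarityIdx, which suggests that similarity ranking follows the Okapi BM25 model with the usual b, k1, and k3 constants. The exact formula and default values IRIS uses are not given here, so the Python sketch below implements the standard textbook BM25 (Robertson/Spärck Jones IDF, document-length normalization via b, query-term saturation via k3). `bm25_score` is a hypothetical name, and the defaults shown are common literature choices, not the IRIS parameter values.

```python
import math
from collections import Counter
from typing import List

def bm25_score(query: List[str], doc: List[str], corpus: List[List[str]],
               k1: float = 1.2, b: float = 0.75, k3: float = 8.0) -> float:
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)      # term frequency in this document
    qtf = Counter(query)   # term frequency in the query
    score = 0.0
    for term, q_count in qtf.items():
        if tf[term] == 0:
            continue
        n_t = sum(1 for d in corpus if term in d)  # documents containing the term
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5))
        # b scales the length penalty; k1 controls how fast tf saturates.
        doc_part = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        # k3 controls saturation of repeated query terms.
        query_part = (k3 + 1) * q_count / (k3 + q_count)
        score += idf * doc_part * query_part
    return score

corpus = [["japanese", "text"],
          ["japanese", "text", "search", "index", "noise"],
          ["bayes", "classify"], ["words", "list"], ["stem", "word"]]
for d in corpus[:3]:
    print(round(bm25_score(["japanese"], d, corpus), 4))
```

With this IDF form, a term that occurs in more than half of the documents gets a negative weight; some implementations clamp IDF at zero for that reason.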
Static Public Attributes inherited from _.Library.String | |
COLLATION = None | |
The default collation value used for this data type. | |
CONTENT = None | |
XML element content "MIXED" for mixed="true" and "STRING" or "ESCAPE" for mixed="false". | |
DISPLAYLIST = None | |
Used for enumerated (multiple-choice) attributes. | |
ESCAPE = None | |
Controls the translate table used to escape content when CONTENT="MIXED" is specified. | |
JSONLISTPARAMETER = None | |
Used to specify the name of the parameter which contains the enumeration list for JSON values. | |
JSONTYPE = None | |
JSONTYPE is the JSON type used for this data type. | |
MAXLEN = None | |
The maximum number of characters the string can contain. | |
MINLEN = None | |
The minimum number of characters the string can contain. | |
PATTERN = None | |
A pattern which the string should match. | |
TRUNCATE = None | |
Determines whether to truncate the string to MAXLEN characters. | |
VALUELIST = None | |
Used for enumerated (multiple-choice) attributes. | |
XMLLISTPARAMETER = None | |
Used to specify the name of the parameter which contains the enumeration list for XML values. | |
XSDTYPE = None | |
Declares the XSD type used when projecting XML Schemas. | |
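VALUELIST and DISPLAYLIST are listed above only as "used for enumerated (multiple-choice) attributes". In _.Library.String, each is either empty or a delimited list whose first character is the delimiter (for example ",R,G,B"), and LogicalToDisplay/DisplayToLogical map between corresponding positions. The Python sketch below models that mapping and the membership check under this assumption. All function names are hypothetical, not IRIS API.

```python
from typing import List

def parse_list(spec: str) -> List[str]:
    """Split a list whose first character is its delimiter, e.g. ",R,G,B"."""
    if not spec:
        return []
    return spec[1:].split(spec[0])

def logical_to_display(val: str, valuelist: str, displaylist: str) -> str:
    """Map a logical value to the display value at the same position."""
    values, displays = parse_list(valuelist), parse_list(displaylist)
    if val in values:
        i = values.index(val)
        if i < len(displays):
            return displays[i]
    return val  # no mapping available: return the value unchanged

def display_to_logical(val: str, valuelist: str, displaylist: str) -> str:
    """Map a display value back to the logical value at the same position."""
    values, displays = parse_list(valuelist), parse_list(displaylist)
    if val in displays:
        i = displays.index(val)
        if i < len(values):
            return values[i]
    return val

def is_valid_enum(val: str, valuelist: str) -> bool:
    """An empty VALUELIST allows any value; otherwise val must be listed."""
    return not valuelist or val in parse_list(valuelist)

print(logical_to_display("G", ",R,G,B", ",Red,Green,Blue"))  # Green
```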
Static Public Attributes inherited from _.Library.DataType | |
INDEXNULLMARKER = None | |
Override this parameter value to specify what value should be used as a null marker when a property of the type is used in a subscript of an index map. | |
See <CLASS>Text.Text</CLASS>
The <CLASS>Text.Japanese</CLASS> class implements (or calls) the Japanese language-specific stemming algorithm and initializes the language-specific list of noise words.
Classifies the most common nWords words in the current language as noise words.
The words specified in <PARAMETER>NOISEWORDS100</PARAMETER>, <PARAMETER>NOISEWORDS200</PARAMETER>, and <PARAMETER>NOISEWORDS300</PARAMETER> together list the 300 most common words of the current language, in order of their frequency. Similarly, <PARAMETER>NOISEBIGRAMSn00</PARAMETER> lists the 300 most common bigrams of the current language that would not typically be considered useful for searching.
Reimplemented from Text.
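ExcludeCommonTerms treats the first nWords entries of a frequency-ordered word list as noise words. The Python sketch below illustrates that idea with a short placeholder list of English words standing in for NOISEWORDS100/200/300; the real Japanese lists are not reproduced here. `RANKED_WORDS`, `exclude_common_terms`, and `filter_noise` are hypothetical names.

```python
from typing import Iterable, List, Sequence, Set

# Placeholder frequency-ordered list (most common first). It stands in for
# NOISEWORDS100/200/300; the actual Japanese noise-word lists are not shown.
RANKED_WORDS: Sequence[str] = ("the", "of", "and", "to", "a", "in", "is", "it", "that", "was")

def exclude_common_terms(n_words: int, ranked: Sequence[str] = RANKED_WORDS) -> Set[str]:
    """Treat the n_words most frequent entries as noise words."""
    return set(ranked[:n_words])

def filter_noise(words: Iterable[str], noise: Set[str]) -> List[str]:
    """Drop noise words (case-insensitive) from a token stream."""
    return [w for w in words if w.lower() not in noise]

noise = exclude_common_terms(3)
print(filter_noise("The cat and the hat".split(), noise))  # ['cat', 'hat']
```

Raising nWords removes more high-frequency words from indexing, which shrinks the index but can also drop terms that occasionally matter for phrase searches.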
See <CLASS>Text.Text</CLASS>
The <CLASS>Text.Japanese</CLASS> class implements (or calls) the Japanese language-specific stemming algorithm and initializes the language-specific list of noise words.