The Class <CLASS>Regex.Matcher</CLASS> creates an object that does pattern matching using regular expressions.
Public Member Functions | |
_.Library.Integer | EndGet (_.Library.Integer group) |
The EndGet method implements the <property>End</property> property. | |
_.Library.Integer | GroupCountGet () |
The GroupCountGet method implements the <property>GroupCount</property> property. | |
_.Library.String | GroupGet (_.Library.Integer group) |
The GroupGet method implements the <property>Group</property> property. | |
_.Library.Boolean | HitEndGet () |
The HitEndGet method implements the <property>HitEnd</property> property. | |
_.Library.Boolean | Locate (_.Library.Integer position) |
The method Locate finds a match for the regular expression <property>Pattern</property> in the text string <property>Text</property>. | |
_.Library.Boolean | LookingAt (_.Library.Integer position) |
The method LookingAt attempts to find a match in the property <property>Text</property> that must start at a particular character position. | |
_.Library.Boolean | Match (_.Library.String text) |
The method Match returns true if the entire string <property>Text</property> is matched by <property>Pattern</property>; it returns false if it does not match. | |
_.Library.Status | OperationLimitSet (limit) |
The OperationLimitSet method implements the side effects of doing a Set assignment to change the value of the <property>OperationLimit</property> property. | |
_.Library.Status | PatternSet (_.Library.String pattern) |
The PatternSet method implements Set assignments to the <property>Pattern</property> property. | |
_.Library.String | ReplaceAll (_.Library.String replacement) |
The method ReplaceAll returns a modified copy of the property <property>Text</property>. | |
_.Library.String | ReplaceFirst (_.Library.String replacement) |
The method ReplaceFirst returns a modified copy of the property <property>Text</property>. | |
_.Library.String | RequiredPrefixGet () |
The RequiredPrefixGet method implements the <property>RequiredPrefix</property> property. | |
ResetPosition (_.Library.Integer position) | |
The method ResetPosition resets any saved state from the previous match. | |
_.Library.Integer | StartGet (_.Library.Integer group) |
The StartGet method implements the <property>Start</property> property. | |
_.Library.String | SubstituteIn (_.Library.String text) |
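The method SubstituteIn returns the string that results from substituting capturing groups from the most recent regular expression match into the argument text. | |
_.Library.Status | TextSet (_.Library.String text) |
The TextSet method implements Set assignments to the <property>Text</property> property. | |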
_.Library.Status | OnAddToSaveSet (_.Library.Integer depth, _.Library.Integer insert, _.Library.Integer callcount) |
This callback method is invoked when the current object is added to the SaveSet,. More... | |
_.Library.Status | OnClose () |
This callback method is invoked by the <METHOD>Close</METHOD> method to. More... | |
_.Library.Status | OnConstructClone (_.Library.RegisteredObject object, _.Library.Boolean deep, _.Library.String cloned) |
This callback method is invoked by the <METHOD>ConstructClone</METHOD> method to. More... | |
_.Library.Status | OnNew () |
This callback method is invoked by the <METHOD>New</METHOD> method to. More... | |
_.Library.Status | OnValidateObject () |
This callback method is invoked by the <METHOD>ValidateObject</METHOD> method to. More... | |
Static Public Member Functions | |
_.Library.Status | LastStatus () |
The class method LastStatus returns the <class>Status</class> value containing additional details about the most recent <REGULAR EXPRESSION> system error. | |
_.Library.String | Help (_.Library.String method) |
This is a helper class that is used by the various SYSTEM classes to provide a Help method. | |
Public Attributes | |
End | |
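The property End without a subscript contains the character position in property <property>Text</property> one beyond the final character of the string found by the last match. | |
Group | |
The property Group without a subscript contains the string found by the last match. | |
GroupCount | |
The property GroupCount contains the number of capturing groups in the regular expression <property>Pattern</property>. | |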
HitEnd | |
The property HitEnd is true if the most recent matching. More... | |
OperationLimit | |
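The property OperationLimit provides a way to limit the time taken by a regular expression match. | |
Pattern | |
The property Pattern is the string representation of the regular expression of the Matcher. | |
RequiredPrefix | |
The property RequiredPrefix contains a string which, if nonempty, is a sequence of characters which must occur at the start of any string which matches the <property>Pattern</property>. | |
Start | |
The property Start without a subscript contains the character position in property <property>Text</property> of the first character of the string found by the last match. | |
Status | |
The property Status contains a <class>Status</class> value which may provide more information about the last System exception thrown by this object. | |
Text | |
The property Text is the string to which the regular expression will be applied. | |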
Private Attributes | |
__PreviousMatchEnd | |
PreviousMatchEnd is the End value of the previous match. | |
Additional Inherited Members | |
CAPTION = None | |
Optional name used by the Form Wizard for a class when generating forms. | |
JAVATYPE = None | |
The Java type to be used when exported. | |
PROPERTYVALIDATION = None | |
This parameter controls the default validation behavior for the object. | |
The Class <CLASS>Regex.Matcher</CLASS> creates an object that does pattern matching using regular expressions.
The regular expressions come from the International Components for Unicode (ICU). The ICU maintains web pages at https://icu.unicode.org.
The definition and features of the ICU regular expression package can be found in https://unicode-org.github.io/icu/userguide/strings/regexp.html.
On most platforms, installing InterSystems IRIS will also install an appropriate version of the ICU libraries. On platforms that do not have an ICU library available, evaluating any regular expression function or method will result in an <UNIMPLEMENTED> error.
A Regex.Matcher object can be created by evaluating
##class(Regex.Matcher).New(pattern) or
##class(Regex.Matcher).New(pattern,text).
The first parameter to <method>New</method> becomes the initial value of the property <property>Pattern</property>. The optional second parameter to <method>New</method> becomes the initial value of the property <property>Text</property>. Setting property <property>Pattern</property> to a regular expression pattern string causes that pattern to be compiled into the Matcher object, where it can be used for multiple matching operations without being recompiled. The property <property>Text</property> contains the subject text string that is searched by a regular expression match. Note that an empty string is considered to be an illegal regular expression, so the first parameter to <method>New</method> can be neither missing nor the empty string.
If x is a <CLASS>Regex.Matcher</CLASS> object then the built-in method <method>ConstructClone</method> can be used to copy x ( Set xnew = x.ConstructClone() ) . The state of the most recent match and any error value in the <property>Status</property> property are not cloned. The <method>ConstructClone</method> method can be faster than creating a new Matcher with the same Pattern. The <method>ConstructClone</method> method can just copy instructions for the matching engine rather than recompiling the original pattern string. On 8-bit systems <method>ConstructClone</method> can just copy the Unicode versions of the Pattern and Text properties without need to do the character-by-character conversion from the NLS 8-bit character set into Unicode.
None of the methods or operations in the <CLASS>Regex.Matcher</CLASS> package return a <class>Status</class> value. When an error is detected, these operations always throw the system exception thrown by the kernel code that interfaces to the ICU library. If a program wants to recover from a regular expression error then it is recommended that the code doing regular expression operations be surrounded with a TRY {...} block and that the error recovery be done in the corresponding CATCH {...} block. Note that a TRY block imposes no run-time performance overhead in situations where no error occurs.
The methods and operations in a <CLASS>Regex.Matcher</CLASS> object will catch any <REGULAR EXPRESSION> system error and will generate a <class>Status</class> value that may better describe that error. That <class>Status</class> value will be stored in the <property>Status</property> property of the <CLASS>Regex.Matcher</CLASS> object and in the variable objlasterror. After saving the <class>Status</class> value, the original unmodified <REGULAR EXPRESSION> system exception will be rethrown. You may examine that <class>Status</class> value by executing the following InterSystems IRIS ObjectScript command:
do $system.Status.DisplayError(objlasterror)
Some other system errors, like <STRING STACK>, are passed through the <CLASS>Regex.Matcher</CLASS> methods without modification.
Note that some ICU operation errors are not considered errors by the <CLASS>Regex.Matcher</CLASS> package. Examples are evaluating the <property>Start</property> and <property>End</property> properties when the previous matching operation failed. In these cases <property>Start</property> and <property>End</property> have value -2 as a character position rather than throwing an error.
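The error-handling pattern described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the pattern "(abc" is a hypothetical malformed expression chosen for the example, ex is an arbitrary variable name, and the class and method names are spelled as they appear on this page (on an actual system they may carry a leading % character).

```objectscript
 try {
     ; "(abc" has an unclosed parenthesis, so compiling it is expected to fail
     set matcher = ##class(Regex.Matcher).New("(abc")
     write "Pattern compiled",!
 } catch ex {
     ; The original system exception is rethrown after the Status value is saved
     write "Caught: ",ex.Name,!
     ; New() failed, so no matcher oref exists; use the class method LastStatus
     do $system.Status.DisplayError(##class(Regex.Matcher).LastStatus())
 }
```

Because the constructor itself throws, the <property>Status</property> property is not reachable in this case, which is why the sketch falls back on LastStatus.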
Examples:
Regular expression that finds titles M., Mr., Mrs. and Ms. in a string: "\bMr?s?\."
"\b" matches a break at the beginning (or ending) of a word
"M" matches an upper-case letter-M
"r?" matches 0 or 1 occurrences of a lower-case letter-r
"s?" matches 0 or 1 occurrences of a lower-case letter-s
"\." matches a period character
USER>set matcher=##class(Regex.Matcher).New("\bMr?s?\.")
USER>set matcher.Text="Mrs. Sally Jones, Mr. Mike McMurry, Ms. Amy Johnson, M. Maurice LaFrance"
USER>while matcher.Locate() {write "Found ",matcher.Group," at position ",matcher.Start,!}
Found Mrs. at position 1
Found Mr. at position 19
Found Ms. at position 37
Found M. at position 54
USER>write matcher.ReplaceAll("Dr.")
Dr. Sally Jones, Dr. Mike McMurry, Dr. Amy Johnson, Dr. Maurice LaFrance
USER>write matcher.ReplaceFirst("Dr.")
Dr. Sally Jones, Mr. Mike McMurry, Ms. Amy Johnson, M. Maurice LaFrance
Regular expression that matches phone numbers of the form "(aaa) bbb-cccc" or of the form "aaa-bbb-cccc": (\((\d{3})\)\s*|\b(\d{3})-)(\d{3})-(\d{4})\b
(\((\d{3})\)\s*|\b(\d{3})-) matches either prefix "(aaa) " or prefix "aaa-". The outer parentheses capture this entire prefix as Group(1) and limit the range of the two prefix subpatterns joined in alternation by the | operator.
\((\d{3})\)\s* matches prefix "(aaa) "
\( and \) and \s* match "(" and ")" and zero or more spaces, respectively
\d{3} matches exactly 3 digits
(\d{3}) the parentheses capture these 3 digits as Group(2)
\b(\d{3})- matches prefix "aaa-"
\b this "break" allows no other digit or letter immediately before the 3 digits
(\d{3}) captures these 3 digits as Group(3)
(\d{3})- matches "bbb-" and captures these 3 digits as Group(4)
(\d{4}) matches "cccc" and captures these 4 digits as Group(5)
\b this final "break" makes sure the match is not immediately followed by another digit or a letter
ListPhones(s,a) PUBLIC {
    ; a is a reference variable. On return
    ;   a contains the number of phone numbers in string s
    ;   a(i) contains just the digits of the i'th phone number
    kill a
    set a = 0
    set m=##class(Regex.Matcher).New("(\((\d{3})\)\s*|\b(\d{3})-)(\d{3})-(\d{4})\b")
    set m.Text = s
    while m.Locate() {
        ; Get first three digits from Group(2) or Group(3)
        if m.Start(2)>0 { set n=m.Group(2) } else { set n=m.Group(3) }
        ; Concatenate middle 3 digits and final 4 digits
        set n = n_m.Group(4) _ m.Group(5)
        ; Insert digit string into array a
        set a($increment(a)) = n
    }
}

ListPhones2(s,a) PUBLIC {
    ; a is a reference variable. On return
    ;   a contains the number of phone numbers in string s
    ;   a(i) is i'th phone number formatted as "(aaa)bbb-cccc"
    ;   Note, no blank after "(aaa)"
    kill a
    set a = 0
    set m=##class(Regex.Matcher).New("(\((\d{3})\)\s*|\b(\d{3})-)(\d{3})-(\d{4})\b")
    set m.Text = s
    while m.Locate() {
        ; Digits are concatenation of Capture groups 2,3,4,5
        ; One of group 2 or 3 is the empty string when group is not used
        set a($increment(a)) = m.SubstituteIn("($2$3)$4-$5")
    }
}

USER>write ^t2
Call 617-555-1212 about item number 61773-333-4569
USER>do ListPhones^ListPhones(^t2,.a)
USER>zwrite a
a=1
a(1)=6175551212

USER>write ^t3
Phone (212) 334-5397, (321)770-2121 and 603-646-0110
USER>do ListPhones^ListPhones(^t3,.a)
USER>zwrite a
a=3
a(1)=2123345397
a(2)=3217702121
a(3)=6036460110

USER>write ^t3
Phone (212) 334-5397, (321)770-2121 and 603-646-0110
USER>do ListPhones2^ListPhones(^t3,.a)
USER>zwrite a
a=3
a(1)="(212)334-5397"
a(2)="(321)770-2121"
a(3)="(603)646-0110"
_.Library.Status LastStatus | ( | ) |
static |
The class method LastStatus returns the <class>Status</class> value containing additional details about the most recent <REGULAR EXPRESSION> system error. If a <class>Regex.Matcher</class> object encounters a <REGULAR EXPRESSION> error then this status is already available in the <property>Status</property> property of the object. Executing
Do $SYSTEM.Status.DisplayError(##class(Regex.Matcher).LastStatus())
is useful when debugging a <REGULAR EXPRESSION> error following a call to $MATCH, $LOCATE or ##class(Regex.Matcher).New(x), where a <class>Regex.Matcher</class> oref value is not available.
_.Library.Boolean Locate | ( | _.Library.Integer | position | ) |
The method Locate finds a match for the regular expression <property>Pattern</property> in the text string <property>Text</property>.
If the optional argument position is defined as an integer 1 or greater then the search for a match begins at that character position of <property>Text</property>.
If the argument position is not defined then the search for the match begins at the character position following the previous match.
Locate returns 1 if the match is found; 0 otherwise.
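The two ways of calling Locate can be sketched as follows. This is an illustrative example only: the pattern, the text "a1 b22 c333" and the variable name m are assumptions made for the sketch, and names are spelled as on this page.

```objectscript
 set m = ##class(Regex.Matcher).New("\d+","a1 b22 c333")
 ; Begin the search at character position 4, skipping over "a1 "
 if m.Locate(4) { write m.Group," at ",m.Start,! }
 ; With no argument, each search continues after the previous match
 while m.Locate() { write m.Group," at ",m.Start,! }
```

Under the behavior described above, this should report 22 at position 5 and then 333 at position 9; the leading 1 is never reported because the first search starts beyond it.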
_.Library.Boolean LookingAt | ( | _.Library.Integer | position | ) |
The method LookingAt attempts to find a match in the property <property>Text</property> that must start at a particular character position. The match need not extend to the end of <property>Text</property>.
The argument position gives the starting character position of the attempted match.
LookingAt returns 1 if the match is found; 0 otherwise.
_.Library.Boolean Match | ( | _.Library.String | text | ) |
The method Match returns true if the entire string <property>Text</property> is matched by <property>Pattern</property>; it returns false if it does not match.
The argument text is optional. If the argument text is defined then the property <property>Text</property> is set to its value before the match is executed.
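The difference between Match and LookingAt can be illustrated with a short sketch. The pattern and text values here are hypothetical, chosen only for the example, and the trailing comments state what the descriptions above imply rather than captured output.

```objectscript
 set m = ##class(Regex.Matcher).New("\d{3}","123abc")
 write m.Match(),!        ; 0: the pattern does not cover all of "123abc"
 write m.LookingAt(1),!   ; 1: a match starts at position 1; it need not reach the end
 write m.LookingAt(2),!   ; 0: no three-digit match starts at position 2
 write m.Match("456"),!   ; 1: Text is first set to "456", which matches entirely
```

Match anchors the attempt at both ends of Text, LookingAt anchors only the start, and Locate anchors neither.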
_.Library.Status OperationLimitSet | ( | limit | ) |
The OperationLimitSet method implements the side effects of doing a Set assignment to change the value of the <property>OperationLimit</property> property.
_.Library.Status PatternSet | ( | _.Library.String | pattern | ) |
The PatternSet method implements Set assignments to the <property>Pattern</property> property.
_.Library.String ReplaceAll | ( | _.Library.String | replacement | ) |
The method ReplaceAll returns a modified copy of the property <property>Text</property>. It replaces every substring of <property>Text</property> that matches the <property>Pattern</property> with a replacement string. Portions of <property>Text</property> that are not matched are copied without change. The value of ReplaceAll is the resulting string. The property <property>Text</property> is not modified.
The argument replacement supplies the string to replace each matched region. The replacement string may contain references to capture groups which take the form of $1, $2, etc. The replacement string may reference the entire matched region with $0.
_.Library.String ReplaceFirst | ( | _.Library.String | replacement | ) |
The method ReplaceFirst returns a modified copy of the property <property>Text</property>. It replaces the first substring of <property>Text</property> that matches the <property>Pattern</property> with a replacement string. Portions of <property>Text</property> that are not matched are copied without change. The value of ReplaceFirst is the resulting string. The property <property>Text</property> is not modified.
The argument replacement supplies the string to replace the matched region. The replacement string may contain references to capture groups which take the form of $1, $2, etc. The replacement string may reference the entire matched region with $0.
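Capture-group references in a replacement string can be sketched as follows. The pattern and text are assumptions made for illustration, and names are spelled as on this page.

```objectscript
 set m = ##class(Regex.Matcher).New("(\d{3})-(\d{4})")
 set m.Text = "Call 555-1212 or 555-3434"
 ; Swap the two captured groups in every match
 write m.ReplaceAll("$2-$1"),!
 ; Bracket only the first match, using $0 for the whole matched region
 write m.ReplaceFirst("[$0]"),!
 ; The Text property itself is left unchanged by both calls
 write m.Text,!
```

Given the descriptions above, the first call should yield "Call 1212-555 or 3434-555" and the second "Call [555-1212] or 555-3434", while Text still holds the original string.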
_.Library.String RequiredPrefixGet | ( | ) |
The RequiredPrefixGet method implements the <property>RequiredPrefix</property> property.
ResetPosition | ( | _.Library.Integer | position | ) |
The method ResetPosition resets any saved state from the previous match. It also causes the next call to the method <method>Locate</method>() without an argument to begin at the specified character position.
The argument position is the character position from which the next call to <method>Locate</method>() without an argument will begin match attempts.
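A minimal sketch of restarting a scan with ResetPosition, assuming a hypothetical pattern and text; the comments state what the description above implies rather than captured output.

```objectscript
 set m = ##class(Regex.Matcher).New("[a-z]+","one two three")
 do m.Locate()            ; finds "one"
 do m.Locate()            ; finds "two"
 ; Discard the saved match state; the next Locate() starts at position 1
 do m.ResetPosition(1)
 do m.Locate()
 write m.Group,!          ; expected to be "one" again
```

This avoids constructing a second matcher when the same Text has to be scanned more than once.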
_.Library.String SubstituteIn | ( | _.Library.String | text | ) |
The method SubstituteIn returns the string that results from substituting capturing groups from the most recent regular expression match into components of the argument text. This method is undefined if the most recent regular expression match operation was not successful.
This method can be used as a low level step in regular expression replacement. It does not modify the property <property>Text</property>. For example, the method ..<method>ReplaceFirst</method>(x) is equivalent to:
Quit:'..Locate(1) ..Text
Quit $Extract(..Text,1,..Start-1)_..SubstituteIn(x)_$Extract(..Text,..End,*)
The argument text supplies the string that will be modified by the matched region and then returned. The string may contain references to capture groups which take the form of $1, $2, etc. The string may reference the entire matched region with $0.
_.Library.Status TextSet | ( | _.Library.String | text | ) |
The TextSet method implements Set assignments to the <property>Text</property> property.
End |
The property End without a subscript contains the character position in property <property>Text</property> one beyond the final character of the string found by the last match.
The value of End(i) when subscripted with an integer i between 1 and <property>GroupCount</property> is the character position one beyond the last character of the last string successfully captured by capture group i.
The value of End(i) is -1 if capture group i did not participate in the last match. The values of End and End(i) are -2 if the last match attempt failed.
Note: In addition to integer subscripts between 1 and <property>GroupCount</property>, the value of End(0) is identical to the value of End without a subscript. When the property End(...) is subscripted with values not described above then the attempt to evaluate the property End(...) is undefined.
Group |
The property Group without a subscript contains the string found by the last match.
The value of Group(i) when subscripted with an integer i between 1 and <property>GroupCount</property> is the last string successfully captured by capture group i.
If the last match operation was unsuccessful or if the specified capture group was not used during the last match operation then Group and Group(i) contain the empty string. Note that <property>End</property> and <property>End</property>(i) have negative values when the last match operation did not use the specified capture group or did not succeed in matching.
Note: In addition to integer subscripts between 1 and <property>GroupCount</property>, the value of Group(0) is identical to the value of Group without a subscript. When the property Group(...) is subscripted with values not described above then the attempt to evaluate the property Group(...) is undefined.
GroupCount |
The property GroupCount contains the number of capturing groups in the regular expression <property>Pattern</property>.
HitEnd |
OperationLimit |
The property OperationLimit provides a way to limit the time taken by a regular expression match. The default value for OperationLimit is 0, which indicates that there is no limit. Setting OperationLimit to a positive integer will cause a match operation to signal a TimeOut error after the specified number of clusters of steps by the match engine.
Correspondence with actual processor time will depend on the speed of the processor and the details of the specific pattern, but cluster size is chosen such that each cluster's execution time will typically be on the order of milliseconds.
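Guarding a potentially expensive match with OperationLimit might look like the sketch below. The pattern "(a+)+b", the limit of 10 and the input string are assumptions chosen to provoke heavy backtracking; whether the limit is actually reached depends on the ICU version and the pattern, so no particular output is claimed.

```objectscript
 set m = ##class(Regex.Matcher).New("(a+)+b")
 ; Allow at most 10 clusters of match-engine steps (0 would mean no limit)
 set m.OperationLimit = 10
 set m.Text = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
 try {
     write "Locate returned ",m.Locate(),!
 } catch ex {
     ; Reached only if the engine gives up before finishing the match
     write "Match abandoned: ",ex.Name,!
     do $system.Status.DisplayError(m.Status)
 }
```

A limit like this is mainly useful when patterns or text come from untrusted input, where an unbounded match could otherwise stall the process.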
Pattern |
The property Pattern is the string representation of the regular expression of the Matcher. Assigning to Pattern resets all saved state concerning the last matching operation.
On an installation using an NLS 8-bit character set different from Latin-1, you must be careful with patterns using a character class of the form [x-y] where x or y are national usage characters not in Latin-1. All regular expression matching is done in Unicode, so characters x and y are converted to Unicode. The character class [x-y] represents all characters between the Unicode translations of x and y, not the NLS 8-bit characters between x and y.
__PreviousMatchEnd |
private |
PreviousMatchEnd is the End value of the previous match. It has value -1 if there is no current match and value 1 if there is a current match but no previous match.
RequiredPrefix |
The property RequiredPrefix contains a string which, if nonempty, is a sequence of characters which must occur at the start of any string which matches the <property>Pattern</property>. A nonempty RequiredPrefix can be used to search a long string for a favorable position to start a Regular Expression matching operation.
In many cases the heuristics used by the ICU library to determine the RequiredPrefix do not include all possible characters of such a prefix. When a prefix cannot be determined, RequiredPrefix will contain the empty string. RequiredPrefix will also contain the empty string if the ICU library used by InterSystems IRIS does not support the RequiredPrefix feature.
Start |
The property Start without a subscript contains the character position in property <property>Text</property> of the first character of the string found by the last match. If the matched string is the empty string then Start is the character position one beyond where the empty string was located (and the property Start equals the property <property>End</property>).
The value of Start(i) when subscripted with an integer i between 1 and <property>GroupCount</property> is the character position of the first character of the last string successfully captured by capture group i. If the captured string is the empty string then Start(i) is the character position one beyond where the empty string was captured (and the property Start(i) equals the property <property>End</property>(i)).
The value of Start(i) is -1 if capture group i did not participate in the last match. The values of Start and Start(i) are -2 if the last match attempt failed.
Note: In addition to integer subscripts between 1 and <property>GroupCount</property>, the value of Start(0) is identical to the value of Start without a subscript. When the property Start(...) is subscripted with values not described above then the attempt to evaluate the property Start(...) is undefined.
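The subscripted forms of Group, Start and End can be inspected together with a short loop. This is a sketch under assumed inputs: the pattern "(a)|(b)" and the text "xbx" are hypothetical and were chosen so that one capture group does not participate in the match.

```objectscript
 set m = ##class(Regex.Matcher).New("(a)|(b)","xbx")
 if m.Locate() {
     ; Subscript 0 refers to the whole match; 1..GroupCount to the capture groups
     for i=0:1:m.GroupCount {
         write "Group(",i,")=""",m.Group(i),""""
         write " Start=",m.Start(i)," End=",m.End(i),!
     }
 }
```

According to the descriptions above, group 1 should report an empty string with Start and End of -1, while subscripts 0 and 2 should both report "b" with Start 2 and End 3.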
Status |
The property Status contains a <class>Status</class> value which may provide more information about the last System exception thrown by this object. It is initially $$$OK. Its value remains unchanged by any successful operation. The Status property is changed only when an error is thrown by the kernel functions implementing <class>Regex.Matcher</class> or by a COS Set assignment to the Status property done by the user.
Text |
The property Text is the string to which the regular expression will be applied. Assigning to Text resets all saved state resulting from the most recent match operation. On installations using an 8-bit character code, the internal representation of Text is converted to Unicode. Therefore, on an installation using 8-bit characters the maximum length of the Text property is only half the maximum string length supported by that installation.