IRISLIB database
URLParser Class Reference

Parses a URL into its component parts. More...


Static Public Member Functions

_.Library.String Compose (_.Library.String Components)
 Composes a URL from its Components array, as defined by the Decompose method.
 
 Decompose (_.Library.String Url, _.Library.String Components)
 Parses a URL into its constituent components. More...
 
 Parse (_.Library.String Url, _.Library.String Components)
 WARNING: This method has been deprecated in favor of Decompose. More...
 

Additional Inherited Members

- Public Member Functions inherited from RegisteredObject
_.Library.Status OnAddToSaveSet (_.Library.Integer depth, _.Library.Integer insert, _.Library.Integer callcount)
 This callback method is invoked when the current object is added to the SaveSet. More...
 
_.Library.Status OnClose ()
 This callback method is invoked by the <METHOD>Close</METHOD> method to. More...
 
_.Library.Status OnConstructClone (_.Library.RegisteredObject object, _.Library.Boolean deep, _.Library.String cloned)
 This callback method is invoked by the <METHOD>ConstructClone</METHOD> method to. More...
 
_.Library.Status OnNew ()
 This callback method is invoked by the <METHOD>New</METHOD> method to. More...
 
_.Library.Status OnValidateObject ()
 This callback method is invoked by the <METHOD>ValidateObject</METHOD> method to. More...
 
- Static Public Attributes inherited from RegisteredObject
 CAPTION = None
 Optional name used by the Form Wizard for a class when generating forms. More...
 
 JAVATYPE = None
 The Java type to be used when exported.
 
 PROPERTYVALIDATION = None
 This parameter controls the default validation behavior for the object. More...
 

Detailed Description

Parses a URL into its component parts.

Member Function Documentation

◆ Decompose()

Decompose ( _.Library.String  Url,
_.Library.String  Components 
)
static

Parses a URL into its constituent components.

The output parameter 'Components' will contain an array subscripted by the name of each component part, with the parsed value as the data. For example, for a URL such as

http://www.intersystems.com

the Components array will contain the scheme in this form: Components("scheme")="http"

URLs are described briefly below; see RFC 1738 for a full description.

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.

Scheme names consist of a sequence of characters. The lower case letters "a"–"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
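
The scheme rules above can be illustrated with a short Python sketch. It is not the class's implementation; the helper name `split_scheme` is hypothetical, and the code only assumes the character set and case-folding rule stated in the paragraph above.

```python
import re

# Characters allowed in a scheme name: lower case letters, digits, "+", ".", "-".
# Upper case input is accepted and folded to lower case before the check.
_SCHEME_RE = re.compile(r"[a-z0-9+.-]+")


def split_scheme(url):
    """Return (scheme, scheme_specific_part) with the scheme lower-cased."""
    scheme, colon, rest = url.partition(":")
    scheme = scheme.lower()
    if not colon or not _SCHEME_RE.fullmatch(scheme):
        raise ValueError("no valid scheme in %r" % url)
    return scheme, rest


print(split_scheme("HTTP://www.intersystems.com"))
# prints: ('http', '//www.intersystems.com')
```

Folding the case before validating keeps "HTTP" and "http" equivalent, as the text recommends.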

While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:
//<user>:<password>@<host>:<port>/<url-path>

Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and "/<url-path>" may be excluded.

The scheme-specific data starts with a double slash "//" to indicate that it complies with the common Internet scheme syntax. The different components obey the following rules:

  • user
    An optional user name. Some schemes (e.g., ftp) allow the specification of a user name.

  • password
    An optional password. If present, it follows the user name separated from it by a colon. The user name (and password), if present, are followed by a commercial at-sign "@". Within the user and password field, any ":", "@", or "/" must be encoded.

    Note that an empty user name or password is different than no user name or password; there is no way to specify a password without specifying a user name. E.g., <URL:ftp://@host.com/> has an empty user name and no password, <URL:ftp://host.com/> has no user name, while <URL:ftp://foo:@host.com/> has a user name of "foo" and an empty password.

  • host
    The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.

  • port
    The port number to connect to. Most schemes designate protocols that have a default port number. Another port number may optionally be supplied, in decimal, separated from the host by a colon. If the port is omitted, the colon is as well.

  • path
    The rest of the locator consists of data specific to the scheme, and is known as the "path". It supplies the details of how the specified resource can be accessed. The path syntax depends on the scheme being used, as does the manner in which it is interpreted.
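
The component rules listed above can be sketched in Python as follows. This is only an illustration of the common Internet scheme syntax from RFC 1738, not the code behind Decompose. The function name `decompose` and every dictionary key other than "scheme" are assumptions made for the example, and the sketch follows RFC 1738 in treating the "/" after the host or port as a separator rather than part of the path.

```python
def decompose(url):
    """Split a URL written in the common Internet scheme syntax.

    Returns a dict keyed by component name; absent parts are left out.
    Key names other than "scheme" are hypothetical.
    """
    components = {}
    scheme, colon, rest = url.partition(":")
    if not colon:
        raise ValueError("missing scheme")
    components["scheme"] = scheme.lower()

    if not rest.startswith("//"):
        # Not the common Internet syntax: keep the scheme-specific part whole.
        components["path"] = rest
        return components

    # Everything up to the first "/" is <user>:<password>@<host>:<port>.
    authority, slash, path = rest[2:].partition("/")
    if slash:
        components["path"] = path  # the separating "/" is not included

    # User and password may not contain a raw "@", so the last "@" ends them.
    userinfo, at, hostport = authority.rpartition("@")
    if at:
        user, pw_colon, password = userinfo.partition(":")
        components["user"] = user
        if pw_colon:
            components["password"] = password

    host, port_colon, port = hostport.partition(":")
    components["host"] = host
    if port_colon:
        components["port"] = port
    return components


parts = decompose("ftp://foo:secret@host.com:2121/pub/file.txt")
print(parts["host"], parts["port"])
# prints: host.com 2121
```

A dict keyed by component name mirrors the Components array described above, where a missing subscript means the part was not present in the URL.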

◆ Parse()

Parse ( _.Library.String  Url,
_.Library.String  Components 
)
static

WARNING: This method has been deprecated in favor of Decompose.

Parses a URL into its constituent components.

The output parameter 'Components' will contain an array subscripted by the name of each component part, with the parsed value as the data. For example, for a URL such as

http://www.intersystems.com

the Components array will contain the scheme in this form: Components("scheme")="http"

URLs are described briefly below; see RFC 1738 for a full description.

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.

Scheme names consist of a sequence of characters. The lower case letters "a"–"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").

While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data:
//<user>:<password>@<host>:<port>/<url-path>

Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and "/<url-path>" may be excluded.

The scheme-specific data starts with a double slash "//" to indicate that it complies with the common Internet scheme syntax. The different components obey the following rules:

  • user
    An optional user name. Some schemes (e.g., ftp) allow the specification of a user name.

  • password
    An optional password. If present, it follows the user name separated from it by a colon. The user name (and password), if present, are followed by a commercial at-sign "@". Within the user and password field, any ":", "@", or "/" must be encoded.

    Note that an empty user name or password is different than no user name or password; there is no way to specify a password without specifying a user name. E.g., <URL:ftp://@host.com/> has an empty user name and no password, <URL:ftp://host.com/> has no user name, while <URL:ftp://foo:@host.com/> has a user name of "foo" and an empty password.

  • host
    The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.

  • port
    The port number to connect to. Most schemes designate protocols that have a default port number. Another port number may optionally be supplied, in decimal, separated from the host by a colon. If the port is omitted, the colon is as well.

  • path
    The rest of the locator consists of data specific to the scheme, and is known as the "path". It supplies the details of how the specified resource can be accessed. Note that the "/" between the host (or port) and the path is NOT part of the path. The path syntax depends on the scheme being used, as does the manner in which it is interpreted.
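
The distinction between an empty and an absent user name or password, described in the password rule above, can be made concrete with a small Python sketch. The helper `user_and_password` is hypothetical and says nothing about how Parse itself reports these cases. It uses `None` for "absent" and an empty string for "present but empty".

```python
def user_and_password(url):
    """Return (user, password); None means absent, "" means present but empty."""
    # Take the text between "//" and the next "/", i.e. [user[:password]@]host[:port].
    authority = url.split("//", 1)[1].split("/", 1)[0]
    info, at, _hostport = authority.rpartition("@")
    if not at:
        return None, None  # no "@": neither user name nor password
    user, colon, password = info.partition(":")
    return user, (password if colon else None)


for url in ("ftp://@host.com/", "ftp://host.com/", "ftp://foo:@host.com/"):
    print(url, user_and_password(url))
# prints:
# ftp://@host.com/ ('', None)
# ftp://host.com/ (None, None)
# ftp://foo:@host.com/ ('foo', '')
```

The three URLs are the ones given in the rule, and the three results differ, which is why a parser should not collapse "empty" and "absent" into one value.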