XSLT and XPath function reference in alphabetical order

(Excerpt from “XSLT 2.0 & XPath 2.0” by Frank Bongers, chapter 5, translated from German)


fn:escape-html-uri

Category:

String functions – analysis and manipulation

Origin:

XPath 2.0

Return value:

An xs:string which corresponds to the input string, with special characters escaped according to RFC 2732.

Call/Arguments:

fn:escape-html-uri($uri-string)

$uri-string:

An xs:string which is interpreted as a URI string or as part of a URI string, and in which special characters are to be escaped according to the rules for URI escaping. If the empty sequence is passed to the function, it returns the empty string.

Purpose of use:

The fn:escape-html-uri() function applies the rules for URI escaping according to RFC 2396 to the xs:string passed to it. This ensures that the string, which represents a URI or part of a URI, is a valid URI string after the transformation and no longer contains characters that are forbidden in this context. Characters belonging to the printable ASCII characters are excluded from the escaping.

Special characters in the input string are replaced by hexadecimal escape sequences of the form %xx (x stands for a hexadecimal digit). Which characters are replaced depends on the character groups defined in RFC 2396 and explained below.

The following characters must be escaped: the space character (SPC) and other characters that act as separators, that are a source of errors because applications and platforms interpret them differently, or that are used by the protocols of Internet gateways.

Character  Meaning              Unicode   Character  Meaning               Unicode
(space)    space character      U+0020    {          left curly bracket    U+007B
<          left angle bracket   U+003C    |          pipe                  U+007C
>          right angle bracket  U+003E    }          right curly bracket   U+007D
#          hash                 U+0023    \          backslash             U+005C
%          percent sign         U+0025    ^          circumflex            U+005E
"          quotation mark       U+0022    [          left square bracket   U+005B
`          grave accent         U+0060    ]          right square bracket  U+005D

Table: characters to be escaped according to RFC 2396
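
The table can be turned into escape sequences mechanically. The following Python sketch illustrates the RFC 2396 rule only and is not an implementation of fn:escape-html-uri(); the names EXCLUDED and escape_excluded are hypothetical and chosen for this example.

```python
# The fourteen characters from the table above (RFC 2396, "to be escaped").
EXCLUDED = ' <>#%"`{|}\\^[]'


def escape_excluded(text: str) -> str:
    """Replace each listed character with %XX (upper-case hex); keep the rest."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in EXCLUDED else ch
        for ch in text
    )


print(escape_excluded("a b<c>"))  # a%20b%3Cc%3E
```

Because the percent sign is itself in the list, applying the sketch twice would escape the escape sequences again.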

As a rule, hexadecimal escape sequences are written in upper-case letters. The fn:escape-html-uri() function follows this convention, which also makes string comparisons of URI strings containing escaped characters reliable.
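
Why the upper-case convention matters for comparisons can be sketched in Python. The helper name normalize_escapes is an assumption made for this illustration; it only upper-cases the hex digits of existing escape sequences and escapes nothing itself.

```python
import re


def normalize_escapes(uri: str) -> str:
    """Upper-case the hex digits of every %xx escape sequence in the string."""
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), uri)


a = "http://www.example.org/a%3cb"
b = "http://www.example.org/a%3Cb"
print(a == b)                                        # False
print(normalize_escapes(a) == normalize_escapes(b))  # True
```

Two URI strings that differ only in the case of their escape sequences compare as unequal character by character, which is what a uniform upper-case output avoids.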

The escape sequence consists of the last two digits of the Unicode code point!
In contrast to character references, where the two leading zeros are optional, they are forbidden here. For the ASCII characters listed above, only the last two digits of the code point are used as the escape sequence: %3C for U+003C (left angle bracket).
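
The relationship between code point and escape sequence can be checked with a short Python sketch. It assumes that the escape sequence is built from the UTF-8 octets of the character, as the W3C definition of the function specifies; for ASCII characters this coincides with the last two hex digits of the code point, for other characters it does not. Both function names are hypothetical.

```python
def escape_sequence(ch: str) -> str:
    """Percent-encode one character via its UTF-8 octets, upper-case hex."""
    return "".join("%{:02X}".format(octet) for octet in ch.encode("utf-8"))


def last_two_hex_digits(ch: str) -> str:
    """Last two digits of the U+XXXX notation, e.g. '3C' for U+003C."""
    return "{:04X}".format(ord(ch))[-2:]


for ch in "<{":
    # For ASCII characters both routes give the same two digits.
    print("U+{:04X}".format(ord(ch)), escape_sequence(ch), last_two_hex_digits(ch))

# A non-ASCII character needs one escape sequence per UTF-8 octet.
print(escape_sequence("\u00e9"))  # %C3%A9
```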

Two groups of characters:

RFC 2396 and RFC 2732 (for IPv6) list the characters that need special treatment in URI strings or that have to be escaped under certain circumstances. These are referred to as »reserved« characters. Their counterpart is a second group, the »unreserved« characters.

Unreserved characters:

The unreserved characters comprise the alphanumeric characters (alphanum), that is, all letters and digits of the ASCII character set, and the marking characters (marks) listed in the following table.

The marking characters:

Character  Meaning           Unicode   Character  Meaning            Unicode
-          hyphen            U+002D    ~          tilde              U+007E
_          underscore        U+005F    *          asterisk           U+002A
.          full stop/period  U+002E    (          left parenthesis   U+0028
'          apostrophe        U+0027    )          right parenthesis  U+0029
!          exclamation mark  U+0021

Table: the marking characters according to RFC 2396
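
Membership in the unreserved group is easy to test. The Python sketch below assumes the nine marking characters that RFC 2396 defines, plus the ASCII letters and digits; the names MARKS, UNRESERVED and is_unreserved are chosen for this example only.

```python
import string

# The nine "mark" characters of RFC 2396.
MARKS = "-_.!~*'()"

# unreserved = alphanum | mark
UNRESERVED = frozenset(string.ascii_letters + string.digits + MARKS)


def is_unreserved(ch: str) -> bool:
    """True if the single character belongs to the unreserved group."""
    return ch in UNRESERVED


print([ch for ch in "~my account/" if not is_unreserved(ch)])  # [' ', '/']
```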

Unreserved characters generally do not have to be escaped, but escaping them is not forbidden; doing so does not change the syntax of the link.

Caution when dealing with characters having identical glyphs!
It should be noted that many characters with different meanings are represented by visually near-identical glyphs. For the apostrophe (inverted comma), for example, these are the left and right single quotation marks (U+2018 and U+2019) and the prime and double prime (U+2032 and U+2033). Similar ambiguities exist for the hyphen-minus.
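
When in doubt, the code point and the official Unicode name settle which character is actually present. A minimal Python sketch using the standard unicodedata module follows; the helper name describe is an assumption, and U+2010 (HYPHEN) is added here as one example of a hyphen look-alike.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return "U+{:04X} {}".format(ord(ch), unicodedata.name(ch))


# Apostrophe and three look-alikes, then hyphen-minus and one look-alike.
for ch in "'\u2018\u2019\u2032-\u2010":
    print(describe(ch))
```

Only U+0027 and U+002D are ASCII characters and therefore marking characters; the others lie outside the printable ASCII range.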

Reserved characters:

The reserved characters have a special significance in that they can act as separators (delimiters) between the functional components of a URI. For example, the question mark introduces a query string and the colon a port number.

When used in this role, the reserved characters must not be escaped. This corresponds to the behaviour of fn:escape-html-uri().
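
The delimiter role of the reserved characters can be seen by splitting a URI into its components. The Python sketch below uses urlsplit from the standard library; the URI, including the port 8080 and the query string, is invented for this illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URI: ":" introduces the port, "?" the query string.
parts = urlsplit("http://www.example.org:8080/docs?lang=en")

print(parts.hostname)  # www.example.org
print(parts.port)      # 8080
print(parts.path)      # /docs
print(parts.query)     # lang=en
```

If the colon or the question mark were escaped as %3A or %3F, they would no longer be recognised as delimiters, and the URI would be split differently.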

Remark: escaping of reserved characters
The use of the reserved characters in their literal meaning within parts of a URI string is also permitted (hence the overlap with the characters to be escaped). In this case the characters must be escaped. Since fn:escape-html-uri() does not generally do this, it can only be achieved with the fn:encode-for-uri() function.
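
As an analogy outside XPath, Python's urllib.parse.quote with an empty safe set escapes the reserved characters too, which is roughly what fn:encode-for-uri() is used for. The sketch is not the XPath function itself, and the wrapper name encode_component is hypothetical.

```python
from urllib.parse import quote


def encode_component(text: str) -> str:
    """Escape all characters except ASCII letters, digits and - _ . ~
    (Python 3.7 or later), using UTF-8 octets and upper-case %XX."""
    return quote(text, safe="")


# Reserved characters used literally inside one path segment or value.
print(encode_component("a/b?c=d"))       # a%2Fb%3Fc%3Dd
print(encode_component("100% organic"))  # 100%25%20organic
```

Such a function is meant for a single component; applied to a complete URI it would also escape the delimiters that give the URI its structure.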

Character  Meaning          Unicode   Character  Meaning               Unicode
;          semicolon        U+003B    +          plus                  U+002B
/          slash (solidus)  U+002F    $          dollar sign           U+0024
?          question mark    U+003F    ,          comma                 U+002C
:          colon            U+003A    #          hash                  U+0023
@          commercial at    U+0040    [          left square bracket   U+005B
&          ampersand        U+0026    ]          right square bracket  U+005D
=          equals sign      U+003D

Table: the reserved characters according to RFC 2396

Forbidden characters:

A third group of characters is generally prohibited in URI strings: first and foremost the non-printable characters, that is, the first 33 ASCII characters from U+0000 to U+0020 (the control characters and the space), as well as the character U+007F (DEL). Of the control characters below U+0020, however, only TAB, LF and CR are permitted in XML documents in any case.
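
The two ranges can be written down as simple predicates. The Python sketch below assumes the range U+0000 to U+0020 (hexadecimal 20 is decimal 32) plus U+007F for URI strings, and covers only the ASCII range for the XML check; all names are chosen for this example.

```python
XML_ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # TAB, LF, CR


def forbidden_in_uri(ch: str) -> bool:
    """True for U+0000..U+0020 (controls and space) and U+007F (DEL)."""
    code = ord(ch)
    return code <= 0x20 or code == 0x7F


def allowed_in_xml(ch: str) -> bool:
    """ASCII-only check: controls below U+0020 are allowed only if TAB, LF or CR."""
    code = ord(ch)
    return code >= 0x20 or code in XML_ALLOWED_CONTROLS


# 33 characters from U+0000 to U+0020, plus DEL.
print(sum(forbidden_in_uri(chr(c)) for c in range(128)))  # 34
```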

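Example – a space character in the URI string:

fn:escape-html-uri("http://www.example.org/~my account")

results in: "http://www.example.org/~my account".

The space character contained in the URI string is returned unchanged: the W3C definition of the function exempts the ASCII characters 32 to 126 (decimal) from escaping, and the space is character 32. To obtain "http://www.example.org/~my%20account", as RFC 2396 requires, use fn:iri-to-uri(), which escapes the space as %20.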

Function definition:

XPath 1.0:

The function is not available.

XPath 2.0:

fn:escape-html-uri($uri as xs:string?) as xs:string
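
The signature can be mirrored in a few lines of Python. This is a minimal sketch, not a processor's implementation: it assumes the W3C definition, under which the printable ASCII range 32 to 126 (decimal) is left unchanged and every other character is replaced by the %XX form of its UTF-8 octets. The function name escape_html_uri is hypothetical, and None stands in for the empty sequence.

```python
from typing import Optional


def escape_html_uri(uri: Optional[str]) -> str:
    """Sketch of fn:escape-html-uri($uri as xs:string?) as xs:string."""
    if uri is None:  # the empty sequence yields the empty string
        return ""
    out = []
    for ch in uri:
        if 32 <= ord(ch) <= 126:  # printable ASCII stays as it is
            out.append(ch)
        else:  # everything else: one %XX per UTF-8 octet
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)


print(escape_html_uri("http://www.example.org/~b\u00e9b\u00e9"))
# http://www.example.org/~b%C3%A9b%C3%A9
```

The reserved characters and the percent sign pass through untouched, so an already escaped URI string is not escaped a second time.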

   



Copyright © Galileo Press, Bonn 2008
Printing of the online version is permitted exclusively for private use. Otherwise this chapter from the book "XSLT 2.0 & XPath 2.0" is subject to the same provisions as those applicable for the hardcover edition: The work including all its components is protected by copyright. All rights reserved, including reproduction, translation, microfilming as well as storage and processing in electronic systems.


Galileo Press, Rheinwerkallee 4, 53227 Bonn, Germany