Not every item name is a valid XML name. In particular, even though a content repository prefix is always a valid XML prefix, the content repository local name (the part after the colon, or the whole name, if there is no prefix) may not be a valid XML name. For example, a content repository name may contain spaces, whereas XML names cannot.
Consequently, for document view serialization, each content repository name is converted to a valid XML name (as defined by XML 1.0) by translating invalid characters into escaped numeric entity encodings.
The escape character is the underscore (“_”). Any invalid character is escaped as _xHHHH_, where HHHH is the four-digit hexadecimal UTF-16 code for the character. When producing escape sequences the implementation should use lowercase letters for the hex digits a-f. When unescaping, however, both upper and lowercase alphabetic hexadecimal characters must be recognized.
Escaping and unescaping are done by parsing the name from left to right.
The underscore character (“_”), when appearing as a literal, is itself escaped (as _x005f_) if it is followed by xHHHH_, that is, if it would otherwise be read as the start of an escape sequence. Here H is one of the following characters: 0123456789abcdefABCDEF.
So, for example,
“My Documents” is converted to “My_x0020_Documents”,
“My_Documents” is not encoded,
“My_x0020Documents” is not encoded either,
but “My_x0020_Documents” is encoded as “My_x005f_x0020_Documents”.
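The rules above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the function names (escape_name, unescape_name) are hypothetical, and the validity check is a deliberately simplified, ASCII-only stand-in for the full XML 1.0 Name production, so it escapes some characters that XML would in fact allow. It also assumes that a literal underscore needs escaping only when it begins a complete _xHHHH_ sequence, as the examples indicate, and that a character outside the Basic Multilingual Plane is written as one escape per UTF-16 code unit.

```python
import re

# A complete escape sequence such as "_x0020_"; hex digits in either case.
ESCAPE_SEQ = re.compile(r"_x([0-9a-fA-F]{4})_")


def _is_name_char(ch: str, first: bool) -> bool:
    """Simplified ASCII-only approximation of the XML 1.0 Name rules (assumption)."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return True
    # Digits, '.' and '-' are allowed anywhere except the first position.
    return not first and ch.isascii() and (ch.isdigit() or ch in ".-")


def _hex_escape(ch: str) -> str:
    """Return one lowercase _xHHHH_ group per UTF-16 code unit of ch."""
    data = ch.encode("utf-16-be")
    return "".join(
        f"_x{data[i]:02x}{data[i + 1]:02x}_" for i in range(0, len(data), 2)
    )


def escape_name(name: str) -> str:
    """Escape a repository local name, scanning from left to right."""
    out = []
    for i, ch in enumerate(name):
        if ch == "_" and ESCAPE_SEQ.match(name, i):
            # Literal underscore that would otherwise look like an escape.
            out.append("_x005f_")
        elif _is_name_char(ch, first=(i == 0)):
            out.append(ch)
        else:
            out.append(_hex_escape(ch))
    return "".join(out)


def unescape_name(xml_name: str) -> str:
    """Reverse escape_name, accepting upper- and lowercase hex digits."""
    data = bytearray()
    i = 0
    while i < len(xml_name):
        m = ESCAPE_SEQ.match(xml_name, i)
        if m:
            # Decode the four hex digits as one UTF-16 code unit.
            data += bytes.fromhex(m.group(1))
            i = m.end()
        else:
            data += xml_name[i].encode("utf-16-be")
            i += 1
    return data.decode("utf-16-be", errors="surrogatepass")


if __name__ == "__main__":
    for sample in ("My Documents", "My_Documents",
                   "My_x0020Documents", "My_x0020_Documents"):
        print(repr(sample), "->", repr(escape_name(sample)))
```

Run as a script, the sketch prints the escaped form of each of the four example names given above. Decoding does not depend on the validity check, so unescape_name also reverses names produced by an implementation that applies the full XML 1.0 rules.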