One of the applications for which the document view mapping is designed is the import of arbitrary XML into a content repository (another is to provide a context in which XPath queries are more readable than they would be in system view; see 6.6.1 XPath over Document View). On import, the repository first checks whether the incoming XML appears to be a system view document. If it does not, it is assumed to be in document view form, and the following occurs:
Namespace declarations in the incoming XML document that do not already exist in the repository namespace registry are added to the repository namespace registry.
Each XML element E becomes a content repository node of the same name, E.
The node type of the content repository node E is determined by the implementation in accordance with its policy on respecting property semantics (see 7.3.3 Respecting Property Semantics and 7.3.4 Determining Node Types).
Each child XML element C of XML element E becomes a content repository child node C of node E.
Each XML attribute A within an XML element E becomes a property A of content repository node E. The value of each XML attribute A becomes the value of the corresponding property A.
The type of each imported property is determined by the implementation in accordance with its policy on respecting property semantics (see 7.3.3 Respecting Property Semantics and 7.3.4 Determining Node Types).
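The element and attribute rules above can be sketched with a plain SAX handler that builds an in-memory tree. This is only an illustration under stated assumptions: `SketchNode` and `DocumentViewSketch` are hypothetical stand-ins rather than JCR API types, every property is kept as a string, and namespace registration and node type determination are left out.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical in-memory stand-in for a repository node (not a JCR type).
final class SketchNode {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }
}

final class DocumentViewSketch {
    // Parses the XML and returns the node built from the root element.
    static SketchNode parse(String xml) {
        final SketchNode[] root = new SketchNode[1];
        final Deque<SketchNode> stack = new ArrayDeque<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Each element E becomes a node of the same name.
                SketchNode node = new SketchNode(qName);
                // Each attribute A becomes a property A with the same value.
                for (int i = 0; i < atts.getLength(); i++) {
                    node.properties.put(atts.getQName(i), atts.getValue(i));
                }
                // Each child element becomes a child node of its parent.
                if (stack.isEmpty()) {
                    root[0] = node;
                } else {
                    stack.peek().children.add(node);
                }
                stack.push(node);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                stack.pop();
            }
        };
        try {
            // Default factory settings: not namespace-aware, so qName is the
            // element name exactly as written in the document.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return root[0];
    }
}
```

For example, `DocumentViewSketch.parse("<a x=\"1\"><b y=\"2\"/></a>")` yields a node `a` with property `x` and a single child node `b` with property `y`. A real implementation would create repository nodes and typed properties in place of the sketch objects.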
Escape sequences representing non-XML-valid characters in element names and whitespace in attribute values may be encountered if the incoming XML stream is the product of an earlier document view export (see 6.4.2 Document View XML Mapping). In these cases, whether the escape sequences are decoded is left up to the implementation. Note that the predefined entity references &amp;, &lt;, &gt;, &apos; and &quot;, as well as all other entity and character references, must be decoded in any case, in accordance with the XML specification.
An implementation that respects node type information may be able to determine whether a particular attribute is intended to be a single-value or multi-value property, and treat any spaces embedded in the value accordingly (either as delimiters or as literal spaces). Implementations are also free to rely on other out-of-band information (such as any schema associated with the incoming XML) to help determine the intended interpretation of whitespace within a particular incoming attribute value.
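As a rough illustration of the two choices described above, the following sketch decodes escape sequences and splits an attribute value only when the caller says the property is multi-valued. It assumes the `_xHHHH_` escape form (four hexadecimal digits) used by the document view export mapping, which this section does not itself define, and it assumes an implementation that has chosen to decode. `EscapeSketch`, `decode` and `toValues` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeSketch {
    // Assumed escape form: _xHHHH_ where HHHH is a four-digit hex code unit.
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9a-fA-F]{4})_");

    // Replaces each _xHHHH_ sequence with the character it encodes.
    static String decode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    // Interprets an attribute value as one value or as a whitespace-delimited
    // list, depending on what the implementation knows about the property.
    static List<String> toValues(String attributeValue, boolean multiValued) {
        List<String> values = new ArrayList<>();
        if (!multiValued) {
            // Single-value property: embedded spaces are literal.
            values.add(decode(attributeValue));
            return values;
        }
        // Multi-value property: whitespace delimits values; escaped
        // whitespace inside a value is restored after splitting.
        for (String token : attributeValue.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                values.add(decode(token));
            }
        }
        return values;
    }
}
```

Under these assumptions, `toValues("one two_x0020_words", true)` gives the two values `one` and `two words`, while `toValues("one two", false)` gives the single value `one two`. Splitting before decoding matters, because decoding first would turn escaped spaces into delimiters.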
Text within an XML element E becomes a STRING property called jcr:xmlcharacters of a node called jcr:xmltext, which itself becomes a child node of the node E.
If import is done through the ContentHandler returned by getImportContentHandler, the value of E/jcr:xmltext/jcr:xmlcharacters will be the character data passed to ContentHandler.characters. Data passed to ContentHandler.ignorableWhitespace is ignored. If import is done through importXML, pure whitespace between elements (that is, text containing no non-whitespace characters) is ignored. However, leading and trailing whitespace, as well as whitespace between non-whitespace characters, is included in the text that is stored in E/jcr:xmltext/jcr:xmlcharacters.
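The importXML whitespace behaviour described above can be approximated with a SAX handler that buffers character data and discards runs consisting only of whitespace. This is a sketch under assumptions: `XmlTextSketch` and `collectText` are hypothetical names, and the result is a list of strings of the form `E/jcr:xmltext/jcr:xmlcharacters=<text>` instead of real repository items.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class XmlTextSketch {
    // Returns one entry per text run that would be stored under an element.
    static List<String> collectText(String xml) {
        final List<String> stored = new ArrayList<>();
        final Deque<String> open = new ArrayDeque<>();
        final StringBuilder buffer = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // Emits the buffered run for the innermost open element,
            // unless it consists only of whitespace.
            private void flush() {
                String text = buffer.toString();
                buffer.setLength(0);
                if (text.trim().isEmpty()) {
                    return; // pure whitespace between elements is ignored
                }
                // Leading, trailing and inner whitespace is kept as-is.
                stored.add(open.peek() + "/jcr:xmltext/jcr:xmlcharacters=" + text);
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                flush();
                open.push(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flush();
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text run in several calls, so buffer it.
                buffer.append(ch, start, length);
            }
            // ignorableWhitespace keeps DefaultHandler's no-op behaviour.
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return stored;
    }
}
```

Given `<a>\n  <b> hello  world </b>\n</a>`, the sketch stores a single entry for `b` whose text is ` hello  world ` with its surrounding and inner spaces intact, and nothing for the indentation inside `a`.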
An XML element can have a child element and an attribute with the same name, whereas a content repository node cannot have a child node and a property with the same name. For example, <a b="x"><b/></a> would imply a content repository node with one property called b and one child node also called b, which is not allowed. Therefore, if such a fragment of XML is encountered on import, how to deal with it is left to the implementation.
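Because the handling of such a clash is left to the implementation, a reasonable first step is simply to detect it before import. The sketch below does only that, using the JDK DOM parser. `NameClashSketch` and `findClashes` are hypothetical names, and what to do with a reported clash (reject the document, rename an item, drop one of the two) remains a policy decision this example does not make.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NameClashSketch {
    // Returns "parent/child" for every child element whose name is also
    // the name of an attribute on its parent element.
    static List<String> findClashes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> clashes = new ArrayList<>();
            visit(doc.getDocumentElement(), clashes);
            return clashes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void visit(Element element, List<String> clashes) {
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text, comments and other non-element nodes
            }
            Element child = (Element) kid;
            String clash = element.getTagName() + "/" + child.getTagName();
            // A property and a child node of the same name cannot coexist.
            if (element.hasAttribute(child.getTagName()) && !clashes.contains(clash)) {
                clashes.add(clash);
            }
            visit(child, clashes);
        }
    }
}
```

Calling `findClashes("<a b=\"x\"><b/></a>")` reports the single clash `a/b`, matching the example in the text.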