JAXP Compatibility Guide
for the J2SE 6 Platform

Introduction

The J2SE 1.4 platform included the Crimson reference implementation for JAXP 1.1. The J2SE 6 platform includes a reference implementation for JAXP 1.4 based on the Apache Xerces library.

Because these implementations come from entirely different codebases, and because the JAXP standard has evolved from 1.1 to 1.4, there are some subtle differences between the implementations, even though they both conform to the JAXP standard. These two factors combine to create the compatibility issues described in this guide.

What's New

See Release Notes.

DOM Level 3

While the reference implementation in J2SE 1.4 supported the DOM Level 2 API, the implementation in J2SE 6 supports the DOM Level 3 family of APIs. This section covers the impact of those changes on programs that used the JAXP 1.1 reference implementation.

For more information, see the complete list of changes in the DOM Level 3 Changes appendix.

Methods added to DOM interfaces

In DOM Level 3, additional methods were defined in the following interfaces:

  • Attr

  • Document

  • DOMImplementation

  • Element

  • Entity

  • Node

  • Text

The added methods only affect applications that implement the interfaces directly, and only when the application is recompiled. Applications that use the factory methods to obtain implementation classes for these interfaces will have no problems.
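As a minimal sketch of the factory-based approach (the class and method names are illustrative and do not come from this guide), the following code obtains every DOM object from the JAXP factory instead of from a hand-written implementation of the interfaces. The methods added in DOM Level 3 are then supplied by the parser's own classes:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
final class DomFactorySketch {

    // Builds <order><item>widget</item></order> using only factory-created nodes.
    static Document buildOrder() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element order = doc.createElement("order");
            Element item = doc.createElement("item");
            // setTextContent() is one of the methods DOM Level 3 added to Node.
            item.setTextContent("widget");
            order.appendChild(item);
            doc.appendChild(order);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the application never implements Attr, Element, Node, and so on itself, recompiling it against the newer interfaces should not require any source changes.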

Preserving the XML format

These changes affect an application that reads XML data into a DOM, makes modifications, and then writes it out in a way that preserves the original formatting.

In JAXP 1.1, extraneous whitespace was automatically removed on input, and a single property (ignoringLexicalInfo) was set to false to preserve, for example, entity nodes and CDATA nodes. Including the additional nodes made the DOM somewhat more complex to process, but because they were there, adding whitespace output (indentation and newlines) produced a highly readable, formatted version of the XML data which closely approximated the input.

In JAXP 1.4, there are four APIs that the application uses to determine how much lexical (formatting) information is available to process, using the following DocumentBuilderFactory methods:

  • setCoalescing()
    To convert CDATA nodes to Text nodes and append them
    to an adjacent Text node (if any).

  • setExpandEntityReferences()
    To expand entity reference nodes.

  • setIgnoringComments()
    To ignore comments.

  • setIgnoringElementContentWhitespace()
    To ignore whitespace that is not a significant part
    of element content.

The default value for all of these properties is false, which preserves all the lexical information necessary to reconstruct the incoming document in its original form. Setting them all to true lets you construct the simplest possible DOM, so the application can focus on the data's semantic content, without having to worry about lexical syntax details.
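The effect of these settings can be sketched as follows. The class, the method names, and the sample document are assumptions made for illustration. The sketch sets all four properties to the same value and counts the child nodes of the root element, so the two configurations can be compared:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class LexicalSettingsSketch {

    // Parses xml with all four lexical settings switched to the given value.
    static Document parse(String xml, boolean simplify) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setCoalescing(simplify);
            dbf.setExpandEntityReferences(simplify);
            dbf.setIgnoringComments(simplify);
            dbf.setIgnoringElementContentWhitespace(simplify);
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of direct children of the root element.
    static int childCount(String xml, boolean simplify) {
        return parse(xml, simplify).getDocumentElement().getChildNodes().getLength();
    }
}
```

For a document such as `<a><!--note--><![CDATA[x]]>y</a>`, the simplified configuration drops the comment and merges the CDATA section into the adjacent text, while the other configuration keeps the comment, the CDATA section, and the text as separate nodes.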

Note:
When adding new nodes, the application must add any indentation and newline formatting that is needed for readability, since it is not provided automatically.

SAX 2.0.2

Following are the changes made between SAX 2.0.0 and SAX 2.0.2 that might affect compatibility.

  • DeclHandler.externalEntityDecl now requires the parser to return the absolute system identifier for consistency with DTDHandler.unparsedEntityDecl.  This may cause some incompatibilities.

  • In SAX 2.0.1, an application can set ErrorHandler, EntityResolver, ContentHandler, or DTDHandler to null. This is a relaxation of the previous restriction in SAX 2.0, which generated a NullPointerException (NPE) in such circumstances.

    So the following code is legal in JAXP 1.3:

    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser sp = spf.newSAXParser();
    XMLReader reader = sp.getXMLReader();
    reader.setErrorHandler(null);
    reader.setContentHandler(null);
    reader.setEntityResolver(null);
    reader.setDTDHandler(null);
  • The resolveEntity() method in the EntityResolver API now throws IOException, as well as SAXException. (Before, it only threw SAXException.)

    The vast majority of applications are unaffected by this change, because the DefaultHandler implementation class has been modified to declare the additional exception, and very few applications use the DefaultHandler in such a way that they will run into a problem.

    The only way an application can be affected is if it overrides the resolveEntity() method and also invokes super.resolveEntity(). In that case, the application won't compile in J2SE 5 until the method is modified to handle the IOExceptions that super.resolveEntity() could throw.

  • The following new features are recognized:

    • http://xml.org/sax/features/external-general-entities
      To include external general entities.

    • http://xml.org/sax/features/external-parameter-entities
      To include external parameter entities and the external DTD subset. 

    and the following new property:

    • http://xml.org/sax/properties/xml-string
      To get the string of characters associated with the current event.

    For a complete list of Xerces features and properties, see http://xml.apache.org/xerces2-j/features.html and http://xml.apache.org/xerces2-j/properties.html.
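The resolveEntity() change and the two entity features listed above can be sketched together. This is an illustrative example under stated assumptions: the class names and the element-counting logic are hypothetical, and turning both features off is only one possible configuration. The override declares both exceptions, so the call to super.resolveEntity() compiles:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
final class Sax202Sketch {

    // Handler that overrides resolveEntity() and still delegates to the superclass.
    static final class DelegatingHandler extends DefaultHandler {
        int elements;

        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws IOException, SAXException {
            // Declaring IOException is what lets this call compile.
            return super.resolveEntity(publicId, systemId);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Parses xml with external entities excluded and returns the number of elements seen.
    static int countElements(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
            reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            DelegatingHandler handler = new DelegatingHandler();
            reader.setContentHandler(handler);
            reader.setEntityResolver(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```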

Note:
One point of compatibility is also worth mentioning. Namespace recognition was turned off by default in J2SE 1.4 (JAXP 1.1). For backward compatibility, that policy is continued in J2SE 6 (JAXP 1.4). However, namespace recognition is turned on by default in the official SAX implementation at www.saxproject.org. While not strictly a compatibility issue from the standpoint of JAXP, it is an issue that sometimes comes as a surprise.
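An application that needs namespace processing can request it explicitly on the factory. The following is a small sketch, with a hypothetical class name and sample input, that reports the namespace URI a SAX handler receives for the root element:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
final class NamespaceAwareSketch {

    // Returns the namespace URI reported to startElement() for the root element.
    static String rootNamespace(String xml, boolean namespaceAware) {
        final String[] firstUri = new String[1];
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            // Namespace processing is off unless the application asks for it.
            spf.setNamespaceAware(namespaceAware);
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (firstUri[0] == null) {
                        firstUri[0] = uri;
                    }
                }
            });
            return firstUri[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the factory left at its default, the handler does not receive the namespace URI, which is the behavior that tends to surprise developers who are used to the SAX project defaults.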

Using XSLT

Code that uses the standard JAXP APIs to create and access an XSL transformer does not need to be changed. The output will be the same, but will in general be produced much faster, since the XSLTC compiling transformer will be used by default, instead of the interpreting Xalan transformer.

Note:
There may be no significant difference between Xalan and XSLTC performance for a single run on a small data set, as when you are developing and testing an XSL stylesheet. But there is a major performance benefit when using XSLTC on anything larger.
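For reference, a transformation written against the standard JAXP APIs looks roughly like the sketch below. The class and method names are illustrative assumptions. Nothing in the code names a particular XSLT engine, which is why such code does not need to change when the default transformer changes:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
final class XsltSketch {

    // Applies the stylesheet to the XML input and returns the output as a string.
    static String transform(String stylesheet, String xml) {
        try {
            // The factory lookup decides which transformer implementation is used.
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer =
                    tf.newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```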

Programmatic Access to Xalan XPath

JAXP 1.4 provides the standard XPath API for evaluating XPath expressions. We encourage users to use this API. Xalan-interpretive is not included in the reference implementation. If an application explicitly uses the Xalan XPath API to evaluate a standalone XPath expression (one that is not part of an XSLT stylesheet), the Apache libraries for Xalan must be downloaded and installed.
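A standalone expression can be evaluated with the standard API along the lines of this sketch (the class and method names are hypothetical):

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class XPathSketch {

    // Evaluates the expression against the XML text and returns the result as a string.
    static String evaluate(String expression, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, evaluating `/a/b[2]` against `<a><b>1</b><b>2</b></a>` yields the text of the second b element.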

StAX Optional Features

The StAX API contains a number of optional features. Below are some implementation details of the reference implementation (RI).

  • XMLInputFactory's createXMLEventReader(Source source) and createXMLStreamReader(Source source) methods are not supported.

  • XMLOutputFactory's createXMLEventWriter(Result result) and createXMLStreamWriter(Result result) support StreamResult, DOMResult and StAXResult. Note that SAXResult is not supported.
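As a sketch of the supported StreamResult case (the class and method names are assumptions for illustration), an XMLStreamWriter can be created from a Result like this:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper class, for illustration only.
final class StaxResultSketch {

    // Writes a single element with text content through a StreamResult.
    static String writeElement(String name, String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(new StreamResult(out));
            writer.writeStartElement(name);
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing a SAXResult to the same factory method is the case the RI does not support.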

Package Name Changes

This change does not affect applications that confine themselves to using the standard JAXP APIs. But applications that access implementation-specific features of the XML processors defined in JAXP versions prior to 1.3 will have to be modified.

The change has several effects on previous applications:

  1. The property-values that were used to access the internal implementations must be changed.

  2. Applications that used internal APIs from the Xalan implementation classes must change the import statements that gave them access to those APIs.

  3. Applications that used internal APIs from the Crimson implementation must be re-coded -- ideally, by using newer JAXP APIs or, if necessary, by using Xerces APIs.

What Changed, and Why

In J2SE 1.4, the fact that JAXP was built into the Java platform was a mixed blessing. On the one hand, an application could rely on the fact that it was there. On the other, most applications needed features and bug fixes that were available in later versions.

But adding new libraries had no effect, because internal classes always take precedence over the classpath. The solution for that problem in 1.4 was to use the endorsed standards mechanism. However, that was a new mechanism, and one which frequently placed an additional burden on the end user, as well as the application developer.

The solution since JAXP 1.3 is to change the package names of the Apache libraries used in the implementation. That change lets a user reference newer Apache libraries in the classpath, so application developers can use them in the same way they would use any other additions to the Java platform.

The new names given to the Apache packages in the JAXP 1.3 reference implementation are shown below:

 

          JAXP 1.1                  Since JAXP 1.3

JAXP      org.apache.crimson        -/-
                                    com.sun.org.apache.xerces.internal

          org.apache.xml            com.sun.org.apache.xml.internal

XSLT      org.apache.xalan          com.sun.org.apache.xalan.internal
          org.apache.xpath          com.sun.org.apache.xpath.internal
          org.apache.xalan.xsltc    com.sun.org.apache.xalan.internal.xsltc

Feature for Secure Processing

JAXP 1.3 included a new secure processing feature, which instructs JAXP components such as parsers and transformers to behave in a secure fashion. The current implementation of this feature is described below. Refer to the table at the end of this section for the default behavior of the different factories. Note that when a Security Manager is present, all of the listed features are turned on automatically (since 1.4.3).
  • Limitations on DOM and SAX Parsers

    In JAXP 1.3, when the secure processing feature is on, default limitations are set for DOM and SAX parsers. These limits are:
    entityExpansionLimit = 64,000
    elementAttributeLimit = 10,000

    In JAXP 1.4, the secure processing feature is turned on by default. In addition to the above limits, a new maxOccur limit is added to the validating parser. The limit is:
    maxOccur = 5,000

    The feature can be turned off by calling the setFeature method on factories. The following code turns the feature off for SAX Parser:

                    
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);
                    
    Details about entityExpansionLimit and elementAttributeLimit are stated below. For maxOccur, please refer to the JAXP 1.4 Release Notes.
  • XPath and XSLT extension functions

    The extension functions are disabled when the secure processing feature is on. In both JAXP 1.3 and 1.4, the secure processing feature is off by default. To turn on the feature and disable extension functions, use the following code:

                    
    XPathFactory xpf = XPathFactory.newInstance();
    xpf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);

    Or

    TransformerFactory tf = TransformerFactory.newInstance();
    tf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
  • Security Issue Posed by Nested Entity Definitions

    While XML does not allow recursive entity definitions, it does permit nested entity definitions, which produces the potential for Denial of Service attacks on a server which accepts XML data from external sources. For example, a SOAP document like the following that has very deeply nested entity definitions can consume 100% of CPU time and large amounts of memory in entity expansions:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE foobar[
    <!ENTITY x100 "foobar">
    <!ENTITY  x99 "&x100;&x100;">
    <!ENTITY  x98 "&x99;&x99;">
    ...
    <!ENTITY   x2 "&x3;&x3;">
    <!ENTITY   x1 "&x2;&x2;">
    ]>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV=...>
      <SOAP-ENV:Body>
        <ns1:aaa xmlns:ns1="urn:aaa" SOAP-ENV:encodingStyle="...">
          <foobar xsi:type="xsd:string">&x1;</foobar>
        </ns1:aaa>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>

    A system that doesn't take in external XML data need not be concerned with the issue, but one that does should turn on the secure processing feature and reset the limits as described below.

  • Entity Expansion Limit

    The number of entity expansions can be limited by either setting the system property entityExpansionLimit or by setting the parser property http://apache.org/xml/properties/entity-expansion-limit. Both properties accept a java.lang.Integer. The parser throws a fatal error once it has reached the entity expansion limit. By default, entityExpansionLimit is set to 64,000.

    To set the entity expansion limit to 10,000 using the system property, use an option like the following on the java command line: -DentityExpansionLimit=10000. Alternatively, set the property in code: System.setProperty("entityExpansionLimit","10000");

    To set the entity expansion limit using the parser property http://apache.org/xml/properties/entity-expansion-limit, use code similar to the following:

    DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
    dfactory.setAttribute("http://apache.org/xml/properties/entity-expansion-limit", Integer.valueOf(10000));
    DocumentBuilder docBuilder = dfactory.newDocumentBuilder();
  • Element Attribute Limit

    The number of attributes in an element can be limited by either setting the system property elementAttributeLimit or by setting the http://apache.org/xml/properties/elementAttributeLimit property on the parser. Both properties accept Integer values. By default, elementAttributeLimit is set to 10,000. When the parser property http://apache.org/xml/properties/elementAttributeLimit is set, it overrides the system property. The parser throws a fatal error if the number of attributes in an element exceeds the limit.

    To set the Element Attribute Limit to 20 using the system property, use an option like the following on the java command line: -DelementAttributeLimit=20. Alternatively, set the property in code: System.setProperty("elementAttributeLimit","20");

    To set the Element Attribute Limit using the parser property http://apache.org/xml/properties/elementAttributeLimit, use code similar to the following:

    DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
    dfactory.setAttribute("http://apache.org/xml/properties/elementAttributeLimit", Integer.valueOf(20));
    DocumentBuilder docBuilder = dfactory.newDocumentBuilder();
  • Parser property to disallow DTDs

    The above security risk can also be mitigated by setting the http://apache.org/xml/features/disallow-doctype-decl parser feature to true. A fatal error is then thrown if the incoming XML document contains a DOCTYPE declaration. (The default value for this feature is false.) This feature is typically useful for SOAP-based applications where a SOAP message must not contain a Document Type Declaration.

  • Secure processing using StAX

    StAX supports a similar SupportDTD property that can be used to disable DTD processing. This is done by using the setProperty method on XMLInputFactory:

    XMLInputFactory xif = XMLInputFactory.newInstance();
    xif.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
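The disallow-doctype-decl setting described above can be tried out with a sketch like the following. The class and method names are hypothetical, and the feature is set through DocumentBuilderFactory.setFeature(), which is one way to reach the parser. The method reports whether a document was accepted:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
final class DisallowDoctypeSketch {

    // Returns true if the document parses, false if the parser rejects it.
    static boolean accepts(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A DOCTYPE declaration (or any other fatal error) ends up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A document without a DOCTYPE declaration is accepted, while one that declares entities in an internal subset is rejected before any entity is expanded.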
                    


Default Status for the Feature for Secure Processing


Factory                   Default   Limit(s) when the feature is set

DocumentBuilderFactory    true      entityExpansionLimit = 64,000
                                    elementAttributeLimit = 10,000
                                    maxOccur = 5,000

SAXParserFactory          true      entityExpansionLimit = 64,000
                                    elementAttributeLimit = 10,000
                                    maxOccur = 5,000

SchemaFactory             true      maxOccur = 5,000

TransformerFactory        false     extension functions disabled

XPathFactory              false     extension functions disabled
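The defaults in effect on a particular runtime can be checked with getFeature(). The sketch below uses a hypothetical class name and collects the current status of each factory. The values it reports depend on the Java version in use and on whether a Security Manager is present, so they may not match the table on every platform:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPathFactory;

// Hypothetical helper class, for illustration only.
final class SecureProcessingStatusSketch {

    // Maps each factory name to the current state of the secure processing feature.
    static Map<String, Boolean> currentStatus() {
        final String fsp = XMLConstants.FEATURE_SECURE_PROCESSING;
        try {
            Map<String, Boolean> status = new LinkedHashMap<>();
            status.put("DocumentBuilderFactory",
                    DocumentBuilderFactory.newInstance().getFeature(fsp));
            status.put("SAXParserFactory",
                    SAXParserFactory.newInstance().getFeature(fsp));
            status.put("SchemaFactory",
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).getFeature(fsp));
            status.put("TransformerFactory",
                    TransformerFactory.newInstance().getFeature(fsp));
            status.put("XPathFactory",
                    XPathFactory.newInstance().getFeature(fsp));
            return status;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```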
Copyright ©2013-2014 (revision 20140418.2d69abc)
 
 