Fast and Easy XML Processing

"Upgrade to JAXP 1.3"

by Neeraj Bajaj (Jan. 2005)

This article explains some of the new concepts and important features introduced in JAXP 1.3. JSR 206 was developed with performance and ease of use in mind. The new Validation Framework gives much more power to applications dealing with XML Schema and delivers significant performance improvements. The XPath APIs provide access to the XPath evaluation environment. JAXP 1.3 brings richer XML Schema datatype support to the Java platform by adding new datatypes that map to W3C XML Schema datatypes. Keeping pace with the evolution of XML standards, JAXP 1.3 also adds complete support for the following standards: XML 1.1, DOM L3, XInclude, and SAX 2.0.2. All this has already gone into the Java platform in the latest release, J2SE 5.0. If you are on J2SE 1.3 or 1.4, a standalone stable implementation of JAXP 1.3 is also available for download. This article mainly concentrates on the work done as part of the JSR 206 effort. New Schema Validation Framework concepts are explained along with working code and diagrams. The major new features are described in the sections that follow.

Schema Validation Framework

JAXP 1.3 introduces a new schema-independent Validation Framework (called the Validation API). This new framework gives much more power to applications dealing with XML Schema and can accomplish things that were not possible before. The new approach is a fundamental shift in the way XML processing and validation are performed. Validation used to be considered an integral part of XML parsing, and previous versions of JAXP supported validation as a feature of an XML parser: a SAXParser or DocumentBuilder instance.
The JAXP 1.3 Validation API instead treats the validation of an instance document as a process independent of parsing. This new approach has several advantages. Applications relying heavily on XML Schema can greatly improve the performance of schema validation. The performance gain largely depends on the ratio (Size of XML Schema / Size of XML document); larger ratios lead to greater performance gains. This is not the end of the story. Using the new Validation APIs, many previously unsolvable problems can be solved in an efficient, easy, and secure way. Let's look at what you can do with the new Schema Validation Framework.

Validate XML against any schema

Though JAXP 1.3 requires only the W3C XML Schema language to be supported, one can easily plug in support for other schema languages such as RELAX NG. The Validation API provides a pluggability layer through which applications can provide specialized validation libraries supporting additional schema languages at run time. This is achieved using the SchemaFactory class, which is capable of locating implementations for the schema languages at run time. The first step is to specify the schema language to be used and obtain the concrete factory implementation.

SchemaFactory sf = SchemaFactory.newInstance(<SCHEMA LANGUAGE>);
<SCHEMA LANGUAGE> could be W3C XML Schema, Relax NG etc.

If this function returns successfully, it means an implementation capable of supporting the specified schema language is available. Getting the SchemaFactory implementation is the entry point to the Validation APIs. This step goes through the pluggability mechanism which has been at the core of JAXP for a long time. The same pluggability mechanism is used by SAXParserFactory and DocumentBuilderFactory. The javadoc for SchemaFactory.newInstance() explains the steps involved in searching for a factory implementation. For completeness, it is included below:

To find a SchemaFactory object for a given schema language, this method looks in the following places in the following order where "the class loader" refers to the context class loader:

  1. If the system property "javax.xml.validation.SchemaFactory:schemaLanguage" is present (where schemaLanguage is the parameter to this method), then its value is read as a class name. The method will try to create a new instance of this class by using the class loader, and returns it if it is successfully created.

  2. $java.home/lib/ is read and the value associated with the key being the system property above is looked for. If present, the value is processed just like above.

  3. The class loader is asked for service provider provider-configuration files matching javax.xml.validation.SchemaFactory in the resource directory META-INF/services. See the JAR File Specification for file format and parsing rules. Each potential service provider is required to implement the method: isSchemaLanguageSupported(String schemaLanguage)
    The first service provider found in class loader order that supports the specified schema language is returned.

  4. Platform default SchemaFactory is located in an implementation-specific way. There must be a platform default SchemaFactory for W3C XML Schema.

One can write the code in such a way that applications can switch between W3C XML Schema and Relax NG validation without changing a single line of code.
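The lookup above can be exercised without hard-coding a schema language. The following is a minimal sketch, not the article's own code: the class name SchemaLanguageLookup and the helper isAvailable are assumptions made for illustration. It relies on SchemaFactory.newInstance() throwing IllegalArgumentException when no implementation is found for the requested language.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

/** Hypothetical helper: checks whether a SchemaFactory exists for a schema language URI. */
class SchemaLanguageLookup {

    /** Returns true if the lookup procedure finds a factory supporting the language. */
    static boolean isAvailable(String schemaLanguage) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(schemaLanguage);
            return factory.isSchemaLanguageSupported(schemaLanguage);
        } catch (IllegalArgumentException e) {
            // newInstance found no implementation for this schema language.
            return false;
        }
    }

    public static void main(String[] args) {
        // The language URI could equally come from configuration or the command line.
        System.out.println("W3C XML Schema: " + isAvailable(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        System.out.println("RELAX NG: " + isAvailable(XMLConstants.RELAXNG_NS_URI));
    }
}
```

Because the schema language is just a string parameter, the rest of the validation code stays the same whichever language is chosen. Whether RELAX NG is reported as available depends on whether a supporting library is on the class path.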

Compile Schema

With the new Validation APIs, an application has the option to parse only the schema, checking schema syntax and semantics as per the constraints imposed by the particular schema language. This is quite useful when you are writing a schema and want to make sure that it conforms to the spec. SchemaFactory is the class which does this job. This class loads the schema and also prepares it in a special form, represented as a javax.xml.validation.Schema object, which can be used for validating instance documents against the schema. A schema may include or import other schemas. In that case those schemas are also loaded.

When reading a schema, a SchemaFactory may need to resolve resources and can encounter errors. As you can see in Figure 1, an LSResourceResolver and an ErrorHandler can be registered on SchemaFactory. When registered, the ErrorHandler is used by SchemaFactory to report any errors encountered during schema compilation. If an LSResourceResolver is set, it is used by SchemaFactory to customize resolution of resources. LSResourceResolver is a new interface introduced as part of DOM L3. Functionally it is the same as the SAX EntityResolver except that it also provides information about the namespace of the resource being resolved, for example, the targetNamespace of the XML Schema. Here is a section of code which shows how SchemaFactory can be used to compile a schema and get a Schema instance.

//create a SchemaFactory capable of understanding W3C XML Schemas (WXS)
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
//set the error handler to receive any error during schema compilation
factory.setErrorHandler(new MyErrorHandler());
//set the resource resolver to customize resource resolution
factory.setResourceResolver(new MyLSResourceResolver());
//load a WXS schema, represented by a Schema instance
Schema schema = factory.newSchema(new StreamSource(new File("mySchema.xsd")));

The javax.xml.validation.Schema class is an "immutable" in-memory representation of a schema. Once the schemas are loaded into memory and represented as a Schema instance, that instance can be used for validation. A single Schema instance can be shared with many different parser instances, even ones running in different threads. Applications are encouraged to parse the same set of schemas only once and pass the same Schema instance to different instances of the parser. The next section explains how the Schema instance can be used to validate documents.
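As a rough illustration of sharing one compiled Schema across threads, consider the sketch below. The inline one-element schema, the class name SharedSchemaExample, and its helpers are assumptions for illustration only. Each thread creates its own Validator, since Validator instances (unlike Schema) are not thread-safe.

```java
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SharedSchemaExample {
    // Hypothetical one-element schema, kept inline so the sketch is self-contained.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note' type='xs:string'/>"
        + "</xs:schema>";

    /** Compiles the schema once; the returned Schema can be shared between threads. */
    static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    /** Validates each document on its own thread and returns how many were valid. */
    static int countValid(String... documents) {
        final Schema schema = compile();
        final AtomicInteger valid = new AtomicInteger();
        Thread[] workers = new Thread[documents.length];
        for (int i = 0; i < documents.length; i++) {
            final String xml = documents[i];
            workers[i] = new Thread(() -> {
                // Validator is not thread-safe, so each thread creates its own.
                Validator validator = schema.newValidator();
                try {
                    validator.validate(new StreamSource(new StringReader(xml)));
                    valid.incrementAndGet();
                } catch (Exception e) {
                    // With no ErrorHandler set, an invalid document ends up here.
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return valid.get();
    }

    public static void main(String[] args) {
        System.out.println(countValid("<note>a</note>", "<other/>", "<note>b</note>"));
    }
}
```

The schema is compiled exactly once in countValid, no matter how many documents are checked.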

Validate XML using compiled schema

Before we look at this approach, let's look at the way we have been doing Schema Validation using schema properties which were defined in JAXP 1.2.

Here is an example showing how the schemaLanguage and schemaSource properties are used today:

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(true);
SAXParser sp = spf.newSAXParser();
//set the schemaLanguage property
sp.setProperty(<schemaLanguage property>, <schema language URI>);
//set the schemaSource property
sp.setProperty(<schemaSource property>, "mySchema.xsd");
sp.parse(<XML Document>, <ContentHandler>);

The user sets the schemaLanguage and/or schemaSource property on SAXParser and setValidating() to true. Generally a business application defines a "set of schema" containing their business rules against which XML documents need to be validated. To accomplish this an application sets the schema using schemaSource property or relies on the xsi:schemaLocation attribute in the instance document to specify the schema location(s).

This approach works well but there is a tremendous performance penalty. The specified schemas are loaded again and again for every XML document that needs to be validated! This problem is solved with the new Validation APIs where an application needs to parse a set of schemas only once.

Let's see how! After the "Compile Schema" step, do the following.

SAXParserFactory spf = SAXParserFactory.newInstance();
//Just set the Schema instance on SAXParserFactory
spf.setSchema(schema);
//Obtain the SAXParser instance
SAXParser saxParser = spf.newSAXParser();
//parser will parse the XML document but validate it using Schema instance
saxParser.parse(new File("instance.xml"), myHandler);

Just set the Schema instance on the factory and you are done. No need to set the validation to true, no need to set schemaLanguage or schemaSource property. Validation of XML documents is done against the compiled schema set on the factory. You will be amazed by the performance improvements you'll see doing schema validation this way. Try it yourself. As mentioned earlier, the performance gain largely depends on the ratio (Size of XML Schema / Size of XML document). Larger ratios lead to a larger performance gain. Also look at the Reusing a Parser Instance section to further improve the performance.
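The following self-contained sketch shows the compile-once, parse-many pattern with in-memory documents. The schema, the class name ParseWithCompiledSchema, and the helper countInvalid are hypothetical. The sketch also turns on namespace awareness, which is an assumption beyond the listing above; schema validation generally needs namespace-aware parsing.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ParseWithCompiledSchema {
    // Hypothetical schema: a single <quantity> element holding an integer.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='quantity' type='xs:int'/>"
        + "</xs:schema>";

    /** Returns how many of the documents produced at least one validation error. */
    static int countInvalid(String... documents) {
        try {
            // The schema is compiled once, outside the per-document loop.
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            spf.setSchema(schema); // no setValidating(true), no schema properties
            int invalid = 0;
            for (String xml : documents) {
                final boolean[] sawError = {false};
                SAXParser parser = spf.newSAXParser();
                parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) {
                        sawError[0] = true; // validation errors arrive here
                    }
                });
                if (sawError[0]) {
                    invalid++;
                }
            }
            return invalid;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countInvalid("<quantity>3</quantity>", "<quantity>three</quantity>"));
    }
}
```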

Note that it is an error to use the schemaLanguage and/or the schemaSource property in conjunction with a non-null Schema object. Such a configuration will cause a SAXException when those properties are set on SAXParser or DocumentBuilderFactory.

Validate a SAXSource or DOMSource

As mentioned earlier, there has been a fundamental shift in the way XML parsing and validation are looked at. XML validation is now considered a process independent of XML parsing. Once you have the Schema instance loaded into memory, you can do many things depending upon your requirements. You can create a ValidatorHandler, which can validate a SAX stream, or create a standalone Validator. Look at Figure 3. A standalone Validator can validate a SAXSource, a DOMSource, or an XML document against any schema. In fact, javax.xml.validation.Validator still works if the SAX stream or DOM object comes from a different implementation. The Validator can either just validate the input or validate the input and also send the augmented validation result to the specified output. Note that if you supply a SAXSource as input, the Result should only be a SAXResult. If the input is a DOMSource, the Result should only be a DOMResult.

To receive any errors during validation, an ErrorHandler should be registered with the Validator. Let's look at some working code.

//Create a Validator which can be used to validate instance documents against this schema
Validator validator = schema.newValidator();
validator.setErrorHandler(new ErrorHandlerImpl());
//Validate the supplied instance document against the schema
validator.validate(new StreamSource(<XML Document>));

Validator can also be used to validate the instance document or DOM object in memory with augmented result sent to DOMResult.

Document document = <DOM object>;
validator.validate(new DOMSource(document), new DOMResult());

The Validation APIs can validate a SAX stream and work in conjunction with Transformation APIs to achieve pipeline processing which is explained in the next Validate XML after Transformation section.
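A standalone Validator working on a DOM tree can be sketched as follows. The schema, the class ValidatorExample, and the CollectingHandler are hypothetical names introduced here; the handler records recoverable errors instead of stopping at the first one, which is one possible design rather than the only one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidatorExample {
    // Hypothetical schema: <order id="int"> containing one or more <item> elements.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType>"
        + "<xs:sequence><xs:element name='item' type='xs:string' maxOccurs='unbounded'/></xs:sequence>"
        + "<xs:attribute name='id' type='xs:int' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Collects validation errors instead of failing on the first one. */
    static class CollectingHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();
        public void warning(SAXParseException e) { /* ignored in this sketch */ }
        public void error(SAXParseException e) { messages.add(e.getMessage()); }
        public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    }

    /** Parses the XML into a DOM, validates the DOM, and returns the number of errors. */
    static int errorsInDom(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document document = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Validator validator = schema.newValidator();
            CollectingHandler handler = new CollectingHandler();
            validator.setErrorHandler(handler);
            // DOMSource in, DOMResult out: the augmented result is written to a new DOM.
            validator.validate(new DOMSource(document), new DOMResult());
            return handler.messages.size();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(errorsInDom("<order id='7'><item>pen</item></order>"));
        System.out.println(errorsInDom("<order><item>pen</item></order>"));
    }
}
```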

Validate XML after Transformation

Transformation APIs are used to transform one XML document into another by applying a style sheet. There are times when we need to validate the transformed XML document against a schema. What should we do? Should that XML document be fed to a parser which then uses the schema feature to do the schema validation? No. The new Validation APIs give you the power to validate the transformed XML document against a different schema by allowing the application to create a pipeline and pass the output of a transformer to the Validation APIs to validate against the desired schema. It doesn't matter if the output of the transformation is a SAX stream or a DOM in memory.

Validate a SAX Stream

Here is a code snippet which shows how to use the specially designed javax.xml.validation.ValidatorHandler, which can validate a SAX stream.

   //get Schema instance
1. Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(new File(<SCHEMA>));
   //Create ValidatorHandler
2. ValidatorHandler vh = schema.newValidatorHandler();
   //set Error Handler for reporting errors
3. vh.setErrorHandler(new ErrorHandlerImpl());
   //set application defined content handler
4. vh.setContentHandler(new ApplicationContentHandler());
   //Get the Transformer
5. Transformer transformer = TransformerFactory.newInstance().newTransformer(new StreamSource(<STYLESHEET>));
   //Create a pipeline. Validate the output of transformation against the
   //schema using ValidatorHandler created from Schema instance.
6. transformer.transform(new StreamSource(<XML DOCUMENT>), new SAXResult(vh));

The whole flow is shown in Figure 4, where an XML document and a Stylesheet are given as input to a Transformer and the output is a SAX stream. We take advantage of the modular approach of doing validation independent from parsing. The javax.xml.validation.ValidatorHandler is a special handler which is capable of working directly with a SAX stream. It validates the stream and passes it to the application.
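The pipeline can be tried end to end with in-memory inputs. In this sketch the stylesheet, the schema (a contact element limited to five characters), and the class TransformThenValidate are all assumptions chosen to keep the example small. No application ContentHandler is set on the ValidatorHandler; only validation errors are counted.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class TransformThenValidate {
    // Hypothetical stylesheet: turns <person><name>..</name></person> into <contact>..</contact>.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/person'><contact><xsl:value-of select='name'/></contact></xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical schema for the transformed output: <contact> of at most 5 characters.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='contact'><xs:simpleType><xs:restriction base='xs:string'>"
        + "<xs:maxLength value='5'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:schema>";

    /** Transforms the input and returns the number of validation errors in the output. */
    static int errorsAfterTransform(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            ValidatorHandler vh = schema.newValidatorHandler();
            final int[] errors = {0};
            vh.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors[0]++;
                }
            });
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            // The transformer's SAX output is fed straight into the ValidatorHandler.
            transformer.transform(new StreamSource(new StringReader(xml)), new SAXResult(vh));
            return errors[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(errorsAfterTransform("<person><name>Ann</name></person>"));
        System.out.println(errorsAfterTransform("<person><name>Alexandria</name></person>"));
    }
}
```

The transformed document is never serialized and re-parsed; validation happens on the SAX events as they are produced.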

Validate DOM in memory

The Transformation APIs also allow a transformed result to be obtained as a DOM object. The DOM object in memory can be validated against a Schema. This could be done as follows.

//Transform the XML document into a DOM
DOMResult dr = new DOMResult();
transformer.transform(new StreamSource(<XML DOCUMENT>), dr);
//validate the DOM object created by the transformation
schema.newValidator().validate(new DOMSource(dr.getNode()));

So you see that the Validation APIs can be used with the Transformation APIs to easily do complex things. This approach also boosts performance as it avoids the step of parsing the XML again when validating a transformed XML document.

Validate a JDOM Document

The ValidatorHandler can be used to validate various object models, such as JDOM, against the schema(s). In fact, any object model (XOM, DOM4J, etc.) which can be built on top of a SAX stream or can emit SAX events can be used with the Schema Validation Framework to validate an XML document against a schema. This is possible because ValidatorHandler can validate a SAX stream.

Let's see how a JDOM document can be validated against schema(s).

   //Obtain the ValidatorHandler as explained in the previous section, step 1-4
   //pass the ValidatorHandler to SAXOutputter
5. SAXOutputter so = new SAXOutputter(vh);
6. so.output(jdomDocument);

It is that simple. JDOM has a way to output a JDOM document as a stream of SAX events. SAXOutputter fires SAX events which are validated by ValidatorHandler. Any error encountered is reported via ErrorHandler set on ValidatorHandler.

Obtain Schema Type Information

ValidatorHandler gives access to a TypeInfoProvider, which can be queried for the type information determined by the validator. This object is dynamic in nature: as validation of the XML document progresses, it returns the type information of the element or attribute currently being assessed by the ValidatorHandler. This interface allows an application to know:

a) if the attribute is declared as ID type
b) if the attribute was declared in the original XML document or was added by Validator during Validation
c) Type information of the Element or Attribute as declared in Schema associated with the document.

Type information is returned as an org.w3c.dom.TypeInfo object, which is defined as part of DOM L3 and has the following functions.

getTypeName()
The name of a type declared for the associated element or attribute, or null if unknown.

getTypeNamespace()
The namespace of the type declared for the associated element or attribute, or null if the element does not have a declaration or if no namespace information is available.

isDerivedFrom(java.lang.String typeNamespaceArg, java.lang.String typeNameArg, int derivationMethod)
This method returns whether there is a derivation between the reference type definition, i.e. the TypeInfo on which the method is called, and the other type definition, i.e. the one passed as parameters.

The TypeInfo object returned is "immutable" and the caller can keep references to the obtained TypeInfo longer than the callback scope. The methods of the TypeInfoProvider interface may only be called by the startElement event of the ContentHandler that the application sets on the ValidatorHandler. For example, look at the section of code below. Note that for clarity only part of the code is shown here.

//Create ValidatorHandler, Schema acts as a factory for ValidatorHandler
ValidatorHandler vh = schema.newValidatorHandler();
//set Error Handler for reporting errors
vh.setErrorHandler(eh);
//pass the TypeInfoProvider instance to application content handler
vh.setContentHandler(new MyContentHandler(vh.getTypeInfoProvider()));
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
XMLReader reader = spf.newSAXParser().getXMLReader();
//Set the validator handler as content handler so that it can work on top of SAX stream
reader.setContentHandler(vh);
reader.parse(<XML Document>);
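What the application content handler might do with the TypeInfoProvider is sketched below. The schema (a price element of type xs:decimal), the class TypeInfoExample, and the helper typeOfRoot are hypothetical; the handler simply records the declared type name of the first element it sees.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class TypeInfoExample {
    // Hypothetical schema: a single <price> element of type xs:decimal.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='price' type='xs:decimal'/>"
        + "</xs:schema>";

    /** Returns the schema type name of the root element, as reported during validation. */
    static String typeOfRoot(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            ValidatorHandler vh = schema.newValidatorHandler();
            final TypeInfoProvider types = vh.getTypeInfoProvider();
            final String[] found = new String[1];
            vh.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    // The provider is only queried from inside the startElement callback.
                    TypeInfo info = types.getElementTypeInfo();
                    if (found[0] == null && info != null) {
                        found[0] = info.getTypeName();
                    }
                }
            });
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(vh); // the ValidatorHandler sits on top of the SAX stream
            reader.parse(new InputSource(new StringReader(xml)));
            return found[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeOfRoot("<price>9.99</price>"));
    }
}
```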

Ensure security of data

Validating an XML document against an untrusted schema can have serious consequences: validation may modify the actual data by adding default attributes, and so is capable of corrupting the data. Validation against an untrusted schema may also mean that an incoming instance document does not conform to the constraints or rules imposed by your business.
With the new Validation APIs, getting a Schema instance is the first step before an instance document can be validated, and how that Schema instance is created is in the hands of the application. Validation using the Schema instance makes sure that an incoming instance document is not validated against any other (untrusted) schema, but only against the schema(s) from which the instance was created. If the instance XML document has elements or attributes that refer to schema(s) from a different targetNamespace and are not part of the javax.xml.validation.Schema representation, an error will be thrown. This approach protects you from accidental mistakes and malicious documents.

Reusing a Parser Instance

There have been certain changes in old APIs such as SAXParser, DocumentBuilder, and Transformer to improve XML processing performance. Suppose your application needs to parse multiple XML documents using either SAXParser or DocumentBuilder. While reading one such document, the application encounters a fatal error, or the application using SAXParser decides to stop parsing by throwing an exception. Can the same instance be reused to parse another XML document? To make the scenario even simpler: if no fatal error or exception is thrown, can the same instance be used to parse multiple XML documents? Previously, nothing guaranteed that the same instance was safe to use for parsing multiple XML documents, so one approach was to create a new parser instance from the factory for each document. It would be much better if the same instance could be reused to parse many XML documents. JAXP 1.3 adds a new function, reset(), to SAXParser, DocumentBuilder, and Transformer. This guarantees that the same instance can be reused whatever the scenario. The reset() method improves overall performance because it saves the resources and time associated with creating new instances, as well as garbage collection time. Let's see how reset() can be used, taking the case where an XML document is validated against the compiled schema as explained in the Compile Schema section.

SAXParserFactory spf = SAXParserFactory.newInstance();
//Set the Schema instance on SAXParserFactory
spf.setSchema(schema);
//Obtain the SAXParser instance
SAXParser saxParser = spf.newSAXParser();
//there are multiple XML documents that need to be parsed
for (int i = 0; i < n; i++) {
    //parser will parse the XML document but validate it using Schema instance
    saxParser.parse(new File(args[i]), myHandler);
    //reset the parser instance
    saxParser.reset();
}

Validating XML documents using the compiled Schema and reusing the parser instance considerably improves schema validation performance. reset() can also be used on its own, for example when the parser is in non-validating mode. The same function has been added to javax.xml.transform.Transformer, the newly designed javax.xml.validation.Validator, and javax.xml.xpath.XPath. Applications are encouraged to reuse parser, transformer, and validator instances by calling reset() when processing multiple XML documents.

Note that reset() sets the instance back to factory settings i.e. to the same state when instance was created from factory, so any feature or property set on the instance should be set again after calling reset().
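The sketch below reuses one non-validating SAXParser across several in-memory documents, including one that is not well-formed, to illustrate that the instance remains usable after a fatal error once reset() is called. The class ParserReuse and the helper countParsed are names assumed for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ParserReuse {

    /** Parses every document with a single SAXParser and returns how many parsed cleanly. */
    static int countParsed(String... documents) {
        SAXParser parser;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        int parsed = 0;
        for (String xml : documents) {
            try {
                parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
                parsed++;
            } catch (SAXException | IOException e) {
                // A fatal error in one document should not prevent reuse for the next.
            } finally {
                // Return the parser to the state it had when the factory created it.
                parser.reset();
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(countParsed("<a><b/></a>", "<broken>", "<c/>"));
    }
}
```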

XPath Support

Accessing XML is made simple using XPath; a single XPath expression can replace many lines of DOM API code. JAXP 1.3 defines XPath APIs which conform to XPath 1.0 and provide an object-model neutral API for the evaluation of XPath expressions and access to the evaluation environment. Though the current APIs conform to XPath 1.0, they have been designed with future XPath 2.0 support in mind.

To use the JAXP 1.3 XPath APIs, the first step is to get an instance of XPathFactory. Though the default model is W3C DOM, it can be changed by specifying the object model URI.

//Default model, W3C DOM
XPathFactory domFactory = XPathFactory.newInstance();
//Specific object model
XPathFactory modelFactory = XPathFactory.newInstance(object_model_uri);

Evaluate XPath expression

XPathFactory is used to create XPath objects. The XPath interface provides access to the XPath evaluation environment and expressions. XPath has an overloaded evaluate function that evaluates an XPath expression and returns the result in the return type requested by the application. The XML document below is followed by working code that prints the names of all the books in the document.

<Books>
  <Book>
     <Author> Author1 </Author>
     <Name> Name1 </Name>
     <ISBN> ISBN1 </ISBN>
  </Book>
  <Book>
     <Author> Author2 </Author>
     <Name> Name2 </Name>
     <ISBN> ISBN2 </ISBN>
  </Book>
</Books>

//Get XPath instance
XPath xpath = XPathFactory.newInstance().newXPath();
//Write an XPath expression to retrieve all the Book names
String expression = "/Books/Book/Name/text()";
//NodeList containing all the book names
NodeList nameNodes = (NodeList) xpath.evaluate(expression, new
 InputSource("Books.xml"), XPathConstants.NODESET);
//print all the names of the books
for (int i = 0; i < nameNodes.getLength(); i++) {
    System.out.println("Book name " + (i+1) + " is " +
        nameNodes.item(i).getNodeValue());
}
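For readers who want to run the evaluation without a Books.xml file on disk, here is a self-contained variant that reads the document from a string. The class BookNames and the helper names are assumptions for illustration, and the sample data omits the padding spaces around the text values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookNames {
    // In-memory stand-in for Books.xml.
    static final String BOOKS =
          "<Books>"
        + "<Book><Author>Author1</Author><Name>Name1</Name><ISBN>ISBN1</ISBN></Book>"
        + "<Book><Author>Author2</Author><Name>Name2</Name><ISBN>ISBN2</ISBN></Book>"
        + "</Books>";

    /** Returns the text of every /Books/Book/Name element in document order. */
    static List<String> names(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Asking for NODESET yields an org.w3c.dom.NodeList with the default DOM model.
            NodeList nodes = (NodeList) xpath.evaluate("/Books/Book/Name/text()",
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names(BOOKS));
    }
}
```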

Evaluate with Context specified

XPath is also capable of evaluating an expression based on the context set by the application. The following example sets the Document node as the context for evaluating the "/Books/Book" XPath expression.

//parse the XML as a W3C Document
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = builder.parse(new File("Books.xml"));
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/Books/Book";
NodeList books = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);

With a reference to a Book element, a relative XPath expression can now be written to select the Name element.

expression = "Name";
Node bookNode = (Node) xpath.evaluate(expression, books.item(0), XPathConstants.NODE);
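The two-step evaluation (absolute expression against the document, then a relative expression against a selected node) can be sketched as a single runnable unit. The class RelativeXPath and the helper nameOfBook are hypothetical, and the document is kept in a string rather than read from Books.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPath {
    // In-memory stand-in for Books.xml.
    static final String BOOKS =
          "<Books>"
        + "<Book><Author>Author1</Author><Name>Name1</Name><ISBN>ISBN1</ISBN></Book>"
        + "<Book><Author>Author2</Author><Name>Name2</Name><ISBN>ISBN2</ISBN></Book>"
        + "</Books>";

    /** Returns the Name of the Book at the given zero-based position. */
    static String nameOfBook(int index) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(BOOKS)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Step 1: the Document node is the context for the absolute expression.
            NodeList books = (NodeList) xpath.evaluate("/Books/Book", document,
                    XPathConstants.NODESET);
            // Step 2: a Book element is the context for the relative expression.
            // The two-argument evaluate returns the string value of the result.
            return xpath.evaluate("Name", books.item(index));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOfBook(1));
    }
}
```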

NamespaceContext XPath evaluation

What happens if the XML document is namespace aware? Let's consider the above XML document and put the Book element in the "publisher1" domain.

In this case the XPath expression "/Books/Book/Name/text()" won't give any result because the expression is not fully qualified. You can specify a prefix and use an expression like "/Books/p1:Book/p1:Name". A NamespaceContext should be set on the XPath instance so that the prefix 'p1' can be resolved. This is shown below, where a NamespaceContext capable of resolving 'p1' is set on the XPath instance.

<Books>
<Book xmlns="">
<Book xmlns="">

XPath xpath = XPathFactory.newInstance().newXPath();
//write an XPath expression to retrieve the Book name in p1 domain
String expression = "/Books/p1:Book/p1:Name";
//set the namespace context which defines the mapping
xpath.setNamespaceContext(<NamespaceContext>);
//evaluate an XPath expression to retrieve the 'Name' node(s)
NodeList nameNodes = (NodeList) xpath.evaluate(expression, new
InputSource("Books.xml"), XPathConstants.NODESET);
//Print the count
System.out.println("Node count = " + nameNodes.getLength());

Note that two Book elements are in different namespace. So the expression would result in only one node.
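A NamespaceContext for a single prefix can be quite small. In the sketch below, the namespace URIs urn:example:publisher1 and urn:example:publisher2, the class NamespaceXPath, and the nested SinglePrefixContext are all assumptions made for illustration; they are not the URIs used in the original example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceXPath {
    // Hypothetical namespace URI bound to the prefix p1.
    static final String P1 = "urn:example:publisher1";

    // Two Book elements, each in a different (hypothetical) namespace.
    static final String BOOKS =
          "<Books>"
        + "<Book xmlns='urn:example:publisher1'><Name>Name1</Name></Book>"
        + "<Book xmlns='urn:example:publisher2'><Name>Name2</Name></Book>"
        + "</Books>";

    /** Resolves only the prefix p1; everything else is unbound. */
    static class SinglePrefixContext implements NamespaceContext {
        public String getNamespaceURI(String prefix) {
            return "p1".equals(prefix) ? P1 : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return P1.equals(uri) ? "p1" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return P1.equals(uri)
                    ? Collections.singletonList("p1").iterator()
                    : Collections.<String>emptyList().iterator();
        }
    }

    /** Counts the Name elements selected by the given expression. */
    static int count(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for namespace-qualified matching
            Document document = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(BOOKS)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new SinglePrefixContext());
            NodeList nodes = (NodeList) xpath.evaluate(expression, document,
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Node count = " + count("/Books/p1:Book/p1:Name"));
    }
}
```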


The XPath specification allows variables to be used in XPath expressions. XPathVariableResolver is defined to provide access to the set of user-defined XPath variables. Here is an example of an XPath expression that uses a variable.

//Write an XPath expression that selects the Book whose Name has the value given by the variable $bookName
String expression = "/Books/j:Book[j:Name=$bookName]";
//Set the XPathVariableResolver capable of resolving the variable $bookName
xpath.setXPathVariableResolver(new SimpleXPathVariableResolver());
//Finally evaluate the XPath expression
Node node = (Node) xpath.evaluate(expression, new InputSource("Books.xml"), XPathConstants.NODE);
System.out.println("Node name is " + node.getNodeName());

A SimpleXPathVariableResolver can implement the resolveVariable function as follows. For clarity only relevant code is shown here.

public Object resolveVariable(javax.xml.namespace.QName qName) {
        return "Name1";
}
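A slightly fuller sketch of variable resolution follows. Unlike the fragment above, the resolver here checks the variable name before answering. The class VariableXPath, the helper isbnFor, and the un-prefixed in-memory document are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.xml.sax.InputSource;

class VariableXPath {
    // In-memory stand-in for Books.xml, without namespaces.
    static final String BOOKS =
          "<Books>"
        + "<Book><Author>Author1</Author><Name>Name1</Name><ISBN>ISBN1</ISBN></Book>"
        + "<Book><Author>Author2</Author><Name>Name2</Name><ISBN>ISBN2</ISBN></Book>"
        + "</Books>";

    /** Returns the ISBN of the book whose Name equals the supplied value. */
    static String isbnFor(final String bookName) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setXPathVariableResolver(new XPathVariableResolver() {
                public Object resolveVariable(QName variable) {
                    // Only $bookName is known; any other variable is left unresolved.
                    return "bookName".equals(variable.getLocalPart()) ? bookName : null;
                }
            });
            // The two-argument evaluate returns the string value of the first match.
            return xpath.evaluate("/Books/Book[Name=$bookName]/ISBN",
                    new InputSource(new StringReader(BOOKS)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isbnFor("Name2"));
    }
}
```

Passing the value through a variable avoids building the expression by string concatenation.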

XML Schema Datatypes

JAXP 1.3 introduces new datatypes (the javax.xml.datatype package) in the Java platform which map directly to some of the XML Schema datatypes, thus bringing XML Schema datatype support directly into the Java platform. DatatypeFactory has functions to create different kinds of datatypes, for example xs:date, xs:dateTime, and xs:duration. javax.xml.datatype.XMLGregorianCalendar takes care of many W3C XML Schema 1.0 date/time datatypes, specifically dateTime, time, date, gYearMonth, gMonthDay, gYear, gMonth, and gDay, defined in the W3C XML Schema namespace. These datatypes are normatively defined in W3C XML Schema 1.0 Part 2, Section 3.2.7-14.

javax.xml.datatype.Duration is an immutable representation of a time span as defined in the W3C XML Schema 1.0 specification. A Duration object represents a period of Gregorian time, which consists of six fields (years, months, days, hours, minutes, and seconds) plus a sign (+/-) field.

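The following tables show the mapping of XML Schema data types to Java data types.

W3C XML Schema Data Type    Java Data Type
xs:date                     XMLGregorianCalendar
xs:dateTime                 XMLGregorianCalendar
xs:duration                 Duration
xs:gDay                     XMLGregorianCalendar
xs:gMonth                   XMLGregorianCalendar
xs:gMonthDay                XMLGregorianCalendar
xs:gYear                    XMLGregorianCalendar
xs:gYearMonth               XMLGregorianCalendar
xs:time                     XMLGregorianCalendar

XQuery 1.0 and XPath 2.0 Data Model    Java Data Type
xdt:dayTimeDuration                    Duration
xdt:yearMonthDuration                  Duration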

These datatypes have a rich set of functions for basic operations, for example addition, subtraction, and multiplication.

There are also ways to get the lexical representation of a particular datatype, which is defined in XML Schema 1.0 Part 2, Section 3.2.[7-14].1, "Lexical Representation". There is no need to understand the complexities of XML Schema datatypes, such as which operations can be performed on a datatype or how to write its lexical representation; the javax.xml.datatype APIs do it for you.
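As a small illustration of these operations, the sketch below parses an xs:date and an xs:duration from their lexical forms, adds them, and writes the result back out as a lexical representation. The class DatatypeExample and the helper addDuration are names assumed for this example.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;

class DatatypeExample {

    /** Adds an xs:duration to an xs:date or xs:dateTime and returns the lexical result. */
    static String addDuration(String lexicalDate, String lexicalDuration) {
        try {
            DatatypeFactory factory = DatatypeFactory.newInstance();
            XMLGregorianCalendar calendar = factory.newXMLGregorianCalendar(lexicalDate);
            Duration duration = factory.newDuration(lexicalDuration);
            calendar.add(duration);        // XMLGregorianCalendar is mutable
            return calendar.toXMLFormat(); // back to the XML Schema lexical form
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One month after 15 January 2005.
        System.out.println(addDuration("2005-01-15", "P1M")); // prints 2005-02-15
    }
}
```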

XInclude Support

JAXP 1.3 also defines support for XInclude. SAXParserFactory or DocumentBuilderFactory must be configured to be XInclude aware, which is done by calling setXIncludeAware(true).
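A minimal sketch of XInclude processing with DocumentBuilderFactory is shown below. It writes two small files to a temporary directory so that the relative href can be resolved; the file names, the class XIncludeExample, and the helper countChapters are hypothetical. Namespace awareness is also enabled, on the assumption that XInclude processing needs it to recognize the xi:include element.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeExample {

    /** Builds a document that includes chapter.xml and counts the chapter elements found. */
    static int countChapters() {
        try {
            Path dir = Files.createTempDirectory("xinclude");
            Files.write(dir.resolve("chapter.xml"),
                    "<chapter>Included text</chapter>".getBytes(StandardCharsets.UTF_8));
            Path book = dir.resolve("book.xml");
            Files.write(book,
                    ("<book xmlns:xi='http://www.w3.org/2001/XInclude'>"
                   + "<xi:include href='chapter.xml'/>"
                   + "</book>").getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setXIncludeAware(true);
            // Parsing from a File gives the parser a base URI for resolving the href.
            Document document = dbf.newDocumentBuilder().parse(book.toFile());
            return document.getElementsByTagName("chapter").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("chapters: " + countChapters());
    }
}
```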

Security Enhancements

JAXP 1.3 defines a security feature, "http://javax.xml.XMLConstants/feature/secure-processing", which when set to 'true' instructs the implementation to process XML securely and avoid conditions such as denial of service attacks. Examples include restricting the number of entities that can be expanded, the number of attributes an element can have, and XML Schema constructs that would consume large amounts of resources, for example large values for minOccurs and maxOccurs. If XML processing is limited for security reasons, it is reported via a call to the registered ErrorHandler.fatalError().
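The effect of the feature can be observed with a document that expands entities excessively. The sketch below is illustrative only: the class SecureProcessingExample, the helpers bomb and isRejected, and the choice of five nesting levels (roughly 111,000 entity expansions) are assumptions, and the exact limits applied are implementation specific.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SecureProcessingExample {

    /** Builds a document whose entity e(levels) expands to 10^levels copies of "x". */
    static String bomb(int levels) {
        StringBuilder dtd = new StringBuilder("<!DOCTYPE r [<!ENTITY e0 'x'>");
        for (int i = 1; i <= levels; i++) {
            dtd.append("<!ENTITY e").append(i).append(" '");
            for (int j = 0; j < 10; j++) {
                dtd.append("&e").append(i - 1).append(';');
            }
            dtd.append("'>");
        }
        return dtd + "]><r>&e" + levels + ";</r>";
    }

    /** Returns true if a parser in secure-processing mode refuses the document. */
    static boolean isRejected(String xml) {
        SAXParser parser;
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            parser = spf.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXException e) {
            // A processing limit (or any other fatal error) surfaces as a SAXException.
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("small document rejected: " + isRejected("<r/>"));
        System.out.println("entity bomb rejected: " + isRejected(bomb(5)));
    }
}
```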


This article has introduced you to some of the new features in JAXP 1.3. You have seen the benefits of the Schema Validation Framework and how it can be used to improve the performance of schema validation. Applications using JAXP 1.2 schema properties to validate XML documents against a schema are encouraged to upgrade to JAXP 1.3 and use this framework. Remember to reuse the parser instance by calling reset() to improve performance. The new object-model neutral XPath APIs bring XPath support and can work with different object models. XML Schema datatype support is brought directly into the Java platform with the introduction of new datatypes. The security features introduced in JAXP 1.3 can help protect the application against denial of service attacks. JAXP 1.3 also brings complete support for the latest standards: XML 1.1, DOM L3, XInclude, and SAX 2.0.2. There are enough reasons to upgrade to JAXP 1.3, and the implementation is available for download.
