Q. What is the Java API for XML Processing (JAXP)?
The Java API for XML Processing, or JAXP for short, enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation. JAXP also provides a pluggability feature which enables applications to easily switch between particular XML processor implementations.
To achieve the goal of XML processor independence, an application should limit itself to the JAXP API and avoid implementation-dependent APIs and behavior. This may or may not be easy depending on the application. See the question How can I write my JAXP application to be implementation-independent? below for more information. JAXP includes industry standard APIs such as the Document Object Model (DOM) and the Simple API for XML (SAX). See these slides (PDF) from ApacheCon 2001 for more information.
The reason for the existence of JAXP is to facilitate the use of XML on the Java platform. For example, APIs such as DOM Level 2 do not provide a method to bootstrap a DOM Document object from an XML input document, whereas JAXP does. Other parts of JAXP such as the javax.xml.transform portion do not have any other equivalent XSLT processor-independent APIs.
Q. What is the difference between the specification version and the implementation version?
A JAXP implementation has both a JAXP specification version number and an implementation name and version number. Specification versions are limited to the form N.N, where N is a number. Specifications are developed according to the Java Community Process (JCP). The most recent Java Specification Request (JSR) related to the JAXP API is JSR 206, which defines the JAXP 1.3 Maintenance Release.
Implementations attempt to use a particular specification version. However, a particular implementation may have bugs in it so that it deviates from the specification. Implementations may use an independent and arbitrary naming and versioning scheme from the JAXP specification version.
There is one particular implementation called the JAXP reference implementation (RI) which can cause confusion. The JAXP RI has a similar name to the specification and has similar version numbers. Unlike the specification version number, the JAXP RI may use a version number containing more than two numbers. For example, JAXP RI version 1.1.1 implements the JAXP 1.1 specification and it contains fixes for bugs found in the previous JAXP RI 1.1. Note the difference between specification and implementation versions here. Also, the first RI version has the same number as the specification version, namely 1.1.
As of October 2005, the current specification version of JAXP is 1.3, which is available both as a standalone API and as part of the Java 2 Platform, Standard Edition (J2SE) 5.0. The current RI version is JAXP RI 1.3.2. In addition, JAXP 1.4 is a part of Java SE 6.
Q. Where can I download an implementation?
The tables below list implementations that claim to support at least some portions of JAXP. Please note that not all claims have been verified and that the information may not be current.
JAXP can be divided into two main parts: a parsing API and a transform API. Implementations that support the transform API are typically XSLT processors, which require an XML parser to read input documents. Because of this, these implementations typically bundle an XML parser as part of their distribution.
The following implementations support the transform component of JAXP and also bundle a parser (in alphabetical order):
Name                          | Parser Implementation | XSLT Processor Implementation        | Comment
------------------------------|-----------------------|--------------------------------------|---------------------------------------
Apache Xalan-J                | Xerces 2.7            | Xalan XSLT                           | None
JAXP Reference Implementation | Xerces 2.7            | XSLT                                 | See JAXP RI questions below.
J2SE 1.4                      | Crimson               | Xalan-J XSLT, cvs tag: xalan_2_2_d10 | Uses JAXP RI version later than 1.1.2
J2SE 5.0                      | Xerces 2.7            | XSLT                                 | Uses JAXP RI version 1.3
J2SE 6.0                      | Xerces 2.7            | Xalan 2.6                            | Uses JAXP RI version 1.4
The parsing component often is distributed separately. The following are implementations that support just the parsing component of JAXP (in alphabetical order):
Name                        | Comment
----------------------------|------------------------------------------------
AElfred2 portion of GNUJAXP | Non-Apache style license, see link for details
Apache Crimson              | None
Apache Xerces-J 1.x         | Supports XML Schema. Obsoleted by Xerces2-J.
Apache Xerces2-J            | Supports XML Schema. Supersedes Xerces-J 1.x.
Q. Why does Apache have multiple XML parsers?
As of March 2006, Apache has several Java parsers: Crimson, Xerces 1, and Xerces 2. The reason is historical: Apache accepted two donations from two different companies. IBM donated XML4J, which became Apache Xerces 1. Sun donated Project X, which became Apache Crimson. Xerces 2 is a third parser, written as a rewrite. It has goals such as maintainability, modularity, and the implementation of certain features that neither of the two original parsers achieved. Xerces 2 was designed to fill the long-term needs of Apache projects going forward. The current version of Xerces is 2.8.
Q. Where can I get the JAXP specification?
Q. How do I start developing an application which uses JAXP?
If your application needs to perform XSLT transformations programmatically, then you need an implementation that supports the transform parts of the JAXP API. One resource that provides sample code for this type of application is the Xalan-J documentation.
If not, then your application is a purely parsing application and you need to decide between using the DOM or SAX APIs. In the interest of having less API to learn, it is recommended to limit usage to the standard DOM and SAX APIs as much as possible and use the auxiliary JAXP methods for the functionality that is not available or perhaps difficult to use via the DOM or SAX APIs. For example, currently DOM does not specify a method to bootstrap or load an XML document and return a DOM Document object. This is available via JAXP.
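For the XSLT case mentioned above, the following is a minimal sketch that uses only the javax.xml.transform portion of JAXP. The class name TransformSketch, the inline stylesheet, and the sample input are hypothetical and exist only for illustration; a real application would more likely load the stylesheet and the input from files or URLs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, not part of JAXP.
class TransformSketch {
    // Hypothetical stylesheet: outputs the text content of the root <greeting> element.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='greeting'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws TransformerException {
        // The factory lookup selects whichever XSLT processor is plugged in.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<greeting>Hello</greeting>"));
    }
}
```

Because the code refers only to javax.xml.transform classes, it should not need to change if a different JAXP compatible XSLT processor is plugged in.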
Q. Can an application use the JAXP implementation-specific packages/classes (such as Xalan) directly?
An application should not depend on JAXP implementation packages/classes (for example, Xalan) as they are subject to change. JAXP is an abstract layer over an XSLT processor and an XML parser, both of which are subject to change. For example, in J2SE 5.0, Xalan has been replaced by XSLTC (an XSL style sheet compiler).
Q. Where can I ask questions about JAXP?
One place to ask questions about JAXP is with the provider of your implementation. For example, if you are using the Apache Xerces parser, use the xerces-j-user mailing list. For Apache implementations, see the XML mailing lists page for subscription information. One location where you can find Apache mailing list archives is at MARC.
Sun also hosts a web-based Java and XML Forum where you can communicate with other JAXP users.
Q. Why do I get a warning about Namespace processing default values?
JAXP has a namespaceAware property that is directly tied to the SAX 2.0 "http://xml.org/sax/features/namespaces" feature, which controls whether the parser performs namespace processing. However, the JAXP default value of this property is different from the native SAX 2.0 default. When an application creates a parser using JAXP, the default value is false, but when using SAX 2.0 directly through the static method org.xml.sax.helpers.XMLReaderFactory.createXMLReader(), the default is true.
The following code samples illustrate the typical use case of creating a parser with namespace processing turned on using JAXP (exception handling has been omitted). First, a DOM example:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// The JAXP default is false, so namespace processing must be enabled explicitly.
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource("http://some-uri.com/foo.xml"));
This next example instantiates a SAX 2.0 XMLReader using JAXP. Note that the code does not use the SAX 2.0 createXMLReader() static method to instantiate the XMLReader.
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
XMLReader xmlReader = spf.newSAXParser().getXMLReader();
You may ask, "Why are the default values different?" The reason is historical. JAXP 1.0 first defined the default value to be false. Then SAX 2.0 came along and defined a "namespaces" feature with a default of true. Finally, JAXP 1.1 came along and had to maintain backward compatibility with JAXP 1.0, yet support SAX 2.0.
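To see the JAXP defaults on a given runtime, a small check along the following lines may help. This is only a sketch: the class name NamespaceDefaults and its helper methods are hypothetical, and the values it reports come from whichever JAXP implementation is in use.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical example class, not part of JAXP.
class NamespaceDefaults {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Reports the namespaceAware setting of a freshly created DOM factory.
    static boolean domDefault() {
        return DocumentBuilderFactory.newInstance().isNamespaceAware();
    }

    // Reports the SAX "namespaces" feature of an XMLReader obtained through JAXP,
    // optionally after turning namespace processing on at the factory.
    static boolean saxFeature(boolean namespaceAware) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        if (namespaceAware) {
            spf.setNamespaceAware(true);
        }
        XMLReader reader = spf.newSAXParser().getXMLReader();
        return reader.getFeature(NAMESPACES);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM factory default:        " + domDefault());
        System.out.println("SAX feature, JAXP default:  " + saxFeature(false));
        System.out.println("SAX feature, after enabling: " + saxFeature(true));
    }
}
```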
Q. How can I write my JAXP application to be implementation-independent?
JAXP enables your application to be implementation-independent, but it does not enforce this constraint. For example, if your instance XML documents use character encodings that are not required to be supported by the XML Recommendation, then this may cause compatibility problems. You may have tested your application with a JAXP parser that implements an optional character encoding, but not all JAXP parsers may support that character encoding. Therefore, to ensure portability, your application should limit itself to the following required encoding names in instance documents that contain an encoding declaration: "UTF-8" or "UTF-16". In particular, Java encoding names such as "UTF8" in an encoding declaration may not work with all JAXP compatible processors.
Q. How do I output/marshal/serialize a DOM tree into a stream?
Note there are several implementation-dependent ways of doing this, such as using the org.apache.xml.serialize package in Xerces, but this ties your application to a particular parser and is thus non-portable.
In the future, DOM Level 3 should also provide this feature and it will likely be incorporated into a future version of JAXP.
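One approach that stays within the JAXP API is an identity transform: a Transformer created without a stylesheet copies its source to its result. The sketch below assumes the transform portion of JAXP is available; the class name DomSerializer and its helper methods are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of JAXP.
class DomSerializer {
    // Builds a DOM tree from a string so the example is self-contained.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes a DOM tree using a stylesheet-less (identity) Transformer.
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(parse("<a><b>text</b></a>")));
    }
}
```

To write to a file or socket instead of a string, pass an OutputStream to StreamResult.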
Q. When I turn on validation, why do I fail to get any errors?
This is probably because you have not set an ErrorHandler. To get validation errors, three things must be true:
- The source document must be associated with a schema. For example, the source document contains a DOCTYPE declaration.
- Validation must be turned on.
- The application must set a SAX ErrorHandler.
Often, applications fail to perform the last item.
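The three conditions above can be sketched as follows for DTD validation with a DOM parser. The class name ValidationSketch and the sample documents are hypothetical; the handler here simply collects messages, whereas a real application might log them or throw.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical example class, not part of JAXP.
class ValidationSketch {
    // Condition 1: the document carries its own schema (a DTD in the internal subset).
    static final String DTD = "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);                 // Condition 2: validation is turned on.
        DocumentBuilder db = dbf.newDocumentBuilder();

        final List<String> problems = new ArrayList<>();
        db.setErrorHandler(new ErrorHandler() {  // Condition 3: an ErrorHandler is set.
            @Override public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override public void error(SAXParseException e) {
                // Validity errors are reported here; they do not stop the parse.
                problems.add("error: " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;                         // Well-formedness errors abort the parse.
            }
        });
        db.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DTD + "<note><to>someone</to></note>"));
        System.out.println(validate(DTD + "<note/>"));
    }
}
```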
Q. How do I validate my instance document to a particular schema?
If you are using DTDs, the only standard way of controlling the DTD that is used to validate a document is to insert or replace the document type declaration within the XML document itself. A simpler method that requires a minimum of code is to use an EntityResolver. In general, you can use an EntityResolver to override any external entity in your XML document. However, an EntityResolver cannot override the internal subset of a DTD, only the external subset.
If you are using another schema language such as W3C XML Schema, you can use the JAXP API (versions 1.2 onwards) to programmatically set the schema used to validate your instance document. See How do I use W3C XML Schema with JAXP? for more information on this topic.
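For the DTD case, an EntityResolver that substitutes the external subset might look like the sketch below. The class name DtdOverride, the inline DTD text, and the example.invalid system identifier are hypothetical; a real resolver would usually inspect the public or system identifier and return a local copy of the matching DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical example class, not part of JAXP.
class DtdOverride {
    // Hypothetical replacement DTD, supplied by the application instead of being fetched.
    static final String LOCAL_DTD = "<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>";

    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);
        DocumentBuilder db = dbf.newDocumentBuilder();

        // Answer every external entity request with the local DTD text.
        db.setEntityResolver((publicId, systemId) -> {
            InputSource replacement = new InputSource(new StringReader(LOCAL_DTD));
            replacement.setSystemId(systemId);
            return replacement;
        });

        final List<String> problems = new ArrayList<>();
        db.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { problems.add(e.getMessage()); }
            @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        db.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String doctype = "<!DOCTYPE note SYSTEM 'http://example.invalid/note.dtd'>";
        System.out.println(validate(doctype + "<note><to>someone</to></note>"));
        System.out.println(validate(doctype + "<note/>"));
    }
}
```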
Q. How do I use a different JAXP compatible implementation?
From JAXP 1.1 onwards, applications can plug in different JAXP compatible implementations of parsers or XSLT processors. For example, when an application wants to create a new JAXP DocumentBuilderFactory instance, it calls the static method DocumentBuilderFactory.newInstance(). This causes a search for the name of a concrete subclass of DocumentBuilderFactory using the following order:
- The value of a system property like javax.xml.parsers.DocumentBuilderFactory if it exists and is accessible.
- The contents of the file $JAVA_HOME/jre/lib/jaxp.properties if it exists.
- The Jar Service Provider discovery mechanism specified in the Jar File Specification. A jar file can have a resource (i.e. an embedded file) such as META-INF/services/javax.xml.parsers.DocumentBuilderFactory containing the name of the concrete class to instantiate.
- The fallback platform default implementation.
Of the above ways to specify an implementation, perhaps the most useful is the jar service provider mechanism. To use this mechanism, place the implementation jar file on your classpath. For example, to use Xerces 1.4.4 instead of the version of Crimson which is bundled with JDK 1.4 (Java Development Kit version 1.4), place xerces.jar in your classpath. This mechanism also works with older versions of the JDK which do not bundle JAXP. If you are using JDK 1.4 and above, see Why are there Apache classes in the J2SE 1.4 RI? for potential problems.
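To check which implementation the lookup actually selected, an application can print the class of the factory it was given. This is a minimal sketch; the class name FactoryLookup is hypothetical, and the class name it prints depends entirely on the runtime and classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical example class, not part of JAXP.
class FactoryLookup {
    static final String PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    // Returns the concrete factory class chosen by the lookup order described above.
    static String currentImplementation() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Prints null unless the system property was set, for example with -D on the command line.
        System.out.println("System property: " + System.getProperty(PROPERTY));
        System.out.println("Selected factory: " + currentImplementation());
    }
}
```

Running this before and after adding an implementation jar to the classpath is a quick way to confirm whether the jar was picked up.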
Q. Why are there Apache classes in the J2SE 1.4 RI?
The J2SE 1.4 RI was the first version of the JDK that bundled in an implementation of JAXP 1.1. This allows developers to write applications without having to provide a parser and XSLT processor with their application. However, in some cases, it may create additional problems.
The Sun J2SE 1.4 RI uses Apache software for its implementation of JAXP 1.1 with package names unchanged from Apache software distributions. This can cause problems, for example, if your application wants to use a newer version of Apache software. Under the Java 2 class loader delegation model, the java launcher's ClassLoader will load the bundled version of a class before any other version. Thus, if you place a newer version of xalan.jar on your CLASSPATH, then that version will be ignored since the runtime will use the older bundled version instead. As a workaround, see the question on overriding the implementation in J2SE SDK 1.4.
This problem has been overcome in J2SE 5.0 by renaming these packages so that the bundled classes no longer share package names with Apache software distributions.
Q. How do I override the JAXP implementation in J2SE SDK 1.4?
From J2SE 1.4 onwards, there is an Endorsed Standards Override Mechanism which can be used to override the classes in the SDK itself. One way to replace the classes in the SDK with the classes contained in a set of jar files is to place the jar files in some directory, for example "endorsed", and define a system property that points to it. For example, to use a newer version of Xalan, place the newer version of xalan.jar in the "endorsed" directory and invoke the Java launcher with the -Djava.endorsed.dirs=endorsed option.
Q. How do I use W3C XML Schema with JAXP?
Use the JAXP 1.2 API to validate instance documents with W3C XML Schema. The JAXP 1.2.0 RI contains two sample programs (DOMEcho and SAXLocalNameCount) that illustrate how to do this. See the JAXP 1.2 specification for more details.
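As a rough illustration, the JAXP 1.2 schemaLanguage and schemaSource properties can be set as factory attributes on a validating, namespace-aware DocumentBuilderFactory. The sketch below assumes an implementation that supports these properties; the class name SchemaValidationSketch, the inline schema, and the sample documents are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical example class, not part of JAXP.
class SchemaValidationSketch {
    // Property names and value defined by the JAXP 1.2 specification.
    static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: a single <count> element holding an integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);   // XML Schema validation needs namespace processing.
        dbf.setValidating(true);
        dbf.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        dbf.setAttribute(JAXP_SCHEMA_SOURCE,
                new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
        DocumentBuilder db = dbf.newDocumentBuilder();

        final List<String> problems = new ArrayList<>();
        db.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { problems.add(e.getMessage()); }
            @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        db.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<count>42</count>"));
        System.out.println(validate("<count>abc</count>"));
    }
}
```

An implementation that does not recognize these attributes is expected to throw IllegalArgumentException from setAttribute, so portable code may want to catch that case.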
JAXP Implementation Questions
Q. Where do I get the latest implementation of JAXP?
Newer implementations of JAXP are being released through the following Sun software releases:
- Java Web Services Developer Pack -- usually includes the latest unbundled JAXP RI version
- Java 2 Platform, Standard Edition (J2SE) version 1.4.0 or later -- contains a recent bundled JAXP implementation, but may not be the latest.
- Java 2 Platform, Enterprise Edition (J2EE) Reference Implementation version 1.3 or later -- contains a recent JAXP implementation, but may not be the latest.
Q. Where can I find JAXP docs online?
Q. Where do I get the source code to JAXP?
JAXP is based on open sourced code. You can obtain the source code from the JAXP SVN repositories.
Q. Why all the JAR files?
In JAXP 1.2.0, the jar packaging scheme was changed so that there were a total of six JAR files. The reasons were to support the J2SE Endorsed Standards Override Mechanism and to be compatible with the Apache packaging scheme. This decision was a compromise because six JAR files made life difficult for users.
However, in JAXP 1.4, these six JAR files have been consolidated into just two JAR files:
Q. (Obsolete) What happened to jaxp.jar?
To summarize, starting with JAXP 1.1.3, there is no jaxp.jar. This fact should have been emphasized in the JAXP 1.1.3 documentation. The motivation for this decision was to match the Apache packaging scheme at the time of release, as well as to simplify life for developers.
In JAXP 1.1.3, which includes the Apache Crimson 1.1.3 parser, both API and implementation classes are contained in the same jar files: crimson.jar for the parser and xalan.jar for the XSLT processor. The JAXP RI packaging scheme matches the Apache packaging scheme at the time of release; however, the Apache scheme has since changed. See the Apache site for details.