
SAXON: Change History

This file describes changes for version 6. For changes prior to version 6.0, see changes5.html. For changes prior to version 5.0, see history.html.

Changes in version 6.0.2 (2000-12-08)

Defects cleared

The following errors were reported for version 6.0.1, and have been cleared except where otherwise noted:

6.0.1/001 When a template is called recursively to obtain a default value for one of its own parameters (i.e. within <xsl:param>), the wrong result may be returned. This is because tail recursion is invoked when it should not be. (Bug also present in 5.5 and earlier releases).
6.0.1/002 An array bound exception will occur when processing a document with a stylesheet that uses more than 100 namespace URIs or namespace prefixes. (Present since 6.0)
6.0.1/003 When a key is defined with match="@*", nothing will be retrieved. The problem also applies to some other patterns that can match attributes, for example match=" name | @name ". (Possibly present in 5.5 and earlier releases - unconfirmed)
6.0.1/004 The extension functions saxon:set-user-data() and get-user-data() do not work correctly with the TinyTree model. They may also fail with the standard tree model if the context node is an attribute or namespace. This is because the code relies on a one-to-one mapping of XPath nodes to Java objects. (Present since 6.0)
6.0.1/005 Not a bug.
6.0.1/006 When attribute value templates are used in the attributes of xsl:sort, for example ascending="{$asc}", then the values used are those that apply the first time the sort occurs; if subsequent sorts have different values for the parameters, these are ignored. This is true even if the subsequent sort takes place in a later transformation using the same PreparedStyleSheet. (Also applies to 5.5 and earlier releases).
6.0.1/007 saxon:output and other Saxon extension elements do not allow the xsl:extension-element-prefixes attribute to appear on the extension element itself. (Present since 6.0)
6.0.1/008 An attempt to access the last processing instruction in the source document using xsl:value-of, xsl:copy, etc, will fail if the data part of the processing instruction is zero length. The failure occurs with the Microsoft JVM but not with JDK 1.3. (Present since 6.0)
6.0.1/009 Running a transformation using the Transformer.getInputContentHandler() method fails saying that the same NamePool must be used for the StyleSheet and the source document. (Present since 6.0)
6.0.1/010 The code that searches for an xml-stylesheet processing instruction displays unintended trace information on System.err.
6.0.1/011 When xsl:apply-imports is called and there is no explicit imported template rule to invoke, Saxon does a no-op; the correct action is to invoke the built-in template rule for the current node. (Bug present in all previous releases).
6.0.1/012 If the value attribute to xsl:number is not an integer, Saxon truncates it towards zero rather than rounding it as specified. (Bug present in all previous releases).
6.0.1/013 With the TinyTree model, selecting a namespace node using //e/namespace::n doesn't work. Selecting all namespace nodes using namespace::* is OK. (Present since 6.0)
6.0.1/014 An array bound check failure may occur in routine com.icl.saxon.tinytree.TinyElementImpl.makeAttributeNodeFS() when searching for the last attribute node in the document. (Present since 6.0)
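
The difference behind defect 6.0.1/012 is easy to see with plain arithmetic. The sketch below is ordinary Java, not Saxon's code, and the class name is made up for illustration. It assumes that "rounding as specified" means the behaviour of the XPath round() function, where ties go towards positive infinity, which is also what Math.round() does.

```java
// Illustration only: plain Java arithmetic, not Saxon's implementation.
final class NumberValueRounding {

    // Truncation towards zero, the behaviour reported in 6.0.1/012.
    static long truncate(double value) {
        return (long) value;
    }

    // Rounding to the nearest integer; ties go towards positive infinity.
    static long round(double value) {
        return Math.round(value);
    }

    public static void main(String[] args) {
        System.out.println(truncate(2.7)); // prints 2
        System.out.println(round(2.7));    // prints 3
    }
}
```

So a value of 2.7 should be formatted as 3, where truncation would have produced 2.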

Integration with FOP has been restored. Saxon now works with FOP version 0_15_0.

NamePools: I have changed the approach, so that instead of making a copy of the stylesheet name pool for each transformation, the name pool is now shared (which means its updating methods are now synchronized, to ensure thread-safety). This shouldn't affect most users, unless you are manipulating NamePools explicitly. It is still possible to have multiple name pools, but you now need to organise any copying yourself if this is what you want to do. For 99% of users, it should be possible to ignore NamePools entirely and just leave the system to use the single default name pool all the time.
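
To make the idea of a shared pool with synchronized updating methods concrete, here is a minimal sketch. It is not the real NamePool class: the class name, the allocate() method and the key format are all hypothetical, chosen only to show how several threads can safely allocate codes from one pool.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a shared pool; not Saxon's NamePool API.
final class SharedNamePoolSketch {

    // Maps an expanded name "{uri}local" to its integer code.
    private final Map<String, Integer> codes = new HashMap<>();

    // The updating method is synchronized, so concurrent transformations
    // that share the pool cannot allocate two codes for the same name.
    synchronized int allocate(String uri, String localName) {
        String key = "{" + uri + "}" + localName;
        Integer existing = codes.get(key);
        if (existing != null) {
            return existing;
        }
        int code = codes.size();
        codes.put(key, code);
        return code;
    }
}
```

With a single shared pool, the same name always yields the same code regardless of which transformation asked for it first, so no copying between pools is needed.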

The following changes are for conformance with the (imminent) XSLT 1.0 errata:

Changes in version 6.0.1 (2000-11-28)

Defects cleared

The following errors were reported for version 6.0, and have been cleared except where otherwise noted:

6.0/001 When xsl:copy-of is used to copy attributes with no namespace prefix, and the owning element has a default namespace declaration (xmlns="xyz"), then an invalid prefix is generated for the attributes.
6.0/002 The PreparedStyleSheet object is not serially reusable. A new NamePool needs to be allocated each time it is used.
6.0/003 A performance bug: in the match pattern row[id=1234] the predicate is not recognized as a boolean predicate, therefore the pattern matching code determines the position of the row relative to its siblings on the assumption that it needs this information. If there are a large number of <row> siblings this gives a severe performance hit.
6.0/004 The function-available() function returns false for a method that exists but that requires one or more arguments.
6.0/005 The element-available() function crashes (with a diagnostic print of the name pool contents) if the supplied name is one that is not used in the stylesheet and is not a known XSL or Saxon instruction.
6.0/006 With the TinyTree tree model, finding the descendants of a node that has neither descendants nor following-siblings produces incorrect results.
6.0/007 DTDGenerator won't compile: no name pool is supplied to RuleManager.
6.0/008 In the SQL sample application, the last row is not written to the database. (This reported bug has not yet been investigated)

Other changes

Warning messages (issued typically when a node matches more than one template rule) are now limited in number: only the first 25 are displayed.

Changes in version 6.0 (2000-11-17)

In Saxon 5.5, I introduced a change that allows a result-tree-fragment to be implicitly converted to a node-set. I did this in anticipation of changes in XSLT 1.1, and to allow interoperability with MSXML3. However, Microsoft have now withdrawn this facility and conform fully to the XSLT 1.0 rules, so in order to protect Saxon's reputation for 100% conformance, I have decided to withdraw the facility too. It can still be used, however, if the stylesheet specifies version="1.1". For more details, see Conformance.

Defects in version 5.5.1

The following errors are cleared in version 6.0:

5.5.1/001 When xsl:copy-of is used to make a copy of an element node that has no attributes or namespace declarations of its own, the namespace nodes inherited from its ancestor elements are not copied to the result tree. (Present since 5.5)
5.5.1/002 In some Java environments (ServletExec) the current method for dynamic loading of classes fails. The fix to this detects this failure and reverts to the simple pre-JDK 1.2 method.
5.5.1/003 When <xsl:namespace-alias> is used, Saxon uses the new (result-prefix) prefix and the new URI in the output. A careful reading of the spec suggests that it should use the old (stylesheet-prefix) prefix with the new URI. (The term "result-prefix" is thus a misnomer).
5.5.1/004 An ArrayIndexOutOfBounds exception occurs if the match pattern "@comment()" (or "@text()" or "@processing-instruction()") is used in an xsl:template rule. Such a pattern is meaningless (it will never match any nodes) but entirely legal.
5.5.1/005 Saxon does not report an error if two sibling <xsl:with-param> elements specify the same parameter name.
5.5.1/006 Where conflicting <xsl:strip-space> and <xsl:preserve-space> elements occur in the stylesheet, Saxon gives greater weight to the priority of the pattern than to its import precedence. So <xsl:strip-space elements="ns:item"> in an imported stylesheet will incorrectly override <xsl:preserve-space elements="ns:*"> in the importing stylesheet.
5.5.1/007 A null pointer exception can occur in the AElfred parser when attempting to access an XML file using a URL, if the resource accessed by the URL is found but its encoding is unknown.
5.5.1/008 A null pointer exception can occur when evaluating a variable reference within the arguments to an extension function that is called within the predicate of a filter expression.
5.5.1/009 When running in forwards-compatible mode, Saxon incorrectly rejects XSL elements that contain an attribute other than those defined in XSLT 1.0.
5.5.1/010 When xsl:copy is applied to an attribute, text node, comment, or processing instruction, the content of the xsl:copy element should be ignored. It isn't.
5.5.1/011 When output to a DOM Node is requested in the TrAX API, this is ignored if an output method is specified in an xsl:output element of the stylesheet. The output is sent to the standard output stream instead. The xsl:output element should be ignored.
5.5.1/012 When a top-level element such as xsl:output is used within a template, it is reported as an error. This happens even when processing in forwards-compatible mode (e.g. when version="1.1"). In this case fallback processing (xsl:fallback) should be invoked.
5.5.1/013 (Not yet fixed.) When the first argument to the document() function is a result tree fragment, Saxon takes the Base URI (for resolving the URI if it is relative) as if the argument were a string. The intention of the specification, though not clearly stated, is that the Base URI should be calculated as if the argument were a node-set. That is, if the argument is $tree and $tree is defined by <xsl:variable name="tree">doc.xml</xsl:variable>, then the Base URI should be that of the xsl:variable element, not that of the element containing the call on the document() function.

New XSLT facilities at version 6.0

Added support for two new output encodings on xsl:output: iso-8859-2 and cp1250.

Added two attributes to xsl:output (not yet available in saxon:output):

Added a new extension function saxon:showNodeSet(). It takes a single argument that is a node-set, produces a diagnostic print of the node-set on System.err, and returns an empty string.

Added an extension function saxon:getContext() to get the context object. Only really intended for diagnostic use.

Command line changes

Added an option to choose the tree implementation (see below): -ds for the standard tree, as used in previous releases, -dt for the "tinytree" which is new to this release. The tinytree is the default: it takes up less memory, is faster to build, and generally appears to perform better in most circumstances.

The -a option on the command line, which causes the source document to be processed using the stylesheet identified from its xml-stylesheet processing instruction, now uses the same logic as the getAssociatedStylesheets() method in the TrAX interface. This means multiple (cascading) stylesheets are now supported. However, embedded stylesheets (identified by href="#id" in the xml-stylesheet processing instruction) are not supported at this release.

Java API changes

There have been a great many internal changes, but relatively few that impact directly on the high-level transformation API. In particular, if you only use TrAX interfaces, there are no changes. Otherwise, the main points to note are:

Internal Changes

These details should only affect you if you access intimate internal interfaces or use the Saxon source code.

There are two big changes to the internals of Saxon at this release: a new implementation of the tree structure, and a new system for handling names.

The tinytree implementation

I have introduced an alternative tree implementation (called "tinytree"). This is designed to reduce the number of Java objects created: the tree is sliced vertically rather than horizontally, so instead of having one Java object per node, there is one Java array for each property of the nodes, with an entry in the array for each node. The effect is to greatly reduce the Java memory management overheads. The existing tree structure remains available, and is always used for the stylesheet tree. It is also currently always used for the intermediate result tree created when saxon:output next-in-chain is used.
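
The "vertical slicing" can be pictured with a small sketch. This is not the tinytree code: the class, field and method names are invented for illustration, and only three node properties are modelled. A node is simply an integer index, and each property lives in its own array.

```java
import java.util.Arrays;

// Hypothetical illustration of one-array-per-property storage;
// not Saxon's actual tinytree classes or field names.
final class ColumnarTreeSketch {
    static final byte ELEMENT = 1;
    static final byte TEXT = 3;

    // One array per node property; entry i describes node number i.
    private byte[] kind = new byte[4];
    private int[] nameCode = new int[4];
    private int[] parent = new int[4];
    private int size = 0;

    // Adds a node and returns its number. No per-node object is created.
    int addNode(byte nodeKind, int code, int parentNode) {
        if (size == kind.length) {
            int capacity = size * 2; // grow all property arrays together
            kind = Arrays.copyOf(kind, capacity);
            nameCode = Arrays.copyOf(nameCode, capacity);
            parent = Arrays.copyOf(parent, capacity);
        }
        kind[size] = nodeKind;
        nameCode[size] = code;
        parent[size] = parentNode;
        return size++;
    }

    int size() { return size; }
    byte kindOf(int node) { return kind[node]; }
    int nameCodeOf(int node) { return nameCode[node]; }
    int parentOf(int node) { return parent[node]; }
}
```

However many nodes are added, the garbage collector sees only a handful of arrays, which is where the saving in memory management overhead comes from.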

To select the standard tree structure, use -ds on the command line. To select the "tinytree" structure, use -dt. The default is -dt. You can also select the tree structure using a method on the Controller class.

The tinytree is smaller than the standard tree, as the name suggests, and it is also faster to build. However, it may be slower to navigate. So if you have a small document that is built once in memory and used repeatedly, the standard tree implementation is probably better. In other cases, however, the tinytree usually wins.

Name pools

I have made radical changes to the way names are managed. Previously, the NamePool object contained a pool of names, but its only real purpose was to avoid the memory overhead of storing each name many times. Now, Saxon takes advantage of the NamePool to avoid storing references to Name objects on the tree at all: instead it stores a "namecode": an integer which can be used to identify the name within the NamePool.

A namecode has 4 bits unused, 8 bits representing the prefix, and 20 bits acting as a pointer to an entry in the namepool containing the local name and namespace URI. Two names are therefore equal if the namecodes are the same in the bottom 20 bits. The value in these 20 bits is also referred to as the fingerprint of the name.
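
The layout can be sketched with a few bit operations. That the fingerprint occupies the bottom 20 bits comes from the description above; placing the 8 prefix bits directly above it (bits 20 to 27) is an assumption made for this illustration, and the class and method names are hypothetical rather than part of the NamePool API.

```java
// Sketch of the namecode layout; names and prefix bit position are assumptions.
final class NameCodeSketch {
    static final int FINGERPRINT_MASK = 0xFFFFF; // bottom 20 bits
    static final int PREFIX_SHIFT = 20;          // assumed position of the prefix bits
    static final int PREFIX_MASK = 0xFF;         // 8 bits

    static int makeNameCode(int prefixIndex, int fingerprint) {
        return ((prefixIndex & PREFIX_MASK) << PREFIX_SHIFT)
                | (fingerprint & FINGERPRINT_MASK);
    }

    static int fingerprint(int nameCode) {
        return nameCode & FINGERPRINT_MASK;
    }

    static int prefixIndex(int nameCode) {
        return (nameCode >>> PREFIX_SHIFT) & PREFIX_MASK;
    }

    // Two names are equal if their fingerprints match; the prefix is ignored.
    static boolean sameName(int a, int b) {
        return fingerprint(a) == fingerprint(b);
    }

    public static void main(String[] args) {
        int a = makeNameCode(1, 4711);
        int b = makeNameCode(2, 4711);
        System.out.println(sameName(a, b)); // prints true
        System.out.println(a == b);         // prints false
    }
}
```

The two namecodes differ only in their prefix bits, so they compare as the same name even though the integers themselves are different.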

All searching for objects by name is now done by comparing fingerprints; no string comparisons are involved. Fingerprints are used not only for matching names used in XPath expressions to refer to the source document, they are also used for all matching of names within a stylesheet, for example variable names, template names, mode names, key names, and decimal format names.

The name pool is also used for storing namespace declarations: each prefix/URI pair is allocated a namespace code, and all manipulation of namespace nodes in the tree is done using these integer codes.

A consequence of this is that all documents used in a transform must use the same NamePool. This has some implications on the Java API. With simple use of the API, you needn't worry about name pools, they will be taken care of automatically. However, if you are operating a continuously running service in which both source documents and stylesheets are cached in memory, you may need to exercise some care to specify the right NamePool when each document is built.

The model is further complicated by multi-threading. Rather than have synchronization problems with multiple threads updating the same NamePool, the NamePool used to build the stylesheet is copied (imported) into the NamePool used to build the source document, before parsing of the source document starts. When you use the transform() method to parse and transform an InputSource, this happens automatically. However, if you want to build the document yourself, and transform it using transformDocument() (which allows you to run more than one transformation on the same document), then you must manage the NamePool merging yourself. The system does include checks that the NamePools for the stylesheet and source document are compatible, though these are not completely foolproof.

The use of namecodes rather than String names has affected many internal interfaces, and some of these are interfaces that are also exposed externally. For example, the ParameterSet object which is used to pass parameters from a calling template to a called template can also be used to supply global parameters to the Transformer. The parameters in a parameter set are now identified by an integer fingerprint rather than a string name. You can get the integer fingerprint from the NamePool using the getFingerprint() method; alternatively use the TrAX method addParameter(), which still takes the name as a String.

The Emitter interface has also changed to use name codes; if you have written your own Emitter, the code will have to be modified.

Other changes

The classes and interfaces used in Saxon for manipulating collections of attributes now implement the SAX2 Attributes interface.

The standard XPath functions have been extensively revised. The main change, apart from tidying up the code, is that the functions are now responsible for evaluating their own arguments, which enables some optimisation, especially when the arguments are node-sets: they can now be evaluated using knowledge of the data type required. For example, the not() function now stops as soon as the first node in the argument node-set is found.
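
The not() example can be pictured as follows. The sketch assumes the argument node-set is delivered lazily through a java.util.Iterator; the classes LazyNodes and LazyNotSketch are hypothetical stand-ins, not Saxon's function classes. Because the function controls the evaluation of its own argument, it only needs to ask whether a first node exists.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy node-set: counts how many nodes were actually produced.
final class LazyNodes implements Iterator<Integer> {
    private final int limit;
    int produced = 0;

    LazyNodes(int limit) { this.limit = limit; }

    @Override public boolean hasNext() { return produced < limit; }

    @Override public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return produced++;
    }
}

final class LazyNotSketch {
    // not(node-set) is true only for an empty node-set, so checking for
    // the existence of a first node is enough; the rest is never visited.
    static boolean not(Iterator<?> nodes) {
        return !nodes.hasNext();
    }

    public static void main(String[] args) {
        LazyNodes many = new LazyNodes(1_000_000);
        System.out.println(not(many));     // prints false
        System.out.println(many.produced); // prints 0
    }
}
```

If the caller had evaluated the argument first, all one million nodes would have been built before not() could answer.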

Some of the little-used methods on the NodeInfo interface have been moved as static methods to a separate helper class, com.icl.saxon.om.Navigator. This enables the code of these methods to be independent of the particular tree implementation.

The delayed evaluation of path expressions now works as follows: on the first two occasions that a path expression is evaluated, it navigates the source tree. On the third occasion, it saves the resulting node-set in memory. On subsequent uses, the result is retrieved from memory. This approach is designed to balance time against memory usage.
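
The policy can be sketched in a few lines. This is an illustration of the counting rule only, under the assumption that "navigating the source tree" can be modelled as a Supplier that returns the node list; the class name is hypothetical and the real expression classes are not shown.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of "navigate twice, save on the third evaluation".
final class DelayedPathSketch<T> {
    private final Supplier<List<T>> navigate; // stands in for walking the tree
    private int evaluations = 0;
    private List<T> saved;                    // null until the third evaluation

    DelayedPathSketch(Supplier<List<T>> navigate) {
        this.navigate = navigate;
    }

    List<T> evaluate() {
        if (saved != null) {
            return saved;        // fourth and later uses: retrieved from memory
        }
        evaluations++;
        List<T> result = navigate.get();
        if (evaluations >= 3) {
            saved = result;      // third use: keep the node-set in memory
        }
        return result;
    }
}
```

An expression that is evaluated only once or twice never pays the memory cost of holding its result, while one that is evaluated repeatedly walks the tree at most three times.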

The optimisation of "//name" as "/descendant::name" (which is possible when there are no predicates) wasn't working in 5.5 (or for a while before that), causing an unnecessary sort. This has been corrected. In addition, the first time "//name" is used for a particular document, the results are now saved, and all subsequent uses of "//name" for the same document retrieve the results from memory. This means that the traditional assumption that "//name" is inefficient may no longer always be true.

A Sequencer class has been introduced for allocating globally-unique sequence numbers. There are two such sequences, one for document numbers, and one for node numbers. By default, two sequencers are created when Saxon is loaded, and remain in use until it is unloaded. However, it is now possible to reset the sequence numbering if required, either to prevent running out of numbers in a long-running server, or to ensure repeatability of the value of generate-id(). The result of generate-id() depends on the document number, and you can restart the sequence of document numbers by calling controller.setDocumentSequencer(new com.icl.saxon.om.Sequencer()). It is the caller's responsibility to ensure that this does not cause two documents that are in use at the same time to have the same number. The node sequence number is used when sorting nodes into document order, and when eliminating duplicates in a union operation. You can similarly allocate a new sequence using controller.setNodeSequencer().

Added an optimization for recursive processing of a node-set: the predicate "[position() > 1]" is now recognized and handled specially, allowing pipelined execution and reducing memory requirements.

Removed getAttributeValue(Name), replaced it with getAttributeValue(String uri, String localName). This is more efficient: in many cases it removes the need to construct the Name object and then take it apart. Attributes can also be found using the integer fingerprint of the name.

The Name class is no longer used for holding expanded names, it now serves merely as a container for a couple of static methods for name validation.

NameTest and its subclasses have been reorganised. There is a new class NodeTest which is a subclass of Pattern; it performs the test on node-type and node-name supporting a node-test in XPath. This test is context-free. As well as replacing the NameTest class, it also replaces NodeTypePattern and NamedNodePattern. The NodeTest is now used on a Step, and on an Axis, replacing the previous combination of a NameTest and a node type. These tests are also used in testing which nodes are candidates for whitespace stripping.

The interface between the Step and Axis classes and the expression parser has been much simplified.

Michael H. Kay
8 December 2000