XQL FAQ

Jonathan Robie, R&D Fellow, Software AG
jonathan.robie@softwareag.com
Originally posted: 26 Mar 1999
Last revised: 26 Mar 1999

Welcome to the XQL FAQ - a Frequently Asked Questions list for the XML Query Language. This FAQ and the associated XQL Mailing List are intended to point you to resources that will help you learn, use, or implement XQL.

XQL is a query language that uses XML as a data model, and it is very similar to XSL Patterns. XQL expressions are easily parsed, easy to type, and can be used in a variety of software environments - as part of a URL, in XML or HTML attributes, in programming language strings, etc.

XQL has already been implemented in web browsers, document repositories, XML middleware, Perl libraries, and command-line utilities. If you are looking for an XML query language that is simple, powerful, easily implemented, and available now, XQL is probably your best bet.

How do I learn about XQL?

The best way to learn about XQL depends on your learning style. The following table lists different ways to begin learning about XQL:

XQL Tutorial

A quick introduction to XQL based on concrete examples.

XQL Proposal

This is the proposal to the W3C QL98 Workshop, jointly authored by me, Joe Lapp of webMethods, and David Schach of Microsoft. Feature-oriented rather than design-oriented, and clearly an example of writing by committee. Slightly out of date, but there are only two things that need to be changed:

  1. $op$ syntax is now deprecated - expressions like "a $and$ b" are now stated without '$', eg "a and b"
  2. $not$ has been deprecated. Because of character constraints, not() has been introduced as a way of negating conditions, eg "not(a and b)".

The Design of XQL

This is excerpted from the XQL proposal that I had written, and provides more complete discussion of the basic design than the proposal listed above. Each example is shown with sample input and output for the query.

Formal Semantics of XSL Patterns

Phil Wadler's formal model of pattern matching in XSL, which is almost identical to pattern matching in XQL.

XQL Mailing List

An interactive forum for discussion of XQL, including implementation strategies, extensions, clarification of semantics, etc.

XQL BNF

Joe Lapp's annotated BNF for XQL - this will save you a lot of time if you want to write a parser. Slightly out of date - see my notes for the XQL Proposal, above.

If you are a hands-on learner, you will probably want to get one of the implementations listed in the next section and play around with the language.

Who has implemented XQL?

Here are implementations that I am aware of. If you want your XQL implementation listed here, please send email to me at jonathan.robie@sagus.com. I have not tested most of these implementations; the descriptions are lightly edited versions of texts found on the web sites that describe them. Implementations are listed in alphabetical order.

DataChannel

"DataChannel RIO is an XML-enabled solution designed to build a dynamic two-way corporate portal with input (i.e. publishing) and output (i.e. retrieval) capabilities that make Intranets and Extranets easier to use, integrate, manage, and support."

Fatdog Software's XQEngine

Fatdog Software's XML Query Engine (XQEngine for short) is a full-text search engine component for XML. It lets you search small to medium-size collections of XML documents for boolean combinations of keywords, much as web-based search engines let you do for HTML.

GMD

The GMD-IPSI XQL engine is a Java based storage and query application for large XML documents. The functionality may be accessed via command line invocation or the Java API. The engine consists of two main parts: (1) A persistent implementation of the W3C-DOM, and (2) A full implementation of the XQL language. The persistent DOM implements the W3C-DOM interfaces on indexed, binary XML files.

Microsoft Internet Explorer 5.0

Microsoft's Internet Explorer 5.0 browser offers "extensive support for the latest standards-based web technologies, including Dynamic HTML, CSS, CSS-P, XML, XSL, XQL, and the W3C DOM"

To see how to use XQL in IE5, see the documentation on selectNodes().

For information on the XQL patterns they support, see their documentation on XSL Pattern Syntax.

The "extensions to XSL Patterns" mentioned in the above URL are, in fact, XQL, as is obvious by the reference to the QL98 paper.

ObjectStore

ObjectStore's eXcelon "is a high-performance, highly scalable data server that caches and serves all information to enterprise applications and Web servers as XML. eXcelon can be used as an application cache for existing data sources, or as a complete data management system for new XML-based applications."

Software AG

Software AG's Tamino "is the first information server to store XML information without converting it into other data structures. Tamino delivers XML information with exceptional performance for transaction-oriented applications within enterprises or on the Web. Tamino can also integrate data from existing databases into XML structures."

webMethods

webMethods' B2B "facilitates inter-enterprise integration between ERP applications, Web sites and legacy data sources. webMethods B2B 2.1 offers guaranteed delivery and enhanced continuous operation capabilities to support high-volume, mission-critical applications."

XML::XQL

XML::XQL, by Eduard Derksen (Enno), "is a Perl extension that allows you to perform XQL queries on XML object trees. Currently only the XML::DOM module is supported, but other implementations, like XML::Grove, may soon follow." Open source, freely downloadable.

Xtract

"Xtract is a command-line tool for searching XML documents. Just as `grep' returns lines which match your regular expression, so Xtract returns all those sub-trees from XML documents which match a query pattern. The query expression language is simple but powerful, and is based loosely on XQL, the recently proposed XML Query Language. An introduction to the Xtract query pattern language, together with the full Xtract grammar is in this tutorial." Although the author says it is "loosely" based on XQL, the discrepancy is slight: "The major difference from XQL is that a query must return a sequence of XML contents (either elements or text inside an element): it cannot for instance return just an attribute value." Open source, freely downloadable.

XQL Open Source Project

There is an Open Source XML project, coordinated by Ed Howland <Ed@dega.com>. Information on this project may be found at http://ed.dega.com/pub/xml/xql/

Is XQL a moving target?

The first draft of XQL was written in February, 1998, after several months of development. In order to allow people to implement XQL without having to hit a moving target, I am freezing the version of XQL that was presented at the W3C QL 98 Workshop as XQL Level 0. However, the version discussed in the joint paper submitted with webMethods and Microsoft was already slightly out of date at the time of the workshop, and should be revised as follows:

  1. $op$ syntax has been deprecated - expressions like "a $and$ b" are now stated without '$', eg "a and b"
  2. $not$ has been deprecated. Because of character constraints, not() has been introduced as a way of negating conditions, eg "not(a and b)".
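
The first revision can be applied mechanically to existing query strings. The following Python fragment is only a sketch: the function name modernize is hypothetical, and the set of operator keywords it rewrites (and, or, union, intersect) is an assumption taken from the operator table in this FAQ, not from the proposal's grammar. It leaves quoted string literals alone, and it does not attempt the second revision, because turning $not$ into not() means restructuring the expression rather than renaming a token.

```python
import re

# Match quoted literals first so they are skipped; only a bare $op$ is rewritten.
# The operator list is an assumption based on the operator table in this FAQ.
_TOKEN = re.compile(
    r"'[^']*'"
    r'|"[^"]*"'
    r"|\$(and|or|union|intersect)\$"
)


def modernize(query: str) -> str:
    """Rewrite deprecated $op$ operators to their bare form (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        # group(1) is set only when a $op$ token matched; literals pass through.
        return match.group(1) if match.group(1) else match.group(0)

    return _TOKEN.sub(replace, query)


print(modernize("a $and$ b"))
print(modernize("author[first-name='$and$']"))
```

The second call shows that text inside a string literal is left untouched.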

In addition, the proposal contains more than I think should really be required, so I am defining a minimum implementation set, which will be drawn from the "XQL Proposal" listed above. I also plan to post test suites. If people have test suites they would like to share, please let me know about them.

The basic terms and operators of XQL are required:

Feature                       Example
Element name                  author
Wildcard as element name      *
Attribute name                @id
Wildcard as attribute name    @*
Equality                      first-name='Jonathan'
Parent/child                  author/first-name
Ancestor/descendant           invoice//product
Subscripts                    a[0]
                              a[1,3-5, -1]
Filters                       author[first-name='bob']
Intersection                  a[b] intersect a[0]
Union                         a union b
Conjunction                   a and b
Disjunction                   a or b
Grouping                      (a union b) intersect *[0]
Negation                      not(a)
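
Of the features listed above, subscripts are probably the least self-explanatory. The Python sketch below shows one plausible reading of a subscript list such as "1,3-5, -1". It rests on assumptions that should be checked against the XQL Proposal: indexes count from 0 (as a[0] suggests), a range such as 3-5 includes both ends, a negative index counts back from the last item, out-of-range indexes are ignored, and results come back in document order without duplicates. The function name select_subscripts is hypothetical.

```python
import re

# One subscript item: a single index ("1", "-1") or a range ("3-5").
_ITEM = re.compile(r"^\s*(-?\d+)\s*(?:-\s*(-?\d+))?\s*$")


def select_subscripts(nodes, spec):
    """Pick items from nodes using a subscript list (assumed semantics, see above)."""
    count = len(nodes)

    def resolve(index):
        # Assumption: negative indexes count back from the end, so -1 is the last item.
        return index + count if index < 0 else index

    chosen = set()
    for item in spec.split(","):
        match = _ITEM.match(item)
        if match is None:
            raise ValueError(f"cannot parse subscript item: {item!r}")
        first = resolve(int(match.group(1)))
        last = resolve(int(match.group(2))) if match.group(2) is not None else first
        # Assumption: ranges are inclusive and out-of-range positions are skipped.
        chosen.update(i for i in range(first, last + 1) if 0 <= i < count)
    # Assumption: results are returned in document order.
    return [nodes[i] for i in sorted(chosen)]


items = list("abcdefg")
print(select_subscripts(items, "0"))
print(select_subscripts(items, "1,3-5, -1"))
```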

There will also be well-defined extensions. It is not clear to me which of the following should be required, and which should be optional extensions:

The XQL Proposal submitted with webMethods and Microsoft includes a few features that do not fit comfortably into the XQL evaluation model, and which must be optional.

If participants on the XQL Mailing List are able to agree on a syntax for joins, I would like to add that as an optional extension.

Is XQL a W3C standard?

No.

Granted, XQL is a three letter acronym that begins with X and ends with L, and the proposal was submitted to a W3C Query Language Workshop, but that does not make it a W3C standard. It is very likely that the W3C will be hosting a Query Language Activity, and we would be very interested in participating in this activity, but there is no way to guarantee how much influence XQL or any other language will have on that activity. Given the track record of W3C standards, there probably won't be a W3C recommendation for a query language for at least a year, and one of the big reasons I have frozen the current version of XQL is to give people something that they can implement while waiting for a W3C query language.

People ask me if XQL will be submitted as a proposal to the W3C. That was my original intention, and I think it is a good idea. Please don't ask me whether I am actually submitting it. The W3C has confidentiality policies that govern submissions, and if anything were in the works, I could not tell you about it.

Does XQL do joins / transformations?

The people who claim XQL is too simple complain about the lack of joins and transformations, which David Maier considered important parts of an XML Query Language in his paper, Database Desiderata for an XML Query Language.

Microsoft's Adam Bosworth, in an article on the W3C Query Language Workshop by Lisa Rein, stated: "I think the XQL folks are trying to generalize path expressions to be a full query language, and I think this is a mistake. Query languages need other constructs than those that describe interesting elements to process. They need to say what to do with them (e.g. order them, extract important elements from them, sum them, ...). I'm a huge fan of rich path expressions. I don't consider them a query language, just a useful part of one."

Actually, XQL is significantly more than rich path expressions, and I think it would be very useful to have a rich query language that can do joins, transformations, ordering, aggregate functions, etc. However, I believe there is also a need for a simple query language that is easily implemented in a wide variety of settings, and which scales well for efficient access to very large repositories - this is what XQL was designed for. Since the term "query language" is not well defined, I do not wish to quibble about what constitutes a full query language and what constitutes "a useful part of" a query language.

People who are interested in more full-featured XML query languages that include joins and transformations should consider the following languages:

Joins: XQL allows semi-joins, but not full joins. For instance, it is possible to ask for all books whose author is the same as the author of "Moby Dick":

book[author=//book[title='Moby Dick']/author]
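
To make the semi-join concrete, the Python sketch below evaluates the same question by hand with the standard xml.etree.ElementTree module. It is not an XQL implementation. The sample document, the function name books_sharing_author, and the choice of the library element as the query context are all assumptions made for illustration, as is the reading of "=" as "some author value matches".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the XQL proposal.
SAMPLE = """
<library>
  <book><title>Moby Dick</title><author>Herman Melville</author></book>
  <book><title>Typee</title><author>Herman Melville</author></book>
  <book><title>Walden</title><author>Henry David Thoreau</author></book>
</library>
"""


def books_sharing_author(context, title):
    """Hand-written equivalent of book[author=//book[title='...']/author]."""
    # Inner query, //book[title='...']/author: searched across the whole tree.
    wanted = {
        author.text
        for book in context.iter("book")
        if book.findtext("title") == title
        for author in book.findall("author")
    }
    # Outer query, book[author=...]: child books of the context element whose
    # author matches one of the values found above (assumed "any" semantics).
    return [
        book
        for book in context.findall("book")
        if any(author.text in wanted for author in book.findall("author"))
    ]


library = ET.fromstring(SAMPLE)
titles = [book.findtext("title") for book in books_sharing_author(library, "Moby Dick")]
print(titles)  # ['Moby Dick', 'Typee']
```

Only books are returned, never pairs of books, which is what makes this a semi-join rather than a full join.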

Peter Fankhauser of the GMD in Darmstadt has suggested a means of extending XQL to support joins. I intend to add a discussion of that here in the near future. I would be open to adding an extension for joins as an optional feature in XQL, provided we can achieve consensus on this in the XQL Mailing List.

Transformations are a more complicated issue. XSL can already do transformations, and one reasonable approach would be to do queries in XQL, and transformations in XSL. This is the approach taken in the paper "Querying and Transforming XML".

Shouldn't XQL be simpler?

XQL has been criticized for being too simple, and it has also been criticized for being too complex. Marcelo Cantos, who is associated with the Structured Information Manager, has said that XQL "offers a good compromise between expressivity and simplicity" (XML Dev List, Fri, 26 May 1999).

The minimal implementation set of XQL is intended to be simple, easy to implement, and easily optimizable. Nevertheless, some minimalists have argued that even the minimal implementation set of XQL is too complex. Tim Bray has written a useful paper that indicates what may be the smallest possible useful query language; the paper is called "Element Sets: A Minimal Basis for an XML Query Engine".

When it comes down to it, the set of features in a query language is determined by a series of cost/benefit decisions, and what you choose to include depends on what is important to you. XQL was designed using a large sandbox of documents, including traditional XML and SGML documents, HTML documents, and data-oriented XML documents, and the features found in XQL reflect the queries that we found useful for the documents we studied.

How would you implement XQL?

The best implementation depends a great deal on what you are doing, e.g., whether you are dealing with single documents in RAM or large document repositories. Most implementations that work on single documents seem to be basically navigational, implementing the basic operations of XQL in a very straightforward manner.
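
As a rough idea of what "navigational" can mean for a single in-memory document, the Python sketch below walks an xml.etree.ElementTree tree one path step at a time. It is an illustration under stated assumptions, not a description of any implementation listed in this FAQ: it handles only element names, the * wildcard, and the / and // operators, and the function name evaluate and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def evaluate(context, path):
    """Evaluate name steps joined by '/' (child) or '//' (descendant)."""
    nodes = [context]
    descend = False
    for token in re.findall(r"//|/|[^/]+", path):
        if token in ("/", "//"):
            descend = token == "//"
            continue
        matched, seen = [], set()
        for node in nodes:
            # iter() yields the node itself first, so drop it to get descendants.
            candidates = list(node.iter())[1:] if descend else list(node)
            for candidate in candidates:
                if token in ("*", candidate.tag) and id(candidate) not in seen:
                    seen.add(id(candidate))
                    matched.append(candidate)
        nodes = matched
        descend = False
    return nodes


# Hypothetical sample document.
ORDERS = ET.fromstring(
    "<orders>"
    "<invoice><line><product>nut</product></line><product>bolt</product></invoice>"
    "<quote><product>gear</product></quote>"
    "</orders>"
)

print([p.text for p in evaluate(ORDERS, "invoice//product")])  # ['nut', 'bolt']
print([p.text for p in evaluate(ORDERS, "*/product")])  # ['bolt', 'gear']
```

Each step visits the tree directly, which is simple but means the cost of // grows with the size of the document.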

A query language that will be used for large-scale repositories must be amenable to query optimization and efficient query processing, and it must be possible to build appropriate indexes on the data to support such queries. Since some of the better articles on these topics are somewhat difficult to find via a typical literature search, we felt it would be helpful to introduce them here. A more extensive bibliography may be found in Observations on Structured Query Languages.

In [Macleod1991], Macleod describes a query language with capabilities similar to XQL; he also describes an index structure to support it. [Dao1997] describes a different approach to indexing structured data, and provides more information on query processing and optimization. Navarro and Baeza-Yates [NB1997] focus on capabilities needed for a query language rather than a query language itself, and they focus more on query processing than on index files. Their literature review is also extremely helpful. [Clarke1995] is a particularly helpful article that provides an algebra for structured text search, defines higher-level operators based on this algebra, and shows how a query optimizer can be designed based on the algebra. It builds on the approach of [Burkowski1992]. Another very useful paper that uses a similar approach is [JK].
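
To give a feel for how an index can answer a structural query without walking the tree, here is a small Python sketch. It is a generic extent-numbering scheme chosen for illustration and is not the index structure of any paper cited here. Each element gets a (start, end) pair from a single traversal, the pairs are stored per element name, and a query such as invoice//product becomes a containment test between two lists of pairs. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_extent_index(root):
    """Map each element name to a sorted list of (start, end) extents."""
    index = defaultdict(list)
    counter = 0

    def visit(element):
        nonlocal counter
        start = counter
        counter += 1
        for child in element:
            visit(child)
        end = counter
        counter += 1
        index[element.tag].append((start, end))

    visit(root)
    for extents in index.values():
        extents.sort()
    return index


def descendants_within(index, ancestor, descendant):
    """Extents of `descendant` elements strictly inside some `ancestor` extent."""
    outer = index.get(ancestor, [])
    return [
        (start, end)
        for (start, end) in index.get(descendant, [])
        if any(o_start < start and end < o_end for (o_start, o_end) in outer)
    ]


# Hypothetical sample document.
ORDERS = ET.fromstring(
    "<orders>"
    "<invoice><line><product>nut</product></line><product>bolt</product></invoice>"
    "<quote><product>gear</product></quote>"
    "</orders>"
)

extent_index = build_extent_index(ORDERS)
print(descendants_within(extent_index, "invoice", "product"))  # [(3, 4), (6, 7)]
```

The containment test here is a plain nested loop; because the lists are sorted, a real system could presumably replace it with a merge-style scan.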

[Burkowski1992] F.J. Burkowski. An algebra for hierarchically organized text-dominated databases. Information Processing & Management, 28(3):333-348, 1992.

[Clarke1995] C.L.A. Clarke, G.V. Cormack, and F.J. Burkowski. An algebra for structured text search and a framework for its implementation. The Computer Journal, 38(1):43-56, 1995.

[Dao1997] T. Dao, R. Sacks-Davis, and J.A. Thom. An indexing scheme for structured documents and its implementation. In DASFAA '97, 1997.

[JK] J. Jaakkola and P. Kilpeläinen. Nested Text Region Algebra. Unpublished draft, June 1998.

[Macleod1991] I.A. Macleod. A query language for retrieving information from hierarchic text structures. The Computer Journal, 34(3):254-264, June 1991.

[NB1997] G. Navarro and R. Baeza-Yates. Proximal nodes: a model to query document databases by contents and structure. ACM TOIS, 1997.