Words in Boxes

Nouns, verbs, and occasionally adjectives.

Wednesday, November 12, 2008

Getting Started With XProc and Calabash on Windows

I've been excited for a long time about the coming of XProc, an XML pipelining language. A lot of my work involves running XML documents through series of transformations, which often means hacking up batch files or, for more permanent pipelines, writing an Ant build. Both of these methods are clunky at best and unmaintainable nightmares at worst.

Norm Walsh has been hard at work on his XProc implementation, XML Calabash. I've been experimenting with it, and while there are still issues to work out (it's alpha, after all), I'm already doing useful work with it. But since XProc is so new, there's almost no documentation available. There's the spec, of course, but that's aimed more at implementers than at end users. Other than that, the only resources I've found are the Calabash documentation page, the xproc-dev mailing list, Norm Walsh's 2007 presentation, and a few blog posts.

So this is the first in what will hopefully be a series of posts aimed at creating at least some initial public documentation on how to actually get things done with XProc. I'll post things as I learn them. But caveat emptor - these shouldn't be mistaken for "best practices."

Before you can do anything, you have to get and install Calabash. Once you download it, place it somewhere in your system path. (Mine's in c:\home\scripts\calabash\.)

The next step is to create a batch file to run Calabash, runcalabash.bat:

@echo off
set CALABASH_HOME="%SCRIPTS%\calabash\calabash.jar"
set SAXON_HOME=%SCRIPTS%\saxon
set APTCUSTOM=%ProgramFiles%\Arbortext5.3\Editor\custom

set RUN_CALABASH=java -Xbootclasspath/p:"%APTCUSTOM%\classes\resolver.jar";"%APTCUSTOM%\scripts\Framework";"%SAXON_HOME%\saxon9.jar";"%SAXON_HOME%\saxon9-s9api.jar";%CALABASH_HOME% com.xmlcalabash.drivers.Main -E org.apache.xml.resolver.tools.CatalogResolver -U org.apache.xml.resolver.tools.CatalogResolver

rem Slurp the command line arguments.
set CMD_LINE_ARGS=%1
if ""%1""=="""" goto doneStart
shift
:setupArgs
if ""%1""=="""" goto doneStart
set CMD_LINE_ARGS=%CMD_LINE_ARGS% %1
shift
goto setupArgs

:doneStart
%RUN_CALABASH% %CMD_LINE_ARGS%

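You'll need to adjust the first few lines to match the paths on your system. Note that the paths are built from a SCRIPTS environment variable, which the script does not define itself, so it must already be set (or you can replace it with literal paths). CALABASH_HOME points to the location of calabash.jar. SAXON_HOME points to the directory containing the Saxon jars, which Calabash requires to run. APTCUSTOM stores the path to my installation of Arbortext Editor, which contains the DTDs of the document types I work with and their XML catalog. If you don't need a catalog resolver, you can omit APTCUSTOM and the classpath entries that use it.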

RUN_CALABASH pulls these together, creating the command that launches Calabash with a classpath containing all the Java libraries we need. I'm passing it the -E and -U options to activate the entity and URI resolvers, but again, you can leave those out if you want.
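For reference, here is a sketch of what the top of the script might look like with the catalog resolver left out. It is simply the script above with the APTCUSTOM entries and the -E and -U options removed; I haven't tested this exact variant, and it still assumes that SCRIPTS is set and that the jar names match your Saxon installation.

```batch
@echo off
rem Hypothetical variant of runcalabash.bat without the catalog resolver.
rem Assumes the SCRIPTS environment variable is already defined.
set CALABASH_HOME="%SCRIPTS%\calabash\calabash.jar"
set SAXON_HOME=%SCRIPTS%\saxon

rem Same classpath as before, minus resolver.jar and the Arbortext catalog
rem directory, and without the -E and -U resolver options.
set RUN_CALABASH=java -Xbootclasspath/p:"%SAXON_HOME%\saxon9.jar";"%SAXON_HOME%\saxon9-s9api.jar";%CALABASH_HOME% com.xmlcalabash.drivers.Main
```

The rest of the script (the argument handling and the final line that runs the command) would stay the same.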

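The next block compensates for the fact that a Windows batch file can only refer to nine arguments directly (%1 through %9). The loop uses shift to walk through all of them and collect them in CMD_LINE_ARGS. Lame. (This code comes from the ant.bat script distributed with Ant.)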
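If you only care about cmd.exe on NT-based versions of Windows, a shorter alternative may be the built-in %* variable, which expands to all of the arguments passed to the batch file. This is a sketch, not something I've run against Calabash; it would replace the whole slurping block and the last line of the script.

```batch
rem Possible replacement for the argument-slurping block and the final line.
rem %* expands to every argument given to the batch file, so no shift loop
rem is needed. Assumes RUN_CALABASH has been set earlier in the script.
%RUN_CALABASH% %*
```

The shift loop from ant.bat is the more conservative choice, which is presumably why Ant uses it.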

Once that is done, you're ready to run your first pipeline. (The following discussion is adapted from Norm's.) Norm conveniently provides a sample pipeline with Calabash, so we'll start there. Here's the pipeline:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:inline>
      <doc>Congratulations! You've run your first pipeline!</doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <p:identity/>
</p:declare-step>

It doesn't do much - just echoes the input. To run it from the command line, type:

runcalabash pipe.xpl

That's it. You'll get back:

<doc xmlns:p="http://www.w3.org/ns/xproc">
Congratulations! You've run your first pipeline!
</doc>

If you want to direct the output to a file instead of to the console, you can use the -o flag:

runcalabash -o "result=out.xml" pipe.xpl

Note the quotation marks. They are necessary because Windows batch files treat an unquoted "=" as an argument separator, so it gets dropped when the parameters are processed. Again, lame.
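You can see this behavior for yourself with a tiny test script. The file name showargs.bat is just something I made up for illustration; it prints the first two arguments it receives.

```batch
@echo off
rem showargs.bat (hypothetical helper): print the first two arguments
rem exactly as the batch file sees them.
echo first=%1
echo second=%2
```

Running showargs result=out.xml should print first=result and second=out.xml, because the "=" is treated as a delimiter. Running showargs "result=out.xml" should print first="result=out.xml" and an empty second, which is what Calabash needs to see.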

If you want to change the input to a file instead of the inline document, you can use the -i flag:

runcalabash -i "source=in.xml" -o "result=out.xml" pipe.xpl
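Since the wrapper is just a batch file, the same flags can be used from another batch file. As a rough sketch, the following runs the pipeline once for every XML file in the current directory. The script itself is hypothetical and untested against Calabash; it assumes runcalabash.bat is on your path and pipe.xpl is in the current directory. The .out extension is my own choice, picked so that the loop doesn't pick up its own output files.

```batch
@echo off
rem Hypothetical batch loop: run pipe.xpl once per XML file in the current
rem directory. %%f is the input file name and %%~nf is the same name without
rem its extension, so in.xml produces in.out.
rem "call" is needed so control returns here after runcalabash.bat finishes.
for %%f in (*.xml) do call runcalabash -i "source=%%f" -o "result=%%~nf.out" pipe.xpl
```

The quotation marks around each port=file pair are still required, for the same reason as above.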

Finally, if you want to explore on your own, execute runcalabash.bat with no options to get a usage summary. Next time: an example of a pipeline that's actually useful.

I'm James Sulak, a software developer in Houston, Texas.

You can also find me on Twitter, or if you're curious, on my old-fashioned home page. If you want to contact me directly, you can e-mail comments@wordsinboxes.com.