I've been excited for a long time about the coming of XProc, an XML pipelining language. A lot of my work involves running XML documents through series of transformations, which often means hacking up batch files or, for more permanent pipelines, writing an Ant build. Both of these methods are clunky at best and unmaintainable nightmares at worst.
Norm Walsh has been hard at work on his XProc implementation, XML Calabash. I've been experimenting with it, and while there are still issues to work out (it's alpha, after all), I'm already doing useful work with it. But since XProc is so new, there's almost no documentation available. There's the spec, of course, but that's aimed more at implementers than at end users. Other than that, the only resources I've found are the Calabash documentation page, the xproc-dev mailing list, Norm Walsh's 2007 presentation, and a few blog posts.
So this is the first in what I hope will be a series of posts aimed at creating at least some initial public documentation about how to actually get things done using XProc. I'll post things as I learn. But caveat emptor - these shouldn't be confused with "best practices."
Before you can do anything, you have to get and install Calabash. Once you download it, place it somewhere in your system path. (Mine's in c:\home\scripts\calabash\.)
The next step is to create a batch file to run Calabash, runcalabash.bat:
@echo off
set CALABASH_HOME="%SCRIPTS%\calabash\calabash.jar"
set SAXON_HOME=%SCRIPTS%\saxon
set APTCUSTOM=%ProgramFiles%\Arbortext5.3\Editor\custom
set RUN_CALABASH=java -Xbootclasspath/p:"%APTCUSTOM%\classes\resolver.jar";"%APTCUSTOM%\scripts\Framework";"%SAXON_HOME%\saxon9.jar";"%SAXON_HOME%\saxon9-s9api.jar";%CALABASH_HOME% com.xmlcalabash.drivers.Main -E org.apache.xml.resolver.tools.CatalogResolver -U org.apache.xml.resolver.tools.CatalogResolver

rem Slurp the command line arguments.
set CMD_LINE_ARGS=%1
if ""%1""=="""" goto doneStart
shift
:setupArgs
if ""%1""=="""" goto doneStart
set CMD_LINE_ARGS=%CMD_LINE_ARGS% %1
shift
goto setupArgs

:doneStart
%RUN_CALABASH% %CMD_LINE_ARGS%
You'll need to adjust the first few lines to match the paths in your system. CALABASH_HOME points to the location of calabash.jar. SAXON_HOME points to the Saxon jars, which Calabash requires to run. APTCUSTOM stores the path to my installation of Arbortext Editor, which contains the DTDs of the document types I work with and their XML catalog. If you don't need a catalog resolver, then you can omit this.
RUN_CALABASH pulls these together, creating the command that launches Calabash with a classpath containing all the Java libraries we need. I'm passing it the -E and -U options to activate the entity and URI resolvers, but again, you can leave those out if you want.
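If you haven't worked with XML catalogs before, here is a rough sketch of the kind of file the resolvers consult. It is not my Arbortext catalog: the public identifier, system identifier, and DTD path below are all made up for illustration, so substitute whatever your own document types use.

```xml
<!-- Hypothetical OASIS XML catalog; every identifier and path here is an
     assumption for illustration, not taken from a real installation. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map a DOCTYPE public identifier to a local copy of the DTD. -->
  <public publicId="-//EXAMPLE//DTD Sample Doc//EN"
          uri="doctypes/sample/sample.dtd"/>
  <!-- Map a system identifier (URL) to the same local copy. -->
  <system systemId="http://example.com/dtd/sample.dtd"
          uri="doctypes/sample/sample.dtd"/>
</catalog>
```

With entries like these, a document whose DOCTYPE names either identifier should be parsed against the local DTD instead of being fetched over the network.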
The next block compensates for the fact that a Windows batch file can only address nine parameters directly (%1 through %9). Lame. (This code comes from the ant.bat distributed with Ant.)
Once that is done, you're ready to run your first pipeline. (The following discussion is adapted from Norm's.) Norm conveniently provides one with Calabash, so we'll start there. Here's the pipeline:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:inline>
      <doc>Congratulations! You've run your first pipeline!</doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <p:identity/>
</p:declare-step>
It doesn't do much - just echoes the input. To run it from the command line, save it as pipe.xpl and type:
runcalabash pipe.xpl
That's it. You'll get back:
<doc xmlns:p="http://www.w3.org/ns/xproc"> Congratulations! You've run your first pipeline! </doc>
If you want to direct the output to a file instead of to the command line, you can use the -o flag:
runcalabash -o "result=out.xml" pipe.xpl
Note the quotation marks. They are necessary because Windows batch files treat "=" as a parameter separator, so an unquoted result=out.xml would arrive as two separate arguments. Again, lame.
If you want to change the input to a file instead of the inline document, you can use the -i flag:
runcalabash -i "source=in.xml" -o "result=out.xml" pipe.xpl
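If you always intend to feed the pipeline from a file, one option is to drop the inline default altogether. The following is only a sketch of that variant: the file name pipe-noinline.xpl is my own invention, and I've kept the same attribute-free p:declare-step as the sample above (the final XProc 1.0 Recommendation expects a version attribute on the top-level element, so a newer processor may want version="1.0" added).

```xml
<!-- Hypothetical variant of pipe.xpl, saved as pipe-noinline.xpl (assumed name).
     The source port has no inline default, so the input document
     has to be bound from outside, e.g. with the -i flag. -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source"/>
  <p:output port="result"/>
  <!-- p:identity copies whatever arrives on source to result unchanged. -->
  <p:identity/>
</p:declare-step>
```

You would run it the same way as before, with pipe-noinline.xpl in place of pipe.xpl and the -i "source=in.xml" binding supplying the document.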
Finally, if you want to explore on your own, execute runcalabash.bat with no options to get a usage summary. Next time: an example of a pipeline that's actually useful.