Words in Boxes

Nouns, verbs, and occasionally adjectives.

Wednesday, February 11, 2009

Code Review: Commenting XSLT Regular Expressions

You learn a lot from reading other people's code.  For example, the other day I ran into a clever trick in Jeni Tennison's XSpec code for commenting regular expressions in XSLT:

<xsl:variable name="attribute-regex" as="xs:string">
  <xsl:value-of>
    \s+
    (\S+)        <!-- 1: the name of the attribute -->
    \s*
    =
    \s*
    (       <!-- 2: the value of the attribute (with quotes) -->
      "([^"]*)"  <!-- 3: the value without quotes -->
      |
      '([^']*)'  <!-- 4: also the value without quotes -->
    )
  </xsl:value-of>
</xsl:variable>

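The trick is the <xsl:value-of /> instruction, which collapses its contents into a single text node that the variable's as="xs:string" then converts to a string.  The XML comments are part of the stylesheet, not the value, so they never reach the regex.  An especially nice thing about this method is that you can refer to other variables within the declaration: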

   (\S+)    <!-- 12: the name of the element being opened -->
   (        <!-- 13: the attributes of the element -->
     (      <!-- 14: wrapper for the attribute regex -->
       <xsl:value-of select="$attribute-regex" />  <!-- 15-18: attribute stuff -->
     )*
   )

Of course, to ignore all the extra white space in a regex constructed this way, you'll need to set the "x" flag wherever you use it: in the flags attribute of <xsl:analyze-string />, or in the flags argument of replace() or matches().
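To make that concrete, here is a minimal sketch of a complete XSLT 2.0 stylesheet that feeds the commented regex to <xsl:analyze-string /> with the "x" flag set.  The sample string and the "main" template name are my own placeholders rather than anything from XSpec, and I'm assuming a processor that can start from a named template (Saxon's -it:main option, for example).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text" />

  <!-- The commented attribute regex from the post. -->
  <xsl:variable name="attribute-regex" as="xs:string">
    <xsl:value-of>
      \s+
      (\S+)        <!-- 1: the name of the attribute -->
      \s*
      =
      \s*
      (            <!-- 2: the value of the attribute (with quotes) -->
        "([^"]*)"  <!-- 3: the value without quotes -->
        |
        '([^']*)'  <!-- 4: also the value without quotes -->
      )
    </xsl:value-of>
  </xsl:variable>

  <!-- Hypothetical sample input: the inside of a start tag. -->
  <xsl:variable name="sample" as="xs:string">img src="a.png" alt='A picture'</xsl:variable>

  <!-- Assumed entry point: run with a named initial template. -->
  <xsl:template name="main">
    <!-- flags="x" strips the layout white space from the regex before matching. -->
    <xsl:analyze-string select="$sample" regex="{$attribute-regex}" flags="x">
      <xsl:matching-substring>
        <!-- Groups 3 and 4 are alternatives; the one that did not match is empty. -->
        <xsl:value-of select="concat(regex-group(1), ' = ',
                                     regex-group(3), regex-group(4), '&#10;')" />
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With that sample string, this should print "src = a.png" and "alt = A picture" on separate lines.  The function form works the same way, with the flag as the last argument: matches($sample, $attribute-regex, 'x').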

I'm James Sulak, a software developer in Houston, Texas. My work revolves around publishing XML content in print and on the web.
