A novel approach to XSLT-based Schematron validation

David Maus dmaus@dmaus.name

XML Prague 2019

Agenda

  • Schematron in a nutshell
  • XSLT-based Schematron validation
    • the challenge
    • three approaches
    • simplifications
  • summary

Schematron

  • a rule-based validation language for XML documents
  • focus on natural-language assertions
  • validate arbitrary relationships in a document
  • key concepts designed by Rick Jelliffe in 1999
  • ISO-standardized in 2006, updated 2016

Key concepts

Pattern
structure in a source document specified in a schema by an ordered collection of rules
Rule
selects portions of the source document that contribute to a pattern
Assertion
natural-language assertion with an assertion test expressed as a boolean query

Example

<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="agents">
    <rule context="tei:person">
      <assert test="tei:persName[@type = 'preferred']">
        A person element must provide a persName element
        with the preferred name.
      </assert>
      …
    </rule>
    …
  </pattern>
  …
</schema>
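
To make the assertion concrete, here is a hypothetical instance fragment. The names, ids and the type value "variant" are invented for illustration, and the tei prefix is assumed to be bound to the TEI namespace by an ns declaration in the elided part of the schema. Under those assumptions the first person would satisfy the assertion and the second would be reported.

```xml
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <!-- passes: provides a persName with type="preferred" -->
  <person xml:id="p1">
    <persName type="preferred">Ada Lovelace</persName>
  </person>
  <!-- fails: the only persName has a different type -->
  <person xml:id="p2">
    <persName type="variant">A. A. Lovelace</persName>
  </person>
</listPerson>
```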

XSLT-based Schematron

  • compile schema to stylesheet that creates a validation report
  • most popular way to implement Schematron validation
  • challenge lies in the construction of the validation stylesheet
  • Match rule R1 and R2
    same node, different patterns
  • Do not match rule R3
    same node, same pattern
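
As a rough sketch of what "compile schema to stylesheet" can mean, the rule from the example schema might be turned into a template like the one below. This is an illustration, not the output of any particular implementation: the mode name validate and the fallback template are assumptions, the tei prefix is assumed to denote the TEI namespace, and the location attribute that SVRL expects on svrl:failed-assert is left out for brevity.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Entry point: wrap all results in a validation report -->
  <xsl:template match="/">
    <svrl:schematron-output>
      <xsl:apply-templates select="node()" mode="validate"/>
    </svrl:schematron-output>
  </xsl:template>

  <!-- Compiled from <rule context="tei:person"> -->
  <xsl:template match="tei:person" mode="validate">
    <svrl:fired-rule context="tei:person"/>
    <!-- An assert is reported when its test evaluates to false -->
    <xsl:if test="not(tei:persName[@type = 'preferred'])">
      <svrl:failed-assert test="tei:persName[@type = 'preferred']">
        <svrl:text>A person element must provide a persName element with the preferred name.</svrl:text>
      </svrl:failed-assert>
    </xsl:if>
    <!-- Continue with the children of the matched node -->
    <xsl:apply-templates select="node()" mode="validate"/>
  </xsl:template>

  <!-- Fallback (assumed): nodes without a rule produce no output -->
  <xsl:template match="node()" mode="validate" priority="-10">
    <xsl:apply-templates select="node()" mode="validate"/>
  </xsl:template>
</xsl:stylesheet>
```

With a single template per node the processor picks exactly one rule, which is why matching the same node from several patterns needs extra machinery.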

Match same node in different patterns

  • xsl:apply-imports
    not viable
  • different modes
    one mode per pattern
  • xsl:next-match
    XSLT 2.0
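
The "one mode per pattern" option can be sketched as follows. The pattern ids P1 and P2, the rule ids R1 and R2 and the tei:person context are placeholders chosen for illustration. Each pattern gets its own mode, so the same node can fire one rule in each, at the cost of traversing the document once per pattern.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- The document is traversed once per pattern -->
  <xsl:template match="/">
    <svrl:schematron-output>
      <svrl:active-pattern id="P1"/>
      <xsl:apply-templates mode="P1"/>
      <svrl:active-pattern id="P2"/>
      <xsl:apply-templates mode="P2"/>
    </svrl:schematron-output>
  </xsl:template>

  <!-- Rule R1 of pattern P1 -->
  <xsl:template match="tei:person" mode="P1">
    <svrl:fired-rule id="R1" context="tei:person"/>
    <xsl:apply-templates mode="P1"/>
  </xsl:template>

  <!-- Rule R2 of pattern P2 matches the same node in its own mode -->
  <xsl:template match="tei:person" mode="P2">
    <svrl:fired-rule id="R2" context="tei:person"/>
    <xsl:apply-templates mode="P2"/>
  </xsl:template>

  <!-- Suppress the built-in copying of text nodes in both modes -->
  <xsl:template match="text()" mode="P1"/>
  <xsl:template match="text()" mode="P2"/>
</xsl:stylesheet>
```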

Do not match same node in same pattern

  • override templates
    only works for modes per pattern
  • match, but remove from final report
    track matched (pattern, node)
  • match, but don't validate
    pass a sequence of already matched (pattern, node)

Match, don't validate

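<!-- Rule R1 of pattern P1; all rules share the single mode "validate" -->
<xsl:template match="*" mode="validate" priority="2">
  <!-- (pattern, node) pairs recorded by templates that matched earlier -->
  <xsl:param name="pattern-node" as="element(rule)*"/>
  <!-- Validate only if no earlier rule of P1 has matched this node -->
  <xsl:if test="empty($pattern-node[@pattern = 'P1'][@node = generate-id(current())])">
    <svrl:fired-rule id="R1"/>
    …
  </xsl:if>
  <!-- Hand over to the next matching rule and record this match -->
  <xsl:next-match>
    <xsl:with-param name="pattern-node" as="element(rule)*">
      <xsl:sequence select="$pattern-node"/>
      <rule node="{generate-id()}" pattern="P1"/>
    </xsl:with-param>
  </xsl:next-match>
</xsl:template>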

               same node, different patterns   same node, same pattern
traditional    one mode per pattern            override templates
ex-post        single mode, xsl:next-match     match, validate, remove from report
?              single mode, xsl:next-match     match, don't validate

Really just a single mode?

  • a pattern may specify variables that are used in its rules and assertions
  • a pattern may apply to subordinate documents instead of the primary document
  • a pattern may be imported from an external definition with a different base URI
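
The first two cases can be illustrated with a hypothetical schema. The pattern ids, variable name, contexts and assertion texts below are invented for this sketch. P1 binds a pattern-level variable that must be in scope for its rules, and P2 is evaluated against the documents selected by its documents attribute, so neither can simply share a mode with an unrelated pattern.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <!-- P1 binds a pattern-level variable that its rules can use -->
  <pattern id="P1">
    <let name="person-ids" value="//tei:person/@xml:id"/>
    <rule context="tei:persName[@ref]">
      <assert test="substring-after(@ref, '#') = $person-ids">
        A reference must point to a person in this document.
      </assert>
    </rule>
  </pattern>

  <!-- P2 is evaluated against the documents named by @documents -->
  <pattern id="P2" documents="//tei:ptr/@target">
    <rule context="tei:person">
      <assert test="tei:persName">
        A person in a referenced document must have a name.
      </assert>
    </rule>
  </pattern>
</schema>
```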

Pattern groups

  • identify patterns whose rules can run in the same mode
  • use a grouping function that concatenates
    1. the base URI of a pattern
    2. the value of the @documents attribute
    3. the generated id of the first variable binding element
  • create one mode per pattern group
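
The grouping function could look roughly like the XSLT 2.0 sketch below. The function name ex:group-key, its namespace, the '|' separator and the groups/group/pattern output elements are assumptions made for illustration and are not taken from an existing implementation. Patterns that yield the same key end up in the same group and could then share a mode.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:ex="http://example.org/ns/sketch"
    exclude-result-prefixes="xs sch ex">

  <!-- Hypothetical helper: concatenates the three grouping criteria.
       Absent values contribute an empty string. -->
  <xsl:function name="ex:group-key" as="xs:string">
    <xsl:param name="pattern" as="element(sch:pattern)"/>
    <xsl:sequence select="string-join((
        string(base-uri($pattern)),
        string($pattern/@documents),
        string(generate-id($pattern/sch:let[1]))
      ), '|')"/>
  </xsl:function>

  <!-- List the pattern groups; each group would become one mode -->
  <xsl:template match="sch:schema">
    <groups>
      <xsl:for-each-group select="sch:pattern" group-by="ex:group-key(.)">
        <!-- The id of the group's first pattern serves as mode name -->
        <group mode="{generate-id()}">
          <xsl:for-each select="current-group()">
            <pattern id="{@id}"/>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </groups>
  </xsl:template>
</xsl:stylesheet>
```

Because the third component is the generated id of the pattern's first variable binding, every pattern that declares variables gets a key of its own, while patterns without variables that share a base URI and @documents value fall into one group.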

Summary

  • XSLT is a natural fit for Schematron validation
  • there are three approaches to XSLT-based Schematron validation
    traditional, ex-post rule match selection, ?
  • xsl:next-match requires XSLT 2.0 or later, but it can reduce the number of times a document is processed

SchXslt
http://github.com/dmj/schxslt