David Maus M. A.

Digital Humanities Publishing

SchXslt 1.2 »MarkupUK Edition«

David Maus, 07.06.2019 · Permalink

I am happy to announce the release of version 1.2 of SchXslt, an XSLT-based Schematron processor.

You can download this version of SchXslt from its project page. Developers using SchXslt in a Java-based project can add or update the Maven artifact name.dmaus.schxslt.schxslt to version 1.2.

Heads up: The location of the XSLT stylesheets has changed. The stylesheets for the XSLT 2.0 compiler have moved from the directory src/main/resources/xslt to src/main/resources/xslt/2.0. The stylesheets of the XSLT 1.0 compiler accordingly reside in a 1.0 subdirectory.


Support for query language XSLT 3.0

SchXslt 1.2 adds support for the not-yet-standardized query language binding for XSLT 3.0. The 2.0 compiler accepts the query language identifier token xslt3 and creates a validation stylesheet with the version attribute set to 3.0.
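As a minimal sketch, a schema that selects this binding could look as follows. The element names chapter and title are hypothetical, and the example assumes that the XSLT processor running the validation stylesheet supports XPath 3.0 expressions such as let.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
  <sch:title>Hypothetical schema using the xslt3 query binding</sch:title>
  <sch:pattern>
    <sch:rule context="chapter">
      <!-- The 'let' expression is XPath 3.0 syntax and not available with xslt2 -->
      <sch:assert test="let $t := normalize-space(title) return string-length($t) gt 0">
        A chapter must have a non-empty title.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```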

Callback API

SchXslt 1.2 adds the templates schxslt-api:validation-stylesheet-body-top-hook and schxslt-api:validation-stylesheet-body-bottom-hook to the callback API. The former contributes to the top of the validation stylesheet body, the latter to the bottom. Both are called with the Schematron schema as sole argument.
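A customization might override one of these hooks roughly as sketched below. Several details are assumptions that are not stated in this post and should be checked against the API documentation: the namespace URI bound to schxslt-api, the parameter name schema, and the file name compile-for-svrl.xsl of the stylesheet being imported.

```xml
<xsl:transform version="2.0"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               xmlns:sch="http://purl.oclc.org/dsdl/schematron"
               xmlns:schxslt-api="https://doi.org/10.5281/zenodo.1495494#api">

  <!-- Assumption: file name of the compiler stylesheet that defines the default hooks -->
  <xsl:import href="src/main/resources/xslt/2.0/compile-for-svrl.xsl"/>

  <!-- Overrides the default hook; its output goes to the top of the validation stylesheet body -->
  <xsl:template name="schxslt-api:validation-stylesheet-body-top-hook">
    <!-- Assumption: the Schematron schema is passed in a parameter named 'schema' -->
    <xsl:param name="schema" as="element(sch:schema)" required="yes"/>
    <xsl:comment>Validation stylesheet for: <xsl:value-of select="$schema/sch:title"/></xsl:comment>
  </xsl:template>

</xsl:transform>
```

The hook only writes a comment here. Emitting XSLT declarations into the validation stylesheet would additionally require a namespace alias or xsl:element.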

Support for query language XSLT 1.0

SchXslt 1.2 ships with an XSLT 1.0 processor that creates XSLT 1.0 validation stylesheets. Because of the restrictions of XSLT 1.0, this processor has the following known limitations:

  • the include step is not performed recursively; only includes from the primary schema document are replaced

  • the base URI of included external definitions is not preserved

  • the base URI of the primary schema document is not preserved

  • the URI of the primary source document is not included in the documents attribute of the svrl:active-pattern element

XQuery modules

SchXslt 1.2 ships with XQuery modules for BaseX and eXist. The modules are packaged as EXPath packages. Both provide the function schxslt:validate that performs Schematron validation and returns an SVRL validation report.

The packaged modules are available for download on SchXslt's page of releases.

DIY Atom feed for the #PolonskyGerman project blog

David Maus, 21.05.2019 · Permalink

In October last year I was given the opportunity to work with colleagues from Digital Bodleian in #PolonskyGerman, a project that seeks to open up the medieval German manuscript collections of the Herzog August Bibliothek Wolfenbüttel and the Bodleian Libraries for research and reuse. The project is doing fine but there is one itch to scratch: The project's web page has a blog but the blog has no Atom or RSS feed. How do I know when there is something new in the blog?

Easy answer: I DIY my own feed. Way back in time I wrote a little Ruby program that solved this problem in a generic fashion. This time I settled for a project-specific solution.

The setup is straightforward: I fetch the front page with Curl, pipe it through TagSoup and feed the resulting XML into an XProc pipeline. The pipeline executes an XSLT transformation and validates the result against the Atom specification. If all went well I get an Atom feed I can plug into my rss2email instance.
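The XProc part of such a setup might look like the following minimal sketch. It is not the pipeline used here: the stylesheet name blog-to-atom.xsl and the schema file atom.rng are hypothetical, and validating with RELAX NG is an assumption. RFC 4287 contains an informative RELAX NG Compact schema for Atom, which would have to be available in XML syntax for this step.

```xml
<p:declare-step version="1.0" name="main"
                xmlns:p="http://www.w3.org/ns/xproc">
  <!-- Well-formed XML version of the front page, e.g. as produced by TagSoup -->
  <p:input  port="source"/>
  <p:output port="result"/>

  <!-- Hypothetical stylesheet that turns the blog markup into an Atom feed -->
  <p:xslt name="to-atom">
    <p:input port="stylesheet">
      <p:document href="blog-to-atom.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Fails the pipeline if the feed is invalid, otherwise passes it through -->
  <p:validate-with-relax-ng assert-valid="true">
    <p:input port="schema">
      <p:document href="atom.rng"/>
    </p:input>
  </p:validate-with-relax-ng>
</p:declare-step>
```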


XProc Step by Step: Check text content after structural transformations

David Maus, 25.04.2019 · Permalink

When working with TEI documents stemming from a Word-to-XML-conversion I usually perform some structural modifications like joining tei:hi elements with identical rend attribute if they are immediate siblings. Although I consider my XSLT good enough and use XSpec to test the merging algorithm, I like to verify that no text content was lost for good measure.

One tried and true method of doing this is to dump the text nodes of the document before and after the transformation and then compare the two. Because I only want to know whether the text is the same, comparing SHA1 checksums is sufficient.

Yesterday I finally sat down and wrote an XProc 1.0 step that does exactly this.

Step 1: Calculate the checksum

The first step dmaus:content-checksum reads a document and adds the attribute dmaus:checksum containing the SHA1 checksum of the document's text content to the outermost element.

<p:declare-step type="dmaus:content-checksum" name="content-checksum">
  <p:documentation>
    Add @dmaus:checksum attribute containing the SHA1 checksum of the document content to the outermost element.
  </p:documentation>
  <p:input  port="source"/>
  <p:output port="result"/>
  <p:add-attribute attribute-name="dmaus:checksum" attribute-value="" match="/*"/>
  <p:hash algorithm="sha" version="1" match="@dmaus:checksum">
    <p:with-option name="value" select="/"/>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:hash>
</p:declare-step>

Step 2: Compare checksums

The second step dmaus:check-text-content-match has two input ports source and other. It calculates the checksum for the documents appearing on either port and compares the two. If the checksums differ, the step raises an error. For convenient use in a pipeline the document from source is passed through to the primary output port.

<p:declare-step type="dmaus:check-text-content-match" name="check-text-content-match">
  <p:documentation>
    Signal an error if the content of the document appearing on the 'source' port differs from the content of the
    document appearing on the 'other' port.
  </p:documentation>
  <p:input  port="source"/>
  <p:input  port="other"/>
  <p:output port="result"/>
  <dmaus:content-checksum name="checksum-source">
    <p:input port="source">
      <p:pipe step="check-text-content-match" port="source"/>
    </p:input>
  </dmaus:content-checksum>
  <dmaus:content-checksum name="checksum-other">
    <p:input port="source">
      <p:pipe step="check-text-content-match" port="other"/>
    </p:input>
  </dmaus:content-checksum>
  <p:group>
    <p:variable name="source" select="/*/@dmaus:checksum">
      <p:pipe step="checksum-source" port="result"/>
    </p:variable>
    <p:variable name="other" select="/*/@dmaus:checksum">
      <p:pipe step="checksum-other" port="result"/>
    </p:variable>
    <p:choose>
      <p:when test="$other ne $source">
        <p:error code="dmaus:content-mismatch">
          <p:input port="source">
            <p:inline>
              <message>The content of the two documents does not match</message>
            </p:inline>
          </p:input>
        </p:error>
      </p:when>
      <p:otherwise>
        <p:identity>
          <p:input port="source">
            <p:pipe step="check-text-content-match" port="source"/>
          </p:input>
        </p:identity>
      </p:otherwise>
    </p:choose>
  </p:group>
</p:declare-step>

Step 3: Use the step

Both steps are defined in a library I can import in my pipeline. In this contrived example I connect the step dmaus:check-text-content-match to the result of a p:xslt that implements the structural modification and the original document appearing on the pipeline's source port.

Example pipeline
<p:declare-step version="1.0" name="main"
                xmlns:dmaus="tag:dmaus@dmaus.name,2019:XProc"
                xmlns:p="http://www.w3.org/ns/xproc">
  <p:input  port="source"/>
  <p:output port="result"/>
  <p:import href="library.xpl"/>
  <p:xslt name="structural-modification">
    <p:input port="stylesheet">
      <p:document href="…"/>
    </p:input>
    <!-- This pipeline declares no parameter input port, so bind the step's parameters explicitly -->
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>
  <dmaus:check-text-content-match>
    <p:input port="source">
      <p:pipe step="structural-modification" port="result"/>
    </p:input>
    <p:input port="other">
      <p:pipe step="main" port="source"/>
    </p:input>
  </dmaus:check-text-content-match>
</p:declare-step>

Release of SchXslt version 1.1

David Maus, 16.04.2019 · Permalink

I am happy to announce the release of version 1.1 of SchXslt, an XSLT-based Schematron processor.

You can download this version of SchXslt from its project page. Developers using SchXslt in a Java-based project can add or update the Maven artifact name.dmaus.schxslt.schxslt to version 1.1.


Callback API

SchXslt follows in the footsteps of the Skeleton implementation and lets you customize the reporting output. The callback API defines named templates that are called to create the parts of the validation stylesheet that report on active patterns, fired rules, failed asserts and successful asserts.

Documentation of the API can be found in the project's wiki.

Java classes

The SchXslt Maven artifact also provides Java classes implementing Schematron validation.

Ant Task

The Ant task is updated to use version 1.1 of SchXslt.

Fixed bugs

Release of SchXslt, a new XSLT-based Schematron processor v1.0

David Maus, 22.02.2019 · Permalink

I am happy to announce that I released version 1.0 of SchXslt.

SchXslt is a conforming open-source Schematron processor implemented entirely in XSLT. It operates as a three-stage transformation process that translates a Schematron schema into an XSLT validation stylesheet. This stylesheet outputs a validation report in the Schematron Validation Report Language (SVRL) when applied to an instance document.

SchXslt utilizes features of XSLT 2.0 to improve the validation and is well tested against the ISO specification.

SchXslt is also available as a Maven artifact to ease integration into Java-based applications (name.dmaus.schxslt.schxslt) and as an Ant task to perform Schematron validation with SchXslt.

Both SchXslt and SchXslt Ant are released under the terms of the MIT license.

Status update on SchXslt

David Maus, 13.02.2019 · Permalink

I just published release candidate 5 of SchXslt, my XSLT-based Schematron processor. My goal is to publish a final version 1.0 at the end of February.

Version 1.0 will suffer from the same defect as the skeleton implementation with regard to equally named global, phase, and pattern variables. SchXslt will terminate if a global variable has the same name as a phase variable, a phase variable has the same name as a pattern variable, or a global variable has the same name as a pattern variable.

Given that the skeleton has had the same defect for more than 10 years, it should pose no immediate problem for people switching to SchXslt.
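For illustration, here is a minimal sketch of a schema that runs into one of these name clashes. The variable name limit and the elements item and entry are hypothetical. The global variable and the pattern variable share a name, so SchXslt 1.0 is expected to terminate when compiling this schema.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Global variable -->
  <sch:let name="limit" value="10"/>
  <sch:pattern>
    <!-- Pattern variable with the same name as the global variable -->
    <sch:let name="limit" value="20"/>
    <sch:rule context="item">
      <sch:assert test="count(entry) le $limit">Too many entries.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Renaming one of the two variables avoids the clash.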

Today I Learned: Preserving the ICC color profile when converting images with ImageMagick

David Maus, 06.07.2018 · Permalink

Update 2019-02-12: Without color profiles there is no color-accurate rendering! The effect is obvious with colored objects.

As part of a planned and long overdue modernization of the Wolfenbütteler Digitale Bibliothek, I am toying with the idea of also including the ICC color profiles used during digitization in the published JPEGs. All other metadata from the TIFF files should still be removed as before.

After some tinkering I arrived at the following solution:

Preserving ICC profiles with ImageMagick
magick convert 00001.tif 00001.icc
magick convert 00001.tif -strip -profile 00001.icc 00001.jpg

In the first step I write the color profile to a temporary file. In the second I first strip all metadata and then add the extracted profile.

My presentation @ XML Prague 2019

David Maus, 08.02.2019 · Permalink

Update 2019-02-11: The recording is available, too.

The slides from my talk at this year's XML Prague are now available. Slides 15 and 16 discuss the conditions under which xsl:next-match based approaches can run in a single mode.

Zotero Standalone, Revisited

David Maus, 29.12.2018 · Permalink

Last year I described the setup I use to install Zotero on my systems without a dependency on DBUS and GTK+ 3. A year later I used the Christmas holidays to largely automate this setup with Vagrant and Ansible. A simple vagrant up or vagrant provision is now enough to build a current version of Zotero with my Firefox runtime.

Nota bene: I use Pale Moon instead of the official Firefox and therefore have to adjust the names of the installation directory and of the executable binaries.