David Maus M. A.

Digital Humanities Publishing

Release of SchXslt version 1.3 »TEI Conference Edition«

David Maus, 17.09.2019 · Permalink

I am pleased to announce the release of SchXslt 1.3, a modern XSLT-based Schematron processor.

You can download this version of SchXslt from its project page. This version of SchXslt also marks the release of SchXslt CLI, a Schematron command-line application written in Java.

Java users beware: The Java classes now live in a separate repository and as a separate Maven artifact. They follow their own versioning scheme and have a declared runtime dependency on SchXslt 1.2 or higher. The two artifacts are name.dmaus.schxslt.schxslt (the XSLT stylesheets) and name.dmaus.schxslt.java (the Java classes).

Please update your build files accordingly and sorry for the inconvenience!


Fixed bugs

Today I Learned: Static code analysis for PHP projects

David Maus, 28.08.2019 · Permalink

Static code analysis for PHP projects has come a long way. Phan and PHPStan are two tools that use static code analysis to detect errors in my PHP project files.

Both are installed via Composer:

Installing Phan & PHPStan
dmaus@carbon ~/projects/php-project % composer require --dev phan/phan
...
dmaus@carbon ~/projects/php-project % vendor/bin/phan --init --init-level=1
...
dmaus@carbon ~/projects/php-project % composer require --dev phpstan/phpstan

Installing a new TLS certificate on Devuan

David Maus, 09.08.2019 · Permalink

I recently added fortext.net to the feed aggregator Planet Digital Humanities and was surprised by an error message when updating the aggregator.

A somewhat misleading error message from Planet
dmaus@dmaus:~/planet-dh$ /usr/bin/planet config.ini
ERROR:planet.runner:Error 500 while updating feed https://fortext.net/rss.xml

The Error 500 at first suggested a problem on fortext's side, since 500 is the HTTP status code for server-side errors. When I tried to debug the error with curl, however, I ran into a missing TLS certificate.

Tracking down the problem with curl
dmaus@dmaus:~/planet-dh$ curl https://fortext.net
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option.

After some research I found out that the missing certificate, Encryption Everywhere DV TLS CA - G1, can be downloaded from DigiCert's website. No sooner said than done. I then installed the certificate as follows.

Converting and installing the certificate
dmaus@dmaus:~/planet-dh$ mv EncryptionEverywhereDVTLSCA-G1.crt EncryptionEverywhereDVTLSCA-G1.der
dmaus@dmaus:~/planet-dh$ openssl x509 -in EncryptionEverywhereDVTLSCA-G1.der -inform DER -out EncryptionEverywhereDVTLSCA-G1.crt
dmaus@dmaus:~/planet-dh$ sudo cp EncryptionEverywhereDVTLSCA-G1.crt /usr/local/share/ca-certificates/
dmaus@dmaus:~/planet-dh$ sudo update-ca-certificates
dmaus@dmaus:~/planet-dh$ rm EncryptionEverywhereDVTLSCA-G1.*
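
The conversion step is only a re-encoding: the downloaded file holds the certificate in binary DER form, while update-ca-certificates expects PEM, which is the same bytes Base64-encoded between a header and a footer line. As a rough sketch of what that amounts to, assuming Python 3 is at hand, the standard library's ssl module can do the same wrapping. The function names der_to_pem and convert_file are made up for this illustration, and unlike openssl x509 the code does not parse or check the certificate.

```python
import ssl


def der_to_pem(der_bytes: bytes) -> str:
    """Wrap DER-encoded certificate bytes in PEM armor.

    Only re-encodes the bytes (Base64 plus BEGIN/END lines);
    it does not verify that they form a valid certificate.
    """
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def convert_file(der_path: str, pem_path: str) -> None:
    """Read a DER file and write its PEM equivalent (hypothetical helper)."""
    with open(der_path, "rb") as src:
        pem = der_to_pem(src.read())
    with open(pem_path, "w", encoding="ascii") as dst:
        dst.write(pem)
```

Calling convert_file with the .der file as input and a .crt file as output would produce a file that can be copied to /usr/local/share/ca-certificates/ as in the listing above.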

SchXslt 1.2.1 maintenance release

David Maus, 03.07.2019 · Permalink

This is a maintenance release of SchXslt, an XSLT-based Schematron processor.

You can download this version of SchXslt from its project page. Developers using SchXslt in a Java-based project can add or update the Maven artifact name.dmaus.schxslt.schxslt to version 1.2.1.

Fixed bugs

SchXslt 1.2 »MarkupUK Edition«

David Maus, 07.06.2019 · Permalink

I am happy to announce the release of version 1.2 of SchXslt, an XSLT-based Schematron processor.

You can download this version of SchXslt from its project page. Developers using SchXslt in a Java-based project can add or update the Maven artifact name.dmaus.schxslt.schxslt to version 1.2.

Heads up: The location of the XSLT stylesheets has changed. The stylesheets for the XSLT 2.0 compiler have moved from the directory src/main/resources/xslt to src/main/resources/xslt/2.0. The stylesheets of the XSLT 1.0 compiler reside in a 1.0 subdirectory accordingly.


Support for query language XSLT 3.0

SchXslt 1.2 adds support for the not-yet-standardized query language binding for XSLT 3.0. The 2.0 compiler accepts the query language identifier token xslt3 and creates a validation stylesheet with the version attribute set to 3.0.

Callback API

SchXslt 1.2 adds the templates schxslt-api:validation-stylesheet-body-top-hook and schxslt-api:validation-stylesheet-body-bottom-hook to the callback API. The former contributes to the top of the validation stylesheet body, the latter to the bottom. Both are called with the Schematron schema as sole argument.

Support for query language XSLT 1.0

SchXslt 1.2 ships with an XSLT 1.0 processor that creates XSLT 1.0 validation stylesheets. Because of restrictions in XSLT 1.0, this processor has the following known limitations:

  • the include step is not performed recursively; only includes from the primary schema document are replaced

  • the base URI of included external definitions is not preserved

  • the base URI of the primary schema document is not preserved

  • the URI of the primary source document is not included in the documents attribute of the svrl:active-pattern element

XQuery modules

SchXslt 1.2 ships with XQuery modules for BaseX and eXist. The modules are packaged as EXPath packages. Both provide the function schxslt:validate that performs Schematron validation and returns an SVRL validation report.

The packaged modules are available for download on SchXslt's page of releases.

DIY Atom feed for the #PolonskyGerman project blog

David Maus, 21.05.2019 · Permalink

In October last year I was given the opportunity to work with colleagues from Digital Bodleian in #PolonskyGerman, a project that seeks to open up the medieval German manuscript collections of the Herzog August Bibliothek Wolfenbüttel and the Bodleian Libraries for research and reuse. The project is doing fine but there is one itch to scratch: the project's web page has a blog, but the blog has no Atom or RSS feed. How do I know when there is something new in the blog?

Easy answer: I DIY my own feed. Way back in time I wrote a little Ruby program that solved this problem in a generic fashion. This time I settled for a project-specific solution.

The setup is straightforward: I fetch the front page with curl, pipe it through TagSoup and feed the resulting XML into an XProc pipeline. The pipeline executes an XSLT transformation and validates the result against the Atom specification. If all went well I get an Atom feed I can plug into my rss2email instance.


XProc Step by Step: Check text content after structural transformations

David Maus, 25.04.2019 · Permalink

When working with TEI documents stemming from a Word-to-XML conversion I usually perform some structural modifications, like joining tei:hi elements with identical rend attributes if they are immediate siblings. Although I consider my XSLT good enough and use XSpec to test the merging algorithm, for good measure I like to verify that no text content was lost.

One tried and true method of doing this is dumping the text nodes of the document before and after the transformation, and then comparing the two. Because I only want to know if the text is the same, comparing a SHA1 checksum is sufficient.
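
To make the idea concrete outside of any particular toolchain, here is a minimal sketch in Python, assuming only the standard library. The function names text_checksum and same_text and the three sample documents are invented for this illustration; they are not part of the pipeline described in this post.

```python
import hashlib
import xml.etree.ElementTree as ET


def text_checksum(xml_source: str) -> str:
    """Return the SHA-1 hex digest of the document's concatenated text nodes."""
    root = ET.fromstring(xml_source)
    # itertext() yields every text node below the root in document order,
    # which corresponds to the string value of the document.
    text = "".join(root.itertext())
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def same_text(before: str, after: str) -> bool:
    """True if both documents carry identical text content."""
    return text_checksum(before) == text_checksum(after)


# Hypothetical sample data: two sibling hi elements merged into one.
before = '<p><hi rend="italic">foo</hi><hi rend="italic">bar</hi> baz</p>'
after = '<p><hi rend="italic">foobar</hi> baz</p>'
# A faulty merge that dropped the second element's text.
lost = '<p><hi rend="italic">foo</hi> baz</p>'

if __name__ == "__main__":
    print(same_text(before, after))
    print(same_text(before, lost))
```

The markup differs between the first two samples but the text does not, so their checksums agree; the third sample lost text and its checksum differs.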

Yesterday I finally sat down and wrote an XProc 1.0 step that does exactly this.

Step 1: Calculate the checksum

The first step dmaus:content-checksum reads a document and adds the attribute dmaus:checksum containing the SHA1 checksum of the document's text content to the outermost element.

<p:declare-step type="dmaus:content-checksum" name="content-checksum">
  <p:documentation>
    Add @dmaus:checksum attribute containing the SHA1 checksum of the document content to the outermost element.
  </p:documentation>
  <p:input  port="source"/>
  <p:output port="result"/>
  <p:add-attribute attribute-name="dmaus:checksum" attribute-value="" match="/*"/>
  <p:hash algorithm="sha" version="1" match="@dmaus:checksum">
    <p:with-option name="value" select="/"/>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:hash>
</p:declare-step>

Step 2: Compare checksums

The second step dmaus:check-text-content-match has two input ports source and other. It calculates the checksum for the documents appearing on either port and compares the two. If the checksums differ, the step raises an error. For convenient use in a pipeline the document from source is passed through to the primary output port.

<p:declare-step type="dmaus:check-text-content-match" name="check-text-content-match">
  <p:documentation>
    Signal an error if the content of the document appearing on the 'source' port differs from the content of the
    document appearing on the 'other' port.
  </p:documentation>
  <p:input  port="source"/>
  <p:input  port="other"/>
  <p:output port="result"/>
  <dmaus:content-checksum name="checksum-source">
    <p:input port="source">
      <p:pipe step="check-text-content-match" port="source"/>
    </p:input>
  </dmaus:content-checksum>
  <dmaus:content-checksum name="checksum-other">
    <p:input port="source">
      <p:pipe step="check-text-content-match" port="other"/>
    </p:input>
  </dmaus:content-checksum>
  <p:group>
    <p:variable name="source" select="/*/@dmaus:checksum">
      <p:pipe step="checksum-source" port="result"/>
    </p:variable>
    <p:variable name="other" select="/*/@dmaus:checksum">
      <p:pipe step="checksum-other" port="result"/>
    </p:variable>
    <p:choose>
      <p:when test="$other ne $source">
        <p:error code="dmaus:content-mismatch">
          <p:input port="source">
            <p:inline>
              <message>The content of the two documents does not match</message>
            </p:inline>
          </p:input>
        </p:error>
      </p:when>
      <p:otherwise>
        <p:identity>
          <p:input port="source">
            <p:pipe step="check-text-content-match" port="source"/>
          </p:input>
        </p:identity>
      </p:otherwise>
    </p:choose>
  </p:group>
</p:declare-step>

Step 3: Use the step

Both steps are defined in a library I can import in my pipeline. In this contrived example I connect the step dmaus:check-text-content-match to the result of a p:xslt that implements the structural modification and the original document appearing on the pipeline's source port.

Example pipeline
<p:declare-step version="1.0" name="main"
                xmlns:dmaus="tag:dmaus@dmaus.name,2019:XProc"
                xmlns:p="http://www.w3.org/ns/xproc">
  <p:input  port="source"/>
  <p:output port="result"/>
  <p:import href="library.xpl"/>
  <p:xslt name="structural-modification">
    <p:input port="stylesheet">
      <p:document href="…"/>
    </p:input>
  </p:xslt>
  <dmaus:check-text-content-match>
    <p:input port="source">
      <p:pipe step="structural-modification" port="result"/>
    </p:input>
    <p:input port="other">
      <p:pipe step="main" port="source"/>
    </p:input>
  </dmaus:check-text-content-match>
</p:declare-step>

Release of SchXslt version 1.1

David Maus, 16.04.2019 · Permalink

I am happy to announce the release of version 1.1 of SchXslt, an XSLT-based Schematron processor.

You can download this version of SchXslt from its project page. Developers using SchXslt in a Java-based project can add or update the Maven artifact name.dmaus.schxslt.schxslt to version 1.1.


Callback API

SchXslt follows in the footsteps of the Skeleton implementation and lets you customize the reporting output. The callback API defines named templates that are called to create the parts of the validation stylesheet that report on active patterns, fired rules, failed asserts and successful asserts.

Documentation of the API can be found in the project's wiki.

Java classes

The SchXslt Maven artifact also provides Java classes implementing Schematron validation.

Ant Task

The Ant task is updated to use version 1.1 of SchXslt.

Fixed bugs

Release of SchXslt, a new XSLT-based Schematron processor v1.0

David Maus, 22.02.2019 · Permalink

I am happy to announce that I released version 1.0 of SchXslt.

SchXslt is a conforming open-source Schematron processor implemented entirely in XSLT. It operates as a three-stage transformation process that translates a Schematron schema into an XSLT validation stylesheet. This stylesheet outputs a validation report in the Schematron Validation Report Language (SVRL) when applied to an instance document.

SchXslt utilizes features of XSLT 2.0 to improve the validation and is well tested against the ISO specification.

SchXslt is also available as a Maven artifact to ease integration into Java-based applications (name.dmaus.schxslt.schxslt) and as an Ant task to perform Schematron validation with SchXslt.

Both SchXslt and SchXslt Ant are released under the terms of the MIT license.