I’ve released a first version of a pluggable command-line tool for validating the formatting and orthography of text files.

Various text projects like the apostolic-fathers have sometimes included little custom scripts I’ve written to validate the files. Is the Unicode normalised? Are there stray characters or bad line endings? Are references in a valid format?
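To give a flavour of what such a script checks, here is a minimal sketch. It is not code from any of those projects: the function name `find_problems`, the choice of NFC as the default form, and the message wording are all assumptions made for illustration.

```python
import unicodedata


def find_problems(text, form="NFC"):
    """Return (line_number, message) pairs for some common text problems.

    Hypothetical helper for illustration only; not part of any project.
    """
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        # A line is normalised if normalising it again changes nothing.
        if unicodedata.normalize(form, line) != line:
            problems.append((number, f"not {form} normalised"))
        # A carriage return left on the line means a CRLF or stray CR.
        if "\r" in line:
            problems.append((number, "carriage return found"))
        # Flag control characters other than tab and carriage return.
        for ch in line:
            if unicodedata.category(ch) == "Cc" and ch not in "\t\r":
                problems.append(
                    (number, f"stray control character U+{ord(ch):04X}")
                )
    return problems
```

Each problem is reported with its line number so it can be located in the file.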

I had also started including some Greek-specific tests in the greek-normalisation library.

But on starting the greek-texts project, I decided it would be nice to have a generic framework for writing text file validators that could be used for all sorts of projects and files.

The result is text-validator. Think of it like a code linter but for your text files.

Each validator is its own Python module and, while a few basic tests are included in the library, the idea is that third parties can write their own validators and make them installable Python packages for others to use.

You install text-validator with

pip install text-validator

as well as installing any third-party plugins you want to use.

You then configure your validator plugins with a TOML file like:

["text_validator.plugins.whitespace"]
CHECK_CRLF = true
CHECK_TABS = true
CHECK_TRAILING_WHITESPACE = true
CHECK_NO_EOF_NEWLINE = true

and run the command validate-text to run your suite of configured plugins on the files in your text project.
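To make the four options above concrete, here is a rough sketch of the kind of checks they suggest. This is a guess based only on the option names, not the actual code of the whitespace plugin; the function `check_whitespace` and its messages are hypothetical, and the configuration is passed as a plain dict rather than read from the TOML file.

```python
def check_whitespace(text, config):
    """Return messages for the whitespace problems enabled in config.

    Hypothetical illustration; the option names mirror the TOML example,
    but the real plugin's behaviour may differ.
    """
    messages = []
    for number, line in enumerate(text.split("\n"), start=1):
        if config.get("CHECK_CRLF") and line.endswith("\r"):
            messages.append(f"line {number}: CRLF line ending")
        if config.get("CHECK_TABS") and "\t" in line:
            messages.append(f"line {number}: tab character")
        # Ignore a trailing carriage return so CRLF is not reported twice.
        stripped = line.rstrip("\r")
        if (
            config.get("CHECK_TRAILING_WHITESPACE")
            and stripped != stripped.rstrip()
        ):
            messages.append(f"line {number}: trailing whitespace")
    # An empty file is not reported as missing its final newline.
    if config.get("CHECK_NO_EOF_NEWLINE") and text and not text.endswith("\n"):
        messages.append("no newline at end of file")
    return messages
```

If you try this on a real file, open it with `open(path, newline="")` so that Python does not silently translate CRLF line endings before the check sees them.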

The GitHub repo is https://github.com/jtauber/text-validator and there you can also read more about How to Write a Plugin and look at the existing plugins in the Plugin Directory.

Create issues in the GitHub repository if there are particular validators you’d like to see or would like to contribute.

I haven’t tried it yet, but I’d like to hook text-validator up as a test that runs on commits and pull requests on GitHub as part of a CI process.