In the last couple of weeks I’ve made two minor releases of the greek-normalisation Python library, which brings together various code I use to clean up Greek texts and normalise their forms.

The 0.2 release (which I neglected to announce) just had a small fix to the breathing_check function to support things like ἀϊ (which failed before because it didn’t take into account the diaeresis). Soon I’ll blog about a new Python tool I’ve been building which will provide a framework for doing lots of checks like this.
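To illustrate the kind of sequence involved, here is a minimal sketch that lists the combining marks on each letter of ἀϊ. It uses only the standard library's unicodedata module and is not the library's breathing_check; the helper name combining_marks is hypothetical.

```python
import unicodedata


def combining_marks(word):
    """Return (base letter, [names of its combining marks]) pairs.

    Hypothetical helper for illustration only; it is not part of
    greek-normalisation.
    """
    result = []
    # NFD splits each precomposed letter into a base letter plus marks.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # A non-zero combining class means this is a mark on the
            # preceding base letter.
            if result:
                result[-1][1].append(unicodedata.name(ch))
        else:
            result.append((ch, []))
    return result


for base, marks in combining_marks("\u1f00\u03ca"):  # ἀϊ
    print(base, marks)
```

The first vowel carries the breathing (COMBINING COMMA ABOVE) and the second carries the diaeresis (COMBINING DIAERESIS), so a check that looks only at breathing marks has to account for the second mark as well.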

The 0.3 release now installs two command-line scripts, toNFC and toNFD, which convert a file to the NFC or NFD Unicode Normalization Form respectively.

Once installed you can do things like:

toNFC source.txt > nfc_version.txt
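For anyone who wants the same conversion from within Python, here is a rough sketch using the standard library's unicodedata.normalize. It is an assumption that the scripts do something equivalent; the function names normalise_text and normalise_file are hypothetical and not part of the library.

```python
import unicodedata


def normalise_text(text, form="NFC"):
    """Return text in the given Unicode Normalization Form (NFC or NFD)."""
    return unicodedata.normalize(form, text)


def normalise_file(path, form="NFC"):
    """Read a UTF-8 file and return its contents in the given form.

    Hypothetical helper; assumes the input file is UTF-8 encoded.
    """
    with open(path, encoding="utf-8") as f:
        return normalise_text(f.read(), form)


# Example use (the file name is illustrative):
# print(normalise_file("source.txt", "NFC"), end="")
```

NFC composes a base letter and its combining marks into a single precomposed character where one exists, while NFD splits precomposed characters back into base letter plus marks.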

The repository is https://github.com/jtauber/greek-normalisation and it’s pip-installable as greek-normalisation.

See my previous post The Normalisation Column in MorphGNT for the original work this code came from.