In the last couple of weeks I’ve made a couple of minor releases of the
greek-normalisation Python library, which brings together various code I use to clean up Greek texts and normalise the forms.
The 0.2 release (which I neglected to announce) just had a small fix to the
breathing_check function to support things like ἀϊ (which failed before because it didn’t take into account the diaeresis). Soon I’ll blog about a new Python tool I’ve been building which will provide a framework for doing lots of checks like this.
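As a rough illustration of why the diaeresis matters here, the sketch below decomposes ἀϊ with the standard library and lists which combining marks sit on which letter. It is not the library's breathing_check; the helper name marks_by_letter is hypothetical and exists only for this example.

```python
import unicodedata


def marks_by_letter(word):
    """Pair each base letter with the names of the combining marks on it.

    Hypothetical helper for illustration only; not part of greek-normalisation.
    """
    result = []
    # NFD splits precomposed characters into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and result:
            # Non-zero combining class: attach the mark to the preceding letter.
            result[-1][1].append(unicodedata.name(ch))
        else:
            result.append((ch, []))
    return result


# U+1F00 (alpha with psili) followed by U+03CA (iota with dialytika), i.e. ἀϊ
for letter, marks in marks_by_letter("\u1f00\u03ca"):
    print(letter, marks)
```

The smooth breathing (COMBINING COMMA ABOVE) ends up on the alpha and the diaeresis (COMBINING DIAERESIS) on the iota, so a check that only expects a breathing mark somewhere on an initial vowel pair also has to allow for the diaeresis on the second vowel.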
The 0.3 release now installs two command-line scripts, toNFC and toNFD, to convert a file to NFC or NFD Unicode Normalization Form respectively.
Once installed you can do things like:
toNFC source.txt > nfc_version.txt
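For readers who want the same conversion from inside Python, here is a minimal sketch using only the standard library's unicodedata module. It is an assumption about the general technique, not the actual source of the toNFC and toNFD scripts; the function name convert is made up for this example.

```python
import unicodedata


def convert(text, form="NFC"):
    """Return text in the requested Unicode normalization form ("NFC" or "NFD").

    Hypothetical helper; the real scripts may be implemented differently.
    """
    return unicodedata.normalize(form, text)


# U+1F00 is the precomposed GREEK SMALL LETTER ALPHA WITH PSILI.
composed = "\u1f00"
decomposed = convert(composed, "NFD")  # alpha (U+03B1) + combining comma above (U+0313)

print(len(composed), len(decomposed))
print(convert(decomposed, "NFC") == composed)
```

The two forms render identically but compare as different strings, which is why settling on one form before comparing or checking Greek text is useful.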
The repository is https://github.com/jtauber/greek-normalisation and it’s pip-installable as
See my previous post The Normalisation Column in MorphGNT for the original work this code came from.