In the last couple of weeks I’ve done two minor releases of the greek-normalisation
Python library, which brings together various code I use to clean up Greek texts and normalise the forms.
The 0.2 release (which I neglected to announce) just had a small fix to the breathing_check
function to support things like ἀϊ (which failed before because it didn’t take into account the diaeresis). Soon I’ll blog about a new Python tool I’ve been building which will provide a framework for doing lots of checks like this.
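To make the ἀϊ case concrete, here is a small sketch that lists the combining marks on each letter using only the standard library. It is not the library's breathing_check, and the helper name combining_marks is hypothetical. It only illustrates that the iota carries a diaeresis, which (as I understand it) is why the two vowels are not read as a diphthong and the breathing sits on the first one.

```python
import unicodedata


def combining_marks(text):
    """Return a list of (base_char, [mark names]) pairs for text.

    Hypothetical helper for illustration only; not part of
    greek-normalisation.
    """
    result = []
    # NFD splits precomposed characters into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and result:
            # Attach the mark to the most recent base character.
            result[-1][1].append(unicodedata.name(ch))
        else:
            result.append((ch, []))
    return result


for base, marks in combining_marks("\u1f00\u03ca"):  # ἀϊ
    print(base, marks)
```

The smooth breathing shows up as COMBINING COMMA ABOVE on the alpha, and the diaeresis as COMBINING DIAERESIS on the iota.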
The 0.3 release now installs two command-line scripts, toNFC and toNFD, to convert a file to either the NFC or NFD Unicode Normalization Form.
Once installed you can do things like:
toNFC source.txt > nfc_version.txt
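For anyone wanting the same conversion from inside Python, the sketch below does it with the standard library's unicodedata.normalize. This is an assumption about what such a conversion amounts to, not a copy of the scripts' actual source, and the function names normalise_text and normalise_file are made up for the example.

```python
import unicodedata


def normalise_text(text, form="NFC"):
    """Return text converted to the given Unicode normalization form."""
    return unicodedata.normalize(form, text)


def normalise_file(path, form="NFC"):
    """Read a UTF-8 file and return its contents in the given form.

    Hypothetical helper; assumes the input file is UTF-8 encoded.
    """
    with open(path, encoding="utf-8") as f:
        return normalise_text(f.read(), form)


# alpha + combining acute composes to a single code point under NFC
composed = normalise_text("\u03b1\u0301", "NFC")
# precomposed alpha with smooth breathing decomposes under NFD
decomposed = normalise_text("\u1f00", "NFD")
print(len(composed), len(decomposed))
```

NFC is usually the more convenient form for storage and display, while NFD makes it easier to inspect or strip individual diacritics.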
The repository is https://github.com/jtauber/greek-normalisation and it’s pip-installable as greek-normalisation.
See my previous post The Normalisation Column in MorphGNT for the original work this code came from.