For years I’ve had Python code for normalising Greek forms, checking for stray characters, etc. I finally got around to consolidating it in a library.
It has a few little utilities like:
>>> strip_last_accent_if_two('γυναῖκά')
'γυναῖκα'
>>> grave_to_acute('τὴν')
'τήν'
>>> breathing_check('ἀι')
False
but the core of it is the normalisation of tokens with knowledge of clitics and elision.
>>> normalise('τὴν')
('τήν', ['grave'])
>>> normalise('γυναῖκά')
('γυναῖκα', ['extra'])
>>> normalise('σου')
('σου', ['enclitic'])
>>> normalise('Τὴν')
('τήν', ['grave', 'capitalisation'])
>>> normalise('ὁ')
('ὁ', ['proclitic'])
>>> normalise('μετ’')
('μετά', ['elision'])
>>> normalise('οὐκ')
('οὐ', ['movable', 'proclitic'])
See my previous post The Normalisation Column in MorphGNT for the original work this code came from.
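To give a flavour of how small utilities like grave_to_acute and strip_last_accent_if_two can work, here is a minimal sketch using Python's standard unicodedata module. It is an illustration under stated assumptions, not necessarily how the library does it: the function names with the _sketch suffix and the ACCENTS set are hypothetical, and the approach (decompose to NFD, edit the combining marks, recompose to NFC) is just one reasonable way to get the same results as the examples above.

```python
import unicodedata

GRAVE = "\u0300"       # combining grave accent (varia)
ACUTE = "\u0301"       # combining acute accent (oxia / tonos)
CIRCUMFLEX = "\u0342"  # combining Greek perispomeni
ACCENTS = {GRAVE, ACUTE, CIRCUMFLEX}  # assumption: breathings are not accents


def grave_to_acute_sketch(word):
    """Replace every grave accent with an acute (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(GRAVE, ACUTE))


def strip_last_accent_if_two_sketch(word):
    """Drop the second accent if the word carries exactly two (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", word)
    # Positions of the combining accent marks in the decomposed string.
    positions = [i for i, ch in enumerate(decomposed) if ch in ACCENTS]
    if len(positions) == 2:
        last = positions[1]
        decomposed = decomposed[:last] + decomposed[last + 1:]
    return unicodedata.normalize("NFC", decomposed)
```

Note that both functions return NFC-composed text even when nothing changes, so an accented vowel that came in as a Greek Extended oxia code point goes out as the equivalent tonos code point. Whether that is desirable depends on the conventions of the text being processed.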
There are also some regular expressions that I’ve used to check for mistakes in things like the Open Apostolic Fathers.
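As a rough idea of what a regex-based check of this kind can look like, here is a small sketch that flags characters falling outside the ranges a polytonic Greek token would normally use. It is not one of the library's own regular expressions: the ALLOWED ranges, the STRAY pattern and the find_stray_characters function are all assumptions made up for this example.

```python
import re
import unicodedata

# Assumption: a "clean" token only uses characters from these ranges.
ALLOWED = (
    "\u0300-\u036f"  # combining diacritical marks
    "\u0370-\u03ff"  # Greek and Coptic block
    "\u1f00-\u1fff"  # Greek Extended block
    "\u2019"         # right single quotation mark, used to mark elision
)
STRAY = re.compile(f"[^{ALLOWED}]")


def find_stray_characters(token):
    """Return (index, character, Unicode name) for each unexpected character."""
    return [
        (m.start(), m.group(), unicodedata.name(m.group(), "UNKNOWN"))
        for m in STRAY.finditer(token)
    ]
```

A check like this catches things such as a Latin o typed in place of an omicron, which is invisible to the eye but breaks string comparison. The allowed ranges are deliberately broad (the Greek and Coptic block includes Coptic letters, for instance), so a real check would probably want to be stricter.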
It’s just an initial 0.1 release but parts of the code have already been in use for years.
The repository is https://github.com/jtauber/greek-normalisation and it’s pip-installable as