For years I’ve had Python code for normalising Greek forms, checking for stray characters, etc. I finally got around to consolidating it into a library.

It has a few little utilities like:

>>> strip_last_accent_if_two('γυναῖκά')

>>> grave_to_acute('τὴν')

>>> breathing_check('ἀι')

but the core of it is the normalisation of tokens with knowledge of clitics and elision.

>>> normalise('τὴν')
('τήν', ['grave'])

>>> normalise('γυναῖκά')
('γυναῖκα', ['extra'])

>>> normalise('σου')
('σου', ['enclitic'])

>>> normalise('Τὴν')
('τήν', ['grave', 'capitalisation'])

>>> normalise('ὁ')
('ὁ', ['proclitic'])

>>> normalise('μετ’')
('μετά', ['elision'])

>>> normalise('οὐκ')
('οὐ', ['movable', 'proclitic'])
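To give a feel for what one of these steps involves, here is a minimal sketch of a grave-to-acute conversion using only the standard library. It is not the library's own code: the name `grave_to_acute_sketch` is hypothetical, and the approach (decompose, swap the combining accent, recompose) is just one plausible way to do it.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"
COMBINING_ACUTE = "\u0301"


def grave_to_acute_sketch(word):
    """Replace every grave accent in `word` with an acute.

    Hypothetical helper for illustration only; the library's
    grave_to_acute may be implemented differently.
    """
    # NFD splits each precomposed letter into a base letter
    # followed by its combining marks (breathing, accent, etc.).
    decomposed = unicodedata.normalize("NFD", word)
    # Swap the combining grave for a combining acute, leaving
    # breathings, iota subscripts and diaereses untouched.
    swapped = decomposed.replace(COMBINING_GRAVE, COMBINING_ACUTE)
    # NFC recomposes the letters into precomposed form.
    return unicodedata.normalize("NFC", swapped)
```

Calling `grave_to_acute_sketch('τὴν')` gives a form with an acute on the eta. Note that NFC recomposition produces the tonos code point (U+03AE) rather than the oxia one (U+1F75), so compare results after normalising both sides the same way.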

See my previous post The Normalisation Column in MorphGNT for the original work this code came from.

There are also some regular expressions that I’ve used to check for mistakes in things like the Open Apostolic Fathers.
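As an illustration of the kind of check this means, the sketch below flags any character in a token that falls outside the Greek Unicode blocks. The pattern and the name `stray_characters` are assumptions made for this example, not the regular expressions that ship with the library.

```python
import re

# Hypothetical pattern: match anything that is NOT in the Greek and
# Coptic block, the Greek Extended block, the combining diacritics
# block, or the right single quotation mark used to mark elision.
STRAY = re.compile(r"[^\u0370-\u03FF\u1F00-\u1FFF\u0300-\u036F\u2019]")


def stray_characters(token):
    """Return a list of the characters in `token` that look out of place."""
    return STRAY.findall(token)
```

A typical catch is a Latin lookalike such as a Latin "o" typed in place of an omicron: the token looks right on screen, but `stray_characters` returns the offending letter. The ranges are deliberately loose (they also admit Coptic letters and unassigned code points), so a real check would probably be stricter.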

It’s just an initial 0.1 release but parts of the code have already been in use for years.

The repository is and it’s pip-installable as greek-normalisation.