Analyzing Verbal Morphology: Part 1

In anticipation of my SBL talk tomorrow, here's an update on my verbal analysis.

As I mentioned in Analyzing Nominal Morphology: Part 1, I started off with nominal morphology, but the last couple of years have been more focused on the verb (until a couple of months ago, when I switched back to the noun).

My current modeling approach is actually my third attempt at verbs. Perhaps in a later post I'll describe the earlier approaches and why I backed out and started from scratch twice. I'm happy with the path I'm following now, though.

Unlike the approach I took later with nouns, my verb analysis didn't focus on theme/distinguisher but on stem/suffix with sandhi rules. One reason for this is that one of my immediate goals was stem generation.

Prior to running on all the MorphGNT verbs, I started with Helma Dik's Nifty Greek Handouts and the verb paradigms in Louise Pratt's The Essentials of Greek Grammar. Coverage is now those plus all the MorphGNT verbs except for imperatives, subjunctives and optatives.

The code and data are currently available at https://github.com/jtauber/greek-inflection, although I may soon move at least the GNT-specific data to the morphological-lexicon repo.

The basic approach is to have an "endings" database and a "stems" database. The "endings" database looks like:

PAI.1S:
    - "|>ω<ω|"
    - "|ε>ῶ<ω|"
    - "|ο>ῶ<ω|"
    - "|α>ῶ<ω|"
    - "|ο!>ω<_1|μι"
    - "|ε!>η<_1|μι"
    - "|υ!>υ<_1|μι"
    - "|α!>η<_1|μι"
    - "|ει!>ει<_1|μι"

AAI.1S:
    - "|><|α"
    - "|%>ο<T_1|ν"
    - "|α^>η<_1|ν"
    - "|ε^>η<_1|ν"
    - "|ο^>ω<_1|ν"
    - "|α!>η<_1|ν"

Each entry expresses both an ending and the sandhi that applies where it meets the stem. You can see various stem diacritics, like ! for athematic, ^ for root aorists and % for second aorists. T_1 represents a thematic vowel and _1 a particular ablaut pattern.
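To make the notation concrete, here is a minimal Python sketch of one plausible reading of an entry such as `|ε>ῶ<ω|`. It assumes every pattern has the shape `A|B>C<D|E`: `B` is what the stem must end in, `D` is the underlying start of the ending, `C` is what the two become after sandhi, and `E` is the rest of the ending. `A` is empty in all the examples above, so the sketch only checks it. The names `EndingRule`, `parse_rule` and `apply_rule` are hypothetical and not taken from the greek-inflection code, and accentuation is not handled.

```python
import re
from typing import NamedTuple, Optional


class EndingRule(NamedTuple):
    before: str      # A: empty in every example above (assumed: unchanged context)
    stem_tail: str   # B: what the stem must end in (may carry !, ^ or %)
    surface: str     # C: what B + D turn into after sandhi
    underlying: str  # D: underlying start of the ending (e.g. ω, T_1, _1)
    rest: str        # E: remainder of the ending, appended as-is


RULE_RE = re.compile(r"([^|<>]*)\|([^|<>]*)>([^|<>]*)<([^|<>]*)\|([^|<>]*)")


def parse_rule(text: str) -> EndingRule:
    """Split an 'A|B>C<D|E' pattern into its five fields."""
    match = RULE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an ending pattern: {text!r}")
    return EndingRule(*match.groups())


def apply_rule(stem: str, rule: EndingRule) -> Optional[str]:
    """Return the (unaccented) form, or None if the stem does not fit the rule."""
    if not stem.endswith(rule.before + rule.stem_tail):
        return None
    kept = stem[: len(stem) - len(rule.stem_tail)]  # drop B, keep the rest
    return kept + rule.surface + rule.rest


print(apply_rule("ἀγαπα", parse_rule("|α>ῶ<ω|")))       # ἀγαπῶ
print(apply_rule("ἀνελαβ%", parse_rule("|%>ο<T_1|ν")))  # ἀνελαβον
print(apply_rule("ἀγαπα", parse_rule("|ε>ῶ<ω|")))       # None
```

Note that a pattern with an empty `B`, such as `|>ω<ω|`, fits any stem under this reading, so a real implementation needs some way to prefer the more specific rule.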

Alongside this is a larger stem database:

ἀγαπάω:
    stems:
        1-: ἀγαπα
        1+: ἠγαπα
        2-: ἀγαπησ
        3-: ἀγαπησ
        3+: ἠγαπησ
        4-: ἠγαπηκ
        5-: ἠγαπη
        7-: ἀγαπηθησ
ἀναλαμβάνω:
    compound: ἀνά++λαμβάνω
    stems:
        1-: ἀναλαμβαν
        3-: ἀναλαβ%
        3+: ἀνελαβ%
        6-: ἀναλημφθ
        6+: ἀνελημφθ

Stems are keyed by a principal-part-like scheme in which - / + refers to unaugmented and augmented, respectively. The 7- stem is the future passive.
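As an illustration of the keying scheme, here is a small Python sketch that treats an excerpt of the stem database as a plain dict. It reads `+` as augmented, which is what the sample data shows (ἠγαπα is listed under `1+`), and it assumes `++` in the `compound` field separates the preverb from the base verb. The helper names `StemKey`, `parse_stem_key`, `lookup_stem` and `split_compound` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

# Toy excerpt of the stem database above, written as a plain dict.
STEMS = {
    "ἀναλαμβάνω": {
        "compound": "ἀνά++λαμβάνω",
        "stems": {
            "1-": "ἀναλαμβαν",
            "3-": "ἀναλαβ%",
            "3+": "ἀνελαβ%",
            "6-": "ἀναλημφθ",
            "6+": "ἀνελημφθ",
        },
    },
}


class StemKey(NamedTuple):
    part: int        # principal-part-like number
    augmented: bool  # True for "+", False for "-"


def parse_stem_key(key: str) -> StemKey:
    """Turn a key such as '3+' into (3, True)."""
    if len(key) < 2 or key[-1] not in "+-" or not key[:-1].isdigit():
        raise ValueError(f"not a stem key: {key!r}")
    return StemKey(int(key[:-1]), key[-1] == "+")


def lookup_stem(lemma: str, part: int, augmented: bool) -> Optional[str]:
    """Return the stem stored under e.g. '3+', or None if it is not listed."""
    key = f"{part}{'+' if augmented else '-'}"
    return STEMS[lemma]["stems"].get(key)


def split_compound(lemma: str) -> Optional[Tuple[str, str]]:
    """Split the compound field into (preverb, base verb), assuming '++' separates them."""
    compound = STEMS[lemma].get("compound")
    if compound is None:
        return None
    preverb, base = compound.split("++", 1)
    return preverb, base


print(parse_stem_key("3+"))                 # StemKey(part=3, augmented=True)
print(lookup_stem("ἀναλαμβάνω", 3, True))   # ἀνελαβ%
print(split_compound("ἀναλαμβάνω"))         # ('ἀνά', 'λαμβάνω')
```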

The stem database can also do overrides for individual paradigm cells, show preverbs, mark enclitics and more.

All this gets tested against the Dik and Pratt examples and the verb forms in the MorphGNT in two ways:

- given a lemma and features, is the correct form generated?
- given a form, lemma and features, is the correct stem identified?
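The two checks can be sketched in Python with toy data. This is only an illustration under several assumptions, not the project's actual test code: endings are reduced to (stem tail, surface ending) pairs, the rule with the longest matching stem tail wins, the table mapping `PAI` to stem `1-` and `AAI` to stem `3+` is assumed, and forms are compared with accents stripped because the sketch does not place accents. All function names are hypothetical.

```python
import unicodedata

ACCENTS = {"\u0300", "\u0301", "\u0342"}  # grave, acute, perispomeni


def strip_accents(text: str) -> str:
    """Remove accents but keep breathings, so forms can be compared loosely."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in ACCENTS)
    return unicodedata.normalize("NFC", kept)


STEMS = {
    "ἀγαπάω": {"1-": "ἀγαπα", "3+": "ἠγαπησ"},
    "ἀναλαμβάνω": {"1-": "ἀναλαμβαν", "3+": "ἀνελαβ%"},
}
# Simplified endings: (stem tail to replace, surface ending).
ENDINGS = {
    "PAI.1S": [("", "ω"), ("α", "ῶ")],
    "AAI.1S": [("", "α"), ("%", "ον")],
}
STEM_KEY = {"PAI": "1-", "AAI": "3+"}  # assumed mapping, for illustration only


def attach(stem: str, parse: str) -> str:
    """Attach the ending whose stem tail is the longest match."""
    fitting = [rule for rule in ENDINGS[parse] if stem.endswith(rule[0])]
    tail, ending = max(fitting, key=lambda rule: len(rule[0]))
    return stem[: len(stem) - len(tail)] + ending


def generate(lemma: str, parse: str) -> str:
    """Check 1: lemma + features -> form."""
    stem = STEMS[lemma][STEM_KEY[parse.split(".")[0]]]
    return attach(stem, parse)


def identify_stem(form: str, lemma: str, parse: str):
    """Check 2: form + lemma + features -> key of the stem that produces it."""
    for key, stem in STEMS[lemma].items():
        if strip_accents(attach(stem, parse)) == strip_accents(form):
            return key
    return None


# (lemma, features, attested form, expected stem key)
CASES = [
    ("ἀγαπάω", "PAI.1S", "ἀγαπῶ", "1-"),
    ("ἀγαπάω", "AAI.1S", "ἠγάπησα", "3+"),
    ("ἀναλαμβάνω", "AAI.1S", "ἀνέλαβον", "3+"),
]


def run_checks(cases):
    """Return a list of (case, which check failed); empty means all passed."""
    failures = []
    for lemma, parse, form, stem_key in cases:
        if strip_accents(generate(lemma, parse)) != strip_accents(form):
            failures.append(((lemma, parse), "generation"))
        if identify_stem(form, lemma, parse) != stem_key:
            failures.append(((lemma, parse), "stem"))
    return failures


print(run_checks(CASES))  # [] when both checks pass for every case
```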

Once the imperatives, subjunctives and optatives are done, I'll work on stem relationships, essentially treating the stems as another paradigm. I may also at some point generate distinguishers for each verb form (within a particular aspect/tense-voice form).

Further work will involve using it to actually analyze new texts, particularly handling the case where the stem is not yet in the stem database.