Analyzing Nominal Morphology: Part 2

In Analyzing Nominal Morphology: Part 1, I talked about putting together a list of nominal distinguishers and verifying it on the MorphGNT, generating a per-lexeme theme + distinguisher analysis. Here, I'll outline some further steps I've taken.

As well as producing a YAML file with entries for each lexeme, I also now generate a (space-delimited) tabular form that looks like this:

ἀβαρής a-4a -- M n-3d(2aA) ἀβαρ AS ἀβαρῆ ἀβαρ ῆ εσ+α
ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NS ἀγαθοποιῶν ἀγαθοποι ῶν ουντ+
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NP ἀγαθοποιοῦντες ἀγαθοποι οῦντες ουντ+ες
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι AP ἀγαθοποιοῦντας ἀγαθοποι οῦντας ουντ+ας
ἀγαθοποιέω verb PA F n-1c ἀγαθοποιουσ NP ἀγαθοποιοῦσαι ἀγαθοποιοῦσ αι α+ι
ἀγαθοποιΐα n-1a -- F n-1a ἀγαθοποιϊ DS ἀγαθοποιΐᾳ ἀγαθοποιΐ ᾳ α+ι
ἀγαθοποιός a-3a -- M n-2a ἀγαθοποι GP ἀγαθοποιῶν ἀγαθοποι ῶν +ων
ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς

The columns are:

lemma
Mounce category (or verb for participles) for overall lexeme
aspect / voice (for participles)
gender
Mounce category used for the particular sub-paradigm (different from overall lexeme for adjectives or participles)
lexeme-level theme
case / number
form
form-specific theme
form-specific distinguisher
stem ending and suffix
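
To make the column layout concrete, here is a minimal Python sketch that reads one line of the tabular form into a named record. The field names (`lexeme_category`, `paradigm_category`, and so on) are my own labels for the columns listed above, not names used in the repository, and the sketch assumes every row has exactly eleven whitespace-separated fields, which holds for the sample rows shown here.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Field names are illustrative labels for the eleven columns above.
    lemma: str
    lexeme_category: str    # Mounce category, or "verb" for participles
    aspect_voice: str       # "--" when not a participle
    gender: str
    paradigm_category: str  # Mounce category of the sub-paradigm
    lexeme_theme: str
    case_number: str
    form: str
    form_theme: str
    distinguisher: str
    stem_suffix: str        # stem ending and suffix, joined by "+"


def parse_row(line: str) -> Row:
    """Split one space-delimited line into a Row, rejecting malformed lines."""
    fields = line.split()
    if len(fields) != len(Row._fields):
        raise ValueError(f"expected {len(Row._fields)} fields, got {len(fields)}")
    return Row(*fields)


EXAMPLE_LINE = "ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς"
row = parse_row(EXAMPLE_LINE)
print(row.lemma, row.case_number, row.distinguisher, row.stem_suffix)
```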

What's helpful about this format is that you can use awk, grep, sort, wc and other Unix tools to extract information very quickly. (I may soon put it in SQL and expose a web interface too.) So you can see all the times a particular distinguisher is used, all the times it's used for a particular case / number, or what all the sandhi rules are.
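
The same kind of query is easy to sketch in Python. The following is a rough equivalent of an awk / sort / uniq pipeline that counts how often each distinguisher occurs with each case / number. The column positions follow the list above, the four sample rows are taken from the excerpt, and the NFC normalization is my own precaution so that visually identical Greek strings compare equal. The function names are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict

# Four rows copied from the excerpt above.
SAMPLE = """\
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NS ἀγαθοποιῶν ἀγαθοποι ῶν ουντ+
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NP ἀγαθοποιοῦντες ἀγαθοποι οῦντες ουντ+ες
ἀγαθοποιός a-3a -- M n-2a ἀγαθοποι GP ἀγαθοποιῶν ἀγαθοποι ῶν +ων
ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς
"""


def distinguisher_counts(lines):
    """Count (distinguisher, case/number) pairs across the given rows."""
    counts = Counter()
    for line in lines:
        fields = unicodedata.normalize("NFC", line).split()
        if len(fields) != 11:  # skip blank or malformed lines
            continue
        case_number, distinguisher = fields[6], fields[9]
        counts[(distinguisher, case_number)] += 1
    return counts


def case_numbers_by_distinguisher(counts):
    """Map each distinguisher to the sorted case/number tags it appears with."""
    grouped = defaultdict(set)
    for distinguisher, case_number in counts:
        grouped[distinguisher].add(case_number)
    return {d: sorted(tags) for d, tags in grouped.items()}


counts = distinguisher_counts(SAMPLE.splitlines())
for (distinguisher, case_number), n in sorted(counts.items()):
    print(distinguisher, case_number, n)
```

Even on these four rows, the distinguisher ῶν shows up under both NS (the participle) and GP (the adjective), which is exactly the sort of overlap this format makes easy to find.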

I've already written a Python script that generates a list of paradigms based on this (keyed off Mounce category for now, until I've finalized my own, which will actually be defined by these paradigms).

The paradigms look like:

n-3b(1) M (10):
    NS:   ξ          {κ+ς}
    GS:   κος        {κ+ος}
    DS:   κι         {κ+ι}
    AS:   κα         {κ+α}
    NP:   κες        {κ+ες}
    GP:   κων        {κ+ων}
    AP:   κας        {κ+ας}
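
As a rough illustration of how paradigms like this could be derived from the tabular form, here is a minimal sketch. It is not the actual script in the repository. It groups rows by sub-paradigm category and gender, then lists the distinguisher and stem ending + suffix seen for each case / number. I am assuming the number in parentheses is a count of lexemes, and that a fixed case / number ordering (including vocative tags) is wanted; the sketch also leaves accents on the distinguishers untouched, which may differ from the real output.

```python
from collections import defaultdict

# Assumed display order; tags not listed here sort last.
CASE_ORDER = ["NS", "GS", "DS", "AS", "VS", "NP", "GP", "DP", "AP", "VP"]

# Three rows copied from the excerpt above.
SAMPLE = """\
ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς
"""


def build_paradigms(lines):
    """Group rows by (sub-paradigm category, gender)."""
    cells = defaultdict(lambda: defaultdict(set))
    lemmas = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 11:  # skip blank or malformed lines
            continue
        key = (fields[4], fields[3])  # (category, gender)
        cells[key][fields[6]].add((fields[9], fields[10]))
        lemmas[key].add(fields[0])
    return cells, lemmas


def order_of(case_number):
    """Position of a tag in CASE_ORDER, with unknown tags last."""
    if case_number in CASE_ORDER:
        return CASE_ORDER.index(case_number)
    return len(CASE_ORDER)


def format_paradigm(key, paradigm_cells, lemma_count):
    """Render one paradigm in a layout similar to the example above."""
    category, gender = key
    out = [f"{category} {gender} ({lemma_count}):"]
    for case_number in sorted(paradigm_cells, key=order_of):
        for distinguisher, stem_suffix in sorted(paradigm_cells[case_number]):
            out.append(f"    {case_number}:   {distinguisher:<10} {{{stem_suffix}}}")
    return "\n".join(out)


cells, lemmas = build_paradigms(SAMPLE.splitlines())
for key in sorted(cells):
    print(format_paradigm(key, cells[key], len(lemmas[key])))
    print()
```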

There's actually a feedback loop where inconsistencies and errors spotted in this paradigm output inform corrections to the underlying distinguisher rules.
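
One way such inconsistencies might surface mechanically, sketched under my own assumptions rather than taken from the repository, is to flag any paradigm cell (category, gender, case / number) that has more than one stem ending + suffix analysis. A flagged cell is not necessarily an error, since genuine variants exist, but it is a natural place to look. The second row below is hypothetical and deliberately inconsistent, added only to trigger the check.

```python
from collections import defaultdict

ROWS = [
    # Copied from the excerpt above.
    "ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο",
    # Hypothetical row with a different analysis of the same cell.
    "ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ο",
    # Copied from the excerpt above.
    "ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν",
]


def ambiguous_cells(lines):
    """Return cells that have more than one stem ending + suffix analysis."""
    seen = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 11:  # skip blank or malformed lines
            continue
        key = (fields[4], fields[3], fields[6])  # (category, gender, case/number)
        seen[key].add(fields[10])
    return {key: sorted(v) for key, v in seen.items() if len(v) > 1}


flagged = ambiguous_cells(ROWS)
for (category, gender, case_number), analyses in sorted(flagged.items()):
    print(category, gender, case_number, analyses)
```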

The code and data are available at https://github.com/morphgnt/morphological-lexicon/tree/master/projects/nominal_distinguishers.