In Analyzing Nominal Morphology: Part 1, I talked about putting together a list of nominal distinguishers and verifying it on the MorphGNT, generating a per-lexeme theme + distinguisher analysis. Here, I’ll outline some further steps I’ve taken.
As well as producing a YAML file with entries for each lexeme, I also now generate a (space-delimited) tabular form that looks like this:
ἀβαρής a-4a -- M n-3d(2aA) ἀβαρ AS ἀβαρῆ ἀβαρ ῆ εσ+α
ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NS ἀγαθοποιῶν ἀγαθοποι ῶν ουντ+
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NP ἀγαθοποιοῦντες ἀγαθοποι οῦντες ουντ+ες
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι AP ἀγαθοποιοῦντας ἀγαθοποι οῦντας ουντ+ας
ἀγαθοποιέω verb PA F n-1c ἀγαθοποιουσ NP ἀγαθοποιοῦσαι ἀγαθοποιοῦσ αι α+ι
ἀγαθοποιΐα n-1a -- F n-1a ἀγαθοποιϊ DS ἀγαθοποιΐᾳ ἀγαθοποιΐ ᾳ α+ι
ἀγαθοποιός a-3a -- M n-2a ἀγαθοποι GP ἀγαθοποιῶν ἀγαθοποι ῶν +ων
ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς
The columns are:
- lemma
- Mounce category (or verb for participles) for overall lexeme
- aspect / voice (for participles)
- gender
- Mounce category used for particular sub-paradigm (different from overall lexeme for adjectives or participles)
- lexeme-level theme
- case / number
- form
- form-specific theme
- form-specific distinguisher
- stem ending and suffix
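As a minimal sketch of how one row of this format could be read programmatically, the Python below splits a line on whitespace and labels the eleven columns in the order listed above. The field names (`lexeme_category`, `form_theme`, and so on) and the `parse_row` helper are hypothetical labels chosen for illustration; they are not taken from the project's code.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One line of the tabular output; field names are illustrative only."""
    lemma: str
    lexeme_category: str    # Mounce category, or "verb" for participles
    aspect_voice: str       # "--" when the form is not a participle
    gender: str
    paradigm_category: str  # Mounce category of the particular sub-paradigm
    lexeme_theme: str
    case_number: str
    form: str
    form_theme: str
    distinguisher: str
    stem_and_suffix: str    # stem ending and suffix joined by "+"


def parse_row(line: str) -> Row:
    """Split a space-delimited line into its eleven named columns."""
    fields = line.split()
    if len(fields) != len(Row._fields):
        raise ValueError(f"expected {len(Row._fields)} columns, got {len(fields)}")
    return Row(*fields)


row = parse_row("ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς")
# Either side of the "+" may be empty (as in "+ων" or "ουντ+").
stem_ending, _, suffix = row.stem_and_suffix.partition("+")
print(row.lemma, row.case_number, row.distinguisher, stem_ending, suffix)
```

Splitting on whitespace is enough here because no column in the sample rows contains a space.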
What’s helpful about this format is that you can use awk, grep, sort, wc and other Unix tools to extract information very quickly. (I may soon put it in SQL and expose a web interface too.) So you can see all the times a particular distinguisher is used, or all the times it’s used for a particular case / number, or what all the sandhi rules are.
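The same kinds of query can also be scripted. The following is a rough Python sketch, not the code from the repository: it tallies which case / number values each distinguisher marks, and lists the distinct pairings of stem ending + suffix with surface distinguisher. Treating each such pairing as a "sandhi rule" is an assumption made for this example, and the helper names (`read_rows`, `distinguisher_usage`, `sandhi_rules`) are hypothetical.

```python
from collections import Counter, defaultdict

# A few rows copied from the sample above.
SAMPLE = """\
ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NS ἀγαθοποιῶν ἀγαθοποι ῶν ουντ+
ἀγαθοποιός a-3a -- M n-2a ἀγαθοποι GP ἀγαθοποιῶν ἀγαθοποι ῶν +ων
ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς
"""

CASE_NUMBER, DISTINGUISHER, STEM_AND_SUFFIX = 6, 9, 10  # zero-based columns


def read_rows(text):
    """Return each well-formed line as a list of its 11 columns."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 11:  # silently skip blank or malformed lines
            rows.append(fields)
    return rows


def distinguisher_usage(rows):
    """Map each distinguisher to a Counter of the case/number values it marks."""
    usage = defaultdict(Counter)
    for fields in rows:
        usage[fields[DISTINGUISHER]][fields[CASE_NUMBER]] += 1
    return usage


def sandhi_rules(rows):
    """Distinct (stem ending + suffix, distinguisher) pairs, sorted."""
    return sorted({(f[STEM_AND_SUFFIX], f[DISTINGUISHER]) for f in rows})


rows = read_rows(SAMPLE)
for distinguisher, counts in sorted(distinguisher_usage(rows).items()):
    print(distinguisher, dict(counts))
for underlying, surface in sandhi_rules(rows):
    print(underlying, "->", surface)
```

A `Counter` per distinguisher plays roughly the role that a `sort | uniq -c` pipeline would play on the command line.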
I’ve already written a Python script that generates a list of paradigms based on this (keyed off Mounce category for now, until I’ve finalized my own categories, which will actually be defined by these paradigms).
The paradigms look like:
n-3b(1) M (10):
NS: ξ {κ+ς}
GS: κος {κ+ος}
DS: κι {κ+ι}
AS: κα {κ+α}
NP: κες {κ+ες}
GP: κων {κ+ων}
AP: κας {κ+ας}
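To illustrate the general idea of deriving paradigms from the tabular rows, here is a simplified Python sketch. It is not the actual script: it groups rows by sub-paradigm category and gender, then lists the distinguishers seen for each case / number together with their stem ending + suffix. The function names, the case / number ordering, and the reading of the parenthesised count as the number of distinct lemmas are all assumptions, and the output only loosely mirrors the format shown above (for instance, it prints the distinguisher column unchanged).

```python
from collections import defaultdict

# Display order for case/number cells (an assumption for this sketch).
CASE_NUMBER_ORDER = ["NS", "GS", "DS", "AS", "VS", "NP", "GP", "DP", "AP", "VP"]

SAMPLE = """\
ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς
"""


def build_paradigms(lines):
    """Group rows by (sub-paradigm category, gender).

    Returns two dicts keyed by that pair: the case/number cells, each a set
    of (distinguisher, stem+suffix) tuples, and the set of lemmas seen.
    """
    paradigms = defaultdict(lambda: defaultdict(set))
    lemmas = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 11:  # skip blank or malformed lines
            continue
        (lemma, _, _, gender, category, _,
         case_number, _, _, distinguisher, stem_suffix) = fields
        key = (category, gender)
        paradigms[key][case_number].add((distinguisher, stem_suffix))
        lemmas[key].add(lemma)
    return paradigms, lemmas


def format_paradigm(key, cells, lemma_count):
    """Render one paradigm as text, one line per attested ending."""
    category, gender = key
    out = [f"{category} {gender} ({lemma_count}):"]
    # Known case/number codes first, then any unexpected ones alphabetically.
    ordered = CASE_NUMBER_ORDER + sorted(set(cells) - set(CASE_NUMBER_ORDER))
    for case_number in ordered:
        for distinguisher, stem_suffix in sorted(cells.get(case_number, ())):
            out.append(f"{case_number}: {distinguisher} {{{stem_suffix}}}")
    return "\n".join(out)


paradigms, lemmas = build_paradigms(SAMPLE.splitlines())
for key in sorted(paradigms):
    print(format_paradigm(key, paradigms[key], len(lemmas[key])))
```

Storing each cell as a set means that a cell with more than one entry stands out immediately, which is one way inconsistencies in the underlying rules could surface.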
There’s actually a feedback loop where inconsistencies and errors spotted in this paradigm output inform corrections to the underlying distinguisher rules.
The code and data are available at https://github.com/morphgnt/morphological-lexicon/tree/master/projects/nominal_distinguishers.