Back in July and August 2014, I started looking at patterns in the full citation forms of nouns in Danker’s Concise Lexicon. My goal was partly to explore, in a systematic way, the relationship between inflectional classes and the information expressed in the common pattern of {nominative form}, {genitive ending}, {article}. I also wanted to put together a kind of automated test to catch typos and inconsistencies in the lexicon.

I started drafting a paper with my findings as I went along, and I intend to get back to it at some point. In the meantime, I wanted to mention this little project here, point to the code, and share a couple of things that have come out of it so far.

The code is available at https://github.com/morphgnt/morphological-lexicon/tree/master/projects/citation_forms.

In particular, the file citation_form_data.py contains the rules (still needing some work outside the basic {nominative form}, {genitive ending}, {article} pattern) for what a full citation form can look like.

Each row in this file is a tuple of:

  • a tuple of regexes matching the full citation form, Mounce’s category and Dobson’s part-of-speech/gender (the last mostly to catch errors in that file)
  • a tentative new label for the inflectional class
  • a (potentially empty) list of child rules

For example:

((r"α, ας, ἡ$", r"^n-1a$", r"^N:F$"), "1.1/a1/F", []),
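To make the shape of a rule concrete, here is a minimal sketch of how a single row might be applied to one lexicon entry. The function name `rule_matches` and the use of `re.search` are assumptions for illustration and are not taken from the repository. The Mounce and Dobson values for the two sample nouns are typed in by hand.

```python
import re

# One rule in the shape described above:
# ((citation regex, Mounce regex, Dobson regex), class label, child rules)
RULE = ((r"α, ας, ἡ$", r"^n-1a$", r"^N:F$"), "1.1/a1/F", [])


def rule_matches(rule, citation, mounce, dobson):
    """Return True only if all three regexes match their respective field.

    Hypothetical helper: the real project may apply its rules differently.
    """
    (citation_re, mounce_re, dobson_re), _label, _children = rule
    pairs = (
        (citation_re, citation),
        (mounce_re, mounce),
        (dobson_re, dobson),
    )
    return all(re.search(pattern, value) is not None for pattern, value in pairs)


# Hand-typed sample entries (not extracted from the lexicon files).
print(rule_matches(RULE, "ἡμέρα, ας, ἡ", "n-1a", "N:F"))  # True
print(rule_matches(RULE, "λόγος, ου, ὁ", "n-2a", "N:M"))  # False
```

Requiring all three regexes to agree is what lets a rule double as an error check: an entry whose citation form fits but whose Dobson gender does not will fall out of the class and can be inspected.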

These rules are organized in a hierarchy that starts with the most general rules, each of which contains more specific subsets as children. The inflectional class labels like 1.1/a1/F are intended to reflect this hierarchy. For example, here are the ancestors of the above rule:

((r"^(\w+), (\w+), (\w+)$", r"^n-", r"^N"), "", [
    ((r"[αη]ς, {art}$", r"^n-1", r"^N:.$"), "1", [
        ((r"ας, ἡ$", r"^n-1", r"^N:F$"), "1.1/F", [
            ((r" ας, ἡ$", r"^n-1", r"^N:F$"), "1.1/F", [
                ((r"α, ας, ἡ$", r"^n-1[ah]$", r"^N:F$"), "1.1/a/F", [

The first line is the most general rule, covering any noun whose citation form in Danker has three parts. The next level (given the class 1) covers those whose citation form ends with either ας or ης followed by an article. This is further subset (class 1.1/F) into citation forms ending with ας and a feminine singular article, and then into citation forms with no other letters before ας in the genitive ending provided. This is further subset (class 1.1/a/F) into those citation forms whose nominative form ends with α. Because this still results in a Mounce category of either n-1a or n-1h, it is further refined into the first rule we saw, with the inflectional class 1.1/a1/F.
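The descent through the hierarchy can be sketched as a short recursive walk. This is only an illustration under stated assumptions: the names `classify`, `matches`, `nfc` and `ARTICLE` are hypothetical, the expansion of `{art}` into an article pattern is a guess at what that placeholder stands for, and following the first matching child at each level is one possible strategy rather than necessarily the one the project uses.

```python
import re
import unicodedata

# Assumption: "{art}" is a placeholder for any nominative singular article.
ARTICLE = "(?:ὁ|ἡ|τό)"

# The ancestor chain quoted above, closed off with the leaf rule shown earlier.
RULES = [
    ((r"^(\w+), (\w+), (\w+)$", r"^n-", r"^N"), "", [
        ((r"[αη]ς, {art}$", r"^n-1", r"^N:.$"), "1", [
            ((r"ας, ἡ$", r"^n-1", r"^N:F$"), "1.1/F", [
                ((r" ας, ἡ$", r"^n-1", r"^N:F$"), "1.1/F", [
                    ((r"α, ας, ἡ$", r"^n-1[ah]$", r"^N:F$"), "1.1/a/F", [
                        ((r"α, ας, ἡ$", r"^n-1a$", r"^N:F$"), "1.1/a1/F", []),
                    ]),
                ]),
            ]),
        ]),
    ]),
]


def nfc(text):
    """Normalize to precomposed characters so accented letters count as \\w."""
    return unicodedata.normalize("NFC", text)


def matches(patterns, fields):
    """True if each regex matches its field (citation, Mounce, Dobson)."""
    return all(
        re.search(nfc(pattern.replace("{art}", ARTICLE)), nfc(field))
        for pattern, field in zip(patterns, fields)
    )


def classify(rules, fields, label=None):
    """Return the label of the deepest matching rule, or None if none match."""
    for patterns, rule_label, children in rules:
        if matches(patterns, fields):
            # Descend; fall back to this rule's label if no child matches.
            return classify(children, fields, rule_label)
    return label


# Hand-typed sample entry; category values supplied for illustration.
print(classify(RULES, ("ἡμέρα, ας, ἡ", "n-1a", "N:F")))  # 1.1/a1/F
```

An entry that matches only the top-level rule comes back with the empty label, which is a convenient signal that a more specific rule is still missing for it.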

Certain inconsistencies show up from these rules. For example, “γῆ, γῆς, ἡ” is the only “η, ης, ἡ” entry that gives the full genitive form rather than just the genitive ending. Five of the six masculine words with genitive in “τος” give “τος” with the preceding vowel as the genitive ending, but the other one gives the full genitive form. Similarly, 34 feminine words with genitive in “τος” give just the preceding vowel, but one gives the preceding consonant + vowel.

For a lexicon whose editors want consistency in their citation forms, this kind of thing is useful to be able to check programmatically.
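As a rough illustration of such a check, the sketch below flags entries within one group whose genitive field looks like a full form when the rest of the group gives only an ending. Everything here is an assumption made for the example: the helper names are hypothetical, the "genitive at least as long as the nominative" test is a deliberately crude heuristic, and the sample entries are typed by hand rather than read from the lexicon.

```python
import unicodedata
from collections import Counter


def nfc(text):
    """Normalize to precomposed characters so length counts letters, not marks."""
    return unicodedata.normalize("NFC", text)


def gives_full_genitive(citation):
    """Crude heuristic: treat the genitive field as a full form if it is
    at least as long as the nominative. Assumes a three-part citation."""
    nominative, genitive, _article = nfc(citation).split(", ")
    return len(genitive) >= len(nominative)


def outliers(citations):
    """Return the citations that disagree with the majority of their group."""
    if not citations:
        return []
    flags = {citation: gives_full_genitive(citation) for citation in citations}
    majority = Counter(flags.values()).most_common(1)[0][0]
    return [citation for citation, flag in flags.items() if flag != majority]


# Hand-typed sample of "η, ης, ἡ" entries, including the one noted above.
ENTRIES = ["ἀγάπη, ης, ἡ", "φωνή, ῆς, ἡ", "ἀρχή, ῆς, ἡ", "γῆ, γῆς, ἡ"]
print(outliers(ENTRIES))  # ['γῆ, γῆς, ἡ']
```

Comparing each entry against the majority of its own class, rather than against a fixed rule, means the same check could be reused for the “τος” groups without deciding in advance which convention is the right one.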

There is lots more to say when I get around to finishing the paper, but I wanted to at least share the code and (in-progress) rules. For the tie-in to inflectional class modeling, I’ll soon integrate this work with my recent work on Analyzing Nominal Morphology. I’ll also use the “automatic consistency checking” aspect of the work to ensure better consistency in the Morphological Lexicon.