More Parsing of the DCC Principal Parts

This is part 7 of a series of blog posts about modelling stems and principal part lists. It looks in even more detail at the format of the principal parts list in the DCC verbs.

In the previous blog post, I used regular expressions to match DCC principal parts.

In moving from merely matching patterns to actually extracting parts correctly, I encountered further ambiguities.

Recall that previously, I just did matches like

{grk}, {grk}, {grk}, {grk}, {grk}

where {grk} matched any Greek word.
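Here is a minimal Python sketch of that earlier kind of matching. It is not the original code. It turns a template like the one above into a regular expression. The character class used for {grk} and the helper name compile_template are assumptions made for illustration.

```python
import re

# Assumed definition of a "Greek word": one or more characters from the
# combining diacritics block (U+0300-U+036F), the Greek and Coptic block
# (U+0370-U+03FF) or the Greek Extended block of precomposed polytonic
# letters (U+1F00-U+1FFF).
GRK = r"[\u0300-\u036F\u0370-\u03FF\u1F00-\u1FFF]+"


def compile_template(template):
    """Compile a template such as '{grk}, {grk}' into a regex.

    The literal text between placeholders (here ', ') is escaped and each
    {grk} placeholder is replaced by the Greek-word pattern.
    """
    literals = template.split("{grk}")
    return re.compile(GRK.join(re.escape(text) for text in literals))


FIVE_PARTS = compile_template("{grk}, {grk}, {grk}, {grk}, {grk}")

# fullmatch requires the whole line to fit the template.
print(FIVE_PARTS.fullmatch("λύω, λύσω, ἔλυσα, λέλυκα, ἐλύθην") is not None)       # True
print(FIVE_PARTS.fullmatch("δύναμαι, δυνήσομαι, ἐδυνήθην, δεδύνημαι") is not None)  # False
```

A template like this only says how many Greek words there are. It says nothing about which part each word is.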

This weekend, I expanded that to patterns more like

{present}, {future}, {aorist}, {perfect_active}, {aorist_passive}
{present}, {future}, {perfect_active}, {perfect_middle}, {aorist_passive}
{present}, {future}, {aorist}, {perfect_middle}, {aorist_passive}

which actually take into account the endings of the Greek words (for example, {perfect_middle} only matches Greek words ending in μαι).

Note that the one pattern from the previous blog post becomes three patterns. These more precise patterns, however, enable easier extraction of the actual parts with their morphosyntactic properties.
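The ending-aware patterns can be sketched with named groups. A successful match then directly yields each form labelled with its part. This is a hypothetical Python sketch, not the original implementation. Apart from μαι for the perfect middle, the ending sets below are simplified assumptions, and accented endings are not handled.

```python
import re

# One Greek character: combining diacritics, Greek and Coptic, or Greek
# Extended (precomposed polytonic letters). An assumed character class.
GRK = r"[\u0300-\u036F\u0370-\u03FF\u1F00-\u1FFF]"

# Simplified, assumed ending alternatives per part. Only "the perfect
# middle ends in μαι" comes from the post; accented endings such as the
# -ῶ of contracted presents are not handled here.
ENDINGS = {
    "present": "ω|μι|μαι",
    "future": "ω|μαι",
    "aorist": "α|ον",
    "perfect_active": "α",
    "perfect_middle": "μαι",
    "aorist_passive": "ην",
}


def slot(part):
    """A named group matching a Greek word with one of the part's endings."""
    return rf"(?P<{part}>{GRK}*(?:{ENDINGS[part]}))"


def compile_template(*parts):
    """Join one slot per part with the ', ' separator used in the list."""
    return re.compile(", ".join(slot(part) for part in parts))


PATTERNS = [
    compile_template("present", "future", "aorist", "perfect_active", "aorist_passive"),
    compile_template("present", "future", "perfect_active", "perfect_middle", "aorist_passive"),
    compile_template("present", "future", "aorist", "perfect_middle", "aorist_passive"),
]


def extract(line):
    """Return a {part: form} dict for every pattern matching the whole line."""
    matches = (pattern.fullmatch(line) for pattern in PATTERNS)
    return [match.groupdict() for match in matches if match]


print(extract("λύω, λύσω, ἔλυσα, λέλυκα, ἐλύθην"))
```

For the λύω line this returns a single dict, with ἔλυσα labelled aorist and λέλυκα labelled perfect_active. A line that fits more than one pattern returns more than one dict, which makes any remaining ambiguity visible rather than silently picking one reading.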

They also reveal some more inconsistencies. For example, 2nd aorists are not, it turns out, always explicitly marked.

Also, the four-part pattern

{grk}, {grk}, {grk}, {grk}

actually could be any of

{present}, {future}, {aorist}, {perfect_active}
{present}, {future}, {aorist}, {perfect_middle}
{present}, {future}, {aorist}, {aorist_passive}
{present}, {future}, {perfect_middle}, {aorist_passive}
{present}, {future}, {aorist_passive}, {perfect_middle}

The last pattern is necessitated by

δύναμαι, δυνήσομαι, ἐδυνήθην, δεδύνημαι

which is, presumably, an error with ἐδυνήθην and δεδύνημαι transposed.
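One way to see the four-part ambiguity concretely is to try every reading against a list and keep only those whose endings fit. This is a hypothetical sketch with simplified, assumed ending sets. Accents and breathings are stripped via Unicode decomposition before the endings are compared. The δύναμαι entry fits only the transposed template, so a template like that could double as a flag for probable errors.

```python
import unicodedata

# Simplified, assumed endings per part (only perfect middle -> μαι is from
# the post). Endings are compared after diacritics have been stripped.
ENDINGS = {
    "present": ("ω", "μι", "μαι"),
    "future": ("ω", "μαι"),
    "aorist": ("α", "ον"),
    "perfect_active": ("α",),
    "perfect_middle": ("μαι",),
    "aorist_passive": ("ην",),
}

# The five readings of a bare four-part list given in the post; the last
# one exists only because of the (probably transposed) δύναμαι entry.
FOUR_PART_TEMPLATES = [
    ("present", "future", "aorist", "perfect_active"),
    ("present", "future", "aorist", "perfect_middle"),
    ("present", "future", "aorist", "aorist_passive"),
    ("present", "future", "perfect_middle", "aorist_passive"),
    ("present", "future", "aorist_passive", "perfect_middle"),
]


def bare(word):
    """Decompose to NFD and drop combining marks (accents, breathings)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fits(template, forms):
    """True if each form ends in one of the endings allowed for its slot."""
    return len(template) == len(forms) and all(
        form.endswith(ENDINGS[part]) for part, form in zip(template, forms)
    )


def readings(line):
    """Return every four-part template consistent with the line's endings."""
    forms = [bare(form.strip()) for form in line.split(",")]
    return [template for template in FOUR_PART_TEMPLATES if fits(template, forms)]


for template in readings("δύναμαι, δυνήσομαι, ἐδυνήθην, δεδύνημαι"):
    print(template)  # ('present', 'future', 'aorist_passive', 'perfect_middle')
```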

Besides errors like this, there is at least one ambiguity where the endings aren’t enough to disambiguate.

χαίρω, χαιρήσω, κεχάρηκα, κεχάρημαι, ἐχάρην

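is ambiguous because κα is also a possible aorist ending (as in ἔδωκα), so κεχάρηκα could in principle fill either the aorist slot or the perfect active slot. The ambiguity can obviously be resolved by looking at the entire form (the reduplication in κε- shows it is a perfect), but given that some parts are annotated elsewhere to avoid possible misreading, it might be better to write the above as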

χαίρω, χαιρήσω, pf. κεχάρηκα, κεχάρημαι, ἐχάρην

to make it perfectly clear that the aorist form has been skipped.
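The χαίρω case can be made concrete by asking which parts a single form could fill on its ending alone, and then letting an annotation narrow that set. This is a minimal sketch. The ending sets and the ANNOTATIONS table are hypothetical simplifications; only the use of pf. to mark perfects comes from the list itself.

```python
import unicodedata

# Simplified, assumed ending sets per part; only perfect middle -> μαι
# comes from the post.
ENDINGS = {
    "present": ("ω", "μι", "μαι"),
    "future": ("ω", "μαι"),
    "aorist": ("α", "ον"),
    "perfect_active": ("α",),
    "perfect_middle": ("μαι",),
    "aorist_passive": ("ην",),
}

# Hypothetical annotation table: the parts each abbreviation allows.
ANNOTATIONS = {"pf.": {"perfect_active", "perfect_middle"}}


def bare(word):
    """Strip accents and breathings so only the letters are compared."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def candidates(entry):
    """Parts that an entry such as 'κεχάρηκα' or 'pf. κεχάρηκα' could fill.

    The ending gives an initial set of parts; any annotation in front of
    the form then narrows it. Unknown annotations are ignored.
    """
    *labels, form = entry.split()
    parts = {part for part, endings in ENDINGS.items() if bare(form).endswith(endings)}
    for label in labels:
        if label in ANNOTATIONS:
            parts &= ANNOTATIONS[label]
    return parts


print(sorted(candidates("κεχάρηκα")))      # ['aorist', 'perfect_active']
print(sorted(candidates("pf. κεχάρηκα")))  # ['perfect_active']
```

A μαι form such as κεχάρημαι gets three candidates from its ending alone (present, future and perfect middle). This is where position in the list has to come in.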

Again, my point is not to nitpick the DCC principal parts list, but rather to make explicit the assumptions that principal parts in this format make.

In determining which part a particular form is, the following need to be considered:

  • explicit annotation (e.g. pf. for perfects)
  • ending (a μαι ending on a form other than the first two parts indicates the perfect middle)
  • position in the list (both absolutely and relative to other forms whose part is worked out from other considerations)
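The three considerations can be combined in a short sketch. Annotations and endings restrict which parts each form could be, and a template of slots supplies the position. Everything here is an illustrative assumption rather than the actual conversion code: the names, the ending sets, the annotation table and the three five-part templates.

```python
import unicodedata

# Assumed, simplified ending sets (only perfect middle -> μαι is from the post).
ENDINGS = {
    "present": ("ω", "μι", "μαι"),
    "future": ("ω", "μαι"),
    "aorist": ("α", "ον"),
    "perfect_active": ("α",),
    "perfect_middle": ("μαι",),
    "aorist_passive": ("ην",),
}

# Hypothetical annotation table: the parts each abbreviation allows.
ANNOTATIONS = {"pf.": {"perfect_active", "perfect_middle"}}

# Position: the order of slots in each (assumed) five-part template.
TEMPLATES = [
    ("present", "future", "aorist", "perfect_active", "aorist_passive"),
    ("present", "future", "perfect_active", "perfect_middle", "aorist_passive"),
    ("present", "future", "aorist", "perfect_middle", "aorist_passive"),
]


def bare(word):
    """Strip accents and breathings so only the letters are compared."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def candidates(entry):
    """Parts allowed by the form's ending, narrowed by any annotation."""
    *labels, form = entry.split()
    parts = {part for part, endings in ENDINGS.items() if bare(form).endswith(endings)}
    for label in labels:
        if label in ANNOTATIONS:
            parts &= ANNOTATIONS[label]
    return parts


def parse(line):
    """Return a {part: form} dict for every template the line is consistent with."""
    entries = [entry.strip() for entry in line.split(",")]
    allowed = [candidates(entry) for entry in entries]
    forms = [entry.split()[-1] for entry in entries]  # drop annotations
    return [
        dict(zip(template, forms))
        for template in TEMPLATES
        if len(template) == len(entries)
        and all(part in parts for part, parts in zip(template, allowed))
    ]


print(len(parse("χαίρω, χαιρήσω, κεχάρηκα, κεχάρημαι, ἐχάρην")))  # 2
print(parse("χαίρω, χαιρήσω, pf. κεχάρηκα, κεχάρημαι, ἐχάρην"))
```

Without the pf. annotation the χαίρω entry yields two readings. With it, only the reading that has κεχάρηκα as the perfect active survives.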

And the main upshot of all this is that I’ve now converted the DCC principal parts to a YAML format that I’ll shortly merge with the parts from Pratt and Morwood.
