Parts of Speech and Number of Accents

I thought I'd write a quick Python script to check how many accents were on each of the lemmata in [MorphGNT] 5.06.
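The original script isn't reproduced here, but a minimal sketch of the idea might look like the following. It assumes the lemmata are available as Unicode strings paired with a part-of-speech code; the `(pos, lemma)` row shape, the `count_accents` and `tally` names, and the sample data are all hypothetical and not taken from the MorphGNT files. It decomposes each lemma with NFD and counts the combining grave, acute, and perispomeni (circumflex) marks, ignoring breathings, diaeresis, and iota subscript.

```python
import unicodedata
from collections import Counter

# Combining marks treated as accents (an assumption of this sketch):
# grave (U+0300), acute (U+0301), Greek perispomeni / circumflex (U+0342).
# Breathings, diaeresis, and iota subscript are deliberately not counted.
ACCENTS = {"\u0300", "\u0301", "\u0342"}


def count_accents(word):
    """Return the number of accent marks in a Unicode Greek word."""
    decomposed = unicodedata.normalize("NFD", word)
    return sum(1 for ch in decomposed if ch in ACCENTS)


def tally(rows):
    """Count lemma occurrences keyed by (part of speech, number of accents).

    `rows` is any iterable of (pos, lemma) pairs; this shape is hypothetical.
    """
    counts = Counter()
    for pos, lemma in rows:
        counts[(pos, count_accents(lemma))] += 1
    return counts


# Made-up sample rows for illustration only.
sample = [
    ("N", "λόγος"),
    ("RA", "ὁ"),
    ("V", "εἰμί"),
    ("P", "ἐν"),
    ("N", "πνεῦμα"),
]

for (pos, n), total in sorted(tally(sample).items()):
    print(pos, n, total)
```

Normalizing to NFD first means precomposed characters and base-plus-combining sequences are counted the same way.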

Here are the counts by part of speech and number of accents on the lemma:

|     |  0      |  1      |  2  |
+-----+---------+---------+-----+
| A   |  -      |  9159   |  -  |
| C   |  924    |  17361  |  -  |
| D   |  1592   |  4606   |  -  |
| I   |  -      |  17     |  -  |
| N   |  30     |  28271  |  1  |
| P   |  5433   |  5488   |  -  |
| RA  |  19862  |  4      |  -  |
| RD  |  -      |  1744   |  -  |
| RI  |  -      |  1165   |  -  |
| RP  |  -      |  11584  |  -  |
| RR  |  -      |  1677   |  -  |
| V   |  8      |  28101  |  1  |
| X   |  147    |  844    |  -  |

Some of the low numbers are definitely errors in the database. Now to investigate...
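One way to start investigating is to list the actual lemmata behind the unexpected cells. The sketch below is only an illustration: it assumes Unicode lemmata in hypothetical `(pos, lemma)` pairs, and the `outliers` helper, its `expected` parameter, and the sample rows (including the deliberately unaccented `λεγω`) are invented for this example.

```python
import unicodedata
from collections import defaultdict

# Same assumption as before: grave, acute, and perispomeni count as accents.
ACCENTS = {"\u0300", "\u0301", "\u0342"}


def accent_count(word):
    """Number of accent marks in a Unicode Greek word."""
    return sum(ch in ACCENTS for ch in unicodedata.normalize("NFD", word))


def outliers(rows, expected=1):
    """Group lemmata whose accent count differs from `expected`.

    Returns a dict mapping (pos, accent count) to the set of lemmata found.
    """
    found = defaultdict(set)
    for pos, lemma in rows:
        n = accent_count(lemma)
        if n != expected:
            found[(pos, n)].add(lemma)
    return found


# Made-up rows; "λεγω" stands in for a lemma that lost its accent.
sample = [("V", "λέγω"), ("V", "λεγω"), ("N", "λόγος"), ("RA", "ὁ")]

for key, lemmata in sorted(outliers(sample).items()):
    print(key, sorted(lemmata))
```

With `expected=1` this also flags words that legitimately carry no accent, such as the article, so the output is a list of candidates to review rather than a list of errors.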

UPDATE (2005-07-16): Both 2-accent cases were mistakes. The 30 0-accent nouns and 5 of the 8 0-accent verbs were foreign loan words that were intentionally left unaccented; the other 3 0-accent verbs were mistakes. The 4 accented articles resulted from crasis with the following noun, and each of those words should probably be analyzed as a noun rather than an article. I guess there'll be a 5.07 release soon. NOTE: I haven't looked at the particles, adverbs, conjunctions or prepositions yet.

originally published on jtauber.com

at the intersection of computing, linguistics, philology, and learning science

By day I’m an entrepreneur, web technologist and open-source developer but my academic background is in linguistics (along with some classics, comparative philology, and educational statistics) and my main avocation is working on text, annotations, analysis and software relating to historical languages with a particular interest in facilitating better learning.

While my focus has mostly been on Biblical Greek, much of the work is highly relevant to other Hellenistic Greek texts, other dialects of Ancient Greek and, indeed, texts in completely different languages as well.

All code written for this endeavour is open source, and text and data are made available under a Creative Commons license to the extent allowed by the sources used.