I took MorphGNT SBLGNT and wrote a script that made a list of words from it as follows:
- every token in the text including punctuation
- every token in the text with punctuation stripped
- every normalized token in the text; if it has a movable final character, both the form with it and the form without it
- the previous but with accents stripped
- every lemma in the text
- the lemma but with accents stripped
So each token in the SBLGNT yields up to 8 potential “words” (the two normalized entries each contribute two forms when there is a movable final character), with duplicates then removed. This led to 55,496 unique “words”.
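The expansion above can be sketched roughly as follows. This is only an illustration, not the actual script: the function names are made up, it assumes the four word forms of a token (text with punctuation, word without, normalized form, lemma) are already at hand, it assumes a movable final character is marked in parentheses in the normalized form (as in `ἐστί(ν)`), and it assumes “accents stripped” means removing only the acute, grave and circumflex while keeping breathings.

```python
import re
import unicodedata

# Assumption: only these three combining marks count as "accents";
# breathings, diaeresis and iota subscript are kept.
ACCENTS = {"\u0300", "\u0301", "\u0342"}  # grave, acute, circumflex

# Assumption: a movable final character is written in parentheses, e.g. "ἐστί(ν)".
MOVABLE = re.compile(r"\((.)\)$")


def strip_accents(s):
    """Decompose, drop the accent marks, and recompose."""
    decomposed = unicodedata.normalize("NFD", s)
    kept = "".join(ch for ch in decomposed if ch not in ACCENTS)
    return unicodedata.normalize("NFC", kept)


def expand_movable(norm):
    """Return the form without and with the movable final character."""
    m = MOVABLE.search(norm)
    if not m:
        return [norm]
    stem = norm[:m.start()]
    return [stem, stem + m.group(1)]


def variants(text, word, norm, lemma):
    """Collect the potential "words" for one token (at most 8)."""
    forms = {text, word, lemma, strip_accents(lemma)}
    for form in expand_movable(norm):
        forms.add(form)
        forms.add(strip_accents(form))
    return forms


def word_list(tokens):
    """Union of the variants of every (text, word, norm, lemma) tuple."""
    words = set()
    for text, word, norm, lemma in tokens:
        words |= variants(text, word, norm, lemma)
    return words
```

Using a set means the duplicate removal happens as a side effect of collecting the forms.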
I grouped every individual Greek character found in the above word list (209 of them) into 30 “chapter” buckets. For example, I put “κ” in chapter 1, “ξ” in chapter 4, “έ” in chapter 8, “ἤ” in chapter 14, and so on. This was done manually, not computationally. Each chapter has a theme: something new that gets introduced. Other than chapter 5, which covers the uppercase letters, no chapter introduces more than 9 new characters, and most introduce 5–8.
I then wrote a script that went through all 55,496 “words” from Step 1 and, for each character, looked up which chapter from Step 2 that character was introduced in. Then, for each word, the script noted the earliest chapter by which all the characters in that word have been introduced, that is, the latest chapter among its characters.
In other words, if chapter is a mapping from a character to the number of the chapter it is introduced in, calculate
max(chapter[character] for character in word)
At this point the script has built a table of 55,496 words each with the “target chapter” they can be introduced in.
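As a small illustration of that calculation, here is a sketch with a toy mapping. The four chapter assignments are the ones mentioned above; the function names and the sample words are made up for the example, and the sketch assumes every character of a word appears in the mapping and that words are stored in precomposed form, so that an accented letter is a single character.

```python
# Toy character-to-chapter mapping (the real one covers 209 characters).
chapter = {"κ": 1, "ξ": 4, "έ": 8, "ἤ": 14}


def target_chapter(word, chapter):
    """Earliest chapter by which every character of the word has been introduced."""
    # Raises KeyError if a character is missing from the mapping.
    return max(chapter[character] for character in word)


def build_table(words, chapter):
    """Map each word to its target chapter."""
    return {word: target_chapter(word, chapter) for word in words}


# "κ" only needs chapter 1, but "κξ" has to wait until "ξ" arrives in chapter 4.
table = build_table(["κ", "κξ"], chapter)
```

A word's target chapter is driven entirely by its “hardest” character, which is why a late chapter that introduces only rare characters can end up with very few words.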
When a user on greektyping.com is doing a particular chapter, here’s what happens:
- the table is queried for all the words whose target chapter is the current chapter being done
- a sample of 10 is taken from the result (fewer than 10 if there are fewer than 10 words for a given target chapter, which happens in chapters 22, 24, 25, 26, and 28)
- this sample is sorted by length
- the user is presented with that list
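The steps above could look something like this. It is a sketch, not the site's actual code: `drill_words` and its parameters are hypothetical names, the table is assumed to be a plain dict from word to target chapter, and “sorted by length” is assumed to mean shortest first.

```python
import random


def drill_words(table, current_chapter, sample_size=10, rng=random):
    """Pick up to sample_size words for a chapter, shortest first."""
    # Query: all words whose target chapter is the current chapter.
    candidates = [w for w, ch in table.items() if ch == current_chapter]
    # Take fewer than sample_size if the chapter does not have enough words.
    k = min(sample_size, len(candidates))
    sample = rng.sample(candidates, k)
    # Assumption: ascending length, so short words come first in the drill.
    return sorted(sample, key=len)
```

Passing a seeded `random.Random` instance as `rng` makes the selection repeatable, which is handy for testing; by default the module-level generator is used and each visit gives a fresh sample.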
So that’s how it works. It would be fairly easy to apply to other Greek texts (they don’t have to be analysed to the extent MorphGNT is). But even with just the MorphGNT there’s a lot of “replayability”. Chapter 8 alone has 16,704 words you could be tested on.
We’ll probably add some richer statistics at some point, as well as typing of longer units of text, but for now our focus is on adding instructions for more keyboard layouts (the drills themselves will stay the same, though).