Adding another potential readability metric, let’s look at the mean log frequency of dependency paths.
So far we’ve looked at the mean log frequency of lexemes, the mean log frequency of forms, and, after calculating dependency paths or “swords”, the mean dependency depth.
What we haven’t looked at is the mean log frequency of those dependency paths—a rough proxy for a target having common (rather than merely shallow) syntactic structures.
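As a rough illustration of what such a score involves, here is a minimal sketch. It assumes the score is the mean negative log of each path's relative frequency, so that lower means more common; that would fit "lowest scoring" being best, but it is an assumption, as is the scale, and the figures in this post are not reproduced here. The path notation, the toy data and the function name `mean_neg_log_freq` are all hypothetical.

```python
import math
from collections import Counter


def mean_neg_log_freq(paths, counts, total):
    """Mean of -log(relative frequency) over the paths in one target.

    Lower values mean the target's dependency paths are more common
    in the corpus as a whole (assumed sign convention).
    """
    if not paths:
        raise ValueError("target has no dependency paths")
    return sum(-math.log(counts[p] / total) for p in paths) / len(paths)


# Hypothetical toy corpus: each target (book, chapter, ...) maps to
# one dependency path per word. The path labels are made up.
targets = {
    "A": ["ROOT", "ROOT>SUBJ", "ROOT>OBJ", "ROOT>SUBJ"],
    "B": ["ROOT", "ROOT>OBJ>ATR", "ROOT>ADV>OBJ"],
}

# Corpus-wide frequency of each distinct path.
counts = Counter(p for paths in targets.values() for p in paths)
total = sum(counts.values())

scores = {
    name: mean_neg_log_freq(paths, counts, total)
    for name, paths in targets.items()
}

# Lowest score first, i.e. most common structures first.
ranked = sorted(scores, key=scores.get)
```

In this toy corpus, target A reuses paths that occur more than once while B contains two paths seen nowhere else, so A ranks ahead of B.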
By this measure, the top five (i.e. lowest scoring) books are:
4832 1 Corinthians
4929 3 John
4935 1 John
4938 John
5027 James
and the top ten chapters are:
4183 1 Corinthians 13
4362 1 Corinthians 9
4386 1 Corinthians 14
4485 Romans 14
4486 John 16
4550 1 John 3
4558 2 Corinthians 11
4564 1 Corinthians 6
4566 1 Corinthians 7
4576 John 7
It is interesting just how much 1 Corinthians features here. The book, like the chapters listed above, does poorly in terms of mean log frequency of lexemes.
If 1 Corinthians is actually syntactically easy to read, I wonder if that’s an argument for including some readings which, because of their vocab, need to be heavily footnoted with glosses but are still worth reading early because of their syntax.