Posts tagged vocabulary
Last year I started the Python library vocabulary-tools to consolidate the various scripts I’ve written over the years to analyse vocabulary in (particularly New Testament) texts. I’ve just added support for the vocabulary in Vanessa Gorman’s treebanks.
Long-time readers of this blog know that, along with morphology, a core research area of mine is vocabulary. Prompted by Seumas Macdonald and now as part of the Greek Texts Project, I started putting together some vocabulary coverage statistics for various subcorpora of Greek prose.
One of my goals for 2019 is to bring more structure to various disparate Greek projects and, as part of that, I’ve started consolidating multiple one-off projects I’ve done around vocabulary coverage statistics and ordering experiments.
Measures of dispersion are interesting to apply to a corpus because they tell you whether a word is spread across the parts of the corpus in proportion to their size, as you’d expect, or concentrated in just some of them. I thought I’d play around with Gries’s DP as a measure of dispersion on the SBLGNT lemmas.
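As a rough illustration of how DP works, here is a minimal Python sketch. It assumes the corpus has already been split into parts (say, one list of lemmas per book). For each part it compares the part’s share of the whole corpus with the part’s share of the word’s occurrences. DP is half the sum of the absolute differences, so 0 means the word is spread exactly in proportion to part size and larger values mean it is concentrated. The function name `gries_dp` and the toy data are hypothetical, not taken from vocabulary-tools.

```python
def gries_dp(word, parts):
    """Gries's DP for `word`, where `parts` is a list of token lists (e.g. one per book).

    Returns 0.0 if the word is spread exactly in proportion to part sizes;
    larger values mean it is concentrated in fewer parts.
    """
    total_tokens = sum(len(part) for part in parts)
    counts = [part.count(word) for part in parts]
    total_count = sum(counts)
    if total_count == 0:
        raise ValueError(f"{word!r} does not occur in any part")
    dp = 0.0
    for part, count in zip(parts, counts):
        expected = len(part) / total_tokens  # share of the corpus in this part
        observed = count / total_count       # share of the word's occurrences in this part
        dp += abs(observed - expected)
    return dp / 2


# Hypothetical toy corpus of two equal-sized parts.
parts = [
    ["kai", "logos", "kai", "theos"],
    ["kai", "logos", "pneuma", "kai"],
]
print(gries_dp("kai", parts))    # 0.0: evenly spread
print(gries_dp("theos", parts))  # 0.5: only in the first part
```

With two equal parts the largest possible value is 0.5 rather than 1; a normalised variant that divides DP by one minus the smallest part’s share corrects for this.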
With a boost in numbers on vocab.oxlos.org, this post looks at some slightly more detailed statistics from the first activity.
Here are some very preliminary statistics from the Greek Vocab site’s first month.
Last week I launched a site for Greek vocabulary. Here’s how the first week has gone.
I’ve put together a new little site to host various activities to research vocabulary knowledge and acquisition in the context of Ancient and Biblical Greek.
Back in The Core Vocabulary of New Testament Greek I talked about Wilfred Major’s 2008 paper on core vocabulary lists for Classical Greek and provided code for producing the same for the Greek New Testament along with some discussion of the results. I didn’t actually include the full results, however.
In a 2008 paper, Wilfred Major constructs what he calls the 50% and 80% vocab lists for Classical Greek: the lemmata that together account for 50% and 80%, respectively, of the tokens in the Classical Greek corpus. In this post I provide the code for producing the equivalent lists for the Greek New Testament and talk about some of the results.
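The construction can be sketched in a few lines of Python, assuming you already have a flat list of lemmata for the corpus: count them, walk down the frequency list, and stop once the running total reaches the target share of tokens. This is a minimal illustration, not necessarily how Major handles details such as ties in frequency (here `Counter.most_common` leaves equal counts in first-seen order). `core_vocab` is a hypothetical name.

```python
from collections import Counter


def core_vocab(lemmas, threshold):
    """Return the most frequent lemmas that together cover at least `threshold` of tokens."""
    counts = Counter(lemmas)
    total = sum(counts.values())
    core = []
    covered = 0
    for lemma, count in counts.most_common():  # highest frequency first
        if covered / total >= threshold:
            break
        core.append(lemma)
        covered += count
    return core


lemmas = ["kai"] * 5 + ["ho"] * 3 + ["logos"] * 2  # hypothetical toy corpus
print(core_vocab(lemmas, 0.5))  # ['kai']
print(core_vocab(lemmas, 0.8))  # ['kai', 'ho']
```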
In various mailing list posts, blog posts and talks, I’ve shown vocabulary coverage statistics. It’s time to update the code to use more recent data and republish the results here.
Yesterday I gave a talk on the graded reader ideas at BibleTech.
A post to the graded-reader mailing list from March 29, 2008.
A post to the graded-reader mailing list from March 25, 2008.
It is fairly common, in the context of learning vocabulary for a particular corpus like the Greek New Testament, to talk about what proportion of the text one could read if one learnt the top N words.
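As a minimal sketch of the calculation, assuming a non-empty flat list of lemmas: rank the lemmas by frequency and, for each N, divide the tokens covered by the top N by the total number of tokens. The function name and toy data are hypothetical.

```python
from collections import Counter


def coverage_by_top_n(lemmas, ns):
    """Map each N in `ns` to the share of tokens covered by the N most frequent lemmas."""
    counts = Counter(lemmas)
    total = sum(counts.values())
    freqs = [count for _, count in counts.most_common()]  # descending frequency
    return {n: sum(freqs[:n]) / total for n in ns}


# Hypothetical toy "corpus" of 10 tokens.
lemmas = ["kai"] * 5 + ["ho"] * 3 + ["logos"] * 2
print(coverage_by_top_n(lemmas, [1, 2, 3]))  # {1: 0.5, 2: 0.8, 3: 1.0}
```

This is token coverage: knowing the words that make up 80% of tokens doesn’t mean you can read 80% of sentences entirely, since the remaining 20% of tokens may be scattered across many sentences.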
For a while I’ve been interested in how you could choose the order in which vocabulary is learnt so as to maximise one’s ability to read a particular corpus of sentences. More generally, imagine you have a set of things you want to learn, where each item has prerequisites drawn from a large set and many items share common prerequisites.
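One simple way to attack this is a hypothetical greedy heuristic. It is not necessarily the approach explored in the post, and it is not guaranteed to be optimal. The idea is to repeatedly learn the unknown item that contributes most to finishing sentences, weighting each sentence by how few unknowns it has left: a sentence needing only one more word gives that word a full point, and one needing three gives each of them a third. Sentences are just collections of prerequisites, so the same sketch fits the more general framing, assuming the items are strings (or otherwise comparable) so ties can be broken deterministically.

```python
from fractions import Fraction


def greedy_order(sentences):
    """Return a learning order for the items (e.g. lemmas) that the sentences need.

    Each step picks the unknown item with the highest score, where every
    sentence still containing unknowns gives each of its n unknown items 1/n.
    Ties are broken alphabetically so the result is deterministic.
    """
    # Unknown items per sentence; sentences with no items are skipped.
    remaining = [u for u in (set(s) for s in sentences) if u]
    order = []
    while remaining:
        scores = {}
        for unknown in remaining:
            share = Fraction(1, len(unknown))  # exact, so ties compare reliably
            for item in unknown:
                scores[item] = scores.get(item, 0) + share
        best = min(scores, key=lambda item: (-scores[item], item))
        order.append(best)
        # Mark the item as learnt and drop sentences that are now fully readable.
        remaining = [u - {best} for u in remaining if u - {best}]
    return order


sentences = [["a", "b"], ["a"], ["c", "d", "e"], ["a", "c"]]
print(greedy_order(sentences))  # ['a', 'c', 'b', 'd', 'e']
```

Because it only looks one step ahead, a greedy order like this can be beaten; the point is just to show how treating words as prerequisites of sentences turns the question into something you can compute.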