Here are some very preliminary statistics from the Greek Vocab site’s first month.
So far, 82 people have signed up at http://vocab.oxlos.org/, and 52 have completed at least the first activity: a common-noun receptive vocabulary leveling test based on a test form developed (for English) by Paul Nation.
Recall from my initial post on the site that vocabulary items in that activity are classified into one of five buckets based on how many times they occur in the Greek New Testament.
Here are the mean results (with standard error) for each bucket for the first activity (N=52):
bucket | occurrences in GNT | mean ± std err |
---|---|---|
1 | 32 or more times | 0.966 ± 0.008 |
2 | 16 to 31 times | 0.837 ± 0.028 |
3 | 4 to 15 times | 0.667 ± 0.041 |
4 | 2 or 3 times | 0.556 ± 0.049 |
5 | 1 time | 0.582 ± 0.047 |
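The post doesn't say exactly how each standard error was computed. A common approach, assumed here, is to take the sample standard deviation of the per-person bucket scores and divide it by the square root of N. A minimal Python sketch (the function name and the three scores are hypothetical, not site data):

```python
import math
import statistics


def mean_and_standard_error(scores: list[float]) -> tuple[float, float]:
    """Return the sample mean and the standard error of the mean.

    The standard error here is the sample standard deviation (n - 1
    denominator) divided by the square root of the number of scores.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate a standard error")
    mean = statistics.mean(scores)
    std_err = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, std_err


# Hypothetical proportions correct on one bucket, one per person (not real site data).
bucket_scores = [1.0, 0.8, 0.6]
mean, std_err = mean_and_standard_error(bucket_scores)
print(f"{mean:.3f} ± {std_err:.3f}")
```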
The first four buckets get increasingly difficult, as one would expect. But notice that buckets 4 and 5 are indistinguishable within the standard errors of the two means (bucket 5 even scores slightly higher).
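That ±1 standard error comparison can be made mechanical. The sketch below (hypothetical helper name `ranges_overlap`) flags adjacent buckets whose mean ± standard error ranges overlap, using the first activity's figures from the table. It is a rough visual rule rather than a significance test, and because the same people answered every bucket, the bucket scores are not independent samples.

```python
# Mean and standard error per bucket for the first activity (N=52), from the table.
ACTIVITY_1 = {
    1: (0.966, 0.008),
    2: (0.837, 0.028),
    3: (0.667, 0.041),
    4: (0.556, 0.049),
    5: (0.582, 0.047),
}


def ranges_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the ranges mean ± 1 standard error of a and b overlap."""
    (mean_a, se_a), (mean_b, se_b) = a, b
    # Two intervals of half-widths se_a and se_b overlap exactly when
    # their centres are no further apart than se_a + se_b.
    return abs(mean_a - mean_b) <= se_a + se_b


for bucket in range(1, 5):
    overlap = ranges_overlap(ACTIVITY_1[bucket], ACTIVITY_1[bucket + 1])
    print(f"buckets {bucket} and {bucket + 1}: {'overlap' if overlap else 'separated'}")
```

With these figures, only buckets 4 and 5 come out as overlapping.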
Here are the results of the next three activities of the same type.
bucket | GNT Nouns 2 (N=30) | GNT Nouns 3 (N=19) | GNT Nouns 4 (N=15) |
---|---|---|---|
1 | 0.985 ± 0.004 | 0.991 ± 0.005 | 0.985 ± 0.007 |
2 | 0.894 ± 0.020 | 0.901 ± 0.021 | 0.930 ± 0.018 |
3 | 0.631 ± 0.046 | 0.661 ± 0.039 | 0.689 ± 0.051 |
4 | 0.602 ± 0.060 | 0.570 ± 0.067 | 0.574 ± 0.059 |
5 | 0.450 ± 0.048 | 0.556 ± 0.064 | 0.611 ± 0.050 |
GNT Nouns 2 does successfully separate buckets 4 and 5 (apparently the hapax legomena in that test were harder), but it doesn’t do a great job of distinguishing buckets 3 and 4. GNT Nouns 3 fails to distinguish buckets 4 and 5, and its ±1 standard error ranges for buckets 3 and 4 still overlap slightly. GNT Nouns 4 likewise doesn’t really distinguish buckets 4 and 5 (bucket 5 actually scores slightly higher) and only barely separates 3 and 4.
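The same rough ±1 standard error rule can be applied to every adjacent bucket pair in the three later activities. A sketch in Python, with values copied from the table and hypothetical helper names; as before, it is a heuristic, not a formal test:

```python
# Mean and standard error per bucket, copied from the table for the later activities.
RESULTS = {
    "GNT Nouns 2": {1: (0.985, 0.004), 2: (0.894, 0.020), 3: (0.631, 0.046),
                    4: (0.602, 0.060), 5: (0.450, 0.048)},
    "GNT Nouns 3": {1: (0.991, 0.005), 2: (0.901, 0.021), 3: (0.661, 0.039),
                    4: (0.570, 0.067), 5: (0.556, 0.064)},
    "GNT Nouns 4": {1: (0.985, 0.007), 2: (0.930, 0.018), 3: (0.689, 0.051),
                    4: (0.574, 0.059), 5: (0.611, 0.050)},
}


def ranges_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the ranges mean ± 1 standard error of a and b overlap."""
    (mean_a, se_a), (mean_b, se_b) = a, b
    return abs(mean_a - mean_b) <= se_a + se_b


def overlapping_pairs(buckets: dict[int, tuple[float, float]]) -> list[tuple[int, int]]:
    """List the adjacent bucket pairs (b, b + 1) whose ranges overlap."""
    ordered = sorted(buckets)
    return [(b, b + 1) for b in ordered[:-1] if ranges_overlap(buckets[b], buckets[b + 1])]


for test_name, buckets in RESULTS.items():
    print(test_name, overlapping_pairs(buckets))
```

With these figures, it reports an overlap only for buckets 3 and 4 in GNT Nouns 2, for both 3/4 and 4/5 in GNT Nouns 3, and only for 4/5 in GNT Nouns 4, where the gap between the bucket 3 and bucket 4 ranges is just 0.005.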
It should be noted that the ability level of the average person doing an activity increases with each successive activity. This isn’t evident from the data presented here, but it is from other data. This is likely a selection effect: a person who has done reasonably well on one activity is more likely to go on to do more activities.
I COULD mitigate this problem by including results for the earlier activities only from people who have completed all four. But before I do that, I’d like to see more people do all four activities.
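A sketch of that mitigation, assuming results are held as a mapping from each person to the proportion correct on each activity they finished. The layout, the user IDs, the scores, and the name "GNT Nouns 1" for the first activity are all hypothetical:

```python
import statistics

# Hypothetical layout: person -> {activity name: proportion correct}.
# "GNT Nouns 1" is an assumed name for the first activity.
ALL_ACTIVITIES = ("GNT Nouns 1", "GNT Nouns 2", "GNT Nouns 3", "GNT Nouns 4")


def completed_all(results, activities=ALL_ACTIVITIES):
    """Keep only the people who have a result for every listed activity."""
    return {
        person: scores
        for person, scores in results.items()
        if all(activity in scores for activity in activities)
    }


# Illustrative records only, not real results from the site.
results = {
    "user_a": {"GNT Nouns 1": 0.92, "GNT Nouns 2": 0.88, "GNT Nouns 3": 0.90, "GNT Nouns 4": 0.91},
    "user_b": {"GNT Nouns 1": 0.45},
    "user_c": {"GNT Nouns 1": 0.70, "GNT Nouns 2": 0.66},
    "user_d": {"GNT Nouns 1": 0.80, "GNT Nouns 2": 0.78, "GNT Nouns 3": 0.81, "GNT Nouns 4": 0.79},
}

completers = completed_all(results)
# The first activity's mean, restricted to people who did all four.
print(statistics.mean(scores["GNT Nouns 1"] for scores in completers.values()))
```

Filtering this way removes the selection effect between activities at the cost of a smaller N for the earlier ones.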
Furthermore, the vast majority of people doing these activities are scoring above 50%, and, in fact, no one scoring below 40% has attempted any activity beyond the first. I NEED MORE BEGINNER-INTERMEDIATE LEVEL PEOPLE to do all four tests! Their results will better discriminate items of mid-to-hard difficulty (more on that concept later).
But preliminary indications are that I haven’t quite got the buckets right yet. Fortunately, I can re-run the analyses with different bucketing, even though the distribution of items chosen for the tests is based on the existing bucketing scheme.
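Re-bucketing only needs each item’s GNT occurrence count, so it can be done after the fact. A sketch, assuming per-item results are available as (occurrence count, proportion correct) pairs; the alternative scheme that merges buckets 4 and 5, the helper names, and the sample items are hypothetical:

```python
import statistics
from collections import defaultdict

# Each scheme lists the minimum GNT occurrence count for buckets 1, 2, 3, ...
CURRENT_SCHEME = (32, 16, 4, 2, 1)  # the five buckets used by the activities
MERGED_RARE = (32, 16, 4, 1)        # hypothetical alternative: buckets 4 and 5 merged


def bucket_for(count: int, lower_bounds: tuple[int, ...]) -> int:
    """Return the 1-based bucket whose lower bound the occurrence count first meets."""
    for bucket, bound in enumerate(lower_bounds, start=1):
        if count >= bound:
            return bucket
    raise ValueError(f"count {count} is below every bucket's lower bound")


def mean_by_bucket(item_results, lower_bounds):
    """Average the per-item proportion correct within each bucket of a scheme."""
    groups = defaultdict(list)
    for gnt_count, proportion_correct in item_results:
        groups[bucket_for(gnt_count, lower_bounds)].append(proportion_correct)
    return {bucket: statistics.mean(values) for bucket, values in sorted(groups.items())}


# Hypothetical items: (occurrences in the GNT, proportion of people answering correctly).
items = [(120, 0.98), (20, 0.90), (8, 0.70), (3, 0.55), (1, 0.60)]
print(mean_by_bucket(items, CURRENT_SCHEME))
print(mean_by_bucket(items, MERGED_RARE))
```

Because a scheme is just a tuple of thresholds, trying another bucketing means passing a different tuple; the items themselves don’t change.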
I’ll continue to blog more statistics over time. Some topics I’d like to explore include inter-test reliability, G-theory, ANOVA, and IRT modeling.
Thank you to everyone who is contributing to this. Please spread the word!