A post to the graded-reader mailing list from March 23, 2008.

Welcome to the graded-reader mailing list.

I’ve been getting a lot of queries in response to my presentation, so I thought I’d start a mailing list where we can all discuss questions and issues together.

I also plan to make available the code that I’m using to produce the graded reader. Because it’s closely tied to the particular text and linguistic data I’m currently dealing with, it will take some time to make generic, but I plan to release it incrementally based on your feedback.

I want to spend some time going through my current approach and explaining the different components and the ideas behind them. For the most part, these ideas can be used independently of one another, so if you don’t like one aspect of what I’ve done, you can still make use of the others. I’m also still improving things in lots of different ways and, of course, I look forward to a lot of new ideas coming from this list.

Because the video presentation actually doesn’t show much in terms of results, I’ve uploaded two files that will give you a flavour of the current state of my work.

You can get to these files at http://groups.google.com/group/graded-reader

example-reader.html shows the first 50 word forms output by the current version of my software when run on the Greek text of John’s gospel.

greek_2.pdf shows lesson 2 of an informal course I’m running for a couple of friends which uses the graded reader approach.

You’ll notice (1) there is a lot of extra information in the lesson given to students; (2) the order in which words are presented is different.

There are three reasons for the difference in order:

  1. the ordering in lesson 2 was hand-tweaked from what the software originally produced
  2. the lesson 2 ordering was produced by an earlier version of the ordering algorithm than the one used for example-reader.html
  3. example-reader.html used slightly more linguistic information (in particular, it knew about some verb endings) when generating the ordering

Note that the eventual goal is to do no tweaking at all, but rather to capture in both the software and the input data the criteria that motivated the manual reordering in the first place.

I’ll send separate posts discussing different aspects of what goes into producing the automated output.