I’m working with Seumas Macdonald on an open, corrected digital edition of the Apostolic Fathers based on Lake.

Seumas asked me a few weeks ago what it would take to expand some of our text and vocab ordering experiments to the text of the Apostolic Fathers (we both want more comprehensible input for Greek learners).

My reply was that we first of all needed to get a good open text and then lemmatise it. I thought the “get a good open text” part would be trivial, but it turned out not to be.

I asked around without much positive response. I found HTML versions of the Lake texts on the Christian Classics Ethereal Library (CCEL) website but they turned out to be problematic quality-wise (see below).

It then occurred to me to check what was in the Perseus Digital Library. It only had the Epistle of Barnabas but the related First 1000 Years of Greek at the Open Greek and Latin Project had done the rest.

The Perseus/OGL texts were considerably better than the CCEL ones, but were still not without problems. It was clear that the two collections had been produced independently, however, which is important for what follows.

I’m almost certain the CCEL texts were keyed in. There is haplography and dittography galore! The haplography even corresponds almost perfectly to line breaks in the printed Lake editions I looked at.

The Perseus/OGL texts, on the other hand, are the results of OCR with some manual correction.

I wrote some code to extract both the CCEL and Perseus/OGL texts and put them in a comparable format. I then wrote a script to align the two. My thinking was to go through all the places where the two disagreed, check the printed Lake and correct the Perseus/OGL text accordingly.
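To give a flavour of what aligning two texts and listing their disagreements can look like, here is a minimal sketch using Python's standard `difflib`. It is not the project's actual code: the function names `tokens` and `disagreements` are made up for illustration, and it assumes both texts have already been extracted to plain whitespace-separated words.

```python
import unicodedata
from difflib import SequenceMatcher


def tokens(text):
    # Normalise to NFC so precomposed and combining accents compare equal,
    # then split on whitespace (an assumed, very simple tokenisation).
    return unicodedata.normalize("NFC", text).split()


def disagreements(base, other):
    """Return (tag, base_words, other_words) for every non-matching span."""
    a, b = tokens(base), tokens(other)
    # autojunk=False stops difflib from treating very frequent words
    # (articles, particles) as junk in long texts.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    base = "ἡ ἐκκλησία τοῦ θεοῦ ἡ παροικοῦσα Ῥώμην"
    other = "ἡ ἐκκλησία τοῦ τοῦ θεοῦ ἡ παροικοῦσα Ῥώμην"  # a repeated word
    for tag, ours, theirs in disagreements(base, other):
        print(tag, ours, theirs)
```

Each tuple it returns is a place to open the printed Lake and check; the script itself makes no judgement about which reading is right.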

I decided to throw the Lake text from Logos into the mix as well, not as an input to the correction itself but merely as another “edition” to flag differences against (which we then check against the printed Lake).

Thus began a project Seumas and I have been working on for the last few weeks. Once differences between any of the three texts are identified, they are flagged for review and Seumas and I independently look at the printed Lake and correct the Perseus/OGL base text.

If our corrections disagree, we continue to work on them until we reach consensus. This three-way comparison followed by two-way independent correction is proving to work very well (although it’s a lot of work!).
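The logic of that workflow can be sketched in a few lines of Python. Again, this is only an illustration under assumptions, not the code in the repository: it supposes each text has been split into passages keyed by a reference string such as `"1.2"`, and the names `flag_for_review` and `unresolved` are hypothetical.

```python
import unicodedata


def normalise(text):
    # NFC-normalise and collapse whitespace so trivial differences
    # are not flagged as disagreements.
    return " ".join(unicodedata.normalize("NFC", text).split())


def flag_for_review(witnesses):
    """witnesses maps a witness name to {ref: text}.

    Returns the refs (sorted as strings) where the witnesses do not all
    give the same reading, or where a witness lacks the passage.
    """
    refs = set()
    for passages in witnesses.values():
        refs.update(passages)
    flagged = []
    for ref in sorted(refs):
        readings = {
            normalise(passages[ref]) if ref in passages else None
            for passages in witnesses.values()
        }
        if len(readings) > 1:
            flagged.append(ref)
    return flagged


def unresolved(corrections_a, corrections_b, flagged):
    """Refs where two independent correctors have not yet agreed."""
    return [
        ref
        for ref in flagged
        if normalise(corrections_a.get(ref, ""))
        != normalise(corrections_b.get(ref, ""))
    ]
```

The point of keeping the two steps separate is that the first only decides where to look, while the second only closes a passage once both independent corrections match.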

All the code, the source texts (except Logos), and work-in-progress are available at


and you can follow the status in the README. There are also more detailed notes on the whole process.

Once the candidate versions of all the texts are published, I’ll do another post just with some interesting statistics on the nature of errors in the CCEL, Perseus/OGL, and Logos texts. The “scribal errors” in the CCEL text are particularly fascinating but even some of the Perseus/OGL OCR errors will be worth writing about.

Seumas and I will then contribute the corrections back to CCEL, Perseus/OGL, and Logos. Hopefully our texts will also be featured on the Biblical Humanities Dashboard as the go-to open digital text of the Apostolic Fathers (so no one else has to repeat this effort).

Finally, we’ll start the process of lemmatisation so the Apostolic Fathers can be included in our open learning materials.