For the last three months, I’ve been working on rebasing the MorphGNT database onto the SBLGNT text rather than the UBS3.

While I have had permission to work with the CCAT database for over a decade, the fact that the UBS3 text can be extracted from it has always been problematic. The existence of the SBLGNT solves the problem of having a critical text with clear licensing and so, in October 2010, I started the process of moving the MorphGNT analysis to the SBLGNT text.

This task is mostly done and the work-in-progress is available on GitHub at https://github.com/morphgnt/sblgnt.

It was a three-step process, done one book at a time.

  • A Python script was used to do a first-pass alignment. The script allowed for differences in punctuation, accentuation, capitalization and movable-nu.
  • Any differences were then manually inspected and corrected. In 90% of cases it was a simple re-ordering of words but in the other 10%, a fresh analysis had to be made. These analyses were then checked against various sources such as BDAG, Perseus and the Lexham Reverse Interlinear.
  • Finally, I wrote another Python script that checked various heuristics.
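
To give a feel for the first step, here is a minimal sketch of how such a first-pass alignment might look. It is not the actual script: the function names `normalize` and `align` are hypothetical, the movable-nu rule is a deliberately crude heuristic, and the use of `difflib.SequenceMatcher` is my assumption for illustration. The idea is to compare words on a normalized key that ignores punctuation, accentuation, capitalization and movable-nu, and to queue anything that still differs for manual inspection.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(word):
    """Reduce a Greek word to a comparison key (hypothetical helper)."""
    # NFD splits accents and breathings off as combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Keep letters only: this drops combining marks and punctuation.
    letters = "".join(
        ch for ch in decomposed if unicodedata.category(ch).startswith("L")
    )
    # Ignore capitalization and the final/medial sigma distinction.
    key = letters.lower().replace("ς", "σ")
    # Crude movable-nu heuristic (an assumption): drop a final nu after
    # epsilon or iota on longer words, so that ἐστίν and ἐστί compare equal.
    if len(key) > 3 and key.endswith(("εν", "ιν")):
        key = key[:-1]
    return key


def align(old_words, new_words):
    """Pair up words whose keys match; return the rest for manual review."""
    old_keys = [normalize(w) for w in old_words]
    new_keys = [normalize(w) for w in new_words]
    matcher = SequenceMatcher(a=old_keys, b=new_keys, autojunk=False)
    matched, to_review = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Index pairs (position in old text, position in new text).
            matched.extend(zip(range(i1, i2), range(j1, j2)))
        else:
            # Anything else (reordering, added or dropped words) needs a human.
            to_review.append((old_words[i1:i2], new_words[j1:j2]))
    return matched, to_review


if __name__ == "__main__":
    old = "Ἐν ἀρχῇ ἦν ὁ λόγος,".split()
    new = "ἐν ἀρχῇ ἦν ὁ λόγος".split()
    matched, to_review = align(old, new)
    print(len(matched), "words aligned;", len(to_review), "spans to review")
```

In a sketch like this, matched words would carry their existing analysis across unchanged, while each span in `to_review` corresponds to the kind of difference handled by hand in the second step.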

I’m in the process of making a batch of corrections based on the third step and then I’ll formally release what will be called MorphGNT 6.0 (although possibly as a beta such as 6.0b1).

The next step (which I’ve started in parallel) will merge in the Robinson analysis and parse codes on the road to a completely new set of parse codes for MorphGNT 7.0.


originally published on morphgnt.org