Working on Plato Texts
I started working on some Plato texts a while ago but now I’m back to it, integrating various information and hitting some more issues with the Diorisis corpus.
Ordering Vocabulary by Pericope Dispersion
Jesse Egbert’s Plenary at JAECS 2020 is giving me a bunch of ideas of things to try on the New Testament and larger Greek corpora. In this post, I briefly explore text dispersion keyness using pericopes as a way of ordering vocabulary.
More on Plato After GNT
In the previous post, we looked at lemma and token coverage in the works of Plato assuming knowledge of Greek New Testament vocabulary. Here we graphically look at those results and make an important observation.
Plato Vocabulary Coverage After the New Testament
Seumas Macdonald asked me about vocabulary coverage for each work of Plato assuming one has learnt the New Testament vocabulary.
A Tour of Greek Morphology: Part 48
Part forty-eight of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 47
Part forty-seven of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 46
Part forty-six of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Picking the Words for Greek Typing
Last week, we launched greektyping.com to help people get better at typing Greek. Aurélien Berra asked what the method of choosing words to type was so I thought I’d write a blog post about it.
A Tour of Greek Morphology: Part 45
Part forty-five of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Polytonic Greek Typing Tutor
I’ve revived an old web application to help people practice typing Ancient Greek.
A Tour of Greek Morphology: Part 44
Part forty-four of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 43
Part forty-three of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Lemmatization for the Morphological Lexicon
As I slowly expand my plans for a Morphological Lexicon of New Testament Greek to a Morphological Lexicon of Ancient Greek, I’m dealing with extra challenges in lemmatization.
A Tour of Greek Morphology: Part 42
Part forty-two of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Tool Updates
I have made a minor update to greek-normalisation, a more significant update to vocabulary-tools, and have started a new project postag-convert for converting between various morphosyntactic tagging schemes.
A Tour of Greek Morphology: Part 41
Part forty-one of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Tolkien, Sonnenschein and Language Learning
Via an unusual route, I discovered Edward Adolf Sonnenschein and his thoughts at the turn of the 20th century on teaching Latin (and Greek).
Vanessa Gorman's Lemmatisation Now in vocabulary-tools
Last year I started the Python library vocabulary-tools to consolidate the various scripts I’ve written over the years to analyse vocabulary in (particularly New Testament) texts. I’ve just added support for the vocabulary in Vanessa Gorman’s treebanks.
A Tour of Greek Morphology: Part 40
Part forty of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Working with the Diorisis Ancient Greek Corpus
I’ve recently started working on cleaning up the Diorisis Ancient Greek Corpus for my own vocabulary and morphology work as well as potential use in Scaife.
A Tour of Greek Morphology: Part 39
Part thirty-nine of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 38
Part thirty-eight of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 37
Part thirty-seven of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 36
Part thirty-six of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 35
Part thirty-five of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 34
Part thirty-four of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 33
Part thirty-three of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 32
Part thirty-two of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 31
Part thirty-one of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 30
Part thirty of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Release of text-validator 0.3
A few weeks ago, I announced the first release of text-validator, my pluggable command-line tool for validating the formatting and orthography of text files.
A Tour of Greek Morphology: Part 29
Part twenty-nine of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Mounce on Ablaut (Or Not)
Mounce’s Basics of Biblical Greek Grammar is a very popular modern textbook, with over 400,000 copies sold and now in its fourth edition. There’s a lot one could quibble with around the usual suspects of deponency, aspect, or the general grammar-translation approach but it’s particularly odd when basic (and, as far as I know, uncontroversial) terminology is misused or misunderstood. I’m talking in particular about the way “ablaut” is discussed.
Dictionary Markup versus Lexical Modelling
This year I’ve been thinking about (and working on) the representation of lexical information quite a bit.
Release of text-validator 0.1
I’ve released a first version of a pluggable command-line tool for validating the formatting and orthography of text files.
Off to the UCLA Indo-European Conference Again
Today I’m heading off to Los Angeles to attend the Thirty-First Annual UCLA Indo-European Conference.
Subcorpus Vocabulary Statistics
Long-time readers of this blog know that, along with morphology, a core research area of mine is vocabulary. Prompted by Seumas Macdonald and now as part of the Greek Texts Project, I started putting together some vocabulary coverage statistics for various subcorpora of Greek prose.
New Blog Platform
Following on from my success with it on the Digital Tolkien Project website, I decided to switch to using Jekyll for the generation of jktauber.com as a static site.
Greek Texts Project
A twitter conversation led to the creation of a new project to work on annotated Greek texts for language learners.
Release of greek-normalisation 0.3
In the last couple of weeks I’ve done a couple of minor releases of the greek-normalisation Python library which brings together various code I use to clean up Greek texts and normalise the forms.
Summer Conferences
Here are the conferences I’m attending (and in some cases, presenting at) in June through August. I probably should have posted this at the start of my conference travel, but here it is.
Release of greek-normalisation 0.1
For years I’ve had Python code for normalising Greek forms, checking for stray characters, etc. I finally got around to consolidating them in a library.
A Tour of Greek Morphology: Part 28
Part twenty-eight of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Consolidating Vocabulary Coverage and Ordering Tools
One of my goals for 2019 is to bring more structure to various disparate Greek projects and, as part of that, I’ve started consolidating multiple one-off projects I’ve done around vocabulary coverage statistics and ordering experiments.
Initial Apostolic Fathers Text Complete
Exactly three months ago to the day, I announced that Seumas Macdonald and I were working on a corrected, open, digital edition of the Apostolic Fathers based on Lake. That initial work is now complete.
More Thoughts on Different Morphological Analyses
In Five Types of Morphological Analysis I outlined five distinct ways of approaching morphological (or potentially any linguistic) analysis. In support of some of these, I have some additional examples from a pair of papers I’m reading and a conference I just attended.
Five Types of Morphological Analysis
People talking about morphological analyses can often speak across each other because they have different purposes in mind. Here’s an initial attempt to outline five possibly distinct notions one might be referring to.
Preparing an Open Apostolic Fathers
I’m working with Seumas Macdonald on an open, corrected digital edition of the Apostolic Fathers based on Lake.
A Tour of Greek Morphology: Part 27
Part twenty-seven of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 26
Part twenty-six of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Back from International Colloquium on Ancient Greek Linguistics
Last week I attended the ninth International Colloquium on Ancient Greek Linguistics at the University of Helsinki.
A Tour of Greek Morphology: Part 25
Part twenty-five of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 24
Part twenty-four of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
The Normalisation Column in MorphGNT
Eliran Wong asked for a more detailed description of the “normalisation” column in MorphGNT so I promised him I’d write a blog post about it.
A Tour of Greek Morphology: Part 23
Part twenty-three of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 22
Part twenty-two of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
First Impressions of John Lee’s Accents Book
John Lee’s Basics of Greek Accents was released today. Here are some first impressions.
Conference Time
I’m off for another string of conferences, this time in Copenhagen, Chicago, and New Orleans.
A Tour of Greek Morphology: Part 21
Part twenty-one of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 20
Part twenty of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
New Draft Morphological Tags for MorphGNT
I’ve finally done the work in translating the MorphGNT tagging system to a new proposal for initial feedback.
Lexical Dispersion in the Greek New Testament Via Gries’s DP
Measures of dispersion are interesting to apply to a corpus because they tell you whether a word is distributed across parts of the corpus as expected or concentrated more in just some parts. I thought I’d play around with Gries’s DP as a measure of dispersion on the SBLGNT lemmas.
Some Unix Command Line Exercises Using MorphGNT
I thought I’d help a friend learn some basic (although pretty comprehensive for this type of work) Unix command line with some practical graded exercises using MorphGNT. It worked out well so I thought I’d share them in case they are useful to others.
SBL Papers Now Online
I’ve put my two SBL papers this year (from both the recent Annual Meeting and the International Meeting) online and also sync’d my Annual Meeting slides to audio I recorded on my iPhone.
Speaking at SBL 2017 on Linking Lexical Resources
I’m again speaking at the SBL Annual Meeting, this time in Boston. My topic is basically the “lemma lattice” work started by Ulrik Sandborg-Petersen and me back in 2006 but which I’ve never presented in this sort of setting before.
Four Types of But
In his talk on adversative conjunction in Gothic at the 29th UCLA Indo-European Conference, Jared Klein started with a wonderful example paragraph in English.
A Tour of Greek Morphology: Part 19
Part nineteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Off to the UCLA Indo-European Conference
Tomorrow I’m off to Los Angeles for the Twenty-Ninth Annual UCLA Indo-European Conference.
A Tour of Greek Morphology: Part 18
Part eighteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 17
Part seventeen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
pyuca 1.2 Released with Support for New Versions of Unicode
pyuca is my pure-Python implementation of the Unicode Collation Algorithm—a library I use almost every day to properly sort Greek (although the library is not Greek-specific). I was recently asked how to use pyuca with a more recent DUCET than 6.3.0. That led to me needing to make a number of changes to the core code so it now supports 8.0.0, 9.0.0 and 10.0.0 as long as you have the right Python version.
A Tour of Greek Morphology: Part 16
Part sixteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 15
Part fifteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
More Vocabulary Statistics
With a boost in numbers on vocab.oxlos.org, this post looks at some slightly more detailed statistics from the first activity.
A Tour of Greek Morphology: Part 14
Part fourteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Some Initial Vocabulary Statistics
Here are some very preliminary statistics from the Greek Vocab site’s first month.
Greek Letter Frequencies
I recently saw a nice visualisation of English letter bigram frequencies and decided to replicate it with Greek New Testament data.
A Tour of Greek Morphology: Part 13
Part thirteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 12
Part twelve of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Speaking in Berlin
This afternoon I’m heading off to Berlin for my first Society of Biblical Literature International Meeting, where I’ll be speaking on adaptive reading environments for Biblical Greek.
First Week of New Vocab Site
Last week I launched a site for Greek vocabulary. Here’s how the first week has gone.
A Tour of Greek Morphology: Part 11
Part eleven of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 10
Part ten of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
NT Book Similarity by Jaccard Distance of Lemma Sets
I was thinking about vocabulary differences between books of the New Testament and decided to see what happens when you do a hierarchical clustering analysis of NT books using the Jaccard distance of their lemma sets.
New Site for Vocabulary Experiments
I’ve put together a new little site to host various activities to research vocabulary knowledge and acquisition in the context of Ancient and Biblical Greek.
A Tour of Greek Morphology: Part 9
Part nine of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 8
Part eight of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Man Walks Into A Bar
I’ve thought for a while that “A man walks into a bar” jokes are a great example of how definiteness works in English. I mentioned this to Jonathan Robie in Cambridge and he seemed to like the example too so I thought I’d share it more broadly.
A Tour of Greek Morphology: Part 7
Part seven of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 6
Part six of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Categories of Reader Work
I sometimes get people expressing an interest in my Greek reader work or get asked about the status of my “reader” and I have to ask them to clarify which reader they mean. I thought I might do a quick post where I spell out various “reader” projects I have worked on and am working on.
A Tour of Greek Morphology: Part 5
Part five of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 4
Part four of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Collapsible Treedown
Jonathan Robie’s Treedown format is a really nice way of conveying basic syntactic structure in real texts. I recently experimented a little with some code for collapsing and expanding of the structure.
A Tour of Greek Morphology: Part 3
Part three of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
A Tour of Greek Morphology: Part 2
Part two of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).
Another European Trip
I was here last month but I’m back again for a series of conferences and then my graduation.
A Tour of Greek Morphology: Part 1
This is the first post in a (likely long) series exploring the inflectional morphology of Greek. My goal is to work through various aspects of Greek morphology to help students think more systematically about the subject.
Modelling Derivational Morphology
While most of my focus has been on inflectional morphology, I’ve done a little bit of work on modelling derivational morphology and it’s been a desideratum for my reader and learning algorithm work dating back to at least the original 2008 "New Kind of Graded Reader" presentations.
Comparing Analyses from Herodotus
An analysis I did of a couple of chapters of Herodotus looks like it might be an interesting example to use for various treebanking approaches—both in terms of how things are structured as well as how they are visualised.
Headed to Germany Next Week
Next week I’m headed to Germany for a whirlwind trip to Göttingen, Heidelberg, and Leipzig to share and discuss ideas with other scholars.
Handling Morphological Ambiguity
On my now page, I currently list “finalising an improved set of morphology tags to use” under Medium Term. As I find myself sometimes having to clarify the motivation for and state of this, I thought I’d share what I just wrote in the Biblical Humanities Slack.
An Initial Reboot of Oxlos
In a recent post, Update on LXX Progress, I talked about the possibility of putting together a crowd-sourcing tool to help share the load of clarifying some parse code errors in the CATSS LXX morphological analysis. Last Friday, Patrick Altman and I spent an evening of hacking and built the tool.
Analysing the Verbs in Nestle 1904
The last couple of weeks, I’ve been working on getting my greek-inflexion code working on Ulrik Sandborg-Petersen’s analysis of the Nestle 1904. The first pass of this is now done.
Update on LXX Progress
As mentioned in previous posts, I’ve been working through the LXX, initially making sure my greek-inflexion library can generate the same analysis of verbs as the CATSS LXX Morphology and adding to the verb stem database accordingly. This is a preliminary to being able to run the code on alternative LXX editions such as Swete and provide a freely available morphologically-tagged LXX.
New MorphGNT Releases and Accentuation Analysis
Over the last few weeks, I’ve made a number of new releases of the MorphGNT SBLGNT analysis fixing some accentuation issues mostly in the normalization column. This came out of ongoing work on modelling accentuation (and, in particular, rules around clitics).
First Pass of MorphGNT Verb Coverage and LXX Beginnings
In greek-inflexion and an Update on the Morphological Lexicon I said that all the verbs in the MorphGNT SBLGNT analysis should be done by the end of the year. I hit that goal and made a decent start on the Septuagint.
Diacritic Stacking in Skolar PE Fixed
Back in Polytonic Greek Unicode Still Isn’t Perfect and An Updated Solution to Polytonic Greek Unicode’s Problems I talked about problems with stacking vowel length and other diacritics. At least in terms of the font used on this site, the problems are now solved.
greek-inflexion and an Update on the Morphological Lexicon
Exactly seven months ago, I released a generic library, inflexion, and said I’d soon follow it up with the Greek-specific stuff. While I did open-source the latter on GitHub as greek-inflexion shortly thereafter, I didn’t want to announce it here until it was further along. I’m happy to say it now is.
More on Diagramming Greek Accent Placement
I’ve put together slides and a voice-over to further explain Greek accent placement from a moraic point-of-view.
greek-accentuation 1.0.4 Released
Three weeks ago I fixed a few bugs in greek-accentuation and ended up doing three releases (although I only blogged about two at the time). I’ve now done a fourth bug fix release: 1.0.4.
Diagramming Greek Accent Placement
Cleaning up code as part of another bug fix to greek-accentuation led me to update an old diagram I’d done showing the Greek accentuation possibilities in terms of morae.
greek-accentuation 1.0.2 Released (and How Persistent Accentuation Works)
Hot on the heels of the 1.0.1 bug fix, I’ve released 1.0.2 with another fix, this time in the persistent accent placement. So I thought I’d explain how persistent accent placement is implemented and what the bug was.
greek-accentuation 1.0.1 Released
A minor bug fix release that fixes a problem with add_necessary_breathing.
Thoughts on Voice
Occasionally I get into conversations about the Greek middle (or voice in general) but I’ve never written down my thoughts on the topic. Here’s an attempt to summarize my current thinking, although there’s nothing particularly novel about it.
greek-accentuation 1.0.0 Released
greek-accentuation has finally hit 1.0.0 with a couple more functions and a module layout change.
More Parsing of the DCC Principal Parts
This is part 7 of a series of blog posts about modelling stems and principal part lists and looks in even more detail at the format of the principal parts list in the DCC verbs.
Parsing the DCC Principal Parts
This is part 6 of a series of blog posts about modelling stems and principal part lists and looks more precisely at the format of the principal parts list in the DCC verbs.
Formatting of Principal Parts
This is part 5 of a series of blog posts about modelling stems and principal part lists and covers the format of the principal parts themselves in the Pratt, Morwood and DCC verb lists.
Merging the DCC Lemmas
This is part 4 of a series of blog posts about modelling stems and principal part lists and covers the Dickinson College Commentaries (DCC) Greek Core lemmas and issues in merging them with the existing merge of Pratt and Morwood.
Merging the Morwood and Pratt Lemmas
This is part 3 of a series of blog posts about modelling stems and principal part lists and covers the Morwood lemmas and issues in merging them with Pratt’s.
Lemmas in the Pratt Principal Parts
This is part 2 of a series of blog posts about modelling stems and principal part lists and covers the complexities in the notion of a lemma identifying lexical entries, specifically in the Pratt principal parts.
Sources of Principal Part Lists
This is part 1 of a series of blog posts about modelling stems and principal part lists and covers the three sources of Attic Greek principal parts used to expand and test the Morphological Lexicon.
Modelling Stems and Principal Part Lists
This is part 0 of a series of blog posts about modelling stems and principal part lists, particularly for Attic Greek but hopefully more generally applicable. This is largely writing up work already done but I’m doing cleanup as I go along as well.
pyuca Published in The Journal of Open Source Software
A research career requires publication in peer-reviewed journals but what if some of your scholarly output is in the form of software? The Journal of Open Source Software attempts to solve that by essentially wrapping peer-reviewed software packages up as lightweight papers. My pyuca library was just accepted for publication by the journal.
Varro’s Four Parts of Speech for Latin
In my post Morphological Parts of Speech in Greek last year, I presented a model of five or six parts of speech based purely on what they inflect for. I just found out Varro suggested something similar for Latin over two thousand years ago.
Inflexion: Generic Code for Morphological Generation and Parsing
Over the last few years, I’ve worked on a number of iterations of code that can generate Ancient Greek verb forms. I’ve now broken out the Greek-specific pieces and released a generic library called inflexion.
17th International Morphology Meeting
I’m currently in Vienna for the International Morphology Meeting.
An Updated Solution to Polytonic Greek Unicode’s Problems
In Polytonic Greek Unicode Still Isn’t Perfect, I enumerated various challenges that still exist with using Polytonic Greek when vowel length needs to be marked. I now have a better appreciation of what solutions are actually realistic.
Polytonic Greek Unicode Still Isn’t Perfect
Whether we’re talking about fonts, programming languages, keyboard entry or even the command-line, support for polytonic Greek has greatly improved even in the last 10 years, let alone the 23 years I’ve been doing computational analysis of Greek texts.
greek-utils 0.1 Released
While I write and release a lot of Python code for working with Ancient Greek, it tends to be either throwaway code for data wrangling or fairly specialized code for things like accentuation or inflectional morphology.
Direct Speech Capitalization and the First Preceding Head
As part of my explicit annotation of the normalization column in MorphGNT, I started down the rabbit hole of capitalization conventions which led to an interesting experiment with direct speech and the GBI syntax trees.
MorphGNT 6.07 Released
The latest release of MorphGNT (with a corresponding release of the Python library py-sblgnt) fixes some lemmatization issues along with a couple of accent and part-of-speech changes.
Gouin on Language Learning
I recently found out about François Gouin, a sort of proto-Charles Berlitz who wrote (in French) a book called The art of teaching and studying languages, published in 1880 and then translated and published in English in 1892.
Off to the Linguistic Society of America’s 90th Annual Meeting
I’m heading off to the LSA’s annual meeting for the first time.
Functional Dependency in the MorphGNT Table
Often it’s useful to see whether certain columns in a table can be entirely determined by others. For example, can you unambiguously get the lemma from just the form (the answer is no, so a more useful question is which forms are ambiguous as to lemma)? Does knowing the part-of-speech help? Here we provide some code and give some examples.
A (Not So) New Numbering System for Greek New Testament Lexemes
Ten years ago, when Ulrik Sandborg-Petersen and I started collaborating, we came up with a way of referencing lexemes that would satisfy both the lumpers and splitters. At the time we wrote a paper that we circulated to a small audience but now it’s finally up on Academia.edu.
Annotating the Normalization Column in MorphGNT: Part 1
Since the Series-6 release, MorphGNT has had a column that normalizes the word forms in the text for contextual things like accent changes, elision, movable nu and capitalization. I thought it would be useful to provide an annotation of exactly what normalization had been done for each word in the text and why.
Back to a More Sustainable Blogging Pace
Well, I did it! I blogged a post for every day in the four weeks leading up to my talk at SBL. It was a fantastic motivator but I can’t sustain the pace.
A Morphological Lexicon of New Testament Greek: My SBL 2015 Slides
This morning I gave my talk at SBL 2015 on my Morphological Lexicon project.
Analyzing Verbal Morphology: Part 1
In anticipation of my SBL talk tomorrow, here’s an update on my verbal analysis.
Greek Accentuation Library
I knew that a necessary component of a comprehensive morphological analyzer for Ancient Greek was going to be a library for handling accentuation, so back in January 2014, I started the greek-accentuation Python library.
The Dangers of Reconstructing Too Much Morphophonology
What is the genitive singular ending for 2nd declension nouns?
Full Citation Forms and Inflectional Classes
Back in July and August 2014, I started looking at patterns in the full citation forms of nouns in Danker’s Concise Lexicon. My goal was partly to explore, in a systematic way, the relationship between inflectional classes and the information expressed in the common pattern of {nominative form}, {genitive ending}, {article}. I also wanted to put together a kind of automated test to catch typos and inconsistencies in the lexicon.
Modern Greek Text to Speech for Biblical Greek
Text-to-speech is pretty good these days but a lot of people don’t realize that operating systems like OS X have support for languages other than English, including Modern Greek. So I thought I’d experiment with using it to read the Greek New Testament.
Actual Core Vocab Lists for Greek New Testament
Back in The Core Vocabulary of New Testament Greek I talked about Wilfred Major’s 2008 paper on core vocabulary lists for Classical Greek and provided code for producing the same for the Greek New Testament along with some discussion of the results. I didn’t actually include the full results, however.
First Prototype of New Online Reader
Over in the lab section of this site, I’ve added a little prototype Patrick Altman and I built last night.
Analyzing Nominal Morphology: Part 2
In Analyzing Nominal Morphology: Part 1, I talked about putting together a list of nominal distinguishers and verifying it on the MorphGNT, generating a per-lexeme theme + distinguisher analysis. Here, I’ll outline some further steps I’ve taken.
Initial Thoughts on the Cost of Learning a Form
Over the years, when generating vocab coverage stats or orderings for graded readers, I’ve used either lemmas or inflected forms as the items being learnt.
Analyzing Nominal Morphology: Part 1
While much of my work going back 10 years or more was on the nominals, the last few years I’ve been focused on verbal morphology. I decided that for my SBL paper, however, I’d revisit some of my noun work and ended up exploring some ideas afresh.
Technical Aspects of Openness
In my previous post, I talked about the legal / licensing aspects of open linguistic data but there are also technical aspects to making linguistic data open.
Why I Use CC-BY-SA Licenses
I don’t think I’ve ever articulated why I favour a Creative Commons CC-BY-SA license on all my New Testament Greek data.
Mean Log Frequency of Dependency Paths
Adding another potential readability metric, let’s look at the mean log frequency of dependency paths.
At the Half Way Point
Exactly two weeks ago I said I’d be blogging every day until my talk at SBL. Well, that’s two weeks away so I’m at the half way point. I think the blogging has gone well.
Generating Readers
Back in April 2014, Brian Renshaw posted a Good Friday Greek Reader. It was presumably manually produced but I knew such things could be generated automatically and so went about building a system to do so.
Inline Annotation of Sandhi
In many Greek morphology projects, I’ve wanted a way of conveying the surface form of an inflected word while also conveying the underlying components prior to the application of the sandhi rule. A couple of years ago, I came up with a simple representation for inline annotation.
Morphological Parts of Speech in Greek
The parts of speech in a particular language can be drawn up on the basis of syntactic properties, morphological properties, and/or (perhaps most problematically) semantic properties.
Mean Log Frequency of Forms
In a previous post, we looked at which chapters had the highest mean log frequency of lexemes. The code provided there was applicable to other items, though, so let’s now take a look at mean log frequency of forms.
Distinguishers in Morphology
A few years ago, I was introduced by Greg Stump to the notion of distinguishers in morphological description. The analysis of inflected forms in terms of theme + distinguisher is a very helpful concept and one that I make extensive use of in my ongoing work on New Testament Greek morphology.
Atom Editor 1.1 Fixes Polytonic Greek Bug
Release 1.1 of GitHub’s Atom Editor fixes a problem I had with using it for polytonic Greek.
Renaming Non-Indicative Tense-Forms
I think it’s confusing that we name the non-indicative tense-forms with the same terms as indicative tense-forms, for example “present indicative” and “present infinitive”. The word “present” doesn’t mean the same thing in both cases.
An Experimental REST API to MorphGNT
Back in July, I thought I’d prototype a REST API for MorphGNT with resources for books, paragraphs, sentences, verses and words.
The Core Vocabulary of New Testament Greek
In a 2008 paper, Wilfred Major constructs what he calls the 50% and 80% vocab lists for Classical Greek. That is, the lemmata that account for 50% and 80% respectively of tokens in the Classical Greek corpus. In this post I provide the code for the equivalent for the Greek New Testament and talk about some of the results.
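The idea of an N% list can be sketched in a few lines. This is only an illustration under simple assumptions (a flat list of lemmata, one per token), not the code from the post, and `core_list` is a made-up name.

```python
from collections import Counter

def core_list(lemmas, threshold):
    """Return the most frequent lemmata whose cumulative token count
    first reaches `threshold` (a fraction between 0 and 1)."""
    counts = Counter(lemmas)
    total = sum(counts.values())
    selected, covered = [], 0
    for lemma, count in counts.most_common():
        if covered >= threshold * total:
            break
        selected.append(lemma)
        covered += count
    return selected

# Toy token stream, invented for illustration: 5 x "a", 3 x "b", 2 x "c".
tokens = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
print(core_list(tokens, 0.5))
print(core_list(tokens, 0.8))
```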
Mean Dependency Depth
With dependency paths calculated for the Greek New Testament, we can use mean dependency depth as a proxy for syntactic complexity.
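To make the measure concrete, here is a small sketch. It assumes each sentence is given as a list of head indices (with `None` for the root), which is an assumption for the example and not necessarily how the GNT data is stored.

```python
def depths(heads):
    """heads[i] is the index of word i's head, or None for the root.
    Returns the number of steps from each word up to the root."""
    def depth(i):
        steps = 0
        while heads[i] is not None:
            i = heads[i]
            steps += 1
        return steps
    return [depth(i) for i in range(len(heads))]

def mean_depth(heads):
    """Mean dependency depth over all words in one sentence."""
    word_depths = depths(heads)
    return sum(word_depths) / len(word_depths)

# Toy four-word sentence: word 1 is the root, words 0 and 2 depend on it,
# and word 3 depends on word 2.
heads = [1, None, 1, 2]
print(depths(heads), mean_depth(heads))
```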
Dependency Paths
For numerous corpus linguistics applications, it’s useful to have a word-level indication of syntax. A presentation by Vanessa and Robert Gorman gave me the idea of using dependency paths for this purpose so I’ve now calculated them for the GNT based on the GBI syntax trees.
Mean Log Frequency of Lexemes
One component of many readability measures on texts is the mean log word frequency. Here I do a basic calculation across chapters in the Greek New Testament (with code provided).
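A basic version of that calculation might look like the following. It is a sketch rather than the code from the post, and it assumes the input is a sequence of (chapter, lexeme) pairs and that frequencies are raw counts over the whole corpus.

```python
import math
from collections import Counter, defaultdict

def mean_log_frequency(tokens):
    """tokens: iterable of (chapter, lexeme) pairs.
    Returns {chapter: mean of log(corpus frequency) over that chapter's tokens}."""
    tokens = list(tokens)
    freq = Counter(lexeme for _, lexeme in tokens)
    logs = defaultdict(list)
    for chapter, lexeme in tokens:
        logs[chapter].append(math.log(freq[lexeme]))
    return {chapter: sum(values) / len(values) for chapter, values in logs.items()}

# Toy data, invented for illustration.
toy = [("ch1", "a"), ("ch1", "a"), ("ch2", "a"), ("ch2", "b")]
print(mean_log_frequency(toy))
```

A chapter made up of common lexemes scores higher than one containing rarer lexemes, which is why the measure is useful as a readability component.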
Updated Vocabulary Coverage Statistics
In various mailing list posts, blog posts and talks, I’ve shown vocabulary coverage statistics. It’s time to update the code to use more recent data and republish the results here.
Blogging Every Day Between Now and SBL Annual Meeting
It’s exactly four weeks until I’m presenting at the SBL Annual Meeting in Atlanta. As I have a long backlog of posts I’ve wanted to do for a while, I thought I might try to blog every day between now and my talk on November 22nd.
Speaking At The SBL Annual Meeting in Atlanta
I’ve just finished up registration for the SBL Annual Meeting. Here’s the paper I’ll be presenting.
Types of Disagreement in Syntactic Analyses
As helpful as the GBI Syntax Trees are, I have disagreements with them. Randall and Andi are receptive to feedback but there are very different types of disagreement that can arise in syntactic analysis so I thought I’d start to note down what they are.
Converting the GBI Syntax Trees to a Dependency Analysis
With one child on each branch identified as the head, a constituent analysis can be converted to a dependency analysis. Fortunately, the GBI syntax trees have an explicit indication of the head, so I went ahead and converted them to a dependency format.
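The conversion can be sketched recursively. The tree encoding here is hypothetical (a leaf is a word, an internal node is a pair of children and the index of the head child) and is not the format of the GBI trees. The lexical head of a node is the lexical head of its head child, and the lexical head of every other child depends on it.

```python
def to_dependencies(tree):
    """Return (lexical_head, edges) for a tree.

    A leaf is a word (str). An internal node is a pair
    (children, head_index), where head_index marks the head child.
    Each edge is a (dependent_word, head_word) pair."""
    if isinstance(tree, str):
        return tree, []
    children, head_index = tree
    results = [to_dependencies(child) for child in children]
    head_word = results[head_index][0]
    edges = []
    for i, (word, child_edges) in enumerate(results):
        edges.extend(child_edges)
        if i != head_index:
            # A non-head child's lexical head depends on this node's head.
            edges.append((word, head_word))
    return head_word, edges

# Toy tree: [[the dog*] barks*], with * marking the head child.
tree = ([(["the", "dog"], 1), "barks"], 1)
print(to_dependencies(tree))
```

With real data, token identifiers would be a safer choice than bare word strings, since the same word can occur more than once in a sentence.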
pyuca supports Python 2 again
Thanks to Chris Beaven, Paul McLanahan and Michal Čihař, Python 2 support is back in pyuca 1.1.
My BibleTech 2015 Talk
BibleTech talks were not recorded but I turned on my iPhone’s Voice Memo recording and later sync’d the audio with my slides to make this video.
Version 1.0 of pyuca released
pyuca is my pure Python implementation of the Unicode Collation Algorithm (for sorting, amongst other things, Greek).
Rebasing MorphGNT off SBLGNT
For the last three months, I’ve been working on rebasing the MorphGNT database off the SBLGNT text rather than the UBS3.
Inline Replacement for John 2
A post to the graded-reader mailing list from April 25, 2010.
All Subtrees Not Just Clauses
A post to the graded-reader mailing list from April 14, 2010.
Initial Code Based on PROIEL Dependency Analysis
A post to the graded-reader mailing list from April 12, 2010.
My BibleTech 2010 Talk
Yesterday I gave a talk on the graded reader ideas at BibleTech.
The “Next-Best” Algorithm
A post to the graded-reader mailing list from April 1, 2008.
Vocab Coverage Table for a Better Ordering
A post to the graded-reader mailing list from March 29, 2008.
Ordering is Ultimately of Targets not Items
A post to the graded-reader mailing list from March 29, 2008.
If Only They Knew That One Rare Word...
A post to the graded-reader mailing list from March 26, 2008.
Just How Much Can Frequency Ordering Be Improved On?
A post to the graded-reader mailing list from March 26, 2008.
GNT Verse Coverage with Frequency Ordering
A post to the graded-reader mailing list from March 25, 2008.
Throttle and Delay
A post to the graded-reader mailing list from March 23, 2008.
Embedding the Target Language in English
A post to the graded-reader mailing list from March 23, 2008.
Welcome (and some files)
A post to the graded-reader mailing list from March 23, 2008.
Graded Reader Discussion and Code
Owing to the amount of interest I received about A New Kind of Graded Reader…
A New Kind of Graded Reader
Back in 2004, I talked about algorithms for optimal vocabulary ordering.
BibleTech 2008
I don’t think I’ve mentioned it here before but next week, I’m one of the keynote speakers at the BibleTech 2008 conference in Seattle.
GNT Verse Coverage Statistics
It is fairly common, in the context of learning vocabulary for a particular corpus like the Greek New Testament, to talk about what proportion of the text one could read if one learnt the top N words.
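One way to turn that into a verse-level statistic is sketched below. It assumes a verse counts as readable only when every lemma in it is among the top N, which is a simplifying assumption for the example. The function name and toy data are invented.

```python
from collections import Counter

def verse_coverage(verses, n):
    """Fraction of verses in which every lemma is among the n most frequent."""
    counts = Counter(lemma for verse in verses for lemma in verse)
    known = {lemma for lemma, _ in counts.most_common(n)}
    readable = sum(1 for verse in verses if set(verse) <= known)
    return readable / len(verses)

# Toy "verses" as lists of lemmata, invented for illustration.
verses = [["a", "b"], ["a"], ["a", "c"], ["b", "a"]]
for n in (1, 2, 3):
    print(n, verse_coverage(verses, n))
```

Relaxing the all-lemmata requirement to some lower per-verse threshold would be a small change to the `readable` test.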
Announcing MorphGNT.org
I’ve hinted before about Ulrik Petersen and I collaborating on Greek New Testament linguistic endeavours.
Bug Fix to Python Unicode Collation Algorithm
See Python Unicode Collation Algorithm for background.
Dynamic Interlinears with Javascript and CSS
After the continuation of a permathread on the b-greek mailing list about the pros and cons of interlinears, I built some quick demonstrations of how CSS and Javascript could be used for dynamic interlinear glosses that would not be possible on the printed page.
Python Unicode Collation Algorithm
My preliminary attempt at a Python implementation of the Unicode Collation Algorithm (UCA) is done and available at:
File System Archaeology for MorphGNT
Some of you will be aware of Ulrik Petersen’s work on augmenting Tischendorf’s 8th edition with morphological tags and lemmata, based on work by Clint Yale and Maurice Robinson. Ulrik is also the developer of Emdros, an open-source text database engine for annotated text.
MorphGNT 5.08 Released
I’m pleased to announce the release of a new version of MorphGNT, the morphologically parsed Greek New Testament database made available under a Creative Commons license.
MorphGNT 5.07 Released
I’m pleased to announce the release of a new version of MorphGNT, the morphologically parsed Greek New Testament database made available under a Creative Commons license.
Upcoming new MorphGNT
I’m just about to release MorphGNT 5.07 and, shortly after that, a major new release I’ll designate 6.07.
Ordering Goals Rather Than Prerequisites
The outcome of my simulated annealing program is a list of prerequisites to learn along with an indication, every so often, of what new goal has been reached.
Using Simulated Annealing to Order Goal Prerequisites
Back in November, I wrote about programmed vocabulary learning as a travelling salesman problem.
Parts of Speech and Number of Accents
I thought I’d write a quick Python script to check how many accents were on each of the lemmata in MorphGNT 5.06.
MorphGNT 5.06 Released
Well, it’s been about a hundred hours work over the last six months, but I’m pleased to announce the release of a new version of MorphGNT, the morphologically parsed Greek New Testament database made available under a Creative Commons license.
MorphGNT Roadmap
This month I should be doing another release of my morphologically-parsed Greek New Testament. This will be release 5.06. I thought I’d outline my future plans (as they currently stand).
MorphGNT Update
A couple of months ago, I talked about the current process I’m going through to identify errors in my morphologically parsed Greek New Testament, MorphGNT. By the end of April, I was down to 400 mismatches I needed to check. At the time, I thought I’d be able to finish going through them by the time I left to go to Europe on holiday.
DATR in Python
I previously talked about wanting to implement the lexicon language DATR in Python. Well, I just received an email from Henrik Weber saying that (apparently inspired by my post) he has gone and done an implementation at http://pydatr.sourceforge.net/
Current MorphGNT Work
For the last few months, I’ve been making corrections to MorphGNT by attempting to merge an English translation (NASB) marked with Strong’s numbers with my database. Although it’s a tedious process, it’s revealing numerous errors.
BetaCode to Unicode in Python
BetaCode is a common ASCII transcription for Polytonic Greek. I’ve been dealing with it for around twelve years. (As an aside, back in 1994, I designed a METAFONT for Polytonic Greek that enabled one to use BetaCode in TeX—I typeset my self-published Index to the Greek New Testament with it).
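For readers who haven’t seen BetaCode, here is a deliberately small sketch of the conversion. It is not the code from the post: it handles only lowercase letters and the common diacritic marks, ignores the `*` capital marker, and relies on Unicode NFC normalization to compose letters with their combining marks. The name `beta_to_unicode` is made up for the example.

```python
import re
import unicodedata

# BetaCode letter -> Greek lowercase letter.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# BetaCode diacritic -> Unicode combining character.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}

def beta_to_unicode(beta):
    """Convert lowercase BetaCode to precomposed polytonic Greek."""
    chars = [LETTERS.get(c) or MARKS.get(c) or c for c in beta.lower()]
    text = unicodedata.normalize("NFC", "".join(chars))
    # Use final sigma at the end of a word.
    return re.sub(r"σ\b", "ς", text)

print(beta_to_unicode("lo/gos"))
print(beta_to_unicode("a)nh/r"))
```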
DATR, MorphGNT, RDF and Python
I’ve been revisiting DATR, the lexical knowledge representation language, as a possible format for the next generation of MorphGNT. I was previously considering developing my own RDF/graph-based format but I suddenly remembered DATR from my student days and it makes a lot more sense to use it rather than try to build my own.
Thoughts on GNT-NET Parallel Glossing Project
Zack Hubert mentions that I’m thinking about using the NET Bible for a collaborative parallel glossing project.
MorphGNT v5.05 Available
Various corrections.
Best Use of MorphGNT So Far
Zack Hubert has taken my MorphGNT and built a GNT Browser that blew me away!
MorphGNT v5.04 and Beyond
I’ve released a new version of my MorphGNT.
MorphGNT v5.03 available
More corrections now and more coming soon.
MorphGNT v5.02 Available
Some breathing corrections on rho-initial words.
Programmed Vocabulary Learning as a Travelling Salesman Problem
For a while I’ve been interested in how you could select the order in which vocabulary is learnt in order to maximise one’s ability to read a particular corpus of sentences. Or more generally, imagine you have a set of things you want to learn and each item has prerequisites drawn from a large set with items sharing a lot of common prerequisites.
MorphGNT v5.01 Available
Found an accent and breathing problem in both the text and lemma for ABEL, ANNA and ANNAS which is now corrected.
MorphGNT v5.00 Available
At wildly varying intensities over the last ten years, I’ve worked on correcting the UPenn CCAT Morphological Parsed Greek New Testament as a side-effect of larger linguistic analyses I’ve undertaken.
The Bible and the Semantic Web
For many years I’ve been thinking about the application of Semantic Web technology to studying (and presenting the results of the study of) the Bible. However, I never really thought about the application of Bible study (and the tools and techniques developed for it) to the Semantic Web.