In “Polytonic Greek Unicode Still Isn’t Perfect”, I enumerated various challenges that remain when using polytonic Greek if vowel length needs to be marked. I now have a better appreciation of what solutions are actually realistic.

After discussions with people on the Unicode mailing list, it’s clear the solution is NOT to add more precomposed character code points to Unicode (or rather, such a solution will never be adopted by Unicode). Instead, the solution likely lies in tools understanding grapheme clusters, so that a base letter plus its combining marks is treated as a single user-perceived character. For more background, see Grapheme Cluster Boundaries in Unicode Standard Annex #29, Unicode Text Segmentation.
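
To make the problem concrete, here is a minimal Python 3 sketch using only the standard unicodedata module. The variable names are my own, chosen for illustration. It shows that an alpha with an acute normalizes to a single precomposed code point, while an alpha with both a macron and an acute cannot, because no precomposed character exists for that combination.

```python
import unicodedata

ALPHA = "\u03B1"   # GREEK SMALL LETTER ALPHA
MACRON = "\u0304"  # COMBINING MACRON
ACUTE = "\u0301"   # COMBINING ACUTE ACCENT

# Alpha + acute has a precomposed form, so NFC yields one code point.
alpha_acute = unicodedata.normalize("NFC", ALPHA + ACUTE)

# Alpha + macron + acute has none: NFC composes alpha + macron into
# U+1FB1 and leaves the acute behind as a separate combining mark.
alpha_macron_acute = unicodedata.normalize("NFC", ALPHA + MACRON + ACUTE)

print(len(alpha_acute))         # 1
print(len(alpha_macron_acute))  # 2
```

Both strings are one character to a reader, but Python's len() counts code points, so the second one reports 2.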

Perl 6 already has support for this: a layer above code points representing what are considered single graphemes even if made up of multiple code points. See, for example, Jonathan Worthington’s slides on Normal Form Grapheme.
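
As a rough idea of what such a layer might look like in Python 3, here is a deliberately simplified sketch. It is not Perl 6's Normal Form Grapheme and it is not a full implementation of the UAX #29 rules: it only attaches combining marks (general category M) to the preceding character. The function name simple_graphemes is hypothetical.

```python
import unicodedata

def simple_graphemes(text):
    """Group each character with the combining marks that follow it.

    Simplifying assumption: a cluster is one character plus any
    following characters whose general category starts with "M".
    The real UAX #29 rules cover many more cases.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

# Alpha with macron (U+1FB1) + combining acute, followed by nu.
word = "\u1FB1\u0301\u03BD"
clusters = simple_graphemes(word)

print(len(word))      # 3 code points
print(len(clusters))  # 2 clusters

# Reversing by cluster keeps the acute on the alpha;
# reversing by code point would move it onto the nu.
reversed_word = "".join(reversed(clusters))
```

Operations like length, indexing and reversal would then work on the list of clusters rather than on raw code points.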

So my plan is, at the very least, to implement a similar approach for Python 3 (unless someone else already has). Even then, the problem will still have to be solved separately by:

  • font foundries
  • text editor developers
  • keyboard / input source software developers
  • operating system developers

I’ll try to engage with each of these groups and will keep people posted on my progress.

Thanks to Ken Whistler for making clear that the path forward is not in more precomposed characters but in working with system vendors and font foundries.

Thanks to Markus Scherer and Elizabeth Mattijsen for their pointers to TR29 and the Perl 6 work.

UPDATE (2016-12-04): Now see Diacritic Stacking in Skolar PE Fixed.