A timeline of my adventures in linguistics, philology, humanities computing, and learning science
(a work in progress)
c.1980
- My dad borrows a “microcomputer” with 4K and teaches me BASIC (while learning it himself from a photocopy of a BASIC manual).
- I continue to learn programming by typing in source code listings from books like David Ahl’s BASIC Computer Games.
c.1983
- I play my first of many text adventure games, Bedlam, on a TRS-80 Color Computer, and become fascinated with how it “understands” English. I later read articles in computer magazines about “parsing” and “lexicons”.
1985
- One of my books in Year 6 is Ronald Ridout’s Word Perfect Spelling Book 8 which, amongst other things, teaches me the International Phonetic Alphabet.
- Knowing I’m a huge fan of The Hobbit, my aunt buys me Return of the King. Not having read the first two volumes of Lord of the Rings, I instead study the appendices, including Tengwar, and learn more phonetics as a result.
1986
- In my first year of high school, I do a semester of Latin.
c.1990
- I start developing a conlang, SLP, with my best friend, Brett Meyers. We develop a phonetically-motivated writing system and related signing system. I start trying to learn more about phonology and grammar to help in the development of SLP.
1991
- I become a Christian.
- I start a mathematics and physics degree at the University of Western Australia.
- I become interested in Biblical Hebrew and New Testament Greek.
1992
- I develop an interest in analysing biblical texts and languages computationally and start learning about (and implementing) parsing algorithms.
- I decide to switch majors from mathematics/physics to linguistics.
- The University decides to give Internet access to undergraduate students who are interested.
- November: I get Internet access and discover mailing lists and FTP sites.
- November: I write my first post to nt-greek (predecessor of b-greek) asking about computational grammars of Koine Greek.
- November: I write my first post to LINGUIST list asking about a morphologically-tagged Hebrew Bible and computational grammars of Hebrew and Koine Greek. [the fact I didn’t ask for a morphologically-tagged GNT suggests I already had one, probably the Fribergs’, from an FTP site]
1993
- May: I write to nt-greek asking about morphological parsing and stem information. I mention that I have Fribergs’ which has parse information but not lemmatisation. Bob Kraft from UPenn responds, resulting in him putting up the CCAT UBS3 on FTP for me (with the Fribergs’ analysis converted to Packard’s format and lemmatisation from Packard’s analyser).
- I provide various statistics on the Greek New Testament to Richard Hudson of UCL after a query on LINGUIST list.
- July: My first free (GPL) software project (a C program for building a minimal deterministic finite state automaton from lists of strings) is announced on the LINGUIST list.
- I start to become very interested in the nascent information service on the Internet known as the World Wide Web and start lurking on the HTML mailing list with people like Dan Connolly and Marc Andreessen.
- I start reading up on SGML to better understand the background to HTML.
1994
- I generate an index of NT lemmas based on my corrections to the CCAT UBS3 analysis. I decide to typeset it using LaTeX but choose to design my own typeface.
- I learn Perl which makes text processing so much easier than it was in C.
- April: I announce MELANOS, a METAFONT font for Greek that, unlike existing polytonic Greek fonts for TeX, is based on betacode (with minor modifications to maintain TeX validity without escaping).
- April: I announce my corrections to the CCAT analysis (mostly in lemmatisation).
- May: I self-publish Index to the Greek New Testament.
- June: Dick Hudson’s discussion note “About 37% of word-tokens are nouns” is published in Language, acknowledging me in a footnote for the Greek New Testament statistics.
- I start learning about TEI and mark up Dante’s Divine Comedy (in Italian) and the Greek New Testament (with morphology) in TEI SGML.
- I start working casually in Unix user support and systems administration at my university. I also start running classes teaching HTML.
- Micheal Palmer announces the project headed up by Robert Funk and Daryl Schmidt to produce a new Hellenistic Greek reference grammar.
- I start a new mailing list (initially called greek-grammar and then hgrk-ling) for scholarly discussion about the reference grammar and the application of linguistics to Hellenistic Greek in general.
- It’s the start of many great online discussions with Micheal about applying linguistics to the study of Hellenistic Greek, although we don’t meet in person until over 20 years later.
- I start a website dedicated to Hellenistic Greek Linguistics.
- I become very interested in Dick Hudson’s Word Grammar and applying it to Ancient Greek.
- That in turn gets me interested in dependency parsing of free word order languages. I come across Michael Covington’s work on this topic.
c.1995
- At some point Bob Kraft says that, rather than send him all my corrections, I should just take over maintaining the CCAT analysis of the GNT.
1995
- January: I announce the Hellenistic Greek Linguistics site.
- I give my first conference talk, at the inaugural AusWeb conference, on the Hellenistic Greek Linguistics pages.
- At the AusWeb conference, I talk to Cathy Record from O’Reilly about writing a “humanities computing” book for them. She thinks Tim O’Reilly, having been a classics major, will love the idea.
- I publish an article in Text Technology (a special TEI issue edited by Lou Burnard) about the Dante and GNT TEI markup.
- I start the Electronic New Testament Manuscripts Project (ENTMP) with Tim Finney. Initially Tim’s idea is to make available images and transcriptions of NT manuscripts on CD-ROM but I suggest the Web as an alternative. Tim puts together an advisory board and works on transcription conventions while I build an online database of manuscripts.
- September: I speak at the Australian Linguistics Society conference on my honours thesis topic: Ancient Greek noun phrase structure. The data is gathered by running regular expression searches over strings of morphological tags.
- I get involved with online discussions about what it would take to make SGML more suitable for the Web. I start developing an extension to SGML Open Catalogs for the resolution of formal public identifiers to URLs.
- I also start to explore SGML for encoding things other than texts, such as music notation. I start trying to understand SMDL and HyTime.
- I start the process of applying for a PhD with Dick Hudson at UCL with a view to attempting to apply Word Grammar to Ancient Greek.
- I sit the GRE with a view to also applying to do a PhD in the US. I request my GRE score be sent to Stanford, Ohio State, UPenn, UMass Amherst, and MIT.
- My syntax professor and undergrad thesis advisor, Jim Blevins, arranges an internship at Sun Labs East in Boston for the 1996 winter (summer in Australia). I will be the first non-US intern at Sun Labs East.
- For the last three months of the year, I take on a 30-hour-a-week job doing systems and network administration and user support for the university while writing my honours thesis and overloading coursework to finish my degree quicker.
- I do poorly in my final year at university, seriously jeopardising my chances of getting into a PhD program, and I don’t bother completing my applications for universities in the US.
1996
- While interning at Sun Labs in Boston, I fly over to Long Beach, CA to present a proposal for Delegating SGML Open Catalogs to the SGML Open (later OASIS) meeting. Charles Goldfarb, the inventor of SGML, describes my proposal as “dandy”. Eduardo Gutentag introduces me to Jon Bosak who has recently joined Sun from Novell.
- Java is the hot new thing at Sun so I decide to learn it while surrounded by people like Ken Arnold and Guy Steele. I start the Java Class Warehouse and the strong-java and java-sgml mailing lists.
- I also try to learn DSSSL with a view to implementing it.
- Even though there are opportunities for me to stay in the US, for personal reasons I decide to go back to Australia.
- I take a part-time job at my alma mater as their first webmaster (continuing officially a role I had unofficially the previous year).
- c.May: Jon Bosak invites me to the newly formed W3C SGML working group to help develop what becomes XML. Being in Australia limits my involvement due to time zone and travel difficulties.
- My part-time webmaster job becomes a full-time publications manager job and I’m responsible for all official publications of the University, both print and online. My mandate is to make things like the telephone directory and course catalog database-driven rather than put together in a word processor.
1997
- I start a section on my website for XML information (what eventually becomes the popular XMLINFO website).
- Because the XML-LANG spec draft requires processors support UCS-2 and UTF-8, I dive into Unicode, ISO/IEC 10646, UCS-2, and UTF-8 and start to write documentation to help others understand them.
- I also start a page about the Notation Interchange File Format (NIFF), a binary format for music notation data.
- I’m no longer actively doing much Greek New Testament work but Vincent Broman continues to make my corrected UBS3 analysis available on his site.
- There is increasing interest in me consulting on XML and so near the end of the year, I quit my university publications job and start an XML consulting company.
1998
- I take on a casual (adjunct) lecturing position at Curtin Business School developing and teaching a course called Website Management (covering markup, information architecture, basic design).
- I join Standards Australia Technical Committee IT/1/18 (responsible for, amongst other things, SGML).
- I do various consulting jobs and run XML training courses in Australia and Singapore.
- I also start a postgraduate diploma in document management at RMIT but discontinue it after a semester. Distance learning at this point means getting VHS tapes in the mail.
- I start the first implementation of the XSL-FO specification: FOP. I decide to begin prototyping PDF generation in Perl (even though FOP will be Java). Paul Prescod suggests I should try Python instead and use JPython (the predecessor of Jython) for easier migration to Java.
- Tony Fisher puts together a popular online reading environment for the Greek New Testament based on Vincent Broman’s copy of my corrected CCAT analysis.
- I start putting together a proposal to do a PhD at Curtin on XML data modelling.
- Through the research group I’m involved with at Curtin, I start to become interested in content reuse and adaptability for online education.
1999
- I continue to teach at Curtin, consult on XML, run training courses, and speak at conferences.
- Frustrated both with the startup culture in Australia and the sheer distance from the action, I decide to entertain offers from startups in the US which I’d previously turned down.
- The first inquiry that comes after making that decision is from a company in Portsmouth, New Hampshire called Bowstreet.
- May: While attending WWW8 in Toronto, I interview with Bowstreet.
- June: I co-write the XML Canonicalization Requirements W3C Note.
- June: I accept a job offer from Bowstreet. I work the rest of the year as a remote contractor until my visa is finalised.
- To make FOP more sustainable as a project, I donate it to the Apache Software Foundation.
2000
- I move to the US to take the position of Director XML Technology at Bowstreet. My job is basically XML standards and conference speaking. I speak at about 20 events a year.
- At one of these conferences, I meet Jonathan Robie who I’ve known on b-greek since around 1996 but who is also involved with the W3C.
- As Bowstreet is a W3C member, I no longer need Invited Expert status and become a member representative, serving on the XML Core Working Group and XSL Working Group.
- Although we use Java at Bowstreet, I start doing more personal projects in Python. With my co-worker Dan Krech, I write an RDF server framework in Python called Redfoot. The RDF library part of it will eventually become rdflib.
- Network World Magazine names me one of “50 on Power’s Edge” ostensibly for my contributions to XML (although I always wondered if someone called in a favour).
2001
- I implement James Clark’s TREX schema language in Python as an open source project called PyTREX. TREX is eventually superseded by RELAX NG.
- I continue to speak at a LOT of conferences on XML and web services.
2002
- I move back to Australia but start consulting to UK company Choreology and help with the founding of US-based mValent, started by some former Bowstreet colleagues.
- I start to get back into working on the Greek New Testament and decide to start working on a comprehensive morphological analysis of my own, independent of the CCAT/Friberg/Packard work.
- Once mValent gets funding, I work for them full-time as Chief Scientist and travel frequently back to the US.
- July: I publish my first Python script for converting from beta code to Unicode.
- September: I have code for generating all the noun forms in the GNT but it’s a huge hack. But it gets me thinking more about the nature of morphological modelling in general. I also start thinking more about acquisition issues and the inference of morphological models from corpora.
2003
- June: I start a project called PyLing for computational linguistics in Python but never get very far with it beyond modelling feature structures to improve my morphological analysis.
- I start to think about a Python-based symbolic system for music composition and analysis I call Sebastian.
2004
- February: I start blogging.
- May: I comment in a blog post that I really need to get back to my morphological analysis work from 2002 but should also come up with a URI scheme for lemmas and word forms.
- November: Not making much progress with my ab initio analysis, I resurrect my corrected CCAT analysis from various sites, convert it to UTF-8 (rather than betacode) and, for the first time in almost a decade, make it available on my own site, branded as “MorphGNT 5.00”. Over the next month I do another five releases with various corrections.
- November: I blog for the first time some thoughts about the order in which vocabulary should be acquired for reading a set text and the relationship to the traveling salesman problem.
- December: I discover Zack Hubert’s website, by far the best online reading environment based on the CCAT analysis to this point.
2005
- January: I toy with the idea of DATR as a format for modelling morphology and think about its relationship to RDF.
- I decide that my interest in the inference of inflectional morphology might be a good PhD topic and start researching what decent research-only linguistics programs would let me do it part-time and remotely. I discover that the University of Essex, whose faculty includes Andrew Spencer, a well-known morphologist, allows this.
- July: I announce that the next major revision (6.XX) of MorphGNT will include a new format. I don’t explain what it is until August but, spoiler alert: it’s the “norm” column. I also say that the goal is for 8.XX to no longer be based on the CCAT/Friberg/Packard work at all but be my ab initio analysis.
- I continue to do a handful of releases of MorphGNT, some of which are the result of close to a hundred hours of checking potential errors based on various heuristics.
- I start a little competition on my website to see who can come up with the best algorithms for the “vocabulary ordering” problem.
- Ulrik Petersen (now Sandborg-Petersen) and I start collaborating. He says he’d emailed me years earlier but I never responded.
- I implement a local Python server to do Leitner-style flashcard testing of vocabulary.
- December: I announce that the flashcard code I’ve been running locally for my own use is being turned into a website called Quisition.
2006
- January: I start going through old backups, putting together all the MorphGNT-related data and code I’ve worked on over the last 12 years, and putting it all into a hosted Subversion repo.
- January: I implement just enough of the Unicode Collation Algorithm to properly sort polytonic Greek in Python. This is the first version of pyuca.
- January: I build a demo of dynamic interlinears with Javascript.
- March: Ulrik Petersen and I launch morphgnt.org.
- March: I am accepted into the PhD program at the University of Essex.
- May: I give up on PyLing as NLTK does much of what I planned for PyLing (and more!).
- July: I attend the Australian Linguistics Institute and meet Andrew Spencer in person for the first time. I then attend COLING-ACL and am disappointed by how little people there care about linguistics.
- September: I attend the Analogy workshop run at the Max Planck Institute for Evolutionary Anthropology at Leipzig, seeing Jim and Juliette Blevins for the first time in a decade and meeting people like Greg Stump, Rob Malouf, and Harald Baayen for the first time.
- October: I teach a couple of introductory Greek lessons in which students learn inductively from computer-selected fragments of John.