Some Unix Command Line Exercises Using MorphGNT

I thought I’d help a friend learn the basic Unix command line (basic, but pretty comprehensive for this type of work) with some practical graded exercises using MorphGNT. It worked out well, so I thought I’d share the exercises in case they are useful to others.

The point here is not to actually teach how to use bash or commands like grep, awk, cut, sort, uniq, head or wc but rather to motivate their use in a gradual fashion with real use cases and to structure what to actually look up when learning how to use them.

This little set of commands has served me well for over twenty years working with MorphGNT in its various iterations (although I obviously switch to Python for anything more complex).

Task 0

Clone https://github.com/morphgnt/sblgnt with git.

Task 1

Using wc and the concept of wildcards/globbing (and relying on the fact that I have one line per word in those files), work out how many words are in the main text of SBLGNT.
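
So as not to give away the real answer, here is a rough sketch of the idea on a made-up stand-in for the repo. The file names follow the pattern I'd expect in the clone (treat that as an assumption), and the contents are placeholder lines, since only the line counts matter here.

```shell
# Build a tiny stand-in for the repo: two files, one "word" per line.
# The contents are placeholders made up for illustration.
demo=$(mktemp -d)
printf '%s\n' 'w1' 'w2' 'w3' > "$demo/61-Mt-morphgnt.txt"
printf '%s\n' 'w4' 'w5' > "$demo/62-Mk-morphgnt.txt"

# With a glob, wc -l prints one count per file plus a final "total" line.
wc -l "$demo"/*-morphgnt.txt

# Concatenating the files first gives a single number instead.
# (tr strips the padding some versions of wc put around the count.)
total=$(cat "$demo"/*-morphgnt.txt | wc -l | tr -d ' ')
echo "$total"
```

In the real clone, the same glob run from the top of the repo should pick up every book's file.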

Task 2

Using grep and wc, work out how many times μονογενής appears. (You might be able to do it with just grep and appropriate options, but try using grep without options and wc, and understand the concept of "piping" the output of one command to the input of another.)
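
A minimal sketch of piping, again on a few sample lines rather than the real files. The lines are written in the space-separated column layout I'm assuming for the MorphGNT files (reference, part of speech, parsing code, then several forms of the word ending with the lemma); they are illustrative, not copied from the repo.

```shell
# A three-line sample in the assumed MorphGNT column layout.
demo=$(mktemp -d)
cat > "$demo/64-Jn-morphgnt.txt" <<'EOF'
040114 A- ----GSM- μονογενοῦς μονογενοῦς μονογενοῦς μονογενής
040118 A- ----NSM- μονογενὴς μονογενὴς μονογενής μονογενής
040118 N- ----NSM- θεὸς θεὸς θεός θεός
EOF

# grep prints every line containing the pattern; the pipe (|) feeds
# those lines to wc -l, which counts them.
count=$(grep 'μονογενής' "$demo"/*-morphgnt.txt | wc -l | tr -d ' ')
echo "$count"
```

Because there is one word per line, counting matching lines counts words. Note that this matches the pattern anywhere on the line, which works here because the lemma form appears in the last column of each inflected form's line.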

Task 3

How many verbs (tokens) are there in John’s gospel? (still doable just with grep and wc)
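
One way this might look, sketched on a handful of sample lines. I'm assuming the second column holds the part of speech and that verbs are marked `V-`; the sample file name and lines are illustrative.

```shell
# Sample lines in the assumed layout; two of the six are verbs.
demo=$(mktemp -d)
cat > "$demo/64-Jn-morphgnt.txt" <<'EOF'
040101 P- -------- Ἐν Ἐν ἐν ἐν
040101 N- ----DSF- ἀρχῇ ἀρχῇ ἀρχῇ ἀρχή
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 RA ----NSM- ὁ ὁ ὁ ὁ
040101 N- ----NSM- λόγος λόγος λόγος λόγος
040103 V- 3AMI-S-- ἐγένετο ἐγένετο ἐγένετο γίνομαι
EOF

# The spaces around V- keep the pattern tied to the part-of-speech column.
verbs=$(grep ' V- ' "$demo/64-Jn-morphgnt.txt" | wc -l | tr -d ' ')
echo "$verbs"
```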

Task 4

How many unique verbs (lemmas) are there in John’s gospel?

(learn how to use awk to extract fields, and how to use sort and uniq in tandem)
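
A hedged sketch of the awk, sort and uniq combination on sample data. It assumes the lemma is the seventh space-separated column and the part of speech is the second; the lines themselves are illustrative.

```shell
# Four verb tokens (one lemma repeated) and one noun.
demo=$(mktemp -d)
cat > "$demo/64-Jn-morphgnt.txt" <<'EOF'
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----NSM- λόγος λόγος λόγος λόγος
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040103 V- 3AMI-S-- ἐγένετο ἐγένετο ἐγένετο γίνομαι
040121 V- 3PAI-S-- λέγει λέγει λέγει λέγω
EOF

# awk keeps rows whose 2nd field is V- and prints the 7th field (lemma).
# uniq only collapses adjacent duplicates, so sort has to come first.
lemmas=$(awk '$2 == "V-" { print $7 }' "$demo/64-Jn-morphgnt.txt" \
  | sort | uniq | wc -l | tr -d ' ')
echo "$lemmas"
```

Dropping the `sort` is a good experiment: uniq would then miss any repeats that are not next to each other.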

Task 5

What are the 5 most common verbs (lemmas) in John’s gospel? (you might want to use head)
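
The usual idiom for a frequency table can be sketched like this, on illustrative sample lines and with the same column assumptions as before (part of speech in column 2, lemma in column 7).

```shell
# Six verb tokens: εἰμί three times, λέγω twice, γίνομαι once.
demo=$(mktemp -d)
cat > "$demo/64-Jn-morphgnt.txt" <<'EOF'
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040103 V- 3AMI-S-- ἐγένετο ἐγένετο ἐγένετο γίνομαι
040121 V- 3PAI-S-- λέγει λέγει λέγει λέγω
040129 V- 3PAI-S-- λέγει λέγει λέγει λέγω
EOF

# uniq -c prefixes each distinct line with its count; sort -rn then
# orders by that count, largest first; head keeps the top five.
top=$(awk '$2 == "V-" { print $7 }' "$demo/64-Jn-morphgnt.txt" \
  | sort | uniq -c | sort -rn | head -n 5)
echo "$top"
```

With only three distinct lemmas in the sample, head simply prints all three.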

Task 6

Get counts in John’s Gospel of how many tokens appear in each tense/aspect (hint: use cut) and write the results to a file called john.txt rather than just printing them to the terminal.
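
A rough sketch of cut and output redirection on sample lines. It assumes the parsing code is the third column and that its second character encodes the tense (for example `P`, `I`, `A`); the sample writes john.txt into a temporary directory rather than the current one.

```shell
# Four verbs and one noun in the assumed layout.
demo=$(mktemp -d)
cat > "$demo/64-Jn-morphgnt.txt" <<'EOF'
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----NSM- λόγος λόγος λόγος λόγος
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040103 V- 3AMI-S-- ἐγένετο ἐγένετο ἐγένετο γίνομαι
040121 V- 3PAI-S-- λέγει λέγει λέγει λέγω
EOF

# First cut: split on spaces and keep field 3 (the parsing code).
# Second cut: keep character 2 of that code (the tense).
# The > redirects the final output into a file instead of the terminal.
grep ' V- ' "$demo/64-Jn-morphgnt.txt" \
  | cut -d ' ' -f 3 | cut -c 2 \
  | sort | uniq -c > "$demo/john.txt"

cat "$demo/john.txt"
```

Filtering to verbs first is a choice made for this sketch; without the grep, non-verbs would show up under `-`.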

Task 7

Come up with your own question that you think could be answered using these types of operations and try it out.