I wanted to help a friend learn some basic Unix command-line skills (basic, although pretty comprehensive for this type of work) through some practical graded exercises using MorphGNT. It worked out well, so I thought I’d share the exercises in case they are useful to others.
The point here is not to actually teach how to use bash or commands like grep, awk, cut, sort, uniq, head or wc, but rather to motivate their use in a gradual fashion with real use cases and to structure what to actually look up when learning how to use them.
This little set of commands has served me well for over twenty years working with MorphGNT in its various iterations (although I obviously switch to Python for anything more complex).
Task 0
Clone https://github.com/morphgnt/sblgnt using git.
Task 1
Using wc and the concept of wildcards/globbing (and relying on the fact that I have one line per word in those files), work out how many words are in the main text of SBLGNT.
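As a hint rather than a solution, here is a minimal sketch of how a glob and wc combine. The directory and the two toy files are made up for illustration; they only stand in for files that hold one word per line.

```shell
# Hypothetical toy files standing in for the real per-book files
# (names and contents are invented for this sketch).
dir=$(mktemp -d)
printf 'w1\nw2\nw3\n' > "$dir/01-demo.txt"
printf 'w4\nw5\n' > "$dir/02-demo.txt"

# The shell expands *.txt to every matching filename before wc runs;
# wc -l then prints one line count per file, followed by a total.
wc -l "$dir"/*.txt

# Concatenating the files first gives a single number with no filenames.
total=$(cat "$dir"/*.txt | wc -l)
echo "$total"
```

The glob is expanded by the shell, not by wc, so the same pattern works with any command that accepts a list of files.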
Task 2
Using grep and wc, work out how many times μονογενής appears. (You might be able to do it with just grep and appropriate options, but try using grep without options, together with wc, and understand the concept of “piping” the output of one command to the input of another.)
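To illustrate piping, here is a small sketch on a made-up sample with a different word. The seven-column layout (reference, part of speech, parsing code, three forms of the word, lemma) is an assumption for this example and should be checked against the real files.

```shell
# Hypothetical sample: one word per line, lemma assumed to be the last column.
sample=$(mktemp)
cat > "$sample" <<'EOF'
040101 N- ----DSF- ἀρχῇ ἀρχῇ ἀρχῇ ἀρχή
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----NSM- λόγος, λόγος λόγος λόγος
040101 N- ----NSM- λόγος λόγος λόγος λόγος
EOF

# grep on its own prints every line that contains the pattern ...
grep 'λόγος' "$sample"

# ... and the pipe (|) feeds those lines to wc -l, which counts them.
count=$(grep 'λόγος' "$sample" | wc -l)
echo "$count"
```

Because grep selects whole lines, a word that shows up in several columns of the same line is still counted once.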
Task 3
How many verbs (tokens) are there in John’s Gospel? (Still doable with just grep and wc.)
Task 4
How many unique verbs (lemmas) are there in John’s Gospel? (Learn how to use awk to extract fields, and how to use sort and uniq in tandem.)
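The sketch below shows why sort and uniq travel together, using a different question (distinct part-of-speech codes) so that the exercise itself is left open. The sample lines and the assumption that the code sits in the second whitespace-separated field are illustrative only.

```shell
# Hypothetical sample; the second field is assumed to be the part-of-speech code.
sample=$(mktemp)
cat > "$sample" <<'EOF'
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----NSM- λόγος, λόγος λόγος λόγος
040103 V- 3AMI-S-- ἐγένετο, ἐγένετο ἐγένετο γίνομαι
EOF

# awk splits each line on whitespace; $2 is the second field.
# uniq only collapses ADJACENT duplicates, so without sort the two V- lines
# (separated by an N- line) are both kept.
unsorted=$(awk '{ print $2 }' "$sample" | uniq | wc -l)

# Sorting first puts identical values next to each other, so uniq can merge them.
distinct=$(awk '{ print $2 }' "$sample" | sort | uniq | wc -l)

echo "$unsorted $distinct"
```

An awk pattern placed before the braces, such as a comparison on one field, restricts which lines get printed; that is worth looking up for this task.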
Task 5
What are the 5 most common verbs (lemmas) in John’s Gospel? (You might want to use head.)
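A common idiom for "most frequent" questions is to count with uniq -c, sort the counts numerically in reverse, and keep the first few lines with head. Here is a sketch on a made-up sample that ranks all lemmas rather than just verbs; the seventh field is assumed to hold the lemma.

```shell
# Hypothetical sample; the seventh field is assumed to be the lemma.
sample=$(mktemp)
cat > "$sample" <<'EOF'
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----NSM- λόγος, λόγος λόγος λόγος
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----ASM- θεόν, θεόν θεόν θεός
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----NSM- λόγος. λόγος λόγος λόγος
EOF

# uniq -c prefixes each distinct line with its count;
# sort -rn orders those counts from highest to lowest;
# head -n 2 keeps only the first two lines of the result.
awk '{ print $7 }' "$sample" | sort | uniq -c | sort -rn | head -n 2

# The top entry alone, kept in a variable for later use.
top=$(awk '{ print $7 }' "$sample" | sort | uniq -c | sort -rn | head -n 1)
echo "$top"
```

Note the two separate sorts: the first groups identical lemmas for uniq, the second ranks the counts.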
Task 6
Get counts in John’s Gospel of how many tokens appear in each tense/aspect (hint: use cut) and write the results to a file called john.txt rather than just outputting them in the terminal.
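cut can select either delimited fields or character positions, and > sends output to a file instead of the terminal. The sketch below counts tokens per chapter on a made-up sample, assuming the first field is a six-digit book/chapter/verse reference; the file names are hypothetical.

```shell
# Hypothetical sample and output location, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
040101 V- 3IAI-S-- ἦν ἦν ἦν εἰμί
040101 N- ----NSM- λόγος, λόγος λόγος λόγος
040201 N- ----NSM- γάμος γάμος γάμος γάμος
EOF

# cut -d' ' -f1 keeps the first space-delimited field;
# cut -c3-4 then keeps characters 3 and 4 of it (the chapter digits here).
# The > at the end writes the counts to a file rather than the terminal.
cut -d' ' -f1 "$workdir/sample.txt" | cut -c3-4 | sort | uniq -c > "$workdir/chapters.txt"

# Show what was written.
cat "$workdir/chapters.txt"
```

Keep in mind that > overwrites an existing file, while >> appends to it.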
Task 7
Come up with your own question that you think could be answered using these types of operations and try it out.