Updated Wed Feb 8 19:30:29 EST 2023
If you find errors here, or think the explanation could be better, please let me know post haste. Thanks.
Hi, everyone. I apologize for the somewhat chaotic final 20 minutes of today's studio exercise. In the online instructions I was trying to show how to create a personal word-frequency command (in effect answering Tenzing's question), but the on-screen instructions were not clear or complete or even totally correct, and this threw people off. And as always there are more arcane bits and pieces than I had remembered, since a lot of this is wired into my fingers, and hasn't passed through my brain recently.
So, with that said...
cat $* | tr -sc A-Za-z '\012' | sort | uniq -c | sort -n
Going through this one piece at a time:
cat $*
concatenates the contents of all the command-line arguments (filenames); that's what $* means. (Another pattern!)
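Here is a small sketch of what cat $* does with two filenames and with none. The function name catall, the temporary directory, and the file contents are made up for illustration; a shell function stands in for a script here because $* expands to a function's arguments the same way it expands to a script's.

```shell
# Stand-in for a script containing "cat $*": inside a function,
# $* expands to the function's arguments in the same way.
catall() { cat $*; }

# Two small made-up files in a throwaway directory.
dir=$(mktemp -d)
printf 'first file\n'  > "$dir/a.txt"
printf 'second file\n' > "$dir/b.txt"

two=$(catall "$dir/a.txt" "$dir/b.txt")   # two filenames: contents joined in order
none=$(printf 'from stdin\n' | catall)    # no filenames: cat reads standard input
printf '%s\n' "$two" "$none"
```

One caveat: an unquoted $* splits a filename that contains spaces into several arguments. Writing "$@" instead avoids that, at the cost of a little more arcana.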
tr -sc A-Za-z '\012'
translates every character that is not in A-Za-z (-c) into a newline (\012), and squeezes each run of those newlines into a single one (-s). This puts each "word" (that is, a sequence of alphabetic characters) on a line by itself. Arcane enough for you?
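A quick way to see the tr step in isolation is to feed it one line of text. The sample sentence below is made up for illustration; note how the apostrophe splits "summer's" into two "words".

```shell
# Split a sample sentence into one word per line.
# Every non-letter (space, apostrophe, ?, newline) becomes a newline,
# and runs of newlines are squeezed to one.
words=$(printf "Shall I compare thee to a summer's day?\n" | tr -sc A-Za-z '\012')
printf '%s\n' "$words"
```

This should print Shall, I, compare, thee, to, a, summer, s, and day, each on its own line.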
Now
sort | uniq -c | sort -n
sorts the words (which are one per line), converts each sequence of identical words into one line with a count and the word, then sorts the result numerically (-n), so the small counts come out first and the large ones come out last.
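The counting stage can be tried on its own with a handful of words, one per line. The word list is made up for illustration.

```shell
# Six words, one per line: "the" three times, "cat" twice, "dog" once.
counts=$(printf 'the\ncat\nthe\ndog\nthe\ncat\n' | sort | uniq -c | sort -n)
printf '%s\n' "$counts"
```

The output has dog with a count of 1 first, then cat with 2, then the with 3. The exact amount of padding in front of each count varies from system to system.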
Why cat | tr? The issue is that the tr command, unlike most commands, only reads from its standard input (the keyboard, or often a pipe); it doesn't take a filename argument. Using cat this way allows you to use the pipeline with any number of filenames, including none. So, for example, you could convert a file into purely lower case and then count the words:
$ tr A-Z a-z <sonnets.txt | cat | ...
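To see why folding case first changes the counts, here is a small sketch on a made-up sample line (not sonnets.txt), with the rest of the pipeline spelled out. The variable names are only for illustration.

```shell
sample='The cat saw the CAT'

# Without case folding: The, the, cat, CAT, and saw are all distinct.
mixed=$(printf '%s\n' "$sample" | tr -sc A-Za-z '\012' |
        sort | uniq -c | sort -n | wc -l | tr -d ' ')

# With case folding first: only the, cat, and saw remain.
folded=$(printf '%s\n' "$sample" | tr A-Z a-z | tr -sc A-Za-z '\012' |
         sort | uniq -c | sort -n | wc -l | tr -d ' ')

echo "distinct words without folding: $mixed, with folding: $folded"
```

The trailing tr -d ' ' is there only because some versions of wc pad their output with spaces.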
To turn the pipeline into a command, put it in a file, here called wordfreq:
$ ls -l wordfreq
-rw-r--r--. 1 bwk fac 57 Feb 8 17:26 wordfreq
$ cat wordfreq
cat $* | tr -sc A-Za-z '\012' | sort | uniq -c | sort -n
You can say
$ sh wordfreq sonnets.txt
to run it directly. This is probably the easiest thing to do -- no fuss, no muss, just use it.
If you want to make it feel more like a real command, you can tell the operating system that it is an executable program by changing its mode to "executable", with the chmod command:
$ chmod +x wordfreq        (one time only)
$ ls -l wordfreq
-rwxr-xr-x. 1 bwk fac 57 Feb 8 17:26 wordfreq
Notice those x's? That means it's executable. Now you can run it like this:
$ ./wordfreq sonnets.txt
Notice the ./ at the front. That tells the shell to look for the command in the current directory (".") rather than in the usual places where commands are stored.
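Putting the steps together, here is a sketch that can be pasted into a shell to try the whole sequence without touching any existing files. The throwaway directory and the file sample.txt (a tiny stand-in for sonnets.txt) are assumptions made for the example.

```shell
# Work in a throwaway directory so nothing of yours is touched.
dir=$(mktemp -d)
cd "$dir"

# A tiny stand-in for sonnets.txt (made-up sample text).
printf 'to be or not to be\n' > sample.txt

# Put the pipeline in a file; the quoted 'EOF' keeps $* and \012 literal.
cat > wordfreq <<'EOF'
cat $* | tr -sc A-Za-z '\012' | sort | uniq -c | sort -n
EOF

sh wordfreq sample.txt      # run it via sh, no chmod needed
chmod +x wordfreq           # one time only
./wordfreq sample.txt       # now it runs like a command
```

Both runs should report not and or once each, and be and to twice each.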
Finally, it is possible to set things up so this command becomes part of your personal repertoire, accessible from anywhere in the file system. That's getting too far into the weeds for us, but if you want to know more, let me know. (Hint: look for shell search path.)
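For the curious, here is a minimal sketch of the search-path idea, under stated assumptions: the directory held in bindir is a hypothetical stand-in (a real setup might use something like $HOME/bin), and the change to PATH lasts only for the current shell session unless it is also added to a shell startup file.

```shell
# Hypothetical personal command directory; a real setup might use $HOME/bin.
bindir=$(mktemp -d)

# Install an executable copy of the script there.
cat > "$bindir/wordfreq" <<'EOF'
cat $* | tr -sc A-Za-z '\012' | sort | uniq -c | sort -n
EOF
chmod +x "$bindir/wordfreq"

# Prepend the directory to the search path for this shell session.
PATH="$bindir:$PATH"
export PATH

# The shell now finds wordfreq by name, from any directory, with no ./ needed.
cd / && command -v wordfreq
```

The last line prints the full path of the command the shell would run, which should be the copy in bindir.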