Updated Mon Feb 6 19:56:24 EST 2023
This assignment is practice in using the file system and Unix commands that you learned in the second class, using the file shakespeare.zip, which contains William Shakespeare's sonnets. You should already have downloaded this as part of the first assignment.
$ pwd                      make sure you're in the right directory
$ cd hum307/shakespeare    if necessary
$ curl -L -k 'www.hum307.com/shakespeare.zip' -o shakespeare.zip    if necessary
      -L says to follow any redirection of links
      -k says to allow insecure server connections
         probably not a good idea in general but ok here
$ curl --help              produces a compact list of options; --help works for many commands
$ man curl                 produces distinctly non-compact but thorough documentation
Repeat the Barrett Browning experiments from Studio 2, but with Shakespeare's sonnets. You should repeat the "love" experiments from EBB's sonnets (there are some similarities and some differences), but you should also explore something beyond the word "love". This is a chance to try other aspects of how language was used.
How many sonnets are there?
    think about ls and pipes
How many lines, words and characters are there in each sonnet? Did you see something odd?
    hint: wc | sort
By what factor did the zip process shrink the original input?
    hint: wc * and variations
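The hints above can be tried out on a toy example before running them on the real sonnets. This is only a sketch: the scratch directory and the three files s1.txt, s2.txt and s3.txt are made up for illustration, so the numbers say nothing about Shakespeare.

```shell
# Hypothetical stand-in data: three tiny files in a scratch directory,
# not the real sonnets.
cd "$(mktemp -d)"
printf 'one two three\nfour five\n' > s1.txt
printf 'six\n' > s2.txt
printf 'seven eight\nnine ten\neleven twelve\n' > s3.txt

# How many files? ls writes one name per line when its output goes into a pipe,
# so counting lines counts files.
nfiles=$(ls | wc -l)
echo "files: $nfiles"

# Lines, words and characters for each file, sorted numerically by line count.
# Unusually short or long files end up at the top or bottom of the list.
wc *.txt | sort -n

# Total number of characters in all the files together.
total=$(cat *.txt | wc -c)
echo "characters: $total"
```

Running wc -c on the zip file as well gives the second number needed for the compression factor: divide the total size of the originals by the size of the archive.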
"Love" is a major theme in both sets of sonnets.
How often does "love" appear literally? How often including variants like "Love", "loving", "beloved", etc.?
Compare the frequency of "love" in the two collections by using wordfreq to count the number of occurrences, and wc to count the total number of words.
What percent of EBB's words are "love"? What percent of WS's words are "love"?
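The percentage arithmetic can be sketched as follows. This is one possible approach, not the required one: it uses grep -o piped into wc -l as a stand-in for wordfreq, and sample.txt is a three-line made-up file, not either collection.

```shell
# Hypothetical stand-in text, not the real sonnets.
cd "$(mktemp -d)"
printf 'Love is not love\nwhich alters when it alteration finds\nmy beloved is loving\n' > sample.txt

# Literal "love": -o prints each match on its own line, -w insists on a
# whole word, and the match is case-sensitive, so only the lower-case
# word "love" counts.
literal=$(grep -o -w 'love' sample.txt | wc -l)

# Variants: -i ignores case and the pattern also matches inside longer
# words, so Love, love, beloved and loving all count.
variants=$(grep -o -i -E 'lov(e|ing)' sample.txt | wc -l)

# Total number of words in the file.
total=$(wc -w < sample.txt)

# Percentage of words that are some form of "love".
pct=$(awk -v n="$variants" -v t="$total" 'BEGIN { printf "%.1f%%", 100 * n / t }')
echo "literal: $literal  variants: $variants  total: $total  percent: $pct"
```

Which variants to include is a judgment call; changing the pattern changes the answer, so say which pattern you used when you report a percentage.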
How would you find lines that only have Roman numerals in shakespeare.txt?
What grep command would print only the lines of the sonnets, but not the Roman numerals?
What other grep command could you use to do the same thing?
hints:
    ^ matches the beginning of a line
    $ matches the end of a line
    [abcde] matches any one of those letters
    [abcde]* matches zero or more occurrences
    grep -v prints only lines that *don't* match
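Here is a small sketch of how the hinted pieces fit together. It assumes that each Roman numeral sits alone on its line with no surrounding spaces, which may not be true of the real shakespeare.txt; sample.txt is a made-up five-line file. The [[:lower:]] class, which matches any lower-case letter, is not among the hints and is just one possible second approach.

```shell
# Hypothetical stand-in text: two numerals, two verse lines, one blank line.
cd "$(mktemp -d)"
printf 'I\nFrom fairest creatures we desire increase,\n\nII\nWhen forty winters shall besiege thy brow,\n' > sample.txt

# Lines made up only of Roman numeral letters. The first bracket demands at
# least one letter; with [IVXLC]* alone, empty lines would match as well.
numerals=$(grep '^[IVXLC][IVXLC]*$' sample.txt)
echo "$numerals"

# Everything else: -v inverts the match. Using * here also drops blank lines.
verses=$(grep -v '^[IVXLC]*$' sample.txt)
echo "$verses"

# Another way to get the verse lines: keep lines with a lower-case letter.
other=$(grep '[[:lower:]]' sample.txt)
echo "$other"
```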
Do something similar to "love", but with any words, phrases or anything else that appeals. You can use EBB or WS or both. Tell us what you tried, with a quick summary of results.
You might find it easiest to either copy and paste the output from running the commands, or redirect their output into a file that you then include. These are good ways to avoid transcription errors that might happen if you retype anything.
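Redirection might look like this. The file names sample.txt and results.txt are made up for illustration.

```shell
# Hypothetical stand-in file.
cd "$(mktemp -d)"
printf 'a b c\n' > sample.txt

# > creates results.txt, or overwrites it, with the command's output.
wc sample.txt > results.txt

# >> appends, so the earlier output is kept.
grep -c 'b' sample.txt >> results.txt

# Look at what was collected.
cat results.txt
```

The collected file can then be pasted or included in the submission as it stands.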
Note: submit a .txt file, please, not a Word file. Use your favorite (or least unfavorite) text editor, like nano. I'm not worried if the upload is mangled, but I do want you to be comfortable with a real text editor.
The puzzle is to find all the words you can make with at least five letters and using the central letter ("Q" here). Lower-case words only. Your score is 1 for each word, with 3 points for a word that uses all seven letters.
The file web2 in the Datasets folder on the Google Drive contains the word list from Webster's Second International Dictionary, if you want to see how well your code works.
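One possible way to attack the puzzle with grep, as a sketch only. The six outer letters (a, e, i, l, t, u) and the eight-word list words.txt are invented for illustration; only the central "q" comes from the puzzle description.

```shell
# Hypothetical letters and word list, made up for illustration.
cd "$(mktemp -d)"
printf '%s\n' quiet quit queen Quail little tequila equal quilt > words.txt

# Step 1: keep words of five or more letters drawn only from the seven
# allowed letters. Step 2: keep those that contain the central letter.
# grep is case-sensitive, so words with upper-case letters are rejected.
found=$(grep -E '^[qaeiltu]{5,}$' words.txt | grep q)
echo "$found"

# Words that use all seven letters: chain one grep per remaining letter.
pangrams=$(echo "$found" | grep a | grep e | grep i | grep l | grep t | grep u)
echo "all seven: $pangrams"

# Score: 1 for each word, 3 for a word that uses all seven letters.
nwords=$(echo "$found" | grep -c .)
npan=$(echo "$pangrams" | grep -c .)
echo "score: $((nwords + 2 * npan))"
```

To try it on the real list, replace words.txt with the path to your copy of web2 and put the puzzle's actual letters inside the brackets.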
No extra credit for this beyond the satisfaction of knowing that you understand REs better than most.