|
What are the main goals of this assignment? Use a symbol table, learn about natural language processing, and learn about Markov models.
What preparation do I need before beginning this assignment? Review the beginning of Section 4.4 (pages 608–617) on using symbol tables in client code. For reference, here is a partial API of the ST data type. Also review the material on the textbook on parameterized data type and generics (pages 566–570). The Key type can be any comparable type, such as String or Integer.
What is the origin of the Markov text generator? It was first described by Claude Shannon in 1948. The first computer version was apparently written by Don P. Mitchell, adapted by Bruce Ellis, and popularized by A. K. Dewdney in the Computing Recreations section of Scientific American. Brian Kernighan and Rob Pike revived the program in a University setting and described it as an example of design in The Practice of Programming. The program is also described in Jon Bentley's Programming Pearls.
Do I have to implement the prescribed API? Yes, or you will lose a substantial number of points.
|
My program won't compile because the compiler can't find the symbol "ST." How can I fix that? The ST data type is not part of Java or stdlib.jar. You need to download ST.java&mdashit's part of markov.zip—and put it in the same directory as MarkovModel.java.
I want to refer to the magic number 128 in more than one method. Any advice? You can define a global constant, as follows:
This code should appear inside the class block, but before any instance variables or methods. By convention, constant variable names should be ALL_UPPERCASE. The static modifier means that every instance method will refer to the same variable (as opposed to instance variables, for which there is one variable per object); the final modifier means that you can't change its value, once initialized.private static final int ASCII = 128;
How do I extract a substring of a given string? The method call s.substring(i, i+k) returns the the k-character substring of s, starting at i. Note that it includes the left endpoint but excludes the right endpoint.
How do I emulate the behavior of a circular string? There are a number of approaches. One way is to append the first k characters of the input text to the input text.
How do I convert a char to an int? A char is a 16-bit (unsigned) integer. Java will automatically promote a char to an int if you use it as an index into an array.
How do I use StdRandom.discrete()? There are two overloaded methods named StdRandom.discrete(). One takes a floating-point array p[] and returns index i with probability equal to p[i]. (The array entries must be nonnegative and sum to 1.) The other takes an integer array freq[] and returns index i with probability proportional to freq[i]. (The array entries must be nonnegative and not all zero.)
|
Should my program generate a different output each time I run it? Yes.
How can I read in the input text from standard input? Use StdIn.readAll(). Do not remove whitespace.
My random text ends in the middle of a sentence. Is that OK? Yes, that's to be expected.
For which values of k should my program work? It should work for all well defined values of k, from 0 up to, and including, the length of the input text. As k gets larger, your program will use more memory and take longer.
My program is slow when T is large because I am concatenating the T characters, one at a time, to form string of length T. What else can I do? Do you need to form the entire string? Why not print the characters, one at a time, as you generate them?
I get an OutOfMemoryException. How do I tell Java to use more of my computer's memory? Depending on your operating system, you may have to ask the Java Virtual Machine for more main memory beyond the default.
The 500m means 500MB, and you should adjust this number depending on the size of the input text.% java-introcs -Xmx500m TextGenerator 7 1000 < input.txt
|
Thoroughly test your MarkovModel. We provide a main() as a start to your testing.
If your method is working properly, you will get the following output:public static void main(String[] args) { String text1 = "banana"; MarkovModel model1 = new MarkovModel(text1, 2); StdOut.println("freq(\"an\", 'a') = " + model1.freq("an", 'a')); StdOut.println("freq(\"na\", 'b') = " + model1.freq("na", 'b')); StdOut.println("freq(\"na\", 'a') = " + model1.freq("na", 'a')); StdOut.println("freq(\"na\") = " + model1.freq("na")); StdOut.println(); String text2 = "one fish two fish red fish blue fish"; MarkovModel model2 = new MarkovModel(text2, 4); StdOut.println("freq(\"ish \", 'r') = " + model2.freq("ish ", 'r')); StdOut.println("freq(\"ish \", 'x') = " + model2.freq("ish ", 'x')); StdOut.println("freq(\"ish \") = " + model2.freq("ish ")); StdOut.println("freq(\"tuna\") = " + model2.freq("tuna")); }
% java-introcs MarkovModel freq("an", 'a') = 2 freq("na", 'b') = 1 freq("na", 'a') = 0 freq("na") = 2 freq("ish ", 'r') = 1 freq("ish ", 'x') = 0 freq("ish ") = 3 freq("tuna") = 0
To test random(), write a loop that calls random() repeatedly, and count how many times a particular character is returned. For example, with model1 above, random("na") should return 'b' about one-half of the time; with model2 above, random("fish") should return 'o' about one-quarter of the time.
|
Be sure to test TextGenerator with different values of k.
% java-introcs TextGenerator 0 100 < input17.txt gaaagaacagcagacgacggaagaaggaggaaaaggaggggaggggggaggaggaagggagaaaggagacagcggaggggacgggaggagaggaggagag
% java-introcs TextGenerator 2 3 < input17.txt gag
|
These are purely suggestions for how you might make progress. You do not have to follow these steps.
Do not save the original text (or the circular text) as an instance variable because no instance method will need this information after the symbol table is initialized.
// st is the symbol table object for (String key : st) { StdOut.print (key + ": ") ... value = st.get(key); // print non-zero frequency for each character in the entry ... }You can implement your test code by either:
Sample output 1 for model1: ab: 1 an: 2 ba: 1 na: 2 Sample output 2 for model1: ab: (a, 1) an: (a, 2) ba: (n, 1) na: (b, 1) (n, 1)
|
These are purely suggestions for how you might make progress. You do not have to follow these steps.
|