|
How do I parse the command-line inputs? The easiest solution is something like
As usual, our emphasis in 226 is not on bulletproof input validation.int main(int argc, char *argv[]) { int K, M; if (argc != 3) exit(EXIT_FAILURE); /* means that the command line did not contain all required argumnents. */ K = atoi(argv[1]); /* Converts first command line argument (a string) to an integer */ M = atoi(argv[2]); /* Converts second command line argu,ent to an integer */
What is the alphabet size? The inputs consist of ASCII text, not just lowercase letters and whitespace. Do not assume it is 26 or 96 or some other arbitrary number.
My random process dead ends because it generates a match for for the final k characters of the input file, but this string does not appear anywhere else in the input. That's OK. If you prefer, you can repeatedly restart your code from the beginning until you get the right number of characters.
How do I randomly choose from a set of symbols with associated probabilities? This is best explained using an example. Assume you want to choose "A" with probability 0.5, "B" with probability 0.25, "C" with probability 0.1 and "D" with probability 0.15. The rand() function (defined in <stdlib.h>) returns a random integer uniformly distributed in the range 0..RAND_MAX-1 (RAND_MAX is also defined in <stdlib.h>). The following hardwired piece of code does the trick for this example in a simple (but suboptimal) way:
Of course, in the general case, when the number of symbols and their associated probabilities are arbitrary, something more general should be written... (Note that the solution described in the above example is suboptimal. It takes O(N) time to pick a symbol among N symbols. Can you do this in O(log(N)) time? Save this optimization for last, when everything else is working, for extra credit).r = rand(); if (r < RAND_MAX * 0.5) symbol = 'A'; else if (r < RAND_MAX*(0.5+0.25)) symbol = 'B'; else if (r < RAND_MAX*(0.5+0.25+0.1)) symbol = 'C'; else symbol = 'D';
Should my program generate a different output each time I run it? Yes. You can do this by setting the random number generator seed according to the system clock. Include <stdlib.h> and <time.h> and the following line (once only) at the beginning of the main routine:
Note that you might want to disable this feature when debugging.unsigned int seed = time(NULL); srand(seed);
Do I need to print the distinct keys or the frequency lists in lexicographic order like in the reference solutions? No, you are free to print them out in whatever order is most convenient. However, if you want to compare your solutions with our reference solutions, the lexicographic ordering might make things easier.
|
Input and output. Here are a number of sample input files.
Our program ignores the last character read in because this is always a newline character in our sample files. This helps prevent the dead-ending on some small test files.
Reference solutions. For reference, we provide executable code for our solution in Windows, Solaris, and OS X. We encourage you to develop a more efficient program! The executables can be found in the ftp site. It is certainly possible that you may be able to beat our solution, and it is also possible that you will get most/all of the credit even if your method is not quite as efficient.
|
Submit the three clients described in the assignment page, freqcount.c, langmod.c and textgen.c, together with any additional .c or .h files you need in your project. Also submit the readme.txt file, using the skeleton found in the ftp site. We will compile your program with "gcc226" 3 times, each time with one of the clients and the rest of the .c files. You should practice sound design principals.
Here is the template readme.txt file. you are required to use it. In this readme.txt template file, the term "program" refers to the third client (textgen). It should contain the following information:
Unix users may find the shell script timeit.csh useful for computing a table of CPU times.
|