COS 333 Assignment 5: The unRegistrar

Due midnight, Friday, March 26. Note that this is after spring break. There are concurrent project obligations, however, so use the time wisely. No extensions on this deadline.

Tue Mar 2 08:44:13 EST 2010

The Registrar's web site leaves something to be desired when all you want is a quick look at a few courses. Fortunately, much registrar data is freely available, so it's possible to make a private version that might be more satisfactory at least along this dimension. This assignment is a somewhat open-ended exercise in using Ajax technology to make a highly responsive alternative.

My own quick and dirty version is the unRegistrar, which you can use as a starting point. It includes minimal Ajax functionality as described in class, and simple tooltip code adapted from lixlpixel.org. You will soon see that although it is responsive and easy to use, it's also dumb and the code is sleazy. Your task is to make it somewhat smarter and add some new features while preserving its speed, simplicity and convenience for ad hoc queries like "what QR classes start at 1:30 on Monday and Wednesday?" (The version on my web page does a bit more than the code provided; you can replicate features of my web-page version if they appeal.)

1. Something Old

Your version must include any three of these features, presented in any way you like:

  1. It should enable searching for the standard 3-letter department codes for departments and programs, but only in context; that is, "cos" if clearly by itself should yield CS courses, but not items whose descriptions merely include strings like "cost" and "Costa Rica". There are some 3-letter codes like "art" and "his" that are also 3-letter words; it would be nice if you could do something sensible with them.
  2. It should enable easy and unambiguous searches for the distribution codes like QR and HA. Identifying "qr" is easy, since it appears nowhere in English text, but many of the other codes are parts of English text. This is a variant of the problem in the previous item. Neither feature has to work perfectly but they should mostly get things right.
  3. Add something easy to use and not too space intensive that easily converts cryptic department and program codes like "QCB" into official names like "Program in Quantitative and Computational Biology".
  4. Add something similar for distribution codes, to map for example "EM" to "Ethical Thought and Moral Values".
  5. Add something that will convert the cryptic 5-letter building codes into the full building name.
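
One way to approach the code-matching and code-expansion features above is sketched here in Python (only one of several reasonable languages). This is a minimal sketch under assumptions, not a description of the provided code: it assumes each course line carries an upper-case department code followed by a three-digit number, as in "COS 333", and the names DEPT_NAMES, DIST_NAMES, classify_term, matches_dept and expand are made up for illustration. The tables hold only the codes mentioned in this handout; a real version would need the full set.

```python
import re

# Hypothetical lookup tables; only codes mentioned in the handout are included.
DEPT_NAMES = {
    "COS": "Computer Science",
    "QCB": "Program in Quantitative and Computational Biology",
}
DIST_NAMES = {
    "EM": "Ethical Thought and Moral Values",
}


def classify_term(term):
    """Return ('dept', CODE), ('dist', CODE), or ('text', term).

    A term counts as a code only if the whole term is a known code,
    so "cos" is a department but "cost" is ordinary text.
    """
    code = term.strip().upper()
    if code in DEPT_NAMES:
        return ("dept", code)
    if code in DIST_NAMES:
        return ("dist", code)
    return ("text", term.strip())


def matches_dept(code, course_line):
    """True if course_line contains code followed by a 3-digit course number.

    The match is case-sensitive on purpose (an assumption about the data):
    listings use upper-case codes, while descriptions use ordinary words.
    """
    pattern = r"\b" + re.escape(code) + r"\s*\d{3}\b"
    return re.search(pattern, course_line) is not None


def expand(code):
    """Return the official name for a code, or the upper-cased code if unknown."""
    code = code.upper()
    return DEPT_NAMES.get(code) or DIST_NAMES.get(code) or code
```

With these definitions, a query of "cos" would be routed to matches_dept, which accepts a line containing "COS 333" but rejects one whose description merely mentions "cost" or "Costa Rica". Codes that are also English words, like "art", would still need some extra rule of your own.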

2. Something New

You must add two new features of your own. If nothing new comes to mind, consider some of these:

My version does not handle courses like PHY 104 that have two or more lectures; fix it so it displays all lectures, not just the last one.
My version does not display precept information; include precepts if you have a sensible way to display them.
The data includes prerequisites info but my code makes no use of it; you might include some.
I have made a stab at handling non-USASCII characters but the job is not complete; you could fix more of those.
My reg.cgi uses grep and quietly processes regular expressions like "[123]:30" for users who know about them. Is there some way to use RE's more effectively without complicating things for users who don't realize that they exist? What about the wildcard version of RE's?
Alternatively, how about bullet-proofing it against attempts to run programs like a shell on the server, by sanitizing input queries?
Allow the output to be sorted in various ways, e.g., by time of day or by instructor.
Add a way to complement time periods, or have richer ranges, like "not before 2:30 pm".
Provide some way to save the information about some course(s) on the page, perhaps by catching onMouseDown events, so that multiple search results can be accumulated.
The scripts only present data for the current semester, but some previous semesters are there too. Add a way to make (some or all of?) that information readily accessible.
Is there any summary information that might be worth displaying?
You can integrate other databases if you like; for example, one useful registrar site eventually reveals current enrollments, though it is slow and awkward.
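
The suggestions above about regular expressions, wildcards and bullet-proofing can be combined in one place. The following Python sketch shows one possible approach, which is to do the matching inside the program instead of handing the query to grep through a shell. It is an assumption-laden illustration, not how reg.cgi works: the names MAX_QUERY_LEN, compile_query and search are hypothetical, the length limit is arbitrary, and the sample lines in the usage note are invented.

```python
import re

MAX_QUERY_LEN = 100  # arbitrary limit chosen for this sketch


def compile_query(query, wildcard=False):
    """Turn user input into a compiled, case-insensitive regular expression.

    Returns None if the input is empty, too long, or not a valid pattern.
    """
    query = query.strip()
    if not query or len(query) > MAX_QUERY_LEN:
        return None
    if wildcard:
        # Escape everything, then re-enable only * and ? as wildcards.
        query = re.escape(query).replace(r"\*", ".*").replace(r"\?", ".")
    try:
        return re.compile(query, re.IGNORECASE)
    except re.error:
        return None


def search(lines, query, wildcard=False):
    """Return the lines that match the query; an unusable query matches nothing."""
    pattern = compile_query(query, wildcard)
    if pattern is None:
        return []
    return [line for line in lines if pattern.search(line)]
```

For example, given invented lines such as "COS 333 M W 1:30 pm" and "ECO 100 M W 2:30 pm", the query "[12]:30" selects both, while the wildcard query "cos*1:30" selects only the first. Because the input never reaches a shell, characters such as ; and ` are only ever interpreted as part of a pattern, never as shell syntax. A length limit does not rule out slow patterns entirely, so treat this as a starting point.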

The primary goal is functionality, with esthetics a secondary but still relevant consideration. Provide three of the features listed under Something Old and two additional features of your own, for a total of five. More is ok if you're on a roll.
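
For the sorting and "not before 2:30 pm" ideas in the list of possible new features, the main step is converting a time of day into a number that can be compared. Here is a minimal Python sketch, assuming times appear in a form like "1:30 pm"; the actual format in the processed data may differ, and the function names and sample courses are invented for illustration.

```python
import re


def to_minutes(text):
    """Convert a time such as '1:30 pm' to minutes after midnight.

    Returns None if text is not in that form.
    """
    m = re.fullmatch(r"\s*(\d{1,2}):(\d{2})\s*([ap])m\s*", text, re.IGNORECASE)
    if m is None:
        return None
    hour, minute = int(m.group(1)), int(m.group(2))
    if not (1 <= hour <= 12 and minute < 60):
        return None
    # 12 am is hour 0 and 12 pm is hour 12 on a 24-hour clock.
    hour = hour % 12 + (12 if m.group(3).lower() == "p" else 0)
    return hour * 60 + minute


def not_before(courses, cutoff):
    """Keep (name, start) pairs starting at or after cutoff, sorted by start time.

    Courses whose start time cannot be parsed are dropped.
    """
    limit = to_minutes(cutoff)
    if limit is None:
        raise ValueError("unrecognized time: %r" % cutoff)
    timed = [(to_minutes(start), name, start) for name, start in courses]
    kept = [t for t in timed if t[0] is not None and t[0] >= limit]
    return [(name, start) for _, name, start in sorted(kept)]
```

The same to_minutes value can serve as a sort key for "sort by time of day", and reversing the comparison gives "not after". Day-of-week handling is left out of this sketch.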

The directory a5data includes the raw materials that you need to get started. My unRegistrar code includes several ugly Awk scripts that convert the registrar's information into nicer form, the CGI script reg.cgi that searches it, and the HTML file reg.html that includes the basic tooltip and Ajax code. The files foo_* contain processed data from the Registrar's web site for Spring 2010. The script get_all.awk explains briefly what each one contains. These are all in a tar file a5data.tar that you can download.

3. Advice

To get started,

[You can skip this step if you decide you're not going to use CGI at all.] Get access to some cgi server. You can use the campuscgi facility, but it can be slow and its software is often very dusty. The CS server is likely to be much better, but you must have a CS account (which is available to all class members upon application). You can use your own machine but it has to remain up while we're grading.
Download the tar file a5data.tar and extract it into a subdirectory, for example /usr/campuscgi/your_netid/a5 if you use campuscgi, or /u/your_netid/public_html/a5 if you use CS. You should keep all the files in this one directory so you can create a submission from them when you're ready.
Important: Make sure it works for you in its current form before you start modifying it. This has been tested but slipups are always possible.

You're welcome to ignore my code entirely, and you can use any tools and languages that get the job done. The assignment is meant to give you some hands-on experience with Ajax and Javascript, but not to take a huge amount of time, so don't kill yourself.

Your code must work with Firefox. Safari would be nice but not if it takes extra work. I have been unable to make my code work properly with Internet Explorer (mostly because of CRLF issues), and I see no reason why you should waste your time on IE either. Here are some other hints:

Firefox's Javascript Console (on the Tools menu) is helpful for debugging Javascript code, and the Firebug add-in is even better. There are also plugins called Web Developer and DOM Inspector. I think that Firebug is the most useful, but your mileage may vary; whichever combination you use, you will find them invaluable. (If you must use Safari, you can enable a debugging menu with this Terminal command:
```
     defaults write com.apple.Safari IncludeDebugMenu 1
```
It seems to provide about the same features as Firebug does.)
You have to run get_all to make a reg.txt; the latter is not part of the distribution.
Make sure that permissions in your campuscgi or public_html directory allow an ordinary user to run your code and access your files; the server is probably running your scripts as user "none", not as you.
The campuscgi and CS machines run Solaris, not Linux; that's what's running when you run the script(s) via reg.html. This means that what you run when logged in directly is not necessarily what's running via the browser. I got bitten by this myself by using a search path in reg.cgi that was fine for Linux but not for Solaris.
You do not have to use Awk! It's good for some things but it is not the most expressive language in the world and it has some surprising behaviors. I wrote my partial solution in Perl, based on example code presented in class. Python is good too. You could even write in Java, which may be more familiar, though I think less expressive.
My code does almost everything on the server, so the reg.txt file is a mess. It might be easier to send more straightforward text or JSON to the browser, and process it with Javascript, thus avoiding CGI entirely. I haven't tried this approach but it might be worth exploring.
Print statements are your friend! Those residual echo statements in reg.cgi and the commented-out prints in various scripts are examples. When you're working in an unfamiliar environment with an unfamiliar language and unfamiliar tools, verifying each step by printing input and output is much more efficient than beating your head against the wall. Work your way through one line at a time if necessary: print what came in and what went out, to see if they are correct. (That's how I ultimately figured out my search path problems, though it took longer than it should have because I forgot this cardinal principle.)
Watch out for infinite loops and runaway processes.
This is not meant to be a time-consuming exercise, nor is a lot of code necessary if you think clearly and cut the right kinds of corners. My Perl script is under 20 lines long aside from some static data structures, and it's very mundane. In hindsight I could have done it nearly as easily in Awk, though the latter doesn't have an explicit case-insensitive RE match. An example of corner-cutting: the static data structures were created with a text editor; there's no need to write code to create them since they don't change over a semester.
Here's a useful Awk feature, using FILENAME to select what actions to perform on different input files:
```
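 awk '
   # placeholder actions; replace each print with real processing
   FILENAME == "distcode.txt" { print "distcode.txt:", $0 }   # runs only on lines of distcode.txt
   FILENAME == "reg.txt"      { print "reg.txt:", $0 }        # runs only on lines of reg.txt
 ' distcode.txt reg.txt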
```
This lets you use the implicit input loop rather than explicit getline statements.
Another useful feature, which might help you pass a query string to an Awk program:
```
  awk -v qs="$q1" -f whatever.awk reg.txt
```
The -v argument (of which there may be more than one) sets an Awk variable to a value before the Awk program begins execution.
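
One of the hints above suggests sending more straightforward text or JSON to the browser and processing it with Javascript. As a rough sketch of the server-side half of that idea, the Python below turns tab-separated course lines into a JSON array. The column layout in FIELDS is purely hypothetical, since the real layout depends on what your get_all produces, and records_to_json is an invented name.

```python
import json

# Hypothetical column layout; adjust to match whatever get_all writes.
FIELDS = ["dept", "number", "title", "days", "time"]


def records_to_json(lines):
    """Convert tab-separated course lines to a JSON array of objects."""
    records = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != len(FIELDS):
            continue  # skip malformed lines rather than guess at their meaning
        records.append(dict(zip(FIELDS, parts)))
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(records, ensure_ascii=False)
```

The browser could then parse the result once and do its filtering and sorting in Javascript, without a round trip to the server for each query.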

4. Submission

You must use the names reg.html, reg.cgi, and get_all for the web page, the cgi script and the code that creates your data file(s). Your get_all must create the output file(s) that reg.cgi will read, as mine does, so we can experiment with it if we want to. [There need not be a reg.cgi if you don't use one.]

Create a README file with one paragraph for each feature that you added so we can see what you had in mind. A few sentences each should be enough, so it probably won't be over a page long. For example, it might say

Feature 3: Displays expanded form "Computer Science" when the mouse
passes over 'COS' in the one-line display, and similarly for other
departments.

We will run your code on whatever server you have it running on. The first paragraph of the README must clearly state the full URL for accessing your system, wherever that is; for example, mine is

  http://www.cs.princeton.edu/~bwk/a5/reg.html

which is the CS cgi server. If you want us to use your own or some other machine, make sure it works and stays up for the duration of grading.

Collect all your files (but not the registrar data) into a single tar file:

	tar cf a5.tar reg.html reg.cgi get_all README other_files...

and submit a5.tar using this Dropbox link. This will ask you to upload the README file separately, but include it in the tar file as well.

We will assess your submission primarily on whether it correctly implements the features requested and the new features you added, how well it handles interesting queries, and how easy and natural it seems; esthetics are a secondary concern but not irrelevant.

Please follow the rules on what to submit. It's a big help if your submission arrives in the right form, and your programs do exactly what is asked for.

Acknowledgement

Many thanks to Eirik Bakke '08, who provided the initial scripts to extract raw information from the Registrar's database, and helped me to make sense of the data.