COS 333 Assignment 5: The unRegistrar
Due midnight, Friday, March 26.
Note that this is after spring break. There are concurrent project
obligations, however, so use the time wisely.
No extensions on this deadline.
Tue Mar 2 08:44:13 EST 2010
The Registrar's web
site leaves something to be desired when all you want is a quick
look at a few courses. Fortunately, much registrar data is freely
available, so it's possible to make a private version that might be more
satisfactory at least along this dimension. This assignment is a
somewhat open-ended exercise in using Ajax technology to make a highly
responsive alternative.
My own quick and dirty version is the unRegistrar, which you can use as a
starting point. It includes minimal Ajax functionality as described in
class, and simple tooltip code adapted from lixlpixel.org.
You will soon see that although it is responsive and
easy to use, it's also dumb and the code is sleazy. Your task is to
make it somewhat smarter and add some new features while preserving its
speed, simplicity and convenience for ad hoc queries like "what
QR classes start at 1:30 on Monday and Wednesday?" (The version on
my web page does a bit more than the code provided; you can replicate
features of my web-page version if they appeal.)
1. Something Old
Your version must include any three of these features, presented in
any way you like:
- It should enable searching for the standard
3-letter department codes for
departments and programs, but only in context; that is, "cos" when it clearly
stands alone should yield CS courses, but not items whose descriptions
merely include strings like "cost" and "Costa Rica". Some 3-letter codes,
like "art" and "his", are also English words; it
would be nice if you could do something sensible with them.
- It should enable easy and unambiguous searches for the
distribution codes like QR
and HA. Identifying "qr" is easy, since it appears nowhere in English
text, but many of the other codes are parts of English text. This is a
variant of the problem in the previous item. Neither feature has to work
perfectly, but both should mostly get things right.
- Add something easy to use and not too space-intensive that
converts cryptic department and program codes like "QCB" into
official names like "Program in Quantitative and Computational Biology".
- Add something similar for distribution codes, to map
for example "EM" to "Ethical Thought and Moral Values".
- Add something that will convert the cryptic
5-letter building codes
into the full building name.
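One way to get context-sensitive matching for the first four items is sketched below in Python. It rests on some assumptions: that course ids appear in the data in upper case as "COS 333" and distribution codes as standalone upper-case tokens like "EM", and that the lookup tables are small hand-edited dictionaries (only sample entries are shown). The names DEPTS, DISTS, term_pattern, search, and expand are hypothetical, not part of the distributed code. A query word that is a known code is matched only in that upper-case form, so "cos" finds COS courses but not "Cost" or "Costa Rica"; any other word falls back to a case-insensitive substring match.

```python
import re

# Hand-edited lookup tables with a few sample entries; fill in the rest.
# Like any static data here, they change at most once a semester.
DEPTS = {
    "COS": "Computer Science",
    "QCB": "Program in Quantitative and Computational Biology",
}
DISTS = {
    "EM": "Ethical Thought and Moral Values",
}

def term_pattern(term):
    """Compile one query word, treating known codes specially."""
    code = term.upper()
    if code in DEPTS:
        # Department code: match only an upper-case course id such as
        # "COS 333", so "cost" and "Costa Rica" never match.
        return re.compile(r"\b" + code + r"\s*\d{3}")
    if code in DISTS:
        # Distribution code: match only a standalone upper-case token.
        return re.compile(r"\b" + code + r"\b")
    # Anything else: ordinary case-insensitive substring search.
    return re.compile(re.escape(term), re.IGNORECASE)

def search(lines, query):
    """Return the lines that match every word of the query."""
    patterns = [term_pattern(w) for w in query.split()]
    return [ln for ln in lines if all(p.search(ln) for p in patterns)]

def expand(code):
    """Map a department or distribution code to its full name, if known."""
    key = code.upper()
    return DEPTS.get(key) or DISTS.get(key) or code
```

This sketch settles "art" and "his" by always treating them as codes; a gentler policy would also accept the plain word as a whole word in a title. The same tables can drive expand(), whose result could go into a tooltip.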
2. Something New
You must add two new features of your own. If nothing new
comes to mind, consider some of these:
- My version does not handle courses like PHY 104 that have two or more
lectures; fix it so it displays all lectures, not just the last one.
- My version does not display precept information; include precepts
if you have a sensible way to display them.
- The data includes prerequisite information, but my code makes no use of
it; you might display some of it.
- I have made a stab at handling non-US-ASCII characters, but the job
is not complete; you could handle more of them.
- My reg.cgi uses grep and quietly processes regular expressions like
"[123]:30" for users who know about them. Is there some way to use REs
more effectively without complicating things for users who don't realize
that they exist? What about shell-style wildcards like * and ?, the
simplified form of REs that many users already know?
- Alternatively, how about bulletproofing it against attempts to
run programs such as a shell on the server by sanitizing input queries?
- Allow the output to be sorted in various ways, e.g., by time of day
or by instructor.
- Add a way to complement time periods, or have richer ranges, like
"not before 2:30 pm".
- Provide some way to save the information about some course(s) on
the page, perhaps by catching onMouseDown events, so that multiple
search results can be accumulated.
- The scripts only present data for the current semester, but
some previous semesters are there too. Add a way to make (some or all of?)
that information readily accessible.
- Is there any summary information that might be worth displaying?
- You can integrate other databases if you like; for example,
one useful registrar site eventually reveals
current enrollments,
though it is slow and awkward.
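For the regular-expression and bulletproofing items, here is a minimal Python sketch of one possible approach: reject any query that contains characters outside a small allowlist, then translate shell-style wildcards (* and ?) into REs so users never need to know RE syntax. The allowlist and the function names sanitize, glob_to_re, and search are assumptions for illustration only.

```python
import re

# Characters a query may contain: letters, digits, space, colon, period,
# hyphen, and the two wildcards.  This allowlist is an assumption; anything
# else (quotes, semicolons, pipes, dollar signs, backquotes, slashes) is
# rejected outright.
ALLOWED = re.compile(r"[A-Za-z0-9 :.*?-]*")

def sanitize(query):
    """Return the query if every character is allowed, else raise ValueError."""
    if not ALLOWED.fullmatch(query):
        raise ValueError("query contains disallowed characters")
    return query

def glob_to_re(word):
    """Translate shell-style wildcards: * -> .*, ? -> . , all else literal."""
    parts = []
    for ch in word:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def search(lines, query):
    """Return the lines that match every (wildcard) word of the query."""
    patterns = [glob_to_re(w) for w in sanitize(query).split()]
    return [ln for ln in lines if all(p.search(ln) for p in patterns)]
```

This allowlist deliberately drops bracket expressions like "[123]:30"; "?:30" covers that case more crudely. Sanitizing helps, but the sturdier fix is never to hand the query to a shell at all: if you keep grep, pass the user's text as one argument, for example subprocess.run(["grep", "-i", "-e", text, "reg.txt"]) in Python, which runs no shell by default. Bear in mind that grep's RE dialect differs from Python's.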
The primary goal is functionality, with esthetics a secondary but still
relevant consideration. Provide three of the features listed in Section 1
and two additional features of your own, for a total of five. More is ok
if you're on a roll.
The directory a5data includes the raw materials
that you need to get started. My unRegistrar code includes several ugly Awk
scripts that convert the registrar's information into nicer form, the
CGI script reg.cgi that searches it, and the HTML file
reg.html that includes the basic tooltip and Ajax code. The
files foo_* contain processed data from the Registrar's web
site for Spring 2010. The script
get_all.awk explains briefly what each
one contains. These are all in a tar file
a5data.tar
that you can download.
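If you decide to replace reg.cgi with Python, note that a CGI script is just a program that reads the query from the QUERY_STRING environment variable and writes a header, a blank line, and the body to standard output. A minimal sketch follows. The parameter name q and the plain substring search are assumptions, so match whatever your reg.html actually sends; respond and main are hypothetical names.

```python
#!/usr/bin/env python3
# Check that this interpreter path exists on the server; see the
# Solaris search-path warning in the hints.
import os
import sys
from urllib.parse import parse_qs

DATA_FILE = "reg.txt"  # created by get_all

def respond(query_string, lines):
    """Return a complete CGI response: header, blank line, matching lines."""
    params = parse_qs(query_string)
    # "q" is an assumed parameter name; use whatever reg.html sends.
    query = params.get("q", [""])[0].strip().lower()
    # Plain case-insensitive substring match; an empty query matches nothing.
    hits = [ln.rstrip("\n") for ln in lines if query and query in ln.lower()]
    body = "".join(ln + "\n" for ln in hits)
    return "Content-Type: text/plain; charset=utf-8\n\n" + body

def main():
    with open(DATA_FILE, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    sys.stdout.write(respond(os.environ.get("QUERY_STRING", ""), lines))

# Web servers set GATEWAY_INTERFACE for CGI programs, so main() only
# runs when the script is invoked through the server.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    main()
```

Keeping the logic in respond() lets you try it on sample lines without a server, in the spirit of the print-everything advice below.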
3. Advice
To get started,
- [You can skip this step if you decide you're not going to use
CGI at all.] Get access to a CGI server. You can use the
campuscgi facility, but
it can be slow and its software is often very dusty. The CS server is
likely to be much better, but you must have a CS account (which is
available to all class members
upon application).
You can use your own machine but it has to remain up while we're
grading.
- Download the tar file a5data.tar
and extract it into a subdirectory, for example
/usr/campuscgi/your_netid/a5 if you use campuscgi, or
/u/your_netid/public_html/a5 if you use CS. You
should keep all the files in this one directory so you can create a
submission from them when you're ready.
- Important: Make sure it works for you in its current form
before you start modifying it. This has been tested but slipups are
always possible.
You're welcome to ignore my code entirely, and you can use any tools
and languages that get the job done. The assignment is meant to give
you some hands-on experience with Ajax and Javascript, but not to take a
huge amount of time, so don't kill yourself.
Your code must work with Firefox.
Safari would be nice but not if it takes extra work.
I have been unable to make my code work properly with Internet
Explorer (mostly because of CRLF issues), and I see no
reason why you should waste your time on IE either.
Here are some other hints:
- Firefox's Javascript Console (on the Tools menu) is helpful for
debugging Javascript code; the Firebug add-on is invaluable. There are
also add-ons called Web Developer and DOM Inspector. I think that
Firebug is the most useful, but your mileage may vary. No matter
which combination you use, you will find them indispensable.
(If you must use Safari, you can enable a debugging menu with this Terminal command:
defaults write com.apple.Safari IncludeDebugMenu 1
It seems to provide about the same facilities as Firebug.)
- You have to run get_all to create reg.txt; that file is not
part of the distribution.
- Make sure that permissions in your campuscgi or public_html
directory allow an ordinary user to run your code and access
your files; the server is probably running your scripts as user "none",
not as you.
- The campuscgi and CS machines run Solaris, not Linux, and that is the
system your script(s) run on when invoked via reg.html. This means
that the environment you see when logged in directly is not
necessarily the one your scripts see via the browser. I got bitten by this myself
by using a search path in reg.cgi that was fine for Linux but
not for Solaris.
- You do not have to use Awk! It's good for some things but it is
not the most expressive language in the world and it has some surprising
behaviors. I wrote my partial solution in Perl, based on example code
presented in class. Python is good too. You could even write in Java,
which may be more familiar, though I think less expressive.
- My code does almost everything on the server, so the
reg.txt file is a mess. It might be easier to send more
straightforward text or JSON to the browser, and process it with
Javascript, thus avoiding CGI entirely. I haven't tried this approach
but it might be worth exploring.
- Print statements are your friend! Those residual echo statements
in reg.cgi and the commented-out prints in various scripts are examples.
When you're working in an unfamiliar environment with an unfamiliar
language and unfamiliar tools, verifying each step by printing input and
output is much more efficient than beating your head against the wall.
Work your way through one line at a time if necessary: print what came
in and what went out, to see if they are correct. (That's how I
ultimately figured out my search path problems, though it took longer
than it should have because I forgot this cardinal principle.)
- Watch out for infinite loops and runaway processes.
- This is not meant to be a time-consuming exercise, nor is a lot of
code necessary if you think clearly and cut the right kinds of corners.
My Perl script is under 20 lines long aside from some static data
structures, and it's very mundane. In hindsight I could have done it
nearly as easily in Awk, though the latter doesn't have an explicit
case-insensitive RE match. An example of corner-cutting: the static
data structures were created with a text editor; there's no need to write
code to create them since they don't change over a semester.
- Here's a useful Awk feature, using FILENAME to
select what actions to perform on different input files:
awk '
FILENAME == "distcode.txt" { nd++ }   # action done only on lines in distcode.txt
FILENAME == "reg.txt"      { nr++ }   # action done only on lines in reg.txt
END { print nd " lines from distcode.txt, " nr " from reg.txt" }
' distcode.txt reg.txt
This lets you use the implicit input loop rather than explicit getline
statements.
- Another useful feature, which can help pass a query string into an
Awk program:
awk -v qs="$q1" -f whatever.awk reg.txt
The -v argument (of which there may be more than one) sets an Awk variable
to a value before the Awk program begins execution.
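To try the JSON idea from the hints above, get_all could emit a JSON array instead of (or beside) reg.txt, and the browser could filter it in Javascript with no CGI at all. Here is a minimal Python sketch of the conversion step. The tab-separated input layout and the FIELDS names are assumptions about what your own get_all produces, and lines_to_json and to_json.py are hypothetical names.

```python
#!/usr/bin/env python3
import json
import sys

# Assumed input layout: one course per line, tab-separated fields in
# this order.  Adjust FIELDS to match whatever your get_all produces.
FIELDS = ["course", "title", "time"]

def lines_to_json(lines):
    """Convert tab-separated course lines into a JSON array of objects."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")  # tolerate both LF and CRLF endings
        if not line:
            continue  # skip blank lines
        # zip() stops at the shorter sequence, so a line with fewer
        # fields simply yields an object with fewer keys.
        records.append(dict(zip(FIELDS, line.split("\t"))))
    # ensure_ascii=False leaves accented characters as UTF-8 text.
    return json.dumps(records, ensure_ascii=False)

def main(path):
    with open(path, encoding="utf-8") as f:
        sys.stdout.write(lines_to_json(f) + "\n")

# Hypothetical usage: python3 to_json.py reg.txt > reg.json
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The resulting file can be served as a static file and fetched once with XMLHttpRequest. After that, each keystroke is handled entirely in the browser.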
4. Submission
You must use the names reg.html, reg.cgi, and
get_all for the web page, the CGI script, and the code that
creates your data file(s). Your get_all must create the output
file(s) that reg.cgi will read, as mine does, so we can
experiment with it if we want to. [There need not be a reg.cgi
if you don't use one.]
Create a README file with one paragraph for each feature that you
added so we can see what you had in mind. A few sentences each should
be enough, so it probably won't be over a page long. For example, it
might say
Feature 3: Displays expanded form "Computer Science" when the mouse
passes over 'COS' in the one-line display, and similarly for other
departments.
We will run your code on whatever server you have it running on. The
first paragraph of the README must clearly state the full URL for accessing your
system, wherever that is; for example, mine is
http://www.cs.princeton.edu/~bwk/a5/reg.html
which is on the CS CGI server. If you want us to use your own or
some other machine, make sure it works and stays up for the
duration of grading.
Collect all your files (but not the registrar data) into a single tar file:
tar cf a5.tar reg.html reg.cgi get_all README other_files...
and submit a5.tar using
this Dropbox link.
This will ask you to upload the README file separately, but include
it in the tar file as well.
We will assess your submission primarily on whether it correctly
implements the features requested and the new features you added, how
well it handles interesting queries, and how easy and natural it seems;
esthetics are a secondary concern but not irrelevant.
Please follow the rules on what to submit.
It's a big help if your submission arrives in the right
form, and your programs do exactly what is asked for.
Acknowledgement
Many thanks to Eirik Bakke '08, who provided the initial scripts to
extract raw information from the Registrar's database, and helped me to
make sense of the data.