CS333 Project: Preliminary Description (Spring 2001)

Mon Feb 19 16:10:06 EST 2001

Overview

The CS333 project is an opportunity to work on a task larger and more elaborate than any of the assignments: design and write a significant piece of software, working in groups of 3 or 4 people.

The intent is that this will not be just hacking, but a serious attempt to simulate some aspects of reality: choosing something suitable to work on, planning how to get it together, designing it before building it (though allowing for the inevitable changes of direction as you learn), building it in stages, testing it thoroughly, and documenting and presenting the result, all as part of a small team. If you do it well, this should be something that you can show with pride to friends and prospective employers.

The project will involve many of the issues of software engineering as they occur in small, multi-person real-world projects. Some of this material will be discussed in class, and some will be found in assigned readings.

The considerations affecting the form of the project are:

All projects have to have enough common structure to make it feasible for the instructor and TA's to manage 10 to 15 projects, and to grade them fairly and uniformly.
At the same time, the basic project should have plenty of room to try out interesting ideas, and freedom to use a broad spectrum of languages and tools.
It should be relevant to the general themes and topics of the course, which are programming techniques, languages, tools, and interfaces; it should not stray too far into topics better covered in other courses, like graphics, games, display wall, compilers, networks, and databases.
It should break down naturally into a few components so it can be done in teams of 3 or 4 people, and into stages so that progress can be monitored.
It should be of some intrinsic interest and potential utility.

Project Definition

A large number of real-world systems are based on what is sometimes called the "three-tier model": a user interface, some kind of data management and access, and some process(ing) between them. In some ways, this is merely standard client-server with a new name, but the term does capture a sometimes-useful distinction among components.

Many online shopping services use this architecture. For example, Amazon has a web-based user interface; the data underneath is fundamentally a large book catalog and customer information; and the process includes a wide variety of searching and retrieval operations.

News and financial services are analogous: again, a user interface, a background information gathering and filing service, and mechanisms that let a client register for, access and process interesting items.

Even last year's occasionally maligned "build an IDE" project took this form: an interface, perhaps but not necessarily web-based, for editing and control; a persistent store for recording programs and other information; and mechanisms for editing, compiling, and debugging programs.

The project this year will be to build

a 3-tier system for any application that appeals to you.

This is a very open-ended project, so the big problem is more likely to be defining a suitably bite-sized topic than inventing one in the first place. Almost every web service will suggest something, perhaps novel or perhaps "We can do that much better"; either would likely be fine. Hiding, selecting, or merging data from existing web services might be a possibility; shopping and other bots are an instance. Yahoo is a good place to start; one of their core competencies seems to be to invent such special services.

Look around the campus for other possibilities: online maps, tours, notification services, databases, and so on are all potentially interesting and feasible (though make sure that the information that you want to use is actually available -- concerns for privacy, property and other people's turf can all get in the way of a great idea). Some of the best projects come from noticing a place where some task is done by hand or poorly by machine when it could be really well done by a suitable program.

The assignment is to create such a system, using whatever combination of existing tools and new code is necessary. The functionality that you must provide includes the following:

User interface: This is what the user sees: a graphical interface that supports some kind of direct interaction between user and system. Web-based interfaces are likely to be most common, but it's also fine to build something that has nothing to do with a browser.
Process: This is probably the "value added" part, since this is where you process and glue together whatever the user wants with information sources and repositories.
Data management: Somewhere there's some data, whether maintained by your system on some local machine, or accessed as needed from the web, or synthesized on the fly. You don't have to use a database system (though that might be interesting), but you do have to have some component that involves recording state and using it in a subsequent interaction.
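
As a rough illustration of how the three pieces fit together, here is a minimal Python sketch. Everything in it is an assumption made for the example, not a required design: the note-taking application, the names NoteStore, NoteService, and render, and the use of SQLite for the stored state. A real project would have a graphical or web interface where this sketch merely formats text.

```python
import sqlite3


class NoteStore:
    """Data management: records state so a later interaction can use it."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; a file path would persist.
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS notes (user TEXT, body TEXT)")

    def add(self, user, body):
        self.db.execute("INSERT INTO notes VALUES (?, ?)", (user, body))
        self.db.commit()

    def for_user(self, user):
        rows = self.db.execute(
            "SELECT body FROM notes WHERE user = ? ORDER BY rowid", (user,)
        )
        return [row[0] for row in rows]


class NoteService:
    """Process: validates requests and glues the interface to the data."""

    def __init__(self, store):
        self.store = store

    def save(self, user, body):
        body = body.strip()
        if not body:
            raise ValueError("empty note")
        self.store.add(user, body)

    def search(self, user, word):
        word = word.lower()
        return [b for b in self.store.for_user(user) if word in b.lower()]


def render(user, matches):
    """User interface stand-in: turns results into text for display."""
    lines = [f"{len(matches)} note(s) for {user}:"]
    lines += [f"  - {m}" for m in matches]
    return "\n".join(lines)


if __name__ == "__main__":
    service = NoteService(NoteStore())
    service.save("alice", "Buy a compiler book")
    service.save("alice", "Meet TA on Friday")
    print(render("alice", service.search("alice", "book")))
```

The point of the separation is that render knows nothing about SQL and NoteStore knows nothing about display, so either end could be replaced without touching the other.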

Some Options:

Distributed or local: The most typical systems are distributed: the user interface runs on a client machine and the data is stored on some server. The processing might be at either end, or some of each.

Languages, tools and environment: You can use any combination that appeals for any aspect -- web-based or stand-alone; PC or Unix or Linux; Java or VB or GTK or JavaScript; CGI or JSP or roll your own. The only restriction here is that your system must be readily accessible to me and the TA's for grading, and accessible enough that you can demo it effectively in the CS building using departmental connections to the rest of the campus.

Make versus buy: Much modern software development is done by combining components that others have created. You can do as much of this as you like, as long as the finished product acknowledges the work of others, and has enough contribution of your own.

Things to Think About

I have attempted to make this as open-ended as possible. This is your chance to invent something new, or to do something better than others do. You might think of this as practice for a new e-business or e-service, the sort of e-thing that made some of your predecessors here (like Jeff Bezos and Phil Goldman) into e-zillionaires, at least before the recent dot-com collapse that has left them merely multi-millionaires. Of course if you have a really good idea and sell out to Microsoft or do an IPO by the end of the semester, it's an automatic A+.

But you have only about 10 weeks, so you can't get too carried away. Part of the assignment is to plan exactly what you are going to create, what each team member will be responsible for, and what interfaces you will require between components so independently-created pieces will fit together. What schedule will you follow? How can you work on different parts in parallel and keep them integrated? How will you ensure, if your time estimates are too optimistic (as they inevitably will be), that you have a working subset, rather than a non-working collection of partially completed pieces? How will you convince skeptical TA's and instructor that you are making progress, not just writing code?

Since the project will involve multiple people, a significant part of the task is to divide the work into reasonable pieces, with planned interfaces. Each of these components must be a separate entity, which can be implemented and tested separately. But you will have to think carefully about the interfaces between them. So that each person contributes equitably, it is also necessary to be explicit about the roles of each person on the team. Each person must write a reasonable fraction of the code for the system, no matter what other role they play.
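
One possible way to make a planned interface concrete is to write it down as code early, so that each person can build and test against it before the other side exists. The Python sketch below is only an illustration: the Catalog interface, the FakeCatalog stand-in, and the search function are hypothetical names invented here, not part of any required design.

```python
from typing import List, Protocol


class Catalog(Protocol):
    """Interface agreed on by the teammate writing storage and the one writing search."""

    def titles(self) -> List[str]: ...


class FakeCatalog:
    """In-memory stand-in used until the real storage component is ready."""

    def __init__(self, titles):
        self._titles = list(titles)

    def titles(self):
        return list(self._titles)


def search(catalog: Catalog, word: str) -> List[str]:
    """Search component: depends only on the Catalog interface."""
    word = word.lower()
    return sorted(t for t in catalog.titles() if word in t.lower())
```

With this arrangement the search code can be tested today with FakeCatalog, and the real storage component can be swapped in later as long as it provides the same titles method.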

The project will represent about 60 percent of the course grade. All members of a team will get the same grade (with the potential for a small correction factor assigned anonymously by other team members), so you must make sure that you all agree on who will do what, by when, and to what standard. This is an important point: at the end everyone must sign the project documentation attesting that they have done a reasonable share of the work.

Here are some things you should start thinking about now; this list will be augmented over the next few weeks, and we will do a fair amount of talking about it in class as well.

How big a task are you proposing to take on? It should be big enough to justify spending half a semester on it, but not so big that it's unrealistic.
What are the components or pieces going to be? How will you organize your project into stages so that you can stop at any point with something complete accomplished? You don't want to do something that requires that everything be finished before anything works -- "big bang" projects are a bad idea.
What will you learn from it? It's good to try a project that will force you to learn something new, like a language or a tool or a system, but you don't want to take on too many new things all at once.
What technical issues lie in the critical path? If you plan to use some particular tool or language or component or communication technique, what quick experiments can you perform now to be sure that it works and does what you need? Connecting components across a network in the face of security restrictions is sometimes harder than might appear; you want to know about potential roadblocks early. If you need someone else's data or other resources, make sure right up front that you can get access to them.
How will you divide the work among the members of the team? This is partly personalities and partly interests and aptitudes. Some people are better coders, others write English better; some plan ahead, others work well under last-minute pressure. Some take charge naturally; others are happier with a defined task set by someone else. Try to organize yourselves to match the work to the people, and try to have a balanced group -- a team of superprogrammers can have a harder time than a team of mere mortals who work together effectively. Groups of 3 or 4 are best; I will permit groups of 5 only very reluctantly.

Schedule

The following schedule is subject to change in detail but the spirit is right. Take note.

Since the project involves more than half a semester, it is possible to develop a significant piece of software. At the same time, serious planning and steady work will be required for your project to be completed on time. To encourage planning and organization, the project will have several deadlines that will be strictly enforced. [Some of the dates near the end may have to be shifted a bit; there will be adequate warning.] Carl and Nitin will each be responsible for primary supervision of half the teams; I will act as backup and second-level management. You will be required to meet with your TA manager approximately once a week after spring break; this is a graded component of the project.

We will also try to follow good software engineering practice as much as possible; in particular, this will mean using checklists and other planning forms that help organize and monitor activity. I will lean heavily on tools by Steve McConnell.

   February 2001
 S  M Tu  W Th  F  S
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24	preliminary announcement
25 26 27 28
   March 2001
             1  2  3
 4  5  6  7  8  9 10    meeting with bwk by 13th
11 12 13 14 15 16 17	initial proposal
18 19 20 21 22 23 24	spring break
25 26 27 28 29 30 31	draft plan, TA meetings
   April 2001
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14	prototype
15 16 17 18 19 20 21
22 23 24 25 26 27 28	alpha test
29 30
   May 2001
       1  2  3  4  5	beta test
 6  7  8  9 10 11 12	demo days
13 14 15 16 17 18 19	Dean's date

March 13: Preferably well before this date, I would like to meet with each team for at most half an hour to discuss your plans before they are set too firmly.

March 16: Your team must have been formed, and you must have sketched out what you expect to accomplish in your project, and how. Each team must submit a brief (about 2 pages) description of their project, along with a list of the people involved and their anticipated roles. If you need help finding partners, use the newsgroup. If you need help finding a topic, please talk to me and the TA's; we're happy to respond to ideas, make random suggestions, and help steer you. But it is your responsibility to come up with a project.

March 30: By this date, each group must meet with their assigned TA to go over their projects in more detail, to describe their schedule, the components and interfaces, and the allocation of people to tasks. It is highly desirable to plan a sequence of stages such that each represents a working system; if your schedule proves too optimistic, this gives you a fallback that you can still demonstrate. You must bring a draft of your plan to the meeting, and hand in a final plan by this date.

April 13: Prototype. By this date, you should have a bare-bones prototype that shows approximately what you are trying to do, and what your system will look like. It need not do much, but it must do some minimal part of the job. You should not be considering major or even very many minor feature changes after this date.

April 27: Alpha test. By this date, you must demonstrate an "almost working" version of the core functionality of your project. "Core" means the basic operations that form the essence of your project; if you were doing Amazon-2, for instance, this might mean searching for books and accepting orders. "Almost working" means that wizards (you) can help experienced programmers (us) to use the system. Ideally, since this is a user-interface project, the use should be obvious, but some hand-holding is permitted. Your code may crash and need restarting. But you must be able to convince us that your project can be completed by the deadline.

May 4: Beta test. Your code should largely work, all intended features should be installed and working, and no major component should be incomplete. A determined sadist might be able to break your system, but a casual experimenter should not. Drafts of written material should be done.

May 9-11: Demo days. Each group will give a brief (30 minutes max) quasi-public presentation of their work. You will also be required to attend at least a few other presentations, for instruction and moral support. A one-page marketing blurb and a web page must be available for interested users. I would also like to encourage real use by naive users: your classmates on other teams. You must attempt to use some of the systems written by others and see how well they work. Depending on how projects shape up, this may be done by area of interest.

May 15: Dean's date. Everything must be done and handed in by 5:00 PM on this date, without exception. Final submission has to include a man page, an internals document describing the implementation, and a report on how the project and the team worked out and what was learned.

Grading

Grading for the project will be based on a number of criteria, including

planning: how careful, realistic and thoughtful your plan is.
development process: how well you carried out stages of design and early implementation.
functionality: successful design and implementation of the basic task you set out to do.
implementation: clean, working code is important, and will be a significant factor in your grade.
engineering: how well your project adheres to the important ideas of the course, like testing, portability, comparative efficiency, etc.
documentation: marketing blurb, web page, business plan, lessons learned: a good working description of the project and its implementation.
presentation: organization, preparation, polish, interest.

There will be more information as we go along, to flesh out or clarify some of the sketchy parts here. There will also be class lectures on some of the GUI-building tools and on networking basics.

You are encouraged to ask questions that will help clarify things for everyone. Murphy's Law applies to projects and their administration, so there will undoubtedly be screwups. I apologize for those in advance, but of course they too will be a simulation of reality...

Appendix

This description of the 3-tier model was lifted from this article. Some of it is dated and acronym-filled, but it might give you a clearer idea of what the model is. Most projects will naturally have somewhat this form anyway, so it's not a big deal.

Emergence of the Three-Tier Client/Server Model

Drawing on the lessons learned from the first generation of client/server applications, a second generation has emerged that segments these applications into three logical tiers: Presentation, Business Objects, and Data Management. This development was based on the recognition that the development and maintenance of business applications is a complex task and that the two-tier model cannot meet requirements for application development, deployment, maintenance, and scalability. In addition, mission-critical applications require absolute data integrity and security.

Client/server appeared on the scene in the mid-1980s as a way to capitalize on the increasing amount of processing power that was being deployed on the desktop and to control the soaring cost of providing sufficient processing power for RDBMS applications. However, the first generation of client/server applications had two major shortcomings. First, the logical architecture of the application and the physical deployment of the software were not treated independently. As a result, business logic tended to be hard-coded into the client component of applications, so each time the application logic had to be changed in response to changes in business processes or the business environment, software had to be updated on hundreds of PCs. Second, applications were often designed with a specific RDBMS target. Client software was developed using RDBMS-specific features, including SQL extensions, stored procedures, APIs, and middleware, which locked the application into using one particular database product and made it difficult to integrate additional data sources. This was an especially troublesome limitation since much of the data that applications needed to access resided in a myriad of legacy data sources. Also, business rules and processes were programmed into the data management layer, resulting in a lack of flexibility in application development and deployment, making these applications expensive to extend and maintain.

Three Logical Tiers

As corporate developers gained more experience with client/server applications, they recognized that they were dealing with three logical tiers, not two, and that the two-tier model was an artifact of the first generation of client/server products. In fact, the first generation of client/server applications was defined more by the hardware used, PCs and database servers, than by the business requirements that were being addressed. Using three logical tiers, Presentation, Business Objects, and Data Management, allowed applications to be defined in a way that is both product- and implementation-independent. Once an application is modeled using a three-tier architecture, it can be implemented on one, two, or three physical tiers, or even in a peer-to-peer architecture, depending on what best suits the specific application.

PRESENTATION. The Presentation tier consists of the user environment. This includes the GUI and associated menus, display, and the flow of screens and dialog boxes with which the user interacts.

BUSINESS OBJECTS. The concept of a business object is different in some ways from an object in a strict object-oriented programming sense, and similar in other ways. Business objects are entities that are used to model business processes. Examples of business objects are products, purchase orders, invoices, inventory, and production lines. Represented in software, a business object has methods (business rules and logic) and data. A business object's data is not necessarily stored with the object because the data often reside in multiple databases dispersed throughout the organization. For example, Customer is a business object. Each customer may have data stored in several different databases depending on the customer's various business relationships. But from the perspective of the application developer, the physical storage of the information related to a customer should be irrelevant. The developer works only with a set of Customer object interfaces that can be used to shield the developer from the details of how the Customer data are being physically stored and managed by the Data Management tier.

DATA MANAGEMENT AND DISTRIBUTION. The third tier in this model is where the data are physically stored and managed. It provides the persistent storage of data for the business objects, usually in an RDBMS. Database queries and updates occur here. The function of the third tier is to act as the enterprise server. The systems and software that comprise this tier have the task of ensuring that data are available to all users and applications that require them and that performance is predictable, reliable, and acceptable. This tier is the focal point for the definition of requirements for scalability, high availability, data integrity and security, and disaster recovery.
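
The Customer example in the business objects paragraph can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not something taken from the article: the names CustomerSource, DictSource, and CustomerRepository, the dictionaries standing in for separate databases, and the credit-limit rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


class CustomerSource(Protocol):
    """What the business tier expects from any one data source."""

    def lookup(self, customer_id: str) -> dict: ...


class DictSource:
    """Stand-in for one database in the data management tier."""

    def __init__(self, rows):
        self.rows = rows

    def lookup(self, customer_id):
        return dict(self.rows.get(customer_id, {}))


@dataclass
class Customer:
    """Business object: data plus business rules, with no storage details."""

    customer_id: str
    name: str
    balance: float

    def can_order(self, amount, credit_limit=500.0):
        # Hypothetical business rule: stay within a credit limit.
        return self.balance + amount <= credit_limit


class CustomerRepository:
    """Assembles one Customer from data spread over several sources."""

    def __init__(self, sources):
        self.sources = list(sources)

    def get(self, customer_id):
        merged = {}
        for source in self.sources:
            merged.update(source.lookup(customer_id))
        if "name" not in merged:
            raise KeyError(customer_id)
        return Customer(customer_id, merged["name"], merged.get("balance", 0.0))
```

Code in the presentation tier would call CustomerRepository.get and Customer.can_order without knowing how many databases were consulted, which is the shielding the article describes.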

Three Logical Tiers Doesn't Mean Three Physical Tiers

A common mistake made in planning for three-tier architectures is assuming that, because a client/server application has three logical tiers, it must be implemented in three physical tiers. This misconception is unfortunate because it limits opportunities to capture application-specific requirements in the areas of user platform, network support, platform price/performance, development tools, and management capabilities. The three logical components can be distributed in many different ways to provide optimum configurations for application maintenance and support. It may be best for some business objects to be distributed to each user's PC, while in other cases they should be on the enterprise server. Some databases might be best replicated to each site along with associated business objects, while in other cases, centralization of enterprise services along with all databases might be best.

The key is that once the three-tier architecture has been embraced in the design of an application, the partitioning of that application into its physical instantiation should be done in a way that optimizes performance, security, integrity, maintainability, and management.

Three-Tier Platform Requirements

The three logical elements of a three-tier client/server application architecture place different requirements on the platform to be used in implementation. The decision about the physical deployment of the application has to consider the requirements imposed by each tier of the application.

Presentation Requirements

Virtually all client/server applications have a GUI. Most are PC based, but this is not always the case. When a PC is in use, then the style of presentation is determined mostly by the native GUI of the PC operating system. However, the presentation component may be at a kiosk or some other type of device, in which case the interface is in the hands of the developer. In any case, platforms used for Presentation in a three-tier client/server environment must be able to support the application's interface with very responsive performance. The Presentation platform must also be able to support a standard set of protocols necessary to connect the user with the other tiers of the application.

The application interface is determined both by the desktop operating system and by the tools used to develop the application. When the application is being used along with other applications, the platform should provide methods for easily integrating multiple applications so that corporate developers do not have to invent the integrating technologies themselves.

Protocol support on the desktop has been a consistent barrier in client/server applications. Since the majority of corporate desktops are still running 16-bit Windows, the ability to provide robust support for concurrently running many different communication protocols and different types of middleware is limited. These desktops lack the preemptive multitasking and protected memory necessary to ensure that the user can keep working if another tier of the application is temporarily unavailable. This places an additional burden on the other tiers of the application to provide a highly available environment that the desktop components can rely upon. Three-tier applications will benefit greatly from the use of 32-bit platforms, such as Windows 95 and Windows NT, as the Presentation platform.

Business Objects Requirements

The middle tier in the three-tier architecture is represented by an organization's unique business objects. This tier has to manage and support three components.

APPLICATION INTERFACES. The middle tier has a set of interfaces that are used by application developers in developing the desktop application component. Whether these interfaces are APIs or object interfaces defined in Interface Definition Language (IDL), they are unique to a specific enterprise and embody company-specific business rules and business logic.

MIDDLEWARE. The business objects tier plays a critical role in three-tier applications. Users are interacting with these objects, and the objects are sending and receiving data from enterprise servers.

DATA ACCESS. This tier also incorporates the data access routines necessary to populate the business objects with data which are stored persistently on the enterprise server.