Princeton Data Sources for COS 333 Projects


Last updated: Wed Jul 31 22:59:40 EDT 2024

This document, except for the new Department Requirements Data section at the end, was composed by Princeton alumnus and former COS 333 student Vinay Ramesh (2020) as part of an independent study project.

For his project Vinay performed an extensive search of University agencies to find Princeton-specific data sources that might be useful to COS 333 project teams. He also communicated with former COS 333 project teams to learn what Princeton-specific data sources they used. Vinay then composed instructions, with example code, describing how to access those data sources.

This document might help you to access data sources that you need for your project. It might also help you to choose a project topic in the first place!

This is a living document. It was accurate at the time of writing, but it will need to be updated as the data sources change. Please report any inaccuracies in the document to the course's lead instructor.


Working with OIT

This section covers working with OIT as a member of a COS 333 project team. There are a couple of reasons why a team might need to communicate with OIT for its project.

Requesting That Your App be CAS-Whitelisted

It's common for COS 333 project applications to use CAS authentication. If you want your application to use CAS, then here is what you need to know and how to work with OIT.

Whenever you visit a Princeton-CAS-protected application by entering some URL of the form http(s)://somehost:someport... in a browser, the browser sends an HTTP request to the Princeton CAS server at fed.princeton.edu. The request specifies somehost. If somehost is on the fed.princeton.edu whitelist, then fed.princeton.edu proceeds with CAS authentication. If somehost is not on the fed.princeton.edu whitelist, then fed.princeton.edu rejects the attempt to CAS authenticate.

Before March 2021, localhost was on the fed.princeton.edu whitelist. Also, all hosts of the form something.herokuapp.com automatically were on the fed.princeton.edu whitelist. That configuration was appropriate for COS 333.

Since March 2021, localhost continues to be on the fed.princeton.edu whitelist. However, hosts of the form something.herokuapp.com are not automatically on the fed.princeton.edu whitelist. So application developers must apply to OIT to have something.herokuapp.com applications whitelisted. Generalizing, application developers must apply to OIT to have any non-localhost applications whitelisted. So if your COS 333 application will use Princeton CAS, and you intend (as you should) to deploy your application to any host other than localhost, then you must apply to have your application added to the fed.princeton.edu whitelist.
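
The whitelist decision described above can be pictured with a small sketch. This is only an illustration of the idea, not OIT's implementation: the ALLOWED_HOSTS set, the host names in it, and the is_whitelisted function are hypothetical, and the real check happens on fed.princeton.edu.

```python
from urllib.parse import urlparse

# Hypothetical whitelist, for illustration only. The real list lives on
# fed.princeton.edu and is maintained by OIT.
ALLOWED_HOSTS = {"localhost", "approvedapp.herokuapp.com"}


def is_whitelisted(service_url, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the host part of service_url is in allowed_hosts."""
    # urlparse(...).hostname drops the scheme, port, and path.
    host = urlparse(service_url).hostname
    return host in allowed_hosts


if __name__ == "__main__":
    for url in ("http://localhost:5000/", "https://myapp.herokuapp.com/"):
        print(url, is_whitelisted(url))
```

Note that the port does not matter in this sketch: any application running on localhost passes, while a host that nobody has registered does not.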

To apply to have your application added to the fed.princeton.edu whitelist, browse to this website:

https://princeton.service-now.com/service?id=sc_cat_item&table=sc_cat_item&sys_id=edd831664f2c3340f56c0ad14210c7df&recordUrl=com.glideapp.servicecatalog_cat_item_view.do%3Fv%3D1&sysparm_id=edd831664f2c3340f56c0ad14210c7df

Then complete and submit the form. As an example, I (Dondero) entered these data to request that one of my applications (https://pennyall.herokuapp.com) be whitelisted:

Requested by:  Robert Dondero
Service Name:  https://pennyall.herokuapp.com
Technical Contact for Vendor:  unknown
Technical Contact Phone Number:  unknown
Technical Contact Email:  unknown
Service Provider Metadata URL:  unknown
Is this request for a Slack Workspace? No
Is the Service Provider a member of InCommon:  unknown
Does the service provider support SAML2?  unknown
Additional information:  For the COS 333 course.  The name of the faculty sponsor is the COS 333 lead instructor.

In the "Additional information" field it's important to note that your application is for the COS 333 course, and that your faculty sponsor is the current COS 333 lead instructor (for example, Robert Dondero). There may be some delay, so you should apply as soon as you can.

Requesting a Service Account

You also may wish to obtain a service account for your project. A service account is a separate Princeton netid that does not correspond to an actual student or faculty member, but rather is created to be linked with a particular application. It is much better for a team to share the login information of a service account than to share the login information of one of its members. In fact, doing the latter would be a violation of Princeton policies. Additionally, a service account can be made permanent, while student accounts expire after the students graduate. Many Princeton-related APIs, including those in the OIT API Store, require users to authenticate with a netid as members of the Princeton community.

To obtain a service account, contact the COS 333 lead instructor. The lead instructor then will submit the appropriate form to OIT.

Just for your information, the lead instructor will browse to this web page:

https://princeton.service-now.com/service?sys_id=f44539ab4ff81640f56c0ad14210c77c&id=sc_cat_item&table=sc_cat_item

and fill out the form. On it, the lead instructor will do the following:

Now that a service account has been created, you can move on to consuming APIs in the OIT API Store. The API Store is hosted on this website:

https://api-store.princeton.edu/store/

In order to access this website, you must be either on the Princeton VPN or on the Princeton eduroam WiFi network. Log in to the website through Princeton CAS authentication using the service account you just created. Now click the Applications tab on the left side of the screen (it should be a green button), and click the Edit icon to change the name of the default application to a name suitable for your COS 333 project. Click on the Update button. If a guided tutorial appears, then escape out of it. (Refresh the page if necessary. Exit your browser and revisit the page if necessary.)

Then, click on the APIs tab on the left side of the screen (it should be a purple button), and you should see two APIs listed: ActiveDirectory and PrincetonInfo (more on what exactly is in each API in the following section APIs in the OIT API Store).

Now you must subscribe to one of those APIs. Click on either one of them, then click the dropdown menu on the right side of the screen (labeled Select Application...) and choose the application name you just created. Then click the Subscribe button to subscribe to the API.

Apart from the ActiveDirectory and PrincetonInfo APIs, there is one more API called MobileApp. This API gives information on courses, dining hall menus, events on campus, and places on campus that are currently open/closed. This API will not be seen at first, and you must ask OIT for explicit access to this API. In order to do this, send an email to George R. Kopf (or whoever the current Director for Software Infrastructure Services is) and ask him to add your service account netid to the approved accounts for the MobileApp API.

Now, let's say that after looking at the available endpoints in each of these APIs, your team decides that what you are looking for is not available in the OIT API Store. In this case, OIT will be willing to work with you to see if they can add a new API for some other Princeton dataset, but start early. In conversations with OIT, it was apparent that the administration wants to help students in this way. Whether this is because OIT eventually desires control over all Princeton-related data is currently unclear, but what is known is that OIT is currently training many of its employees to develop APIs on the Store. To make a request, first send an email to George R. Kopf indicating that you are a COS 333 project team looking for a new API. George will need to know a few things about your request:

It is possible that OIT does not oversee the dataset that you want, in which case permission from the data owner would be required. If necessary, after you email George R. Kopf, he can direct you to the appropriate administrator from whom this permission must be obtained.

UPDATE AFTER THE FALL 2022 SEMESTER: Jonathan Wilding (jonathan.wilding@princeton.edu) from OIT tells us that OIT's goal is to expand the APIs over time to fulfill as many data needs as possible. He also stated that COS 333 students should contact him if they have new data requests.

There are three scenarios that could come up when talking to this administrator:


APIs in the OIT API Store

In the OIT API Store, assuming your service account has already gained access to the StudentApp and WinterSession APIs, there should be 5 APIs listed. To see sample code showing how to consume these APIs, check out this GitHub repository:

https://github.com/vr2amesh/COS333-API-Code-Examples

Before delving into the details of each of the endpoints of these 5 APIs, it is important to cover the security protocol used by the OIT API Store. The API Store uses the OAuth2 security protocol to protect its endpoints. This protocol involves an access token, which must be passed in the header of each request to the API. Below is a small code snippet showing how to use the access token in the header of a request in Python.

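import requests

# This fragment runs inside a method of a helper class (hence self).
# self.configs holds the base URL and the current access token.
req = requests.get(
    self.configs.BASE_URL + endpoint,
    # Query parameters for the request, for example {"fmt": "json"}.
    params=kwargs if "kwargs" not in kwargs else kwargs["kwargs"],
    headers={
        # OAuth2 bearer token, sent with every request.
        "Authorization": "Bearer " + self.configs.ACCESS_TOKEN
    },
)
# The body of the response, as a string.
text = req.text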

The final value text holds the return value from the endpoint in string form. The variable kwargs is a dictionary of keyword arguments that represent the parameters in the request. For example, if a request is made to BASE_URL + endpoint with the parameter fmt=json (perhaps in order to have the return value in JSON format instead of XML), then kwargs would be the dictionary {"fmt": "json"}. The access token lasts only one hour, so it's important to make sure it's up-to-date. To retrieve an up-to-date access token for your application, make a request to the following endpoint:

https://api.princeton.edu:443/token

Below is a code snippet in Python to retrieve an access token for your application.

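import base64
import json
import requests

# This fragment runs inside a method of a helper class (hence self).
# Encode "CONSUMER_KEY:CONSUMER_SECRET" in base64 for the Basic header.
credentials = self.CONSUMER_KEY + ":" + self.CONSUMER_SECRET
encoded = base64.b64encode(credentials.encode("utf-8")).decode("utf-8")
req = requests.post(
    self.REFRESH_TOKEN_URL,
    data=kwargs,
    headers={"Authorization": "Basic " + encoded},
)
text = req.text
# The response is JSON; keep the new access token for later requests.
response = json.loads(text)
self.ACCESS_TOKEN = response["access_token"]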

In this case, kwargs should be the dictionary {"grant_type": "client_credentials"} and the header includes the following base64-encoded string: CONSUMER_KEY + ":" + CONSUMER_SECRET. The sample code in the GitHub repository illustrates further how to use an up-to-date access token for each request made.
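
As a self-contained illustration of the token request described above, the following sketch builds the Basic authorization header and the POST request using only the Python standard library, without sending anything. The function names and the placeholder key and secret are assumptions made for this example; only the token URL, the grant_type parameter, and the access_token field come from the text above.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.princeton.edu:443/token"


def basic_auth_header(consumer_key, consumer_secret):
    """Return the value of the Authorization header for the token request."""
    raw = (consumer_key + ":" + consumer_secret).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("utf-8")


def build_token_request(consumer_key, consumer_secret):
    """Build (but do not send) the POST request that asks for a new token."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials"})
    return urllib.request.Request(
        TOKEN_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(consumer_key, consumer_secret)
        },
        method="POST",
    )


def parse_access_token(response_text):
    """Extract the access token from the JSON text returned by the server."""
    return json.loads(response_text)["access_token"]


if __name__ == "__main__":
    # "key" and "secret" are placeholders, not real credentials.
    request = build_token_request("key", "secret")
    print(request.get_method(), request.full_url)
    print(request.get_header("Authorization"))
```

To actually send the request, you could pass it to urllib.request.urlopen and hand the decoded response body to parse_access_token. That will succeed only with a valid Consumer Key and Consumer Secret.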

Coupled with the access token are the Consumer Key and the Consumer Secret. To get these values, browse to the OIT API Store and click the Applications tab. Then click on the application name that you renamed earlier (from the default name). At this point, you should see a series of tabs: Details, Production Keys, Sandbox Keys, and Subscriptions. The Production Keys are meant to be used in a deployed application context, and the Sandbox Keys are meant to be used in a local development context. To start, use the production keys. Click the Production Keys tab and click the Generate Keys button to generate your Consumer Key, Consumer Secret, and Access Token. The Consumer Key and Consumer Secret values do not change for the lifetime of the application, but as stated earlier, the Access Token changes every hour. Therefore, the Consumer Key and Consumer Secret can be hard coded into your application code, but the Access Token cannot. Refer to the GitHub sample code for examples of how to use these three values to consume the APIs on the Store. In particular, refer to the ReqLib.java/req_lib.py and Configs.java/configs.py files in the ActiveDirectory, MobileApp, and PrincetonInfo folders.
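
Because the Access Token expires after an hour while the Consumer Key and Consumer Secret stay fixed, one common approach is to cache the token and fetch a new one shortly before it expires. The sketch below illustrates that idea; it is not the repository's implementation. The TokenCache class, the fetch_token callable, and the 60-second safety margin are assumptions made for this example.

```python
import time


class TokenCache:
    """Cache an access token and refresh it when it is about to expire."""

    def __init__(self, fetch_token, lifetime_seconds=3600, margin_seconds=60,
                 clock=time.monotonic):
        # fetch_token is a hypothetical callable that contacts the token
        # endpoint and returns a fresh access token string.
        self._fetch_token = fetch_token
        self._lifetime = lifetime_seconds
        self._margin = margin_seconds
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        """Return a token that should still be valid, refreshing if needed."""
        now = self._clock()
        expired = (
            self._token is None
            or now - self._fetched_at >= self._lifetime - self._margin
        )
        if expired:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token

    def auth_header(self):
        """Return the header dictionary to attach to each API request."""
        return {"Authorization": "Bearer " + self.get()}
```

Passing the clock as a parameter is a design choice that makes the expiry logic easy to test without waiting an hour. In an application you would create one TokenCache and call auth_header() before every request to the API Store.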

Below is a list of the APIs available on the OIT API Store.

ActiveDirectory

Documentation: https://api-store.princeton.edu/store/apis/info?name=ActiveDirectory&version=1.0.5&provider=gkopf

Base URL: https://api.princeton.edu:443/active-directory/1.0.5

Endpoints:

PrincetonInfo

Documentation: https://api-store.princeton.edu/store/apis/info?name=PrincetonInfo&version=1.0.3&provider=wso2adm

Base URL: https://api.princeton.edu:443/princeton-info/1.0.3

Endpoints:

StudentApp

Documentation: https://api-store.princeton.edu/store/apis/info?name=StudentApp&version=1.0.3&provider=wso2a

Base URL: https://api.princeton.edu:443/student-app/1.0.3

Endpoints:

WinterEvents

Documentation: https://api-store.princeton.edu/store/apis/info?name=WinterEvents&version=1.0.0&provider=wso2adm

Base URL: https://api.princeton.edu:443/winter-events/1.0.0

Endpoints:


Other APIs

Other APIs are also available to you. This section covers the Princeton Art Museum's API and a dataset, available through Princeton Facilities, of all plants, trees, and bushes on campus. Each of those Princeton data sources has its own security protocol and its own method of access. Below is some documentation on these data sources. For further code samples, please refer to the GitHub repository:

https://github.com/vr2amesh/COS333-API-Code-Examples

Princeton Art Museum API

This is a public API, so there is no security protocol to be aware of in order to consume it. This API is well documented on this GitHub repository page:

https://github.com/Princeton-University-Art-Museum/puam-api-docs

Therefore, only brief explanations of the API's endpoints are given below. To learn which object ids, maker ids, and package ids refer to which items, refer to the files objects.json, makers.json, and packages.json in the ArtMuseum folder of the following GitHub repository:

https://github.com/vr2amesh/COS333-API-Code-Examples

You may need to issue the command export PYTHONIOENCODING=utf-8 at your terminal prompt before executing the example programs in that repository.

BASE URL: https://data.artmuseum.princeton.edu

/objects/{ObjectID}

Returns information related to objects in the Princeton Art Museum's collection. An object is any art piece that is within the Art Museum itself, or any art piece that is around the Princeton campus.

/makers/{ConstituentID}

Returns information related to makers in the Princeton Art Museum's collection. A maker is any painter, sculptor, or architect who has artwork on the Princeton University campus.

/packages/{PackageID}

Returns information related to packages in the Princeton Art Museum's collection. A package is any collection of objects in the Art Museum, categorized by some common property. For example, all East Asian Ming Dynasty art could be one package, and all Spanish Renaissance Art could be another.

/search

Use this endpoint to search the collection. The parameters are q (the query string) and type (the type of item to search for, which can be art objects, makers, packages, or all).
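
The endpoints above can be turned into request URLs with a few helper functions. This is a minimal sketch using only the Python standard library. The function names are invented for this example, and the assumption that responses are JSON should be checked against the Art Museum's documentation, as should the exact strings accepted by the type parameter.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://data.artmuseum.princeton.edu"


def object_url(object_id):
    """URL for one object, given its ObjectID."""
    return "%s/objects/%s" % (BASE_URL, object_id)


def maker_url(constituent_id):
    """URL for one maker, given its ConstituentID."""
    return "%s/makers/%s" % (BASE_URL, constituent_id)


def package_url(package_id):
    """URL for one package, given its PackageID."""
    return "%s/packages/%s" % (BASE_URL, package_id)


def search_url(query, result_type="all"):
    """URL for a search; urlencode escapes spaces and special characters."""
    params = urllib.parse.urlencode({"q": query, "type": result_type})
    return BASE_URL + "/search?" + params


def fetch_json(url):
    """Fetch url and decode the body, assuming the response is JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints URLs; call fetch_json(...) to contact the server.
    print(object_url(1234))
    print(search_url("van gogh", "makers"))
```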

Plants, Trees, and Bushes

This is not an API, but rather a place from which to obtain the internal database of Princeton groundskeeping. The database can be exported from the website of the third-party vendor TreePlotter as a CSV file and saved to your local computer, to be used in whichever way your team sees fit. The CSV file gives the following information about the plants, trees, and bushes: address, common name, date planted, genus name, geometry, Latin name, coordinates, species, and current status (alive or dead). If your COS 333 team wishes to use this information, you must follow the steps below:

You could work around this limitation by programmatically following the steps outlined above: logging in, clicking the export button, and handling the CSV file. You could accomplish this by inspecting the elements of the HTML page and determining which buttons on the page need to be clicked in order to export the CSV file. These buttons could be clicked programmatically using JavaScript and tools such as jQuery. Figuring out how to do this is a programming challenge in its own right, so it is left to the COS 333 team to decide whether to implement this feature.

If you decide not to implement this programmatically, then it is possible to just download the CSV file once, and use this "snapshot" of the database for the entirety of your project. However, keep in mind that it would mean that the database is not up-to-date. An application that periodically retrieves this CSV file would indeed have an up-to-date database, and would result in a more robust application.
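
If you take the snapshot approach, the exported CSV file can be read with Python's csv module. The sketch below is only an illustration: the column headers ("Common Name" and "Status"), the status value "Alive", and the sample rows are hypothetical, so check the header row of the actual export and adjust the names.

```python
import csv
import io
from collections import Counter


def load_rows(csv_file):
    """Read every row of an open CSV file into a list of dictionaries."""
    return list(csv.DictReader(csv_file))


def count_by_common_name(rows, status="Alive"):
    """Count rows per common name, keeping only rows with the given status."""
    # "Common Name" and "Status" are assumed column headers.
    return Counter(
        row["Common Name"] for row in rows if row["Status"] == status
    )


# Hypothetical sample data standing in for the exported file.
SAMPLE = """Common Name,Genus,Status
Red Oak,Quercus,Alive
Red Oak,Quercus,Dead
American Elm,Ulmus,Alive
Red Oak,Quercus,Alive
"""

if __name__ == "__main__":
    rows = load_rows(io.StringIO(SAMPLE))
    print(count_by_common_name(rows))
```

With a real export you would open the downloaded file instead of using SAMPLE, for example with open(path, newline="", encoding="utf-8"), where path is wherever you saved the CSV file.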


Department Requirements Data

The TigerApps Team and Princeton Ph.D. candidate Barak Nehoran created a repository of machine-readable departmental requirements data. It's at this address:

https://github.com/PrincetonUSG/Princeton-Departmental-Data

Dondero note 12/22/23: Students from the Fall 2023 semester told me that the GitHub repository nicely illustrates the structure of the departmental requirements data, but that those data are stale.

Quoting Barak's announcement email:

This is the departmental requirements data that has been used as part of TigerPath, the web app that thousands of Princeton students use each semester for 4-year course planning (and which started out as my COS333 project years ago).

We've now factored the departmental requirements data out into its own repository because we believe that this would make it easier for others to make use of it to build new apps.

Building up this repository of departmental requirements has taken a tremendous amount of time and effort. We've built it up and fine tuned it over several years, with the input of over a dozen student contributors.

Since the university's registrar doesn't (and refuses to) provide this kind of data, we have had to build it up by hand from the department websites and resources and conversations with department representatives.

In the past, any COS 333 group that wanted to build an app that made use of or incorporated departmental requirements would have come up against the impossibly high start-up cost of putting this data together. Having this repository available means that COS 333 projects can be more easily built without that start-up cost.

The departmental requirements are encoded in YAML format (successor to JSON), which encodes every aspect of the logic of which courses must be taken, and in which combination, to satisfy different departmental requirements.

The repository includes sample code for reading and processing the data, including code for checking which requirements have/haven't been satisfied by a student's list of courses.

All of the university's departments and majors are covered, and the only caveat is that we are still missing information about some certificates, but we are adding more certificates over time and will eventually have all of them.

Some examples of apps that students could potentially build from this data:

I imagine that your students will be creative in finding even more uses and applications that I couldn't have thought of, and so I'm looking forward to seeing what they come up with!