Tue Feb 19 15:32:43 EST 2019
Newer items are at the front.
Meredith Martin, Department of English
There are many digital humanities problems, large and small, that could profit from some CS attention. Here are some ideas.
Large-scale "unsolved problems":
1. Full-text search across multiple languages.
2. Named-entity recognition within texts that somehow knows not to look at titles.
3. Teaching the computer to detect when the OCR is returning something that is not text, so I can easily detect typographically unique pages rather than doing so by hand.
Small-scale problems:
1. Isolating excerpted texts (a single article) in fully-indexed digitized bound-periodicals (which is how HathiTrust gives me data).
2. Collapsing multiple reprints so that only the first displays and the user has to "see more editions" for the rest to display.
3. Building on the (now non-existent) named-entity recognition I dream of above, so we can visualize a cultural field of reference or discourse across texts (as in: this text is mentioned in / referenced by, like Google Scholar except that these scholars don't use citations in the same way).
4. Adding other kinds of visualizations to display the collections in more evocative ways.
5. Adding OCR quality to the search result filters.
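One possible starting point for large-scale problem 3 and small-scale problem 5 is a crude "text-likeness" score per page. The sketch below is only a heuristic, not an established OCR-quality metric: the function names `text_likeness` and `needs_review` and the 0.5 threshold are assumptions made up for illustration. It counts the share of whitespace-separated tokens that look like words of two or more letters, in any script.

```python
import re

# A "word-like" token: two or more letters (any script), optionally followed
# by apostrophe- or hyphen-joined letter groups, e.g. "don't" or "well-known".
WORD_RE = re.compile(r"[^\W\d_]{2,}(?:['-][^\W\d_]+)*")

# Punctuation stripped from both ends of a token before matching.
EDGE_PUNCTUATION = ".,;:!?\"'()[]"


def text_likeness(page_text):
    """Return the fraction of tokens on a page that look like words (0.0 to 1.0)."""
    tokens = page_text.split()
    if not tokens:
        return 0.0
    wordlike = sum(
        1 for token in tokens if WORD_RE.fullmatch(token.strip(EDGE_PUNCTUATION))
    )
    return wordlike / len(tokens)


def needs_review(page_text, threshold=0.5):
    """Flag a page whose OCR output is mostly not text (threshold is a guess)."""
    return text_likeness(page_text) < threshold
```

A page of ordinary prose should score near 1.0, while a page of ornaments, tables, or music notation that the OCR engine renders as symbol noise should score near 0.0. The same score could be stored with each page and exposed as a search filter.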
Patrick Caddeau, Dean of Forbes College
Two projects that would help undergrads with tutoring and advising:
Stephen Kim, Associate Director for Information and Technology
The Princeton University Art Museum holds a world-class collection of more than 100,000 works spanning the world of art from antiquity to the present. While more than 200,000 people visit our galleries each year, we are always eager to develop new ways to engage audiences, especially you, our students. Recently, we've built out new data and image services to power potential innovations like:
Sam Wang, Neuroscience
Every 10 years, legislative districts across America must be redrawn after the Census. Redistricters have the task of making sure that diverse communities within a state are fairly represented. But they do not always know where those communities are.
Citizens have opportunities to testify about their communities in public hearings. But that testimony is qualitative, and there is no way to integrate the comments in a unified way. It would be useful to have a graphical application for individuals to (a) draw their communities of interest (COIs) on a state map, (b) store the shapes in a standard GIS format, and (c) annotate the shapes with comments. Then, after citizens have participated, it would be useful to display all of the communities of interest in a single map for inspection.
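Steps (b) and (c) could be sketched as follows. The text does not name a storage format, so GeoJSON is an assumption here; the helper names `make_coi_feature` and `make_collection`, the property keys, and the sample coordinates are all hypothetical.

```python
import json


def make_coi_feature(name, ring, comment):
    """Build a GeoJSON Feature for one community of interest.

    ring is a list of (longitude, latitude) pairs traced by the user.
    """
    coords = [list(point) for point in ring]
    # GeoJSON polygons need a closed ring: the last point repeats the first.
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    if len(coords) < 4:
        raise ValueError("a polygon needs at least three distinct points")
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [coords]},
        # The annotation travels with the shape as ordinary properties.
        "properties": {"name": name, "comment": comment},
    }


def make_collection(features):
    """Wrap many submissions so they can be displayed on a single map."""
    return {"type": "FeatureCollection", "features": list(features)}


# Made-up example submission (counterclockwise exterior ring).
feature = make_coi_feature(
    "Example neighborhood",
    [(-74.67, 40.34), (-74.64, 40.34), (-74.64, 40.36), (-74.67, 40.36)],
    "We share a school district and a main street.",
)
geojson_text = json.dumps(make_collection([feature]), indent=2)
```

Because a FeatureCollection is a plain list of features, the single combined map is just the concatenation of every citizen's submissions.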
An additional feature might reduce redundancy by combining highly overlapping communities into a single consensus shape for display.
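One way to approach the overlap idea, sketched under strong assumptions: each drawn shape has already been converted to a set of census-block IDs (that conversion is not shown), "highly overlapping" means a Jaccard overlap of at least 0.8, and the consensus keeps blocks included by more than half of a group. The names `jaccard` and `merge_overlapping` and the threshold are illustrative choices, not part of the proposal.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two block-ID sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_overlapping(communities, threshold=0.8):
    """Group submissions whose pairwise overlap meets the threshold.

    communities maps a submission ID to a set of block IDs.
    Returns a list of (member_ids, consensus_blocks) pairs.
    """
    parent = {cid: cid for cid in communities}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(communities, 2):
        if jaccard(communities[a], communities[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for cid in communities:
        groups.setdefault(find(cid), []).append(cid)

    merged = []
    for members in groups.values():
        counts = {}
        for cid in members:
            for block in communities[cid]:
                counts[block] = counts.get(block, 0) + 1
        # Keep a block only if more than half of the group drew it.
        consensus = {b for b, n in counts.items() if n * 2 > len(members)}
        merged.append((sorted(members), consensus))
    return merged
```

Note that grouping is transitive: if A overlaps B and B overlaps C, all three land in one group even when A and C overlap less. Whether that is desirable would need to be decided with the people who read the maps.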
Abby Klionsky '14, Office of the Executive Vice President
The decor in Frist -- all the quotes painted on the wall, etc. -- is meant to represent a diversity of ideas, and is one of the places on campus that, theoretically, does this quite well. It's theoretical because we don't know how much attention people actually pay to the quotes, nor whether they know anything about the person being quoted.
There is actually documentation of all of this, in a very old-school, circa-2000 website that pairs photos of the quotes with photos, bios, and explanations of the people being quoted: http://princeton.edu/frist/iconography.
This also covers the images in Cafe Viv and some of the Princeton-y flotsam that adorns the halls and walls. It would be GREAT if this could actually be a site that made people interested in looking at it!
Could we build a system that showed these images much more dynamically, perhaps with a rotating sequence of pictures that always showed something interesting? For each one, perhaps there could be a QR code that pointed to more details. Or maybe a touch screen would make it easy to get more details. Would it be possible to add new images and new text very easily without having to be an expert? Are there other things that would make the displays more appealing and encourage people to look at them more carefully?
Claire Pinciaro '13, ODUS
Do you ever find yourself overwhelmed by the number of co-curricular opportunities available at Princeton? Do you find yourself wishing that there was an efficient way to find out which groups, teams, and organizations your peers belong to?
Imagine a centralized digital platform in which you and other students can keep track of your co-curricular involvements, search the profiles of other students, and see the membership of student organizations in real time. Think Tigerbook but with a co-curricular section.
We've done a lot of research in this sphere and know that there's real potential for this to be a hit not only at Princeton, but at other schools as well.
Jill Stockwell, McGraw Center
Here are two ideas that would greatly improve our organization's efficiency and communication. One is a volunteer application management system for our 150+ applicants each semester; the other is a carpooling application for each of the seven facilities where we teach.
Wangyal Shawa, Map and Geospatial Information Center
We are planning two projects to create and manage our scanned maps and create geospatial data. One project is to create a batch georeferencing tool that will georeference scanned topographic maps that are the same size and the same scale. One open-source system, QUAD-G, processes United States Geological Survey 1:24,000-scale maps, but it does not work well with smaller-scale map series. We need to customize the QUAD-G software to work with smaller-scale maps using the same programming language, or redesign it in a different programming language using similar workflows.
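To illustrate why sheets of the same size and scale lend themselves to batch processing, here is a deliberately simplified sketch. It is not how QUAD-G works. It assumes every scan is already cropped exactly to the map's neatline, is not rotated, and has a known bounding extent in map units; real scans would first need their corners detected. It computes the six values of an ESRI-style world file for each sheet. The function names and sample extents are hypothetical.

```python
def world_file_params(width_px, height_px, west, south, east, north):
    """Return the six world-file values in file order (A, D, B, E, C, F)."""
    x_size = (east - west) / width_px
    y_size = (north - south) / height_px
    return (
        x_size,              # A: pixel width in map units
        0.0,                 # D: rotation term (no rotation assumed)
        0.0,                 # B: rotation term (no rotation assumed)
        -y_size,             # E: pixel height, negative because rows run southward
        west + x_size / 2,   # C: x of the center of the upper-left pixel
        north - y_size / 2,  # F: y of the center of the upper-left pixel
    )


def batch_world_files(sheets, width_px, height_px):
    """Map each scan name to the text of its world file.

    sheets maps a scan name to its (west, south, east, north) extent.
    All scans are assumed to share the same pixel dimensions.
    """
    return {
        name: "\n".join(
            f"{value:.10f}"
            for value in world_file_params(width_px, height_px, *extent)
        )
        for name, extent in sheets.items()
    }


# Made-up extents for two adjacent sheets of one series.
sheets = {
    "sheet_a.tif": (0.0, 0.0, 10.0, 10.0),
    "sheet_b.tif": (10.0, 0.0, 20.0, 10.0),
}
world_files = batch_world_files(sheets, width_px=1000, height_px=2000)
```

The only per-sheet input is the extent, which for a regular map series can usually be derived from the sheet's position in the series index.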
Another project is to design an open-source software system that will extract vector geospatial data from georeferenced scanned maps.
These projects will benefit many researchers and libraries.
Caroline Savage, Office of Sustainability
This project would try to improve Tiger Energy
Nik Voge, McGraw Center
Time is in short supply for Princeton students. This makes scheduling and planning of academic tasks and activities such as completing p-sets, assigned reading, papers, and projects difficult. Because assignments can be quite challenging and time consuming and because they can vary considerably not only from course to course, but also from week to week, it is often difficult for students to accurately predict how much time tasks will require. At the same time, most students, with the encouragement of the university, are involved in extra-curricular, career preparation, and social activities, which results in a relatively small margin for error in planning and scheduling.
In many cases students do not budget adequate time to complete their academic work, leading to unmet grade (and learning) goals and feelings of dissatisfaction. Students often lack sufficient information to effectively plan and schedule their academic work and other aspects of their lives.
One recent innovation is Rice University's Course Workload Estimator. While the Course Workload Estimator has been a useful tool for instructors, it can be improved upon. It can be adapted to Princeton's distinctive academic environment, including its instructional materials and evaluation standards. Another improvement is continuously refining the algorithm by which the estimates are made by collecting input from students in specific courses on the amount of time various tasks demand. Additionally, the corpus of data collected can be analyzed to better understand the academic time demands across campus, an endeavor which has never been undertaken in any systematic manner to my knowledge.
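The refinement loop described above might look something like the following sketch. It does not reproduce the algorithm of Rice's tool; the class name `TaskRate`, the starting rate, and the choice to treat the starting rate as one pseudo-observation in a running mean are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskRate:
    """Running estimate of hours per unit of work (per page, per problem, ...)."""

    hours_per_unit: float  # starting guess, treated as one pseudo-observation
    reports: int = 0       # number of student reports folded in so far

    def update(self, units, reported_hours):
        """Fold one student's report into the running mean."""
        if units <= 0:
            raise ValueError("units must be positive")
        observed = reported_hours / units
        self.reports += 1
        # Incremental mean over the starting guess plus all reports.
        self.hours_per_unit += (observed - self.hours_per_unit) / (self.reports + 1)

    def estimate(self, units):
        """Predicted hours for an assignment of the given size."""
        return units * self.hours_per_unit


# Placeholder starting rate: 20 pages per hour for a hypothetical course.
reading = TaskRate(hours_per_unit=0.05)
reading.update(units=40, reported_hours=4.0)   # a student reports 40 pages in 4 h
reading.update(units=20, reported_hours=1.5)   # another reports 20 pages in 1.5 h
predicted = reading.estimate(60)               # hours predicted for 60 pages
```

Keeping one rate per course and task type would also yield the campus-wide corpus mentioned above, since each rate summarizes the reports collected for that course.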
Cliff Wulfman, Library
A very large quantity of the cultural-heritage material that has been digitized is encoded and stored in XML: information about the objects (metadata); information about digital images of the objects (file types; file paths; technical info about the files).
There has been much buzz in the digital cultural-heritage community in recent years about the International Image Interoperability Framework (IIIF). IIIF is a set of specifications for APIs to web services, including an Image API, which delivers images (at various resolutions, orientations, etc.), and a Presentation API, which delivers a structured representation of complex image-based digital objects in JSON-LD (JSON for Linked Data).
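As a small illustration of the Image API, the sketch below builds request URLs following the specification's `{identifier}/{region}/{size}/{rotation}/{quality}.{format}` pattern. The server address and the identifier are invented placeholders, not Princeton's actual endpoints, and the helper names are hypothetical.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    region:   "full" or "x,y,w,h" in pixels
    size:     "max" or e.g. "600," for a 600-pixel-wide rendering
    rotation: degrees clockwise, as a string
    """
    # Identifiers must be percent-encoded, including any slashes.
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the info.json document describing an image's size and tiling."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Placeholder server and identifier, for illustration only.
BASE = "https://iiif.example.org/image"
PAGE = "princetonian/issue-0001/page-1"

full_page = iiif_image_url(BASE, PAGE)
thumbnail = iiif_image_url(BASE, PAGE, size="200,")
headline = iiif_image_url(BASE, PAGE, region="0,0,2000,600", size="1000,")
```

A viewer typically fetches `info.json` first to learn the image dimensions, then requests regions at the resolution needed for the current zoom level.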
Princeton's Digital Library includes a collection of Princeton-area newspapers, including the entire run of The Daily Princetonian from its founding in 1876. The digital representations of these newspapers are encoded in XML – a particular blend of XML schemas called METS/ALTO.
The project would be to create a IIIF-based viewer for The Daily Princetonian historical collection by implementing IIIF APIs:
Jed Marsh, Vice Provost for Institutional Research
There is an increasing interest in student outcomes after the initial
placement -- say 10 years post degree. Currently, these data are
harvested from a hodge-podge of sources, including scraping sites like
LinkedIn. Staff across campus spend a fair amount of time
googling former students, both graduate and undergraduate alumni.
We need tools that:
(1) improve data collection from the web. Could there be an API from
LinkedIn or job search sites?
Could one develop an app to systematically search for and harvest CVs and
resumes posted by Princeton alumni?
(2) categorize unstructured employment data (job code, employer, etc.)
into standardized occupation (SOC) and industry (NAICS) codes.
(3) store these data in a common repository that could be available for
student outcome studies.
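For item (2), a first pass could be a simple keyword lookup that maps free-text job titles to SOC major groups, as sketched below. The group codes are SOC major groups, but the keyword lists are a made-up, far-from-complete illustration, and the function name `classify_title` is hypothetical. A production tool would need a much richer mapping or a trained classifier, plus a similar table for NAICS industry codes.

```python
import re

# (SOC major group, illustrative keywords). The keyword lists are assumptions.
SOC_MAJOR_GROUP_KEYWORDS = [
    ("15-0000", ("software", "developer", "programmer", "data scientist")),  # Computer and Mathematical
    ("23-0000", ("attorney", "lawyer", "paralegal")),                        # Legal
    ("25-0000", ("professor", "teacher", "lecturer", "librarian")),          # Educational Instruction and Library
    ("29-0000", ("physician", "surgeon", "nurse", "pharmacist")),            # Healthcare Practitioners and Technical
]


def normalize(title):
    """Lowercase a title and replace anything that is not a-z with spaces."""
    return re.sub(r"[^a-z]+", " ", title.lower()).strip()


def classify_title(title):
    """Return the first SOC major group whose keyword appears as a whole word
    or phrase in the title, or None if nothing matches."""
    padded = f" {normalize(title)} "
    for code, keywords in SOC_MAJOR_GROUP_KEYWORDS:
        if any(f" {keyword} " in padded for keyword in keywords):
            return code
    return None


examples = ["Sr. Software Engineer", "Attorney-at-Law", "Barista"]
coded = {title: classify_title(title) for title in examples}
```

Titles that return `None` could be queued for manual coding, and each manual decision could be fed back into the keyword table.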
Abby Klionsky '14, Office of the Executive Vice President
As a breakout group of the Campus Iconography Committee, the Princeton History Working Group is building a series of themed historical tours of Princeton's campus that will highlight lesser-known histories of the university. These will take the form of a mobile app, which will use wayfinding technology to guide users to sites across campus and showcase associated photos, audio, and video to tell these stories. For some of these sites, we'd like to incorporate augmented reality features -- particularly in places where there may no longer be a physical marker or building still standing. The augmented reality component we're envisioning would likely be one of the following: a statue for "placement" in one of the statue-hold pedestals in East Pyne courtyard or at the front of Frist; a moving image that launches over a picture frame or screen that does exist in reality; or an old image of a campus map or building overlaid on what exists today.