Zexuan Zhong FPO

Date and Time
Thursday, November 21, 2024 - 4:00pm to 6:00pm
Location
Engineering Quadrangle E423 (off campus)
Type
FPO

Zexuan Zhong will present his FPO "Language Models with Retrieval Augmentation and Conditional Computation" on Thursday, November 21, 2024 at 4:00 PM in E-Quad E423.

Location: E-Quad E423 & Zoom: https://princeton.zoom.us/j/3159486709

The members of Zexuan’s committee are as follows:
Examiners: Danqi Chen (Adviser), Sanjeev Arora, Karthik Narasimhan, Luke Zettlemoyer (University of Washington)
Readers: Kai Li, Sanjeev Arora

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:

Recent advancements in large language models (LMs) have primarily focused on scaling up model parameters and training tokens to improve performance across various tasks. However, this scaling increases computational costs significantly. Furthermore, conventional parametric LMs are fundamentally incapable of adapting to unseen domains, editing learned knowledge, and retaining long-tail knowledge, and they have been shown to easily leak private data from the training corpus. This thesis explores alternative approaches to scaling LMs while addressing these limitations.

Firstly, we study LMs with retrieval augmentation, where LMs leverage an external datastore to make predictions. We develop Trime, a novel end-to-end training approach that learns LMs and the retrieval models jointly. Our results show that Trime significantly enhances LM performance with the same model size and computational budget. These end-to-end trained retrieval-augmented LMs also provide users with effective adaptability to domains unseen during training.
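As background for the retrieval-augmented setting, one common way to combine a parametric LM with an external datastore is to interpolate the two next-token distributions (this is the kNN-LM style of augmentation, stated here as a generic sketch and not necessarily Trime's exact formulation; the function name is ours):

```python
def interpolate_retrieval_lm(p_lm, p_retrieval, lam=0.25):
    """Mix the parametric LM's next-token distribution with one induced
    by retrieved datastore neighbors. lam controls how much weight the
    retrieval distribution gets. Tokens absent from p_lm are ignored
    for brevity in this sketch."""
    return {tok: lam * p_retrieval.get(tok, 0.0) + (1 - lam) * p
            for tok, p in p_lm.items()}
```

End-to-end training, as in Trime, would additionally backpropagate through the retrieval scores rather than treating the retriever as fixed.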

Secondly, we focus on a fundamental challenge of LMs: editing facts that are stored in their parameters—a critical problem to address since the world is constantly changing and the knowledge in LMs becomes outdated easily. We examine state-of-the-art knowledge editing methods on LMs and find that existing evaluation paradigms are extremely limited. We propose a novel benchmark MQuAKE, consisting of multi-hop questions that assess whether edited models correctly answer questions where the answer should change as an entailed consequence of edited facts. We demonstrate that existing knowledge editing methods fail on the constructed multi-hop questions. We also propose a simple retrieval-augmented approach that stores all edited facts externally, which outperforms previous methods by a large margin.
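The external-memory idea can be illustrated with a toy multi-hop lookup over (entity, relation) facts, where edited facts are consulted before the model's possibly stale knowledge (a sketch only; all names here are ours, not the proposed method's API):

```python
def answer_multi_hop(start_entity, relations, edited_facts, base_kb):
    """Answer a multi-hop question by chaining single-hop lookups.
    At each hop, consult the external store of edited facts first,
    falling back to the (possibly outdated) base knowledge. This is
    why an edit to one fact correctly propagates to entailed answers."""
    entity = start_entity
    for relation in relations:
        key = (entity, relation)
        entity = edited_facts[key] if key in edited_facts else base_kb[key]
    return entity
```

For example, editing a single fact changes the answer to any question whose reasoning chain passes through it.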

Thirdly, we investigate training LMs with conditional computation, which is designed for scaling LMs without a substantial increase in computational costs. We focus on mixture-of-experts (MoE), a widely used conditional computation technique which facilitates efficient scaling. However, training the routing network in MoE introduces the challenge of optimizing a non-differentiable, discrete objective. We present a fully differentiable MoE architecture for autoregressive language model pre-training, Lory, which is based on two key techniques: (1) a causal segment routing strategy for high efficiency of expert merging and (2) a similarity-based data batching method for better expert specialization. Despite segment-level routing, Lory models achieve competitive performance compared to state-of-the-art MoE models with token-level routing. We further demonstrate that the trained experts of Lory capture domain-level specialization without supervision.
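The expert-merging idea can be illustrated with scalar linear experts (a toy sketch of fully differentiable routing, not Lory's actual architecture; experts here are single numbers for simplicity):

```python
def merged_expert_forward(x, expert_weights, router_probs):
    """Instead of discretely selecting one expert, use the routing
    probabilities to merge expert *parameters* into a single expert,
    then apply it. Every operation is smooth, so the router can be
    trained by ordinary backpropagation."""
    merged_w = sum(p * w for p, w in zip(router_probs, expert_weights))
    return merged_w * x
```

Merging parameters once per segment, rather than per token, is what keeps this cheap enough for pre-training.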

Overall, our research illuminates a new paradigm in LM scalability, fundamentally addressing critical limitations and advancing the development of more efficient, effective, adaptable, and updatable language models.

 

Lahav Lipson will present his FPO "Fast and Robust 3D Reconstruction" on Thursday, November 21, 2024 at 10:00 AM in Friend 202.

Date and Time
Thursday, November 21, 2024 - 10:00am to 12:00pm
Location
Friend Center 202
Type
FPO


Location: Friend 202

The members of Lahav’s committee are as follows:
Examiners: Jia Deng (Adviser), Adam Finkelstein, Szymon Rusinkiewicz
Readers: Anirudha Majumdar, Olga Russakovsky

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:

3D reconstruction from visual data is an important subtask for robotics, autonomous machines, and 3D scene understanding. It involves estimating camera and object motion from images and video, as well as 3D structure. I will introduce an approach known as Optimization Guided Neural Iterations (OGNI), and demonstrate how it can be applied to various 3D reconstruction tasks. In OGNI-based approaches, we mimic classical optimization algorithms by breaking down each task into a series of small revisions predicted by a shallow network. Each revision is supervised independently, and is informed by features conditioned on a running estimate of the solution. This mechanism is surprisingly general, and leads to robust and efficient solutions to 3D reconstruction. Moreover, I introduce several explicit optimization layers which enable one to reformulate these challenging problems into easier, low-level vision tasks. On visual SLAM, stereo matching, and object pose estimation, I show how this approach leads to state-of-the-art accuracy and/or speed. I also discuss potential future directions for this line of work.

Matthew Weaver will present his FPO "Bicubical Directed Type Theory" on October 31, 2024 at 11am in CS 301.

Date and Time
Thursday, October 31, 2024 - 11:00am to 1:00pm
Location
Not yet determined.
Type
FPO


 

The members of his committee are as follows:
Andrew Appel (Advisor)
Daniel Licata (Advisor/Wesleyan University)
Zachary Kincaid (Reader)
David Walker (Examiner)
Aarti Gupta (Examiner)

Title: Bicubical Directed Type Theory

Directed type theory is an analogue of homotopy type theory where types represent categories, generalizing groupoids. A classical, bisimplicial approach to directed type theory, developed by Riehl and Shulman, is based on equipping each type with both a notion of path and a separate notion of directed morphism. In this setting, a directed analogue of the univalence axiom asserts that there is a universe of covariant discrete fibrations whose directed morphisms correspond to functions, a higher-categorical analogue of the category of sets and functions. While this approach suffices for formalizing mathematics, for applications to computer science we would like a computational interpretation of directed type theory. In this thesis we give a constructive model of a directed type theory with directed univalence in bicubical sets. We formalize much of this in Agda following the approach of Orton and Pitts. First, building on the cubical techniques used to give computational models of homotopy type theory, we show that there is a universe of covariant discrete fibrations with a partial directed univalence principle asserting that functions are a retract of morphisms in this universe. To complete this retraction into an equivalence, we refine the model using Coquand, Ruch, and Sattler's work on constructive sheaf models. We introduce the cobar modality and, by restricting to fibrant types that are also cobar modal, we are able to complete our construction of directed univalence.

We then describe a generalization of the fibrant Riehl-Shulman extension types. We prove this in the setting of an arbitrary presheaf category with respect to a new notion of fibrancy that is given by a generic filling problem. This abstraction is general enough to capture all of the current presheaf models of type theories and their classifications of types specified by filling problems. In addition, this result extends the potential syntax of these type theories to be able to internally express any of these filling problems as fibrant types. We use this to then define a type theory in which the user can internally define classifying universes for any such filling problem. Lastly, we overview our implementation of bicubical directed type theory.

Ziyang Xu will present his FPO "A Fast and Extensible Memory Profiling Framework"

Date and Time
Friday, September 27, 2024 - 10:00am to 12:00pm
Location
Not yet determined.
Type
FPO


Details to follow

Kun Woo Cho will present her FPO "Programmable Smart Radio Environments: From Theory to Hardware Implementation" on  October 1, 2024 at 3pm in Friend Center 004 

Date and Time
Tuesday, October 1, 2024 - 3:00pm to 5:00pm
Location
Not yet determined.
Type
FPO


Examiners: Prof. Kyle Jamieson (Advisor, CS), Prof. Yasaman Ghasempour (ECE), Prof. Andrew Appel (CS)

Readers: Prof. Wyatt Lloyd (CS), Prof. Omid Abari (UCLA), Dr. Ranveer Chandra (MSR)

 

All are welcome to attend.

 

Title: 

Programmable Smart Radio Environments: From Theory to Hardware Implementation

 

Abstract: 

Today’s wireless networks are undergoing a rapid transformation, scaling in traffic volume, spectral efficiency, and radio count as never seen before. This thesis addresses the critical challenges emerging from this evolution in next-generation (NextG) wireless networks, focusing on realizing three key services: enhanced mobile broadband (Gbps or above), ultra-reliable low-latency communication (on the order of milliseconds), and massive machine-type communication (up to one million devices per square kilometer).

To meet these diverse and demanding requirements, this thesis poses a central question: Can we build a smarter radio environment controlled and learned by software, configuring itself in real time to meet different application needs? Current approaches to handling uncontrolled wireless signals are end-to-end. Unfortunately, sending and receiving endpoints are limited in their ability to shape this inherent propagation behavior. For instance, although multiple antennas can shape the beam pattern departing the sender, they cannot control how the resulting signals arrive at the receiver after traversing environmental obstacles. By focusing on changing the environment itself rather than the communication endpoints, this thesis offers a significant shift in design paradigms for modern wireless networks.

First, millimeter-wave (mmWave) technology offers multi-Gbps data rates due to its wide spectral bands. However, the high frequency of mmWave signals makes them vulnerable to blockage by walls, people, and obstacles, significantly limiting their practical applications. This thesis introduces mmWall, a programmable smart surface deployed on buildings to fully control the mmWave radio environment. Comprising over 4,000 sub-millimeter meta-materials, mmWall can steer signals through the surface or reflect them, bringing outdoor mmWave signals indoors and bypassing obstacles. This thesis also proposes Wall-Street, a smart surface installed on vehicles to provide seamless and reliable mmWave connectivity in high-mobility scenarios.

Second, satellite networks use constellations of thousands of satellites to provide low-latency communication and global coverage. However, these networks face two significant challenges: outages due to transient blockage, and complications in beam alignment between users and satellites due to the use of different frequency sub-bands in the uplink and downlink directions. Extending the concept of a programmable smart radio to satellite communications, this thesis introduces Wall-E, a dual-band smart surface that mitigates signal blockage and enhances the reliability of satellite-to-ground links, and Monolith, a smart surface that boosts inter-satellite link capacity.

Third, while these smart surfaces address challenges at the physical layer, we also tackle issues at the link/MAC layer, particularly in massive Internet of Things (IoT) networks. This thesis introduces Cross-Link Channel Prediction (CLCP), a machine learning technique that learns the radio environment and accordingly allocates networking resources to a large number of IoT devices. This AI-driven approach complements my programmable surfaces, creating a comprehensive smart radio solution for NextG networks.

Samuel Ginzburg will present his FPO "VectorVisor: A Binary Translation Scheme for Throughput-Oriented GPU Acceleration" on Thursday, August 29, 2024 in CS 302 at 1pm.

Date and Time
Thursday, August 29, 2024 - 1:00pm to 3:00pm
Location
Not yet determined.
Type
FPO


 

The members of his FPO committee are as follows:

Examiners: Michael Freedman (Advisor), Wyatt Lloyd, and Amit Levy

Readers: Mae Milano and Mohammad Shahrad (UBC)

 

All are welcome to attend.  Please see abstract below.

 

Beyond conventional graphics applications, general-purpose GPU acceleration has had significant impact on machine learning and scientific computing workloads. Yet, it has failed to see widespread use for server-side applications, which we argue is because GPU programming models offer a level of abstraction that is either too low-level (e.g., OpenCL, CUDA) or too high-level (e.g., TensorFlow, Halide), depending on the language. Not all applications fit into either category, resulting in lost opportunities for GPU acceleration.

 

We introduce VectorVisor, a vectorized binary translator that enables new opportunities for GPU acceleration by introducing a novel programming model for GPUs. With VectorVisor, many copies of the same server-side application are run concurrently on the GPU, where VectorVisor mimics the abstractions provided by CPU threads. To achieve this goal, we demonstrate how to (i) provide cross-platform support for system calls and recursion using continuations and (ii) make full use of the excess register file capacity and high memory bandwidth of GPUs. We then demonstrate that our binary translator is able to transparently accelerate certain classes of compute-bound workloads, gaining significant improvements in throughput-per-dollar of up to 2.9× compared to Intel x86-64 VMs in the cloud, and in some cases matching the throughput-per-dollar of native CUDA baselines.

Yue Tan will present her FPO "FAASTEN: An Architecture and implementation for securing cloud applications" on September 5, 2024 at 11am in CS 302.

Date and Time
Thursday, September 5, 2024 - 11:00am to 1:00pm
Location
Not yet determined.
Type
FPO

 


Zoom Link:  https://princeton.zoom.us/my/yuetan 


The members of her committee are as follows:
Amit Levy (advisor), Maria Apostolaki, Wyatt Lloyd
Readers: Mae Milano and Marcus Peinado (Microsoft Research)

All are welcome to attend. The FPO abstract follows below:

Modern web applications have evolved into intricate networks of micro-applications, posing challenges in ensuring the confidentiality and integrity of user data. In the current discretionary security paradigm, developers must enforce policies for every micro-application and all data paths that can arise at run-time. A promising alternative paradigm, decentralized information flow control (DIFC), suggests that the underlying system should enforce policies by tracking information flows. As the cloud has become a standard operating system for developing web applications, this thesis proposes the first DIFC design for the cloud.

We present Faasten, an architecture and implementation for securing cloud applications. Faasten includes a DIFC model and a FaaS-inspired system interface that implements the DIFC model. We provide a detailed interface security analysis and showcase how three representative cloud applications benefit from Faasten. We also show that Faasten introduces negligible latencies in controlling information flows and policy storage costs.
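As background, the core of any DIFC model is a label check applied to each information flow; a minimal secrecy-only sketch (an illustration of the general DIFC idea, not Faasten's actual label model or interface):

```python
def can_flow(src_tags, dst_tags):
    """Allow information to flow from a source to a destination only
    if the destination's secrecy label is at least as restrictive,
    i.e. it carries every secrecy tag of the source. The system, not
    the developer, applies this check on every data path."""
    return set(src_tags) <= set(dst_tags)
```

Real DIFC systems also track integrity labels and support declassification, which this sketch omits.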

Yufei Zheng will present her FPO "Compact Algorithms for Measuring Network Performance" on August 22, 2024 at 12pm in CS 301.

Date and Time
Thursday, August 22, 2024 - 12:00pm to 2:00pm
Location
Computer Science CS/Friend Quad
Type
FPO


 

The members of Yufei's committee are as follows: 

Examiners: Jennifer Rexford (adviser), Maria Apostolaki, and Mark Braverman

Readers: Huacheng Yu and David Hay (The Hebrew University of Jerusalem)

 

All are welcome to attend.  Please see abstract below.

 

Abstract:

For network administrators, performance monitoring provides valuable insights into network quality and security. The emergence of programmable network devices makes it possible to measure performance metrics in the data plane, where packets fly by. Running measurement tasks directly in the data plane improves efficiency, preserves user privacy, and enables the possibility of real-time actions based on the analysis. However, programmable data planes have limited memory size and memory accesses. Meanwhile, measuring performance metrics fundamentally requires a large amount of memory resources, as each packet must be processed relative to its predecessor in the same flow.

 

This dissertation tackles the specific challenges that arise from running performance measurements with limited memory. The first part of this dissertation introduces new data-plane algorithms for monitoring two canonical performance metrics related to Transmission Control Protocol (TCP): delay and TCP packet reordering.

• In measuring delay distribution, existing algorithms often exhibit bias against larger delays. We present fridges, a novel data structure that allows for the correction of the survivorship bias due to hash collisions. The main idea is to keep track of the probability p that a delay sample is collected, and count it as 1/p samples. Using multiple fridges together, we can further improve the accuracy by computing a weighted average of single-fridge estimators.

• We present efficient algorithms for identifying IP prefixes with heavy packet reordering. First, we sample as many flows as possible, regardless of their sizes, but only for a short period at a time. Next, we separately monitor the large flows over long periods, in addition to the flow sampling. Both algorithms measure at the flow level, and aggregate statistics at the prefix level.
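The inverse-probability correction in the first bullet can be sketched in a few lines (a toy illustration of the weighting idea only, not the full fridge data structure; the function name is ours):

```python
from collections import defaultdict

def corrected_delay_distribution(samples):
    """Estimate a delay distribution from biased samples. Each sample
    arrives as a (delay, p) pair, where p is the probability that this
    sample survived collection; counting it as 1/p samples undoes the
    survivorship bias, after which the histogram is normalized."""
    hist = defaultdict(float)
    for delay, p in samples:
        hist[delay] += 1.0 / p
    total = sum(hist.values())
    return {delay: weight / total for delay, weight in hist.items()}
```

Large delays, which are collected less often, thus receive proportionally larger weights in the final estimate.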

 

Existing counter-based algorithms for identifying heavy-hitter flows could also be modified to measure performance metrics. However, many such algorithms, despite showing good empirical performance, lack theoretical guarantees. In the second part, we present the first formal analysis of the performance of one such algorithm, the Random Admission Policy (RAP).

• We show that, for a highly skewed packet stream with k heavy flows, RAP with memory size k stores O(k) heavy flows with constant probability.

Kaifeng Lyu FPO

Date and Time
Friday, August 9, 2024 - 10:30am to 12:30pm
Location
Computer Science 402
Type
FPO

Kaifeng Lyu will present his FPO "Implicit Bias in Deep Learning Optimization: A Mathematical Examination" on Friday, August 9, 2024 at 10:30 AM in CS 402 and Zoom.

Location: Zoom link: https://princeton.zoom.us/j/99448789401

The members of Kaifeng’s committee are as follows:
Examiners: Sanjeev Arora (Adviser), Elad Hazan, Chi Jin
Readers: Danqi Chen, Jason Lee

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:
Deep learning has achieved remarkable success in recent years, yet training neural networks often involves a delicate combination of guesswork and hyperparameter tuning. A critical aspect of this process is the “implicit bias” of optimization methods, where minor changes in the optimization setup—without affecting the small training loss at convergence—can drastically shift the solution to which the model converges, thereby affecting test performance. This dissertation presents a collection of results that mathematically characterize this implicit bias in various training regimes.

 

The first part of this dissertation explores how gradient descent, even without explicit regularization, can converge to solutions that maximize the margin. Previous results have established the first-order optimality of margin for homogeneous neural networks in general, but the global optimality of margin is not guaranteed due to their non-convex nature. This dissertation provides in-depth theoretical analyses when data has simple structures: for linearly separable data, we present both positive and negative results on whether the global optimality of margin can be attained. Furthermore, we show how this margin-based view can be used to explain interesting generalization phenomena in training neural networks with or without explicit regularization, including the simplicity bias and grokking phenomena.
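As background, the normalized margin of an L-homogeneous network f(θ; x) on labeled examples (x_i, y_i), the quantity these analyses characterize, is standardly defined as:

```latex
\gamma(\theta) \;=\; \min_{i}\, \frac{y_i \, f(\theta;\, x_i)}{\|\theta\|^{L}},
\qquad \text{where } f(c\,\theta;\, x) = c^{L} f(\theta;\, x) \ \text{for all } c > 0.
```

Homogeneity makes γ invariant to rescaling θ, so "maximizing the margin" is a statement about the direction in which gradient descent drives the parameters.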

The second part of the dissertation presents two results that capture the implicit biases induced by finite learning rate. Many existing analyses, including the margin-based ones in the first part, describe implicit biases that hold even when the learning rate is infinitesimal. However, practical implementations use finite learning rates, which have been empirically observed to benefit generalization. We analyze how full-batch GD with finite learning rates, combined with key training components like normalization layers and weight decay, create a bias towards flatter minima, which are positively correlated with better generalization. Additionally, we study the implicit bias in stochastic optimization and derive rigorous approximations for the dynamics of adaptive gradient methods like Adam and RMSprop via Stochastic Differential Equations (SDEs) to capture the effect of finite learning rates. Based on this, we also derive the square root scaling rule as a practical guideline for adjusting the optimization hyperparameters of adaptive gradient methods when changing batch size.
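The square-root scaling rule mentioned above can be stated compactly as a helper (the function name is ours; the rule, as described in the abstract, applies to adaptive gradient methods such as Adam and RMSprop when only the batch size changes):

```python
import math

def scale_lr_for_batch_size(lr, batch_size, new_batch_size):
    """Square-root scaling rule: when the batch size grows by a
    factor kappa, scale the learning rate by sqrt(kappa) to keep
    the stochastic dynamics approximately unchanged."""
    kappa = new_batch_size / batch_size
    return lr * math.sqrt(kappa)
```

For example, quadrupling the batch size doubles the learning rate under this rule.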

Dingli Yu FPO

Date and Time
Thursday, July 18, 2024 - 1:30pm to 3:30pm
Location
Computer Science 402
Type
FPO

Dingli Yu will present his FPO "Efficient Scaling of Large Models: Principles in Optimization and Data Aspects" on Thursday, July 18, 2024 at 1:30 PM in CS 402 and Zoom.

Location: Zoom link: https://princeton.zoom.us/j/7188314894?omn=99115109336

The members of Dingli’s committee are as follows:
Examiners: Sanjeev Arora (Adviser), Elad Hazan, Chi Jin
Readers: Danqi Chen, Mark Braverman

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:
Deep learning has advanced remarkably in recent decades. Yet, its theoretical foundations, particularly in the realm of large models, still lag behind. This thesis focuses on research that combines strong theoretical foundations with practical applications in efficiently scaling up large models.

In the first part of the thesis, we focus on the training dynamics of neural nets by covering the theory of overparametrized neural nets. We will briefly introduce the theory of Neural Tangent Kernel (NTK), and proceed with Hyperparameter Transfer, an important application of the Tensor Program framework. We cover some of the earliest papers that establish NTK as a research field, along with the limitations of NTK. Hyperparameter Transfer is a novel and efficient paradigm for hyperparameter tuning by providing the optimal strategy for scaling up models. We introduce the characterization of the training dynamics for deep neural nets and offer an efficient hyperparameter selection scheme where optimal hyperparameters selected by tuning on shallow nets also work for deep nets.

In the second part of the thesis, we focus on the data aspect of large model scaling. We first introduce Skill-Mix, a novel evaluation that sidesteps issues of traditional large language model (LLM) evaluations, such as data contamination and cramming for leaderboards. Skill-Mix randomly selects k language skills, then prompts the LLM to produce a concise text that demonstrates the chosen skills. The exponentially growing number of skill combinations provably prevents data contamination and can further reveal the novelty of successful answers by powerful LLMs. We then introduce ConceptMix, an extension of Skill-Mix that evaluates the capability of text-to-image models to combine k randomly selected visual concepts. Finally, we uncover the capability of LLMs to learn and generalize skill compositions given good responses from Skill-Mix. The results show that a few thousand such examples are enough to significantly improve model performance on unseen skill combinations, beating much larger models. This suggests that incorporating skill-rich synthetic text into training is an efficient way to scale up the data.
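The combinatorial core of the evaluation is easy to see in code; a hypothetical sketch (all names here are ours, not the benchmark's API) of the sampling step and the count of possible combinations:

```python
import math
import random

def sample_skill_mix(skills, k, seed=0):
    """Draw k distinct skills uniformly at random, as in a
    Skill-Mix-style prompt. Sorting makes the output deterministic
    for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(skills, k))

def num_skill_combinations(n, k):
    """Number of distinct k-skill subsets from n skills, C(n, k).
    This count grows so quickly with k that memorizing all
    combinations during training is infeasible."""
    return math.comb(n, k)
```

With even 100 skills and k = 5, there are over 75 million combinations, which is the contamination-resistance argument in miniature.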

 
