Edgar Minasyan FPO
Location:
The members of
Examiners:
Readers:
A copy of
Everyone is invited to attend
Abstract follows below:
In the last century, the problem of controlling a dynamical system has been a core component in numerous applications in autonomous systems, engineering, and sciences. The theoretical foundations built so far include the field of classical control theory as well as recent advances in leveraging regret minimization techniques for guarantees in online control. In either case, however, a linearity assumption is prevalent to enable the derivation of provable guarantees and efficient algorithms. In this thesis, we present several directions to alleviate the linearity assumption in the problem of online control. At the core of all the results lies the regret minimization frame- work: the presented works design new, as well as heavily rely on existing, methods and notions in online convex optimization.
The ultimate goal of this direction is to derive efficient algorithms that perform optimal online control of nonlinear dynamical systems. This goal in full generality is NP-hard hence certain concessions need to be made. We build upon the recent non- stochastic control framework that enables efficient online control of linear dynamical systems: our first advance is to consider time-varying linear dynamical systems which are commonly used to approximate nonlinear dynamics. We argue that the correct performance metric in this setting is the notion of adaptive regret developed for online learning in changing environments and proceed to design an efficient control method for known time-varying linear dynamics. The extension to unknown dynamics proves to be qualitatively harder: our upper/lower bound results show that the system variability of the unknown dynamics is a determining factor on whether meaningful (adaptive) regret guarantees can be achieved. The policies in the described advances enjoy linear/convex parameterizations which can be constraining for complex dynamical systems. Hence, we additionally explore the use of neural network-based policies in online episodic control and derive an efficient regret-minimizing algorithm that in- corporates the interpolation dimension as an expressivity metric for the policy class.