Learning to act in a multiagent environment is a challenging problem.
Optimal behavior for one agent depends upon the behavior of the other
agents, which may be learning as well. Multiagent environments are therefore non-stationary, violating the stationarity assumption that underlies traditional single-agent learning. In addition, agents in complex
tasks may have limitations, such as unintended physical constraints or
designer-imposed approximations of the task that make learning
tractable. Limitations prevent agents from acting optimally, which
complicates the already challenging problem. A learning agent must
effectively compensate for its own limitations while exploiting the
limitations of the other agents. My thesis research focuses on these
two challenges. The novel contributions of my thesis include (1) the
WoLF (Win or Learn Fast) variable learning rate as a new principle
that enables convergence to optimal responses in multiagent learning;
(2) an analysis of the existence of Nash equilibria when agents have
limitations; and (3) GraWoLF as a scalable multiagent learning
algorithm.
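
To make the WoLF idea concrete, the following is a minimal sketch of a win-or-learn-fast update in the style of policy hill climbing, written in Python. It is an illustrative reconstruction rather than the exact algorithm from the thesis: the array layout, hyperparameter names, and the running-average scheme are assumptions. The key point it shows is the variable learning rate, taking a small step (delta_win) when the current policy outperforms its historical average and a large step (delta_lose) when it does not.

import numpy as np

def wolf_phc_step(Q, pi, pi_avg, counts, state, action, reward, next_state,
                  alpha=0.1, gamma=0.9, delta_win=0.01, delta_lose=0.04):
    """One update of a WoLF-style policy-hill-climbing learner (sketch).

    Q      : |S| x |A| action-value estimates
    pi     : |S| x |A| current mixed policy (rows are distributions)
    pi_avg : |S| x |A| running average of past policies
    counts : length-|S| visit counts, used for the policy average
    """
    # Ordinary Q-learning update for the sampled transition.
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

    # Update the average policy for the visited state.
    counts[state] += 1
    pi_avg[state] += (pi[state] - pi_avg[state]) / counts[state]

    # WoLF test: the agent is "winning" if its current policy has higher
    # expected value than its average policy; learn fast only when losing.
    winning = pi[state] @ Q[state] > pi_avg[state] @ Q[state]
    delta = delta_win if winning else delta_lose

    # Hill-climb the policy toward the greedy action by step delta,
    # moving probability mass away from the other actions.
    greedy = Q[state].argmax()
    for a in range(Q.shape[1]):
        if a == greedy:
            continue
        move = min(delta / (Q.shape[1] - 1), pi[state, a])
        pi[state, a] -= move
        pi[state, greedy] += move
    return Q, pi, pi_avg, counts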
In this talk I focus on the contributions of the WoLF principle and
the GraWoLF algorithm. I show that the WoLF variable learning rate
causes learning to converge to optimal responses when agents learn simultaneously. I demonstrate this convergence both theoretically, in a subclass of single-state games, and empirically, in a
variety of multiple-state domains. I then describe GraWoLF, a
combination of policy gradient techniques and the WoLF principle. I
show compelling results of applying this algorithm both to a card game with an intractably large state space and to an adversarial robot task.
These results demonstrate that WoLF-based algorithms can effectively
learn in the presence of other learning agents, and do so even in
complex tasks with limited agents.
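
As a rough illustration of how policy gradients and the WoLF principle can be combined, the sketch below modulates the step size of a gradient update with a WoLF test. It is not the GraWoLF implementation from the thesis: the parameterization, the value_estimate helper, and the step sizes are illustrative assumptions.

import numpy as np

def wolf_gradient_step(theta, theta_avg, grad_log_pi, advantage, value_estimate,
                       eta_win=0.01, eta_lose=0.04, avg_rate=0.05):
    """One policy-gradient step whose size is set by a WoLF test (sketch).

    theta          : policy parameters (np.ndarray)
    theta_avg      : slowly tracked average of past parameters
    grad_log_pi    : gradient of log pi(action | state; theta) for the sampled action
    advantage      : sampled advantage (return minus baseline) for that action
    value_estimate : hypothetical callable mapping parameters to an estimated
                     expected return (in practice, a learned critic)
    """
    # Win-or-learn-fast: step cautiously when the current policy is doing
    # better than the average policy, aggressively when it is doing worse.
    winning = value_estimate(theta) > value_estimate(theta_avg)
    eta = eta_win if winning else eta_lose

    # Stochastic gradient ascent on expected return.
    theta = theta + eta * advantage * grad_log_pi

    # Keep a slowly moving average of the parameters for the next WoLF test.
    theta_avg = theta_avg + avg_rate * (theta - theta_avg)
    return theta, theta_avg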