Reinventing Partially Observable Reinforcement Learning
In this presentation I will describe a new framework for decision making, learning, and tracking in partially observable domains. This framework builds on our results on updating logical formulas (belief states) after deterministic actions. It includes algorithms that represent and update the set of possible action models and world states compactly and tractably. It makes a decision using this set, and updates the set after taking the chosen action. Most importantly, and somewhat surprisingly, the number of actions our framework takes to achieve a goal is bounded polynomially by the length of an optimal plan in a fully observable, fully known domain, under mild conditions. Finally, our framework leads to a new stochastic-filtering approach that achieves better accuracy than previous techniques.
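To make the tracking loop concrete, the sketch below is an illustrative, enumerative version of belief-state update after a deterministic action followed by observation filtering. It is not the authors' implementation: the framework described above keeps the set of possible states as a compact logical formula rather than an explicit set, and all names here are hypothetical.

```python
# Illustrative sketch only: belief-state tracking with deterministic actions,
# using an explicit set of possible world states. The actual framework
# represents this set compactly as a logical formula.

def progress(belief, action):
    """Apply a deterministic action (a state -> state function) to every
    state the agent considers possible."""
    return {action(state) for state in belief}

def filter_observation(belief, observation):
    """Keep only the states consistent with the observation, where
    `observation` is a predicate over states."""
    return {state for state in belief if observation(state)}

# Example: states are frozensets of true propositions.
belief = {frozenset({"door_closed"}), frozenset({"door_open"})}

def open_door(state):
    """A deterministic 'open the door' action."""
    return frozenset((state - {"door_closed"}) | {"door_open"})

belief = progress(belief, open_door)                      # both states map to door_open
belief = filter_observation(belief, lambda s: "door_open" in s)
print(belief)                                             # {frozenset({'door_open'})}
```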
* Joint work with Allen Chang, Hannaneh Hajishirzi, Stuart Russell, Dafna Shahaf, and Afsaneh Shirazi (IJCAI'03, IJCAI'05, AAAI'06, ICAPS'06, IJCAI'07, AAAI'07).