The explore/exploit dilemma in human reinforcement learning: Computation, behavior, and neural substrates
Using behavioral analysis and functional neuroimaging in a bandit task, we study how humans approach the explore/exploit dilemma. We assess how well different exploratory strategies from reinforcement learning fit participants' trial-by-trial choices and, having validated an algorithmic account of behavior, use it to infer subjective factors such as when subjects are exploring versus exploiting. These estimates are then used to search for neural signals related to these phenomena. The results support the hypothesis that exploration is driven by the active override of an exploitative choice system, rather than the alternative, computationally motivated hypothesis that a single (putatively dopaminergic) choice system integrates information about both the exploitative and exploratory ("uncertainty bonus") values of candidate actions. Although exploration is ubiquitous, it is difficult to study in a controlled manner: we can capture it only through the tight integration of computational, behavioral, and neural methods.
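The two hypotheses contrasted above can be illustrated with a minimal sketch (this is not the paper's fitted model, and all parameter names and values below are illustrative assumptions): a softmax choice rule over estimated action values, optionally augmented by a bonus proportional to each arm's value uncertainty. With the bonus weight set to zero the rule is purely exploitative; with a positive weight, poorly known arms are favored, as under an "uncertainty bonus" account.

```python
import numpy as np

def softmax_policy(means, stds, beta=3.0, phi=0.0):
    """Choice probabilities over bandit arms from estimated value means/stds.

    beta: inverse temperature (illustrative value, not a fitted parameter).
    phi:  uncertainty-bonus weight; phi=0 gives pure exploitation,
          phi>0 adds a bonus favoring arms with uncertain values.
    """
    values = np.asarray(means, dtype=float) + phi * np.asarray(stds, dtype=float)
    z = beta * (values - values.max())  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Two arms with equal estimated mean payoff, but arm 1 is more uncertain.
means, stds = [0.5, 0.5], [0.1, 0.4]
p_exploit = softmax_policy(means, stds, phi=0.0)  # exploitation only: a tie
p_bonus = softmax_policy(means, stds, phi=1.0)    # bonus shifts choice to arm 1
```

Under the override account, by contrast, exploration would not enter through the value term at all, but through a separate process that occasionally supersedes the exploitative choice.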