
A MATHEMATICAL ANALYSIS OF ACTOR-CRITIC ARCHITECTURES FOR LEARNING OPTIMAL CONTROLS THROUGH INCREMENTAL DYNAMIC PROGRAMMING

Ronald J. Williams and Leemon C. Baird, III
College of Computer Science
Northeastern University
Boston, MA 02115

Abstract:

Combining elements of the theory of dynamic programming with features appropriate for on-line learning has led to an approach Watkins has called incremental dynamic programming. Here we adopt this incremental dynamic programming point of view and obtain some preliminary mathematical results relevant to understanding the capabilities and limitations of actor-critic learning systems. Examples of such systems are Samuel's learning checker player, Holland's bucket brigade algorithm, Witten's adaptive controller, and the adaptive heuristic critic algorithm of Barto, Sutton, and Anderson. Particular emphasis here is on the effect of complete asynchrony in the updating of the actor and the critic across individual states or state-action pairs. The main results are that, while convergence to optimal performance is not guaranteed in general, there are a number of situations in which such convergence is assured.





Leemon Baird
Thu Oct 12 12:44:21 MDT 1995