A MATHEMATICAL ANALYSIS OF ACTOR-CRITIC ARCHITECTURES FOR
LEARNING OPTIMAL CONTROLS THROUGH INCREMENTAL
DYNAMIC PROGRAMMING
Ronald J. Williams and Leemon C. Baird, III
College of Computer Science
Northeastern University
Boston, MA 02115
Abstract:
Combining elements of the theory of dynamic programming
with features appropriate for
on-line learning has led to an approach Watkins has
called incremental dynamic programming.
Here we adopt this incremental dynamic programming point of view
and obtain some preliminary mathematical results
relevant to understanding the capabilities and limitations of
actor-critic learning systems.
Examples of such systems are Samuel's learning checker player,
Holland's bucket brigade algorithm, Witten's adaptive controller, and
the adaptive heuristic critic algorithm of Barto, Sutton, and Anderson.
Particular emphasis here is on the effect of complete asynchrony
in the updating of the actor and the critic across individual states
or state-action pairs.
The main results are that, while convergence to optimal performance
is not guaranteed in general, there are a number of situations
in which such convergence is assured.
Leemon Baird
Thu Oct 12 12:44:21 MDT 1995