  1. 1 REINFORCEMENT LEARNING SYSTEMS
    1. 1.1 MARKOV SEQUENTIAL DECISION PROCESSES
    2. 1.2 REINFORCEMENT LEARNING
  2. 2 RESIDUAL-GRADIENT ALGORITHMS
  3. 3 THE Q-LEARNING ALGORITHM
    1. 3.1 Q-LEARNING (Non-residual-gradient form)
    2. 3.2 RESIDUAL-GRADIENT Q-LEARNING
    3. 3.3 Q-LEARNING IN CONTINUOUS TIME
  4. 4 THE ADVANTAGE UPDATING ALGORITHM
    1. 4.1 ADVANTAGE UPDATING (Non-residual-gradient form)
    2. 4.2 RESIDUAL-GRADIENT ADVANTAGE UPDATING
  5. 5 DIFFERENTIAL GAMES
  6. 6 SIMULATED MISSILE-AIRCRAFT DIFFERENTIAL GAME
    1. 6.1 GAME DEFINITION
    2. 6.2 THE BELLMAN RESIDUAL AND UPDATE EQUATIONS
  7. 7 RESULTS
    1. 7.1 INITIAL RESULTS FOR ADVANTAGE UPDATING
      1. 7.1.1 Experiment 1
    2. 7.2 COMPARATIVE ASSESSMENT
      1. 7.2.1 Experiment 2
      2. 7.2.2 Experiment 3
  8. 8 CONCLUSIONS
    1. 8.1 FUTURE RESEARCH
  9. Acknowledgment
  10. References
  11. Appendix A: Notation
  12. Appendix B: Difficulties in combining function-approximation systems with reinforcement learning algorithms