- 1 REINFORCEMENT LEARNING SYSTEMS
- 1.1 MARKOV SEQUENTIAL DECISION PROCESSES
- 1.2 REINFORCEMENT LEARNING
- 2 RESIDUAL-GRADIENT ALGORITHMS
- 3 THE Q-LEARNING ALGORITHM
- 3.1 Q-LEARNING (Non-residual-gradient form)
- 3.2 RESIDUAL-GRADIENT Q-LEARNING
- 3.3 Q-LEARNING IN CONTINUOUS TIME
- 4 THE ADVANTAGE UPDATING ALGORITHM
- 4.1 ADVANTAGE UPDATING (Non-residual-gradient form)
- 4.2 RESIDUAL-GRADIENT ADVANTAGE UPDATING
- 5 DIFFERENTIAL GAMES
- 6 SIMULATED MISSILE-AIRCRAFT DIFFERENTIAL GAME
- 6.1 GAME DEFINITION
- 6.2 THE BELLMAN RESIDUAL AND UPDATE EQUATIONS
- 7 RESULTS
- 7.1 INITIAL RESULTS FOR ADVANTAGE UPDATING
- 7.1.1 Experiment 1
- 7.2 COMPARATIVE ASSESSMENT
- 7.2.1 Experiment 2
- 7.2.2 Experiment 3
- 8 CONCLUSIONS
- 8.1 FUTURE RESEARCH
- Acknowledgment
- References
- Appendix A: Notation
- Appendix B: Difficulties in combining function-approximation systems with reinforcement learning algorithms