Learning to act using real-time dynamic programming | Andrew G. Barto, Satinder Singh, Steven J. Bradtke | Artificial Intelligence | 1995 | Cited by 1.1k
Linear least-squares algorithms for temporal difference learning | Steven J. Bradtke, Andrew G. Barto | Machine Learning | 1996 | Cited by 639
Adaptive linear quadratic control using policy iteration | Steven J. Bradtke, Andrew G. Barto, B. Erik Ydstie | Unknown | 2005 | Cited by 417