This was Dr. Tanaka’s Ph.D. work. Its goal was to extend the basic framework of reinforcement learning to a continuous, lifelong learning scenario in which the agent (a robot) is given a potentially infinite sequence of learning tasks, one after another, and is expected to exploit its past learning experience to solve the current task efficiently. The problem was formulated by introducing a distribution over MDPs (Markov Decision Processes), and a reinforcement learning algorithm was derived for this setting.
The original study was conducted about 20 years ago; however, the ideas of knowledge transfer and developmental learning remain important, and are indeed required elements for today’s robots.
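To make the setting concrete, here is a minimal sketch of lifelong learning over a distribution of MDPs. It is not the published algorithm: the toy task distribution (random transitions and rewards) and the transfer rule (initializing each new task from the mean of previously learned Q-tables) are illustrative assumptions, loosely echoing the value-statistics idea in the papers below.

```python
import numpy as np

# Illustrative sketch only, not Tanaka & Yamamura's exact algorithm.
# Tasks are tabular MDPs drawn from a (hypothetical) common distribution;
# the agent transfers the mean of past Q-tables into each new task.

N_STATES, N_ACTIONS = 5, 2
rng = np.random.default_rng(0)

def sample_mdp():
    """Draw one task: random transition probabilities and rewards."""
    P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
    R = rng.normal(size=(N_STATES, N_ACTIONS))
    return P, R

def q_learning(P, R, q_init, episodes=200, steps=50,
               alpha=0.1, gamma=0.9, eps=0.1):
    """Standard epsilon-greedy Q-learning, started from q_init."""
    Q = q_init.copy()
    for _ in range(episodes):
        s = rng.integers(N_STATES)
        for _ in range(steps):
            a = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[s].argmax())
            s2 = rng.choice(N_STATES, p=P[s, a])
            Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q

past_Q = []  # value statistics accumulated over the task sequence
for task in range(10):
    # Initialize from the mean of past value functions (zeros for the first task).
    q_init = np.mean(past_Q, axis=0) if past_Q else np.zeros((N_STATES, N_ACTIONS))
    P, R = sample_mdp()
    past_Q.append(q_learning(P, R, q_init))
```

The design point is simply that when tasks are drawn from one distribution, statistics gathered over past value functions give a better starting point than learning each new task from scratch; the papers below develop this idea more carefully.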
- Fumihide Tanaka, Masayuki Yamamura: An Approach to Lifelong Reinforcement Learning through Multiple Environments, Proceedings of the 6th European Workshop on Learning Robots (EWLR-6), pp.93-99, Brighton, UK, August 1997 [pdf, 171KB]
- Fumihide Tanaka, Masayuki Yamamura: Multitask Reinforcement Learning on the Distribution of MDPs, Proceedings of the 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA 2003), pp.1108-1113, Kobe, Japan, July 2003 [pdf, 341KB, IEEE Xplore]
- Fumihide Tanaka, Masayuki Yamamura: Exploiting Value Statistics for Similar Continuing Tasks, Proceedings of the 2003 IEEE International Workshop on Robot and Human Interactive Communication (RO-MAN 2003), pp.271-276, Millbrae, USA, October 2003 [pdf, 309KB, IEEE Xplore]