File name: WindyGridWorldQLearning
Description (download content comes from the internet; for usage questions, please search online):
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian
domains. It amounts to an incremental method for dynamic programming which imposes limited computational
demands. It works by successively improving its evaluations of the quality of particular actions at particular states.
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins
(1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions
are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions
to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed
each iteration, rather than just one.
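The incremental update described above can be sketched in Python as follows. This is only an illustration, not the contents of WindyGridWorldQLearning.m, which are not shown on this page. It assumes the commonly used windy gridworld layout (7 rows, 10 columns, per-column upward wind, reward of -1 per step), and the names `step`, `q_learning`, `greedy_path_length`, and all parameter values are hypothetical choices made for this sketch.

```python
import random

# Assumed layout: the commonly used 7x10 windy gridworld (not taken from the .m file).
ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]          # upward push per column
START, GOAL = (3, 0), (3, 7)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action):
    """Apply the move plus the wind of the current column, clipped to the grid."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row - WIND[col], 0), ROWS - 1)
    new_col = min(max(col + d_col, 0), COLS - 1)
    next_state = (new_row, new_col)
    return next_state, -1.0, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    # Discrete table of action-values, one entry per (state, action) pair.
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # Exploration keeps every action sampled in the states that are visited.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_path_length(q, max_steps=1000):
    """Follow the greedy policy from START; stop at GOAL or after max_steps."""
    state, steps = START, 0
    while state != GOAL and steps < max_steps:
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _, _ = step(state, action)
        steps += 1
    return steps
```

Calling `q = q_learning()` followed by `greedy_path_length(q)` reports how many steps the learned greedy policy needs to reach the goal. Only one action-value is changed per iteration here, which matches the basic setting of the abstract rather than its multi-update extension.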
Download file list
WindyGridWorldQLearning.m