Search results
srcV0624
- A policy iteration algorithm for reinforcement learning. Extract the archive with WinZip.
MDPtoolbox
- The MDP toolbox provides functions for solving discrete-time Markov decision processes: finite horizon, value iteration, policy iteration, and linear programming algorithms, with some variants. The functions (m-functions) were developed ...
MDPtoolbox
- Function code for Markov decision process algorithms, including value iteration and policy iteration. Downloaded from an overseas website; very detailed and useful.
PolicyItr
- A policy iteration learning algorithm.
policyi
- Howard's policy iteration algorithm applied to the linear regulator; also known as Newton's method.
MDP-model-of-MPNP
- A MATLAB example that solves an MDP model of the multi-period newsboy problem using value iteration, policy iteration, and reinforcement learning algorithms.
mdpPI
- Contains an excellent and exact implementation of Markov decision processes using policy iteration and value iteration, following Peter Norvig's AI book.
MachineLearningMazePolicyEvaluation
- Machine learning code for a maze: policy iteration and value iteration.
CleanRobot
- Cleaning robot: deterministic case, stochastic case, policy iteration, and Q-value computation. An artificial intelligence lab experiment.
Policy-iteration
- This code simulates policy improvement iteration in a 3x3 grid game.
MDPgridworldExample
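- The world consists of free spaces (0) and obstacles (1). Each turn the robot can move in 8 directions or stay in place. A reward function gives one free space, the goal location, a high reward. All other free spaces carry a small penalty, and obstacles carry a large negative reward. Value iteration is used to learn an optimal "policy", that is, a function that assigns a control input to every possible location.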
inverted-pendulum-control
- Stable control of a three-dimensional inverted pendulum mounted on an aircraft, implemented with value iteration and policy iteration from adaptive dynamic programming (reinforcement learning), a neural network control method, and LQR state regulator optimal control. Highly robust; disturbance experiments with Gaussian white noise were performed.
MDP_pi.py
- Reinforcement learning: policy iteration algorithm. Original code.
pi.py
- Reinforcement Learning policy iteration algorithm
Dynamic-Programming-master
- Classic MATLAB dynamic programming code based on policy iteration and value iteration, implementing optimal transportation for a robot.
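
The gridworld setup described under MDPgridworldExample above can be sketched in a few lines of Python. This is only an illustrative sketch, not the listed MATLAB code: the reward values, the discount factor, the deterministic moves, the rule that off-grid moves leave the robot in place, and the choice to treat the goal as a terminal state are all assumptions, and the name value_iteration is hypothetical.

```python
# Assumed numbers; the listing gives no concrete rewards or discount factor.
GOAL_REWARD, STEP_COST, OBSTACLE_PENALTY, GAMMA = 10.0, -0.1, -10.0, 0.9

# The 8 compass moves plus (0, 0), which means "stay in place".
ACTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def value_iteration(world, goal, tol=1e-6):
    """Return (values, policy) for a grid of 0 (free) and 1 (obstacle) cells."""
    rows, cols = len(world), len(world[0])

    def step(cell, action):
        # Assumption: moves are deterministic; off-grid moves leave the robot in place.
        r, c = cell[0] + action[0], cell[1] + action[1]
        return (r, c) if 0 <= r < rows and 0 <= c < cols else cell

    def reward(cell):
        # Reward depends on the cell the robot lands in.
        if cell == goal:
            return GOAL_REWARD
        return OBSTACLE_PENALTY if world[cell[0]][cell[1]] == 1 else STEP_COST

    values = {(r, c): 0.0 for r in range(rows) for c in range(cols)}

    def q(cell, action):
        nxt = step(cell, action)
        return reward(nxt) + GAMMA * values[nxt]

    while True:
        delta = 0.0
        for cell in values:
            if cell == goal:  # assumption: the goal is terminal, its value stays 0
                continue
            best = max(q(cell, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[cell]))
            values[cell] = best
        if delta < tol:
            break

    # The greedy policy assigns one control input (a move) to every non-goal cell.
    policy = {cell: max(ACTIONS, key=lambda a: q(cell, a))
              for cell in values if cell != goal}
    return values, policy


world = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
values, policy = value_iteration(world, goal=(2, 2))
print(policy[(0, 0)], round(values[(0, 0)], 2))
```

With these assumed rewards the robot in the top-left corner goes around the central obstacle rather than through it, because the large obstacle penalty outweighs the one extra step.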