SIA OpenIR > Robotics Laboratory (机器人学研究室)
Title: 面向机械臂控制的强化学习方法研究
Alternative Title: Reinforcement Learning Algorithms for Robotic Manipulator Control
Author: 胡亚洲 1,2
Department: Robotics Laboratory (机器人学研究室)
Thesis Advisor: 王文学
Keywords: Robot Control; Reinforcement Learning; Value Function Approximation; Advantage Function; Free Energy Principle
Pages: 120
Degree Discipline: Mechatronic Engineering
Degree Name: Doctorate (博士)
Date: 2020-12-01
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences (中国科学院沈阳自动化研究所)
Place of Conferral: Shenyang
Abstract: Reinforcement learning (RL) is a form of experience-driven learning in which an agent continually explores and exploits its environment and learns from reward signals. Because it learns autonomously and can model the learning processes of humans and animals, RL is regarded as an effective machine learning approach to control and decision-making problems. RL has been widely applied in industrial manufacturing, games, biomedicine, and economic decision-making, and has achieved notable success in robot control, e.g., the ANYmal robot and the Dynamixel Claw robotic hand. However, applying RL to manipulator control still faces open problems such as the curse of dimensionality, inaccurate transition (dynamics) models, ill-specified learning objectives, and the exploration-exploitation trade-off. Research on RL techniques for manipulator control therefore has both theoretical significance and practical engineering value. Building on a reading, analysis, and synthesis of the domestic and international technical reports and literature on RL and robot control, this dissertation studies RL methods for manipulator control. The main contributions are as follows. (1) A kernel-based RL method is proposed. Model-based RL (MBRL) is sample-efficient and fast to train but depends on a transition model, and its learning performance is easily degraded by model inaccuracy; model-free RL (MFRL) needs no prior knowledge or transition model during optimization but converges slowly and requires many samples. The proposed method builds a kernel-function model that captures the internal regularities of the training samples, redefines the RL state, and constructs a transition model over robot states and actions, combining the advantages of MBRL and MFRL. Experiments verify that in manipulator tracking-control tasks the method achieves shorter training time, smaller tracking error, and lower energy consumption. (2) A neural-network-based RL method is proposed. Using neural-network function approximation, an RL neural-network framework is constructed for online estimation of the state, the control policy, and the performance evaluation; value-function optimization is used to train the state, action, and critic networks, reducing the impact on learning performance of model accuracy and of the curse of dimensionality caused by discretizing continuous information. Experiments confirm a degree of robustness under nonlinear conditions. (3) A data-driven dual-advantage model-predictive RL method is proposed. Exploiting the ability of model predictive control (MPC) to learn a control policy by evaluating future information, a dual-advantage model-predictive RL method is formed: a dual advantage-oriented network accelerates learning, and an adversarial learning mechanism between the advantage-oriented networks breaks correlations in the training data, avoiding local optima and non-convergence. Experiments show good tracking and motion control under safety constraints. (4) A free-energy-based RL method is proposed. Drawing on the working mechanism of the human brain and on the free-energy principle, which can effectively balance exploration and exploitation, a Bayesian probabilistic model of the control policy and a cross-entropy-based free-energy objective function are constructed, achieving a degree of balance between exploration and exploitation; experiments under uncertain system parameters and external disturbances verify the method's effectiveness.
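Contribution (1) above builds a kernel-function model of the transition dynamics from training samples. As an illustrative sketch only, and not the dissertation's actual algorithm, a Gaussian-kernel ridge regression can learn s' = f(s, a) from sampled transitions and then stand in as the learned model of a model-based RL loop; the class name, toy 1-D dynamics, and all parameter values below are hypothetical:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

class KernelTransitionModel:
    """Kernel ridge regression model of the dynamics s' ~ f(s, a)."""

    def __init__(self, sigma=1.0, reg=1e-3):
        self.sigma, self.reg = sigma, reg

    def fit(self, states, actions, next_states):
        self.X = np.hstack([states, actions])  # inputs are (s, a) pairs
        K = gaussian_kernel(self.X, self.X, self.sigma)
        # Solve (K + reg*I) alpha = S' for the dual weights.
        self.alpha = np.linalg.solve(K + self.reg * np.eye(len(K)), next_states)
        return self

    def predict(self, states, actions):
        Xq = np.hstack([states, actions])
        return gaussian_kernel(Xq, self.X, self.sigma) @ self.alpha

# Toy 1-D dynamics s' = s + 0.1*a, learned from random transitions.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, (200, 1))
A = rng.uniform(-1, 1, (200, 1))
S_next = S + 0.1 * A
model = KernelTransitionModel(sigma=0.5).fit(S, A, S_next)
pred = model.predict(np.array([[0.5]]), np.array([[1.0]]))  # true next state is 0.6
```

Once fitted, such a model can generate imagined rollouts and rewards for planning, which is the MBRL-side advantage the abstract refers to.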
Other Abstract: RL is a reward-based form of experiential learning in which an agent continually explores and exploits its environment. Because of its capacity for autonomous learning and for simulating the learning processes of humans and animals, it is considered a machine learning method that can effectively solve control and decision-making problems. RL has been widely employed in industrial manufacturing, video games, biomedicine, and economic decision-making; in the field of robotic control in particular it has achieved great success, e.g., the ANYmal robot and the Dynamixel Claw robotic hand. However, some problems still urgently need to be solved when applying reinforcement learning to robotic control, such as the curse of dimensionality, inaccurate dynamic models, ill-specified goals, and balancing the trade-off between exploration and exploitation. Research on RL techniques is therefore important and can guide the engineering application of RL in robotic control. After reading, analysing, and synthesizing the RL-related technical reports and literature, both domestic and foreign, this dissertation carries out research on RL algorithms for robotic control. The main contributions are as follows:
(1) A kernel-based reinforcement learning algorithm is proposed. MBRL offers high sample efficiency and fast learning, but it depends on a dynamic model, and its learning performance is easily affected by the model's accuracy. MFRL requires no prior knowledge or dynamic model during optimization, but it converges slowly and needs a large number of samples to find the optimal solution. In the proposed method, a Gaussian kernel model is established to describe the internal regularities of the training samples; the RL state is redefined, and the dynamic model and corresponding reward are generated from this model, combining the advantages of MBRL and MFRL. Experiments verify that the method yields less training time, smaller tracking error, and lower energy consumption in tracking-control tasks of a robotic manipulator.
(2) A neural-network-based RL method is proposed. Based on neural-network function approximation, an RL neural-network framework is constructed to estimate the state, the control policy, and the performance evaluation online. The state, action, and critic networks are all optimized via value-function optimization, which reduces the impact of model accuracy on learning performance and relieves the curse of dimensionality caused by discretizing continuous information. The robustness of the proposed method is demonstrated on a nonlinear simulation platform.
(3) A data-driven dual-advantage model-predictive RL algorithm is proposed. MPC can learn a control policy by evaluating future information. Combining this characteristic of MPC with RL, a dual-advantage model-predictive RL algorithm is formed: an advantage-oriented network is established to speed up learning, while a dual advantage-oriented adversarial learning mechanism breaks the correlation of the training data and avoids local optima and non-convergence. Simulation experiments show that the proposed method achieves good tracking and motion control under safety constraints.
(4) A free-energy-based reinforcement learning method is proposed. Drawing on the working mechanism of the human brain and on the free-energy principle, which can effectively balance the trade-off between exploration and exploitation, a Bayes-based probability model of the control policy and an entropy-based free-energy objective function over the perceptual state are built, fulfilling the balance between exploration and exploitation. Finally, the effectiveness of the method is illustrated in a simulation environment with uncertain system parameters and external disturbances.
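The free-energy objective in contribution (4) trades expected return against the entropy of the policy. As a hedged, minimal illustration, and not the dissertation's actual construction, the minimizer of a free-energy-style objective F(pi) = E_pi[-Q] - T*H(pi) over action distributions is the Boltzmann (softmax) policy, whose temperature T sets the exploration-exploitation balance; the function name and the toy Q-values below are illustrative:

```python
import numpy as np

def softmax_policy(q_values, temperature=1.0):
    """Boltzmann policy: the minimizer of F(pi) = E_pi[-Q] - T*H(pi).

    High temperature flattens the distribution (exploration); low
    temperature concentrates it on the best action (exploitation).
    """
    z = q_values / temperature
    z = z - z.max()            # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

q = np.array([1.0, 2.0, 0.5])
greedy = softmax_policy(q, temperature=0.1)   # nearly deterministic
explore = softmax_policy(q, temperature=5.0)  # nearly uniform
```

Tuning (or annealing) the temperature is one concrete way such an objective balances exploration against exploitation during learning.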
Language: Chinese
Contribution Rank: 1
Document Type: Doctoral Dissertation (学位论文)
Identifier: http://ir.sia.cn/handle/173321/27978
Collection: Robotics Laboratory (机器人学研究室)
Affiliation: 1. Shenyang Institute of Automation, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences
Recommended Citation (GB/T 7714):
胡亚洲. 面向机械臂控制的强化学习方法研究[D]. 沈阳: 中国科学院沈阳自动化研究所, 2020.
Files in This Item:
面向机械臂控制的强化学习方法研究.pdf (4825 KB) · DocType: Dissertation (学位论文) · Access: Open Access · License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.