基于 Actor-Critic 算法实现机器人柔性负载系统的振动抑制
Alternative Title: Vibration Suppression of a Robot Flexible Load System Based on the Actor-Critic Algorithm
李璐君
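Department: Industrial Control Networks and Systems Laboratory (工业控制网络与系统研究室)
Thesis Advisor: 尚志军
Keyword: robot; Actor-Critic algorithm; flexible load; vibration suppression; reinforcement learning
Pages: 70
Degree Discipline: Control Engineering
Degree Name: Professional Master's Degree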
2020-05-26
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang
Abstract: As industrial production advances, robot assembly and handling tasks demand ever higher accuracy. However, the load carried by a robot is often flexible to a degree that cannot be ignored, so it vibrates during assembly and handling. This vibration greatly reduces positioning accuracy at the target position. Moreover, for reasons of safety and operating accuracy in industrial production, robot operation and subsequent production activities must wait until the vibration at the end of the flexible load stops or decays to within the range the operation allows. This seriously degrades the robot's operating efficiency and control accuracy, and it increases both the complexity of the robot system's structural design and the difficulty of implementing the control method. In fields with strict accuracy and safety requirements for robot assembly and handling, vibration at the end of the flexible load must be suppressed. Although the vibration of flexible loads has received growing attention and the range of suppression methods has expanded, the complexity and nonlinearity of the flexible load model and the robot model make controllers based on traditional vibration suppression methods complicated and difficult to design, and such methods rarely achieve fully satisfactory results.
The main goal of this thesis is to suppress the vibration of a flexible load and then to apply the suppression method to a robot flexible load system. The thesis focuses on the Actor-Critic (AC) method, a class of reinforcement learning methods that has been popular in recent years. The Critic evaluates how good the agent's action is; the Actor updates its weights according to the Critic network's evaluation and adjusts the action output probabilities, thereby learning the optimal policy. As long as the agent keeps outputting actions according to the current action probability distribution, receiving the corresponding feedback from the environment, and adjusting the distribution, it can eventually learn a good policy without having to deal with the complexity of flexible load modeling or the uncertainty of the environment. However, at each action selection the AC method randomly samples an action on one side of its estimated optimal action and learns from it. When only a single action is sampled, it may well lie in a direction away from the actual optimal policy, and the neural network parameters are then updated in that direction. Only after one or more further action selections does the update direction turn back toward the actual optimum. In this situation the algorithm repeatedly revises the network parameters and cannot converge quickly and efficiently. This thesis therefore proposes the Double Sampling Actor-Critic (DSAC) algorithm, which samples symmetrically on both sides of the estimated optimal action, outputs the action with the best evaluation, and updates the parameters on that basis. The thesis shows that, compared with the original AC method, DSAC converges faster and achieves better control performance after the same number of update iterations.
In summary, this thesis studies vibration suppression for flexible loads with nonlinear and complex structure. It introduces the AC algorithm into vibration suppression control, proposes the improved DSAC algorithm on that basis, and compares and analyzes it against traditional methods such as adaptive fuzzy PD control and input preshaping control. Simulations and experiments verify the effectiveness and feasibility of the AC algorithm and its improved variant. In addition, the AC algorithm is encapsulated in a robot motion function library and used in vibration suppression experiments on a robot flexible load system. The experiments give fairly good results, which provide ideas and a foundation for applying reinforcement learning to vibration suppression in robot flexible load systems.
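The symmetric sampling idea described in the abstract can be illustrated with a small toy sketch. This is not the thesis's implementation: the quadratic `critic_value` function is a hypothetical stand-in for a learned Critic network, the one-dimensional action, the target value of 2.0, the function names, and the simple "move the estimate toward the better sample" update are all assumptions made for illustration only.

```python
import random


def critic_value(action, target=2.0):
    """Hypothetical stand-in for the Critic: scores an action, higher is better."""
    return -(action - target) ** 2


def select_action(mean, offset, evaluate):
    """Evaluate the two actions placed symmetrically around `mean`; return the better one."""
    a_plus, a_minus = mean + offset, mean - offset
    return a_plus if evaluate(a_plus) >= evaluate(a_minus) else a_minus


def run_dsac_toy(steps=200, lr=0.1, sigma=0.5, seed=0):
    """Move the estimated optimal action toward the better symmetric sample each step."""
    rng = random.Random(seed)
    mean = 0.0  # initial estimate of the optimal action
    for _ in range(steps):
        offset = rng.gauss(0.0, sigma)  # exploration offset, mirrored to both sides
        action = select_action(mean, offset, critic_value)
        mean += lr * (action - mean)  # simplified update, an assumption of this sketch
    return mean


if __name__ == "__main__":
    print(run_dsac_toy())
```

Because both mirrored candidates are scored before one is chosen, the estimate in this toy never steps away from the better side, which is the intuition the abstract gives for faster convergence. A real Actor-Critic setup would replace `critic_value` with a learned value estimate and the last line of the loop with a gradient update of the Actor's parameters.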
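Language: Chinese
Contribution Rank: 1
Document Type: Thesis
Identifier: http://ir.sia.cn/handle/173321/27146
Collection: Industrial Control Networks and Systems Laboratory (工业控制网络与系统研究室)
Affiliation: Shenyang Institute of Automation, Chinese Academy of Sciences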
Recommended Citation
GB/T 7714
李璐君. 基于 Actor-Critic 算法实现机器人柔性负载系统的振动抑制[D]. 沈阳: 中国科学院沈阳自动化研究所, 2020.
Files in This Item:
File Name/Size: 基于Actor-Critic 算法实现机 (2902KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.