移动机器人自主增量式导航行为学习
Alternative Title: Autonomous Incremental Learning for Mobile Robot Navigation
胡艳明
Department: 机器人学研究室 (Robotics Laboratory)
Thesis Advisor: 何玉庆
Keywords: mobile robot navigation; incremental learning; robot learning; active learning; subconscious learning
Pages: 122
Degree Discipline: Pattern Recognition and Intelligent Systems
Degree Name: Doctor (Ph.D.)
2020-05-28
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang
Abstract: Mobile robots play an important role in the development of modern society and the economy: they significantly improve human work efficiency and safety and accelerate scientific progress. At present, mature mobile robot applications mainly target tasks in specific environments. With the rapid development of robotics and artificial intelligence and the growing demand for intelligent systems, there is an increasing need for highly adaptive robots that can work autonomously in unknown, open environments. Taking the mobile robot navigation task as the research object, and aiming to give robots the ability to continually adapt to new environments, this dissertation studies the generation, optimization, and learning of navigation behaviour. The main research contents are as follows:

To enable a mobile robot to travel safely and smoothly in unknown, open environments, the dissertation first studies optimization-based planning and proposes a Bézier-curve-based path planning method consisting of two parts: trajectory planning and speed planning. Trajectory planning incorporates safety, smoothness, and consistency metrics into the optimization objective of the navigation behaviour, subject to the robot's dynamic and kinematic constraints, to obtain safe and smooth trajectories. Speed planning quantifies the danger of the environment from information produced during trajectory planning and adds it to the speed as an attenuation factor, so that the robot can adapt its driving speed to the danger of the current scene.

Traditional planning methods usually require time-consuming manual parameter tuning to obtain appropriate navigation behaviour in a specific environment, and even then the robot cannot adapt quickly to new environments. The remarkable adaptability that humans show when handling everyday tasks has long been a desired capability for robots. This adaptability does not exist at birth; it is the result of individual learning. To enable robots to adapt continually to new environments, this dissertation draws on the characteristics and modes of human learning and on the biological mechanisms of strangeness and curiosity to give robots the ability of autonomous incremental learning. First, a radial basis function (RBF) network is used as the incremental learning model, and an approximate-linear-dependence-based kernel recursive least squares algorithm with an L2 regularization constraint (ALD-L2KRLS) is proposed as the incremental learning algorithm. ALD-L2KRLS learns the structure and parameters of the network model online from newly arriving data. This incremental learning method lets the robot keep learning new behaviours using only the data generated at each moment and its historical knowledge, i.e., it gives the robot a human-like ability to accumulate, integrate, and update knowledge.

Then, by combining incremental learning with reinforcement learning, an active-exploration-based incremental learning method is proposed that enables the robot to learn navigation behaviour through autonomous interaction with the environment. To improve learning efficiency and reduce interaction cost, a local $\epsilon$-greedy policy is proposed: it maintains a variable for each network node, uses these variables to compute the exploration probability of a given state, and thereby evaluates the importance of samples, so that the robot can actively choose its interaction policy and data.

Considering that learning efficiency and performance cannot be guaranteed when a robot relies on reinforcement learning alone, the final part of the dissertation combines incremental learning with learning from demonstration and proposes a human-in-the-loop incremental learning method that enables the robot to learn navigation behaviour through autonomous interaction with humans. To avoid re-learning existing behaviours and to reduce the cost of human demonstration, the approximate linear dependence (ALD) condition is used to let the robot actively request demonstrations from a human. To alleviate the uncertainty and sub-optimality of human demonstrations, the optimization-based behaviour generation method is integrated into the demonstration-sample generation module, so that the human only needs to provide sparse guidance points to the robot. Drawing on the function of the marginal division of the striatum (striated border area) in the basal ganglia of the human brain, an internal high-level circuit is constructed for the robot so that it can perform subconscious learning like the human brain, further improving its learning efficiency.

The research covers both traditional planning-based generation and optimization of robot behaviour and autonomous incremental learning of robot behaviour. To verify the effectiveness of the proposed methods, real outdoor and indoor mobile robot platforms and simulation environments were built, and extensive experiments validated the methods. The goal of this dissertation is to let robots continually adapt to new environments as humans do, and it addresses, to some extent, the limitations of traditional robot behaviour generation and learning methods; however, many problems remain to be solved before robots reach human-level behavioural capability and intelligence.
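As a rough illustration of the Bézier-curve-based trajectory and speed planning described above, the following Python sketch evaluates a Bézier curve from a set of control points and attenuates a nominal speed with a quantified danger value. The exponential attenuation, the clearance-based danger measure, and all parameter values (e.g. `k_danger`) are illustrative assumptions, not the exact formulation used in the thesis.

```python
from math import comb
import numpy as np

def bezier_point(control_points, t):
    """Evaluate an n-th order Bezier curve at parameter t in [0, 1]
    using the Bernstein basis."""
    pts = np.asarray(control_points, dtype=float)
    n = len(pts) - 1
    basis = np.array([comb(n, i) * (1 - t) ** (n - i) * t ** i
                      for i in range(n + 1)])
    return basis @ pts

def adaptive_speed(v_nominal, danger, k_danger=1.5):
    """Attenuate the nominal speed according to the quantified danger of the
    current scene; here danger is assumed to lie in [0, 1]."""
    return v_nominal * np.exp(-k_danger * danger)

if __name__ == "__main__":
    ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]    # hypothetical control points
    path = [bezier_point(ctrl, t) for t in np.linspace(0.0, 1.0, 21)]
    clearance = 0.4                                            # metres to nearest obstacle (made up)
    danger = float(np.clip(1.0 - clearance, 0.0, 1.0))         # crude danger measure
    print(path[10], adaptive_speed(1.0, danger))
```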
Other Abstract: Mobile robots play an important role in the development of modern society and the economy: they significantly improve human work efficiency and safety and speed up scientific progress. To date, mature mobile robot applications have mainly addressed tasks in specific environments. With the rapid development of robotics and artificial intelligence and the growing demand for intelligence, robots that are highly adaptable and able to work autonomously in unknown, open environments are increasingly needed. This dissertation takes the mobile robot navigation task as its research object; its scientific questions concern the generation, optimization, and learning of navigation behaviour, and its aim is to give robots the ability to adapt to new environments. The main research contents are as follows:

To achieve safe and smooth driving behaviour for a mobile robot in unknown, open environments, we first study optimization-based path planning and propose a Bézier-curve-based path planning method consisting of two parts: trajectory planning and speed planning. Trajectory planning introduces the safety, smoothness, and consistency of the trajectory into an optimization objective subject to the robot's dynamic and kinematic constraints, to obtain a safe and smooth trajectory. Speed planning quantifies the danger of the environment from information generated during trajectory planning and adds it to the speed formula as an attenuation factor, so that the robot adapts its driving speed to the danger of the current scene.

Traditional planning methods usually require time-consuming manual parameter adjustment to obtain appropriate navigation behaviour in a specific environment, and a robot based on such methods still cannot adapt quickly to new environments. The remarkable adaptability that humans show when handling everyday tasks has long been a desired capability for robots. This adaptability does not exist at birth; it is the result of individual learning. To give the robot the ability to adapt to new environments autonomously, we confer autonomous incremental learning on the robot, drawing on the characteristics and modes of human learning and on the biological mechanisms of strangeness and curiosity. First, a radial basis function (RBF) network is used as the incremental learning model, and an approximate-linear-dependence-based kernel recursive least squares algorithm with an L2 regularization constraint (ALD-L2KRLS) is proposed as the incremental learning algorithm. ALD-L2KRLS learns the structure and parameters of the network model online from newly received data. This incremental learning method enables the robot to learn new behaviours using only the data generated at each moment and its historical knowledge, giving it a human-like ability to accumulate, integrate, and update knowledge.

Then, an active-exploration-based incremental learning framework is proposed by integrating the incremental learning method with reinforcement learning; it enables the robot to learn navigation behaviour through autonomous interaction with the environment. To improve learning efficiency and reduce interaction cost, a local $\epsilon$-greedy policy is proposed within this framework.
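The incremental learning scheme above can be pictured with a small sketch: a Gaussian-kernel dictionary grows only when a new sample fails an approximate linear dependence (ALD) test, and the output weights are then re-solved with an L2 (ridge) penalty. This is a minimal reconstruction for illustration only; the actual ALD-L2KRLS algorithm uses fully recursive updates and specific thresholds not given in the abstract, and names such as `IncrementalRBF` and all parameter values are hypothetical.

```python
import numpy as np

class IncrementalRBF:
    """Sketch of ALD-gated incremental learning with a Gaussian (RBF) kernel."""

    def __init__(self, gamma=1.0, ald_threshold=0.1, l2=1e-3):
        self.gamma = gamma
        self.nu = ald_threshold
        self.l2 = l2
        self.centers = []          # dictionary of RBF centres
        self.X, self.y = [], []    # samples seen so far (kept only for this sketch)
        self.weights = None

    def _kernel(self, A, B):
        A, B = np.atleast_2d(A), np.atleast_2d(B)
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-self.gamma * d2)

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        if not self.centers:
            self.centers.append(x)
        else:
            C = np.array(self.centers)
            K = self._kernel(C, C)
            k_x = self._kernel(C, x).ravel()
            # ALD test: residual of projecting phi(x) onto the span of the dictionary
            a = np.linalg.solve(K + 1e-9 * np.eye(len(C)), k_x)
            delta = self._kernel(x, x).item() - k_x @ a
            if delta > self.nu:
                self.centers.append(x)     # novel enough -> add a new RBF node
        self.X.append(x)
        self.y.append(float(y))
        # L2-regularised least squares over the data seen so far,
        # with features given by kernels to the dictionary centres
        Phi = self._kernel(np.array(self.X), np.array(self.centers))
        A = Phi.T @ Phi + self.l2 * np.eye(Phi.shape[1])
        self.weights = np.linalg.solve(A, Phi.T @ np.array(self.y))

    def predict(self, x):
        phi = self._kernel(np.asarray(x, dtype=float), np.array(self.centers))
        return (phi @ self.weights).item()

if __name__ == "__main__":
    model = IncrementalRBF(gamma=2.0, ald_threshold=0.05, l2=1e-2)
    for x in np.linspace(0.0, 2.0 * np.pi, 40):
        model.update([x], np.sin(x))           # stream samples one by one
    print(len(model.centers), model.predict([1.0]))
```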
The policy maintains a variable for each node of the network and uses it to compute an exploration probability for each received state; a robot using the local $\epsilon$-greedy policy can thus actively select its interaction strategies and data. Because the learning efficiency and performance of a robot that relies on reinforcement learning alone cannot be guaranteed, the last part of this dissertation proposes a human-in-the-loop incremental learning framework that integrates the incremental learning method with learning from demonstration (LfD); it enables the robot to learn navigation behaviour through autonomous interaction with humans. To avoid repeatedly learning existing behaviours and to reduce the cost of human teaching, the approximate linear dependence (ALD) condition is used to let the robot actively request demonstrations from the human. To alleviate the uncertainty and sub-optimality of human demonstrations, the optimization-based behaviour generation method is integrated into the demonstration-sample generation module, so that the human only needs to provide sparse guidance points to the robot. Inspired by the function of the marginal division of the striatum (striated border area) in the basal ganglia of the human brain, an internal high-level circuit is constructed for the robot so that it can carry out subconscious learning like the human brain, further improving its learning efficiency.

The dissertation thus covers both traditional-planning-based generation of robot behaviour and autonomous incremental learning of robot behaviour. Real outdoor and indoor mobile robot platforms and several simulation environments were built to verify the performance of the proposed methods. The purpose of this dissertation is to give robots a human-like ability to adapt to new environments, which to some extent overcomes the limitations of traditional robot behaviour generation and learning methods; however, many problems remain to be solved before robots reach human levels of behavioural capability and intelligence.
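To make the local $\epsilon$-greedy idea concrete, the sketch below assumes the per-node variable is a visit count and computes a state's exploration probability as the activation-weighted average of per-node epsilons that shrink as those nodes are visited more. The abstract only states that a variable is maintained for each network node, so this particular variable, the decay rule, and all function names and parameters are assumptions for illustration.

```python
import numpy as np

def local_epsilon(state, centers, visit_counts, gamma=1.0, eps0=1.0, decay=0.1):
    """Locally adaptive epsilon: activation-weighted average of per-node
    epsilons, where each node's epsilon decays with its visit count."""
    act = np.exp(-gamma * ((centers - state) ** 2).sum(axis=1))   # RBF activations
    act = act / (act.sum() + 1e-12)
    node_eps = eps0 / (1.0 + decay * visit_counts)                # well-visited nodes explore less
    return float(act @ node_eps)

def select_action(q_values, epsilon, rng):
    """Standard epsilon-greedy draw using the state-local epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))    # explore
    return int(np.argmax(q_values))                # exploit

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])   # hypothetical RBF centres
    visits = np.array([50.0, 2.0, 0.0])                        # per-node visit counts
    eps = local_epsilon(np.array([0.9, 1.1]), centers, visits)
    print(eps, select_action(np.array([0.1, 0.4, 0.2]), eps, rng))
```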
Language: Chinese (中文)
Contribution Rank: 1
Document Type: Thesis (学位论文)
Identifier: http://ir.sia.cn/handle/173321/27157
Collection: 机器人学研究室 (Robotics Laboratory)
Affiliation: Shenyang Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
胡艳明. 移动机器人自主增量式导航行为学习[D]. 沈阳: 中国科学院沈阳自动化研究所, 2020.
Files in This Item:
File Name/Size: 移动机器人自主增量式导航行为学习.pdf (8482 KB)
DocType: 学位论文 (Thesis)
Access: 开放获取 (Open Access)
License: CC BY-NC-SA
Full Text: Application Full Text
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.