SIA OpenIR
面向动作捕捉和定位的视觉惯性融合技术研究
Alternative Title: Research on Visual-Inertial Based Motion Capturing and Localization
Author: 张吟龙 (Zhang Yinlong)
Department: 工业控制网络与系统研究室 (Research Laboratory of Industrial Control Networks and Systems)
Thesis Advisor: 谈金东 (Tan Jindong)
Keywords: monocular vision; inertial measurement unit (IMU); human motion capture; vehicle localization; camera-IMU calibration
Pages: 108
Degree Discipline: Control Theory and Control Engineering
Degree Name: Doctor (Ph.D.)
Date: 2018-11-27
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang
Abstract

In motion capture and localization systems, visual sensors and inertial sensors are two of the most important sensing units, and they have complementary strengths in estimation accuracy and response speed. The visual sensor estimates its pose by capturing image sequences of the surrounding environment and extracting and tracking corners, edges, and other features across frames; it offers rich information, high attitude-estimation accuracy, and low drift. The inertial sensor measures its own acceleration, angular velocity, and magnetic field strength to estimate its attitude; it offers a high sampling rate, good dynamic response, and observable scale. Moreover, as MEMS technology advances, visual and inertial modules keep getting smaller and cheaper, so their fusion holds great promise in motion capture and localization. However, visual-inertial fusion for motion capture and localization still faces the following problems: (1) in existing visual-inertial attitude-estimation methods, mismatched feature points between adjacent images are not removed thoroughly enough on the visual side, so attitude estimates are inaccurate; (2) existing visual-inertial human motion capture methods do not fully exploit the body's own inter-joint distance constraints and therefore cannot track motion accurately and reliably; (3) existing visual-inertial localization and scene-dynamics analysis methods cannot effectively separate static from dynamic regions in the image, and thus cannot deliver robust, accurate localization and dynamics analysis; (4) in existing calibration methods for the visual and inertial modules, timestamp alignment between images and inertial measurements is not precise enough, so the relative pose between the visual and inertial coordinate frames is estimated inaccurately, severely degrading downstream motion capture and localization.

Targeting two typical applications, indoor human motion capture and outdoor unmanned-vehicle localization, this thesis studies visual-inertial attitude estimation, visual-inertial human motion capture, and visual-inertial localization with scene-dynamics analysis. The proposed methods are verified experimentally on a purpose-built visual-inertial platform. The main contributions are as follows; short illustrative code sketches of the core computation in each contribution follow this abstract.

(1) Visual-inertial attitude estimation for human motion capture. To address the incomplete and time-consuming removal of visual mismatches, an inertial-guided sampling-consistency method is proposed on the basis of a maximum a posteriori (MAP) estimation model; it innovatively uses the inertial measurements as the body's initial attitude to help the visual pipeline reject mismatched points. To address long-term drift in inertial attitude estimation, a feedback-compensation mechanism is proposed in which the fused visual-inertial attitude estimate corrects the inertial measurements. The method is validated by comparing physical experiments against ground truth from a sub-millimeter-accuracy Vicon/OptiTrack motion capture system.

(2) Visual-inertial arm-motion tracking for human motion capture. To address error accumulation and the visual model's inaccurate characterization of motion in existing methods, a visual-inertial fusion method with nearest-neighbor feature constraints is proposed on the basis of a visual photometric-error model. The method determines whether the body's motion is general or degenerate: the epipolar-constraint model is used for general motion and the homography model for degenerate motion to obtain the initial pose, which is then refined iteratively. In addition, to address attitude-estimation drift between joints, a motion capture model with inter-joint distance constraints is proposed on the basis of the kinematic chain structure. Arm motion capture experiments compared against Vicon/OptiTrack ground truth validate the method.

(3) Visual-inertial fusion for vehicle localization and scene-dynamics analysis. In outdoor vehicle perception systems, the visual module cannot effectively separate dynamic from static regions in the field of view; on the basis of a non-holonomic constraint model, a visual-inertial relative-entropy measure is proposed to identify static regions in the image and extract the static feature points within them for estimating the vehicle's own motion. In addition, to handle self-occlusion and weak features in vehicle regions, a histogram-of-oriented-gradients description for front, side, and rear viewpoints is proposed on the basis of the deformable part model (DPM) to analyze the dynamics of the surrounding scene. Experimental results validate the accuracy and effectiveness of the scene-dynamics method.

(4) Calibration of the visual-inertial unit and the experimental platform. To address the time cost and poor data alignment of existing calibration methods, a quaternion-interpolation calibration method is proposed on the basis of a Lie-algebra model. On this foundation, a visual-inertial experimental platform was built, consisting of a monocular camera, an inertial measurement unit, a high-precision OptiTrack/Vicon motion capture system, and GPS. The platform verifies the motion capture and vehicle localization methods in terms of computation speed and pose-estimation accuracy, supporting the visual-inertial fusion research in this thesis.
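Sketch for contribution (1): the inertial-guided sampling-consistency idea is to use the IMU's relative-rotation prediction to screen putative feature matches before pose estimation. Below is a minimal sketch of that screening, assuming a near-pure-rotation frame pair so matches can be checked against the infinite homography H = K·R·K⁻¹; the function name, the pixel threshold, and this simplification are illustrative stand-ins, not the thesis's actual MAP sampling-consensus formulation.

```python
import numpy as np

def imu_guided_inliers(pts1, pts2, R_imu, K, thresh_px=2.0):
    """Screen putative matches with the IMU rotation prior.

    pts1, pts2 : (N, 2) pixel coordinates of putative correspondences
    R_imu      : (3, 3) frame-to-frame rotation predicted by the IMU
    K          : (3, 3) camera intrinsic matrix
    Returns a boolean inlier mask (True = consistent with the prior).
    """
    H = K @ R_imu @ np.linalg.inv(K)            # infinite homography
    p1h = np.hstack([pts1, np.ones((len(pts1), 1))])
    proj = (H @ p1h.T).T
    proj = proj[:, :2] / proj[:, 2:3]           # back to pixel coordinates
    resid = np.linalg.norm(proj - pts2, axis=1) # transfer residual
    return resid < thresh_px
```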
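Contribution (1) also feeds the fused visual-inertial attitude back to rein in inertial drift. A complementary-filter-style sketch follows, assuming unit quaternions in [w, x, y, z] order; the gain alpha and the normalized-lerp update are placeholders for the thesis's feedback-compensation mechanism, which corrects the inertial measurements themselves.

```python
import numpy as np

def feedback_correct(q_ins, q_fused, alpha=0.02):
    """Nudge the integrated inertial attitude toward the fused
    visual-inertial estimate to cancel slow gyro drift.

    q_ins, q_fused : unit quaternions [w, x, y, z]
    alpha          : small feedback gain (placeholder value)
    """
    q_ins = np.asarray(q_ins, dtype=float)
    q_fused = np.asarray(q_fused, dtype=float)
    if np.dot(q_ins, q_fused) < 0.0:              # keep the shorter arc
        q_fused = -q_fused
    q = (1.0 - alpha) * q_ins + alpha * q_fused   # normalized lerp
    return q / np.linalg.norm(q)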
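Sketch for contribution (2): choosing between the epipolar-constraint model (general motion) and the homography model (degenerate motion) can be approximated with OpenCV as below. The inlier-ratio heuristic and its 0.45 threshold follow a common model-selection recipe and stand in for the thesis's photometric-error-based decision; picking the first homography decomposition is likewise a simplification.

```python
import cv2
import numpy as np

def initial_pose(pts1, pts2, K):
    """Pick an epipolar or homography model, then recover an initial pose.

    pts1, pts2 : (N, 2) float32 matched points;  K : (3, 3) intrinsics
    """
    E, mask_e = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                     prob=0.999, threshold=1.0)
    H, mask_h = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)

    # inlier-ratio heuristic: prefer the homography when it explains
    # most inliers (degenerate motion: pure rotation or planar scene)
    r_h = mask_h.sum() / float(mask_e.sum() + mask_h.sum())
    if r_h > 0.45:
        _, Rs, ts, _ = cv2.decomposeHomographyMat(H, K)
        return Rs[0], ts[0]         # one of up to four candidates (sketch)

    # general motion: recover pose from the essential matrix
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask_e)
    return R, t
```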
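Contribution (2)'s inter-joint distance constraint can be written as extra residuals in a least-squares refinement. A minimal sketch, assuming per-joint position estimates from the fusion front end and known limb lengths; the three-joint arm model, the weight w, and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def arm_residuals(x, p_meas, limb_lengths, w=5.0):
    """Residuals for arm tracking with inter-joint distance constraints.

    x            : flattened (3, 3) joint positions: shoulder, elbow, wrist
    p_meas       : (3, 3) per-joint positions from the fusion front end
    limb_lengths : (2,) known upper-arm and forearm lengths
    The distance terms anchor the kinematic chain and suppress drift.
    """
    p = x.reshape(3, 3)
    res_meas = (p - p_meas).ravel()                      # stay near measurements
    res_len = w * np.array([
        np.linalg.norm(p[1] - p[0]) - limb_lengths[0],   # shoulder-elbow
        np.linalg.norm(p[2] - p[1]) - limb_lengths[1],   # elbow-wrist
    ])
    return np.concatenate([res_meas, res_len])

# usage (p0: initial guess, shape (3, 3)):
# sol = least_squares(arm_residuals, p0.ravel(), args=(p_meas, limb_lengths))
# p_refined = sol.x.reshape(3, 3)
```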
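Sketch for contribution (3): the relative-entropy test compares region-wise motion statistics against the inertial prediction to label static regions. The sketch below assumes each candidate region is summarized by an optical-flow direction histogram and that the IMU supplies a predicted histogram for pure ego-motion; the threshold tau and the histogram construction are illustrative.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """Relative entropy D(p || q) between two histograms."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def static_region_mask(region_hists, imu_hist, tau=0.5):
    """Label regions whose flow-direction histogram agrees with the
    IMU-predicted ego-motion histogram (low relative entropy) as static;
    the remaining regions are treated as dynamic scene."""
    return np.array([kl_divergence(h, imu_hist) < tau for h in region_hists])
```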

Other Abstract

In motion capture and localization systems, the visual sensor and the inertial sensor are two of the most important sensing units, with complementary strengths in estimation accuracy and response speed. The visual sensor captures image sequences and tracks features (e.g., corners and edges) across frames to estimate pose; it offers rich information, high accuracy, and low drift. By comparison, the inertial sensor measures body acceleration, angular velocity, and magnetic field strength to recover attitude; it offers a high sampling rate, quick response, and observable scale. As MEMS technology matures, cameras and inertial units are becoming smaller and cheaper, opening enormous prospects for visual-inertial fusion in motion capture and localization. However, these two applications still pose several challenges to visual-inertial fusion techniques:

(1) In state-of-the-art visual-inertial attitude estimation, outliers among the putative correspondences between successive camera frames are not always completely removed, which degrades pose estimation in the long term.

(2) In state-of-the-art indoor motion capture with visual-inertial measurements, data processing is time-consuming and errors accumulate over time, so real-time, effective indoor motion tracking remains out of reach.

(3) For visual-inertial outdoor localization and analysis of surrounding-scene dynamics, this thesis proposes a method that separates static scenes from dynamic scenes, with potential application to autonomous-driving perception. Even when moving objects occupy most of the captured image, as occasionally happens at rush hour in downtown areas, the method can still robustly analyze the vehicular scene by distinguishing static features (on stationary objects) from dynamic features (on moving objects). The approach has three steps. First, the vehicle's non-holonomic constraint is applied to pairwise feature matching, and the histogram bins consistent with the inertial quaternions are selected as static inliers (correct matches on the static scene). Second, a part-based vehicle-detection model segments the vehicular regions, which are matched across the image pair by majority voting; within these regions, the dynamic inliers (correct matches on moving vehicles) are extracted. Finally, the ego-vehicle dynamics and the vehicular-scene dynamics within the field of view are analyzed from the static and dynamic inliers, respectively. Experiments on challenging datasets collected at rush hour in downtown areas and on interstate highways show that the approach performs vehicular scene-dynamics analysis robustly and accurately.

(4) This part of the research covers the design and registration of the visual-inertial platform. The camera and IMU modules are rigidly fixed together, and their measurements are collected simultaneously and stored on a PC. To register the camera frame with the inertial frame, a quaternion-based interpolation mechanism is designed to address the time cost and misalignment problems. Images of the calibration board are collected from various viewpoints and the inter-frame poses are computed; in parallel, filter-based inertial pose estimation is performed. The transformation between the inertial frame and the visual frame is then solved in the Lie-group framework, which converts the constrained estimation problem into an unconstrained one in the Lie-algebra space. The experimental results validate the accuracy of the proposed method. A small sketch of this interpolation step follows.
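The quaternion interpolation used in this calibration step can be sketched as standard spherical linear interpolation (SLERP) resampling the IMU orientation stream at each image timestamp. Assumptions: unit quaternions in [w, x, y, z] order, sorted timestamp arrays, and hypothetical helper names.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions
    q0, q1 (arrays [w, x, y, z]) at fraction t in [0, 1]."""
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:                      # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                   # nearly parallel: lerp, renormalize
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1.0 - t) * theta) * q0
            + np.sin(t * theta) * q1) / np.sin(theta)

def imu_quat_at(t_img, t_imu, q_imu):
    """Resample the IMU orientation stream at a camera timestamp so
    image poses and inertial poses share a common time base."""
    i = int(np.clip(np.searchsorted(t_imu, t_img) - 1, 0, len(t_imu) - 2))
    frac = (t_img - t_imu[i]) / (t_imu[i + 1] - t_imu[i])
    return slerp(q_imu[i], q_imu[i + 1], frac)
```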

Language: Chinese
Contribution Rank: 1
Document Type: 学位论文 (Thesis)
Identifier: http://ir.sia.cn/handle/173321/23641
Collection: 工业控制网络与系统研究室 (Research Laboratory of Industrial Control Networks and Systems)
Recommended Citation (GB/T 7714):
张吟龙. 面向动作捕捉和定位的视觉惯性融合技术研究[D]. 沈阳: 中国科学院沈阳自动化研究所, 2018.
Files in This Item:
File Name: 面向动作捕捉和定位的视觉惯性融合技术研究 (5758 KB) | DocType: 学位论文 (Thesis) | Access: 开放获取 (Open Access) | License: CC BY-NC-SA
