基于全局信息关联推理的行为理解与预测方法研究
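Alternative Title: Research on Action Understanding and Prediction Methods Based on Global Information Association Reasoning
陈博
Department: Robotics Laboratory
Thesis Advisor: 何玉庆
Keyword: action understanding; action prediction; graph neural network; knowledge graph; association reasoning
Pages: 113
Degree Discipline: Pattern Recognition and Intelligent Systems
Degree Name: Doctoral
2021-05-27
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang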
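Abstract: Video action analysis uses computers to automatically recognize the current actions of people in video data and to predict their future actions. It has broad application value in security surveillance, human-computer interaction, large-scale online video analysis, and autonomous driving. However, the attributes of an action in video depend not only on the person's motion modality but also on multi-level constraints from other people, the environment, and the objects being interacted with, so the accuracy and generalization of current algorithms for action understanding and prediction still fall short of practical requirements. To address the difficulties in fusing multi-level associated information and in action reasoning and prediction, this thesis studies action understanding and prediction methods based on global information association reasoning, and explores understanding and prediction methods that fuse and reason over heterogeneous features in complex situations. The main research contents are as follows.
1) Video actions are affected by lighting, viewpoint, and individual behavioral habits, so the extracted features vary widely, effective spatio-temporal features are hard to extract, a large amount of interference is easily introduced, and effective classification boundaries between actions are hard to learn. To address these problems, this thesis proposes a hierarchical action understanding method based on global information. It extracts the three feature dimensions most critical to action understanding from a pose information layer, an individual action layer, and a scene interaction layer, and infers the current action category from the association and co-occurrence relationships among the three. In the pose information layer, a gesture recognition algorithm based on online training with a personal feature library is proposed to reduce the effect of differences in behavioral habits on gesture and pose recognition. In the individual action layer, a pose estimation algorithm removes background interference and obtains pose motion information, and a spatio-temporal graph convolutional network classifies limb movements. In the scene interaction layer, a scene understanding model obtains the semantic information of the interaction scene. The coupled mapping relationships among pose, action, and interaction scene information are then used to correct each other's ambiguous recognition results, and the action category is inferred from the mapping between these three dimensions of information and the action categories. This reduces interference from irrelevant information and improves the accuracy and generalization of action understanding.
2) Spatio-temporal multimodal environment features influence the evolution of actions through complex mechanisms, which makes dynamic actions in complex scenes hard to predict. To address this problem, a spatio-temporal association reasoning network is proposed. It explores the complex hierarchical and diverse relationships between environment and action along the spatial and temporal dimensions, and mines the intrinsic driving characteristics of actions, improving the accuracy and robustness of dynamic action prediction. In the spatial dimension, a hierarchical attention network is built on a graph model. Environment features with different attributes are aggregated level by level in node-edge-graph form so that their differences are progressively removed, and they are associated with one another through multiple attention mechanisms covering position, semantics, and temporal order. The hierarchical influence of each environment feature on action evolution is thereby learned automatically, the features at every level are combined in an optimized way, and the prediction method adapts better to complex environments. In the temporal dimension, a transition probability matrix is built from the collinearity and transition relationships between actions and is combined with a spatio-temporal diffusion convolutional network to jointly model and infer temporal actions. This yields the intrinsic driving features of the action sequence, provides an effective prior for action prediction, reduces the tendency of the set of candidate actions to keep diverging over time, and makes the prediction results more reliable.
3) In the temporal dimension, the causal relationship between scene constraints and action evolution is complex, and action predictions tend to accumulate large iterative errors over time. To address this problem, this thesis explores the intrinsic coupling between action category prediction and trajectory prediction, and proposes two joint prediction frameworks for action category and trajectory, one based on implicit coupling and one on explicit coupling. Trajectory prediction is added to the action prediction framework as a coupling constraint to improve the action prediction results. The implicit-coupling method uses the intermediate-layer features produced while encoding and decoding action features, together with a joint prediction loss over action and trajectory, to help the model learn to capture and exploit the coupling constraints between action category and trajectory, improving prediction accuracy for long-term actions. In addition, based on the explicit coupling between action category and trajectory in motion modality, an explicit-coupling joint prediction model based on motion modality association is proposed, and an iterative prediction framework for action category and trajectory is built. It performs fine-grained iterative inference of action category and trajectory under the combined effect of scene constraints and causal association information, improving the accuracy of action category prediction while reducing trajectory prediction error.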
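The temporal idea in item 2, a transition probability matrix that supplies a prior over the next action, can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract combines the matrix with a spatio-temporal diffusion convolutional network, which is omitted here, and the function names, the additive smoothing, and the toy labels are all assumptions made for illustration.

```python
def estimate_transition_matrix(sequences, labels, smoothing=1.0):
    """Row-normalised counts of label_t -> label_{t+1}.

    `smoothing` is an assumed additive constant so unseen transitions
    keep a small non-zero probability.
    """
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    counts = [[smoothing] * n for _ in range(n)]
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[index[prev]][index[nxt]] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def apply_prior(belief, transition, likelihood):
    """Propagate the current belief through the transition matrix,
    re-weight by per-class scores from some recogniser, and renormalise."""
    n = len(transition)
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    posterior = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]


# Hypothetical toy data: three action labels and three observed sequences.
LABELS = ["reach", "grasp", "lift"]
SEQUENCES = [
    ["reach", "grasp", "lift"],
    ["reach", "grasp", "lift"],
    ["reach", "grasp", "grasp"],
]
T = estimate_transition_matrix(SEQUENCES, LABELS)
# Start from certainty in "reach" with an uninformative recogniser score.
next_belief = apply_prior([1.0, 0.0, 0.0], T, [1 / 3, 1 / 3, 1 / 3])
```

With an uninformative recogniser, the belief over the next action reduces to the "reach" row of the matrix, so the prior alone favours "grasp". This is the sense in which such a matrix narrows the set of plausible future actions.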
Other Abstract: Video action analysis is a technology for automatically recognizing people's current actions in video data and predicting their future actions. It has broad application value in security monitoring, human-computer interaction, online video big-data analysis, and autonomous driving. However, because the attributes of a video action are related not only to the human motion modality but are also constrained by multi-level information about other people, the environment, and interacting objects, current algorithms still struggle to meet practical standards for accuracy and generalization in action understanding and prediction. In view of the research difficulties in multi-level association information fusion and action reasoning and prediction, this thesis studies action understanding and prediction methods based on global information association reasoning, and explores understanding and prediction methods suited to multi-feature association, fusion, and reasoning in complex situations. The main research contents include: 1) Because of lighting, viewing angle, and behavioral habits, features extracted from video actions differ greatly. Effective spatio-temporal features are difficult to extract, a large amount of interference is easily introduced, and the effective classification boundaries between actions are difficult to learn. To address these problems, this thesis proposes a hierarchical action understanding method based on global information. The three feature dimensions most critical to action understanding are extracted from the pose information layer, the individual action layer, and the scene interaction layer, and the current action category is inferred from the association and co-occurrence relationships among the three.
In the pose information layer, a gesture recognition algorithm based on online training with a personal feature library is proposed to reduce the influence of differing behavioral habits on gesture recognition. In the individual action layer, a pose estimation algorithm removes background interference to obtain pose motion information, and limb movements are classified with a spatio-temporal graph convolutional network. In the scene interaction layer, the semantic information of the interaction scene is obtained through a scene understanding model, and the ambiguous recognition results of pose, action, and interaction scene are then corrected against one another using the coupled mapping relationships among them. The action category is inferred from the mapping between these three dimensions of information and the action categories, which reduces the interference of irrelevant information and improves the accuracy and generalization of action understanding. 2) Spatio-temporal multimodal environment features influence action evolution in complex ways, which makes it difficult to predict dynamic actions in complex scenes effectively. To address this problem, a spatio-temporal association reasoning network is proposed. It explores the complex hierarchical and diverse relationships between environment and action along the spatial and temporal dimensions and mines the intrinsic driving characteristics of actions, improving the accuracy and robustness of dynamic action prediction.
Specifically, in the spatial dimension, a hierarchical attention network is constructed on a graph model. Environment features with different attributes are aggregated level by level in node-edge-graph form so that their differences are removed, and they are associated with one another through multiple attention mechanisms covering position, semantics, and temporal order. The hierarchical effect of each environment feature on action evolution is then learned automatically, the features at all levels are combined in an optimized way, and the adaptability of the prediction method to complex environments is improved. In the temporal dimension, a transition probability matrix is established from the collinearity and transition relationships between actions, and a spatio-temporal diffusion convolutional network is used to model and infer temporal actions jointly. This yields the intrinsic driving characteristics of the action sequence, provides an effective prior for action prediction, reduces the tendency of the candidate action types to keep diverging over time, and makes the prediction results more reliable. 3) In the temporal dimension, the causal relationship between scene constraints and action evolution is complex, and action prediction results are prone to large iterative errors over time. To address this problem, this thesis explores the intrinsic coupling between action category and trajectory prediction, and proposes two joint prediction frameworks for action category and trajectory, based on implicit and explicit coupling respectively. Trajectory prediction is added to the action prediction framework as a coupling constraint to improve the action prediction results.
The implicit-coupling joint prediction method uses the intermediate-layer features produced while encoding and decoding action features, together with a joint prediction loss over action and trajectory, to help the model learn to capture and use the coupling constraints between action category and trajectory, improving the prediction accuracy for long-term actions. In addition, based on the explicit coupling between action category and trajectory in motion modality, an explicit-coupling joint prediction model based on motion modality association is proposed, and an iterative prediction framework for action category and trajectory is built. It realizes fine-grained iterative inference of action category and trajectory under the combined effect of scene constraints and causal association information, which improves the accuracy of action category prediction and reduces the error of trajectory prediction.
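The joint prediction loss mentioned for the implicit-coupling method can be sketched as follows. This is an illustration under stated assumptions, not the thesis's formulation: the abstract does not give the loss terms, so the cross-entropy term, the average displacement term, the `traj_weight` balance factor, and all function names are hypothetical choices.

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]


def average_displacement(predicted, target):
    """Mean Euclidean distance between predicted and true 2-D points."""
    dists = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(predicted, target)
    ]
    return sum(dists) / len(dists)


def joint_loss(logits, target_index, predicted_traj, target_traj, traj_weight=1.0):
    """Action-class loss plus a weighted trajectory loss.

    Optimising both terms through shared features is one common way to
    let the trajectory task constrain the action-class prediction.
    """
    return cross_entropy(logits, target_index) + traj_weight * average_displacement(
        predicted_traj, target_traj
    )


# Hypothetical sample: two action classes, a two-step trajectory.
loss = joint_loss(
    logits=[0.0, 0.0],
    target_index=0,
    predicted_traj=[(0.0, 0.0), (3.0, 4.0)],
    target_traj=[(0.0, 0.0), (0.0, 0.0)],
    traj_weight=0.5,
)
```

In this sample the classification term is log 2 and the displacement term is 2.5, so the weighted sum is log 2 + 1.25. In a trained model both terms would backpropagate into the shared encoder, which is what makes the coupling implicit.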
Language: Chinese
Contribution Rank: 1
Document Type: Thesis
Identifier: http://ir.sia.cn/handle/173321/29011
Collection: Robotics Laboratory
Affiliation: Shenyang Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
陈博. 基于全局信息关联推理的行为理解与预测方法研究[D]. 沈阳. 中国科学院沈阳自动化研究所,2021.
Files in This Item:
File Name/Size: 基于全局信息关联推理的行为理解与预测方法 (8073KB)
DocType: Thesis
Version: (none listed)
Access: Open Access
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.