Alternative Title: Research on Human Action Detection Method Based on Deep Learning
Thesis Advisor: 王宏玉
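Keyword: human action detection; deep learning; temporal action proposal; real-time detection; human-object interaction
Degree Discipline: Mechatronic Engineering
Degree Name: Master's
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang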
Abstract: In recent years, the spread of security surveillance systems has given intelligent monitoring systems based on human action detection broad application prospects in human-computer interaction, smart cities, and elderly care. At the same time, with the development of Internet technology and the spread of smart devices, large numbers of short videos of human actions have appeared on online platforms, and automatically extracting and analyzing their semantic information has become an urgent problem. Because of its application prospects and economic value, human action detection has quickly become a research hotspot in the computer vision community. Traditional methods require hand-crafted features designed for specific actions, which is labor-intensive and generalizes poorly, so this thesis uses deep learning to study human action detection. Because different application scenarios place different requirements on detection algorithms, the thesis examines three settings: offline detection, real-time detection, and detection based on spatial interaction. The main contributions are as follows.

(1) For the proposal generation stage of offline human action detection, a temporal action proposal generation algorithm based on the boundary-sensitive network is proposed. Deeper convolutional neural networks are introduced into the network's temporal evaluation module and proposal evaluation module to better extract temporal features from video, and a new confidence score decay strategy is designed for the non-maximum suppression (NMS) algorithm in post-processing, yielding a high-quality set of action proposals. The algorithm achieves an average recall of 75.71% on the public human action dataset ActivityNet.

(2) For the elderly care monitoring scenario, a dataset covering three actions (falling, squatting, and walking) is constructed, and a real-time human action detection model, MonitorNet, is designed on it. The model samples 16 frames from the video stream in real time, extracts their appearance features with a 2D convolutional network, and then feeds the 16 feature maps in temporal order into a 3D convolutional network that extracts temporal features and predicts the action. It achieves an average accuracy of 97.1% on the constructed dataset and detects human actions in real time at 25 frames per second on a laptop with a GeForce GTX 1060 graphics card.

(3) Building on the Faster R-CNN algorithm and the instance-centric attention network, a human-object interaction detection network is proposed. Faster R-CNN first detects the people and objects in the input image, which are combined pairwise into human-object pairs; an interaction discrimination network then recognizes the interaction within each pair. A 3×3 convolution is introduced into the instance-centric attention network to better integrate contextual information, and the cross-entropy loss function and training details are optimized. The model achieves an average accuracy of 47.06% on the large-scale V-COCO dataset. On additional data collected from real surveillance scenes, it achieves an average accuracy of 94.98% and an average recall of 69.85%.
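The post-processing step in contribution (1), NMS with confidence score decay, can be illustrated with a short sketch. The abstract does not specify the thesis's own decay strategy, so the code below substitutes the standard Gaussian decay from Soft-NMS as a stand-in. The names `temporal_iou`, `soft_nms`, and the parameters `sigma`, `top_k`, and `min_score` are hypothetical and chosen only for illustration.

```python
import math


def temporal_iou(a, b):
    """Intersection-over-union of two temporal segments (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(proposals, sigma=0.5, top_k=100, min_score=1e-3):
    """Rank temporal proposals, decaying scores of overlapping ones.

    proposals: list of (start, end, score) tuples.
    Assumption: Gaussian decay exp(-iou^2 / sigma), as in standard
    Soft-NMS; this is NOT the decay strategy designed in the thesis.
    """
    remaining = [list(p) for p in proposals]
    kept = []
    while remaining and len(kept) < top_k:
        # Pick the proposal with the highest current score.
        best = max(range(len(remaining)), key=lambda i: remaining[i][2])
        start, end, score = remaining.pop(best)
        if score < min_score:
            break
        kept.append((start, end, score))
        # Decay, rather than discard, proposals that overlap the pick.
        for p in remaining:
            iou = temporal_iou((start, end), (p[0], p[1]))
            if iou > 0:
                p[2] *= math.exp(-(iou * iou) / sigma)
    return kept


if __name__ == "__main__":
    demo = [(0, 10, 0.9), (1, 10, 0.8), (20, 30, 0.7)]
    for start, end, score in soft_nms(demo):
        print(start, end, round(score, 4))
```

Unlike hard NMS, which would discard the second proposal outright, the decay keeps it with a reduced score, so it is ranked below the non-overlapping proposal instead of being lost. That behavior matters when the metric is recall over a fixed number of proposals.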
Contribution Rank: 1
Document Type: Thesis
Recommended Citation
GB/T 7714
高松. 基于深度学习的人体行为检测方法研究[D]. 沈阳: 中国科学院沈阳自动化研究所, 2020.
Files in This Item:
File Name/Size | DocType | Version | Access | License
基于深度学习的人体行为检测方法研究.pd (4768KB) | Thesis | | Open Access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.