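Research on Multi-Source Data Anomaly Detection and Its Applications (多源数据异常检测及应用研究)
Alternative Title: Multi-Source Data Anomaly Detection Method and Application Research
侯冬冬
Department: Robotics Laboratory (机器人学研究室)
Thesis Advisor: 徐晓伟; 丛杨
Keyword: anomaly detection; industrial data; time-series data; video data; multi-view data
Pages: 92
Degree Discipline: Pattern Recognition and Intelligent Systems
Degree Name: Doctorate
2021-05-19
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang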
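Abstract: Anomaly detection finds objects or events in an existing dataset that are inconsistent with expected patterns. As an important application of data mining, anomaly detection aims to anticipate and detect abnormal events in production systems and to keep those systems running normally. It is widely used in industrial automation, financial fraud detection, network security, and the detection of social emergencies. In recent years, with the development and application of sensing technology, data preprocessing technology, and the Internet of Things, anomaly detection scenarios have become increasingly diverse, for example intelligent chemical industry parks, smart home systems, and automated production lines. The collected data cover more and more types and have increasingly complex structure. Data in current application systems have several notable characteristics: high dimensionality, unstable system states, and multiple data sources. Based on these characteristics, this dissertation studies anomaly detection for multi-source data and its applications. It targets common data types such as discrete data, time-series data, and video data, and uses machine learning methods that combine traditional and deep learning approaches. The main research contents and results are as follows:
(1) High-dimensional data anomaly detection based on an l_2,0 norm constraint. This dissertation uses a sparse-reconstruction-based model to address anomaly detection in high-dimensional data. Existing sparse-reconstruction-based models use an l_2,1 norm constraint for dictionary selection, sample reconstruction, and anomaly detection. However, dictionary selection under the l_2,1 norm constraint makes the dictionary size hard to control and tends to lose information inside the dictionary. To address these problems, this dissertation proposes a high-dimensional anomaly detection model based on the l_2,0 norm constraint. A stacked autoencoder first reduces the dimensionality of the original high-dimensional data. Dictionary selection and sparse sample reconstruction are then carried out under the l_2,0 norm constraint. Finally, the sparse reconstruction error serves as the measure of how anomalous a sample is. Comparison experiments on a large number of real and synthetic datasets verify that dictionary selection under the l_2,0 norm constraint can accurately control the dictionary size, and that the resulting anomaly detection outperforms existing mainstream models.
(2) Attention-guided detection of anomalous objects in video. Effectively detecting anomalous objects in surveillance video of industrial scenes is an important application problem for industrial robot systems. In industrial robot scenarios the robot's workspace is fixed and moving objects can easily collide with the robot, so this dissertation defines moving objects in the scene as anomalous objects. To prevent a running robot from colliding with moving objects, this chapter uses high-quality imaging devices to collect video of the scene and uses an optical flow estimation model to compute the relative displacement of corresponding pixels in adjacent frames, detecting moving objects in real time. However, most existing optical flow estimation models estimate pixel displacement from adjacent frames only. They ignore the temporal nature of video and the semantic information of pixels, so the estimated flow loses the context and the integrity of moving objects. To address these problems, this dissertation proposes an attention-guided optical flow estimation model consisting of a foreground object estimation subnetwork and an optical flow estimation subnetwork. The foreground object estimation subnetwork estimates the spatial location of moving objects, and its output is used to strengthen the feature representation of anomalous objects in the optical flow estimation subnetwork. Experimental results show that the proposed model obtains accurate optical flow estimates with a relatively small model. Building on this model, and using fixed-scene images and depth information captured with Kinect2.0, this dissertation implements real-time 3D point cloud reconstruction of moving objects in fixed scenes.
(3) Time-series anomaly detection based on random permutation. To keep detection fast, existing anomaly detection for industrial time-series data mostly relies on traditional methods. In real industrial scenes, precise control of industrial processes is difficult and the system is prone to random state switching. When a new stable state appears in the time series, traditional methods lack prior information about that state and tend to misidentify it as anomalous. To address this problem, this dissertation proposes a time-series anomaly detection method based on random permutation. First, the data are randomly permuted multiple times to reduce their temporal dependence. Then, on each permuted copy, a sliding window is used to compute the global difference of each test sample. Finally, the mean of the global differences obtained over the permutations serves as the measure of how anomalous a sample is. Comparison experiments on real industrial time-series data and synthetic time-series data verify that the proposed model effectively detects anomalous samples in time-series data.
(4) Fast anomaly detection for multi-source data based on a deep encoder. With the development and spread of data preprocessing and Internet of Things technology, a single system can be described by large-scale multi-source heterogeneous data collected from multiple sensors. Multi-source data can be represented as multi-view data, in which each view represents the information obtained from one data source. Because the multi-source heterogeneous data describe the same system, the views share a latent consistent distribution, while each view also has a view-specific distribution that reflects the differences between data sources. To detect anomalies in large-scale multi-source heterogeneous data quickly and effectively, this dissertation proposes a fast anomaly detection model based on a deep encoder. The multi-view data are mapped into a multi-modal shared space and decomposed into a multi-modal shared matrix, a multi-modal difference matrix, and a single-view reconstruction error matrix. The multi-modal difference matrix measures how anomalous a sample is across views, and the single-view reconstruction error measures how anomalous it is within a view. The proposed model combines the cross-view and within-view measures to detect anomalous data. Comparison experiments on regular and large-scale multi-view datasets verify that the proposed model is only slightly affected by data scale and number of views, and that its detection results are better than those of existing multi-view anomaly detection models.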
Other Abstract: Anomaly detection aims to identify patterns in data that do not conform to expected behavior. As an important application of data mining, anomaly detection in real-world applications aims to detect anomalous events in a system and to keep the system operating normally. Anomaly detection is widely applied in industrial automation, financial fraud detection, network security, and social emergency detection. With the development of sensor technology, data processing technology, and the Internet of Things (IoT), the data collected from real systems have become increasingly complex. Such data have several main characteristics: high dimensionality, unstable (hard-to-control) system states, and multiple sources. Based on these characteristics, we combine traditional methods and deep learning methods for anomaly detection on discrete data, sequential data, video, and multi-source data. Specifically, the main work of this dissertation includes the following aspects:
(1) l_2,0 norm based anomaly detection for high-dimensional data. In this dissertation, we use sparse reconstruction to solve the anomaly detection problem for high-dimensional data. Sparse representation is widely used in anomaly detection and can significantly decrease computational complexity. However, most current sparse-representation-based methods use an l_2,1 norm term as the constraint in dictionary selection and instance reconstruction, which cannot accurately control the size of the dictionary and tends to lose information within the dictionary. Therefore, we propose an l_2,0 norm based anomaly detection model for high-dimensional data. More specifically, we first use a stacked autoencoder to compute compressed features of the original data. Then, we perform dictionary selection and sparse data reconstruction under the l_2,0 norm constraint. Finally, the sparse reconstruction error is used as the criterion for identifying anomalous data.
Comparison experiments are carried out on real and synthetic anomaly detection datasets. The experimental results verify that the l_2,0 norm based dictionary selection method can accurately control the size of the dictionary and yields a sparser dictionary. The experiments also demonstrate that our proposed model is superior to state-of-the-art models in anomaly detection.
(2) Attention network guided video anomaly detection. Detecting anomalous objects in surveillance video is an important problem in industrial robot applications. In these applications the industrial robot is fixed on a functional platform, and moving objects are likely to collide with it, so moving objects are defined as anomalous objects in robot application scenarios. Based on high-quality surveillance video, we use an optical flow estimation method to capture the relative motion of objects in the video. However, most current optical flow estimation methods neglect the sequential information of objects and do not consider the semantic information of pixels in the current image. As a result, anomalous objects are missing or blurred in some frames. To solve these problems, we propose an attention network guided optical flow estimation model, which consists of a foreground estimation subnetwork and an optical flow estimation subnetwork. Because foreground objects usually have larger motion relative to the observer than other regions, we assume that the foreground objects share spatial information with the anomalous objects. In this work, we first compute a foreground mask with the foreground estimation subnetwork. Then, the foreground mask is used to highlight the feature expression of the moving objects in the optical flow estimation subnetwork. The experimental results verify that, with the help of the attention guidance, our proposed optical flow estimation model can achieve accurate results with a simplified network structure.
Based on the proposed framework, we realize 3D point cloud reconstruction of moving targets in fixed scenes using the color and depth information collected with the Kinect2.0.
(3) Random shuffle based sequential data anomaly detection. In practice, to guarantee timely anomaly detection, most systems use traditional anomaly detection methods. However, precise control of industrial processes is difficult, so real systems often switch states at random. Since traditional anomaly detection methods do not have enough prior knowledge of a new state, they tend to recognize instances from new states as outliers. To solve this problem, we propose a random shuffle based sequential data anomaly detection model. Specifically, we first randomly shuffle the original sequential data multiple times to reduce its temporal dependence. Then a sliding window is applied to each shuffled copy to calculate the global difference of each instance. Finally, the average of the global differences obtained over the shuffled copies is used as the anomaly criterion. The proposed model is compared with state-of-the-art models on real and synthetic datasets. The experimental results verify that our proposed model can decrease the temporal dependence of sequential data and achieves better results than the compared methods.
(4) Fast anomaly detection of multi-source heterogeneous data based on a deep encoder. With the development of data processing and the Internet of Things (IoT), a system can be described with a large-scale multi-source heterogeneous dataset obtained from multiple sensors. Multi-source heterogeneous data can be represented as multi-view data, where each view represents the data collected from one sensor.
Since multi-view data are collected from the same system, the views share a consistent latent distribution, while each view also has its own data distribution that reveals the differences among views. To handle anomaly detection quickly in large-scale multi-view datasets, we introduce a deep encoder into our model. More specifically, we first select a small subset of the original dataset, project it into a shared feature space, and decompose the data into a multi-view sharing matrix, a view-specific discriminative matrix, and a view-specific reconstruction error matrix. The view-specific discriminative matrix describes the differences among views, and the view-specific reconstruction error matrix represents the noise in the current view. Then, the view-specific discriminative matrix and the view-specific reconstruction error matrix are concatenated as abnormal codes to evaluate how anomalous each instance is. Finally, a deep encoder is introduced into our model to speed up anomaly detection for the remaining instances of the original dataset. Comparison experiments are carried out on common and large-scale multi-view datasets. The experimental results verify that our proposed model is superior to state-of-the-art methods and is not affected by the number of views or the scale of the dataset.
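The random-shuffle procedure in item (3) of the abstract (shuffle several times, slide a window over each shuffled copy, average the per-instance differences) can be illustrated with a minimal sketch. The abstract does not define the "global difference", so the version below assumes it is the mean Euclidean distance between an instance and the `window` instances preceding it in shuffled order (wrapping around). The function name `random_shuffle_scores` and its parameters are hypothetical and are not taken from the thesis.

```python
import numpy as np


def random_shuffle_scores(x, n_shuffles=50, window=20, seed=0):
    """Illustrative anomaly score: windowed dissimilarity averaged over shuffles.

    Assumption (not from the thesis): the "global difference" of an instance is
    its mean Euclidean distance to the `window` instances that precede it in
    the shuffled order, with wrap-around at the start of the sequence.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D series as n samples with one feature
    n = x.shape[0]
    if not 0 < window < n:
        raise ValueError("window must satisfy 0 < window < number of samples")

    rng = np.random.default_rng(seed)
    scores = np.zeros(n)
    for _ in range(n_shuffles):
        perm = rng.permutation(n)  # random order, breaks temporal dependence
        shuffled = x[perm]
        for pos in range(n):
            # Indices of the preceding `window` positions (circular).
            idx = np.arange(pos - window, pos) % n
            dist = np.linalg.norm(shuffled[idx] - shuffled[pos], axis=1)
            # Accumulate the score at the instance's original index.
            scores[perm[pos]] += dist.mean()
    return scores / n_shuffles


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two stable states (a state switch at index 100) plus one injected outlier.
    series = np.concatenate([rng.normal(0, 0.1, 100), rng.normal(5, 0.1, 100)])
    series[50] = 20.0
    scores = random_shuffle_scores(series, n_shuffles=20, window=20)
    print("highest-scoring index:", int(np.argmax(scores)))
```

Because each shuffled window mixes samples from both stable states, instances from the new state are compared against neighbours from their own state as well as the old one, whereas an isolated outlier stays far from every neighbour. This is only one plausible reading of the abstract; the thesis itself may define the window statistic differently.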
Language: Chinese
Contribution Rank: 1
Document Type: Thesis
Identifier: http://ir.sia.cn/handle/173321/29002
Collection: Robotics Laboratory (机器人学研究室)
Affiliation: Shenyang Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
侯冬冬. 多源数据异常检测及应用研究[D]. 沈阳. 中国科学院沈阳自动化研究所,2021.
Files in This Item:
File Name/Size | DocType | Version | Access | License
多源数据异常检测及应用研究.pdf (5504KB) | Thesis | | Open Access | CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.