黎曼流形在图像分类识别中的应用研究
Alternative Title: Research on the Application of Riemannian Manifolds for Image Classification and Recognition
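刘天赐 (Liu Tianci) 1,2
Department: Optoelectronic Information Technology Laboratory (光电信息技术研究室)
Thesis Advisor: 史泽林 (Shi Zelin)
Keyword: Riemannian manifold; deep learning; geometric optimization; image classification and recognition
Pages: 120
Degree Discipline: Pattern Recognition and Intelligent Systems
Degree Name: Doctorate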
2019-05-14
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences (中国科学院沈阳自动化研究所)
Place of Conferral: Shenyang
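Abstract: This dissertation takes automatic target recognition for unmanned aerial vehicles as its research background and non-Euclidean geometry and Riemannian manifolds as its theoretical basis. Combining these with deep learning methods, it studies the application of Riemannian manifolds to image classification and recognition, and builds manifold models of the multi-layer feature spaces that arise in an intelligent learning framework. The resulting model generalizes better and, in terms of recognition results, significantly improves both the accuracy and the efficiency of next-generation automatic target recognition methods. The dissertation investigates geometric target recognition methods based on Riemannian manifolds and deep learning methods based on Riemannian manifolds. The main research contents and results are as follows.
(1) A visualization method for image geometric transformations based on the SL(3; R) group. Geometric transformation of images is a key factor affecting the effectiveness of target tracking and recognition tasks in computer vision. Although understanding the transformation process is important, the numerical relationships of a geometric transformation cannot be read directly from the images themselves. Because the Riemannian logarithm map of the SL(3; R) group has no closed-form expression, a new method is proposed for computing geodesic distances on the group. A manifold visualization method is also proposed that renders abstract image geometric transformations as trajectories in three-dimensional space, making the transformation process of a target easier to understand.
(2) A clustering method based on kernel sparse representation on the Grassmann manifold. The sparse subspace clustering model is extended to the Grassmann manifold. Combining the strengths of kernel methods and sparse coding, a suitable kernel mapping is identified so that a kernel sparse representation on the Grassmann manifold can be learned while exploiting both the self-expressiveness of the data and the geometry of the Riemannian manifold. A practical and efficient clustering algorithm (GKSSC) is proposed. Experiments show that the algorithm is robust and outperforms existing methods in running time, accuracy and other clustering metrics.
(3) A geometric dimensionality reduction method based on the Grassmann manifold. Much visual data has manifold structure, but the manifolds involved are often very high-dimensional. Existing methods first map the Riemannian manifold into Euclidean space, via the tangent space or a kernel mapping, and then reduce dimensionality there. This dissertation proposes a geometric dimensionality reduction method, based on the geometric properties of the manifold, that maps a high-dimensional manifold directly to a low-dimensional one. While reducing dimensionality, it also learns manifold-valued data with stronger discriminative power, which improves classification accuracy. The framework works with any manifold metric rather than being tied to one particular distance. Experiments show that the method greatly reduces computation while significantly improving classification performance.
(4) A geometric deep network based on the Grassmann manifold. Most current deep networks presuppose that data have Euclidean structure, whereas manifold learning is a family of machine learning methods that starts from the geometric structure of the data and rests on the theory of Riemannian manifolds. This dissertation builds a manifold model of the multi-layer feature space within a deep learning framework and designs a deep learning algorithm for non-Euclidean data: taking points on the Grassmann manifold as network input, it proposes a deep network for image-set recognition. During training, the model is updated with a backpropagation algorithm based on the matrix chain rule, and weight optimization is recast as an optimization problem on the Grassmann manifold. Experiments on dynamic facial expression recognition show that, compared with existing state-of-the-art algorithms, the proposed network improves recognition accuracy and is an order of magnitude faster in both training and testing.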
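Contributions (2) and (3) both depend on comparing subspaces as points on a Grassmann manifold. The abstract does not say which kernel or distance the thesis adopts, so the following is only a minimal sketch using one common choice, the projection kernel k(Y1, Y2) = ||Y1^T Y2||_F^2 and the projection distance derived from it. The helper names (`orthonormal_basis`, `projection_kernel`, `projection_distance`) are hypothetical and written in plain Python for illustration.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def orthonormal_basis(vectors):
    """Modified Gram-Schmidt: turn p independent vectors in R^n into an
    n x p matrix with orthonormal columns (one representative of a subspace)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return transpose(basis)

def projection_kernel(y1, y2):
    """Projection kernel ||Y1^T Y2||_F^2 (assumed choice, not confirmed by the source).
    It depends only on the spanned subspaces, not on the chosen bases."""
    m = matmul(transpose(y1), y2)
    return sum(x * x for row in m for x in row)

def projection_distance(y1, y2):
    """Projection distance sqrt(p - ||Y1^T Y2||_F^2) between two p-dimensional subspaces."""
    p = len(y1[0])
    return math.sqrt(max(0.0, p - projection_kernel(y1, y2)))

# Two planes in R^3 that share the x-axis, and a second basis for the first plane.
plane_xy = orthonormal_basis([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
plane_xz = orthonormal_basis([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
plane_xy_rotated = orthonormal_basis([[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]])

print(projection_distance(plane_xy, plane_xz))
print(projection_distance(plane_xy, plane_xy_rotated))
```

The two planes that share one axis are at projection distance 1, while the rotated basis of the same plane gives a distance of zero up to rounding. That basis invariance is what lets a kernel method treat subspaces, rather than particular matrices, as data points.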
Other Abstract: This dissertation takes automatic target recognition for unmanned aerial vehicles (UAVs) as its research background. Building on non-Euclidean geometry and Riemannian manifolds, and combining them with deep learning methods, it studies the application of Riemannian manifolds to image classification and recognition. The hierarchical multi-layer feature space is modeled as a manifold. The resulting model not only generalizes better, but also significantly improves the recognition accuracy and efficiency of intelligent target recognition methods. The dissertation investigates geometric methods for target recognition based on Riemannian manifolds and deep learning methods combined with Riemannian manifolds. The main research contents and results are as follows.
(1) A visualization method for image geometric transformations based on the SL(3; R) group. The geometric transformation of images is a key factor affecting the effectiveness of target tracking and recognition tasks in computer vision. Although understanding the process of image geometric transformation is important, the numerical relationships of the transformation cannot be read directly from the images themselves. Because the Riemannian logarithm map of the SL(3; R) group has no analytical expression, a novel method is proposed to approximate the geodesic distance on the manifold. A manifold visualization framework is proposed that maps abstract image geometric transformations into three-dimensional space. The visualized trajectory makes the geometric transformation process of targets easier to comprehend.
(2) A clustering method based on the kernel sparse representation of the Grassmann manifold. We extend the sparse subspace clustering model to the Grassmann manifold, combining the advantages of kernel methods and sparse coding. We find a suitable kernel mapping with which to learn the kernel sparse representation on the Grassmann manifold, exploiting both the self-expressiveness of the data and the geometric characteristics of Riemannian manifolds. Furthermore, a practical and efficient clustering algorithm (GKSSC) is proposed. Experiments show that the algorithm is not only robust, but also outperforms existing methods in running time, accuracy and other clustering measures.
(3) A geometric dimensionality reduction method based on the Grassmann manifold. Much visual data has manifold structure, but these manifolds are often of very high dimension. Existing methods map the Riemannian manifold to Euclidean space through the tangent space or kernel methods, and then reduce the dimensionality of the data in Euclidean space. In this dissertation, we propose a geometric method for dimensionality reduction based on the properties of Riemannian geometry, which maps directly from a high-dimensional Grassmann manifold to a low-dimensional Grassmannian. While reducing dimensionality, the method also learns manifold-valued data with stronger discriminative power, which improves classification accuracy. The framework is suitable for any metric on the Grassmann manifold rather than being limited to a particular one. Experiments demonstrate that the approach yields a significant gain in classification accuracy while greatly reducing computational cost.
(4) A geometric deep network based on the Grassmann manifold. Almost all existing deep networks are designed on the premise that visual data reside in Euclidean space. Manifold learning, by contrast, is a family of machine learning methods that takes the geometric structure of the data into account and is based on the theory of Riemannian manifolds. This dissertation constructs a manifold model of the multi-level feature space under the deep learning framework and designs a deep learning algorithm for non-Euclidean data. The resulting deep network is built on the non-Euclidean structure of manifold-valued data and combines differential geometry with deep learning at the theoretical level. We propose a deep network for image-set recognition based on the Grassmann manifold. During training, the model is updated by a backpropagation algorithm derived from the matrix chain rule, and the learning of the weights is cast as an optimization problem on the Grassmannian. Experiments on acted facial expressions demonstrate that the method not only improves accuracy, but also speeds up training and testing by an order of magnitude.
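Item (4) states that weight learning is cast as an optimization problem on the Grassmannian, but the abstract gives no update rule. The sketch below shows one generic way such an update can look: project the Euclidean gradient onto the tangent space, take a step, and re-orthonormalize as a retraction. The toy cost f(Y) = -trace(Y^T A Y), the step size, and all function names are assumptions made for illustration; this is not the network or the training procedure of the thesis.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def orthonormalize(y):
    """Retraction: Gram-Schmidt on the columns of an n x p matrix
    (the Q factor of a thin QR decomposition)."""
    basis = []
    for v in transpose(y):
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return transpose(basis)

def cost(a, y):
    """Toy cost f(Y) = -trace(Y^T A Y); it is smallest on the dominant eigenspace of A."""
    m = matmul(transpose(y), matmul(a, y))
    return -sum(m[i][i] for i in range(len(m)))

def riemannian_step(a, y, lr):
    """One Riemannian gradient descent step on the Grassmannian."""
    g = [[-2.0 * x for x in row] for row in matmul(a, y)]   # Euclidean gradient -2 A Y
    normal = matmul(y, matmul(transpose(y), g))             # component Y Y^T G
    rgrad = [[gi - ni for gi, ni in zip(gr, nr)]            # tangent part (I - Y Y^T) G
             for gr, nr in zip(g, normal)]
    moved = [[yi - lr * ri for yi, ri in zip(yr, rr)] for yr, rr in zip(y, rgrad)]
    return orthonormalize(moved)                            # map back onto the manifold

# Symmetric matrix with eigenvalues 3, 2, 1; we search for a 2-dimensional subspace.
A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
Y = orthonormalize([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
start_cost = cost(A, Y)
for _ in range(200):
    Y = riemannian_step(A, Y, lr=0.1)
final_cost = cost(A, Y)
print(start_cost, final_cost)
```

On this toy problem the cost falls from about -4 toward -5, the negative sum of the two largest eigenvalues, while the columns of `Y` stay orthonormal after every step. In a network, the Euclidean gradient would come from backpropagation instead of a closed-form expression.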
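Language: Chinese
Contribution Rank: 1
Document Type: Thesis
Identifier: http://ir.sia.cn/handle/173321/25150
Collection: Optoelectronic Information Technology Laboratory (光电信息技术研究室)
Affiliation: 1. Shenyang Institute of Automation, Chinese Academy of Sciences (中国科学院沈阳自动化研究所)
2. University of Chinese Academy of Sciences (中国科学院大学)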
Recommended Citation
GB/T 7714
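刘天赐 (Liu Tianci). 黎曼流形在图像分类识别中的应用研究 (Research on the Application of Riemannian Manifolds for Image Classification and Recognition) [D]. Shenyang: Shenyang Institute of Automation, Chinese Academy of Sciences, 2019.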
Files in This Item:
File Name/Size: 黎曼流形在图像分类识别中的应用研究.pd (7339KB)
DocType: Thesis
Version: (none listed)
Access: Open Access
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.