Research on Image/Video Restoration and Object Tracking Algorithms in Outdoor Scenes (室外场景下图像视频恢复和目标跟踪算法研究)
Alternative Title: Image/Video Restoration and Object Tracking in Outdoor Environments
Author: 任卫红 (Ren Weihong)
Department: Robotics Research Laboratory (机器人学研究室)
Thesis Advisor: 唐延东 (Tang Yandong)
Keyword: sparse prior; dichromatic reflection model; crowd density; object tracking; convolutional neural network
Pages: 114
Degree Discipline: Pattern Recognition and Intelligent Systems
Degree Name: Doctor (PhD)
2019-11-20
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences (中国科学院沈阳自动化研究所)
Place of Conferral: Shenyang
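Abstract: Starting from the mechanisms of image formation and of bad-weather formation, and drawing on deep learning theory, this thesis analyzes complex illumination, bad weather, and crowded scenes in depth, and proposes computer vision algorithms suited to different application settings. The research covers two areas: image/video restoration and object tracking. For image/video restoration, the thesis focuses on image deblurring, highlight removal, and rain/snow removal. For object tracking, it addresses single-object and multi-object tracking in crowded scenes. The restoration work builds an understanding of bad weather and complex scenes, and the tracking work extends and applies that understanding of the environment. The contributions of the thesis are as follows:

(1) A blind image restoration algorithm based on nonlocal similarity and l0 sparsity. Traditional blind restoration methods mostly model the image with a gradient sparsity prior and have been quite successful, but they are sensitive to noise. To suppress noise, this thesis proposes modeling the blurred image jointly with nonlocal similarity, the low-rank property, and the gradient sparsity prior. The low-rank property effectively suppresses image noise, while the joint use of nonlocal similarity and the sparsity prior improves the accuracy of blur kernel estimation and recovers image details. The thesis also proposes a numerical optimization method to solve the model. Extensive experiments on synthetic and real data verify the effectiveness of the proposed algorithm.

(2) A specular reflection separation algorithm based on a color-lines constraint. Following the dichromatic reflection model, previous algorithms usually separate specular reflection by local diffusion. Lacking global information, these methods cannot completely remove the specular component of an image, and they also damage image texture. Starting from the dichromatic reflection model, this thesis derives a global color-lines constraint that can accurately recover the specular and diffuse components of an image. We find that in normalized RGB color space every pixel lies on a straight line in three-dimensional space, and that different lines represent different diffuse chromaticities. These lines all intersect at a single point, which is the illumination chromaticity. Pixels on the same line are scattered over the whole image, and their distance to the illumination chromaticity accurately reflects how much specular reflection they contain. By using global diffuse information, the proposed method can process pixels independently and in parallel, and it fully meets real-time requirements. Compared with other algorithms, the method has advantages on both synthetic and natural images.

(3) A video rain/snow removal algorithm based on low-rank matrix decomposition. For scenes with heavy rain or snow, or for dynamic scenes, current rain/snow removal methods often fail. One reason is that they assume rain and snow are sparse in the scene. In addition, they cannot accurately distinguish moving objects from rain and snow. To solve these problems, this thesis proposes a video rain/snow removal method based on matrix decomposition. First, rain and snow are divided into two parts: sparse (visible) and dense (invisible). Using background fluctuation and optical flow information, the detection of moving objects and sparse rain/snow is modeled as a multi-label Markov random field problem. Dense rain/snow is assumed to follow a Gaussian distribution. All rain and snow on the background, both sparse and dense, can be removed through a low-rank representation of the background. In addition, we propose a group-sparse filtering scheme to remove rain and snow on moving objects. Experiments show that the proposed method outperforms other algorithms on scenes with heavy rain or snow.

(4) An object tracking algorithm for crowded scenes based on crowd density fusion. Visual tracking algorithms have made great progress in sparse scenes, but crowded scenes remain a challenge because of severe occlusion, high crowd density, and drastic illumination changes. To address these problems, this thesis first designs a sparse kernelized correlation filter (S-KCF) to suppress the target response noise caused by occlusion, illumination changes, and distractor targets. A convolutional neural network then fuses the sparse S-KCF response map with the crowd density map to produce a corrected final target response map. From the corrected response map, the target position is estimated and the S-KCF is updated. To train the fusion network, the thesis uses a two-step training strategy that optimizes the network parameters gradually. The first step obtains an initial model from batches of training images, and the second step continues training the initial model online, frame by frame. The proposed crowd density fusion framework significantly improves tracking performance in crowded scenes and is highly extensible.

(5) A multi-object tracking algorithm based on crowd density maps and minimum network flow. Traditional multi-object tracking algorithms follow the detect-then-track framework: detector outputs are associated frame by frame to obtain trajectories. In crowded scenes, however, detection algorithms often fail because of severe occlusion and high crowd density. To solve this problem, this thesis proposes a new multi-object tracking framework, tracking by counting, which is particularly effective for tracking targets in crowds. Based on crowd density maps, the thesis models counting, detection, and tracking on a single network graph. Optimizing this graph yields detection and tracking results that are globally optimal over the whole video sequence. Traditional detection-based multi-object tracking algorithms tend to ignore the influence of crowd density and inevitably drift in crowded scenes. Moreover, detection-based trackers all use a two-step strategy (detection, then association) and can therefore obtain only suboptimal solutions. The performance of the proposed algorithm has been verified on several datasets of different types.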
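The color-lines observation in contribution (2) can be recovered from the dichromatic reflection model in a few lines. The derivation below is a sketch under the standard form of that model, not the thesis's own formulation: the symbols $m_d$, $m_s$, $\boldsymbol{\Lambda}$, $\boldsymbol{\Gamma}$, $\boldsymbol{\sigma}$, and $\beta$ are notation assumed here for illustration.

```latex
% Dichromatic reflection model at pixel x, with c ranging over {r, g, b}.
% Lambda(x): diffuse chromaticity, Gamma: illumination chromaticity,
% both normalized so that their channels sum to 1.
\[
  \mathbf{I}(x) = m_d(x)\,\boldsymbol{\Lambda}(x) + m_s(x)\,\boldsymbol{\Gamma},
  \qquad \sum_{c} \Lambda_c(x) = \sum_{c} \Gamma_c = 1,
  \qquad m_d(x),\, m_s(x) \ge 0,\quad m_d(x) + m_s(x) > 0 .
\]

% Normalizing by the channel sum (normalized RGB) gives a convex combination.
\[
  \boldsymbol{\sigma}(x)
  = \frac{\mathbf{I}(x)}{\sum_{c} I_c(x)}
  = \frac{m_d(x)\,\boldsymbol{\Lambda}(x) + m_s(x)\,\boldsymbol{\Gamma}}{m_d(x) + m_s(x)}
  = \bigl(1-\beta(x)\bigr)\,\boldsymbol{\Lambda}(x) + \beta(x)\,\boldsymbol{\Gamma},
  \qquad \beta(x) = \frac{m_s(x)}{m_d(x) + m_s(x)} \in [0, 1].
\]

% Distance from a pixel's chromaticity to the illumination chromaticity.
\[
  \bigl\lVert \boldsymbol{\sigma}(x) - \boldsymbol{\Gamma} \bigr\rVert
  = \bigl(1-\beta(x)\bigr)\,\bigl\lVert \boldsymbol{\Lambda}(x) - \boldsymbol{\Gamma} \bigr\rVert .
\]
```

Under these assumptions, all pixels that share a diffuse chromaticity $\boldsymbol{\Lambda}$ lie on the segment joining $\boldsymbol{\Lambda}$ and $\boldsymbol{\Gamma}$, every such segment ends at $\boldsymbol{\Gamma}$, and a smaller distance to $\boldsymbol{\Gamma}$ corresponds to a larger specular fraction $\beta$.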
Other Abstract: Based on the image formation mechanism and deep learning theory, we analyze complex illumination, bad weather, and crowded scenes in computer vision, and propose algorithms for the problems they raise. This thesis covers two main topics: image/video restoration and object tracking. For image/video restoration, we address image deblurring, highlight removal, and video snow/rain removal. For object tracking, we address single-object and multi-object tracking in crowded scenes. The restoration research processes images and videos taken under complex illumination and bad weather, while the tracking research is an application built on that understanding of the environment. The contributions of this thesis are as follows:

(1) Blind Deconvolution With Nonlocal Similarity and l0 Sparsity for Noisy Image. Blind image deconvolution techniques that use a sparsity prior in the gradient domain are sensitive to noise, even in small amounts. To address this problem, we propose a novel blind deconvolution model that combines the low-rank property, nonlocal similarity, and an l0 sparsity prior. The low-rank property makes the proposed deblurring model robust to image noise. The joint use of nonlocal similarity and the l0 sparsity prior improves the accuracy of blur kernel estimation and restores fine image details. A numerical method is also given to solve the proposed model. Experimental results on synthetic and real data show that our algorithm performs better than state-of-the-art methods on both noisy and noise-free images.

(2) Specular Reflection Separation With Color-Lines Constraint. Following the dichromatic reflection model, previous methods often separate specular reflection from a single image using patch-based priors. Lacking global information, these methods often cannot completely separate the specular component of an image and tend to degrade image textures. In this thesis, we derive a global color-lines constraint from the dichromatic reflection model to recover specular and diffuse reflection effectively. Our key observation is that each image pixel lies along a color line in normalized RGB space, and that the different color lines, which represent distinct diffuse chromaticities, intersect at one point, namely the illumination chromaticity. Pixels along the same color line are spread over the entire image, and their distances to the illumination chromaticity reflect the amount of specular reflection they contain. With global (non-local) information from these color lines, our method can separate the specular and diffuse reflection components of a single image pixel by pixel, and it is suitable for real-time applications. Our experimental results on synthetic and real images show that our method separates specular reflection better than state-of-the-art methods.

(3) Video Desnowing and Deraining Based on Matrix Decomposition. Existing snow/rain removal methods often fail for heavy snow/rain and dynamic scenes. One reason is the assumption that all snowflakes/rain streaks are sparse in snow/rain scenes. The other is that existing methods often cannot differentiate moving objects from snowflakes/rain streaks. In this thesis, we propose a model based on matrix decomposition for video desnowing and deraining to solve these problems. We divide snowflakes/rain streaks into two categories: sparse ones and dense ones. With background fluctuations and optical flow information, the detection of moving objects and sparse snowflakes/rain streaks is formulated as a multi-label Markov Random Field (MRF) problem. Dense snowflakes/rain streaks are assumed to obey a Gaussian distribution. The snowflakes/rain streaks in scene backgrounds, both sparse and dense, are removed by a low-rank representation of the backgrounds. Meanwhile, a group sparsity term in our model is designed to filter snow/rain pixels within the moving objects. Experimental results show that our proposed model performs better than state-of-the-art methods for snow and rain removal.

(4) Fusing Crowd Density Maps and Visual Object Trackers for People Tracking in Crowd Scenes. While visual tracking has improved greatly in recent years, people tracking in crowd scenes remains particularly challenging due to heavy occlusions, high crowd density, and significant appearance variation. To address these challenges, we first design a Sparse Kernelized Correlation Filter (S-KCF) to suppress target response variations caused by occlusions and illumination changes, and spurious responses due to similar distractor objects. We then propose a people tracking framework that fuses the S-KCF response map with an estimated crowd density map using a convolutional neural network (CNN), yielding a refined response map. To train the fusion CNN, we propose a two-stage strategy that optimizes the parameters gradually. The first stage trains a preliminary model in batch mode with image patches selected around the targets, and the second stage fine-tunes the preliminary model using the real frame-by-frame tracking process. Our density fusion framework can significantly improve people tracking in crowd scenes, and can also be combined with other trackers to improve their tracking performance. We validate our framework on two crowd video datasets.

(5) Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets. State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm, where object trajectories are obtained by associating the per-frame outputs of object detectors. In crowded scenes, however, detectors often fail to obtain accurate detections due to heavy occlusions and high crowd density. In this thesis, we propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes. Using crowd density maps, we jointly model detection, counting,
Language: Chinese
Contribution Rank: 1
Document Type: Thesis
Identifier: http://ir.sia.cn/handle/173321/25933
Collection: Robotics Research Laboratory (机器人学研究室)
Recommended Citation
GB/T 7714
任卫红. 室外场景下图像视频恢复和目标跟踪算法研究[D]. 沈阳. 中国科学院沈阳自动化研究所,2019.
Files in This Item:
File Name/Size | DocType | Version | Access | License
室外场景下图像视频恢复和目标跟踪算法研究 (7516KB) | Thesis | | Open Access | CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.