自动驾驶场景中的三维目标检测方法研究
Alternative Title: Research on 3D Object Detection for Autonomous Driving
Author: 李培玄
Department: 光电信息技术研究室 (Optoelectronic Information Technology Department)
Thesis Advisor: 赵怀慈
Keywords: autonomous driving; monocular 3D object detection; stereo 3D object detection; multimodal fusion 3D object detection; differentiable geometric constraints
Pages: 106
Degree Discipline: Pattern Recognition and Intelligent Systems
Degree Name: Doctorate (博士)
Date: 2021-05-24
Degree Grantor: 中国科学院沈阳自动化研究所 (Shenyang Institute of Automation, Chinese Academy of Sciences)
Place of Conferral: Shenyang (沈阳)
Abstract: This thesis takes 3D object detection in autonomous-driving scenes as its research setting. Building on projective geometry and deep learning, it establishes the mapping between images and the 3D world and predicts 3D object information in the scene. The main contributions are as follows:

(1) Monocular 3D object detection based on dual quadrics. To address the weak spatial perception of purely deep-learning methods and the ambiguity in how linear (first-degree) 3D object representations map to the image, the object is given a more compact representation: a quadric surface (an ellipsoid) in the 3D scene and a conic (an ellipse) in the image, which supplies a strong surface-to-curve geometric constraint. This constraint is then formulated as a nonlinear optimization problem in dual space, so stable and accurate 3D object parameters can be recovered by adding only two extra perception branches to an existing 2D detection network. Experiments show that the quadric (second-degree) representation effectively resolves the projection ambiguity and improves final detection accuracy, while extending the geometric perception ability of learning-based methods.

(2) Real-time monocular 3D object detection based on keypoints. A 2D bounding box offers only four weak geometric constraints for 3D detection, and anchor-based two-stage detectors are slow. The proposed method predicts the nine projected keypoints of the 3D bounding box in image space, then uses the inverse geometric constraints of 3D perspective projection to optimize the object's 3D size, position, and orientation. Even under heavy keypoint-estimation noise it predicts object parameters stably. To predict the keypoints, an anchor-free one-stage detector is proposed, so a small network achieves fast detection. Experiments show that the method detects 3D objects accurately without any extra instance-segmentation labels, 3D model priors, or depth maps. It is also the first real-time monocular 3D detection system (FPS > 24) and achieves state-of-the-art performance on existing benchmarks.

(3) Monocular 3D object detection with differentiable geometric embedding and semi-supervised training. To exploit the strong semantic perception of learning methods together with the strong spatial computation of geometric methods, deep neural networks and geometric constraints are combined to jointly estimate appearance- and space-related information. A one-stage fully convolutional model predicts the object's keypoints, dimensions, and orientation, and these are combined with perspective-projection geometric constraints to compute its position. The perspective-projection constraints are further recast in a differentiable form and embedded into network learning, which reduces running time while end-to-end training keeps the model outputs consistent. In addition, an effective semi-supervised training strategy is proposed for when labeled training data is scarce. Experiments show large simultaneous gains in efficiency and accuracy; with only 13% of the labeled data the method matches the performance of most previous fully supervised methods.

(4) Real-time stereo 3D object detection based on a 4D feature-consistency embedding space. Pseudo-LiDAR point-cloud methods for stereo detection depend heavily on a depth predictor, which hurts both efficiency and accuracy. A new 4D feature-consistency embedding space is therefore designed as an intermediate representation of the 3D scene. It needs no depth map as supervision and encodes object contour information purely by exploring the multi-scale feature consistency of the stereo image pair. The 3D detection network operates directly in this space, and the error gradients of the final 3D parameters flow back to the model parameters closest to the image input, so detection is both fast and accurate in a fully end-to-end manner. Experiments show it is the first real-time stereo 3D detection system (FPS > 24), while also improving average precision over current state-of-the-art methods.

(5) 3D object detection fusing monocular and stereo cameras with LiDAR point clouds. Two problems are addressed: the receptive field of point-cloud feature extraction is misaligned with the 2D receptive field of the image, and introducing image information too early makes it hard to apply LiDAR and image data augmentation synchronously. The first is solved by extending 3D set feature extraction to 2D set feature extraction, unifying the receptive fields of the multimodal data. For the second, a new two-stage 3D detection framework adds two auxiliary networks that learn image features and point-cloud features separately. Experiments show that the method effectively fuses multimodal information and exploits the strengths of each sensor to achieve full-scene perception.
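The ellipsoid-to-ellipse constraint in contribution (1) follows the classical dual-quadric projection from multiple-view geometry: an ellipsoid written as a dual quadric Q* projects to a dual conic C* = P Q* P^T under the camera matrix P = K[R|t]. The sketch below shows only this projection step, not the thesis's dual-space optimization or its network branches; the intrinsics and ellipsoid pose are made-up example values.

```python
# Dual-quadric projection: an ellipsoid (dual quadric Q*) maps to the
# dual conic C* = P Q* P^T of its image ellipse. Illustrative values only.
import numpy as np

def dual_ellipsoid(center, axes):
    """Dual quadric (4x4, symmetric) of an axis-aligned ellipsoid."""
    Q = np.diag([axes[0]**2, axes[1]**2, axes[2]**2, -1.0])
    T = np.eye(4)
    T[:3, 3] = center                  # move the ellipsoid to its center
    return T @ Q @ T.T                 # dual quadrics transform as T Q* T^T

K = np.array([[721.5,   0.0, 609.6],  # example KITTI-like intrinsics
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # camera at the origin

Q_star = dual_ellipsoid(center=[1.0, 0.5, 10.0], axes=[0.9, 0.8, 2.0])
C_star = P @ Q_star @ P.T             # 3x3 dual conic of the projected ellipse
C = np.linalg.inv(C_star)             # primal conic: x^T C x = 0 on the ellipse
print(C / C[2, 2])
```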
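Contribution (2) regresses the nine projected keypoints of the 3D box, i.e. the eight corners plus the 3D center, and inverts the perspective constraints to recover size, position, and orientation. Below is a minimal sketch of the forward direction only (how the nine keypoints arise from box parameters) under a KITTI-style convention with the y-axis pointing down and the box origin on the bottom face; the function name and conventions are illustrative assumptions.

```python
# Generate the nine 2D keypoints (8 corners + 3D center) of a 3D box.
import numpy as np

def project_box_keypoints(K, dims, location, ry):
    """dims = (h, w, l); location = bottom-center of the box in camera coords."""
    h, w, l = dims
    x = np.array([ l,  l,  l,  l, -l, -l, -l, -l]) / 2.0
    y = np.array([0., 0., -h, -h, 0., 0., -h, -h])
    z = np.array([ w, -w,  w, -w,  w, -w,  w, -w]) / 2.0
    corners = np.vstack([x, y, z])                   # 3 x 8, box frame
    R = np.array([[ np.cos(ry), 0., np.sin(ry)],
                  [ 0.,         1., 0.        ],
                  [-np.sin(ry), 0., np.cos(ry)]])    # yaw about the y-axis
    t = np.asarray(location, dtype=float).reshape(3, 1)
    pts = R @ corners + t                            # corners in camera frame
    center = t + np.array([[0.], [-h / 2.0], [0.]])  # 3D box center
    uvw = K @ np.hstack([pts, center])               # project all 9 points
    return (uvw[:2] / uvw[2]).T                      # 9 x (u, v) keypoints
```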
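Contribution (3) embeds the perspective-projection constraint into the network as a differentiable module, so position follows from the other predictions inside the computation graph. One common way to realize this, shown purely as a hedged sketch, is to express the translation as the solution of a linear least-squares system built from the predicted 2D keypoints and the 3D corner offsets implied by the predicted dimensions and orientation; a differentiable solve lets errors on the recovered translation backpropagate into the network. The function name, shapes, and exact constraint set here are assumptions, not the thesis's definition.

```python
# Differentiable recovery of translation T from keypoints via least squares.
import torch

def solve_translation(kpts_uv, offsets, K):
    """kpts_uv: N x 2 predicted keypoints; offsets: N x 3 corner offsets
    (from predicted dims/orientation) in camera axes; K: 3 x 3 intrinsics."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = kpts_uv[:, 0], kpts_uv[:, 1]
    ox, oy, oz = offsets[:, 0], offsets[:, 1], offsets[:, 2]
    zeros = torch.zeros_like(u)
    # u*(oz+Tz) = fx*(ox+Tx) + cx*(oz+Tz)  =>  fx*Tx + (cx-u)*Tz = (u-cx)*oz - fx*ox
    A = torch.cat([torch.stack([fx.expand_as(u), zeros, cx - u], dim=1),
                   torch.stack([zeros, fy.expand_as(v), cy - v], dim=1)])
    b = torch.cat([(u - cx) * oz - fx * ox,
                   (v - cy) * oz - fy * oy])
    # normal equations keep the solve differentiable for full-rank A
    return torch.linalg.solve(A.T @ A, A.T @ b)      # (Tx, Ty, Tz)
```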
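The 4D feature-consistency embedding space of contribution (4) can be pictured as a stereo consistency volume: for each disparity hypothesis, the left features are compared with the correspondingly shifted right features, yielding a disparity × height × width map per feature scale without any depth-map supervision. The correlation used below as the consistency measure, and all names, are illustrative assumptions rather than the thesis's exact construction.

```python
# Build a feature-consistency volume from a stereo feature pair.
import torch

def feature_consistency_volume(feat_l, feat_r, max_disp):
    """feat_l, feat_r: B x C x H x W stereo feature maps."""
    B, C, H, W = feat_l.shape
    volume = feat_l.new_zeros(B, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_l * feat_r).mean(1)
        else:  # shift right features by d pixels, then correlate channel-wise
            volume[:, d, :, d:] = (feat_l[..., d:] * feat_r[..., :-d]).mean(1)
    return volume   # B x D x H x W; the 3D detector operates on this space
```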
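For contribution (5), one step consistent with aligning the multimodal receptive fields is to project each LiDAR point into the image and sample the image feature at that pixel, so every point carries an image feature gathered at its own location. This is a simplified, hypothetical sketch: calibration handling and the thesis's 2D set feature extraction are reduced to a plain projection plus bilinear sampling, and points are assumed to lie in front of the camera.

```python
# Gather per-point image features by projecting LiDAR points into the image.
import torch
import torch.nn.functional as F

def gather_image_features(points, img_feat, K, T_cam_lidar, img_hw):
    """points: N x 3 LiDAR xyz; img_feat: 1 x C x Hf x Wf feature map;
    K: 3 x 3 intrinsics; T_cam_lidar: 4 x 4 extrinsics; img_hw: (H, W)."""
    N = points.shape[0]
    pts_h = torch.cat([points, points.new_ones(N, 1)], dim=1)  # homogeneous
    cam = (T_cam_lidar @ pts_h.T)[:3]                          # 3 x N, camera frame
    uvw = K @ cam
    uv = uvw[:2] / uvw[2].clamp(min=1e-6)                      # pixel coordinates
    H, W = img_hw
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1,               # normalize to [-1, 1]
                        uv[1] / (H - 1) * 2 - 1], dim=1).view(1, N, 1, 2)
    feats = F.grid_sample(img_feat, grid, align_corners=True)  # 1 x C x N x 1
    return feats[0, :, :, 0].T                                 # N x C point-wise features
```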
Language: Chinese
Contribution Rank: 1
Document Type: Dissertation (学位论文)
Identifier: http://ir.sia.cn/handle/173321/28995
Collection: 光电信息技术研究室
Affiliation: 中国科学院沈阳自动化研究所
Recommended Citation (GB/T 7714):
李培玄. 自动驾驶场景中的三维目标检测方法研究[D]. 沈阳: 中国科学院沈阳自动化研究所, 2021.
Files in This Item:
File Name: 自动驾驶场景中的三维目标检测方法研究.p (5715 KB) | DocType: 学位论文 (dissertation) | Access: Open Access | License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.