面向自然语言理解的图像语义分析方法研究 (Research on Image Semantic Analysis Methods for Natural Language Understanding)
Alternative Title: Research on Semantic Analysis Method of Image based on Natural Language Understanding
Author: 温亚 (Wen Ya), affiliations 1 and 2
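Department: Digital Factory Laboratory (数字工厂研究室)
Thesis Advisor: 南琳 (Nan Lin)
Classification: TP391.41
Keyword: image description; computer vision; natural language understanding; deep learning; multimodal learning
Call Number: TP391.41/W59/2017
Pages: 59
Degree Discipline: Mechanical Manufacturing and Automation
Degree Name: Master's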
2017-05-24
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences (中国科学院沈阳自动化研究所)
Place of Conferral: Shenyang
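Abstract: Automatically generating image descriptions bridges two fields, computer vision and natural language processing, and has long been a goal of image understanding and artificial intelligence. The task requires not only a deeper understanding of image semantics but also the ability to generate appropriate natural language to express them. In recent years, with growing computing power, richer data resources, and advances in deep learning, the task has made great progress, but many problems and challenges remain unsolved. This thesis studies the problems of automatic image description generation comprehensively. First, it describes related techniques from the vision and language fields, such as deep learning, language understanding, and multimodal learning. Second, it presents highly representative methods for the task in detail. Third, building on a baseline model, we improve the model from two different angles. (1) We develop a deep bidirectional gated recurrent unit (GRU) image description model that attempts to fully mine the deeper semantics of the text description in the decoding stage. (2) We propose a bidirectionally guided image description generation model: in the image encoding stage, text information is added to guide image filtering; in the text decoding stage, image attribute information is added to guide language generation. This lets the model extract the key information in both the image and the text more fully and reduces the impact of imbalance in information conversion. Finally, we evaluate the improved models on the public MSCOCO benchmark. Under both common evaluation metrics such as BLEU and METEOR and other human evaluation metrics, the proposed methods improve fairly significantly over existing related work, which strongly supports the effectiveness of the models.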
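The abstract names a deep bidirectional GRU model but gives no architectural details. The sketch below is a generic illustration only, not the thesis's model. It implements a standard GRU cell (Cho et al., 2014) in plain Python. It then runs one forward and one backward GRU over a sequence of token embeddings and concatenates their hidden states at each time step. Finally, it stacks two such layers to make the network "deep". Several details are assumptions made for illustration:

- the helper names (`GRUCell`, `run`, `bidirectional_gru`, `deep_bigru`);
- the random weights and the tiny dimensions;
- the update convention `h' = (1 - z) * h + z * h_tilde`, where some libraries swap the roles of `z` and `1 - z`.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    # Matrix (list of rows) times vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vectors):
    # Element-wise sum of equally sized vectors.
    return [sum(xs) for xs in zip(*vectors)]


class GRUCell:
    """Single GRU cell; assumed update rule h' = (1 - z) * h + z * h_tilde."""

    def __init__(self, input_size, hidden_size, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One input matrix W, recurrent matrix U and bias b per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: mat(hidden_size, input_size) for g in "zrh"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrh"}
        self.b = {g: [0.0] * hidden_size for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(v) for v in add(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(v) for v in add(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
        h_tilde = [math.tanh(v) for v in add(matvec(self.W["h"], x), matvec(self.U["h"], rh), self.b["h"])]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]


def run(cell, xs):
    # Unroll a cell over a sequence from a zero initial state; return every hidden state.
    h = [0.0] * cell.hidden_size
    states = []
    for x in xs:
        h = cell.step(x, h)
        states.append(h)
    return states


def bidirectional_gru(fwd, bwd, xs):
    forward = run(fwd, xs)
    backward = run(bwd, xs[::-1])[::-1]  # reverse back so index t matches token t
    return [f + b for f, b in zip(forward, backward)]  # concatenate both directions


def deep_bigru(layers, xs):
    # Stack bidirectional layers; each layer reads the previous layer's outputs.
    for fwd, bwd in layers:
        xs = bidirectional_gru(fwd, bwd, xs)
    return xs


rng = random.Random(0)
embed_dim, hidden = 4, 3
# Hypothetical embeddings for a 5-token caption (random stand-ins).
caption = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(5)]
fwd, bwd = GRUCell(embed_dim, hidden, rng), GRUCell(embed_dim, hidden, rng)
states = bidirectional_gru(fwd, bwd, caption)
print(len(states), len(states[0]))  # -> 5 6

layers = [
    (GRUCell(embed_dim, hidden, rng), GRUCell(embed_dim, hidden, rng)),
    (GRUCell(2 * hidden, hidden, rng), GRUCell(2 * hidden, hidden, rng)),
]
deep_states = deep_bigru(layers, caption)
```

Each output vector has twice the hidden size because it joins a left-to-right and a right-to-left summary of the caption. This record does not say how the thesis combines the two directions during generation.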
Other Abstract: Automatically generating image descriptions connects computer vision and natural language processing and has long been a goal of image understanding and artificial intelligence. It requires not only a deeper understanding of image semantics but also the generation of natural language to express them. In recent years, with the improvement of computing power, richer data resources, and the development of deep learning, the task has made great progress, but many unresolved problems and challenges remain. This thesis comprehensively studies the problems of automatic image description generation. Firstly, it explains related technologies from the vision and language fields, such as deep learning, language understanding, and multimodal learning. Secondly, representative methods for solving the task are described in detail. Furthermore, building on the baseline model, we improve the model from two different perspectives. First, we develop a deep bidirectional Gated Recurrent Unit image description model, which tries to mine the deeper semantics of the description text in the decoding phase. Second, we propose a bidirectionally guided image description generation model. In the image encoding phase, text information is added to guide image filtering. In the text decoding phase, image attribute information is added to guide language generation. This lets the model extract the key information of both image and text more fully and weakens the impact of imbalanced information conversion. Finally, the improved models are evaluated on the public MSCOCO dataset. The proposed methods achieve significant improvements over existing work, both on common evaluation metrics such as BLEU and METEOR and on other human evaluation metrics, which validates the effectiveness of the models.
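The evaluation above reports BLEU. As an illustration of what that metric computes, the sketch below implements corpus-level BLEU following the standard definition of Papineni et al. (2002). It uses clipped n-gram precisions, their geometric mean with uniform weights, and a brevity penalty based on the closest reference length. It applies no smoothing, and the helper names and whitespace tokenization are assumptions. The MSCOCO caption evaluation code may differ in tokenization and other details, so scores from this sketch are not comparable to the numbers reported in the thesis.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """candidates: list of token lists; references: list of lists of token lists."""
    clipped = [0] * max_n  # matched n-grams, clipped by reference counts
    totals = [0] * max_n   # candidate n-grams
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length (ties -> shorter).
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)  # per-n-gram maximum over references
            cand_counts = ngram_counts(cand, n)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += max(len(cand) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # geometric mean is zero without smoothing
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)


ref = "a man is riding a horse on the beach".split()
short = "a man riding a horse".split()
print(corpus_bleu([ref], [[ref]]))             # -> 1.0
print(corpus_bleu([short], [[ref]]))           # -> 0.0 (no matching 4-gram)
print(corpus_bleu([short], [[ref]], max_n=2))  # BLEU-2, reduced by the brevity penalty
```

The second call shows why unsmoothed BLEU-4 is usually reported at corpus level: a single short caption with no matching 4-gram scores zero.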
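Language: Chinese
Contribution Rank: 1
Document Type: Thesis
Identifier: http://ir.sia.cn/handle/173321/20526
Collection: Digital Factory Laboratory
Affiliation: 1. Shenyang Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences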
Recommended Citation
GB/T 7714
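Wen Ya (温亚). Research on Image Semantic Analysis Methods for Natural Language Understanding (面向自然语言理解的图像语义分析方法研究)[D]. Shenyang: Shenyang Institute of Automation, Chinese Academy of Sciences, 2017.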
Files in This Item:
File Name/Size: 面向自然语言理解的图像语义分析方法研究.(2121KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.