Institutional Repository of the Shenyang Institute of Automation, Chinese Academy of Sciences
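Title: 基于机器学习的生物医学数据处理方法研究 (Research on Machine Learning-Based Methods for Biomedical Data Processing)
Alternative title: Machine Learning Approaches to Biomedical Data Analysis
Author: 杨秀锋 (Yang Xiufeng)
Advisor: 彭慧 (Peng Hui)
Classification number: TP18
Keywords: machine learning; biomedical data analysis; kernel function; dimensionality reduction; supervised learning
Call number: TP18/Y27/2014
Pages: 61
Degree discipline: Control Engineering
Degree type: Master's
Defense date: 2014-05-28
Degree-granting institution: Shenyang Institute of Automation, Chinese Academy of Sciences
Author's department: 数字工厂研究室 (Digital Factory Laboratory)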
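Abstract (translated from Chinese): The development of biomedical informatics depends heavily on progress in related fields. With the rapid development of information technology, researchers have begun to focus on how to apply advanced information technology to biomedical information research. Machine learning has become a research hotspot in mathematical informatics and computer science and has been applied successfully in many research areas. The main work of this thesis is to develop suitable machine learning techniques and apply them to the analysis of biomedical data.
Dimensionality reduction is an important aspect of machine learning, and manifold learning in particular has attracted considerable attention. Based on the assumptions of local linearity and global nonlinearity, manifold learning algorithms can preserve the intrinsic structure of nonlinear data. For classification tasks, however, traditional manifold learning algorithms have several shortcomings: they are unsupervised, and they suffer from the sample-size problem, the out-of-sample problem, and sensitivity to noise. Since Vapnik proposed the support vector machine (SVM), based on statistical learning theory and the kernel trick, in 1995, kernel methods have become a hot topic in machine learning, and SVMs have been widely applied in image processing, biomedical data analysis, and text classification. This thesis focuses on designing suitable kernel functions for SVMs, on solving the unsupervised-learning problem of the isometric feature mapping (ISOMAP) manifold learning algorithm, and on how to apply machine learning techniques to biomedical data analysis tasks.
1. Classification and visualization of gene expression data must cope with high dimensionality. Traditional ISOMAP cannot be applied to data with multiple class clusters, and it does not yield a mapping matrix from the high-dimensional space to the low-dimensional space. The thesis replaces multidimensional scaling (MDS) with neighborhood components analysis (NCA), introduces feature vectors as the input matrix, and proposes an ISOMAP variant aimed at gene expression data classification (NC-ISOMAP). The method obtains a low-dimensional projection matrix during dimensionality reduction; after reduction, data from different classes are more separated and data within a class are more compact. Experimental results show that NC-ISOMAP outperforms ISOMAP in the visualization and classification of gene expression data.
2. The kernel function is an important nonlinear mapping method and a key reason SVMs are so widely applicable. Research on kernel functions can greatly improve the performance of traditional SVMs. Based on a study of kernel functions, the thesis proposes a multiple-kernel function to improve the generalization and learning ability of SVMs; simulation experiments on biomedical datasets show that the proposed mixed kernel performs better than traditional kernel functions.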
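As background for item 1 of the abstract above: ISOMAP first approximates geodesic distances along the manifold by shortest paths through a nearest-neighbor graph, and classical ISOMAP then embeds those distances with MDS, the step the abstract says NC-ISOMAP replaces with NCA. The following is a minimal sketch of only the shared geodesic-distance step. The function names, the half-circle toy data, and the choice k=2 are illustrative assumptions and do not come from the thesis; the NCA step is not shown because the abstract does not specify how it is wired in.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geodesic_distances(points, k):
    """Approximate geodesic distances via shortest paths on a k-NN graph.

    Builds a symmetric k-nearest-neighbor graph weighted by Euclidean
    distance, then runs Floyd-Warshall to get all-pairs shortest paths.
    Unreachable pairs keep the value float("inf").
    """
    n = len(points)
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]

    for i in range(n):
        # The k nearest neighbors of point i, excluding i itself.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(points[i], points[j]),
        )[:k]
        for j in neighbors:
            d = euclidean(points[i], points[j])
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)  # keep the graph undirected

    # Floyd-Warshall: allow paths through intermediate vertex m.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                through_m = dist[i][m] + dist[m][j]
                if through_m < dist[i][j]:
                    dist[i][j] = through_m
    return dist


# Toy data (assumption): nine points on a unit half circle.
points = [(math.cos(i * math.pi / 8), math.sin(i * math.pi / 8)) for i in range(9)]
geo = geodesic_distances(points, k=2)

# The endpoints are 2.0 apart in a straight line, but the graph path
# follows the arc, so the geodesic estimate is longer.
print(euclidean(points[0], points[-1]), geo[0][-1])
```

On this toy curve the graph distance between the two endpoints follows the arc rather than the chord, which is the property that lets ISOMAP-style methods respect nonlinear structure. Floyd-Warshall is cubic in the number of samples, so this form is only suitable for small illustrations.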
English abstract: The development of computer science and information technology has greatly facilitated the development of biomedical science. As information science advances rapidly, researchers are increasingly considering how to apply information technology and mathematical science to biomedical research. Machine learning has become a hot topic in information technology and mathematical science and has been applied successfully in other related research areas. This dissertation focuses on developing suitable machine learning algorithms and applying them to biomedical data analysis. The thesis mainly discusses two parts: manifold learning algorithms for dimensionality reduction and Support Vector Machines (SVMs) for pattern classification, together with their application in medical data analysis.
Dimensionality reduction is an important aspect of machine learning, and manifold learning has been a hot topic in recent years. Based on the property of local linearity and global nonlinearity, manifold learning has been applied in many research fields, such as face recognition and bioinformatics. Manifold learning is a nonlinear dimensionality reduction approach that can explore and preserve the inherent structure of nonlinearly distributed data. In classification tasks, however, the original manifold learning methods generally show several shortcomings, such as being unsupervised, the sample-size problem, the out-of-sample problem, and sensitivity to noise. Since Vapnik proposed the SVM, based on statistical learning theory and the kernel trick, in 1995, kernel-based machine learning algorithms have developed rapidly. Kernel methods have become a hot topic in academic research and have been widely used in image processing, biomedical information analysis, and text classification.
This thesis mainly focuses on designing suitable kernel functions for SVMs and on solving the unsupervised learning problem of manifold learning in medical data analysis.
1. To improve the classification accuracy of gene expression data and to address the high-dimensionality problem, the thesis proposes an improved ISOMAP for gene expression data visualization and classification, in which Neighborhood Components Analysis (NCA) replaces multidimensional scaling (MDS) in the traditional ISOMAP algorithm. During dimensionality reduction, NC-ISOMAP obtains a low-dimensional projection matrix under which the reduced data become more compact within each class and more separated between classes. Experimental results on several biomedical datasets show that the proposed algorithm achieves better dimensionality reduction and higher classification accuracy than traditional ISOMAP, indicating that the proposed method is effective.
2. The kernel function, one of the most important means of nonlinear mapping, is an essential part of SVMs and a reason for their wide application. An independent discipline, kernel methods, has formed around kernel functions. Research on kernel functions not only improves the usability of SVMs but also supports artificial intelligence and machine learning more broadly. Based on this research, a multiple-kernel function is proposed, motivated by learning problems that involve multiple, heterogeneous data sources. Different kernel functions, or different parameters of those kernel functions, are chosen according to their properties to improve the learning ability and generalization of the kernel, and the validity of the new kernel is proved.
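The abstract does not say which kernels the thesis combines or how they are weighted. As a hedged illustration of the general idea, the sketch below forms a convex combination of an RBF kernel and a polynomial kernel. The kernel choice, the parameter values (`gamma`, `degree`, `coef0`, `weight`), and all function names are assumptions made for this example. A non-negative weighted sum of valid (positive semidefinite) kernels is itself a valid kernel, which is the kind of property a validity proof for a combined kernel relies on.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree, with coef0 >= 0."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def mixed_kernel(x, y, weight=0.7):
    """Convex combination of the RBF and polynomial kernels.

    weight must lie in [0, 1] so that both coefficients are non-negative,
    which keeps the combined kernel positive semidefinite.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * rbf_kernel(x, y) + (1.0 - weight) * poly_kernel(x, y)


def gram_matrix(samples, kernel):
    """Kernel (Gram) matrix K[i][j] = kernel(samples[i], samples[j])."""
    return [[kernel(a, b) for b in samples] for a in samples]


def quadratic_form(matrix, z):
    """Compute z^T K z; non-negative for every z when K is PSD."""
    n = len(z)
    return sum(z[i] * matrix[i][j] * z[j] for i in range(n) for j in range(n))


# Toy samples (assumption): four distinct 2-D points.
samples = [(0.0, 1.0), (1.0, 0.5), (-1.0, 2.0), (0.5, -0.5)]
gram = gram_matrix(samples, mixed_kernel)

# Spot-check positive semidefiniteness on a few test vectors.
for z in [(1, -1, 0, 0), (1, 1, -1, -1), (0.5, -2, 1, 0.3)]:
    print(z, quadratic_form(gram, z))
```

A Gram matrix computed this way can be passed to an SVM implementation that accepts precomputed kernels. The spot check on a few vectors is only a sanity test, not a proof of validity.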
Language: Chinese
Rights-holder ranking: 1
Content type: Thesis
URI: http://ir.sia.cn/handle/173321/14787
Appears in Collections: 数字工厂研究室_学位论文 (Digital Factory Laboratory, Theses)

Files in This Item:
File Name/ File Size Content Type Version Access License
基于机器学习的生物医学数据处理方法研究.pdf (1573KB) ---- Restricted access (contact the repository to obtain the full text)

Recommended Citation:
杨秀锋 (Yang Xiufeng). 基于机器学习的生物医学数据处理方法研究 [Master's thesis]. Shenyang Institute of Automation, Chinese Academy of Sciences. 2014.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
