Title: Research on Density Clustering Algorithm in High-dimensional Data Analysis
Original title: 高维数据分析中的密度聚类算法的研究
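Author: Zhang Tao (张涛) 1,2
Advisor: Liu Chang (刘昶)
Classification number: TP311.13
Keywords: clustering; high-dimensional data; density; core point; adaptive
Call number: TP311.13/Z34/2018
Pages: 73
Degree discipline: Control Engineering
Degree: Master's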
2018-05-17
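Degree-granting institution: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of degree conferral: Shenyang
Author's department: Digital Factory Laboratory (数字工厂研究室)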
Abstract: To address the inability of current clustering algorithms to handle fuzzy boundary points effectively, this thesis proposes RDBSCAN (Real-density-Based Spatial Clustering of Applications with Noise), a clustering algorithm based on real core points. It introduces the concept of a real core point. First, the core points found during density clustering are further examined and classified: pseudo core points, which degrade the clustering result, are removed, and the remaining real core points are clustered according to the density-reachability principle. Second, a density merging decision theorem is proposed: the real density of points within the same cluster is much greater than that of points in different clusters. This theorem guides the evaluation of the real density of real core points, so that points within a cluster are more similar to one another. Experiments on artificial datasets and UCI datasets show that RDBSCAN reduces the interference of fuzzy boundary points, produces several novel cluster partitions, and clusters more accurately on datasets of irregular density.
To address the problem that current clustering algorithms have too many parameters that interfere with one another, the thesis proposes ACBD (automatic clustering based on density), a parameter-free density clustering algorithm. First, a parameter-free density calculation suited to high-dimensional data is designed: the average distance from each point to its nearest point is computed from the scale and characteristics of the dataset, and this value is used as the parameter for calculating the density of each point, which characterizes the density of high-dimensional data well. Second, a new adaptive neighborhood definition is given, in which the neighborhood radius is determined automatically from the data. Finally, a neighborhood search clustering method is proposed: several density centers are selected from the decision graph and, starting from each density center in turn, core points are searched for within the neighborhood until the neighborhood contains no more core points. Experiments on artificial datasets and UCI datasets show that ACBD requires no manual setting or testing of parameters and achieves high clustering accuracy. It also achieves high clustering accuracy on high-dimensional data such as handwritten digit recognition and face recognition, making it a simple, effective, and practical clustering algorithm.
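The parameter-free density step described in the abstract can be illustrated with a minimal Python sketch. The record does not give the thesis's actual density formula, so the Gaussian kernel, the function names, and the toy points below are assumptions made for illustration only. The one element taken from the abstract is the use of the mean nearest-neighbor distance as the data-derived scale.

```python
import numpy as np


def mean_nearest_neighbor_distance(points: np.ndarray) -> float:
    """Average, over all points, of the distance to the nearest other point."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore each point's zero distance to itself
    return float(dists.min(axis=1).mean())


def gaussian_density(points: np.ndarray, scale: float) -> np.ndarray:
    """Assumed kernel (not from the thesis): rho_i = sum_{j != i} exp(-(d_ij / scale)^2)."""
    diffs = points[:, None, :] - points[None, :, :]
    squared = (diffs ** 2).sum(axis=-1)
    kernel = np.exp(-squared / scale ** 2)
    np.fill_diagonal(kernel, 0.0)  # a point does not contribute to its own density
    return kernel.sum(axis=1)


# Hypothetical toy data: four points on a unit square plus one distant point.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
scale = mean_nearest_neighbor_distance(points)  # derived from the data, not set by hand
density = gaussian_density(points, scale)
print(scale, density)
```

In this sketch no radius or cutoff is chosen by hand: the scale comes from the data, and the isolated point ends up with the lowest density. The pairwise distance matrix needs O(n²) memory, so a nearest-neighbor index would be needed for large datasets.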
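Language: Chinese
Property rights ranking: 1
Document type: Thesis
Item identifier: http://ir.sia.cn/handle/173321/21797
Collection: Digital Factory Laboratory (数字工厂研究室)
Author affiliations: 1. Shenyang Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
Recommended citation
GB/T 7714
Zhang Tao. Research on Density Clustering Algorithm in High-dimensional Data Analysis [D]. Shenyang: Shenyang Institute of Automation, Chinese Academy of Sciences, 2018.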
Files in this item
File name/size; Document type; Version type; Access type; License
高维数据分析中的密度聚类算法的研究.pd (1481KB); Thesis; (not stated); Open access; CC BY-NC-SA