高维数据分析中的密度聚类算法的研究
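Alternative Title: Research on Density Clustering Algorithms in High-dimensional Data Analysis
Author: 张涛 (Zhang Tao) 1,2
Department: Digital Factory Research Laboratory (数字工厂研究室)
Thesis Advisor: 刘昶 (Liu Chang)
Classification: TP311.13
Keyword: clustering; high-dimensional data; density; core point; self-adaptive
Call Number: TP311.13/Z34/2018
Pages: 73
Degree Discipline: Control Engineering
Degree Name: Master's
2018-05-17
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang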
Abstract: To address the inability of current clustering algorithms to handle fuzzy boundary points effectively, this thesis proposes RDBSCAN (Real-density-Based Spatial Clustering of Applications with Noise), a clustering algorithm based on real core points. The concept of a real core point is introduced. First, the core points found during density clustering are further classified: pseudo core points that degrade the clustering result are removed, and the remaining real core points are clustered according to the density-reachability principle. Second, a density merging decision theorem is proposed: the real density of points within the same cluster is much greater than that of points in different clusters. This theorem guides the evaluation of the real density of real core points, so that points within a cluster are more similar to one another. Clustering experiments on artificial datasets and UCI datasets show that RDBSCAN reduces the interference of fuzzy boundary points, yields several novel cluster partitions, and clusters more accurately on datasets with irregular density.
To address the problem that current clustering algorithms have too many parameters that interfere with one another, the thesis proposes ACBD (automatic clustering based on density), a parameter-free density clustering algorithm. First, a parameter-free density calculation suited to high-dimensional data is designed: the mean of the distances from all points to their nearest points is computed from the scale and characteristics of the dataset, and this value is used as the parameter for computing the density of each point, which characterizes the density of high-dimensional data well. Second, a new adaptive neighborhood definition is given that determines the neighborhood radius automatically from the data. Finally, a neighborhood search clustering method is proposed: several density centers are selected from the decision graph, and starting from each density center in turn, core points are searched for within the neighborhood until the neighborhood contains no more core points. Clustering experiments on artificial datasets and UCI datasets show that ACBD requires no manual setting or testing of parameters and achieves high clustering accuracy. It also achieves high clustering accuracy on high-dimensional data such as handwritten digit recognition and face recognition, making it an effective, simple, and practical clustering algorithm.
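The parameter-free density step described in the abstract can be illustrated with a minimal sketch. The abstract states only that the mean nearest-neighbor distance is used as the parameter for the density calculation; it does not give the density formula. The Gaussian kernel below is therefore an assumption, and the function names `mean_nearest_neighbor_distance` and `gaussian_densities` are hypothetical, not taken from the thesis.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def mean_nearest_neighbor_distance(points: List[Point]) -> float:
    """Mean, over all points, of the distance to the closest other point."""
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(nearest) / len(nearest)


def gaussian_densities(points: List[Point], scale: float) -> List[float]:
    """Density of each point as a sum of Gaussian weights over all other points.

    ASSUMPTION: the Gaussian kernel is illustrative only; the abstract does
    not specify which density function ACBD uses.
    """
    return [
        sum(
            math.exp(-((math.dist(p, q) / scale) ** 2))
            for j, q in enumerate(points)
            if j != i
        )
        for i, p in enumerate(points)
    ]


# Toy data: two close points and one distant point on a line.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
scale = mean_nearest_neighbor_distance(points)  # derived from the data, not user-set
densities = gaussian_densities(points, scale)
print(scale)
print(densities)
```

In this sketch the scale is derived entirely from the data, so no radius has to be chosen by hand. It needs at least two points, and the points must not all be duplicates of one another, since the scale would then be zero.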
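Language: Chinese
Contribution Rank: 1
Document Type: Thesis
Identifier: http://ir.sia.cn/handle/173321/21797
Collection: Digital Factory Research Laboratory (数字工厂研究室)
Affiliation: 1. Shenyang Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences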
Recommended Citation
GB/T 7714
张涛. 高维数据分析中的密度聚类算法的研究[D]. 沈阳: 中国科学院沈阳自动化研究所, 2018.
Files in This Item:
File Name/Size: 高维数据分析中的密度聚类算法的研究.pd (1481KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.