针对多聚类中心大数据集的加速K-means聚类算法
Alternative Title: Accelerate K-means for multi-center clustering of big datasets
张顺龙; 库涛; 周浩
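Department: 信息服务与智能控制技术研究室 (Information Service and Intelligent Control Technology Laboratory)
Source Publication: 计算机应用研究 (Application Research of Computers)
ISSN: 1001-3695
Year: 2016
Volume: 33, Issue: 2, Pages: 413-416
Indexed By: CSCD
CSCD ID: CSCD:5629682
Contribution Rank: 1
Funding Organization: National Science and Technology Support Program of China (2012BAH15F05); Jilin Province Technology Innovation Fund for Technology-Based SMEs (12C26212201399); National Natural Science Foundation of China (612033161, 51205389)
Keyword: Diack; accelerated k-means; clustering; triangle inequality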
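Abstract: As data volume and dimensionality grow exponentially and the number of cluster centers in practical applications increases, the traditional K-means clustering algorithm can no longer meet the time and memory requirements of real applications. To address this problem, this paper proposes an accelerated K-means clustering algorithm based on dynamic cluster-center adjustment and Elkan's triangle-inequality test. Experimental results show that when the dataset reaches 100,000 records and the number of clusters exceeds 20, the proposed algorithm converges faster and uses less memory than Elkan's algorithm.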
Other Abstract: K-means is one of the most popular clustering algorithms, but on big datasets with many clusters it takes a long time to find all the clusters. This paper proposes a new acceleration method that combines dynamic, immediate adjustment of the K-means centers with the triangle inequality. The triangle inequality is used to avoid redundant distance computations. Unlike Elkan's algorithm, however, the centers are first divided into outer-centers and inner-centers for each data point, and lower bounds are tracked only for the inner-centers. In addition, by adjusting the data points cluster by cluster and updating each cluster center immediately after that cluster's adjustment finishes, the number of iterations is effectively reduced. The experimental results show that the algorithm runs much faster than Elkan's algorithm, with much lower memory consumption, when the number of cluster centers is larger than 20 and the dataset contains more than 100,000 records, and that the speedup improves as k increases.
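The triangle-inequality test mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm (it has no inner/outer-center split, no stored lower bounds, and no immediate center updates). It only shows the basic pruning rule from Elkan's method: if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so the distance from x to c' need not be computed. The function name `assign_with_pruning` and its parameters are hypothetical, chosen for illustration.

```python
import math

def assign_with_pruning(points, centers):
    """Assign each point to its nearest center (illustrative sketch only).

    Uses the triangle inequality to skip distance computations:
    if d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
    so c_j cannot be strictly closer than the current best center.
    Returns (labels, number_of_skipped_distance_computations).
    """
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels = []
    skipped = 0
    for x in points:
        best, best_d = 0, math.dist(x, centers[0])
        for j in range(1, k):
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1  # pruned: no point-to-center distance needed
                continue
            d = math.dist(x, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

The pruning pays off most when clusters are well separated and k is large, since each point then rules out most centers after finding a nearby one, which is consistent with the abstract's remark that the speedup grows with k.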
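Language: Chinese
Citation statistics
Cited Times: 7 [CSCD]
Document Type: Journal article
Identifier: http://ir.sia.cn/handle/173321/17319
Collection: 信息服务与智能控制技术研究室 (Information Service and Intelligent Control Technology Laboratory)
Corresponding Author: 张顺龙
Affiliation:
1. Shenyang Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
3. 吉化集团吉林市软信技术有限公司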
Recommended Citation
GB/T 7714
张顺龙,库涛,周浩. 针对多聚类中心大数据集的加速K-means聚类算法[J]. 计算机应用研究,2016,33(2):413-416.
APA 张顺龙,库涛,&周浩.(2016).针对多聚类中心大数据集的加速K-means聚类算法.计算机应用研究,33(2),413-416.
MLA 张顺龙,et al."针对多聚类中心大数据集的加速K-means聚类算法".计算机应用研究 33.2(2016):413-416.
Files in This Item:
File Name/Size: 针对多聚类中心大数据集的加速K_mean (340KB)
DocType: Journal article
Version: Author's accepted manuscript
Access: Open access
License: ODC PDDL
File name: 针对多聚类中心大数据集的加速K_means聚类算法.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.