Institutional Repository of the Shenyang Institute of Automation, Chinese Academy of Sciences
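Title: Accelerated K-means clustering algorithm for big datasets with many cluster centers (针对多聚类中心大数据集的加速K-means聚类算法)
Alternative title: Accelerate K-means for multi-center clustering of big datasets
Authors: Zhang Shunlong (张顺龙); Ku Tao (库涛); Zhou Hao (周浩)
Author department: Information Service and Intelligent Control Technology Laboratory
Keywords: DIACK; accelerated K-means; clustering; triangle inequality
Journal: Application Research of Computers (计算机应用研究)
ISSN: 1001-3695
Publication date: 2016
Volume: 33, Issue: 2, Pages: 413-416
Indexed by: CSCD
Contribution rank: 1
Funding: National Science and Technology Support Program of China (2012BAH15F05); Jilin Province Technology Innovation Fund for Science and Technology-based Small and Medium Enterprises (12C26212201399); National Natural Science Foundation of China (612033161, 51205389)
Abstract: As data volume and dimensionality grow exponentially and the number of cluster centers in practical applications increases, the traditional K-means clustering algorithm can no longer meet the time and memory requirements of real applications. To address this problem, the paper proposes an accelerated K-means clustering algorithm based on dynamic cluster-center adjustment and Elkan's triangle-inequality test. Experimental results show that when the dataset reaches 100,000 records and the number of clusters exceeds 20, the proposed algorithm converges faster and uses less memory than Elkan's algorithm.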
English abstract: K-means is the most popular clustering algorithm, but for big datasets with many clusters it takes a long time to find all the clusters. This paper proposes a new acceleration method based on the idea of dynamic and immediate adjustment of the K-means centers, combined with the triangle inequality. The triangle inequality is used to avoid redundant distance computations. Unlike Elkan's algorithm, however, the centers are first divided into outer-centers and inner-centers for each data point, and only the lower bounds to the inner-centers are tracked. In addition, by adjusting the data points cluster by cluster and updating each cluster center immediately after that cluster's adjustment finishes, the number of iterations is effectively reduced. The experimental results show that the algorithm runs much faster than Elkan's algorithm, with much less memory consumption, when the number of cluster centers is larger than 20 and the number of dataset records is greater than 10 million, and the speedup improves as k increases.
Language: Chinese
Content type: Journal article
URI标识: http://ir.sia.cn/handle/173321/17319
Appears in Collections: Information Service and Intelligent Control Technology Laboratory_Journal Articles
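
The abstract states that the triangle inequality is used to avoid redundant distance computations. The following is a minimal Python sketch of that general test only, assuming Euclidean distance: if d(c_best, c_j) >= 2 * d(p, c_best), then c_j cannot be closer to p than c_best, so d(p, c_j) need not be computed. It is not the paper's DIACK method: it keeps no bounds across iterations, has no inner-center/outer-center split, and does not update centers. The function names (`dist`, `assign_with_pruning`, `assign_brute_force`) are hypothetical and chosen for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that the triangle inequality proves unnecessary.

    Returns (labels, number of point-to-center distances computed).
    """
    k = len(centers)
    # Center-to-center distances are computed once per assignment pass.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, computed = [], 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        computed += 1
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(p, c_best), then
            # d(p, c_j) >= d(c_best, c_j) - d(p, c_best) >= d(p, c_best),
            # so c_j cannot be strictly closer and is skipped.
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = dist(p, centers[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


def assign_brute_force(points, centers):
    """Reference assignment that computes every point-to-center distance."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


if __name__ == "__main__":
    # Illustrative data only: five well-separated centers with noisy points.
    random.seed(0)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (20.0, 20.0)]
    points = [(cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5))
              for cx, cy in centers for _ in range(40)]
    labels, computed = assign_with_pruning(points, centers)
    print("distances computed:", computed, "of", len(points) * len(centers))
```

The saving from this test grows when centers are far apart relative to the spread of each cluster, which is consistent with the abstract's observation that the benefit increases with the number of clusters.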

Files in This Item:
File name: 针对多聚类中心大数据集的加速K_means聚类算法.pdf; File size: 340KB; Content type: Journal article; Version: Author's accepted manuscript; Access: Open access

Recommended Citation:
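Zhang Shunlong, Ku Tao, Zhou Hao. Accelerated K-means clustering algorithm for big datasets with many cluster centers [J]. Application Research of Computers, 2016, 33(2): 413-416.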
File name: 针对多聚类中心大数据集的加速K_means聚类算法.pdf
Format: Adobe PDF
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

Copyright © 2007-2016 Shenyang Institute of Automation, Chinese Academy of Sciences