改进集成学习算法在电商推荐中的研究与应用
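Alternative title: Research and Application of an Improved Ensemble Learning Algorithm in E-commerce Recommendation
Sun Jingzhe (孙靖哲) 1,2
Advisor: Shi Gang (石刚)
Classification number: TP181
Keywords: ensemble learning; recommender systems; gradient boosting decision tree; feature engineering
Call number: TP181/S96/2018
Pages: 61
Degree discipline: Detection Technology and Automatic Equipment
Degree: Master's
2018-05-17
Degree-granting institution: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of conferral: Shenyang
Author's department: Industrial Control Networks and Systems Research Laboratory (工业控制网络与系统研究室)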
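Abstract

As the largest e-commerce company in China and even worldwide, Alibaba released a batch of real user-behavior access data from Tmall in 2016, hoping that analysis of four months of anonymized historical user logs would reveal users' preferences and allow products they like to be recommended to them. This thesis studies how to achieve personalized e-commerce recommendation from users' historical behavior and item attributes when detailed information about users and items is unavailable. The existing user-behavior data are very sparse and lack user ratings of items, so classical algorithms such as collaborative filtering perform poorly. To address this, the thesis processes the data carefully through data analysis, data cleaning, multi-dimensional feature engineering, and feature selection, and proposes an ensemble learning method based on gradient boosting decision trees (GBDT) that recasts e-commerce recommendation as a binary classification problem: predicting whether a user will purchase an item. Because GBDT is prone to overfitting and slow to train, the thesis proposes remedies built on three main contributions: (1) a regularization method based on the complexity of the model parameters, which reduces the degree of overfitting in boosting; (2) an optimization method based on the idea of Newton's method, which replaces the existing gradient-descent search and, in the experiments, converges more quickly to the neighborhood of the optimum; (3) a quantile-based approximate algorithm, which replaces the current exact algorithm and, in the experiments, approaches the exact algorithm in precision while reducing time cost relative to the original algorithm. Finally, the experiments show that on large datasets the recommendation method implemented with the improved GBDT performs markedly better than traditional collaborative filtering and several classical ensemble learning algorithms. The results offer new ideas and effective methods for the low efficiency and insufficient recommendation quality of collaborative filtering and single-model GBDT on complex data, and substantially improve recommendation quality.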
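
The abstract names a complexity-based regularizer and a Newton-style optimizer but this record does not give their formulas. As a hedged illustration only, the block below writes out the standard second-order, regularized boosting objective used in XGBoost-style GBDT implementations; whether the thesis uses exactly this form is an assumption. The symbols $\gamma$ (penalty per leaf), $\lambda$ (L2 penalty on leaf weights), $T$ (number of leaves), and $w_j$ (weight of leaf $j$) are introduced here for illustration and do not come from the record.

```latex
% Second-order (Newton-style) expansion of the loss at boosting round t,
% plus a penalty on tree complexity (assumed form, not taken from the thesis).
\[
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}
  \Bigl[ g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^2 \Bigr]
  + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2,
\qquad
g_i = \frac{\partial\, l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial\, \bigl(\hat{y}_i^{(t-1)}\bigr)^2}.
\]

% With G_j and H_j the sums of g_i and h_i over the samples in leaf j,
% minimizing over w_j gives the Newton step for each leaf:
\[
w_j^{*} = -\frac{G_j}{H_j + \lambda}.
\]

% Gain of splitting a node into left (L) and right (R) children:
\[
\text{Gain} = \tfrac{1}{2}\left[
  \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}
\right] - \gamma.
\]
```

In this formulation, larger $\lambda$ shrinks leaf weights and larger $\gamma$ suppresses splits whose gain is small, which is one common way a complexity penalty reduces overfitting in boosting.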

Alternative abstract

As the largest e-commerce company in China and in the world, Alibaba released a batch of real user-behavior access data from Tmall in 2016. It hoped to discover users' preferences by analyzing four months of anonymized logs from their history, and then to recommend products that match those preferences. This thesis studies how to realize personalized e-commerce recommendation from users' historical behavior and item attributes when specific information about users and items is lacking. Because the existing user-behavior data are too sparse and lack objective signals such as user ratings, collaborative filtering and other classical algorithms are not effective. This thesis therefore processes the data through data analysis, data cleaning, multi-dimensional feature engineering, and feature selection, and puts forward an ensemble learning method based on gradient boosting decision trees (GBDT), which transforms the recommendation problem into a binary classification problem that predicts whether a user will purchase an item. To address GBDT's tendency to overfit and its slow training, the thesis proposes a solution built on three main contributions. (1) A regularization method based on the complexity of the model parameters is proposed to reduce the degree of overfitting in boosting. (2) An optimization method based on Newton's method is proposed to replace the existing gradient-descent optimization; the experiments show that it converges to the neighborhood of the optimal solution faster. (3) A quantile-based approximate algorithm is proposed to replace the current exact algorithm; the experiments show that it approaches the exact algorithm in precision while reducing time cost compared with the original algorithm.
Finally, the experimental results show that the improved GBDT implementation outperforms the traditional collaborative filtering method and some classical ensemble learning algorithms. The results of this thesis provide new ideas and effective methods for the low efficiency and insufficient recommendation quality of collaborative filtering and single-model GBDT on complex data, and yield a substantial improvement in recommendation quality.
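
Contribution (3) replaces exhaustive split search with a quantile-based approximation. The record does not include the thesis's algorithm, so the following is only a minimal sketch under stated assumptions: the split score is the standard second-order gain, 0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)] - gamma, with logistic loss, and candidate thresholds are plain (unweighted) quantiles of one feature. All function names, the parameters `lam`, `gamma`, and `n_bins`, and the synthetic data are hypothetical.

```python
import numpy as np

def logistic_grad_hess(y, raw_score):
    """First and second derivatives of logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + np.exp(-raw_score))
    return p - y, p * (1.0 - p)

def split_gain(g_l, h_l, g_r, h_r, lam=1.0, gamma=0.0):
    """Second-order split gain with L2 penalty lam and per-leaf penalty gamma."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_l, h_l) + score(g_r, h_r)
                  - score(g_l + g_r, h_l + h_r)) - gamma

def exact_candidates(x):
    """Every midpoint between consecutive distinct values (exact search)."""
    u = np.unique(x)
    return (u[:-1] + u[1:]) / 2.0

def quantile_candidates(x, n_bins=16):
    """Only n_bins - 1 interior quantiles of the feature (approximate search)."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.unique(np.quantile(x, qs))

def best_split(x, g, h, candidates, lam=1.0, gamma=0.0):
    """Return (threshold, gain) of the best candidate for a single feature."""
    best_thr, best_gain = None, -np.inf
    g_total, h_total = g.sum(), h.sum()
    for thr in candidates:
        left = x <= thr
        if left.all() or not left.any():
            continue  # skip thresholds that leave one side empty
        g_l, h_l = g[left].sum(), h[left].sum()
        gain = split_gain(g_l, h_l, g_total - g_l, h_total - h_l, lam, gamma)
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain

# Synthetic single-feature data: label is 1 when the feature exceeds 0.3.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = (x > 0.3).astype(float)
g, h = logistic_grad_hess(y, np.zeros_like(x))  # first round, raw score 0

cand_exact = exact_candidates(x)
cand_quant = quantile_candidates(x, n_bins=16)
thr_exact, gain_exact = best_split(x, g, h, cand_exact)
thr_quant, gain_quant = best_split(x, g, h, cand_quant)

print("exact:   ", len(cand_exact), "candidates, threshold", thr_exact)
print("quantile:", len(cand_quant), "candidates, threshold", thr_quant)
```

The quantile search evaluates far fewer thresholds per feature, so its gain can never exceed the exact one, but it is typically close. Production systems usually use weighted quantiles and precomputed histograms rather than the per-threshold masking shown here.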

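Language: Chinese
Contribution rank (产权排序): 1
Document type: Thesis
Item identifier: http://ir.sia.cn/handle/173321/21769
Collection: Industrial Control Networks and Systems Research Laboratory (工业控制网络与系统研究室)
Author affiliations: 1. Shenyang Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
Recommended citation
GB/T 7714
Sun Jingzhe (孙靖哲). 改进集成学习算法在电商推荐中的研究与应用[D]. Shenyang: Shenyang Institute of Automation, Chinese Academy of Sciences, 2018.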
Files in this item
File name/size: 改进集成学习算法在电商推荐中的研究与应用 (4723KB)
Document type: Thesis
Access type: Open access
License: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.