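Research and Application of an Improved Ensemble Learning Algorithm in E-commerce Recommendation (改进集成学习算法在电商推荐中的研究与应用)
Alternative Title: Research and application of improved ensemble learning algorithm in e-commerce recommendation
孙靖哲 (Sun Jingzhe) 1,2
Department: Industrial Control Networks and Systems Research Laboratory (工业控制网络与系统研究室)
Thesis Advisor: 石刚 (Shi Gang)
Classification: TP181
Keyword: ensemble learning; recommender system; gradient boosting decision tree; feature engineering
Call Number: TP181/S96/2018
Pages: 61
Degree Discipline: Detection Technology and Automatic Equipment
Degree Name: Master's
2018-05-17
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang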
Abstract

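As the largest e-commerce company in China and even worldwide, Alibaba released a batch of real user-behavior access data from Tmall in 2016, with the goal of discovering users' preferences by analyzing four months of anonymized historical login logs and then recommending products those users would like. This thesis studies how to achieve personalized e-commerce recommendation from users' historical behavior and item attributes when detailed information about users and items is unavailable. Because the available behavior data are very sparse and lack explicit signals such as user ratings, classical algorithms such as collaborative filtering perform poorly. To address this, the thesis processes the data carefully through data analysis, data cleaning, multi-dimensional feature engineering, and feature selection, and proposes an ensemble learning method based on gradient boosting decision trees (GBDT) that recasts the recommendation problem as a binary classification problem: predicting whether a user will purchase an item. To address GBDT's tendency to overfit and its slow training, the thesis makes three main contributions: (1) a regularization method based on the complexity of the model parameters, which reduces overfitting in boosting; (2) an optimization method based on the idea of Newton's method, which replaces the existing gradient-descent optimization and which experiments show converges to the neighborhood of the optimum faster; (3) a quantile-based approximate algorithm that replaces the current exact algorithm, which experiments show approaches the precision of the exact algorithm while reducing time cost relative to the original algorithm. Finally, experiments show that on large datasets the recommendation method built on the improved GBDT performs markedly better than traditional collaborative filtering and several classical ensemble learning algorithms. The results offer new ideas and an effective method for the low efficiency and insufficient recommendation quality of collaborative filtering and single-model GBDT on complex data, and substantially improve recommendation quality.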
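The abstract names a complexity-based regularizer and a Newton-style update but gives no formulas. As a hedged illustration only, the following is the standard second-order boosting formulation popularized by XGBoost; whether the thesis uses exactly this form is an assumption. The symbols are not from the record: $f_t$ is the tree added at round $t$, $T$ its number of leaves, $w_j$ the weight of leaf $j$, $I_j$ the set of samples in leaf $j$, $\gamma$ and $\lambda$ regularization strengths, and $g_i$, $h_i$ the first and second derivatives of the loss $l$ with respect to the previous prediction $\hat{y}_i^{(t-1)}$.

```latex
\begin{align}
\mathcal{L}^{(t)} &= \sum_{i=1}^{n} l\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \\
\mathcal{L}^{(t)} &\approx \sum_{i=1}^{n} \Bigl[ l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Bigr] + \Omega(f_t) \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda},
\qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i
\end{align}
```

Under this formulation the penalty $\Omega$ plays the role of contribution (1), and the closed-form leaf weight $w_j^{*}$ is the Newton step that replaces a plain gradient step in contribution (2).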

Other Abstract

As the largest e-commerce company in China and even worldwide, Alibaba released a batch of real user-behavior access data from Tmall in 2016. It hoped to discover users' preferences by analyzing four months of anonymized historical logs, and then to recommend products that match those preferences. This thesis studies how to achieve personalized e-commerce recommendation from users' historical behavior and item attributes when detailed information about users and items is lacking. Because the existing user behavior data are too sparse and lack explicit signals such as user ratings, collaborative filtering and other classical algorithms are not effective. This thesis processes the data through data analysis, data cleaning, multi-dimensional feature engineering, and feature selection, and proposes an ensemble learning method based on gradient boosting decision trees (GBDT), which transforms the recommendation problem into a binary classification problem of predicting whether a user will purchase an item. To address the problem that GBDT is prone to overfitting and slow to train, this thesis proposes a solution with three main contributions. (1) A regularization method based on the complexity of the model parameters is proposed to reduce overfitting in boosting. (2) An optimization method based on Newton's method is proposed to replace the existing gradient-descent optimization; experiments show that it converges to the neighborhood of the optimal solution faster. (3) An approximate algorithm based on quantiles is proposed to replace the current exact algorithm; experiments show that it approaches the exact algorithm in precision while reducing time cost compared with the original algorithm.
Finally, the experimental results show that, on large datasets, recommendation with the improved GBDT is clearly better than the traditional collaborative filtering method and some classical ensemble learning algorithms. The results provide new ideas and an effective method for the low efficiency and insufficient recommendation quality of collaborative filtering and single-model GBDT on complex data, and they substantially improve recommendation quality.
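The quantile-based approximation in contribution (3) can be sketched on toy data. The Python below is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the toy feature, the gain formula (the common regularized second-order gain with hypothetical parameters `lam` and `gamma`), and the simple evenly spaced quantile rule are all assumptions introduced here. It compares an exact scan over every distinct feature value with a scan over a few quantile candidates.

```python
import math

def logistic_grad_hess(y_true, raw_score):
    """First and second derivatives of log loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)

def leaf_weight(G, H, lam):
    """Newton step for one leaf: minimizes G*w + 0.5*(H + lam)*w**2."""
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Drop in the regularized objective when a node is split in two."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

def quantile_candidates(values, n_bins):
    """Candidate thresholds at evenly spaced quantiles of the feature."""
    s = sorted(values)
    return sorted({s[(k * len(s)) // n_bins] for k in range(1, n_bins)})

def best_split(values, grads, hess, candidates, lam=1.0, gamma=0.0):
    """Scan candidate thresholds; rows with value < threshold go left."""
    G, H = sum(grads), sum(hess)
    best_t, best_gain = None, 0.0
    for t in candidates:
        left = [i for i, v in enumerate(values) if v < t]
        if not left or len(left) == len(values):
            continue  # a split must put rows on both sides
        GL = sum(grads[i] for i in left)
        HL = sum(hess[i] for i in left)
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Toy data (hypothetical): a click-count feature and a purchase label.
clicks = [1, 2, 3, 4, 10, 11, 12, 13]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
stats = [logistic_grad_hess(y, 0.0) for y in bought]
grads = [g for g, _ in stats]
hess = [h for _, h in stats]

exact = best_split(clicks, grads, hess, sorted(set(clicks)))
approx = best_split(clicks, grads, hess, quantile_candidates(clicks, 4))
print("exact:", exact, "approximate:", approx)
```

The approximate scan evaluates far fewer thresholds than the exact one, which is where the time saving comes from; how closely it tracks the exact result depends on the number of bins and on how the feature is distributed.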

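Language: Chinese
Contribution Rank: 1
Document Type: Thesis
Identifier: http://ir.sia.cn/handle/173321/21769
Collection: Industrial Control Networks and Systems Research Laboratory (工业控制网络与系统研究室)
Affiliation: 1. Shenyang Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences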
Recommended Citation
GB/T 7714
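孙靖哲 (Sun Jingzhe). Research and Application of an Improved Ensemble Learning Algorithm in E-commerce Recommendation (改进集成学习算法在电商推荐中的研究与应用)[D]. Shenyang: Shenyang Institute of Automation, Chinese Academy of Sciences, 2018.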
Files in This Item:
File Name/Size | DocType | Version | Access | License
Research and Application of an Improved Ensemble Learning Algorithm in E-commerce Recommendation (4723KB) | Thesis | | Open Access | CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.