Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (5): 71-82     https://doi.org/10.11925/infotech.2096-3467.2020.1050
Research Paper
A Matrix Factorization Recommendation Method with Tags and Contents*
Ma Yingxue, Gan Mingxin (corresponding author), Xiao Kejun
School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
Abstract

[Objective] This paper proposes TCMF, a matrix factorization method that integrates tag and content data, to address heterogeneous information fusion in recommender systems. It aims to reduce prediction error, overcome the sparsity of rating data, and improve the robustness of matrix factorization algorithms. [Methods] We used embeddings to convert content text into structured data and a convolutional neural network (CNN) to extract deep content features. We then used a deep neural network (DNN) to fuse content and tag information into comprehensive features, and built the TCMF rating prediction method on top of matrix factorization. Experiments on a real movie dataset further examined how different feature fusion strategies, different types of movie content, and regularization parameters affect prediction performance. [Results] On the MovieLens-20m dataset, TCMF reduced the error of movie rating prediction, reaching a lowest RMSE of 0.8295 and a lowest MAE of 0.6189. Compared with the existing methods, the maximum reductions in RMSE and MAE were 9.62% and 14.17%, respectively. [Limitations] Because user information was not available, TCMF is limited in characterizing users' personalized features. [Conclusions] Fusing heterogeneous tag and content information not only reduces the error of rating prediction but also improves the robustness of the prediction algorithm.

Keywords: Recommendation Algorithm; Matrix Factorization; Deep Learning; Heterogeneous Information
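The pipeline summarized in the abstract (embedding, CNN feature extraction, DNN fusion with tags, matrix factorization) can be sketched roughly as below. This is a minimal NumPy forward pass under stated assumptions, not the authors' implementation: the layer sizes, the random weights, the tanh activation, and the function names `content_features`, `item_vector`, and `predict_rating` are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB, EMB_DIM, SEQ_LEN = 50, 8, 12   # vocabulary size, embedding width, padded text length
N_FILTERS, WINDOW = 4, 3              # number of CNN filters and their window size
TAG_DIM, LATENT_DIM = 6, 5            # tag feature width, latent factor width

# Randomly initialised parameters (a trained model would learn these).
embedding = rng.normal(0, 0.1, (VOCAB, EMB_DIM))
conv_w = rng.normal(0, 0.1, (N_FILTERS, WINDOW, EMB_DIM))
dense_w = rng.normal(0, 0.1, (N_FILTERS + TAG_DIM, LATENT_DIM))
dense_b = np.zeros(LATENT_DIM)


def content_features(token_ids):
    """Embedding lookup, 1-D convolution with ReLU, then max-over-time pooling."""
    x = embedding[token_ids]                                   # (length, EMB_DIM)
    windows = np.stack([x[i:i + WINDOW] for i in range(len(x) - WINDOW + 1)])
    conv = np.einsum("twd,fwd->tf", windows, conv_w)           # (positions, N_FILTERS)
    return np.maximum(conv, 0.0).max(axis=0)                   # (N_FILTERS,)


def item_vector(token_ids, tag_vec):
    """Fuse content and tag features with one dense layer into an item latent vector."""
    fused = np.concatenate([content_features(token_ids), tag_vec])
    return np.tanh(fused @ dense_w + dense_b)                  # (LATENT_DIM,)


def predict_rating(user_vec, token_ids, tag_vec):
    """Matrix-factorization style prediction: inner product of user and item vectors."""
    return float(user_vec @ item_vector(token_ids, tag_vec))


tokens = rng.integers(0, VOCAB, SEQ_LEN)           # stand-in for tokenised content text
tags = rng.integers(0, 2, TAG_DIM).astype(float)   # stand-in for encoded tag features
user = rng.normal(0, 0.1, LATENT_DIM)              # stand-in for a learned user factor
score = predict_rating(user, tokens, tags)
```

In this sketch the item latent vector is produced entirely by the content and tag network, so an item with few ratings still receives a representation, which is one plausible reading of how fusing side information helps with sparse rating data.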
Received: 2020-10-26      Published: 2021-05-27
Chinese Library Classification (ZTFLH):  TP391
Funding: *This work was supported by the National Natural Science Foundation of China (Grant Nos. 71871019 and 71471016).
Corresponding author: Gan Mingxin     E-mail: ganmx@ustb.edu.cn
Cite this article:
Ma Yingxue, Gan Mingxin, Xiao Kejun. A Matrix Factorization Recommendation Method with Tags and Contents. Data Analysis and Knowledge Discovery, 2021, 5(5): 71-82.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.1050      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I5/71
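Fig.1  Example of tag and content information for a movie
Fig.2  Workflow of the matrix factorization algorithm based on heterogeneous information fusion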
Data type     Feature         Example
Tag data      Year            2011
              Genres          Crime | Drama | Mystery | Thriller
Content data  Summary         This series focuses on the NYPD’s Major Case Squad, a force of detectives who investigate high-profile cases, whilst also showing parts of the crime from the criminal's point of view to the audience.
              Storyline       This show centers on the NYPD’s Major Case Squad (and the offbeat, Sherlock Holmes-like Detective Robert Goren) in its efforts to stop the worst criminal offenders in New York. It also puts a new twist to the “Law & Order” formula: now, in each episode, we see the crimes as they are planned and committed.
              Cast biography  Kathryn Erbe was born on July 5, 1965 in Newton, Massachusetts, USA as Kathryn Elsbeth Erbe. She is known for her work on Law & Order: Criminal Intent (2001), Stir of Echoes (1999) and What About Bob? (1991). She was previously married to Terry Kinney.
              User review     After seeing this show and having watched the other 2 L&O shows, I must say that this one has made me think the most and always has me gripping right to the end just like the other two. All 3 have become excellent shows and each stands out has forged its own identity. D'Onofrio is so good it will give you chills at times. 5 out of 5.
Table 1  Examples of tag and content data (example title: Law & Order: Criminal Intent)
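As a rough illustration of how the tag data in Table 1 could be turned into a numeric feature vector, the sketch below scales the year and multi-hot encodes the genre string. The page does not state how the authors encode tags, so the genre vocabulary `GENRES`, the year range, and the function name `encode_tags` are assumptions.

```python
# Hypothetical genre vocabulary; a real one would be built from the dataset.
GENRES = ["Crime", "Drama", "Mystery", "Thriller", "Comedy", "Action"]


def encode_tags(year, genre_string, genres=GENRES, min_year=1900, max_year=2020):
    """Return [scaled_year, multi-hot genre flags...] for one title.

    The year range used for scaling is an assumption made for this sketch.
    """
    present = {g.strip() for g in genre_string.split("|")}
    multi_hot = [1.0 if g in present else 0.0 for g in genres]
    scaled_year = (year - min_year) / (max_year - min_year)
    return [scaled_year] + multi_hot


# Tag data of the example title in Table 1.
features = encode_tags(2011, "Crime | Drama | Mystery | Thriller")
```

A vector of this kind is what the fusion network would receive alongside the CNN-derived content features.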
Fig.3  Structure of the TCMF method
Fig.4  Preprocessing steps for content data
Method   λu     λo
NMF      0.02   0.002
PMF      0.02   0.002
ConvMF   0.02   0.02
CNMF     0.02   0.02
CDMF     0.02   0.02
TCMF     0.02   0.02
Table 2  Experimental parameters of the different methods
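To show where regularization parameters such as those in Table 2 typically enter a matrix factorization model, the sketch below trains a plain L2-regularized factorization with stochastic gradient descent. It assumes that λu penalizes the user factors and λo the item factors, which the page does not state explicitly. It is a generic baseline, not TCMF, and the toy ratings, learning rate, and latent dimension are made up.

```python
import numpy as np


def train_mf(ratings, n_users, n_items, k=4, lam_u=0.02, lam_o=0.02,
             lr=0.05, epochs=200, seed=0):
    """SGD on squared rating error plus L2 penalties on user and item factors."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0, 0.1, (n_users, k))   # user latent factors
    V = rng.normal(0, 0.1, (n_items, k))   # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]
            u_old = U[u].copy()            # keep the pre-update user vector for the item step
            U[u] += lr * (err * V[i] - lam_u * U[u])
            V[i] += lr * (err * u_old - lam_o * V[i])
    return U, V


def rmse(ratings, U, V):
    """Root mean squared error of the factorization on a list of (user, item, rating)."""
    errs = [r - U[u] @ V[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))


# Toy (user, item, rating) triples, invented for this example.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U0, V0 = train_mf(toy, 3, 3, epochs=0)     # untrained factors
U1, V1 = train_mf(toy, 3, 3)               # trained with lam_u = lam_o = 0.02
```

Larger values of `lam_u` or `lam_o` shrink the corresponding factors toward zero, trading training fit for resistance to overfitting on sparse ratings.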
Method   RMSE     MAE
NMF      0.9178   0.7211
PMF      0.8750   0.6548
CNMF     0.8617   0.6422
ConvMF   0.8481   0.6387
CDMF     0.8346   0.6301
TCMF     0.8295   0.6189
Table 3  Comparison of experimental results across methods
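The maximum reductions quoted in the abstract (9.62% for RMSE, 14.17% for MAE) can be checked against Table 3 with a short calculation. The sketch below assumes the reduction is measured relative to each baseline's error, i.e. (baseline - TCMF) / baseline; under that assumption the largest drops are against NMF.

```python
# (RMSE, MAE) per method, copied from Table 3.
results = {
    "NMF": (0.9178, 0.7211),
    "PMF": (0.8750, 0.6548),
    "CNMF": (0.8617, 0.6422),
    "ConvMF": (0.8481, 0.6387),
    "CDMF": (0.8346, 0.6301),
    "TCMF": (0.8295, 0.6189),
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


tcmf_rmse, tcmf_mae = results["TCMF"]
reductions = {
    name: (relative_reduction(r, tcmf_rmse), relative_reduction(m, tcmf_mae))
    for name, (r, m) in results.items()
    if name != "TCMF"
}
max_rmse_drop = max(r for r, _ in reductions.values())
max_mae_drop = max(m for _, m in reductions.values())
print(f"{max_rmse_drop:.2f}% {max_mae_drop:.2f}%")  # prints "9.62% 14.17%"
```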
Fig.5  Comparison of the ConvMF, CDMF, and TCMF models in the noise experiment
Experiment                  Summary  Storyline  Cast biography  User reviews  RMSE     MAE
C-1                                                                           0.8398   0.6214
C-2                                                                           0.8401   0.6302
C-3                                                                           0.8332   0.6221
C-4                                                                           0.8377   0.6300
Mean, one content type                                                        0.8377   0.6259
C-5                                                                           0.8478   0.6321
C-6                                                                           0.8377   0.6298
C-7                                                                           0.8410   0.6301
C-8                                                                           0.8360   0.6231
C-9                                                                           0.8326   0.6203
C-10                                                                          0.8301   0.6193
Mean, two content types                                                       0.8375   0.6258
C-11                                                                          0.8434   0.6310
C-12                                                                          0.8402   0.6298
C-13                                                                          0.8299   0.6193
C-14                                                                          0.8355   0.6209
Mean, three content types                                                     0.8373   0.6253
C-15                                                                          0.8295   0.6189
Table 4  Comparison of experimental results for different content texts
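The group means in Table 4 can be recomputed from the individual runs. The sketch below assumes the grouping implied by the row order (C-1 to C-4 use one content type, C-5 to C-10 two, C-11 to C-14 three), with C-15 presumably using all four. The dictionary and function names are illustrative.

```python
# (RMSE, MAE) per experiment, copied from Table 4.
table4 = {
    "C-1": (0.8398, 0.6214), "C-2": (0.8401, 0.6302), "C-3": (0.8332, 0.6221),
    "C-4": (0.8377, 0.6300), "C-5": (0.8478, 0.6321), "C-6": (0.8377, 0.6298),
    "C-7": (0.8410, 0.6301), "C-8": (0.8360, 0.6231), "C-9": (0.8326, 0.6203),
    "C-10": (0.8301, 0.6193), "C-11": (0.8434, 0.6310), "C-12": (0.8402, 0.6298),
    "C-13": (0.8299, 0.6193), "C-14": (0.8355, 0.6209), "C-15": (0.8295, 0.6189),
}

# Grouping by number of content types, inferred from the position of the mean rows.
groups = {
    1: [f"C-{i}" for i in range(1, 5)],
    2: [f"C-{i}" for i in range(5, 11)],
    3: [f"C-{i}" for i in range(11, 15)],
}


def group_mean(ids):
    """Average RMSE and MAE over the listed experiments."""
    rmse_mean = sum(table4[i][0] for i in ids) / len(ids)
    mae_mean = sum(table4[i][1] for i in ids) / len(ids)
    return rmse_mean, mae_mean


means = {n: group_mean(ids) for n, ids in groups.items()}
best = min(table4, key=lambda name: table4[name][0])  # experiment with the lowest RMSE
```

The recomputed means agree with the reported rows to within rounding, and the average error falls only slightly as more content types are added, while the single best run is C-15.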
Fig.6  Experimental comparison of different feature fusion strategies
Fig.7  Experimental comparison under different parameters
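Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing  Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn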