Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (5): 71-82    DOI: 10.11925/infotech.2096-3467.2020.1050
A Matrix Factorization Recommendation Method with Tags and Contents
Ma Yingxue, Gan Mingxin, Xiao Kejun
School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
Abstract  

[Objective] This paper proposes TCMF, a matrix factorization method that integrates tags and contents, to address the problem of heterogeneous information fusion in recommendation systems. It aims to reduce prediction errors, address data sparsity, and improve the robustness of matrix factorization. [Methods] First, we transformed textual information into structured data using embeddings. Second, we extracted latent features with a CNN. Third, we fused the features of movie contents and tags with a DNN to obtain comprehensive features. Finally, we built TCMF on a matrix factorization algorithm and evaluated its performance on a movie rating dataset (MovieLens-20m). [Results] TCMF reduced the error of movie rating predictions, achieving the lowest RMSE (0.8295) and the lowest MAE (0.6189). Compared with existing methods, the maximum reductions in RMSE and MAE were 9.62% and 14.17% (both relative to NMF). [Limitations] Due to the lack of information, TCMF cannot characterize users' personalized features. [Conclusions] The proposed model not only reduces the error of rating prediction but also improves the robustness of the algorithm.
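The abstract names the item-side stages (embedding, CNN feature extraction, DNN fusion of content and tag features) without giving their configuration. The NumPy sketch below shows one way those stages could fit together, with random, untrained weights: each content field is embedded, passed through a single 1-D convolution with ReLU and max-over-time pooling, and the pooled features of all four fields are concatenated with a tag vector and mapped to a latent item vector by one dense layer. The layer sizes, window width, shared CNN weights across fields, tanh output, and the names `text_features` and `item_vector` are assumptions for illustration, not the authors' TCMF architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
VOCAB, EMB_DIM, SEQ_LEN = 1000, 50, 30
N_FILTERS, WINDOW = 16, 3
N_FIELDS = 4            # summary, storyline, actor biography, user review
TAG_DIM, LATENT_DIM = 25, 20

# Randomly initialised (untrained) parameters; the CNN is shared by all fields (assumption).
embedding = rng.normal(scale=0.1, size=(VOCAB, EMB_DIM))
conv_w = rng.normal(scale=0.1, size=(N_FILTERS, WINDOW, EMB_DIM))
conv_b = np.zeros(N_FILTERS)
dense_w = rng.normal(scale=0.1, size=(N_FIELDS * N_FILTERS + TAG_DIM, LATENT_DIM))
dense_b = np.zeros(LATENT_DIM)


def text_features(token_ids):
    """Embedding lookup -> 1-D convolution + ReLU -> max-over-time pooling."""
    x = embedding[token_ids]                                          # (SEQ_LEN, EMB_DIM)
    n_windows = len(token_ids) - WINDOW + 1
    windows = np.stack([x[t:t + WINDOW] for t in range(n_windows)])   # (T, WINDOW, EMB_DIM)
    # Each filter is contracted with every window of WINDOW consecutive embeddings.
    maps = np.maximum(np.einsum("twd,fwd->tf", windows, conv_w) + conv_b, 0.0)  # (T, N_FILTERS)
    return maps.max(axis=0)                                           # (N_FILTERS,)


def item_vector(fields, tag_vec):
    """Concatenate the CNN features of every content field with the tag vector,
    then map them to a latent item vector with one dense layer."""
    parts = [text_features(ids) for ids in fields] + [tag_vec]
    fused = np.concatenate(parts)                                     # (N_FIELDS*N_FILTERS + TAG_DIM,)
    return np.tanh(fused @ dense_w + dense_b)                         # (LATENT_DIM,)


fields = [rng.integers(0, VOCAB, size=SEQ_LEN) for _ in range(N_FIELDS)]
tags = rng.integers(0, 2, size=TAG_DIM).astype(float)
theta = item_vector(fields, tags)
print(theta.shape)  # (20,)
```

In a trained model these weights would be learned jointly with, or alternately with, the matrix factorization; this sketch only illustrates the data flow and tensor shapes.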

Key words: Recommendation Algorithm; Matrix Factorization; Deep Learning; Heterogeneous Information
Received: 26 October 2020      Published: 27 May 2021
CLC Number: TP391
Fund: This work is supported by the National Natural Science Foundation of China (Grant Nos. 71871019 and 71471016).
Corresponding Author: Gan Mingxin, E-mail: ganmx@ustb.edu.cn

Cite this article:

Ma Yingxue,Gan Mingxin,Xiao Kejun. A Matrix Factorization Recommendation Method with Tags and Contents. Data Analysis and Knowledge Discovery, 2021, 5(5): 71-82.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.1050     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I5/71

Example of Tag and Content Information of a Movie
Process of Matrix Factorization Algorithm Based on Heterogeneous Information Fusion
Data type      Feature            Example
Tag data       Year               2011
               Genre              Crime | Drama | Mystery | Thriller
Content data   Summary            This series focuses on the NYPD’s Major Case Squad, a force of detectives who investigate high-profile cases, whilst also showing parts of the crime from the criminal's point of view to the audience.
               Storyline          This show centers on the NYPD’s Major Case Squad (and the offbeat, Sherlock Holmes-like Detective Robert Goren) in its efforts to stop the worst criminal offenders in New York. It also puts a new twist to the “Law & Order” formula: now, in each episode, we see the crimes as they are planned and committed.
               Actor biography    Kathryn Erbe was born on July 5, 1965 in Newton, Massachusetts, USA as Kathryn Elsbeth Erbe. She is known for her work on Law & Order: Criminal Intent (2001), Stir of Echoes (1999) and What About Bob? (1991). She was previously married to Terry Kinney.
               User review        After seeing this show and having watched the other 2 L&O shows, I must say that this one has made me think the most and always has me gripping right to the end just like the other two. All 3 have become excellent shows and each stands out has forged its own identity. D'Onofrio is so good it will give you chills at times. 5 out of 5.
Example of Tag and Content Data (Example Movie: Law & Order: Criminal Intent)
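The example record mixes structured tags (year, genre) with free text. Before any embedding layer, text must be turned into fixed-length token-id sequences and tags into a numeric vector. The sketch below shows one common way to do this for the example movie. The genre list, the min-max year range, the regex tokenizer, the reserved ids 0 (padding) and 1 (unknown), and all function names are assumptions for illustration; the paper's own preprocessing procedure is shown only in its figure.

```python
import re
from collections import Counter

# Hypothetical genre subset and year range; MovieLens defines its own genre set.
GENRES = ["Action", "Comedy", "Crime", "Drama", "Mystery", "Romance", "Thriller"]
YEAR_MIN, YEAR_MAX = 1900, 2015
PAD, UNK = 0, 1


def encode_tags(year, genre_field):
    """Min-max scaled year followed by a multi-hot genre vector."""
    year_feat = (year - YEAR_MIN) / (YEAR_MAX - YEAR_MIN)
    present = {g.strip() for g in genre_field.split("|")}
    return [year_feat] + [1.0 if g in present else 0.0 for g in GENRES]


def tokenize(text):
    """Lower-case word tokens; other punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Map tokens to ids by descending frequency; ids 0 and 1 are reserved."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tok, _ in counts.most_common():
        vocab[tok] = len(vocab)
    return vocab


def to_ids(text, vocab, max_len):
    """Convert text to token ids, truncating or right-padding to max_len."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


summary = ("This series focuses on the NYPD's Major Case Squad, a force of detectives "
           "who investigate high-profile cases, whilst also showing parts of the crime "
           "from the criminal's point of view to the audience.")
tags = encode_tags(2011, "Crime | Drama | Mystery | Thriller")
vocab = build_vocab([summary])
ids = to_ids(summary, vocab, max_len=60)
print(tags[1:])  # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
```

Padding every field to the same length lets all items share one convolutional input shape; in practice the vocabulary would be built over the whole corpus rather than a single summary.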
The Structure of TCMF
The Procedures of Content Data Preprocessing
Method    λu      λo
NMF       0.02    0.002
PMF       0.02    0.002
ConvMF    0.02    0.02
CNMF      0.02    0.02
CDMF      0.02    0.02
TCMF      0.02    0.02
Parameters of Different Methods
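The parameter table lists two regularization weights, λu and λo, but this page does not state TCMF's objective. A common way to couple matrix factorization with content features, used by the ConvMF baseline, is to shrink user vectors toward zero and pull each item vector v_j toward its content-derived feature vector θ_j, i.e. minimize Σ over observed (i, j) of (r_ij − u_iᵀv_j)² + λu Σ‖u_i‖² + λo Σ‖v_j − θ_j‖². Assuming that form (and reading λo as the item-side weight, which is an assumption), each vector has a closed-form alternating-least-squares update. The sketch below implements those updates on synthetic data; the function name and the synthetic setup are hypothetical.

```python
import numpy as np


def als_content_mf(R, mask, theta, lam_u=0.02, lam_o=0.02, n_iters=20, seed=0):
    """Alternating least squares for
        sum_{observed (i,j)} (r_ij - u_i . v_j)^2
        + lam_u * sum_i ||u_i||^2 + lam_o * sum_j ||v_j - theta_j||^2,
    where theta_j is a content-derived feature vector for item j (assumed given)."""
    n_users, n_items = R.shape
    k = theta.shape[1]
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = theta.astype(float).copy()          # start items at their content features
    eye = np.eye(k)
    for _ in range(n_iters):
        for i in range(n_users):            # (V_i^T V_i + lam_u I) u_i = V_i^T r_i
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + lam_u * eye, Vi.T @ R[i, mask[i]])
        for j in range(n_items):            # (U_j^T U_j + lam_o I) v_j = U_j^T r_j + lam_o theta_j
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + lam_o * eye,
                                   Uj.T @ R[mask[:, j], j] + lam_o * theta[j])
    return U, V


# Synthetic example: low-rank ratings plus noise, about half of the entries observed.
rng = np.random.default_rng(1)
n_users, n_items, k = 30, 40, 5
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_items, k))
R = U_true @ V_true.T + rng.normal(scale=0.1, size=(n_users, n_items))
mask = rng.random((n_users, n_items)) < 0.5
theta = V_true + rng.normal(scale=0.3, size=V_true.shape)   # noisy stand-in "content" features

U, V = als_content_mf(R, mask, theta)
pred = U @ V.T
train_rmse = np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))
print(f"training RMSE: {train_rmse:.3f}")
```

With few observed ratings for an item, the λo term dominates and v_j stays close to θ_j, which is how content features can help with sparse items.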
Method    RMSE      MAE
NMF       0.9178    0.7211
PMF       0.8750    0.6548
CNMF      0.8617    0.6422
ConvMF    0.8481    0.6387
CDMF      0.8346    0.6301
TCMF      0.8295    0.6189
Results of Different Methods
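RMSE and MAE are the standard rating-prediction error measures, and the reductions quoted in the abstract are relative reductions computed from this results table. The short script below defines both metrics and recomputes TCMF's relative reduction against each baseline; the function names are illustrative.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted ratings."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted ratings."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def reduction_pct(baseline, model):
    """Relative error reduction of `model` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - model) / baseline


# (RMSE, MAE) values copied from the results table.
RESULTS = {
    "NMF": (0.9178, 0.7211),
    "PMF": (0.8750, 0.6548),
    "CNMF": (0.8617, 0.6422),
    "ConvMF": (0.8481, 0.6387),
    "CDMF": (0.8346, 0.6301),
    "TCMF": (0.8295, 0.6189),
}

tcmf_rmse, tcmf_mae = RESULTS["TCMF"]
for name, (base_rmse, base_mae) in RESULTS.items():
    if name != "TCMF":
        print(f"{name:7s} RMSE -{reduction_pct(base_rmse, tcmf_rmse):.2f}%  "
              f"MAE -{reduction_pct(base_mae, tcmf_mae):.2f}%")
```

The NMF row gives 9.62% (RMSE) and 14.17% (MAE), the maximum reductions stated in the abstract; against the strongest baseline, CDMF, the gains are much smaller.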
Results of ConvMF, CDMF and TCMF Model in Noise Experiments
Content experiment (content types: summary, storyline, actor biography, user reviews)    RMSE      MAE
C-1                                        0.8398    0.6214
C-2                                        0.8401    0.6302
C-3                                        0.8332    0.6221
C-4                                        0.8377    0.6300
Average, one type of content information   0.8377    0.6259
C-5                                        0.8478    0.6321
C-6                                        0.8377    0.6298
C-7                                        0.8410    0.6301
C-8                                        0.8360    0.6231
C-9                                        0.8326    0.6203
C-10                                       0.8301    0.6193
Average, two types of content information  0.8375    0.6258
C-11                                       0.8434    0.6310
C-12                                       0.8402    0.6298
C-13                                       0.8299    0.6193
C-14                                       0.8355    0.6209
Average, three types of content information 0.8373   0.6253
C-15 (all four types)                      0.8295    0.6189
Results of Different Description Contents
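The averages in the content-experiment table can be checked directly from the per-experiment rows. The script below groups the rows by the number of content types used (four single-type runs, six pairs, four triples, one run with all four) and recomputes the group means; the dictionary layout and function name are illustrative.

```python
from statistics import mean

# (RMSE, MAE) per content experiment, grouped by how many of the four content
# types (summary, storyline, actor biography, user reviews) were used.
CONTENT_RESULTS = {
    1: [(0.8398, 0.6214), (0.8401, 0.6302), (0.8332, 0.6221), (0.8377, 0.6300)],  # C-1..C-4
    2: [(0.8478, 0.6321), (0.8377, 0.6298), (0.8410, 0.6301),
        (0.8360, 0.6231), (0.8326, 0.6203), (0.8301, 0.6193)],                      # C-5..C-10
    3: [(0.8434, 0.6310), (0.8402, 0.6298), (0.8299, 0.6193), (0.8355, 0.6209)],  # C-11..C-14
    4: [(0.8295, 0.6189)],                                                          # C-15
}


def group_averages(results):
    """Average RMSE and MAE over the experiments in each group."""
    return {n: (mean(r for r, _ in rows), mean(m for _, m in rows))
            for n, rows in results.items()}


for n_types, (avg_rmse, avg_mae) in group_averages(CONTENT_RESULTS).items():
    print(f"{n_types} content type(s): RMSE {avg_rmse:.4f}, MAE {avg_mae:.4f}")
```

The group averages differ by only about 0.0004 in RMSE, whereas the single run using all four content types (C-15) has the lowest RMSE and MAE of every configuration.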
Results of Different Feature Fusion Methods
Results of Different Parameters