Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (5): 71-82    DOI: 10.11925/infotech.2096-3467.2020.1050
A Matrix Factorization Recommendation Method with Tags and Contents
Ma Yingxue, Gan Mingxin, Xiao Kejun
School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
[Objective] This paper proposes a matrix factorization method (TCMF) that integrates tags and contents to address heterogeneous information fusion in recommendation systems. It aims to reduce prediction errors, mitigate data sparsity, and improve the robustness of the matrix factorization algorithm. [Methods] First, we transformed textual information into structured data using embeddings. Second, we extracted hidden features with a CNN. Third, we merged the features of movie contents and tags with a DNN to obtain comprehensive features. Finally, we built TCMF on the matrix factorization algorithm and evaluated its performance on a movie rating dataset (MovieLens-20m). [Results] TCMF reduced the error of movie rating predictions, reaching the lowest RMSE (0.8295) and the lowest MAE (0.6189). Compared with the existing methods, the maximum reductions in RMSE and MAE were 9.62% and 14.17%, respectively. [Limitations] Due to the lack of information, TCMF cannot characterize users' personalized features. [Conclusions] The proposed model not only reduces the error of rating prediction but also improves the robustness of the algorithm.
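The two error measures named in the abstract, RMSE and MAE, can be illustrated with a minimal Python sketch using their standard definitions. The function names and the four sample ratings are hypothetical and serve only as an illustration; they are not taken from the paper or from MovieLens-20m.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical ratings, for illustration only.
actual = [4.0, 3.5, 5.0, 2.0]
predicted = [3.5, 3.5, 4.0, 3.0]

print(rmse(actual, predicted))  # 0.75
print(mae(actual, predicted))   # 0.625
```

RMSE squares each error before averaging, so it penalizes large misses more heavily than MAE does; reporting both gives a fuller picture of prediction quality.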

Received: 26 October 2020      Published: 27 May 2021
Fund: This work is supported by the National Natural Science Foundation of China (71871019, 71471016).
Ma Yingxue, Gan Mingxin, Xiao Kejun. A Matrix Factorization Recommendation Method with Tags and Contents. Data Analysis and Knowledge Discovery, 2021, 5(5): 71-82.

Example of Tag and Content Information of a Movie
Process of Matrix Factorization Algorithm Based on Heterogeneous Information Fusion
Data type     Feature name       Example
Tag data      Year               2011
Tag data      Genre              Crime | Drama | Mystery | Thriller
Content data  Summary            This series focuses on the NYPD’s Major Case Squad, a force of detectives who investigate high-profile cases, whilst also showing parts of the crime from the criminal's point of view to the audience.
Content data  Storyline          This show centers on the NYPD’s Major Case Squad (and the offbeat, Sherlock Holmes-like Detective Robert Goren) in its efforts to stop the worst criminal offenders in New York. It also puts a new twist to the “Law & Order” formula: now, in each episode, we see the crimes as they are planned and committed.
Content data  Cast introduction  Kathryn Erbe was born on July 5, 1965 in Newton, Massachusetts, USA as Kathryn Elsbeth Erbe. She is known for her work on Law & Order: Criminal Intent (2001), Stir of Echoes (1999) and What About Bob? (1991). She was previously married to Terry Kinney.
Content data  User review        After seeing this show and having watched the other 2 L&O shows, I must say that this one has made me think the most and always has me gripping right to the end just like the other two. All 3 have become excellent shows and each stands out has forged its own identity. D'Onofrio is so good it will give you chills at times. 5 out of 5.
Example of Tag and Content Data (Example Movie: Law & Order: Criminal Intent)
The Structure of TCMF
The Procedures of Content Data Preprocessing
Method  λu    λo
NMF     0.02  0.002
PMF     0.02  0.002
ConvMF  0.02  0.02
CNMF    0.02  0.02
CDMF    0.02  0.02
TCMF    0.02  0.02
Parameters of Different Methods
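To show where parameters such as λu and λo typically enter a matrix factorization model, the following is a minimal sketch of plain regularized matrix factorization trained with stochastic gradient descent. It is not the TCMF model: it uses no tag or content features, and it assumes that λu and λo are the regularization weights on the user and item latent factors, which this page does not state. The toy ratings, the learning rate, the number of factors and the number of epochs are all hypothetical.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.02,
             lam_u=0.02, lam_o=0.02, epochs=300, seed=0):
    """Plain regularized MF trained by SGD (illustrative, not TCMF).

    ratings: list of (user_index, item_index, rating) triples.
    lam_u / lam_o: assumed here to be the L2 weights on user / item factors.
    Returns user factors P, item factors Q and the per-epoch training RMSE.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    history = []
    for _ in range(epochs):
        squared_error = 0.0
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            squared_error += err * err
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error plus the L2 penalties.
                P[u][f] += lr * (err * qi - lam_u * pu)
                Q[i][f] += lr * (err * pu - lam_o * qi)
        history.append((squared_error / len(ratings)) ** 0.5)
    return P, Q, history


# Hypothetical toy ratings: (user, item, rating).
toy_ratings = [
    (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 5.0), (3, 0, 3.5), (3, 2, 4.0),
]

P, Q, history = train_mf(toy_ratings, n_users=4, n_items=3)
print(f"training RMSE: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

Larger values of the two weights shrink the latent factors more strongly, which trades training accuracy for resistance to overfitting on sparse rating data.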
Method  RMSE    MAE
NMF     0.9178  0.7211
PMF     0.8750  0.6548
CNMF    0.8617  0.6422
ConvMF  0.8481  0.6387
CDMF    0.8346  0.6301
TCMF    0.8295  0.6189
Results of Different Methods
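The maximum reductions quoted in the abstract (9.62% for RMSE and 14.17% for MAE) can be reproduced from the reported values as the relative reduction of TCMF against each baseline. The sketch below is a worked check of that arithmetic; the dictionary layout and the helper name `relative_reduction` are illustrative choices, not part of the paper.

```python
# (RMSE, MAE) as reported for each method.
results = {
    "NMF": (0.9178, 0.7211),
    "PMF": (0.8750, 0.6548),
    "CNMF": (0.8617, 0.6422),
    "ConvMF": (0.8481, 0.6387),
    "CDMF": (0.8346, 0.6301),
    "TCMF": (0.8295, 0.6189),
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100


tcmf_rmse, tcmf_mae = results["TCMF"]
reductions = {
    method: (relative_reduction(rmse, tcmf_rmse), relative_reduction(mae, tcmf_mae))
    for method, (rmse, mae) in results.items()
    if method != "TCMF"
}

for method, (rmse_cut, mae_cut) in reductions.items():
    print(f"{method:7s} RMSE -{rmse_cut:.2f}%  MAE -{mae_cut:.2f}%")

max_rmse_cut = max(r for r, _ in reductions.values())
max_mae_cut = max(m for _, m in reductions.values())
print(f"max RMSE reduction {max_rmse_cut:.2f}%, max MAE reduction {max_mae_cut:.2f}%")
```

Both maxima occur against NMF, the weakest baseline among the reported methods.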
Results of ConvMF, CDMF and TCMF Model in Noise Experiments
Content experiment             Summary  Storyline  Cast introduction  User reviews  RMSE    MAE
C-1                                                                                 0.8398  0.6214
C-2                                                                                 0.8401  0.6302
C-3                                                                                 0.8332  0.6221
C-4                                                                                 0.8377  0.6300
Average (one content type)                                                          0.8377  0.6259
C-5                                                                                 0.8478  0.6321
C-6                                                                                 0.8377  0.6298
C-7                                                                                 0.8410  0.6301
C-8                                                                                 0.8360  0.6231
C-9                                                                                 0.8326  0.6203
C-10                                                                                0.8301  0.6193
Average (two content types)                                                         0.8375  0.6258
C-11                                                                                0.8434  0.6310
C-12                                                                                0.8402  0.6298
C-13                                                                                0.8299  0.6193
C-14                                                                                0.8355  0.6209
Average (three content types)                                                       0.8373  0.6253
C-15                                                                                0.8295  0.6189
Results of Different Description Contents
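The group averages in the table can be checked by averaging the per-experiment values, grouped by the number of content types used (C-1 to C-4 use one, C-5 to C-10 two, C-11 to C-14 three, C-15 all four). The following sketch assumes that grouping, which follows from the position of the average rows; the variable and function names are illustrative.

```python
from statistics import mean

# (RMSE, MAE) per experiment, keyed by the number of content types used.
groups = {
    1: [(0.8398, 0.6214), (0.8401, 0.6302), (0.8332, 0.6221), (0.8377, 0.6300)],
    2: [(0.8478, 0.6321), (0.8377, 0.6298), (0.8410, 0.6301),
        (0.8360, 0.6231), (0.8326, 0.6203), (0.8301, 0.6193)],
    3: [(0.8434, 0.6310), (0.8402, 0.6298), (0.8299, 0.6193), (0.8355, 0.6209)],
    4: [(0.8295, 0.6189)],
}


def group_average(rows):
    """Return the mean RMSE and mean MAE of a list of (RMSE, MAE) pairs."""
    return mean(r for r, _ in rows), mean(m for _, m in rows)


averages = {n: group_average(rows) for n, rows in groups.items()}

for n, (avg_rmse, avg_mae) in averages.items():
    print(f"{n} content type(s): mean RMSE {avg_rmse:.4f}, mean MAE {avg_mae:.4f}")
```

The mean error falls slightly as more content types are combined, and the single run that uses all four types gives the lowest RMSE and MAE in the table.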
Results of Different Feature Fusion Methods
Results of Different Parameters
