Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (8): 128-137     https://doi.org/10.11925/infotech.2096-3467.2022.0775
  Research Paper
Ensemble Factorization Machine and Its Application in Paper Recommendation*
Yang Chen, Zheng Ruozhen, Wang Chuhan, Geng Shuang, Wang Nan
College of Management, Shenzhen University, Shenzhen 518060, China
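Abstract (translated from the Chinese original)

[Objective] Existing paper recommendation methods handle the sparse paper-author mapping and feature representation poorly. This study develops a new paper recommendation framework based on factorization machines and ensemble learning. [Methods] Convolutional neural networks, network embedding, and related methods are used to process the data and obtain feature representations. The feature matrix is fed into factorization machines, the random subspace method is introduced to train the models as an ensemble, and the recommendation results are output after the learners are combined through a voting mechanism. [Results] Experiments on the CiteULike dataset show that the proposed method reaches a precision, accuracy, and F-measure of 72.6%, 69.7%, and 76.2%, exceeding the benchmark algorithms by more than 20, 15, and 9 percentage points, respectively. [Limitations] The negative sampling process does not consider the semantic similarity between positive and negative samples, and the construction of the model input and the feature processing mode need further study. [Conclusions] The ensemble factorization machine represents and uses features effectively when data are sparse, which improves recommendation performance.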

Keywords: Paper Recommendation; Factorization Machine; Ensemble Learning
Abstract

[Objective] This study proposes an improved paper recommendation framework based on Ensemble Learning and Factorization Machine. It addresses weaknesses of existing methods, such as difficulty in processing sparse data and representing features. [Methods] First, we used Convolutional Neural Network, Network Embedding, and other algorithms to obtain feature representations, which were fed to Factorization Machine learners. We then trained homogeneous weak Factorization Machine learners based on Ensemble Learning, integrated them into a stronger learner through a voting mechanism, and generated the final recommendations. [Results] We evaluated the new model on the CiteULike dataset. Its Precision, Accuracy, and F-Measure reached 72.6%, 69.7%, and 76.2%, respectively, which are more than 20, 15, and 9 percentage points higher than those of the benchmark algorithms. [Limitations] The input, sampling strategy, and processing mode need to be further explored. [Conclusions] The proposed Ensemble Factorization Machine enables effective representation and utilization of sparse data features, enhancing the recommendation performance.

Key words: Research Paper Recommendation; Factorization Machine; Ensemble Learning
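The pipeline described in the abstract (feature matrix, Factorization Machine learners, random subspace sampling, voting) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `fm_score`, `train_fm`, `train_efm`, and `predict_efm`, the full-batch gradient descent on a logistic loss, the subspace ratio of 0.6, and the synthetic data are all assumptions made for illustration, and the CNN and network-embedding feature extraction is omitted.

```python
import numpy as np

def fm_score(X, w0, w, V):
    """Second-order FM score per row, using the O(k*n) pairwise identity."""
    xv = X @ V                                  # shape (m, k)
    pairwise = 0.5 * np.sum(xv ** 2 - (X ** 2) @ (V ** 2), axis=1)
    return w0 + X @ w + pairwise

def train_fm(X, y, k=4, lr=0.3, epochs=150, seed=0):
    """Fit a binary FM by full-batch gradient descent on the logistic loss (assumed setup)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w0, w = 0.0, np.zeros(n)
    V = rng.normal(scale=0.1, size=(n, k))
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-fm_score(X, w0, w, V)))
        g = (p - y) / m                         # d(mean loss)/d(score)
        xv = X @ V
        grad_V = X.T @ (g[:, None] * xv) - ((X ** 2).T @ g)[:, None] * V
        w0 -= lr * g.sum()
        w -= lr * (X.T @ g)
        V -= lr * grad_V
    return w0, w, V

def train_efm(X, y, n_learners=5, subspace=0.6, seed=0, **fm_kwargs):
    """Train one FM per random feature subset (random subspace method)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    size = max(1, int(round(subspace * n)))
    members = []
    for i in range(n_learners):
        cols = np.sort(rng.choice(n, size=size, replace=False))
        members.append((cols, train_fm(X[:, cols], y, seed=seed + i, **fm_kwargs)))
    return members

def predict_efm(members, X):
    """Majority vote over the members' 0/1 predictions (ties count as 1)."""
    votes = np.stack([fm_score(X[:, cols], *params) > 0 for cols, params in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Synthetic stand-in for the feature matrix; this is not CiteULike data.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 8))
y_demo = (X_demo.sum(axis=1) > 0).astype(int)
members = train_efm(X_demo, y_demo)
pred = predict_efm(members, X_demo)
print("training accuracy:", (pred == y_demo).mean())
```

Each learner sees only a random subset of the feature columns, so the learners disagree on some samples; the vote is what turns that diversity into a single recommendation decision. An odd number of learners avoids ties.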
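Received: 2022-07-25      Published: 2023-10-08
CLC number (ZTFLH): TP311; G250
Funding: * National Natural Science Foundation of China (71701134); National Natural Science Foundation of China (71901150); Guangdong Basic and Applied Basic Research Foundation (2019A1515011392)
Corresponding author: Geng Shuang, ORCID: 0000-0001-8146-0786, E-mail: gs@szu.edu.cn.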
Cite this article:
Yang Chen, Zheng Ruozhen, Wang Chuhan, Geng Shuang, Wang Nan. Ensemble Factorization Machine and Its Application in Paper Recommendation. Data Analysis and Knowledge Discovery, 2023, 7(8): 128-137.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0775      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I8/128
Fig.1  Convolutional Neural Network
Fig.2  DeepWalk Model
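Fig.2 refers to DeepWalk [33], which learns node embeddings by generating truncated random walks over a graph and treating each walk as a sentence for a skip-gram model. The walk-generation step can be sketched as follows. The function name `random_walks`, its parameters, and the toy user-paper graph are hypothetical; this page does not state which graph the authors embed, and the skip-gram training step is left out.

```python
import random

def random_walks(adj, num_walks=2, walk_length=5, seed=0):
    """Generate truncated random walks over a graph.

    adj maps each node to the list of its neighbours.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)                      # new start order on every pass
        for start in nodes:
            walk = [start]
            # Stop early only if the current node has no neighbours.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Hypothetical toy graph: users u1..u3 linked to the papers p1..p2 they saved.
toy_graph = {
    "u1": ["p1", "p2"], "u2": ["p1"], "u3": ["p2"],
    "p1": ["u1", "u2"], "p2": ["u1", "u3"],
}
walks = random_walks(toy_graph)
print(len(walks), "walks of length", len(walks[0]))
```

The resulting walks would then be passed to a word-embedding trainer, so that nodes that co-occur in walks receive similar vectors.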
                   User category
Recommendation     Interested             Not interested
Recommended        True Positive (TP)     False Positive (FP)
Not recommended    False Negative (FN)    True Negative (TN)
Table 1  Confusion Matrix
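The four evaluation metrics reported on this page follow from the confusion-matrix cells. The page itself does not reproduce the formulas, so the block below gives the standard definitions, assuming the F-measure is the balanced harmonic mean of precision and recall:

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + FP + FN + TN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall}    &= \frac{TP}{TP + FN}, &
F &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{aligned}
```

Precision penalizes recommending papers the user is not interested in, whereas recall penalizes missing papers the user is interested in, so a method that recommends almost everything scores high on recall and low on precision.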
learning_rate    Accuracy
0.001            0.560
0.005            0.616
0.01             0.635
0.05             0.596
0.1              0.592
Table 2  Model Accuracy under Different Learning Rates
epochs    Accuracy
4         0.609
6         0.612
8         0.635
10        0.632
12        0.630
15        0.585
Table 3  Model Accuracy under Different Numbers of Epochs
k     Accuracy
8     0.581
12    0.615
16    0.635
20    0.607
24    0.594
28    0.628
32    0.641
Table 4  Model Accuracy under Different Latent Vector Dimensions
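Algorithm        Description
FM               Factorization Machine: the input data comprise user encoding information, paper encoding information, and user-paper interaction information, from which the feature vectors are constructed [28]
MF               Matrix Factorization: the user-paper matrix is factorized into low-rank user and paper matrices, and the product of the two matrices gives the predictions [37]
User_based CF    User-based Collaborative Filtering: user similarity is computed from the user-paper matrix to make recommendations [38]
Table 5  Algorithms Compared in the Experiments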
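For orientation, the FM and MF baselines in Table 5 are usually written as below. These are the standard forms from the cited references [28] and [37], not formulas taken from this page; the symbols (feature vector x of length n, latent vectors v_i of dimension k, user and paper factors p_u and q_i) are notational assumptions.

```latex
\begin{aligned}
\text{FM:}\quad \hat{y}(\mathbf{x}) &= w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j, \\
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  &= \frac{1}{2} \sum_{f=1}^{k} \left[ \Big( \sum_{i=1}^{n} v_{i,f} \, x_i \Big)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2} \, x_i^{2} \right], \\
\text{MF:}\quad \hat{r}_{ui} &= \mathbf{p}_u^{\top} \mathbf{q}_i .
\end{aligned}
```

The second line is the identity that lets the pairwise term be evaluated in O(kn) time. Because every feature pair shares the latent vectors, FM can estimate an interaction weight even for pairs that never co-occur in the training data, which is why it suits sparse inputs.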
Algorithm        Accuracy    Precision    Recall    F-measure
EFM              0.697       0.726        0.801     0.762
FM               0.534       0.516        0.934     0.664
MF               0.501       0.501        0.992     0.668
User_based CF    0.544       0.089        0.982     0.163
Table 6  Experimental Results
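The figures in Table 6 can be cross-checked with a few lines of Python. The sketch below recomputes the F-measure from the reported precision and recall, assuming the F-measure is F1, and derives the gains of EFM over the best baseline in percentage points. The names `table6`, `f_measure`, and `gain_points` are illustrative only.

```python
# Values copied from Table 6: (accuracy, precision, recall, reported F-measure).
table6 = {
    "EFM":           (0.697, 0.726, 0.801, 0.762),
    "FM":            (0.534, 0.516, 0.934, 0.664),
    "MF":            (0.501, 0.501, 0.992, 0.668),
    "User_based CF": (0.544, 0.089, 0.982, 0.163),
}

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1, an assumption)."""
    return 2 * precision * recall / (precision + recall)

recomputed = {name: f_measure(p, r) for name, (_, p, r, _) in table6.items()}

baselines = [name for name in table6 if name != "EFM"]

def gain_points(col):
    """EFM minus the best baseline on one column, in percentage points."""
    best = max(table6[name][col] for name in baselines)
    return 100 * (table6["EFM"][col] - best)

gains = {
    "accuracy": gain_points(0),
    "precision": gain_points(1),
    "F-measure": gain_points(3),
}

for name, f in recomputed.items():
    print(f"{name}: recomputed F = {f:.3f}, reported F = {table6[name][3]:.3f}")
print({metric: round(value, 1) for metric, value in gains.items()})
```

The recomputed F values agree with the reported ones to within about 0.002, which is consistent with rounding of the inputs. The gains come out at roughly 21.0, 15.3, and 9.4 percentage points for precision, accuracy, and F-measure, matching the "more than 20, 15, and 9 percentage points" stated in the abstract.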
[1] Kong X J, Shi Y J, Yu S, et al. Academic Social Networks: Modeling, Analysis, Mining and Applications[J]. Journal of Network and Computer Applications, 2019, 132: 86-103.
doi: 10.1016/j.jnca.2019.01.029
[2] Nascimento C, Laender A H F, da Silva A S, et al. A Source Independent Framework for Research Paper Recommendation[C]// Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries. New York: ACM, 2011: 297-306.
[3] Yang Chen, Liu Tingting, Liu Lei, et al. A Novel Recommendation Approach of Electronic Literature Resources Combining Semantic and Social Features[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(6): 632-640. (in Chinese)
[4] Champiri Z D, Asemi A, Binti S S S. Meta-Analysis of Evaluation Methods and Metrics Used in Context-Aware Scholarly Recommender Systems[J]. Knowledge and Information Systems, 2019, 61(2): 1147-1178.
doi: 10.1007/s10115-018-1324-5
[5] Bhagavatula C, Feldman S, Power R, et al. Content-Based Citation Recommendation[OL]. arXiv Preprint, arXiv: 1802.08301.
[6] Basu C, Hirsh H, Cohen W W, et al. Technical Paper Recommendation: A Study in Combining Multiple Information Sources[J]. Journal of Artificial Intelligence Research, 2001, 14: 231-252.
doi: 10.1613/jair.739
[7] Caragea C, Bulgarov F A, Godea A, et al. Citation-Enhanced Keyphrase Extraction from Research Papers: A Supervised Approach[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2014: 1435-1446.
[8] McNee S M, Albert I, Cosley D, et al. On the Recommending of Citations for Research Papers[C]// Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work. New York: ACM, 2002: 116-125.
[9] Bi Qiang, Liu Jian. Study on the Method of Aggregation and Service Recommendation of Digital Resource Based on Domain Ontology[J]. Journal of the China Society for Scientific and Technical Information, 2017, 36(5): 452-460. (in Chinese)
[10] Li Yamei, Qin Chunxiu, Ma Xubu. Research on Collaborative Recommendation of Scientific and Technological Literature Based on Researchers’ Contextual Topic Preference[J]. Information Studies: Theory & Application, 2021, 44(12): 180-189. (in Chinese)
[11] Beel J, Gipp B, Langer S, et al. Research-Paper Recommender Systems: A Literature Survey[J]. International Journal on Digital Libraries, 2016, 17(4): 305-338.
doi: 10.1007/s00799-015-0156-0
[12] Tang Zhikang, Li Chunying, Tang Yong, et al. Paper Recommendation Method Based on Scholar Social Platform[J]. Computer & Digital Engineering, 2017, 45(2): 221-225. (in Chinese)
[13] Liu Jian, Bi Qiang, Liu Qingxu, et al. New Content Recommendation Service of Digital Literature[J]. New Technology of Library and Information Service, 2016(9): 70-77. (in Chinese)
[14] Chen Haihua, Meng Rui, Lu Wei. Research Review on Citation Recommendation of Academic Literatures[J]. Library and Information Service, 2015, 59(15): 133-143. (in Chinese)
doi: 10.13266/j.issn.0252-3116.2015.15.018
[15] Liu Yang. Research on the Hybrid Recommendation Model of Academic Reference Based on Quality[J]. Information Studies: Theory & Application, 2015, 38(2): 17-22. (in Chinese)
[16] Haruna K, Ismail M A, Damiasih D, et al. A Collaborative Approach for Research Paper Recommender System[J]. PLoS One, 2017, 12(10): e0184516.
doi: 10.1371/journal.pone.0184516
[17] Guo Q Y, Zhuang F Z, Qin C, et al. A Survey on Knowledge Graph-Based Recommender Systems[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(8): 3549-3568.
doi: 10.1109/TKDE.2020.3028705
[18] Kanakia A, Shen Z H, Eide D, et al. A Scalable Hybrid Research Paper Recommender System for Microsoft Academic[C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 2893-2899.
[19] Wang Qinjie, Qin Chunxiu, Ma Xubu, et al. Research on Recommendation Method of Scientific and Technological Literature Based on Author Preference and Heterogeneous Information Network[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 54-64. (in Chinese)
[20] Ricci F, Rokach L, Shapira B. Introduction to Recommender Systems Handbook[A]// Ricci F, Rokach L, Shapira B, et al. Recommender Systems Handbook[M]. Boston, MA: Springer, 2011: 1-35.
[21] Cai Yi, Zhu Xiufang, Sun Zhangli, et al. Semi-Supervised and Ensemble Learning: A Review[J]. Computer Science, 2017, 44(S1): 7-13. (in Chinese)
[22] Sagi O, Rokach L. Ensemble Learning: A Survey[J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2018, 8(4): e1249.
doi: 10.1002/widm.1249
[23] Breiman L. Bagging Predictors[J]. Machine Learning, 1996, 24(2): 123-140.
[24] Freund Y, Schapire R E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
doi: 10.1006/jcss.1997.1504
[25] Ho T K. The Random Subspace Method for Constructing Decision Forests[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(8): 832-844.
doi: 10.1109/34.709601
[26] Rokach L. Ensemble-Based Classifiers[J]. Artificial Intelligence Review, 2010, 33(1): 1-39.
doi: 10.1007/s10462-009-9124-7
[27] Kuncheva L I, Whitaker C J. Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy[J]. Machine Learning, 2003, 51(2): 181-207.
doi: 10.1023/A:1022859003006
[28] Rendle S. Factorization Machines[C]// Proceedings of the 2010 IEEE International Conference on Data Mining. IEEE, 2011: 995-1000.
[29] Gu J X, Wang Z H, Kuen J, et al. Recent Advances in Convolutional Neural Networks[J]. Pattern Recognition, 2018, 77: 354-377.
[30] Cui P, Wang X, Pei J, et al. A Survey on Network Embedding[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 31(5): 833-852.
doi: 10.1109/TKDE.2018.2849727
[31] Choong A C H, Lee N K. Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences Using Ordinal Versus One-Hot Encoding Method[C]// Proceedings of the 2017 International Conference on Computer and Drone Applications. IEEE, 2018: 60-65.
[32] Zhou Feiyan, Jin Linpeng, Dong Jun. Review of Convolutional Neural Network[J]. Chinese Journal of Computers, 2017, 40(6): 1229-1251. (in Chinese)
[33] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: Online Learning of Social Representations[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014: 701-710.
[34] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv: 1301.3781.
[35] Kingma D P, Ba J. Adam: A Method for Stochastic Optimization[C]// Proceedings of the 3rd International Conference on Learning Representations. 2015.
[36] Wang H, Chen B Y, Li W J. Collaborative Topic Regression with Social Regularization for Tag Recommendation[C]// Proceedings of the 23rd International Joint Conference on Artificial Intelligence. 2013.
[37] Koren Y, Bell R, Volinsky C. Matrix Factorization Techniques for Recommender Systems[J]. Computer, 2009, 42(8): 30-37.
[38] Goldberg D, Nichols D, Oki B M, et al. Using Collaborative Filtering to Weave an Information Tapestry[J]. Communications of the ACM, 1992, 35(12): 61-70.