Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (9): 42-49    DOI: 10.11925/infotech.2096-3467.2018.0088
Recommending Contents Based on Zhihu Q&A Community: Case Study of Logistics Topics
He Yue, Feng Yue, Zhao Shupeng, Ma Yufeng
Business School, Sichuan University, Chengdu 610064, China
Abstract  

[Objective] This study analyzes the social behavior of Zhihu (https://www.zhihu.com/) users in order to recommend relevant content more effectively. [Methods] First, we propose a content recommendation method that combines association rules with the LDA topic model. Then, we construct a network of co-occurring sub-topics for a specific topic and extract the keywords of each sub-topic with the LDA model. Finally, we push content from the related topics to users. [Results] Under the logistics topic, many sub-topics co-occur frequently, and the confidence levels of the resulting association rules are above 65%. [Limitations] More comprehensive data is needed in future studies. [Conclusions] The association rule-LDA model provides a new direction for content recommendation.

Key words: Zhihu; Association Rule; LDA; Content Recommendation
Received: 18 January 2018      Published: 25 October 2018
ZTFLH (classification number): G206.3

Cite this article:

He Yue,Feng Yue,Zhao Shupeng,Ma Yufeng. Recommending Contents Based on Zhihu Q&A Community: Case Study of Logistics Topics. Data Analysis and Knowledge Discovery, 2018, 2(9): 42-49.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0088     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I9/42

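Processing step | Content
Raw text | 那回北京飞常州, 原定20.00起飞, 流量管制到23.00登机, 结果排队起飞排了3个点, 轮到我们的时候机长亲切地说: 旅客朋友们你们好, 由于我们排队期间滑行时间过长, 飞机燃油不足需要加油, 我们将重新排队.......好气啊!! <img data-rawwidth="143" data-rawheight="89" src="https://pic1.zhimg.com/8efad7894eef191114b0e2779e98132c_b.jpg" class="content_image" width="143"> (English: "That time I flew from Beijing to Changzhou. Departure was scheduled for 20.00, flow control pushed boarding back to 23.00, and then we queued for takeoff for 3 hours. When our turn came, the captain said warmly: Dear passengers, because we taxied for too long while queuing, the aircraft is short of fuel and needs to refuel, so we will queue again....... So annoying!!")
Data cleaning | 那回北京飞常州, 原定, 起飞, 流量管制到, 登机, 结果排队起飞排了个点, 轮到我们的时候机长亲切地说, 旅客朋友们你们好, 由于我们排队期间滑行时间过长, 飞机燃油不足需要加油, 我们将重新排队, , , , , , , 好气啊, , , ,
Word segmentation | 那回/ 北京/ 飞/ 常州/ 原定/ 起飞/ 流量/ 管制/ 到/ 登机/ 结果/ 排队/ 起飞/ 排/ 了/ 个/ 点/ 轮到/ 我们/ 的/ 时候/ 机长/ 亲切/ 地说/ 旅客/ 朋友/ 们/ 你们好/ 由于/ 我们/ 排队/ 期间/ 滑行/ 时间/ 过长/ 飞机/ 燃油/ 不足/ 需要/ 加油/ 我们/ 将/ 重新/ 排队/ 好气/ 啊/
Stop-word filtering | 那回/ 北京/ 飞/ 常州/ 原定/ 起飞/ 流量/ 管制/ 登机/ 排队/ 起飞/ 排/ 轮到/ 机长/ 亲切/ 地说/ 旅客/ 朋友/ 你们好/ 排队/ 期间/ 滑行/ 时间/ 过长/ 飞机/ 燃油/ 不足/ 需要/ 加油/ 重新/ 排队/ 好气/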
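The cleaning and stop-word filtering steps in the table above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the regular expressions, the function names `clean` and `remove_stop_words`, and the small `STOP_WORDS` set (inferred from the tokens that disappear between the segmentation row and the filtering row) are all assumptions. Word segmentation itself is assumed to be done by an external Chinese segmenter that is not shown here.

```python
import re

# Assumed cleaning rules: drop HTML tags, drop digits, and turn every other
# non-Chinese run (punctuation, spaces, Latin letters) into a single space.
TAG_RE = re.compile(r"<[^>]+>")
DIGIT_RE = re.compile(r"\d+")
NON_CJK_RE = re.compile(r"[^\u4e00-\u9fff]+")

# Hypothetical stop-word list, inferred from the example table only.
STOP_WORDS = {"到", "结果", "了", "个", "点", "我们", "的", "时候",
              "们", "由于", "将", "啊"}


def clean(raw: str) -> str:
    """Strip markup, digits and punctuation from a raw answer."""
    text = TAG_RE.sub("", raw)        # remove tags before touching digits
    text = DIGIT_RE.sub("", text)     # "排了3个点" becomes "排了个点"
    text = NON_CJK_RE.sub(" ", text)  # punctuation becomes a separator
    return text.strip()


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


sample = '原定20.00起飞, 流量管制到23.00登机 <img class="content_image" width="143">'
cleaned = clean(sample)
filtered = remove_stop_words(["流量", "管制", "到", "登机", "结果", "排队"])
```

Removing tags first matters: otherwise the digits inside attributes such as `width="143"` would be processed as if they were part of the answer text.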
ID | Life | E-commerce | Procurement | Travel
1 | T | F | F | F
2 | T | F | F | F
3 | F | T | F | F
4 | F | F | T | F
5 | F | F | F | T
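A table of T/F flags like the one above is the usual input format for association rule mining. The sketch below shows one way to build it, assuming that each ID is a question and each column is a sub-topic tagged on that question; the source does not spell this out, and the names `encode_transactions`, `ITEMS` and `QUESTION_TOPICS` are hypothetical.

```python
def encode_transactions(question_topics, items):
    """Return one row of 'T'/'F' flags per question, one column per item."""
    return [
        ["T" if item in topics else "F" for item in items]
        for topics in question_topics
    ]


# Columns of the table, in order.
ITEMS = ["Life", "E-commerce", "Procurement", "Travel"]

# Assumed topic tags for questions 1 to 5, chosen to mirror the table rows.
QUESTION_TOPICS = [
    {"Life"},
    {"Life"},
    {"E-commerce"},
    {"Procurement"},
    {"Travel"},
]

rows = encode_transactions(QUESTION_TOPICS, ITEMS)
for question_id, row in enumerate(rows, start=1):
    print(question_id, " ".join(row))
```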
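Rule | Antecedent | Consequent | Rule support (%) | Support (%) | Rule confidence (%) | Lift
1 | Express company | Express delivery | 5.94 | 6.93 | 85.72 | 2.705
2 | Express company | Logistics | 4.95 | 6.93 | 71.43 | 1.535
3 | E-commerce | Logistics | 8.91 | 12.84 | 69.23 | 1.488
4 | Express company + Express delivery | Logistics | 3.96 | 5.94 | 66.67 | 1.433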
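The measures in the rule table above follow the standard association rule definitions. In the table, rule confidence is approximately rule support divided by support (for rule 2, 4.95 / 6.93 ≈ 71.43%), which suggests that the plain "support" column is the support of the antecedent; that reading is an inference, not something the source states. The sketch below computes the same four measures on a small hypothetical transaction set; the function name `rule_metrics` and the toy data are assumptions and do not reproduce the paper's data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute rule support, antecedent support, confidence and lift."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n == 0 or n_ante == 0 or n_cons == 0:
        raise ValueError("antecedent and consequent must both occur")
    return {
        "rule_support": n_both / n,              # P(antecedent and consequent)
        "support": n_ante / n,                   # P(antecedent)
        "confidence": n_both / n_ante,           # P(consequent | antecedent)
        "lift": n_both * n / (n_ante * n_cons),  # confidence / P(consequent)
    }


# Hypothetical transactions: each set holds the sub-topics of one question.
TOY_TRANSACTIONS = [
    {"express company", "express delivery"},
    {"express company", "express delivery", "logistics"},
    {"express company"},
    {"express delivery"},
    {"express delivery", "logistics"},
    {"logistics"},
    {"express company", "express delivery", "logistics"},
    {"logistics"},
    {"express delivery"},
    {"e-commerce", "logistics"},
]

metrics = rule_metrics(TOY_TRANSACTIONS, {"express company"}, {"express delivery"})
```

A lift above 1 means the consequent appears more often alongside the antecedent than it does overall, which is the case for all four rules in the table.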
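Topic | Feature words (probability) | Label
1 | driver (0.0113), vehicle (0.0096), go (0.0061), run (0.0047), airplane (0.0046), drive (0.0043), see (0.0036), truck (0.0036), airport (0.0030), delay (0.0029) | Vehicle transportation
2 | transshipment (0.0129), company (0.0098), price (0.0069), logistics (0.0067), buy (0.0063), service (0.0058), compare (0.0057), Amazon (0.0056), things (0.0055), overseas online shopping (0.0054) | Overseas online shopping
3 | takeout (0.0155), deliver (0.0122), feel (0.0080), weather (0.0077), work (0.0073), delivery guy (0.0056), severe (0.0051), do not (0.0050), money (0.0038), hard work (0.0037) | Takeout delivery in severe weather
4 | problem (0.0080), railway (0.0050), Beijing (0.0044), go home (0.0040), China (0.0035), society (0.0033), Spring Festival travel rush (0.0033), cannot (0.0028), need (0.0027), high-speed rail (0.0027) | Spring Festival railway travel
5 | logistics (0.0091), company (0.0078), enterprise (0.0073), problem (0.0066), procurement (0.0056), supply chain (0.0055), industry (0.0048), cost (0.0048), China (0.0042), management (0.0042) | Logistics enterprises
6 | express delivery (0.0404), SF Express (0.0180), courier (0.0097), company (0.0082), things (0.0067), deliver (0.0066), mail (0.0065), postal service (0.0063), phone (0.0061), make a phone call (0.0054) | Express delivery service
7 | hour (0.0174), train (0.0126), sit (0.0118), hard seat (0.0112), eat (0.0077), buy (0.0071), carriage (0.0058), station (0.0052), sleep (0.0052), time (0.0050) | Train travel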
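Once each topic has a list of weighted feature words like those above, a very simple way to relate a piece of content to a topic is to add up the weights of the feature words it contains. The sketch below does only that; it is a naive word-overlap score, not LDA inference and not necessarily how the authors match content to topics. The names `top_words` and `best_topic` are hypothetical, and the weights are a few values copied from the table using English glosses (the real tokens would be Chinese).

```python
# A few feature-word probabilities taken from the topic table.
TOPICS = {
    "Express delivery service": {
        "express delivery": 0.0404, "SF Express": 0.0180, "courier": 0.0097,
    },
    "Overseas online shopping": {
        "transshipment": 0.0129, "company": 0.0098, "price": 0.0069,
    },
    "Train travel": {
        "hour": 0.0174, "train": 0.0126, "hard seat": 0.0112,
    },
}


def top_words(label, k=2):
    """Return the k highest-probability feature words of a topic."""
    weights = TOPICS[label]
    return sorted(weights, key=weights.get, reverse=True)[:k]


def best_topic(tokens):
    """Pick the topic whose feature words overlap most with the tokens."""
    scores = {
        label: sum(weights.get(tok, 0.0) for tok in tokens)
        for label, weights in TOPICS.items()
    }
    return max(scores, key=scores.get)


label = best_topic(["courier", "express delivery", "price"])
```

Here the tokens score about 0.0501 for "Express delivery service" and 0.0069 for "Overseas online shopping", so the first label is chosen.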
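Content recommendation method | N | P | R | F
LDA | 10 | 0.09 | 0.07 | 0.08
Association rule-LDA | 10 | 0.12 | 0.10 | 0.11
LDA | 20 | 0.13 | 0.09 | 0.11
Association rule-LDA | 20 | 0.17 | 0.13 | 0.15
LDA | 30 | 0.18 | 0.14 | 0.16
Association rule-LDA | 30 | 0.24 | 0.19 | 0.21
LDA | 40 | 0.22 | 0.20 | 0.21
Association rule-LDA | 40 | 0.28 | 0.22 | 0.25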
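The columns P, R and F are not defined in this excerpt; they are assumed here to be precision, recall and F-measure over the top N recommended items. Under that assumption the F column is consistent with F = 2PR / (P + R) in every row (for example P = 0.28 and R = 0.22 give about 0.246, reported as 0.25). The sketch below computes the three measures for one hypothetical recommendation list; the function names and the toy item IDs are illustrative only.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(recommended, relevant):
    """Evaluate a top-N recommendation list against the relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical example: 4 items recommended, 3 items actually relevant.
p, r, f = precision_recall_f(["a", "b", "c", "d"], {"b", "d", "e"})
```

Increasing N usually raises recall because more relevant items can be covered, which matches the upward trend of R in the table for both methods.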