Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (2): 106-115    DOI: 10.11925/infotech.2096-3467.2020.0395
Analyzing Knowledge Demand and Supply of Community Question Answering with TF-PIDF
Li Ming1, Li Ying1, Zhou Qing1, Wang Jun2
1School of Economics and Management, China University of Petroleum-Beijing, Beijing 102249, China
2School of Economics and Management, Beihang University, Beijing 100191, China
Abstract  

[Objective] This paper proposes a new method to study the knowledge demand and supply of community question answering, aiming to enable effective targeted interventions. [Methods] First, we constructed novel word weight calculation models (TF-PIDF) for questions and answers. Second, we clustered the questions and answers to obtain the main categories of demanded and supplied knowledge, along with the popularity of their topics. Third, we paired each knowledge demand category with its supply counterpart. Fourth, we proposed an algorithm to calculate the popularity of knowledge demands. [Results] We examined the proposed model with influenza-related topics from the Zhihu community and found six categories of topics for knowledge demand and supply. The trending category was “epidemic”, which represented the most popular real-time needs. [Limitations] The identified topics depend on interpreting the meaning of the clustered feature words. [Conclusions] The proposed method could effectively manage the knowledge demand and supply of community question answering.
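The weighting and clustering steps can be illustrated with a minimal sketch. The TF-PIDF formula is not given on this page, so the sketch substitutes plain TF-IDF as a stand-in, and it uses spherical k-means (cosine similarity, normalised centroids) as an assumed clustering algorithm. All function names and the toy token lists are hypothetical; Chinese text is assumed to be word-segmented beforehand.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """Standard TF-IDF per document, L2-normalised (stand-in for TF-PIDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def normalized_mean(vectors):
    """Centroid direction: sum of member vectors, rescaled to unit length."""
    acc = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            acc[t] += w
    norm = math.sqrt(sum(w * w for w in acc.values())) or 1.0
    return {t: w / norm for t, w in acc.items()}


def spherical_kmeans(vectors, k, iters=50):
    """k-means on cosine similarity with deterministic farthest-point seeding."""
    seeds = [0]
    while len(seeds) < k:
        # Next seed: the document least similar to every seed chosen so far.
        seeds.append(min(
            (i for i in range(len(vectors)) if i not in seeds),
            key=lambda i: max(dot(vectors[i], vectors[s]) for s in seeds),
        ))
    centroids = [vectors[s] for s in seeds]
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = normalized_mean(members)
    return labels, centroids


def cluster_keywords(docs, k, top_n=15):
    """Return (cluster label per document, top feature words per cluster)."""
    labels, centroids = spherical_kmeans(tfidf(docs), k)
    keywords = [
        [t for t, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for c in centroids
    ]
    return labels, keywords


# Toy, invented, pre-segmented "questions" for illustration only.
DOCS = [
    ["vaccine", "dose", "child"],
    ["vaccine", "dose", "hospital"],
    ["fever", "cough", "symptom"],
    ["fever", "symptom", "runny_nose"],
]

labels, keywords = cluster_keywords(DOCS, k=2, top_n=4)
print(labels)  # → [0, 0, 1, 1]
```

Farthest-point seeding is chosen here only to make the sketch reproducible; a production setup would more likely use k-means++ with several restarts, and k = 6 to mirror the six categories reported below.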

Key words: Community Questions and Answers; Knowledge Demand; Knowledge Supply; Knowledge Management
Received: 07 May 2020      Published: 11 March 2021
CLC Number: TP393
Fund: National Natural Science Foundation of China (71571191, 71871005, 91646122)
Corresponding Author: Li Ming   ORCID: 0000-0001-8732-8217   E-mail: limingzyq@cup.edu.cn

Cite this article:

Li Ming, Li Ying, Zhou Qing, Wang Jun. Analyzing Knowledge Demand and Supply of Community Question Answering with TF-PIDF. Data Analysis and Knowledge Discovery, 2021, 5(2): 106-115.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0395     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I2/106

Research Framework
Heat and Coverage Distribution of Knowledge Demand
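No. | Category | Topic feature words
1 | Epidemic | COVID-19, epidemic disease, virus, prevalence, vaccine, United States, pneumonia, novel coronavirus, contagion, death, severe, drug, prevent, vaccine, medical
2 | Symptoms | acute, respiratory tract, pneumonia, fever, cough, severe, symptom, runny nose, pain, feeling, contagion, high fever, period, transmission, syndrome
3 | Healthcare | medical science, health, virus, doctor, disease, drug, treatment, injection, symptom, antibiotic, taking medicine, hospital, contagion, baby, immunity
4 | Vaccine | vaccine, vaccination, young child, immune, child, hospital, health, production, virus, biological, prevention, research and development, valence, batch, antibody
5 | Virus | COVID-19, virus, coronavirus, pneumonia, mutation, human, organism, transmission, subtype, virology, infection, lethal, poultry, chicken, immune
6 | Prevention | wearing masks, vaccine, disposable, prevention, infection, pneumonia, anti-haze, medical surgical, pestilence, non-woven fabric, contagion, breathing, disease control, virus, purchase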
Category Keywords of Knowledge Demand (Question) Clustering under Influenza Topic
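No. | Category | Topic feature words
1 | Epidemic | prevalence, virus, COVID-19, outbreak, United States, vaccine, infection, death, China, contagion, Spain, pneumonia, drug, antibody, prevention
2 | Symptoms | symptom, fever, pneumonia, severe, virus, cough, hospital, nasal congestion, respiratory tract, runny nose, viral, immunity, contagion, child, mild
3 | Healthcare | nursing, fever reduction, child, medical care, measures, family, thermometer, drug, hospital, hospitalization, treatment, specific remedy, testing, contagion, self-limiting
4 | Vaccine | vaccine, virus, prediction, vaccination, timing, baby, trivalent, production, hygiene, China, prevention, injection, type B, antigen, virus strain
5 | Virus | virus, mutation, coronavirus, cause, swine, prevention, droplets, antibody, human, transmission, death, host, avian, immune system, contact
6 | Prevention | prevention, regular, vaccine, Western medicine, drink more water, light diet, nebulization, advice, effect, relief, mask, doctor, awareness, hard-hit area, passive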
Category Keywords of Knowledge Supply (Answer) Clustering under Influenza Topic
Number of Questions of Knowledge Demand Categories under Influenza Topic
Number of Answers of Knowledge Supply Categories under Influenza Topic
Category Heat of Knowledge Demand of Influenza Topic
Category Heat of Knowledge Supply of Influenza Topic
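The popularity (heat) algorithm itself is not reproduced on this page. Purely as a hypothetical stand-in, the sketch below scores each demand category by summing time-decayed activity over its questions, so that recent, active questions count more; the tuple layout, the `engagement` signal, and the 7-day half-life are all assumptions, and the toy data is invented rather than taken from the figures above.

```python
from collections import defaultdict


def category_heat(questions, now_day, half_life_days=7.0):
    """Hypothetical heat score per knowledge-demand category.

    questions: iterable of (category, posted_day, engagement) tuples, where
    engagement is an assumed activity count (e.g. answers + follows).
    Each question contributes (1 + engagement) * 0.5 ** (age / half_life_days);
    scores are normalised so they sum to 1.
    """
    raw = defaultdict(float)
    for category, posted_day, engagement in questions:
        age = max(0.0, now_day - posted_day)  # clamp future timestamps to age 0
        raw[category] += (1 + engagement) * 0.5 ** (age / half_life_days)
    total = sum(raw.values()) or 1.0
    return {cat: score / total for cat, score in raw.items()}


# Toy, invented data: (category, day posted, answers + follows)
QUESTIONS = [
    ("Epidemic", 30, 5),
    ("Epidemic", 29, 3),
    ("Vaccine", 10, 10),
    ("Symptoms", 30, 1),
]

heat = category_heat(QUESTIONS, now_day=30)
hottest = max(heat, key=heat.get)  # "Epidemic" for this toy data
```

The half-life trades off responsiveness against stability: a short half-life makes a burst of new questions dominate quickly, which matches the notion of "real-time needs" but also amplifies noise.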
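No. | Knowledge demand category | Number of topics | Knowledge supply topic groups
1 | Epidemic | 6 | high-incidence regions, influenza virus types, infections and deaths, epidemic response measures, prevention methods, diagnosis and treatment measures
2 | Symptoms | 2 | influenza symptoms, distinguishing influenza from the common cold
3 | Healthcare | 3 | treatment methods, vaccine development, specific drug development
4 | Vaccine | 4 | vaccine development, vaccination, mechanism of action, side effects
5 | Virus | 4 | virus types and structure, infection mechanism, mutation, susceptible populations
6 | Prevention | 2 | preventive measures, preventive effectiveness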
Topic Distribution of Knowledge Supply for Specific Knowledge Demand of Influenza Topic
No. | Knowledge demand category | Coverage | Rank
1 | Epidemic | 0.40682 | 5
2 | Symptoms | 0.36738 | 6
3 | Healthcare | 0.44806 | 2
4 | Vaccine | 0.43428 | 4
5 | Virus | 0.43571 | 3
6 | Prevention | 0.46633 | 1
Coverage of Knowledge Supply to Knowledge Demand of Influenza Topic
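The Rank column in the coverage table orders the six demand categories by coverage, highest first. A minimal check using the table's values (category names in English, in the same order as the No. column; `rank_by_coverage` is a hypothetical helper) reproduces that ranking and identifies Symptoms as the least-covered category:

```python
# Coverage of knowledge supply for each knowledge demand category (from the table).
COVERAGE = {
    "Epidemic": 0.40682,
    "Symptoms": 0.36738,
    "Healthcare": 0.44806,
    "Vaccine": 0.43428,
    "Virus": 0.43571,
    "Prevention": 0.46633,
}


def rank_by_coverage(coverage):
    """Map each category to its rank, 1 = highest coverage."""
    ordered = sorted(coverage, key=coverage.get, reverse=True)
    return {cat: position + 1 for position, cat in enumerate(ordered)}


ranks = rank_by_coverage(COVERAGE)
least_covered = min(COVERAGE, key=COVERAGE.get)  # "Symptoms"
```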
Distribution of Heat and Coverage of Influenza Knowledge Demand