Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 6: 117-122     https://doi.org/10.11925/infotech.2096-3467.2018.1209
Application Paper
Classifying Baidu Encyclopedia Entries with User Behaviors*
Zhenyu He, Xiangxiang Dong, Qinghua Zhu
School of Information Management, Nanjing University, Nanjing 210023, China
Abstract

[Objective] This paper uses user behavior as the basis for classifying Baidu Encyclopedia entries, aiming to identify and improve entries with high usage value and usage potential. [Methods] Drawing on prior research in China and abroad, we selected the degree of user usage and the level of user recognition as indicators, proposed an entry classification model based on the Boston matrix and a BP neural network, and used it to classify entries automatically. [Results] We classified entries by these user behavior indicators and proposed a development strategy for each category. The automatic classification method accurately identified the category of individual entries. [Limitations] Newly created entries were not studied sufficiently, and features that are difficult to quantify accurately, such as richness and rigor, were not considered. [Conclusions] This study offers a new perspective on, and a new method for, classifying online encyclopedia entries.

Keywords: Baidu Encyclopedia Entry; Boston Matrix; BP Neural Network
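The two-step approach in the abstract (quadrant labels from two indicators, then a BP neural network that learns to assign them) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the scores, the 0.5 cut-offs, the network size, and the names `quadrant` and `BPNet` are hypothetical and are not the authors' data or implementation.

```python
import math
import random

# Hypothetical (usage, recognition) scores scaled to [0, 1]; not from the paper.
ENTRIES = [
    (0.9, 0.8), (0.8, 0.9), (0.7, 0.7),  # high usage, high recognition
    (0.9, 0.2), (0.8, 0.3), (0.7, 0.1),  # high usage, low recognition
    (0.2, 0.9), (0.1, 0.8), (0.3, 0.7),  # low usage, high recognition
    (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),  # low usage, low recognition
]

def quadrant(usage, recognition, usage_cut=0.5, recog_cut=0.5):
    """Boston-matrix style label 0..3; the cut-offs are assumed values."""
    return 2 * int(usage >= usage_cut) + int(recognition >= recog_cut)

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class BPNet:
    """One sigmoid hidden layer and a softmax output, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One row of weights per unit; the last value in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, xb))))
             for row in self.w1]
        hb = h + [1.0]
        p = softmax([sum(w * v for w, v in zip(row, hb)) for row in self.w2])
        return xb, hb, p

    def loss(self, xs, ys):
        """Mean cross-entropy over the given samples."""
        return -sum(math.log(self.forward(x)[2][y]) for x, y in zip(xs, ys)) / len(xs)

    def train(self, xs, ys, epochs=2000, lr=0.5):
        n = len(xs)
        for _ in range(epochs):
            g1 = [[0.0] * len(row) for row in self.w1]
            g2 = [[0.0] * len(row) for row in self.w2]
            for x, y in zip(xs, ys):
                xb, hb, p = self.forward(x)
                # Output error for softmax with cross-entropy: p - one_hot(y).
                d_out = [p[k] - (1.0 if k == y else 0.0) for k in range(len(p))]
                for k, row in enumerate(g2):
                    for j in range(len(row)):
                        row[j] += d_out[k] * hb[j]
                # Propagate the error back through the sigmoid hidden layer.
                for j, row in enumerate(g1):
                    back = sum(d_out[k] * self.w2[k][j] for k in range(len(p)))
                    d_hid = back * hb[j] * (1.0 - hb[j])
                    for i in range(len(row)):
                        row[i] += d_hid * xb[i]
            # Full-batch gradient descent step on both layers.
            for weights, grads in ((self.w1, g1), (self.w2, g2)):
                for row, grow in zip(weights, grads):
                    for i in range(len(row)):
                        row[i] -= lr * grow[i] / n

    def predict(self, x):
        p = self.forward(x)[2]
        return p.index(max(p))

labels = [quadrant(u, r) for u, r in ENTRIES]
net = BPNet(n_in=2, n_hidden=4, n_out=4)
loss_before = net.loss(ENTRIES, labels)
net.train(ENTRIES, labels)
loss_after = net.loss(ENTRIES, labels)
print("quadrant labels:", labels)
print("loss before/after training:", loss_before, loss_after)
print("predicted quadrant for (0.85, 0.85):", net.predict((0.85, 0.85)))
```

In this sketch the quadrant rule supplies the training labels and the network then predicts a quadrant for a single entry from its two indicator values. How the four quadrants map to named categories and development strategies is defined in the paper itself and is not reproduced here.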
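Received: 2018-11-01      Published: 2019-08-15
Funding: *This work is one of the research outputs of the National Natural Science Foundation of China General Program project "Research on the Formation Mechanism and Implementation Model of Social Search from a Collaborative Perspective" (Grant No. 71473114).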
Cite this article:
Zhenyu He, Xiangxiang Dong, Qinghua Zhu. Classifying Baidu Encyclopedia Entries with User Behaviors[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 117-122.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.1209      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I6/117