Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (6): 117-122    DOI: 10.11925/infotech.2096-3467.2018.1209
Classifying Baidu Encyclopedia Entries with User Behaviors
Zhenyu He, Xiangxiang Dong, Qinghua Zhu
School of Information Management, Nanjing University, Nanjing 210023, China

[Objective] This paper classifies Baidu Encyclopedia entries based on users' information behaviors, aiming to identify entries with high potential value. [Methods] We chose usage and recognition levels as indicators and proposed a new entry classification model based on the Boston matrix and a BP neural network. [Results] We classified Baidu Encyclopedia entries automatically with the usage indicators and created development strategies for each category. Our new model correctly identified each entry's category information. [Limitations] More research is needed on newly generated entries and on features that are difficult to quantify. [Conclusions] This research proposed an effective method for automatically classifying online encyclopedia entries.
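
As a rough illustration of the Boston-matrix step named in the abstract, the sketch below places entries in a 2x2 grid by usage and recognition. The median cut points, the quadrant labels, the function names, and the sample scores are all assumptions made for illustration; the abstract does not state how the authors set thresholds or how the BP neural network consumes these categories.

```python
from statistics import median


def boston_quadrant(usage, recognition, usage_cut, recognition_cut):
    """Place one entry in a 2x2 Boston-style matrix (labels are hypothetical)."""
    high_usage = usage >= usage_cut
    high_recognition = recognition >= recognition_cut
    if high_usage and high_recognition:
        return "high usage / high recognition"
    if high_usage:
        return "high usage / low recognition"
    if high_recognition:
        return "low usage / high recognition"
    return "low usage / low recognition"


def classify_entries(entries):
    """Map entry name -> quadrant.

    entries: dict of name -> (usage, recognition).
    Cut points are the medians of each indicator (an assumption,
    not the thresholds used in the paper).
    """
    usage_cut = median(u for u, _ in entries.values())
    recognition_cut = median(r for _, r in entries.values())
    return {
        name: boston_quadrant(u, r, usage_cut, recognition_cut)
        for name, (u, r) in entries.items()
    }


# Made-up scores for four hypothetical entries.
sample = {"a": (10, 1), "b": (1, 10), "c": (10, 10), "d": (1, 1)}
labels = classify_entries(sample)
```

Labels produced this way could, in principle, serve as training targets for a supervised classifier such as a back-propagation network, although the abstract does not describe that pipeline in detail.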

Key words: Baidu Encyclopedia Entry; Boston Matrix; BP Neural Network
Received: 01 November 2018      Published: 15 August 2019

Cite this article:

Zhenyu He, Xiangxiang Dong, Qinghua Zhu. Classifying Baidu Encyclopedia Entries with User Behaviors. Data Analysis and Knowledge Discovery, 2019, 3(6): 117-122.

