New Technology of Library and Information Service  2015, Vol. 31 Issue (2): 78-84    DOI: 10.11925/infotech.1003-3513.2015.02.11
A Parallel Naive Bayesian Network Public Opinion Fast Classification Algorithm Based on Hadoop Platform
Ma Bin1,2,3, Yin Lifeng1
1. Department of Information Science and Technology, Shandong University of Political Science and Law, Ji'nan 250014, China;
2. Key Laboratory of Forensic Evidence in Shandong Province, Ji'nan 250014, China;
3. School of Electrical Engineering, Shandong University, Ji'nan 250061, China
Abstract  

[Objective] This paper proposes a Network Public Opinion (NPO) classification method based on a parallel Naive Bayesian Classification Algorithm (NBCA) in the Hadoop environment. [Context] NPO data are high-volume, widely distributed, and high-variety information assets, which makes accurate and fast classification difficult to achieve. [Methods] Exploiting the distributed storage and parallel processing features of the Hadoop platform, the NBCA is encapsulated for parallel execution: NPO documents are stored locally under HDFS and classified in parallel by MapReduce jobs. [Results] Tests of the MapReduce-encapsulated parallel NBCA show that its execution efficiency improves by 82% compared with the centralized method and that its classification accuracy exceeds 85%. [Conclusions] The proposed algorithm effectively improves the efficiency and capability of NPO classification.
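
The map-and-reduce structure described under [Methods] can be illustrated with a minimal Python sketch. It is not the authors' implementation, which runs on Hadoop and is not shown on this page. It assumes a multinomial Naive Bayes model with Laplace smoothing, whitespace tokenization, and an in-process `map`/`reduce` standing in for Hadoop MapReduce; all function names and the sample data are hypothetical.

```python
import math
from collections import Counter
from functools import reduce

def map_counts(split):
    """Map step (hypothetical): count classes and (class, word) pairs in one split."""
    class_counts, word_counts = Counter(), Counter()
    for label, text in split:
        class_counts[label] += 1
        for word in text.split():
            word_counts[(label, word)] += 1
    return class_counts, word_counts

def reduce_counts(a, b):
    """Reduce step (hypothetical): merge the partial counts of two splits."""
    return a[0] + b[0], a[1] + b[1]

def train(splits):
    """Map over every split, then reduce the partial counts to global counts."""
    return reduce(reduce_counts, map(map_counts, splits))

def classify(text, class_counts, word_counts):
    """Return the class with the highest log posterior, using Laplace smoothing."""
    vocab = {w for (_, w) in word_counts}
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        n_words = sum(c for (l, _), c in word_counts.items() if l == label)
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.split():
            # Smoothed log likelihood; unseen words count as zero occurrences.
            score += math.log((word_counts[(label, word)] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up data: each inner list plays the role of one locally stored split.
splits = [
    [("sports", "team wins match"), ("finance", "stock market falls")],
    [("sports", "player scores goal"), ("finance", "bank raises rates")],
]
class_counts, word_counts = train(splits)
print(classify("team scores goal", class_counts, word_counts))
```

Because the per-split counts are simply added together, each split can be processed independently, which is the property that makes Naive Bayes training straightforward to distribute.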

Key words: Network Public Opinion; Hadoop; MapReduce; Naive Bayes; Classification
Received: 27 June 2014      Published: 17 March 2015
CLC Number: TP391.1

Cite this article:

Ma Bin, Yin Lifeng. A Parallel Naive Bayesian Network Public Opinion Fast Classification Algorithm Based on Hadoop Platform. New Technology of Library and Information Service, 2015, 31(2): 78-84.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.02.11     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I2/78

