New Technology of Library and Information Service  2015, Vol. 31 Issue (2): 78-84    DOI: 10.11925/infotech.1003-3513.2015.02.11
A Parallel Naive Bayesian Network Public Opinion Fast Classification Algorithm Based on Hadoop Platform
Ma Bin1,2,3, Yin Lifeng1
1. Department of Information Science and Technology, Shandong University of Political Science and Law, Ji'nan 250014, China;
2. Key Laboratory of Forensic Evidence in Shandong Province, Ji'nan 250014, China;
3. School of Electrical Engineering, Shandong University, Ji'nan 250061, China

[Objective] This paper proposes a Network Public Opinion (NPO) classification method based on a parallel Naive Bayesian Classification Algorithm (NBCA) running in a Hadoop environment. [Context] NPO data are high-volume, highly distributed, and high-variety information assets, which makes accurate and fast classification difficult to achieve. [Methods] Exploiting the distributed storage and parallel processing features of the Hadoop platform, the NBCA is encapsulated for parallel execution: the NPO documents are stored locally under HDFS and classified in parallel by MapReduce jobs. [Results] The performance of the MapReduce-packaged parallel NBCA was tested; the results show that its execution efficiency improves by 82% compared with the centralized method and that its classification accuracy exceeds 85%. [Conclusions] The proposed algorithm effectively improves the efficiency and capability of NPO classification.
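
The [Methods] step can be illustrated with a minimal sketch. This is not the authors' implementation: it simulates the map and reduce phases in a single Python process rather than on Hadoop, and the function names (`map_counts`, `reduce_counts`, `train`, `classify`), the whitespace tokenization, and the Laplace (add-one) smoothing are all assumptions made for illustration. Real NPO text in Chinese would need a word segmenter before this step.

```python
import math
from collections import Counter, defaultdict
from functools import reduce


def map_counts(doc):
    """Map phase (hypothetical): turn one labelled document into word counts."""
    label, text = doc
    return label, Counter(text.split())  # assumption: whitespace tokenization


def reduce_counts(acc, item):
    """Reduce phase (hypothetical): merge per-document counts per class."""
    label, word_counts = item
    acc["docs"][label] += 1
    acc["words"][label].update(word_counts)
    return acc


def train(docs):
    """Run map then reduce over all documents to build the count model."""
    init = {"docs": Counter(), "words": defaultdict(Counter)}
    return reduce(reduce_counts, map(map_counts, docs), init)


def classify(model, text):
    """Pick the class with the highest log posterior (add-one smoothing)."""
    vocab = {w for counts in model["words"].values() for w in counts}
    total_docs = sum(model["docs"].values())
    best_label, best_score = None, -math.inf
    for label, n_docs in model["docs"].items():
        counts = model["words"][label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.split():
            score += math.log((counts[word] + 1) / denom)  # log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this example only.
docs = [
    ("sports", "team wins match"),
    ("sports", "match score goal"),
    ("finance", "stock market falls"),
    ("finance", "bank stock price"),
]
model = train(docs)
```

Because the map step touches one document at a time and the reduce step only adds counts, the same logic could in principle be split across HDFS blocks and merged afterwards, which is the property that makes Naive Bayes a natural fit for MapReduce.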

Key words: Network Public Opinion; Hadoop; MapReduce; Naive Bayes; Classification
Received: 27 June 2014      Published: 17 March 2015
CLC number: TP391.1

Cite this article:

Ma Bin, Yin Lifeng. A Parallel Naive Bayesian Network Public Opinion Fast Classification Algorithm Based on Hadoop Platform. New Technology of Library and Information Service, 2015, 31(2): 78-84.


