A Parallel Naive Bayesian Network Public Opinion Fast Classification Algorithm Based on Hadoop Platform
Ma Bin1,2,3, Yin Lifeng1
1. Department of Information Science and Technology, Shandong University of Political Science and Law, Ji'nan 250014, China;
2. Key Laboratory of Forensic Evidence in Shandong Province, Ji'nan 250014, China;
3. School of Electrical Engineering, Shandong University, Ji'nan 250061, China
Abstract [Objective] This paper proposes a Network Public Opinion (NPO) classification method based on a parallel Naive Bayesian Classification Algorithm (NBCA) in a Hadoop environment. [Context] NPO data are high-volume, widely distributed, and highly varied information assets, which makes accurate and fast classification difficult to achieve. [Methods] To exploit the distributed storage and parallel processing features of the Hadoop platform, the NBCA is encapsulated for parallel execution: NPO documents are stored locally under the HDFS framework and classified in parallel by MapReduce processes. [Results] Tests of the MapReduce-packaged parallel NBCA show that the execution efficiency of the proposed algorithm improves by 82% compared with the centralized method, and that its classification accuracy exceeds 85%. [Conclusions] The proposed algorithm can effectively improve the efficiency and capability of NPO classification.
Received: 27 June 2014
Published: 17 March 2015
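
The [Methods] step of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial Naive Bayes model with Laplace smoothing and whitespace tokenization, and it simulates the map and reduce phases in a single Python process instead of running on Hadoop. The function names (`map_counts`, `reduce_counts`, `train`, `classify`), the reserved `"__doc__"` key, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import chain


def map_counts(label, text):
    """Map step: emit one document count and one count per word occurrence."""
    yield ("__doc__", label), 1  # "__doc__" is a reserved key (assumption)
    for word in text.lower().split():
        yield (label, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def train(shards):
    """Map every shard independently, then reduce the combined output."""
    mapped = chain.from_iterable(
        map_counts(label, text) for shard in shards for label, text in shard
    )
    doc_counts = {}
    word_counts = defaultdict(Counter)
    for (first, second), n in reduce_counts(mapped).items():
        if first == "__doc__":
            doc_counts[second] = n
        else:
            word_counts[first][second] = n
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior for the text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Each inner list stands in for the documents held on one storage node.
shards = [
    [("sports", "team wins match"), ("finance", "stock market falls")],
    [("sports", "player scores goal"), ("finance", "bank raises rates")],
]
model = train(shards)
print(classify("team scores goal", model))
```

Because the map step only emits counts, each shard can be processed without seeing the others, which is the property that lets the counting be distributed across nodes. Classification of new documents is independent per document in the same way.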