New Technology of Library and Information Service  2012, Vol. Issue (11): 65-71    DOI: 10.11925/infotech.1003-3513.2012.11.11
Focused Crawling for Network Public Opinion’s Topic Information
Huang Wei1,2, Jin Yabo1, Hu Changlong1
1. School of Management, Hubei University of Technology, Wuhan 430068, China;
2. School of Management, Wuhan University of Technology, Wuhan 430070, China
Abstract  The problem of unfocused network public opinion is becoming increasingly serious. By analyzing the features and evolution of network group events, this article proposes a focused crawler for network public opinion based on a content topic selection strategy that incorporates time and spatial dimension factors. Experimental results show that the crawler has higher execution efficiency and also achieves good focusing ability, providing focused resources for processing network public opinion on group events.
Key words: Network group events; Network public opinion; Focused crawler; Domain ontology; Focused factor
Received: 05 November 2012      Published: 06 February 2013
CLC Number: G353.1
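
The abstract names a content topic selection strategy with time and spatial dimension factors but does not give its formula. The following is a minimal sketch of how such factors could weight a topic score in a best-first crawl frontier. Everything in it is an assumption made for illustration, not the authors' method: the term-overlap relevance (a stand-in for a domain ontology match), the seven-day half-life, the 0.5 penalty for pages from another region, and the names `focus_score` and `Frontier`.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Candidate:
    priority: float                    # negated score, so heapq pops the best first
    url: str = field(compare=False)    # excluded from ordering


def topic_relevance(text, topic_terms):
    """Fraction of topic terms found in the text (stand-in for an ontology match)."""
    words = set(text.lower().split())
    hits = sum(1 for term in topic_terms if term.lower() in words)
    return hits / len(topic_terms) if topic_terms else 0.0


def time_factor(age_days, half_life_days=7.0):
    """Assumed exponential decay: weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def space_factor(page_region, event_region):
    """Assumed spatial weight: full for the event's region, halved otherwise."""
    return 1.0 if page_region == event_region else 0.5


def focus_score(text, topic_terms, age_days, page_region, event_region):
    """Hypothetical combined score: relevance scaled by time and space factors."""
    return (topic_relevance(text, topic_terms)
            * time_factor(age_days)
            * space_factor(page_region, event_region))


class Frontier:
    """Best-first crawl frontier: the highest-scoring unseen URL is returned next."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        # Ignore URLs that were already queued, whatever their new score.
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, Candidate(-score, url))

    def pop(self):
        return heapq.heappop(self._heap).url

    def __len__(self):
        return len(self._heap)
```

In this sketch a crawler would score each extracted link with `focus_score`, push it onto the `Frontier`, and always fetch the result of `pop()` next, so recent pages from the event's region that match the topic are visited first.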

Cite this article:

Huang Wei, Jin Yabo, Hu Changlong. Focused Crawling for Network Public Opinion’s Topic Information. New Technology of Library and Information Service, 2012, (11): 65-71.

