New Technology of Library and Information Service  2010, Vol. 26 Issue (1): 3-8    DOI: 10.11925/infotech.1003-3513.2010.01.02
Research on Focused Merchandise Information Crawling Based on Semantic Crawler
Huang Wei1,2   Zhang Liyi1
1(Center for Studies of Information Resources, Wuhan University, Wuhan  430072, China)
2(School of Management, Hubei University of Technology, Wuhan 430068, China)

This article proposes a method for gathering merchandise information with a focused crawler that integrates Web topic link analysis with semantic analysis of topic content. Through statistical learning of the Ontology during crawling, the reference domain-specific Ontology is optimized continuously. Experimental results show that, compared with other conventional crawling algorithms, this method is more effective, as it can prevent topic drift and achieve a higher topic harvest rate.
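
The abstract does not give formulas, so the following is only a minimal sketch of how a link-analysis score and an Ontology-based content score might be combined into a crawl priority, and how the harvest rate is typically computed (relevant pages divided by pages crawled). The function names, the mixing weight `alpha`, the learning `rate`, and the frequency-based weight update are all illustrative assumptions, not the authors' method.

```python
from typing import Dict, Iterable


def content_relevance(terms: Iterable[str], concept_weights: Dict[str, float]) -> float:
    """Average ontology-concept weight over a page's terms (assumed measure)."""
    terms = list(terms)
    if not terms:
        return 0.0
    return sum(concept_weights.get(t, 0.0) for t in terms) / len(terms)


def link_priority(link_score: float, content_score: float, alpha: float = 0.5) -> float:
    """Linear mix of link-analysis and content scores; alpha is hypothetical."""
    return alpha * link_score + (1.0 - alpha) * content_score


def update_weights(concept_weights: Dict[str, float],
                   relevant_page_terms: Iterable[str],
                   rate: float = 0.1) -> Dict[str, float]:
    """Hypothetical statistical update: move each concept weight toward the
    concept's observed term frequency in a page judged relevant."""
    terms = list(relevant_page_terms)
    if not terms:
        return dict(concept_weights)
    updated = {}
    for concept, weight in concept_weights.items():
        freq = terms.count(concept) / len(terms)
        updated[concept] = (1.0 - rate) * weight + rate * freq
    return updated


def harvest_rate(relevant_pages: int, crawled_pages: int) -> float:
    """Fraction of crawled pages that were on topic."""
    return relevant_pages / crawled_pages if crawled_pages else 0.0
```

In a crawler loop, unvisited links would be kept in a priority queue ordered by `link_priority`, and `update_weights` would be called as relevant pages are found, so that the concept weights adapt while crawling.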

Key words: Focused crawler; Merchandise information; Semantic; Topic link analysis; Ontology learning
Received: 21 December 2009      Published: 25 January 2001


Corresponding Author: Wei Huang
About authors: Huang Wei, Zhang Liyi

Cite this article:

Huang Wei, Zhang Liyi. Research on Focused Merchandise Information Crawling Based on Semantic Crawler. New Technology of Library and Information Service, 2010, 26(1): 3-8.



[1] Peng Guan, Yuefen Wang, Zhu Fu. Analyzing Topic Semantic Evolution with LDA: Case Study of Lithium Ion Batteries[J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 61-72.
[2] Zhu Fu, Yuefen Wang, Xuhui Ding. Semantic Representation of Design Process Knowledge Reuse[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 21-29.
[3] Junliang Yao, Xiaoqiu Le. Semantic Matching for Sci-Tech Novelty Retrieval[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 50-56.
[4] Xiangdong Li, Fan Gao, Youhai Li. Categorizing Documents Automatically within Common Semantic Space[J]. Data Analysis and Knowledge Discovery, 2018, 2(9): 66-73.
[5] Dan Wu, Liuxing Lu. Semantic Changes of Queries from Cross-device Searching[J]. Data Analysis and Knowledge Discovery, 2018, 2(8): 69-78.
[6] Huihui Tang, Hao Wang, Zixuan Zhang, Xueying Wang. Extracting Names of Historical Events Based on Chinese Character Tags[J]. Data Analysis and Knowledge Discovery, 2018, 2(7): 89-100.
[7] Xueying Wang, Hao Wang, Zixuan Zhang. Recognizing Semantics of Continuous Strings in Chinese Patent Documents[J]. Data Analysis and Knowledge Discovery, 2018, 2(5): 11-22.
[8] Jiaqi Wang, Junsheng Zhang, Xiaodong Qiao. Analyzing Representation and Semantic Links of Scientific Research Events[J]. Data Analysis and Knowledge Discovery, 2018, 2(5): 32-39.
[9] Yuefen Wang, Zhu Fu, Peng Wu. Tech-Framework for Semantic Knowledge Management in Conceptual Design[J]. Data Analysis and Knowledge Discovery, 2018, 2(2): 2-10.
[10] Zhu Fu, Yuxing Jiang, Yuefen Wang. Modeling Conceptual Design Process for Dynamic Knowledge Management and Reuse[J]. Data Analysis and Knowledge Discovery, 2018, 2(2): 20-28.
[11] Erjing Chen, Enbo Jiang. Review of Studies on Text Similarity Measures[J]. Data Analysis and Knowledge Discovery, 2017, 1(6): 1-11.
[12] Rujiang Bai, Fuhai Leng, Junhua Liao. An Improved Cosine Text Similarity Computing Method Based on Semantic Chunk Feature[J]. Data Analysis and Knowledge Discovery, 2017, 1(6): 56-64.
[13] Zixuan Wang, Xiaoqiu Le, Yuanbiao He. Recognizing Core Topic Sentences with Improved TextRank Algorithm Based on WMD Semantic Similarity[J]. Data Analysis and Knowledge Discovery, 2017, 1(4): 1-8.
[14] Jiawang Cui, Chunwang Li. Identifying Semantic Relations of Clusters Based on Linked Data[J]. Data Analysis and Knowledge Discovery, 2017, 1(4): 57-66.
[15] Jin Zeng, Wei Lu, Heng Ding, Haihua Chen. Modeling User’s Interests Based on Image Semantics[J]. Data Analysis and Knowledge Discovery, 2017, 1(4): 76-83.