New Technology of Library and Information Service  2010, Vol. 26 Issue (1): 3-8    DOI: 10.11925/infotech.1003-3513.2010.01.02
Research on Focused Merchandise Information Crawling Based on Semantic Crawler
Huang Wei1,2   Zhang Liyi1
1(Center for Studies of Information Resources, Wuhan University, Wuhan  430072, China)
2(School of Management, Hubei University of Technology, Wuhan 430068, China)
Abstract  

This article proposes a method for gathering merchandise information with a focused crawler that integrates Web topic link analysis with semantic analysis of topic content. Through statistical ontology learning during the crawl, the reference domain-specific ontology is optimized continuously. Experimental results show that, compared with other conventional crawling algorithms, this method is more effective: it prevents topic drift and achieves a higher topic harvest rate.
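The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical in-memory toy web (`TOY_WEB`), a flat dictionary of weighted terms standing in for the domain ontology, a linear mix `alpha` of link (anchor text) relevance and content relevance, and a simple count-based weight update standing in for ontology learning. Harvest rate is taken here as relevant pages divided by pages visited.

```python
import heapq
from collections import Counter

# Hypothetical toy "web": url -> (page text, [(anchor text, target url), ...]).
TOY_WEB = {
    "shop/home": ("laptop phone camera deals price",
                  [("laptop price list", "shop/laptops"),
                   ("sports news", "news/sports")]),
    "shop/laptops": ("laptop price brand memory screen",
                     [("camera price", "shop/cameras")]),
    "shop/cameras": ("camera lens price brand", []),
    "news/sports": ("football match score league",
                    [("league table", "news/league")]),
    "news/league": ("league table football season", []),
}


def relevance(text, weights):
    """Weighted share of the words in `text` that are ontology terms (0..1)."""
    words = text.split()
    if not words:
        return 0.0
    top = max(weights.values())
    return sum(weights.get(w, 0.0) for w in words) / (top * len(words))


def learn(weights, text, rate=0.1):
    """Assumed learning rule: boost known terms seen on a relevant page."""
    for word, count in Counter(text.split()).items():
        if word in weights:
            weights[word] += rate * count


def crawl(seed, weights, alpha=0.5, threshold=0.3, budget=10):
    """Best-first crawl; returns (visit order, harvest rate)."""
    weights = dict(weights)      # keep the caller's ontology unchanged
    frontier = [(-1.0, seed)]    # min-heap, so priorities are negated
    seen = {seed}
    visited, relevant = [], []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        visited.append(url)
        content_score = relevance(text, weights)
        if content_score >= threshold:
            relevant.append(url)
            learn(weights, text)  # refine term weights during the crawl
        for anchor, target in links:
            if target in seen or target not in TOY_WEB:
                continue
            seen.add(target)
            link_score = relevance(anchor, weights)
            # Combine link analysis with the parent's content analysis.
            priority = alpha * link_score + (1 - alpha) * content_score
            heapq.heappush(frontier, (-priority, target))
    harvest_rate = len(relevant) / len(visited) if visited else 0.0
    return visited, harvest_rate


ontology = {"laptop": 1.0, "camera": 1.0, "price": 1.0, "brand": 1.0}
order, rate = crawl("shop/home", ontology, budget=3)
print(order, rate)
```

On this toy graph with a budget of three pages, the sketch visits only the three shop pages before any news page, which is the behaviour a focused crawler aims for when it avoids topic drift. Raising the budget lets it reach the off-topic pages and lowers the harvest rate.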

Key words: Focused crawler; Merchandise information; Semantic; Topic link analysis; Ontology learning
Received: 21 December 2009      Published: 25 January 2010
CLC Number: TP393

 
Corresponding Authors: Wei Huang     E-mail: tonny_hw@163.com
About authors: Huang Wei, Zhang Liyi

Cite this article:

Huang Wei, Zhang Liyi. Research on Focused Merchandise Information Crawling Based on Semantic Crawler. New Technology of Library and Information Service, 2010, 26(1): 3-8.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.01.02     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I1/3

