New Technology of Library and Information Service  2010, Vol. 26 Issue (1): 3-8     https://doi.org/10.11925/infotech.1003-3513.2010.01.02
Section: Special Topic
Research on Focused Merchandise Information Crawling Based on Semantic Crawler
Huang Wei (1, 2), Zhang Liyi (1)
1. Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
2. School of Management, Hubei University of Technology, Wuhan 430068, China
Abstract

This article proposes a focused-crawler method for gathering merchandise information that integrates Web topic link analysis with semantic analysis of topic content. Through statistical learning over the ontology during crawling, the domain-specific ontology used as the topic reference is optimized continuously. Experimental results show that the method is more effective than other conventional crawling algorithms: it prevents topic drift and achieves a higher topic harvest rate.

Key words: Focused crawler; Merchandise information; Semantic; Topic link analysis; Ontology learning
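
The abstract's combination of link evidence and content relevance can be illustrated with a minimal best-first crawl over an in-memory toy graph. This is only a sketch under stated assumptions, not the authors' method: the weighted-term scorer stands in for the semantic analysis, the blending factor `alpha`, the relevance `threshold`, and the names `content_score`, `link_priority`, `crawl`, `WEB`, and `WEIGHTS` are all hypothetical, and the ontology-learning step is not modelled. Harvest rate is taken here as relevant pages divided by pages fetched.

```python
import heapq

def content_score(text, topic_weights):
    """Average topic-term weight per word, capped at 1.0 (assumed stand-in
    for semantic content analysis)."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(topic_weights.get(w, 0.0) for w in words)
    return min(1.0, hits / len(words))

def link_priority(parent_score, anchor_text, topic_weights, alpha=0.5):
    """Blend the parent page's relevance (link evidence) with the
    relevance of the anchor text; alpha is an assumed mixing factor."""
    anchor = content_score(anchor_text, topic_weights)
    return alpha * parent_score + (1 - alpha) * anchor

def crawl(web, seeds, topic_weights, threshold=0.2, max_pages=10):
    """Best-first crawl of a toy graph; returns the harvest rate."""
    # heapq is a min-heap, so priorities are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched = relevant = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web[url]
        fetched += 1
        score = content_score(text, topic_weights)
        if score >= threshold:
            relevant += 1
        for anchor, target in links:
            if target in web and target not in seen:
                seen.add(target)
                priority = link_priority(score, anchor, topic_weights)
                heapq.heappush(frontier, (-priority, target))
    return relevant / fetched if fetched else 0.0

# Hypothetical toy data: url -> (page text, [(anchor text, target url)]).
WEB = {
    "shop": ("laptop price laptop review",
             [("laptop deals", "deals"), ("celebrity news", "news")]),
    "deals": ("laptop price discount", []),
    "news": ("celebrity gossip today", []),
}
WEIGHTS = {"laptop": 1.0, "price": 0.8}

harvest_rate = crawl(WEB, ["shop"], WEIGHTS, max_pages=2)
```

With a budget of two pages, the crawler fetches the on-topic "deals" page before the off-topic "news" page, which is the behaviour that limits topic drift in this toy setting.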
Received: 2009-12-21      Published: 2001-01-25
CLC number: TP393

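Funding:

*This work is one of the research outcomes of the Major Project of Key Research Bases for Humanities and Social Sciences of the Ministry of Education, "Cross-platform Retrieval and Information Reorganization of Business Information in E-commerce" (Grant No. 07JJD870220), and the Humanities and Social Sciences Project of the Hubei Provincial Department of Education, "Semantic Management of Business Information Resources under the Web Data Crisis" (Grant No. 2009b228).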

Corresponding author: Huang Wei     E-mail: tonny_hw@163.com
About the authors: Huang Wei, Zhang Liyi
Cite this article:
Huang Wei, Zhang Liyi. Research on Focused Merchandise Information Crawling Based on Semantic Crawler. New Technology of Library and Information Service, 2010, 26(1): 3-8.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2010.01.02      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2010/V26/I1/3

