The Algorithm of Forecasting URL-Topic Based on Web Structure and Web Page Contents
Liu Hong, Shao Xiaoliang, Hu Jibing
(Network Information Center, Second Military Medical University, Shanghai 200433, China)
Abstract: This paper introduces the Forecast URL-Topic Algorithm, a core algorithm of the Web topic information gathering system we designed. Drawing on related theories and an analysis of experimental data, we find that the topic of a hyperlink is determined mainly by three factors: the topic similarity of the parent Web page, the topic similarity of the (extended) anchor text, and the structural characteristics of the Web graph. On this basis we propose an algorithm for forecasting URL topics based on Web structure and Web page contents. The system evaluation results show that the algorithm is highly efficient.
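The abstract names the three factors but does not state how they are combined. The sketch below illustrates one possible approach and makes several assumptions. It uses a weighted linear combination with hypothetical weights. It measures topic similarity with bag-of-words cosine similarity. It approximates the Web-graph factor with a hypothetical "share of known parent pages already judged on-topic" score. None of these choices come from the paper. The function names `forecast_url_topic` and `structure_score` are illustrative only.

```python
from collections import Counter
import heapq
import math


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies using naive whitespace tokenization."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def structure_score(in_links: list[str], relevant_pages: set[str]) -> float:
    """Hypothetical Web-graph factor: fraction of known parent pages judged on-topic."""
    if not in_links:
        return 0.0
    return sum(1 for url in in_links if url in relevant_pages) / len(in_links)


def forecast_url_topic(topic: str, parent_text: str, anchor_text: str,
                       in_links: list[str], relevant_pages: set[str],
                       weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Score an unvisited URL's likely topic relevance from three factors.

    The weights are hypothetical; the paper's actual combination is not given here.
    """
    w_parent, w_anchor, w_struct = weights
    topic_vec = term_vector(topic)
    return (w_parent * cosine(topic_vec, term_vector(parent_text))
            + w_anchor * cosine(topic_vec, term_vector(anchor_text))
            + w_struct * structure_score(in_links, relevant_pages))


# Toy usage: rank a small crawl frontier (max-priority via negated scores).
topic = "military medical training"
relevant = {"http://a.example/train"}
candidates = {
    "http://a.example/course": ("military training course schedule",
                                "medical training course",
                                ["http://a.example/train"]),
    "http://b.example/sports": ("football scores today",
                                "latest sports news",
                                ["http://b.example/home"]),
}
frontier = []
for url, (parent, anchor, links) in candidates.items():
    score = forecast_url_topic(topic, parent, anchor, links, relevant)
    heapq.heappush(frontier, (-score, url))

best_score, best_url = heapq.heappop(frontier)
print(best_url, round(-best_score, 3))
```

A topic-focused crawler can order its frontier by this score and fetch the most promising URLs first. In the toy data, the course page outranks the sports page because all three factors favor it. In a real system the weights would have to be tuned against labeled crawl data.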
Received: 31 December 2004
Published: 25 May 2005
|
Corresponding author:
Liu Hong
E-mail: llhhyybb@163.com