New Technology of Library and Information Service, 2005, Vol. 21, Issue (5): 41-45. DOI: 10.11925/infotech.1003-3513.2005.05.10
The Algorithm of Forecasting URL-Topic Based on Web Structure and Web Page Contents
Liu Hong, Shao Xiaoliang, Hu Jibing
(The Network Information Center of Second Military Medical University, Shanghai 200433, China)
Abstract  

This paper presents the Forecast URL-Topic Algorithm, the core algorithm of a Web topic information gathering system that we designed. Drawing on related theories and an analysis of experimental data, we found that the topic of a hyperlink is determined mainly by three factors: the topic similarity of the parent Web page, the topic similarity of the (extended) anchor text, and the structural characteristics of the Web graph. On this basis we propose an algorithm for forecasting the topic of a URL from Web structure and Web page contents. The system evaluation results show that the algorithm is highly efficient.
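The abstract names the three factors but does not say how they are combined. As a minimal sketch only, assuming a linear weighted combination, bag-of-words cosine similarity against a list of topic terms, and a structural factor taken as the mean topic relevance of already-gathered pages that link to the candidate URL, a forecast score for an unvisited URL could be computed as below. The function names, the weights (0.4, 0.4, 0.2) and the structural measure are hypothetical illustrations, not the authors' formulation.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def term_vector(text: str) -> Counter:
    # Naive bag-of-words: lowercase and split on whitespace (an assumption;
    # Chinese pages would need a proper word segmenter instead).
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two term-frequency vectors; 0.0 if either is empty.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def forecast_url_topic(
    parent_text: str,
    anchor_text: str,
    inlink_relevances: Sequence[float],
    topic_terms: Sequence[str],
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),
) -> float:
    """Hypothetical forecast score for an unvisited URL (not the paper's formula).

    Combines the three factors named in the abstract with assumed linear weights:
    parent-page similarity, (extended) anchor-text similarity, and a structural
    factor, here the mean topic relevance of already-gathered pages linking to the URL.
    """
    w_parent, w_anchor, w_struct = weights
    topic = Counter(t.lower() for t in topic_terms)
    s_parent = cosine_similarity(term_vector(parent_text), topic)
    s_anchor = cosine_similarity(term_vector(anchor_text), topic)
    # Assumed structural factor: average relevance of known in-linking pages.
    s_struct = sum(inlink_relevances) / len(inlink_relevances) if inlink_relevances else 0.0
    return w_parent * s_parent + w_anchor * s_anchor + w_struct * s_struct


if __name__ == "__main__":
    score = forecast_url_topic(
        parent_text="Military Medical University news",
        anchor_text="medical training",
        inlink_relevances=[1.0, 0.0],
        topic_terms=["medical", "military"],
    )
    print(f"{score:.4f}")  # prints 0.5828
```

A gathering system built this way could keep its URL frontier ordered by this score and fetch the highest-scoring URLs first; the weights would have to be tuned against labelled crawl data, which this sketch does not attempt.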

Key words: Web structure; Hyperlink; Topic; Forecast; Algorithm
Received: 31 December 2004; Published: 25 May 2005
CLC number: TP391
Corresponding author: Liu Hong. E-mail: llhhyybb@163.com
About the authors: Liu Hong, Shao Xiaoliang, Hu Jibing

Cite this article:

Liu Hong, Shao Xiaoliang, Hu Jibing. The Algorithm of Forecasting URL-Topic Based on Web Structure and Web Page Contents. New Technology of Library and Information Service, 2005, 21(5): 41-45.

URL: http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2005.05.10 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2005/V21/I5/41

1Jon M. KleinbergAuthoritative Sources in a Hyperlinked EnvironmentTarjan RE, Baecker T, eds. Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms. New Orleans: ACM Press, 1997:668-677
2Andrei Broder, Ravi Kumar, Farzin Maghoul etcGraph structure in the Web: Experiments and models.9th World Wide Web Conference, 2000
3Charu C. Aggarwal, Fatima Al-Garawi and Philip S. YuIntelligent Crawling on the World Wide Web with Arbitrary Predicates".WWW10, May 2-5, 2001, Hong Kong ACM 1-58113-348-0/01/0005
4Andrei Broder, Ravi Kumar, Farzin Maghoul etcGraph structure in the Web: Experiments and models. In 9th World Wide Web Conference, 2000
5Golub GH, Van Loan CFMatrix Computations, London, Johns Hopkins University Press, 1989:40-45
6Jon Kleinberg and Steve LawrenceThe Structure of the WebS C I E N C E'S COMPA S S, www.sciencemag.org, SCIENCE VOL 294 30 NOVEMBER 2001
7李培,赵麟网上证券金融信息采集系统的研究现代图书情报技术2001(6):56-59
8李勇,桑艳艳网络文本数据分类技术与实现算法情报学报,2002(1):21-26
9李盛韬,余智华,程学旗,白硕Web信息采集研究进展计算机科学,2003(2):151-157,171
10王晓宇,周傲英万维网的链接结构分析及其应用综述软件学报,2003,14(10):1768-1780
11刘红利用扩展锚点文本来分类网页计算机应用研究,2004,21(3):112-113,124
12刘红在军训网中构建基于Web的主题信息采集系统硕士毕业论文,2004(7)

[1] Ruihua Qi, Junyi Zhou, Xu Guo, Caihong Liu. Extracting Book Review Topics with Knowledge Base[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 83-91.
[2] Mengji Zhang, Wanyu Du, Nan Zheng. Predicting Stock Trends Based on News Events[J]. Data Analysis and Knowledge Discovery, 2019, 3(5): 11-18.
[3] Bengong Yu, Yangnan Chen, Ying Yang. Classifying Short Text Complaints with nBD-SVM Model[J]. Data Analysis and Knowledge Discovery, 2019, 3(5): 77-85.
[4] Xiaolan Wu, Chengzhi Zhang. Analysis of Knowledge Flow Based on Academic Social Networks: A Case Study of ScienceNet.cn[J]. Data Analysis and Knowledge Discovery, 2019, 3(4): 107-116.
[5] Jiang Wu, Guanjun Liu, Xian Hu. An Overview of Online Medical and Health Research: Hot Topics, Theme Evolution and Research Content[J]. Data Analysis and Knowledge Discovery, 2019, 3(4): 2-12.
[6] Lu An, Yanping Liang. Selection of Users’ Behaviors Towards Different Topics of Microblog on Public Health Emergencies[J]. Data Analysis and Knowledge Discovery, 2019, 3(4): 33-41.
[7] Tingxin Wen, Yangzi Li, Jingshuang Sun. News Hotspots Discovery Method Based on Multi Factor Feature Selection and AFOA/K-means[J]. Data Analysis and Knowledge Discovery, 2019, 3(4): 97-106.
[8] Guangshang Gao. A Survey of User Profiles Methods[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 25-35.
[9] Peiyao Zhang, Dongsu Liu. Topic Evolutionary Analysis of Short Text Based on Word Vector and BTM[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 95-101.
[10] Linna Xi, Yongxiang Dou. Examining Reposts of Micro-bloggers with Planned Behavior Theory[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 13-20.
[11] Hongqinling Wang, Zhichao Ba, Gang Li. Conversational Topic Intensity Calculation and Evolution Analysis of WeChat Group[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 33-42.
[12] Jie Zhang, Junbo Zhao, Dongsheng Zhai, Ningning Sun. Patent Technology Analysis of Microalgae Biofuel Industrial Chain Based on Topic Model[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 52-64.
[13] Yanshuang Mei, Hengmin Zhu, Jing Wei. A Study on the Mechanism of Media Collaboration on the Spread of Internet Public Opinion[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 65-71.
[14] Junwan Liu, Zhixin Long, Feifei Wang. Finding Collaboration Opportunities from Emerging Issues with LDA Topic Model and Link Prediction[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 104-117.
[15] Guijun Yang, Xue Xu, Fuqiang Zhao. Predicting User Ratings with XGBoost Algorithm[J]. Data Analysis and Knowledge Discovery, 2019, 3(1): 118-126.
Copyright © 2016 Data Analysis and Knowledge Discovery. Tel/Fax: (010) 82626611-6626, 82624938. E-mail: jishu@mail.las.ac.cn