on the Specific Topic on Web
Ding Yi
(Department of Computer Science and Technology, Hubei Normal University, Huangshi 435000, China)
Abstract Information retrieval (IR) on the Web is the automatic retrieval of all relevant documents, that is, finding the Web resources a user intends, while at the same time retrieving as few non-relevant documents as possible. Web IR is currently a very active area. It concentrates on applying traditional text IR methods to the Internet and on exploiting the properties of the Web graph. This research focuses on how to collect relevant Web pages and content effectively and with broad coverage, how to filter Web pages, and how to assign proper labels to them. Accurately finding user-specific information on the Web is very difficult. Traditional Web search engines take a query as input and produce a set of (hopefully) relevant pages that match the query terms. While useful in many circumstances, search engines have the disadvantage that users must formulate queries that specify their information need, a step that is prone to errors. Based on a discussion of PageRank, HITS, and similarity between Web texts, new algorithms called RG-HITS (Resemblance Graph-HITS) for finding relevant documents on the Web are introduced.
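For orientation, the hub and authority iteration of standard HITS, which the abstract names as one basis of RG-HITS, can be sketched as follows. This is a minimal illustration only: the abstract does not specify how RG-HITS builds or weights its resemblance graph, so none of that is shown, and the function name `hits`, the `links` dictionary layout, and the `iterations` parameter are assumptions made for this example.

```python
from math import sqrt


def hits(links, iterations=50):
    """Plain HITS on a small link graph (illustrative sketch, not RG-HITS).

    links: dict mapping each page to the list of pages it links to.
    Returns (hub, authority) score dicts, each normalised to unit length.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub update: sum the authority scores of pages that are linked to.
        new_hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return hub, auth


if __name__ == "__main__":
    # Hypothetical three-page graph: pages "a" and "b" both link to "c".
    hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
    print(sorted(auth.items()))
    print(sorted(hub.items()))
```

In this toy graph the page that receives both links ends up with the highest authority score, and the two linking pages share the hub weight equally. A text-similarity weighting, as the abstract suggests for RG-HITS, would presumably enter through the edge contributions, but that is a guess rather than something stated here.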
Received: 08 February 2005
Published: 25 June 2005
Corresponding Authors:
Ding Yi
E-mail: a_carrie@sina.com
About author: Ding Yi