New Technology of Library and Information Service  2008, Vol. 24 Issue (7): 43-46    DOI: 10.11925/infotech.1003-3513.2008.07.09
The Study on the Duplicated Web Pages Detection Algorithm Based on the Keyword from User’s Submission
Xie Hui   Qin Jie   Hu Shuangshuang
(College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China)
Abstract  

Building on a study of feature-code-based algorithms for detecting duplicated Web pages, this paper proposes a duplicate detection algorithm for meta search engines that is based on the keywords submitted by the user. The main steps of the algorithm are introduced, and an experiment is used to test the algorithm and verify its validity.

Key words: Duplicate detection; Meta search; Feature code; Chinese word segmentation
Received: 27 March 2008      Published: 25 July 2008
CLC Number: TP285

 
Corresponding Authors: Xie Hui     E-mail: xiehui0517@163.com
About authors: Xie Hui, Qin Jie, Hu Shuangshuang

Cite this article:

Xie Hui, Qin Jie, Hu Shuangshuang. The Study on the Duplicated Web Pages Detection Algorithm Based on the Keyword from User’s Submission. New Technology of Library and Information Service, 2008, 24(7): 43-46.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.07.09     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I7/43

