[1] Yi L, Liu B. Web Page Cleaning for Web Mining Through Feature Weighting[C]. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico. 2003:9-15.
[2] Li J, Ezeife C I. Cleaning Web Pages for Effective Web Content Mining[C]. In: Proceedings of the 17th International Conference. 2006:560-571.
[3] Laender A, Ribeiro-Neto B, Da Silva A S, et al. A Brief Survey of Web Data Extraction Tools[J]. ACM SIGMOD Record, 2002, 31(2):84-93.
[4] Adelberg B. NoDoSE - A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents[J]. ACM SIGMOD Record, 1998, 27(2):283-294.
[5] Baumgartner R, Flesca S, Gottlob G. Visual Web Information Extraction with Lixto[C]. In: Proceedings of the 27th International Conference on Very Large Data Bases. 2001:119-128.
[6] Chidlovskii B, Ragetli J, De Rijke M. Automatic Wrapper Generation for Web Search Engines[C]. In: Proceedings of the 1st International Conference on Web-Age Information Management, Shanghai, China. London, UK: Springer-Verlag, 2005:66-75.
[7] Crescenzi V, Mecca G, Merialdo P. RoadRunner: Towards Automatic Data Extraction from Large Web Sites[C]. In: Proceedings of the 27th International Conference on Very Large Data Bases. 2001:109-118.
[8] DOM Specification[EB/OL]. [2008-09-03]. http://www.w3.org/DOM/DOMTR.
[9] Debnath S, Mitra P, Giles C L. Automatic Extraction of Informative Blocks from Webpages[C]. In: Proceedings of the 2005 ACM Symposium on Applied Computing, Santa Fe, New Mexico. 2005:1722-1726.
[10] 骆思安, 徐俊杰. Applying the MMB Algorithm to Clean Web Page Noise and Extract Web Pages (in Chinese)[EB/OL]. [2009-06-25]. http://ccnet.km.nccu.edu.tw/xms/read_attach.php?id=129.
[11] Zhao H K, Meng W Y, Wu Z H, et al. Fully Automatic Wrapper Generation for Search Engines[C]. In: Proceedings of the 14th International Conference on World Wide Web, Chiba, Japan. ACM Press, 2005:66-75.
[12] Song R, Liu H, Wen J R, et al. Learning Important Models for Web Page Blocks Based on Layout and Content Analysis[J]. ACM SIGKDD Explorations, 2005, 6(2):14-23.
[13] Buttler D, Liu L, Pu C. A Fully Automated Object Extraction System for the World Wide Web[EB/OL]. [2008-11-10]. http://ieeexplore.ieee.org/iel5/7339/19871/00918966.pdf?arnumber=918966.
[14] Yi L, Liu B, Xiao L. Eliminating Noisy Information in Web Pages for Data Mining[C]. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, USA. 2003:296-305.
[15] Debnath S, Mitra P, Giles C L. Identifying Content Blocks from Web Documents[J]. Lecture Notes in Computer Science, 2005(3488):285-293.