[1] Shivakumar N, Garcia-Molina H. SCAM: A Copy Detection Mechanism for Digital Documents[C]. In:Proceedings of 2nd International Conference in Theory and Practice of Digital Libraries. Austin, Texas, 1995.
[2] Yan T, Garcia-Molina H. The Sift Information Dissemination System[J]. ACM Trans. on Database Systems, 1999,24(4):529-565.
[3] Kirriemuir J W, Willett P. Identification of Duplicate and Near–duplicate Full-text Records in Database Search Outputs Using Hierarchic Cluster Analysis[J]. Program, 1995, 29(3):241-256.
[4] Buckley C, Carrie C, Mardis S, et al. The Smart/empire TIPSTER IR System[C]. In:Proceedings of TIPSTER Phase 3 Workshop. San Francisco: Morgan Kaufmann Publishers, 1999:107-121.
[5] 张文涛.www上一种Meta-Search Engine的研究与实现[D]. 北京:清华大学,2002.
[6] 张刚,刘挺,郑实福,等. 大规模网页快速去重算法[C]. 中国中文信息学学会二十周年学术会论文集(续集), 2001(11):18-25.
[7] Ukkonen E. On-line Construction of Suffix Trees[J]. Algorithmica, 1995, 14(3):249-260.
[8] Chang W I, Lawler E L. Sublinear Expected Time Approximate String Matching and Biological Applications[J]. Algorithmica, 1994,12(4):327-344.
[9] Yan T W, Garcia-Molina H. Duplicate Removal in Information Dissemination[C]. In:Proceedings of the 21st International Conference on Very Large Data Bases. Zurich, Switzerland, 1995.
[10] 吴平博,陈群秀,马亮. 基于特征串的大规模中文网页快速去重算法研究[J]. 中文信息学报,2003,17(2):28-35.
[11] 南京大学信息技术开发研究所. 江苏法院网络舆情分析系统[EB/OL].(2007-10-08). [2007-11-08]. http://218.94.26.134.
[12] Weiner P. Linear Pattern Matching Algorithms[C]. In: Proceedings of the 14th IEEE Annual Symposium on Switching and Automata Theory, 1973:1-11.
[13] McCreight E. A Space-economical Suffix Tree Construction Algorithm[J]. Journal of the ACM, 1976, 23(2):262-272.
[14] Gusfield D. Algorithms on Strings, Trees, and Sequence: Computer Science and Computational Biology[M]. New York: Cambridge University Press,1997:87-207.
[15] Drozdek A. Data Structures and Algorithms in Java[M].2nd edition. Beijing: China Machine Press, 2006:707-712.
[16] Zhou Meili. Some Concepts and Mathematical Consideration of Similarity System Theory[J]. Journal of System Science and System Engineering, 1992, 1(1):84-92.
[17] Hirschberg D S. A Linear Space Algorithm for Computing Maximal Common Subsequences[J]. Communications of the ACM, 1975, 18(6):341-343. |