To address the growing problem of academic plagiarism, this paper proposes a text copy detection algorithm based on a text structure tree. A paper is divided into a structure tree with three layers: the root node is the text, each branch node represents a sentence bag, and each leaf node denotes a sentence. Sentence similarity is computed from a synthetic similarity and a function, and the similarity of a leaf node is based on the maximal sentence similarity. The similarity at each upper layer is in turn derived from the similarity of the adjacent lower layer. Finally, papers from the China Journal Full-Text Database are chosen for testing, and the experimental results show that the algorithm is feasible and efficient.
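The three-layer scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not give the synthetic similarity or its accompanying function, so this sketch substitutes a Jaccard overlap of whitespace tokens as a stand-in; it also assumes that a branch or root node takes the plain average of its children's similarities, which the abstract does not state. All names here (`SentenceBag`, `Text`, `sentence_similarity`, `leaf_similarity`, `bag_similarity`, `text_similarity`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


def sentence_similarity(a: str, b: str) -> float:
    """Stand-in measure (assumption): Jaccard overlap of lowercase tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


@dataclass
class SentenceBag:
    """Branch node: a group of sentences (leaf nodes)."""
    sentences: List[str]


@dataclass
class Text:
    """Root node: a text made of sentence bags."""
    bags: List[SentenceBag]

    def all_sentences(self) -> List[str]:
        return [s for bag in self.bags for s in bag.sentences]


def leaf_similarity(sentence: str, source: Text) -> float:
    """Leaf layer: maximal similarity against every sentence of the source."""
    return max(
        (sentence_similarity(sentence, other) for other in source.all_sentences()),
        default=0.0,
    )


def bag_similarity(bag: SentenceBag, source: Text) -> float:
    """Branch layer: derived from the leaves below (averaging is an assumption)."""
    if not bag.sentences:
        return 0.0
    return sum(leaf_similarity(s, source) for s in bag.sentences) / len(bag.sentences)


def text_similarity(suspect: Text, source: Text) -> float:
    """Root layer: derived from the branches below (averaging is an assumption)."""
    if not suspect.bags:
        return 0.0
    return sum(bag_similarity(b, source) for b in suspect.bags) / len(suspect.bags)
```

For example, `text_similarity(suspect, source)` scores a suspect `Text` against a source `Text` by propagating leaf scores upward through the sentence bags. For Chinese text, the whitespace tokenizer would need to be replaced by a word segmenter.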
Wang Sen (王森), Wang Yu (王宇). Algorithm of the Text Copy Detection Based on Text Structure Tree [J]. New Technology of Library and Information Service (现代图书情报技术), 2009(10): 50-55.