This paper proposes a new document copy detection algorithm based on sentence similarity. To improve detection accuracy, the algorithm considers not only the document as a whole but also the structure of the document. Finally, experiments compare the new algorithm with other typical algorithms, and the results show that it is feasible.
秦新国. 基于句子相似度的文档复制检测算法研究[J]. 现代图书情报技术, 2007, 2(11): 63-66.
Qin Xinguo. Research on the Copy Detection Based on the Similarity of Sentences[J]. New Technology of Library and Information Service, 2007, 2(11): 63-66.