New Technology of Library and Information Service  2014, Vol. 30 Issue (7): 114-119    DOI: 10.11925/infotech.1003-3513.2014.07.16
Applying Bilingual Lexicons to Detect Correspondences in English-Chinese Cross-lingual Plagiarism Documents
Qin Ying
Department of Computer Science, Beijing Foreign Studies University, Beijing 100089, China

[Objective] This paper studies translation correspondence in English-Chinese cross-lingual plagiarism documents. [Methods] Similarity is analyzed using bilingual lexicons. To improve the precision and efficiency of recognizing corresponding words, the study merges and sorts several bilingual lexicons. To handle ambiguity and multiple matches, the paper proposes a method that uses word distribution and matching location to select the proper translation items. Similarities between sentences and between paragraphs are defined on stratified complex features such as word-matching category and word position. [Results] Experiments on real translation documents show that retrieval precision and recall reach 0.841 and 0.748, respectively. [Limitations] Out-of-vocabulary (OOV) correspondences are still hard to judge with lexicons. [Conclusions] Cross-lingual similarity detection based on bilingual lexicons is easy to implement and has a wide range of applications.
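The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the general idea of lexicon-based matching, not the author's method. It merges several English-to-Chinese lexicons, orders each entry's translations longest first (an assumed sorting criterion), and scores a sentence pair by the share of English tokens whose translation is found at a not-yet-used position in the Chinese sentence. The function names `merge_lexicons`, `find_free`, and `sentence_similarity`, the toy lexicons, and the scoring formula are all hypothetical; the paper's stratified features (matching category, word position) are not modeled here.

```python
from collections import defaultdict


def merge_lexicons(*lexicons):
    """Merge English->Chinese lexicons; translations sorted longest first (assumed criterion)."""
    merged = defaultdict(set)
    for lex in lexicons:
        for en, zh_list in lex.items():
            merged[en.lower()].update(zh_list)
    return {en: sorted(zh, key=lambda w: (-len(w), w)) for en, zh in merged.items()}


def find_free(zh_sentence, cand, used):
    """Return the first position of cand whose characters are all unused, else -1."""
    pos = zh_sentence.find(cand)
    while pos != -1:
        if not any(i in used for i in range(pos, pos + len(cand))):
            return pos
        pos = zh_sentence.find(cand, pos + 1)
    return -1


def sentence_similarity(en_tokens, zh_sentence, lexicon):
    """Fraction of English tokens matched to a free span of the Chinese sentence."""
    if not en_tokens:
        return 0.0
    used = set()      # character positions already claimed by an earlier match
    matched = 0
    for tok in en_tokens:
        for cand in lexicon.get(tok.lower(), []):
            if not cand:
                continue
            pos = find_free(zh_sentence, cand, used)
            if pos != -1:
                used.update(range(pos, pos + len(cand)))
                matched += 1
                break  # take the first candidate that fits; others are ignored
    return matched / len(en_tokens)


# Toy data, invented for illustration only.
lex_a = {"bank": ["银行"], "river": ["河"]}
lex_b = {"bank": ["岸", "银行"], "money": ["钱"]}
lexicon = merge_lexicons(lex_a, lex_b)

print(sentence_similarity(["bank", "money"], "他去银行取钱", lexicon))  # 1.0
print(sentence_similarity(["river", "bank"], "河岸很美", lexicon))      # 1.0
print(sentence_similarity(["bank", "bank"], "银行", lexicon))           # 0.5
```

In this sketch the ambiguous word "bank" resolves to 银行 or 岸 depending on which translation actually occurs in the Chinese sentence, and tracking used positions prevents two English tokens from claiming the same Chinese span. A word missing from every lexicon simply fails to match, which mirrors the OOV limitation noted in the abstract.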

Key words: Cross-lingual plagiarism, Similarity, Ambiguity, Bilingual lexicon, OOV
Received: 27 February 2014      Published: 20 October 2014
CLC Number: TP18

Cite this article:

Qin Ying. Applying Bilingual Lexicons to Detect Correspondences in English-Chinese Cross-lingual Plagiarism Documents. New Technology of Library and Information Service, 2014, 30(7): 114-119.


