Applying Bilingual Lexicons to Detect Correspondences in English-Chinese Cross-lingual Plagiarism Documents

New Technology of Library and Information Service (现代图书情报技术), 2014, Vol. 30, Issue 7: 114-119

https://doi.org/10.11925/infotech.1003-3513.2014.07.16

Application Practice


Qin Ying

Department of Computer Science, Beijing Foreign Studies University, Beijing 100089, China


Abstract:

[Objective] To retrieve translation-corresponding content in English-Chinese cross-lingual plagiarism documents. [Methods] Similarity analysis is performed with bilingual lexicons. Several lexicons are merged and sorted to improve the precision and efficiency of word-level matching. Overall word-frequency distribution and matching-position features are used to resolve ambiguity and multiple matches. Sentence and paragraph similarities are computed as a weighted combination of features such as the type of word correspondence and word position. [Results] Experiments on real translation corpora show a retrieval precision of 0.841 and a recall of 0.748. [Limitations] Translation relations for out-of-vocabulary (OOV) words are hard to determine from lexicons. [Conclusions] The lexicon-based approach to retrieving cross-lingual similar content is simple to implement and widely applicable.

Key words: Cross-lingual plagiarism; Similarity; Ambiguity; Bilingual lexicon; OOV
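
To make the idea in the abstract concrete, the following is a minimal Python sketch of lexicon-based sentence matching. It is not the paper's implementation: the toy `LEXICON`, the function names, the nearest-relative-position rule for choosing among multiple matches, and the weights `w_match` and `w_pos` are all assumptions made for illustration. English words missing from the lexicon (OOV) are simply skipped, which mirrors the limitation noted above.

```python
from typing import Dict, List, Optional, Tuple

# Toy bilingual lexicon (hypothetical entries, for illustration only).
LEXICON: Dict[str, List[str]] = {
    "bilingual": ["双语"],
    "dictionary": ["词典", "字典"],
    "plagiarism": ["剽窃", "抄袭"],
    "detect": ["检测", "发现"],
}


def best_match(rel_pos: float, translations: List[str],
               zh: str) -> Optional[Tuple[str, float]]:
    """Pick the translation occurrence in `zh` closest to `rel_pos`.

    `rel_pos` is the English word's relative position in [0, 1].
    Returns (translation, position distance in [0, 1]), or None when
    no candidate translation occurs in the Chinese sentence.
    """
    best: Optional[Tuple[str, float]] = None
    for t in translations:
        start = zh.find(t)
        while start != -1:
            dist = abs(start / max(len(zh) - 1, 1) - rel_pos)
            if best is None or dist < best[1]:
                best = (t, dist)
            start = zh.find(t, start + 1)
    return best


def sentence_similarity(en_tokens: List[str], zh: str,
                        w_match: float = 0.7, w_pos: float = 0.3) -> float:
    """Weighted score in [0, 1], averaged over English tokens in the lexicon."""
    known = [(i, tok.lower()) for i, tok in enumerate(en_tokens)
             if tok.lower() in LEXICON]
    if not known or not zh:
        return 0.0
    score = 0.0
    for i, tok in known:
        rel = i / max(len(en_tokens) - 1, 1)
        match = best_match(rel, LEXICON[tok], zh)
        if match is not None:
            # Reward the match itself, plus a bonus for positional agreement.
            score += w_match + w_pos * (1.0 - match[1])
    return score / len(known)


en = ["Detect", "plagiarism", "with", "a", "bilingual", "dictionary"]
related = sentence_similarity(en, "利用双语词典检测剽窃")
unrelated = sentence_similarity(en, "今天天气很好")
```

Here `related` scores well above `unrelated`, because every lexicon word in the English sentence has a translation in the first Chinese sentence and none in the second. A paragraph-level score could be built by aggregating such sentence scores, but the weighting used in the paper is not given on this page.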

Received: 2014-02-27  Published: 2014-10-20

Chinese Library Classification (CLC): TP18

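Funding:

This work is one of the research outputs of the university-level special research fund project "Research and Implementation of Automatic Evaluation of Student Translations Based on Parallel Corpora" (No. 2009JJ056) and the National Education Science Planning project "Research and Implementation of a Computer-Aided Transliteration System" (No. GPA115033).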

Corresponding author: Qin Ying, E-mail: qinying@bfsu.edu.cn

Cite this article:

Qin Ying. Applying Bilingual Lexicons to Detect Correspondences in English-Chinese Cross-lingual Plagiarism Documents[J]. New Technology of Library and Information Service, 2014, 30(7): 114-119.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.07.16 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I7/114

