New Technology of Library and Information Service, 2015, Vol. 31, Issue 1: 31-37. https://doi.org/10.11925/infotech.1003-3513.2015.01.05
Research Paper
Authorship Identification in English Translations of Chinese Classics
Qi Ruihua¹, Huo Yuehong², Guo Xu¹, Liu Caihong¹
1. Computer Education Department, Dalian University of Foreign Languages, Dalian 116044, China;
2. School of English Studies, Dalian University of Foreign Languages, Dalian 116044, China
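Abstract (Chinese version, translated)

[Objective] To analyze the key issues in authorship identification of English translations of Chinese classics and to propose an effective method for authorship identification on incomplete data. [Methods] To address the short length of classical poetry texts and the imbalance of the corpus, a vector space model of stylistic features at the lexical, sentence and discourse levels is built, and a Weighted Naïve Credal Classifier is proposed for authorship identification on incomplete data. [Results] The Weighted Naïve Credal Classifier effectively improves on the performance of the Naïve Credal Classifier, and comparison experiments against current mainstream classification algorithms show that it achieves good overall performance on incomplete data sets. [Limitations] The number of samples and authors in the data set needs to be expanded, and the efficiency of stylistic feature extraction and the accuracy of authorship identification on large data sets need to be improved. [Conclusions] The proposed multi-level stylistic feature model and Weighted Naïve Credal Classifier show good accuracy and applicability on collections of English translations of classical Chinese poetry.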
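The abstract describes mapping each translated poem to a vector of stylistic features drawn from the lexical, sentence and discourse levels. This page does not list the features the authors actually used, so the following is only a minimal sketch of the shape of such a multi-level vector. Every feature here (type-token ratio, mean word length, rate of a small illustrative function-word list, mean sentence length, line count and words per line) and the name `stylistic_vector` are hypothetical stand-ins.

```python
import re

# Hypothetical function-word list, for illustration only; it is not the paper's feature set.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "in", "to", "on", "with", "my", "i"}


def stylistic_vector(text: str) -> list[float]:
    """Map one translated poem to a small lexical/sentence/discourse feature vector."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?;]+", text) if s.strip()]
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not words:
        return [0.0] * 6

    # Lexical level: vocabulary richness, word length, function-word usage.
    ttr = len(set(words)) / len(words)
    mean_word_len = sum(len(w) for w in words) / len(words)
    func_rate = sum(w in FUNCTION_WORDS for w in words) / len(words)

    # Sentence level: average number of words per sentence.
    mean_sent_len = len(words) / max(len(sentences), 1)

    # Discourse level: overall layout of the poem.
    n_lines = float(len(lines))
    words_per_line = len(words) / max(len(lines), 1)

    return [ttr, mean_word_len, func_rate, mean_sent_len, n_lines, words_per_line]
```

For example, `stylistic_vector("The moon is bright.\nI think of home.")` returns `[1.0, 3.375, 0.375, 4.0, 2.0, 4.0]`. Ratio-style features are used because poems differ greatly in length, so raw counts would mostly measure text size rather than style.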

Abstract

[Objective] This paper analyzes the key issues in authorship identification of English translations of Chinese classics and proposes an effective method for identifying authorship on incomplete data. [Methods] A stylistic feature vector space model for poetry translation texts is built from features at the vocabulary, sentence and discourse levels. To address the short texts and class imbalance of the poetry corpus, the Weighted Naïve Credal Classifier is proposed. [Results] Comparative experiments verify the effectiveness of the Weighted Naïve Credal Classifier. [Limitations] The size of the data set and the number of authors should be further expanded, and the efficiency and accuracy of authorship identification on large data sets remain to be improved. [Conclusions] The proposed method shows good accuracy and applicability on collections of poetry translations.
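The Naïve Credal Classifier (Zaffalon, 2002) replaces the point probabilities of naive Bayes with sets of probabilities, here Imprecise Dirichlet Model intervals with hyperparameter `s`, and may return several candidate classes when the data are too sparse to decide. The sketch below is a simplified, unweighted illustration, not the authors' Weighted Naïve Credal Classifier: this page does not describe the weighting scheme, features are assumed to be already discretized, missing values (`None`) are simply skipped, and the exact credal-dominance test is replaced by interval dominance on product bounds, which is more cautious (it may keep more candidate classes). The names `train_counts`, `score_bounds` and `credal_predict` and the default `s=1.0` are hypothetical.

```python
from collections import Counter, defaultdict
from math import prod


def train_counts(X, y):
    """Collect the counts an NCC-style classifier needs from discretized rows.
    None marks a missing feature value and is skipped (a simplifying assumption)."""
    class_n = Counter(y)       # n(c): training rows per class
    feat_n = defaultdict(int)  # non-missing count of feature i within class c
    joint = defaultdict(int)   # n(feature i = v, class c)
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            if v is not None:
                feat_n[(i, c)] += 1
                joint[(i, v, c)] += 1
    return class_n, feat_n, joint


def score_bounds(x, c, model, s=1.0):
    """Lower/upper bounds on the unnormalized score P(c) * prod_i P(x_i | c),
    taking each factor's Imprecise Dirichlet Model interval separately."""
    class_n, feat_n, joint = model
    total = sum(class_n.values())
    lo = [class_n[c] / (total + s)]
    hi = [(class_n[c] + s) / (total + s)]
    for i, v in enumerate(x):
        if v is None:  # missing test value: this factor is dropped
            continue
        m = feat_n[(i, c)]
        n = joint[(i, v, c)]
        lo.append(n / (m + s))
        hi.append((n + s) / (m + s))
    return prod(lo), prod(hi)


def credal_predict(x, model, s=1.0):
    """Return every class that no other class interval-dominates.
    One class is a determinate answer; several mean the data cannot separate them."""
    bounds = {c: score_bounds(x, c, model, s) for c in model[0]}
    return {
        c
        for c in bounds
        if not any(bounds[d][0] > bounds[c][1] for d in bounds if d != c)
    }
```

For example, after `model = train_counts([(0, 1), (0, 1), (0, 0), (1, 0), (1, 0), (1, None)], ["A", "A", "A", "B", "B", "B"])`, the call `credal_predict((0, 1), model)` returns `{'A'}`, while the all-missing query `credal_predict((None, None), model)` returns `{'A', 'B'}`: on incomplete input the classifier reports indeterminacy instead of forcing a guess.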

Keywords: English translation of Chinese classics; Authorship identification; Incomplete data
Received: 2014-05-15; Published: 2015-02-12
CLC number: TP393
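Funding:

This work is one of the outcomes of the Ministry of Education Humanities and Social Sciences Research Planning Youth Fund Project "Research on Authorship Identification of Online Information Based on Multi-level Feature Analysis" (Project No. 11YJCZH131), the Program for Liaoning Excellent Talents in University (Project No. WJQ2013017), and the Dalian University of Foreign Languages general research project "Online Public Opinion Information Mining Based on Linguistic Features".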

Corresponding author: Qi Ruihua, ORCID: 0000-0002-2583-3055, E-mail: rhqi@dlufl.edu.cn
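Author contributions: Qi Ruihua proposed the research idea, designed the research scheme, drafted the paper and revised the final version; Huo Yuehong designed the research scheme and investigated, collected and analyzed the data; Guo Xu cleaned and analyzed the data; Liu Caihong revised the final version of the paper.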
Cite this article:
Qi Ruihua, Huo Yuehong, Guo Xu, Liu Caihong. Authorship Identification in English Translations of Chinese Classics. New Technology of Library and Information Service, 2015, 31(1): 31-37.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.01.05 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I1/31


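Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal code: 100190
Tel/Fax: (010) 82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn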