New Technology of Library and Information Service  2015, Vol. 31 Issue (1): 31-37    DOI: 10.11925/infotech.1003-3513.2015.01.05
Authorship Identification in English Translations of Chinese Classics
Qi Ruihua1, Huo Yuehong2, Guo Xu1, Liu Caihong1
1. Computer Education Department, Dalian University of Foreign Languages, Dalian 116044, China;
2. School of English Studies, Dalian University of Foreign Languages, Dalian 116044, China
Abstract  

[Objective] This paper analyzes the key issues in authorship identification for English translations of Chinese classics and proposes an effective way to identify authorship from incomplete data. [Methods] A stylistic feature vector space model for poetry translation texts is built from stylistic features at the vocabulary, sentence and discourse levels. To address the imbalance of the poetry corpus, a Weighted Naïve Credal Classifier is proposed. [Results] Comparative experiments verify the effectiveness of the Weighted Naïve Credal Classifier. [Limitations] The size of the data set and the number of authors should be further expanded so that the efficiency and accuracy of authorship identification on large data sets can be improved. [Conclusions] The proposed method achieves good accuracy and applicability on poetry translation collections.
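
As a rough illustration of what a three-level stylistic feature vector might look like, the following Python sketch computes a handful of simple measures for one translated poem. The specific features (type-token ratio, average word length, average sentence length, line count, stanza count) and the function name `stylistic_features` are assumptions chosen for illustration; the abstract does not list the features actually used, and this sketch does not implement the Weighted Naïve Credal Classifier.

```python
import re


def stylistic_features(text):
    """Return a small, illustrative stylistic feature vector for one poem.

    The feature choices are hypothetical; they only mirror the three
    levels (vocabulary, sentence, discourse) named in the abstract.
    """
    # Discourse level: non-empty lines and blank-line-separated stanzas.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    stanzas = [s for s in re.split(r"\n\s*\n", text) if s.strip()]

    # Sentence level: split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Vocabulary level: lowercase word tokens.
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words)

    return {
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "line_count": float(len(lines)),
        "stanza_count": float(len(stanzas)),
    }


# Made-up sample text, not taken from the paper's corpus.
poem = "The moon is bright.\nThe moon is cold.\n\nI think of home."
features = stylistic_features(poem)
print(features)
```

Each poem would yield one such vector, and a collection of vectors labelled by translator could then be passed to a classifier.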

Key words: English translation of Chinese classics; Authorship identification; Incomplete data
Received: 15 May 2014      Published: 12 February 2015
CLC Number: TP393

Cite this article:

Qi Ruihua, Huo Yuehong, Guo Xu, Liu Caihong. Authorship Identification in English Translations of Chinese Classics. New Technology of Library and Information Service, 2015, 31(1): 31-37.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.01.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I1/31

