New Technology of Library and Information Service  2013, Vol. 29 Issue (9): 48-53    DOI: 10.11925/infotech.1003-3513.2013.09.08
Authorship Identification of Chinese UGC Based on Stylistics
Lv Yingjie¹, Fan Jing², Liu Jingfang³
1. School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China;
2. International Business School, Beijing Foreign Studies University, Beijing 100089, China;
3. Antai College of Economics and Management, Shanghai Jiaotong University, Shanghai 200052, China
Abstract  The openness and virtuality of online information networks make authorship identification difficult. This paper therefore proposes a stylistics-based approach to identifying the authors of Chinese user-generated content (UGC). The authors combine four types of features (lexical, syntactic, structural and content-specific) into a set of writing-style features and then apply text classification techniques to identify the author. Experimental results demonstrate that the proposed approach can identify the authors of Chinese UGC efficiently.
Key words: Stylistics; UGC; Authorship identification
Received: 18 April 2013      Published: 27 September 2013
CLC Number: TP391
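The abstract does not list the individual features or name the classifier, so what follows is only a minimal sketch of the general pipeline it describes: four feature families are concatenated into one writing-style vector per post, and a classifier assigns each post to a known author. Everything specific here is an assumption for illustration: the function-word, punctuation and keyword lists (`FUNCTION_WORDS`, `PUNCTUATION`, `CONTENT_WORDS`), the hypothetical helpers `style_vector` and `NearestCentroid`, the use of single characters instead of segmented words (to avoid a word-segmentation dependency), and a standardised nearest-centroid classifier used as a simple stand-in for whatever text classification technique the paper actually applies. It uses the Python standard library only.

```python
import math
import re
from collections import defaultdict

# Hypothetical feature inventories for illustration only; the paper's
# actual feature lists are not given in its abstract.
FUNCTION_WORDS = ["的", "了", "是", "在", "和", "也", "就", "都", "吗", "呢"]
PUNCTUATION = ["，", "。", "！", "？", "；", "：", "…", "~"]
CONTENT_WORDS = ["价格", "质量", "服务", "物流", "推荐"]


def style_vector(text):
    """Map one post to a fixed-length vector built from four feature families."""
    chars = [c for c in text if not c.isspace()]  # characters stand in for words
    n = max(len(chars), 1)
    sentences = [s for s in re.split(r"[。！？!?]+", text) if s.strip()]
    paragraphs = [p for p in text.split("\n") if p.strip()]
    # Lexical: log post length, character type-token ratio, share of ASCII letters/digits
    lexical = [
        math.log(1 + len(chars)),
        len(set(chars)) / n,
        sum(c.isascii() and c.isalnum() for c in chars) / n,
    ]
    # Syntactic: relative frequencies of function words and punctuation marks
    syntactic = [text.count(w) / n for w in FUNCTION_WORDS]
    syntactic += [text.count(p) / n for p in PUNCTUATION]
    # Structural: paragraph count and mean sentence length in characters
    structural = [len(paragraphs), len(chars) / max(len(sentences), 1)]
    # Content-specific: relative frequencies of topic keywords
    content = [text.count(w) / n for w in CONTENT_WORDS]
    return lexical + syntactic + structural + content


class NearestCentroid:
    """Assign a post to the author whose mean standardised style vector is closest."""

    def fit(self, texts, authors):
        vectors = [style_vector(t) for t in texts]
        dims = len(vectors[0])
        self.mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
        # Population standard deviation; constant features get 1.0 to avoid dividing by zero
        self.std = [
            math.sqrt(sum((v[i] - self.mean[i]) ** 2 for v in vectors) / len(vectors)) or 1.0
            for i in range(dims)
        ]
        groups = defaultdict(list)
        for v, author in zip(vectors, authors):
            groups[author].append(self._scale(v))
        self.centroids = {a: [sum(col) / len(col) for col in zip(*vs)] for a, vs in groups.items()}
        return self

    def _scale(self, v):
        # z-score each feature using training-set statistics
        return [(x - m) / s for x, m, s in zip(v, self.mean, self.std)]

    def predict(self, text):
        v = self._scale(style_vector(text))
        # Squared Euclidean distance to each author's centroid; the smallest wins
        return min(
            self.centroids,
            key=lambda a: sum((x - c) ** 2 for x, c in zip(v, self.centroids[a])),
        )


# Toy usage with hypothetical author IDs and made-up posts
train_texts = [
    "太好了！！物流超快~推荐！",
    "质量不错！！价格也OK~推荐！",
    "这款产品的做工比较扎实，包装也很完整，总体来说我对这次购物的体验是满意的。",
    "客服的回复速度很快，解决问题的态度也很认真，我认为这家店铺的服务值得肯定。",
]
train_authors = ["user_a", "user_a", "user_b", "user_b"]
model = NearestCentroid().fit(train_texts, train_authors)
print(model.predict("发货很快！！包装也好~推荐！"))
print(model.predict("从使用一周的情况来看，这台机器的运行是稳定的，噪音也在可以接受的范围之内。"))
```

Standardising each feature before measuring distance keeps large-valued features such as mean sentence length from swamping small relative frequencies. With only two posts per author, as in this toy example, the keyword features that fire in a single post are noisy; a practical system would need considerably more training text per author.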

 

Cite this article:

Lv Yingjie, Fan Jing, Liu Jingfang. Authorship Identification of Chinese UGC Based on Stylistics. New Technology of Library and Information Service, 2013, 29(9): 48-53.

URL: http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.09.08 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I9/48

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn