[Objective] To investigate the relationship between the first 80 chapters and the last 40 chapters of “A Dream of Red Mansions”. [Methods] Combining quantitative and qualitative methods, the first 40 chapters, the middle 40 chapters, and the last 40 chapters are compared with one another, and the ratio of words unique to each part is calculated. Clustering is then performed separately on function words, word and part-of-speech N-gram models, all content words, and word length, and the similarities among the three parts are computed from their high-frequency words. [Results] The first 80 chapters differ from the last 40 chapters. The first 80 chapters contain fewer long words and are more readable and coherent than the last 40 chapters. The first 80 chapters pay more attention to the description of details, while the last 40 chapters focus more on the description of actions and scenes. [Limitations] Only words and N-gram models are considered; semantic and pragmatic features are not used. [Conclusions] According to these features, the first 80 chapters and the last 40 chapters were not written by the same author.
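The two word-level measurements named in the Methods can be illustrated with a minimal sketch. It assumes each part of the novel is already segmented into a list of word tokens. The function names, the toy token lists, the cutoff `k`, and the choice of cosine similarity are all assumptions made for illustration; the abstract does not state which similarity measure was used.

```python
from collections import Counter
from math import sqrt


def unique_word_ratio(part, others):
    """Share of word types in `part` that occur in none of the `others`."""
    vocab = set(part)
    if not vocab:
        return 0.0
    other_vocab = set().union(*map(set, others))
    return len(vocab - other_vocab) / len(vocab)


def cosine_on_top_words(a, b, k=100):
    """Cosine similarity of relative frequencies over the k most
    frequent words of the two token lists combined (assumed measure)."""
    ca, cb = Counter(a), Counter(b)
    top = [w for w, _ in (ca + cb).most_common(k)]
    va = [ca[w] / len(a) for w in top]
    vb = [cb[w] / len(b) for w in top]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


# Hypothetical toy tokens standing in for the three 40-chapter parts.
first = "she smiled and said that the flowers in the garden were fine".split()
middle = "she laughed and said that the garden was quiet in the rain".split()
last = "he went out and then came back and told them the news".split()

parts = {"first": first, "middle": middle, "last": last}
for name, tokens in parts.items():
    others = [t for n, t in parts.items() if n != name]
    print(name, "unique-word ratio:", round(unique_word_ratio(tokens, others), 3))

print("first vs middle:", round(cosine_on_top_words(first, middle), 3))
print("first vs last:", round(cosine_on_top_words(first, last), 3))
```

On real data, the token lists would come from a word segmenter run over each group of chapters, and the pairwise similarities would feed a clustering step.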
Xiao Tianjiu, Liu Ying. Words and N-gram Models Analysis for “A Dream of Red Mansions” [J]. New Technology of Library and Information Service, 2015, 31(4): 50-57.