Identifying Authorship with Novelty Detection Method
Guo Xu, Qi Ruihua
Research Center for Language Intelligence of Dalian University of Foreign Languages, Dalian 116044, China
Abstract [Objective] This paper proposes a novelty detection method for authorship identification. [Methods] We built an algorithm that combines a one-class SVM or a multivariate Gaussian algorithm with a multi-layer stylistic feature model. We then proposed a threshold selection method based on a tolerance t. [Results] When the total number of sample characters exceeded 500, the accuracy, recall, and F1 values were all above 0.9. Once the number of sample characters reached 2000, the accuracy, recall, and F1 values were 0.978, 0.984, and 0.979, respectively. [Limitations] The model’s performance on short texts needs to be improved. [Conclusions] The proposed method can effectively address novelty detection for authorship identification on long texts.
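The general idea of density-based novelty detection with a tolerance-driven threshold can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not define how the tolerance t is used, so the code assumes t is the fraction of the known author's training samples that may be rejected. The feature vectors are synthetic stand-ins for stylistic features, and the helper names (`fit_gaussian`, `log_density`, `pick_threshold`) are hypothetical.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate mean and covariance from one author's feature vectors."""
    mu = X.mean(axis=0)
    # A small ridge term keeps the covariance invertible.
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, cov

def log_density(X, mu, cov):
    """Log of the multivariate Gaussian density for each row of X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def pick_threshold(train_scores, t):
    """ASSUMPTION: t is the share of training samples allowed to fall
    below the threshold (i.e. be rejected as novel)."""
    return np.quantile(train_scores, t)

rng = np.random.default_rng(0)
known = rng.normal(0.0, 1.0, size=(200, 4))  # synthetic "known author" features
other = rng.normal(4.0, 1.0, size=(50, 4))   # synthetic "unknown author" features

mu, cov = fit_gaussian(known)
threshold = pick_threshold(log_density(known, mu, cov), t=0.05)

# A sample is flagged as novel when its log-density falls below the threshold.
is_novel_known = log_density(known, mu, cov) < threshold
is_novel_other = log_density(other, mu, cov) < threshold
print(is_novel_known.mean(), is_novel_other.mean())
```

Under this assumed reading, a larger t rejects more of the known author's own texts but makes it harder for another author's text to pass as theirs. A one-class SVM could replace the Gaussian scorer while keeping the same thresholding step.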
Received: 01 April 2019
Published: 01 June 2020
Corresponding Authors:
Guo Xu
E-mail: guoxu@dlufl.edu.cn