[Objective] This paper proposes a new method for identifying the gender of Chinese microblog authors using dependency features. [Methods] The study collected public posts from Tencent Microblogs and extracted dependency features, which were analyzed and compared with existing vocabulary, structure, function-word, and part-of-speech features. [Results] In a controlled experiment, the proposed method obtained the highest precision, recall, and F-measure among the feature sets compared. [Limitations] The method still needs to be evaluated on a larger corpus. [Conclusions] Of the feature sets compared, dependency features were the most effective for identifying the gender of microblog authors.
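As an illustration of what a dependency feature could look like, the following minimal Python sketch turns the dependency arcs of one post into relative frequencies of relation labels. It is not the paper's implementation: the abstract does not name a parser or a feature definition, so the `Arc` tuple layout, the function name `relation_frequencies`, the example sentence, and the relation labels (`SBV`, `VOB`, `ATT`, `ADV`) are all assumptions chosen for illustration, and the arcs are assumed to come from some external dependency parser.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A dependency arc: (head word, relation label, dependent word).
# This layout is a hypothetical choice, not taken from the paper.
Arc = Tuple[str, str, str]


def relation_frequencies(arcs: List[Arc]) -> Dict[str, float]:
    """Return the relative frequency of each dependency relation label."""
    counts = Counter(rel for _, rel, _ in arcs)
    total = sum(counts.values())
    if total == 0:
        # A post with no parsed arcs yields an empty feature set.
        return {}
    return {rel: n / total for rel, n in counts.items()}


# Hypothetical parse of one short post; labels are illustrative only.
example_arcs = [
    ("喜欢", "SBV", "我"),
    ("喜欢", "VOB", "电影"),
    ("电影", "ATT", "这部"),
    ("喜欢", "ADV", "很"),
]
features = relation_frequencies(example_arcs)
```

A feature dictionary of this kind could then be combined with vocabulary, function-word, or part-of-speech features and passed to a classifier, which is the type of comparison the abstract describes.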
Qi Ruihua (祁瑞华). Identifying Chinese Microblog Author Gender Based on Dependency[J]. Data Analysis and Knowledge Discovery (数据分析与知识发现), 2017, 1(2): 58-63.