Identifying Chinese Microblog Author Gender Based on Dependency
Qi Ruihua
School of Software, Dalian University of Foreign Languages, Dalian 116044, China

Abstract [Objective] This paper proposes a method for identifying the gender of Chinese microblog authors using dependency features. [Methods] Public posts were collected from Tencent Microblogs, and dependency features were extracted, analyzed, and compared with existing vocabulary, structure, function-word, and part-of-speech tagging features. [Results] In a controlled experiment, the proposed method obtained the highest precision, recall, and F-measure among the feature sets compared. [Limitations] The method still needs to be examined on a larger corpus. [Conclusions] Among the feature sets compared, dependency features were the most effective for identifying the gender of microblog authors.
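
As a rough illustration of what "dependency features" can mean in practice, the sketch below turns the dependency triples of a single post into relative frequencies of relation labels. This is only a minimal sketch under stated assumptions, not the paper's actual feature definition: the toy parse, the relation labels (`SUBJ`, `OBJ`, `ATT`, `ADV`), and the function name `dependency_features` are all hypothetical and do not reflect the output format of any particular parser.

```python
from collections import Counter

# Hypothetical parser output for one post: (head_word, relation, dependent_word).
# The relation labels are illustrative only, not a real parser's tag set.
toy_parse = [
    ("喜欢", "SUBJ", "我"),
    ("喜欢", "OBJ", "电影"),
    ("电影", "ATT", "这部"),
    ("喜欢", "ADV", "很"),
]


def dependency_features(triples):
    """Return the relative frequency of each dependency relation label."""
    counts = Counter(relation for _, relation, _ in triples)
    total = sum(counts.values())
    if total == 0:
        # A post with no parsed relations yields an empty feature vector.
        return {}
    return {relation: n / total for relation, n in counts.items()}


features = dependency_features(toy_parse)
print(features)
```

Feature dictionaries of this kind could then be vectorized and passed to a standard classifier alongside the vocabulary, function-word, and part-of-speech features that the abstract mentions as baselines.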
Received: 06 October 2016
Published: 27 March 2017