[Objective] This paper uses multi-dimensional information about social media users to classify them automatically. [Methods] First, we defined four types of social media users: individual, media, government, and organization. Second, we extracted three groups of features from user profiles: demographic characteristics, naming, and self-description. Third, we built a user classification model based on machine learning algorithms and evaluated its performance on a real Twitter dataset. [Results] Both precision and recall of the proposed model exceeded 83%. The naming, demographic, and self-description features contributed to the classification model in increasing order. [Limitations] The sample size needs to be expanded to allow a better analysis of the characteristics of different user types. [Conclusions] The proposed method can accurately identify the four types of users, which benefits future research on social media user classification.
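The pipeline summarized above can be illustrated with a minimal sketch. It maps a user profile to the three feature groups named in the abstract and computes per-class precision and recall from predicted labels. The profile fields (`name`, `description`, `followers`, `friends`), the individual features, and the keyword list are assumptions made for illustration and are not the features or data used in the paper; the classifier itself is left out.

```python
import re

# The four user types defined in the abstract.
USER_TYPES = ("individual", "media", "government", "organization")

# Hypothetical keyword list (an assumption; not taken from the paper).
ORG_WORDS = {"news", "official", "department", "agency", "foundation"}


def extract_features(profile):
    """Map a profile dict to a flat feature dict (all field names are assumed)."""
    name = profile.get("name", "")
    desc = profile.get("description", "")
    tokens = re.findall(r"[a-z]+", f"{name} {desc}".lower())
    return {
        # demographic-style features
        "followers": profile.get("followers", 0),
        "friends": profile.get("friends", 0),
        # naming features
        "name_length": len(name),
        "name_word_count": len(name.split()),
        # self-description features
        "desc_length": len(desc),
        "org_word_hits": sum(t in ORG_WORDS for t in tokens),
    }


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, from parallel lists of labels."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```

In such a setup, the feature dicts would be fed to any standard classifier, and `precision_recall` would be called once per entry of `USER_TYPES` on held-out predictions.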
李纲, 周华阳, 毛进, 陈思菁. 基于机器学习的社交媒体用户分类研究[J]. 数据分析与知识发现, 2019, 3(8): 1-9.
Gang Li, Huayang Zhou, Jin Mao, Sijing Chen. Classifying Social Media Users with Machine Learning[J]. Data Analysis and Knowledge Discovery, 2019, 3(8): 1-9.