Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (6): 51-59    DOI: 10.11925/infotech.2096-3467.2019.1182
Automatic Transferring Government Website E-Mails Based on Text Classification
Wang Sidi1,2, Hu Guangwei1,2, Yang Siyu1,2, Shi Yun1
1School of Information Management, Nanjing University, Nanjing 210023, China
2Government Data Resources Institution of Nanjing University, Nanjing 210023, China
Abstract  

[Objective] This study proposes a method for automatically transferring e-mails received by government websites, aiming to reduce the labor cost of managing public mailboxes. [Methods] First, we chose four representative classification algorithms (Naïve Bayes, Decision Tree, Random Forest and Multi-Layer Perceptron) and compared their classification results on e-mails received by the websites of the Mayor’s Offices in Beijing, Hefei and Shenzhen. Then, we designed a method for automatically transferring these e-mails. Finally, we gave suggestions on applying the method in real-world settings. [Results] Multi-Layer Perceptron performed best in our study, with macro-average precision and recall above 0.85 and all micro-average indicators above 0.93. Naïve Bayes ranked second. Random Forest had high macro-average precision but poor recall. Decision Tree was mediocre in both precision and recall. [Limitations] We did not examine the impact of the skewed distribution of received e-mails, and we excluded departments that received few e-mails. [Conclusions] The proposed method streamlines the handling of public e-mails, which improves the efficiency of online government and reduces administrative costs.

Key words: Leader’s Mailbox; Automatic Transfer; Text Classification; Multi-Layer Perceptron; Process Optimization
Received: 31 October 2019      Published: 07 July 2020
ZTFLH:  TP39 G35  
Corresponding Authors: Hu Guangwei     E-mail: hugw@nju.edu.cn

Cite this article:

Wang Sidi, Hu Guangwei, Yang Siyu, Shi Yun. Automatic Transferring Government Website E-Mails Based on Text Classification. Data Analysis and Knowledge Discovery, 2020, 4(6): 51-59.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.1182     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I6/51

City       Departments   Records
Beijing    16            10,703
Hefei      27            36,142
Shenzhen   33            37,053
Dataset
Experimental Procedure
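As a rough illustration of one of the four classifiers compared in this study, the following is a minimal pure-Python sketch of multinomial Naïve Bayes with add-one smoothing. It is not the authors' implementation: the class name `TinyNaiveBayes`, the English toy messages, the two department labels and the whitespace tokenization are all assumptions made for illustration (Chinese e-mails would first need word segmentation).

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing. Illustrative only."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of e-mails per department
        self.word_counts = defaultdict(Counter)  # word frequencies per department
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()         # assumption: whitespace tokenization
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            n_words = sum(self.word_counts[label].values())
            for tok in doc.lower().split():
                if tok in self.vocab:              # words never seen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / (n_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy training data (not from the study's dataset).
train_docs = [
    "bus route delayed every morning",
    "subway station escalator broken",
    "road traffic light not working",
    "school enrollment policy for children",
    "teacher qualification exam schedule",
    "primary school tuition fee question",
]
train_labels = ["transport"] * 3 + ["education"] * 3

model = TinyNaiveBayes().fit(train_docs, train_labels)
print(model.predict("bus station traffic"))
print(model.predict("school exam fee"))
```

With real data, the same fit/predict pattern would be repeated for each algorithm and each city, and the predictions compared on a held-out test set.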
Algorithm  Metric     Macro average                Micro average
                      Beijing  Hefei   Shenzhen    Beijing  Hefei   Shenzhen
NB         Precision  0.9085   0.8762  0.8470      0.9514   0.8985  0.9228
           Recall     0.9048   0.8368  0.8260      0.9514   0.8985  0.9228
           F1         0.9035   0.8527  0.8323      0.9514   0.8985  0.9228
           AUC        0.9952   0.9890  0.9852      0.9967   0.9946  0.9941
DT         Precision  0.8227   0.7222  0.7383      0.9052   0.8386  0.8697
           Recall     0.8037   0.7045  0.7017      0.9052   0.8386  0.8697
           F1         0.8103   0.7112  0.7163      0.9052   0.8386  0.8697
           AUC        0.8985   0.8490  0.8487      0.9494   0.9162  0.9328
RF         Precision  0.9621   0.9484  0.9204      0.9393   0.8590  0.9104
           Recall     0.7844   0.5880  0.6755      0.9393   0.8590  0.9104
           F1         0.8396   0.6659  0.7463      0.9393   0.8590  0.9104
           AUC        0.9975   0.9886  0.9912      0.9969   0.9918  0.9958
MLP        Precision  0.9367   0.9133  0.8828      0.9650   0.9347  0.9440
           Recall     0.9184   0.8893  0.8574      0.9650   0.9347  0.9440
           F1         0.9256   0.8999  0.8679      0.9650   0.9347  0.9440
           AUC        0.9990   0.9950  0.9940      0.9995   0.9970  0.9975
Classification Performance
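In the table above, micro-averaged precision, recall and F1 are identical within each city. This is expected when every e-mail has exactly one true department and one predicted department: each misclassified e-mail counts once as a false positive (for the predicted department) and once as a false negative (for the true one), so the pooled totals coincide. The sketch below shows the two averaging schemes on a small hypothetical example; the department names and labels are invented for illustration and are not taken from the study's data.

```python
from collections import Counter


def ratio(num, den):
    return num / den if den else 0.0


def f1(p, r):
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_micro(y_true, y_pred):
    """Macro and micro averages for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # wrongly assigned to department p
            fn[t] += 1  # missed for the true department t
    labels = sorted(set(y_true) | set(y_pred))
    prec = {c: ratio(tp[c], tp[c] + fp[c]) for c in labels}
    rec = {c: ratio(tp[c], tp[c] + fn[c]) for c in labels}
    n = len(labels)
    # Micro average: pool the counts over all departments, then compute the ratios.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p, micro_r = ratio(TP, TP + FP), ratio(TP, TP + FN)
    return {
        # Macro average: every department weighs the same, however few e-mails it has.
        "macro_precision": sum(prec.values()) / n,
        "macro_recall": sum(rec.values()) / n,
        "macro_f1": sum(f1(prec[c], rec[c]) for c in labels) / n,
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": f1(micro_p, micro_r),
    }


# Hypothetical skewed example: one large department absorbs errors from two small ones.
y_true = ["transport"] * 6 + ["education"] * 2 + ["housing"] * 2
y_pred = ["transport"] * 6 + ["transport", "education"] + ["transport", "housing"]

scores = macro_micro(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the small departments are never predicted wrongly but are often missed, so macro precision stays high while macro recall drops, which is the same pattern the table reports for Random Forest.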
ROC Curve of Four Algorithms
Classification Result of Four Algorithms
                    Number of samples   Precision   Recall      F1
Number of samples   1.0000              0.4171      0.3793      0.4307*
Precision                               1.0000      0.6010**    0.8197***
Recall                                              1.0000      0.9501***
F1                                                              1.0000
Correlation Analysis Between the Number of Samples and Classification Result
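A correlation matrix of this kind can be computed from per-department sample counts and per-department scores. This page does not state which correlation coefficient the authors used, so the sketch below assumes Pearson's r; the sample counts and F1 values are hypothetical and only show the mechanics. The t statistic can be compared against a t distribution with n - 2 degrees of freedom to obtain significance levels like those marked with asterisks.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-department values (not from the study).
sample_counts = [120, 450, 900, 1500, 3200, 5100]
f1_scores = [0.61, 0.70, 0.78, 0.80, 0.88, 0.91]

r = pearson_r(sample_counts, f1_scores)
t = t_statistic(r, len(sample_counts))
print(f"r = {r:.4f}, t = {t:.4f}")
```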
Automatic Transfer Process of the Mailbox on Government Website
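One way such a transfer step might be wired to a classifier is sketched below. The details of the authors' process are not given on this page, so everything here is an assumption for illustration: the function name `route_email`, the confidence threshold of 0.6 and the `manual_review` fallback queue are hypothetical. The idea is simply that an e-mail is forwarded to the predicted department only when the classifier is reasonably confident, and is otherwise left for a human operator.

```python
def route_email(dept_probs, threshold=0.6, fallback="manual_review"):
    """Pick a destination from a {department: probability} mapping.

    Returns the most probable department if its probability reaches the
    threshold; otherwise returns the fallback queue for manual handling.
    """
    if not dept_probs:
        return fallback
    best = max(dept_probs, key=dept_probs.get)
    return best if dept_probs[best] >= threshold else fallback


# Hypothetical classifier outputs for two incoming e-mails.
confident = {"transport": 0.91, "education": 0.06, "housing": 0.03}
uncertain = {"transport": 0.40, "education": 0.35, "housing": 0.25}

print(route_email(confident))
print(route_email(uncertain))
```

Raising the threshold sends more e-mails to manual handling but lowers the risk of forwarding a message to the wrong department; the appropriate value would have to be tuned on real data.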