Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (11): 84-91     https://doi.org/10.11925/infotech.2096-3467.2020.0536
Research Paper
Predicting Time Series of Theft Crimes Based on LSTM Network
Yan Jinghua1,2,3, Hou Miaomiao3
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3School of Information and Network Security, People’s Public Security University of China, Beijing 100038, China
Abstract

[Objective] This paper studies the prediction of the daily number of theft crimes. [Methods] Using an LSTM network and actual daily theft records from a large city in northern China, covering January 1, 2005 to February 24, 2007 and January 1, 2009 to January 7, 2011, we set up three cases for time series prediction and validation, and compared the results with those of ARIMA, Support Vector Regression, Random Forest and XGBoost on the same data. [Results] The LSTM model predicted the trend of the daily number of theft crimes well. Its percentage root mean square error (PRMSE) in the three cases was 18.4%, 11.7% and 41.9%, respectively, lower than that of the ARIMA, Support Vector Regression, Random Forest and XGBoost models in every case. [Limitations] Further research is needed on predicting periods in which the daily number of theft crimes fluctuates sharply. [Conclusions] The results are expected to support decisions in specific police work, such as adjusting community security measures and estimating and deploying patrol forces.

Key words: Crime Prediction    Time Series    LSTM Network    Theft
Received: 2020-06-08      Published: 2020-12-04
Chinese Library Classification (CLC) number:  D917
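Funding: This work is one of the research outcomes of the National Key R&D Program of China (2018YFC0809700) and the Special Project for Basic Work of Strengthening Police through Science and Technology of the Ministry of Public Security (2018GABJC01).
Corresponding author: Yan Jinghua     E-mail: yanjinghua@ppsuc.edu.cn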
Cite this article:
Yan Jinghua, Hou Miaomiao. Predicting Time Series of Theft Crimes Based on LSTM Network[J]. Data Analysis and Knowledge Discovery, 2020, 4(11): 84-91.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0536      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I11/84
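Fig.1  Construction of the LSTM model and evaluation of its prediction performance[13,14,15]
Fig.2  Temporal distribution of the number of theft cases in City A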
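Feature name    Feature value
month           Month of the current time step
weekend         0 = working day, 1 = non-working day
holiday         0 = non-holiday, 1 = holiday
weekday_avg     Mean number of theft cases per working day
weekend_avg     Mean number of theft cases per non-working day
month_avg       Mean number of theft cases per month
count_lag1      Number of theft cases at the previous time step
Table 1  Data features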
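
The features in Table 1 can be derived from nothing more than a list of daily counts and a calendar. The following is a minimal sketch, not the authors' pipeline: the function name `build_features`, the `holidays` argument, the extra `target` field, the Saturday/Sunday rule for non-working days, and the choice to compute the averages over the whole series passed in are all assumptions made for illustration, and the counts are synthetic.

```python
from datetime import date, timedelta
from statistics import mean


def build_features(counts, start, holidays=frozenset()):
    """Return one feature dict per day, starting from the second day.

    counts:   daily theft counts; counts[0] belongs to `start`
    holidays: dates treated as public holidays (hypothetical input)
    """
    days = [start + timedelta(days=i) for i in range(len(counts))]
    # Assumption: Saturday and Sunday are the non-working days.
    weekend = [int(d.weekday() >= 5) for d in days]

    # Assumption: averages are taken over the series passed in, which
    # should be the training window only to avoid leaking test data.
    weekday_avg = mean([c for c, w in zip(counts, weekend) if w == 0])
    weekend_avg = mean([c for c, w in zip(counts, weekend) if w == 1])
    by_month = {}
    for d, c in zip(days, counts):
        by_month.setdefault(d.month, []).append(c)
    month_avg = {m: mean(v) for m, v in by_month.items()}

    rows = []
    for i in range(1, len(counts)):  # day 0 has no previous count
        d = days[i]
        rows.append({
            "month": d.month,
            "weekend": weekend[i],
            "holiday": int(d in holidays),
            "weekday_avg": weekday_avg,
            "weekend_avg": weekend_avg,
            "month_avg": month_avg[d.month],
            "count_lag1": counts[i - 1],
            "target": counts[i],  # value to predict (added for illustration)
        })
    return rows


# Synthetic counts for illustration only (not data from the paper).
rows = build_features(
    [30, 28, 35, 40, 38, 33, 31],
    date(2005, 1, 1),
    holidays={date(2005, 1, 1), date(2005, 1, 3)},
)
print(rows[0])
```

Each row pairs the previous day's count (`count_lag1`) with calendar indicators, which is the shape a sequence model or a tree-based regressor would typically consume after scaling.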
Training set                 ADF t-statistic    ADF p-value    1% critical value
{X1t}                        -3.91              0.00           -3.44
{X2t} before differencing    -2.82              0.06           -3.44
{X2t} after differencing     -14.80             0.00           -3.44
{X3t}                        -3.77              0.00           -3.44
Table 2  ADF unit root test results for the theft time series
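
Table 2 is read with a left-tailed decision rule: the unit-root hypothesis is rejected, and the series treated as stationary, when the ADF statistic falls below the critical value. The sketch below applies that rule to the values reported in the table and shows first-order differencing, the transformation applied to {X2t}. The helper names are hypothetical, and the statistics themselves would come from an ADF routine such as `adfuller` in statsmodels, which is not called here.

```python
def rejects_unit_root(t_stat, critical_value):
    """ADF is left-tailed: reject the unit-root null hypothesis when the
    statistic lies below the critical value."""
    return t_stat < critical_value


def first_difference(series):
    """Return x[t] - x[t-1]; the result is one element shorter."""
    return [b - a for a, b in zip(series, series[1:])]


# (label, t-statistic, 1% critical value), copied from Table 2.
table2 = [
    ("X1t", -3.91, -3.44),
    ("X2t before differencing", -2.82, -3.44),
    ("X2t after differencing", -14.80, -3.44),
    ("X3t", -3.77, -3.44),
]
for label, t_stat, crit in table2:
    if rejects_unit_root(t_stat, crit):
        verdict = "stationary"
    else:
        verdict = "unit root not rejected"
    print(f"{label}: {verdict} at the 1% level")

# Differencing a short synthetic series (not data from the paper).
print(first_difference([30, 28, 35, 40]))
```

Only {X2t} before differencing fails the test at the 1% level, which is consistent with the table reporting a differenced version of that series alone.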
Fig.3  Fitting results of the LSTM model
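Test set    Prediction model             PRMSE
Case 1      LSTM                         18.4%
Case 1      ARIMA(4, 1, 1)               32.4%
Case 1      Support Vector Regression    24.4%
Case 1      Random Forest                24.1%
Case 1      XGBoost                      25.3%
Case 2      LSTM                         11.7%
Case 2      ARIMA(1, 1, 2)               19.6%
Case 2      Support Vector Regression    15.1%
Case 2      Random Forest                20.0%
Case 2      XGBoost                      20.3%
Case 3      LSTM                         41.9%
Case 3      ARIMA(4, 1, 1)               65.5%
Case 3      Support Vector Regression    76.8%
Case 3      Random Forest                82.6%
Case 3      XGBoost                      84.9%
Table 3  Comparison of model prediction performance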
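
This page reports PRMSE values but does not state the formula. The sketch below assumes one common reading, RMSE divided by the mean of the observed values and expressed in percent. The paper may normalize differently, so the function should be treated as illustrative. The second part only reorganizes the numbers already listed in Table 3 to show the lowest-error model per case and its margin over the runner-up.

```python
from math import sqrt


def prmse(actual, predicted):
    """RMSE divided by the mean of the actual values, in percent.

    Assumption: this normalization is one common reading of PRMSE;
    the source page does not give the formula.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * sqrt(mse) / (sum(actual) / len(actual))


# PRMSE values (%) copied from Table 3.
table3 = {
    "Case 1": {"LSTM": 18.4, "ARIMA(4, 1, 1)": 32.4, "SVR": 24.4,
               "Random Forest": 24.1, "XGBoost": 25.3},
    "Case 2": {"LSTM": 11.7, "ARIMA(1, 1, 2)": 19.6, "SVR": 15.1,
               "Random Forest": 20.0, "XGBoost": 20.3},
    "Case 3": {"LSTM": 41.9, "ARIMA(4, 1, 1)": 65.5, "SVR": 76.8,
               "Random Forest": 82.6, "XGBoost": 84.9},
}
for case, scores in table3.items():
    best = min(scores, key=scores.get)          # lowest error wins
    runner_up = sorted(scores.values())[1]      # second-lowest error
    margin = runner_up - scores[best]
    print(f"{case}: best = {best}, margin to runner-up = {margin:.1f} points")

# Synthetic example of the metric itself (not data from the paper).
print(round(prmse([10, 20, 30], [12, 18, 33]), 2))
```

In Case 3 every model's error is roughly double or more its Case 1 and Case 2 values, which matches the limitation noted in the abstract about periods of sharp fluctuation.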
References
[1] Han Yishi, Fan Yingsheng, Li Guojun, et al. Analysis on the Growth Trend of Communication Network Fraud Based on ARIMA Model: Take Quzhou City of Zhejiang Province as an Example[J]. Theoretic Observation, 2017(5):101-103. (in Chinese)
[2] Qu Maohui, Hao Shiming. Research on the Prediction of the Number of Property Crimes in China Based on ARMA Model[J]. Criminal Science, 2013,23(4):100-106. (in Chinese)
[3] Gorr W, Olligschlaeger A, Thompson Y. Short-Term Forecasting of Crime[J]. International Journal of Forecasting, 2003,19(4):579-594.
doi: 10.1016/S0169-2070(03)00092-X
[4] Lu Rui. Prediction of Cybercrime Situation Based on Grey Theory[J]. Police Technology, 2015(4):62-64. (in Chinese)
[5] Alves L G, Ribeiro H V, Rodrigues F A. Crime Prediction Through Urban Metrics and Statistical Learning[J]. Physica A: Statistical Mechanics and Its Applications, 2018,505:435-443.
doi: 10.1016/j.physa.2018.03.084
[6] Chen Peng, Hu Xiaofeng, Chen Jianguo. The Application of Fuzzy Information Granulation and Support Vector Machine in Crime Forecasting[J]. Science Technology and Engineering, 2015,15(35):54-57, 63. (in Chinese)
[7] Yu Hongzhi, Liu Fengxin, Zou Kaiqi. Improved Fuzzy BP Neural Network and Its Application in Crime Prediction[J]. Journal of Liaoning Technical University(Natural Science), 2012(2):244-247. (in Chinese)
[8] Dash S K, Safro I, Srinivasamurthy R S. Spatio-temporal Prediction of Crimes Using Network Analytic Approach[C]// Proceedings of 2018 IEEE International Conference on Big Data. DOI: 10.1109/BigData.2018.8622041.
[9] Xiao Yanhui, Wang Xin, Feng Wen’gang, et al. Predicting Crime Locations Based on Long Short Term Memory and Convolutional Neural Networks[J]. Data Analysis and Knowledge Discovery, 2018,2(10):15-20. (in Chinese)
[10] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997,9(8):1735-1780.
pmid: 9377276
[11] Breuel T M. High Performance Text Recognition Using a Hybrid Convolutional-LSTM Implementation[C]// Proceedings of the 14th IAPR International Conference on Document Analysis & Recognition. 2017: 11-16.
[12] Sun L F, Kang S K, Li K, et al. Voice Conversion Using Deep Bidirectional Long Short-Term Memory Based Recurrent Neural Networks[C]// Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015: 4869-4873.
[13] Feng Chen, Chen Zhide. Application of Weighted Combination Model Based on XGBoost and LSTM in Sales Forecasting[J]. Computer Systems and Applications, 2019,28(10):226-232. (in Chinese)
[14] Shen Hanlei, Zhang Hu, Zhang Yaofeng, et al. Prediction of Burglary Crime Based on LSTM[J]. Statistics and Information Forum, 2019,34(11):107-115. (in Chinese)
[15] Zhou Rui, Wei Zhengying, Zhang Yubin, et al. Time Series Prediction of Tomato Yield Based on LSTM Recurrent Neural Network[J]. Water Saving Irrigation, 2018(8):66-70. (in Chinese)
[16] Box G E, Jenkins G M, Reinsel G C, et al. Time Series Analysis: Forecasting and Control[M]. John Wiley & Sons, 2015.
[17] Gong An, Ma Guangming, Guo Wenting, et al. Nuclear Power Equipment Status Prediction Based on LSTM Recurrent Neural Network[J]. Computer Technology and Development, 2019,29(10):41-45. (in Chinese)
[18] Ma X L, Tao Z M, Wang Y H, et al. Long Short-Term Memory Neural Network for Traffic Speed Prediction Using Remote Microwave Sensor Data[J]. Transportation Research Part C: Emerging Technologies, 2015,54:187-197.
doi: 10.1016/j.trc.2015.03.014
[19] Srivastava N, Mansimov E, Salakhudinov R. Unsupervised Learning of Video Representations Using LSTMs[C]// Proceedings of the 32nd International Conference on Machine Learning. 2015: 843-852.
[20] Hu X F, Li D, Huang H, et al. Modeling and Sensitivity Analysis of Transport and Deposition of Radionuclides from the Fukushima Dai-ichi Accident[J]. Atmospheric Chemistry & Physics, 2014,14(20):11065-11092.
[21] Drucker H, Burges C J C, Kaufman L, et al. Support Vector Regression Machines[C]// Proceedings of the 9th International Conference on Neural Information Processing Systems. 1996.
[22] Liaw A, Wiener M. Classification and Regression by Random Forest[J]. R News, 2002,2(3):18-22.
[23] Chen T Q, Guestrin C. XGBoost: A Scalable Tree Boosting System[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 785-794.