Data Analysis and Knowledge Discovery

Review of Techniques Detecting Online Extremism and Radicalization

Wang Xin,Feng Wen’gang

Data Analysis and Knowledge Discovery. 2018, 2(10): 2-8. https://doi.org/10.11925/infotech.2096-3467.2018.0742

[Objective] This paper reviews the technical solutions for detecting online extremism and radicalization. [Methods] First, we retrieved the relevant literature through keyword searches in several popular academic databases. Then, we reviewed these papers and summarized their theoretical frameworks, data sources, labelling methods, and algorithms. [Results] Researchers have drawn on recent psychology and sociology studies to refine their detection indicators and methods. The two most popular techniques in this field are lexicon-based methods and machine learning algorithms. Although machine learning methods offer better accuracy and faster speed, constructing their training data sets is very hard. [Limitations] We did not compare the effectiveness of different solutions. [Conclusions] The reviewed techniques are still developing, and more quantitative research is required to analyze the radicalization process. We need to cooperate with sociology and psychology researchers to develop new models and better training data sets.

Using Bayes Theory to Classify Counter Terrorism Intelligence

Li Yongnan

Data Analysis and Knowledge Discovery. 2018, 2(10): 9-14. https://doi.org/10.11925/infotech.2096-3467.2018.0708

[Objective] This study modifies the Naive Bayes classifier according to the features of counterterrorism intelligence, aiming to provide a simple and practical way to categorize these data. [Methods] First, we removed outliers from the terrorism-related data, discretized continuous attributes, and reduced highly correlated data. Second, we computed the conditional probabilities of the different attributes. Finally, we classified a new sample dataset based on the maximum a posteriori hypothesis. [Results] After categorizing the data, we raised the probability threshold to partially offset the influence of data dependence. Only some highly sensitive data need to be processed manually. [Limitations] This method places some restrictions on data independence. In practice, it must be combined with other classification methods, such as decision trees, to cover more intelligence data and provide information for early warning. [Conclusions] The proposed method increases the efficiency of intelligence analysis, is easy to use, and places fewer restrictions on intelligence analysts.
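
The steps in this abstract (conditional probabilities per attribute, a maximum a posteriori decision, and a raised probability threshold) can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy rows, the labels "A" and "B", the Laplace smoothing, the 0.8 threshold, and the "manual_review" fallback are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def posterior(model, row):
    """Return normalized P(label | row), using Laplace smoothing (an assumption)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total  # prior
        for i, value in enumerate(row):
            # Smoothed conditional probability of this attribute value.
            p *= (value_counts[(i, label)][value] + 1) / (n + len(values_seen[i]))
        scores[label] = p
    norm = sum(scores.values())
    return {label: p / norm for label, p in scores.items()}


def classify(model, row, threshold=0.8):
    """Pick the MAP label, or defer to a person when confidence is low."""
    post = posterior(model, row)
    label = max(post, key=post.get)
    return label if post[label] >= threshold else "manual_review"


# Hypothetical, already discretized training data.
rows = [("high", "yes"), ("high", "yes"), ("low", "no"), ("low", "no"), ("high", "no")]
labels = ["A", "A", "B", "B", "B"]
model = train(rows, labels)
```

Under these assumed settings, a sample whose best posterior falls below the threshold is routed to an analyst rather than labelled automatically.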

Predicting Crime Locations Based on Long Short Term Memory and Convolutional Neural Networks

Xiao Yanhui,Wang Xin,Feng Wen’gang,Tian Huawei,Wu Shaozhong,Li Lihua

Data Analysis and Knowledge Discovery. 2018, 2(10): 15-20. https://doi.org/10.11925/infotech.2096-3467.2018.0741

[Objective] This paper tries to predict the locations of suspects based on historical activity trajectory data, aiming to locate, track, monitor or arrest the suspects. [Methods] First, we proposed long short-term memory (LSTM) and convolutional neural network (CNN) models to predict crime locations. Then, we used the CNN model to extract location features of key suspects and analyze their spatial correlations. Finally, we utilized the LSTM model to maintain temporal continuity and obtain future locations. [Results] Compared with previous models, the proposed method increased the prediction accuracy from 0.71 to 0.79 on the GeoLife trajectory dataset. [Limitations] The model was only examined with the GeoLife dataset. [Conclusions] The proposed method fully exploits the spatial correlation and temporal continuity of data, which improves the effectiveness of public security intelligence analysis.

Risk Assessment of Civil Aviation Terrorism Based on K-means Clustering

Liu Minghui

Data Analysis and Knowledge Discovery. 2018, 2(10): 21-26. https://doi.org/10.11925/infotech.2096-3467.2018.0768

[Objective] This paper tries to assess the terrorism risks facing the civil aviation industry quantitatively and objectively. [Methods] We proposed a risk assessment model based on K-means clustering, and then examined it with data on terrorist attacks from 1992 to 2015. We calculated the risk of different types of attacks and their targets objectively. [Results] The risks of aircraft bombing and armed assault against the airport and airline staff were the highest, the risks of hijacking and bombing/explosion against the airport or airline staff were at a medium level, and the risks of other attacks were relatively low. We used this method to predict the risk of terrorist attacks against civil aviation in 2016, and the prediction accuracy was up to 92.3%. [Limitations] The proposed method for risk assessment is only suitable for processing numerical data. [Conclusions] The K-means clustering method can assess risk based on statistical data without human intervention, and could be applied to similar studies.
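
As a rough illustration of how K-means can sort numerical risk scores into high, medium and low groups, here is a minimal one-dimensional sketch. The scores are invented for the example and are not the paper's data; the deterministic initialization and the choice of three clusters are also assumptions.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Cluster scalar values into k groups (k >= 2) with plain K-means."""
    ordered = sorted(values)
    # Deterministic initialization: spread the starting centroids over the sorted data.
    centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each value joins its nearest centroid.
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    return centroids, clusters


# Hypothetical risk scores, one per attack type.
scores = [0.9, 0.85, 0.5, 0.45, 0.1, 0.05, 0.12]
centroids, clusters = kmeans_1d(scores)
# In this toy example the centroids stay in ascending order, so the clusters
# can be read as low, medium and high risk.
levels = dict(zip(["low", "medium", "high"], clusters))
```

Because the grouping comes from the data alone, no manual cut-off between risk levels has to be chosen, which is the property the abstract highlights.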

Risk Assessment and Decision Analysis of Civil Aviation Security with Risk Ranking and Decision Tree

Feng Wen’gang,Li Yan,Li Fuhai,Wang Xin,Zhou Xiping

Data Analysis and Knowledge Discovery. 2018, 2(10): 27-36. https://doi.org/10.11925/infotech.2096-3467.2018.0763

[Objective] This paper conducts risk assessment and decision-making analysis of civil aviation, aiming to address the security challenges facing this industry. [Methods] Based on the risk assessment results of civil aviation, we built a decision tree for civil aviation counter-terrorism, which examined the probabilities, deterrence effect, substitution effect, effectiveness of countermeasures and consequences of potential civil aviation terrorist attacks. [Results] We evaluated the effects of various countermeasures based on the analysis of potential terrorist attack threats. [Limitations] We only examined the proposed model with past terrorist incidents, which makes it difficult to assess future events. [Conclusions] This paper studies the attributes of possible terrorist attacks against the civil aviation system, including their probabilities, countermeasures, and consequences.

Optimizing Anti-terrorist Policing with Queueing Theory

Liu Zhongyi,Hu Chenwang,Tan Kun,Gao Yan

Data Analysis and Knowledge Discovery. 2018, 2(10): 37-45. https://doi.org/10.11925/infotech.2096-3467.2018.0769

[Objective] This paper optimizes the deployment of anti-terrorist police resources based on queueing theory, aiming to improve the effectiveness of counterterrorism actions. [Methods] First, we proposed two optimal anti-terrorist policing strategies based on the M/M/1/∞ and M/M/N/∞ queueing models. Then, we compared the performance of the two models on four factors using simulation cases. [Results] We found that the M/M/N/∞ model had better performance. [Limitations] We did not examine the proposed model with real-world anti-terrorism and policing data. [Conclusions] The M/M/N/∞ queueing model could help us create better anti-terrorist policing strategies.
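
To make the comparison between the two queueing models concrete, the sketch below computes one standard performance measure, the mean waiting time in queue, for M/M/1 and for M/M/N (via the Erlang C formula). The abstract does not list its four factors, so using waiting time here is an assumption, as are the arrival and service rates.

```python
from math import factorial


def mm1_wait(lam, mu):
    """Mean time in queue for M/M/1: Wq = lam / (mu * (mu - lam))."""
    if lam >= mu:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return lam / (mu * (mu - lam))


def mmn_wait(lam, mu, n):
    """Mean time in queue for M/M/N, using the Erlang C probability of waiting."""
    a = lam / mu   # offered load
    rho = a / n    # utilization per server
    if rho >= 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    tail = a ** n / (factorial(n) * (1 - rho))
    p0 = 1 / (sum(a ** k / factorial(k) for k in range(n)) + tail)
    erlang_c = tail * p0  # probability that an arrival has to wait
    return erlang_c / (n * mu - lam)


# Hypothetical rates: 2 incidents per hour, each unit handles 3 per hour.
lam, mu = 2.0, 3.0
single_unit = mm1_wait(lam, mu)
two_units = mmn_wait(lam, mu, 2)
```

With N = 1 the M/M/N formula reduces to the M/M/1 result, which is a convenient sanity check on any implementation.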

Early Warning for Civil Aviation Security Checks Based on Deep Learning

Feng Wengang,Huang Jing

Data Analysis and Knowledge Discovery. 2018, 2(10): 46-53. https://doi.org/10.11925/infotech.2096-3467.2018.0812

[Objective] This paper proposes a hierarchical classification screening method, aiming to improve the airport security system and passenger experience. [Methods] First, we proposed a feature deep learning method based on the civil aviation and public security databases. Then, we trained a deep neural network with three layers (time series, space series, and environmental features) to obtain the joint feature representation of passenger risk factors. Finally, we generated the early warning models for passenger security checks. [Results] The proposed early warning models could relieve the pressure on civil aviation security checks. [Limitations] More research is needed to examine the proposed model with data from small airports. [Conclusions] The early warning model based on deep learning could effectively improve the work efficiency of airport security checks and the passenger experience.

Dividing Time Windows of Dynamic Topic Model

Wang Tingting,Wang Yu,Qin Linjie

Data Analysis and Knowledge Discovery. 2018, 2(10): 54-64. https://doi.org/10.11925/infotech.2096-3467.2018.0196

[Objective] This paper proposes a Document Influence Model (DIM) based on Dynamic Automatic Time, aiming to solve the time window dividing issue of dynamic topic model. [Methods] Firstly, we processed the text corpora with the traditional LDA model and word vector model. Secondly, we constructed a comprehensive index reflecting the differences between time windows and similarity within the time windows. Finally, we built a new model based on this index and conducted an empirical study with news corpus of the “Belt and Road” International Cooperation Summit Forum. [Results] The proposed model could quickly and effectively divide the time windows, which not only ensured the comparability of the topics under different windows, but also evaluated the influence factors of the document. [Limitations] We built the similarity index of time windows based on the traditional LDA model, which could be improved by the latest LDA models. [Conclusions] The new model is able to divide the time series text effectively, which improves the performance of traditional dynamic topic model.

Predicting Credit Risks of P2P Loans in China Based on Ensemble Learning Methods

Cao Wei,Li Can,He Tingting,Zhu Weidong

Data Analysis and Knowledge Discovery. 2018, 2(10): 65-76. https://doi.org/10.11925/infotech.2096-3467.2018.0026

[Objective] This paper examines several popular ensemble learning methods with real-world data, aiming to find the most suitable way to monitor the P2P credit risks facing China. [Methods] We extracted the borrower’s features from five aspects, and identified the most important ones with the Random Forest method. Then, we compared the prediction models based on four ensemble learning methods and five base classifiers. [Results] We found that the Rotation Forest method had the highest accuracy rate of 99.32% and the lowest error rate of 1.71%. Feature selection based on Random Forest could significantly improve the performance of all related models. [Limitations] The sample dataset needs to be expanded. [Conclusions] The proposed method could identify credit risks more effectively.

Recognizing Metaphor with Convolution Neural Network and SVM

Huang Xiaoxi,Li Hanyu,Wang Rongbo,Wang Xiaohua,Chen Zhiqun

Data Analysis and Knowledge Discovery. 2018, 2(10): 77-83. https://doi.org/10.11925/infotech.2096-3467.2018.0114

[Objective] This paper presents a new method to recognize metaphors in Chinese and English datasets. [Methods] First, we mapped the experimental dataset to vector space and input it to a convolutional neural network along with the property and keyword features. Then, we extracted the needed features with the convolutional and pooling layers and classified them using SVM. Finally, we combined Max-Pooling and Mean-Pooling to improve the accuracy of the extracted features. [Results] Compared with the traditional models, our method increased the accuracy of extracted features from the English verb-object, English adjective-noun and Chinese metaphor corpora by 4.12%, 0.84% and 4.50% respectively. [Limitations] Chinese word segmentation affects the training of the word vector model. We need to add more layers to the convolutional neural network. [Conclusions] The proposed method could effectively identify metaphors in Chinese and English corpora.
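
The abstract does not say how Max-Pooling and Mean-Pooling are combined. One common option, shown here purely as an assumption, is to pool each convolutional feature map both ways and concatenate the results into the vector passed to the classifier. The feature maps below are made-up numbers.

```python
def combined_pool(feature_maps):
    """Concatenate global max-pooled and mean-pooled values of each feature map."""
    max_part = [max(fm) for fm in feature_maps]              # strongest activation
    mean_part = [sum(fm) / len(fm) for fm in feature_maps]   # average activation
    return max_part + mean_part


# Hypothetical outputs of two convolutional filters over one sentence.
feature_maps = [[1.0, 3.0, 2.0], [0.0, -1.0, 4.0]]
features = combined_pool(feature_maps)
```

Max-pooling keeps the single most salient signal per filter, while mean-pooling retains information about the whole sequence, so the concatenated vector carries both views.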

Comparing on Community Detection Algorithms for Information Mining

Chen Yunwei,Zhang Ruihong

Data Analysis and Knowledge Discovery. 2018, 2(10): 84-94. https://doi.org/10.11925/infotech.2096-3467.2018.0542

[Objective] This paper compares community detection algorithms in the field of complex network analysis, aiming to support related information science studies. [Methods] First, we identified the similarities and differences of several community detection algorithms (i.e., theoretical frameworks and calculation methods). Then, we examined these algorithms with small datasets. Finally, we expanded the sample size and evaluated the performance of the Louvain algorithm, the Louvain algorithm with multilevel refinement, and the SLM algorithm on collaboration and citation networks. [Results] On small datasets, the detection results of the GN and FN algorithms were similar, and the results of the SLM algorithm were better than those of the Louvain algorithm and the Louvain algorithm with multilevel refinement. In the field of library and information science, setting the resolution at 0.5 could help us analyze the detection results. The results of the SLM algorithm differed from those of the Louvain algorithm and the Louvain algorithm with multilevel refinement. Results of the latter two were almost the same, but were different at the resolution of 1.0. [Limitations] The dataset needs to be expanded. [Conclusions] The Louvain algorithm, the Louvain algorithm with multilevel refinement, and the SLM algorithm are better than traditional algorithms. Among them, the SLM algorithm is the best option for analyzing the communities of citation networks.
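
Louvain-type and SLM algorithms optimize modularity, and the "resolution" setting mentioned above is assumed here to be the usual resolution parameter of the modularity function. The sketch below evaluates that function for a given partition of an undirected, unweighted graph; the example graph (two triangles joined by one edge) is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community, resolution=1.0):
    """Q = sum over communities of L_c / m - resolution * (d_c / (2m))^2.

    L_c is the number of edges inside community c, d_c is the total degree
    of its nodes, and m is the number of edges in the graph.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
        for c in degree
    )


# Hypothetical graph: two triangles connected by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_default = modularity(edges, two_groups)
q_low_resolution = modularity(edges, two_groups, resolution=0.5)
```

Lowering the resolution weakens the penalty term, which generally favors fewer and larger communities; that is why results at 0.5 and 1.0 can differ.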

Constructing Sentiment Dictionary with Deep Learning: Case Study of Financial Data

Hu Jiaheng,Cen Yonghua,Wu Chengyao

Data Analysis and Knowledge Discovery. 2018, 2(10): 95-102. https://doi.org/10.11925/infotech.2096-3467.2018.0169

[Objective] This paper proposes a new method to construct a working sentiment dictionary for sentiment analysis in the field of finance. [Methods] Our method built a sentiment dictionary based on the characteristics of the corpus and knowledge base. It also mapped the textual information into vector space using the word vector method. With the help of an existing general sentiment dictionary, we automatically indexed the training corpus, and created training and forecasting sets with a ratio of 9:1. Finally, we used Python to build a deep learning neural network classifier, and evaluated the emotional polarity of the candidate words in the new dictionary. [Results] The accuracy of the proposed neural network classifier on the training set was 95.02%, while the accuracy on the forecasting set was 95.00%. Our results are better than those of the existing models. [Limitations] The method of extracting seed words could be further optimized. [Conclusions] The proposed method increases the size of the corpus to train the neural network classifiers more effectively. It also extracts emotion information from the semantic relevance of word vectors. The new sentiment dictionary provides possible directions for future research.

Xu Jianmin,Xu Caiyun

Data Analysis and Knowledge Discovery. 2018, 2(10): 103-109. https://doi.org/10.11925/infotech.2096-3467.2018.0211

[Objective] This paper proposes a new method to calculate the similarity of science and technology documents by combining the information of texts and formulas, aiming to improve the performance of traditional methods. [Methods] First, we mapped the feature elements of a single formula into a position vector, which helped us calculate the similarity of single formulas. Second, we computed the coverage and similarity of formulas between documents. Finally, we calculated the similarity of science and technology documents by combining the information of texts and formulas. [Results] We compared the classification results of the new method and the traditional ones, and found that the macro-average F-score of the new method increased by 6.7%. [Limitations] The test sets do not include the formula information of documents and need to be expanded. [Conclusions] The new method could calculate document similarity more accurately.
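
For readers unfamiliar with the evaluation metric, the macro-average F-score can be computed as sketched below. This assumes the common definition (the unweighted mean of per-class F1 scores); the abstract does not state which variant was used, and the labels here are invented.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical true and predicted document classes.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, the macro average is sensitive to performance on small classes, unlike accuracy or a micro average.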

25 October 2018, Volume 2 Issue 10
