Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 385-395    DOI: 10.11925/infotech.2096-3467.2021.0949
Automatic Identifying Abnormal Behaviors of International Journals
Wu Jinhong, Mu Keliang
School of Management, Wuhan Textile University, Wuhan 430200, China
[Objective] This paper develops an early-warning mechanism for international journals that aims to anticipate changes in journal quality and help researchers choose better publication venues. [Methods] First, we constructed an early-warning indicator system for scholarly journals covering four dimensions: impact strength, impact timeliness, journal characteristics, and author source. Second, we selected features by combining the Pearson correlation coefficient with XGBoost feature-importance scores. Third, we trained XGBoost, SVM, logistic regression, and a Stacking ensemble on these features to identify abnormal journal behavior. Finally, we ranked the features by XGBoost information gain. [Results] We evaluated the method on three sample datasets drawn from medical and scientific journals. Feature screening improved the generalization of the model at the cost of a slight drop in early-warning performance, whereas feature screening combined with feature expansion improved the accuracy of the early-warning model. The self-citation rate and the submission acceptance rate contribute significantly to the model. [Limitations] Because of constraints on data acquisition, the study covers a narrow range of disciplines and a small training set, and it omits journal features related to article processing charges. [Conclusions] The proposed model could help institutions and researchers make better-informed decisions about the quality of international journals.

Key words: Early Warning of International Journals; Feature Selection; Indicator System; Machine Learning
Received: 31 August 2021      Published: 14 April 2022
ZTFLH:  TP18  
Fund: Preliminary Funded Project of 2020 Hubei Social Science Foundation (20ZD053)
Corresponding Authors: Wu Jinhong, ORCID: 0000-0003-2903-6372; Mu Keliang, ORCID: 0000-0002-6264-9732

Wu Jinhong, Mu Keliang. Automatic Identifying Abnormal Behaviors of International Journals. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 385-395.

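| Feature dimension | Indicator | Definition |
| --- | --- | --- |
| Journal impact strength | Impact factor X1 | Total citations received in the report year (JCR year) by papers the journal published in the previous two years, divided by the number of papers it published in those two years |
| | 5-year impact factor X2 | Total citations received in the statistical year by papers the journal published in the previous 5 years, divided by the number of papers it published in those 5 years |
| | Impact factor without self-citations X3 | The journal's impact factor after self-citations are removed |
| | Eigenfactor X4 | Ratio of the number of citations to the total number of papers |
| | Immediacy index X5 | Total citations the journal received in a given year, divided by the number of papers it published in that year |
| | Article Influence Score X6 | Eigenfactor divided by the normalized share of papers published by the journal |
| | SJR X7 | An indicator that accounts for both the number and the quality of the citations a journal receives |
| | SNIP X8 | Citations received in the current year by the journal's papers from the past 3 years, divided by the number of papers from those 3 years |
| | Normalized Eigenfactor X9 | The Eigenfactor after normalization |
| | CiteScore X10 | Total citations received in the statistical year by papers the journal published in the previous 3 years, divided by the number of papers it published in those 3 years |
| Journal impact timeliness | Cited half-life X11 | Among all citations the journal's papers received in a given year, the time span over which the more recent half of the cited papers were published |
| | Citing half-life X12 | Among all references cited by the journal, the time span over which the more recent half were published |
| Journal characteristics | Review cycle X13 | Time from submission, through peer review, to the journal sending the author a notice of acceptance |
| | Submission acceptance rate X14 | Probability that a manuscript submitted to the journal is accepted |
| | Number of papers published X15 | Number of papers in the relevant disciplines that the journal published in a given period |
| | Self-citation rate X16 | Share of self-citations in the total citations the journal received in a given period |
| Author source | Single-country publication share X17 | Share of the journal's papers that come from a single country |

Dimensions and Corresponding Indicators of Early Warning of International Journals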
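The ratio-type indicators in this table can be written compactly. The block below is a sketch in notation introduced here purely for illustration, not the authors' notation: $C_y(t)$ is assumed to denote the citations received in year $y$ by papers the journal published in year $t$, and $N_t$ the number of papers it published in year $t$.

```latex
% X1: impact factor for report year y (two-year window)
\mathrm{IF}_y = \frac{C_y(y-1) + C_y(y-2)}{N_{y-1} + N_{y-2}}

% X2: 5-year impact factor
\mathrm{IF}^{(5)}_y = \frac{\sum_{k=1}^{5} C_y(y-k)}{\sum_{k=1}^{5} N_{y-k}}

% X10: CiteScore as defined in the table (three-year window)
\mathrm{CiteScore}_y = \frac{\sum_{k=1}^{3} C_y(y-k)}{\sum_{k=1}^{3} N_{y-k}}

% X16: self-citation rate over a given period
\mathrm{SCR} = \frac{\text{self-citations received}}{\text{total citations received}}
```

Written this way, X1, X2, and X10 differ only in the length of the publication window, which is one reason such indicators can be expected to correlate strongly.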
Heat Map of Index Correlations of Medical Journals
Heat Map of Index Correlation of Engineering Science and Technology Journals
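| Medical journals: indicator | Importance score | Engineering science and technology journals: indicator | Importance score |
| --- | --- | --- | --- |
| Single-country publication share X17 | 0.4831 | Single-country publication share X17 | 0.2585 |
| Self-citation rate X16 | 0.1262 | Impact factor X1 | 0.2273 |
| Number of papers published X15 | 0.0565 | Self-citation rate X16 | 0.0818 |
| CiteScore X10 | 0.0563 | SJR X7 | 0.0584 |
| Immediacy index X5 | 0.0463 | CiteScore X10 | 0.0572 |
| Submission acceptance rate X14 | 0.0408 | Impact factor without self-citations X3 | 0.0543 |
| SJR X7 | 0.0380 | Citing half-life X12 | 0.0448 |
| Citing half-life X12 | 0.0293 | 5-year impact factor X2 | 0.0372 |
| Impact factor without self-citations X3 | 0.0275 | Number of papers published X15 | 0.0335 |
| Impact factor X1 | 0.0234 | Eigenfactor X4 | 0.0328 |
| SNIP X8 | 0.0225 | Cited half-life X11 | 0.0305 |
| Eigenfactor X4 | 0.0152 | Submission acceptance rate X14 | 0.0255 |
| Article Influence Score X6 | 0.0150 | SNIP X8 | 0.0209 |
| Review cycle X13 | 0.0099 | Review cycle X13 | 0.0177 |
| Cited half-life X11 | 0.0089 | Article Influence Score X6 | 0.0097 |
| Normalized Eigenfactor X9 | 0.0010 | Immediacy index X5 | 0.0084 |
| 5-year impact factor X2 | 0.0000 | Normalized Eigenfactor X9 | 0.0012 |

The Importance Score of Each Indicator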
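The abstract describes selecting features by combining Pearson correlation with XGBoost importance scores but does not spell out the rule. One plausible reading, sketched below, is to drop the less important member of each highly correlated pair. The importance values are the medical-journal scores from this table. The correlation values, the 0.8 threshold, and the function name `screen_features` are hypothetical, because the heat-map values are not reproduced in this text.

```python
# Importance scores for medical journals, copied from the table (keyed by indicator code).
importance = {
    "X17": 0.4831, "X16": 0.1262, "X15": 0.0565, "X10": 0.0563,
    "X5": 0.0463, "X14": 0.0408, "X7": 0.0380, "X12": 0.0293,
    "X3": 0.0275, "X1": 0.0234, "X8": 0.0225, "X4": 0.0152,
    "X6": 0.0150, "X13": 0.0099, "X11": 0.0089, "X9": 0.0010,
    "X2": 0.0000,
}

# HYPOTHETICAL absolute Pearson correlations for a few indicator pairs.
# The real values belong to the heat maps, which are not available here.
abs_corr = {
    ("X1", "X3"): 0.97,
    ("X4", "X9"): 0.99,
    ("X1", "X10"): 0.85,
    ("X13", "X14"): 0.20,
}


def screen_features(importance, abs_corr, threshold=0.8):
    """Drop the less important feature of each highly correlated pair."""
    dropped = set()
    # Visit the most strongly correlated pairs first.
    for (a, b), r in sorted(abs_corr.items(), key=lambda kv: -kv[1]):
        if r < threshold or a in dropped or b in dropped:
            continue  # weak correlation, or the redundancy is already resolved
        weaker = a if importance[a] < importance[b] else b
        dropped.add(weaker)
    kept = [f for f in importance if f not in dropped]
    # Return the surviving features ranked by importance, highest first.
    return sorted(kept, key=lambda f: -importance[f]), dropped


kept, dropped = screen_features(importance, abs_corr)
print("dropped:", sorted(dropped))
print("kept:", kept)
```

Skipping a pair once either member has been dropped keeps the rule from discarding both indicators of a correlated group. Whether the authors used this rule or a different one is not stated.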
A Framework for Identification and Early Warning of Abnormal Behavior in International Journals
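| Year (Journal A) | Number of papers published | Impact factor | Immediacy index | Cited half-life | [header missing] | Eigenfactor | Self-citation rate | SNIP | CiteScore | Review cycle | Submission acceptance rate | [header missing] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2013 | 204 | 1.095 | 0.113 | 4.7 | 7.4 | 0.0062 | 0.042 | 0.65 | 2.0 | 3.91 | 0.605 | 0.429 |
| 2014 | 146 | 1.438 | 0.247 | 4.8 | 8.3 | 0.0067 | 0.026 | 0.75 | 2.5 | 3.91 | 0.605 | 0.451 |
| 2015 | 89 | 1.431 | 0.303 | 5.2 | 8.4 | 0.0063 | 0.003 | 0.79 | 3.2 | 3.91 | 0.605 | 0.475 |
| 2016 | 177 | 1.323 | 0.124 | 6.1 | 8.1 | 0.0050 | 0.022 | 0.78 | 2.9 | 3.91 | 0.605 | 0.500 |
| 2017 | 211 | 1.023 | 0.180 | 7.0 | 8.0 | 0.0040 | 0.015 | 0.61 | 2.0 | 3.91 | 0.605 | 0.527 |
| 2018 | 549 | 1.351 | 0.255 | 7.1 | 8.2 | 0.0037 | 0.027 | 0.61 | 1.2 | 3.91 | 0.605 | 0.525 |
| 2019 | 655 | 1.287 | 0.188 | 7.1 | 7.6 | 0.0047 | 0.026 | 0.63 | 1.4 | 3.91 | 0.605 | 0.614 |
| 2020 | 1346 | 1.671 | 0.189 | 4.9 | 6.9 | 0.0069 | 0.035 | 0.66 | 1.4 | 3.91 | 0.605 | 0.691 |

Original Indicator Matrix of Journal A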
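To make the Pearson step concrete, the sketch below computes the correlation coefficient between a few of Journal A's indicator series using only the standard library. The three columns are copied from the matrix above. The helper name `pearson` is introduced here for illustration. Correlating one journal's yearly values is only a demonstration of the formula and is not the cross-journal correlation behind the paper's heat maps.

```python
import math

# Columns copied from the Journal A matrix (2013-2020).
impact_factor = [1.095, 1.438, 1.431, 1.323, 1.023, 1.351, 1.287, 1.671]
cite_score = [2.0, 2.5, 3.2, 2.9, 2.0, 1.2, 1.4, 1.4]
papers = [204, 146, 89, 177, 211, 549, 655, 1346]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]  # deviations from the mean
    dy = [b - my for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


print(f"impact factor vs CiteScore: {pearson(impact_factor, cite_score):+.3f}")
print(f"papers vs CiteScore:        {pearson(papers, cite_score):+.3f}")
```

The review-cycle and acceptance-rate columns are constant for Journal A (3.91 and 0.605 in every year), so their correlation with any other series is undefined within this single journal. The helper raises an error in that case instead of returning a misleading number.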
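| Sample data | Metric | LR | SVM | DecisionTree | RandomForest | XGBoost | SLX-Stacking |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Original | Accuracy | 0.959 | 0.947 | 0.913 | 0.924 | 0.965 | 0.953 |
| | Recall | 0.825 | 0.901 | 0.851 | 0.876 | 0.938 | 0.925 |
| | F1-Score | 0.955 | 0.941 | 0.901 | 0.916 | 0.962 | 0.949 |
| Feature screening | Accuracy | 0.947 | 0.924 | 0.913 | 0.913 | 0.959 | 0.947 |
| | Recall | 0.925 | 0.876 | 0.851 | 0.888 | 0.938 | 0.925 |
| | F1-Score | 0.943 | 0.916 | 0.901 | 0.905 | 0.955 | 0.943 |
| Feature screening and expansion | Accuracy | 0.966 | 0.966 | 0.940 | 0.933 | 0.973 | 0.966 |
| | Recall | 0.986 | 0.986 | 0.891 | 0.918 | 0.986 | 0.986 |
| | F1-Score | 0.966 | 0.966 | 0.936 | 0.931 | 0.973 | 0.966 |

Classification Accuracy of Early Warning Models for Medical Journals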
| Sample data | Metric | LR | SVM | DecisionTree | RandomForest | XGBoost | SLX-Stacking |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Original | Accuracy | 0.923 | 0.923 | 0.769 | 0.365 | 0.923 | 0.923 |
| | Recall | 0.878 | 0.878 | 0.636 | 0.000 | 0.878 | 0.878 |
| | F1-Score | 0.935 | 0.935 | 0.777 | 0.000 | 0.935 | 0.935 |
| Feature screening | Accuracy | 0.903 | 0.884 | 0.769 | 0.365 | 0.903 | 0.923 |
| | Recall | 0.848 | 0.818 | 0.636 | 0.000 | 0.848 | 0.878 |
| | F1-Score | 0.918 | 0.900 | 0.777 | 0.000 | 0.918 | 0.935 |
| Feature screening and expansion | Accuracy | 0.955 | 0.911 | 0.822 | 0.533 | 0.911 | 0.955 |
| | Recall | 0.916 | 0.833 | 0.708 | 1.000 | 0.875 | 0.916 |
| | F1-Score | 0.956 | 0.909 | 0.809 | 0.695 | 0.913 | 0.956 |

Classification Accuracy of Early Warning Models for Engineering Science and Technology Journals
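The RandomForest column for the original and screened engineering samples pairs a nonzero accuracy with a recall and F1-Score of 0.000. That pattern is what a classifier produces when it never predicts the positive (warned) class. The sketch below illustrates this with accuracy, recall, and F1 computed from confusion-matrix counts. The counts are hypothetical, since the source gives neither the confusion matrices nor the test-set sizes, and the metrics are assumed to be binary metrics on the positive class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, and F1 for the positive (warned) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall; define it as 0 when both are 0.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, recall, f1


# HYPOTHETICAL counts: a 52-journal test set containing 33 warned journals.
typical = classification_metrics(tp=29, fp=0, fn=4, tn=19)
# Degenerate model that never raises a warning on the same hypothetical set.
never_warns = classification_metrics(tp=0, fp=0, fn=33, tn=19)

for name, (acc, rec, f1) in [("typical", typical), ("never warns", never_warns)]:
    print(f"{name:12s} accuracy={acc:.3f} recall={rec:.3f} f1={f1:.3f}")
```

In the degenerate case, accuracy equals the share of normal journals in the test set, so accuracy alone can look acceptable while the model provides no warning capability. This is a reason to read recall and F1 alongside accuracy in these tables.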
ROC Curve of Early Warning Model for Medical Journals
ROC Curve of Early Warning Model for Engineering Science and Technology Journals
| Warning level | Probability P |
| --- | --- |
| High risk | [80%, 100%] |
| Medium risk | [60%, 80%) |
| Low risk | [40%, 60%) |
| Normal | [0%, 40%) |

Classification of Warning Levels
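The interval table maps directly onto a small lookup function. The sketch below assumes that P is the model's predicted probability that a journal is abnormal, expressed as a fraction in [0, 1]. The function name `warning_level` is introduced here for illustration.

```python
def warning_level(p):
    """Map a predicted probability in [0, 1] to a warning level."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p >= 0.8:   # [80%, 100%]
        return "high risk"
    if p >= 0.6:   # [60%, 80%)
        return "medium risk"
    if p >= 0.4:   # [40%, 60%)
        return "low risk"
    return "normal"  # [0%, 40%)


for p in (0.95, 0.8, 0.79, 0.6, 0.4, 0.39):
    print(f"P={p:.2f} -> {warning_level(p)}")
```

Checking the thresholds from highest to lowest keeps each lower bound inclusive and each upper bound exclusive, matching the half-open intervals in the table.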
Ranking of the Importance of Features in Medical Journals
Ranking of the Importance of Features in Engineering Science and Technology Journals
