Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 385-395    DOI: 10.11925/infotech.2096-3467.2021.0949
Automatic Identifying Abnormal Behaviors of International Journals
Wu Jinhong, Mu Keliang
School of Management, Wuhan Textile University, Wuhan 430200, China
Abstract  

[Objective] This paper builds an early warning mechanism for international journals to predict changes in their quality and help researchers choose better publishing venues. [Methods] First, we constructed an early warning indicator system for scholarly journals covering four dimensions: impact strength, impact timeliness, journal characteristics, and author origin. Second, we selected features by combining the Pearson correlation coefficient with XGBoost importance values. Third, we analyzed the selected features with XGBoost, SVM, logistic regression, and a Stacking fusion model to identify abnormal behaviors. Finally, we ranked the features by XGBoost information gain. [Results] We examined the method on three sample datasets from medical and scientific journals. Feature screening could improve the generalization of the model, but it also slightly reduced early warning performance. Feature screening combined with feature expansion could improve the accuracy of the early warning model. The self-citation rate and the submission acceptance rate play significant roles in the model. [Limitations] Because of constraints on data acquisition, the study covers a narrow range of disciplines and a small training set, and journal features related to article processing charges are not included. [Conclusions] The proposed model could help institutions and researchers make better decisions about the quality of international journals.

Keywords: Early Warning of International Journals; Feature Selection; Indicator System; Machine Learning
Received: 31 August 2021      Published: 14 April 2022
ZTFLH:  TP18  
Fund: Preliminary Funded Project of 2020 Hubei Social Science Foundation (20ZD053)
Corresponding Authors: Wu Jinhong, ORCID: 0000-0003-2903-6372; Mu Keliang, ORCID: 0000-0002-6264-9732. E-mail: 14514576@qq.com; 704266922@qq.com

Cite this article:

Wu Jinhong, Mu Keliang. Automatic Identifying Abnormal Behaviors of International Journals. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 385-395.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0949     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I2/3/385

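Dimension | Indicator | Definition
Journal impact strength | Impact factor (X1) | Ratio of the total citations received in the report year (JCR year) by papers the journal published in the previous two years to the total number of papers it published in those two years
Journal impact strength | 5-year impact factor (X2) | Ratio of the total citations received in the statistical year by papers the journal published in the previous 5 years to the total number of papers it published in those 5 years
Journal impact strength | Impact factor without self-citations (X3) | The journal's impact factor after self-citations are removed
Journal impact strength | Eigenfactor (X4) | Ratio of the number of citations to the total number of papers
Journal impact strength | Immediacy index (X5) | Ratio of the total citations the journal received in a given year to the total number of papers it published in that year
Journal impact strength | Article influence score (X6) | Eigenfactor divided by the normalized share of papers published by the journal
Journal impact strength | SJR (X7) | An indicator that considers both the number and the quality of the citations a journal receives
Journal impact strength | SNIP (X8) | Ratio of the citations received in the current year by the journal's papers from the past 3 years to the total number of its papers from the past 3 years
Journal impact strength | Normalized Eigenfactor (X9) | The Eigenfactor after normalization
Journal impact strength | CiteScore (X10) | Ratio of the total citations received in the statistical year by papers the journal published in the previous 3 years to the total number of papers it published in those 3 years
Journal impact timeliness | Cited half-life (X11) | Among all citations to the journal's papers in a given year, the time span over which the more recent half of the cited papers were published
Journal impact timeliness | Citing half-life (X12) | Among all references cited by the journal, the time span over which the more recent half were published
Journal characteristics | Review cycle (X13) | Time span from submission, through review, until the journal notifies the author of acceptance and sends the acceptance notice
Journal characteristics | Submission acceptance rate (X14) | Probability that a manuscript submitted to the journal is accepted
Journal characteristics | Number of papers published (X15) | Number of papers in the relevant disciplines that the journal published in a given period
Journal characteristics | Self-citation rate (X16) | Share of self-citations in the journal's total citations in a given period
Author origin | Single-country publication share (X17) | Share of the journal's papers contributed by a single country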
Dimensions and Corresponding Indicators of Early Warning of International Journals
Heat Map of Index Correlations of Medical Journals
Heat Map of Index Correlation of Engineering Science and Technology Journals
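Medical journals: indicator | Importance | Engineering science and technology journals: indicator | Importance
Single-country publication share (X17) | 0.4831 | Single-country publication share (X17) | 0.2585
Self-citation rate (X16) | 0.1262 | Impact factor (X1) | 0.2273
Number of papers published (X15) | 0.0565 | Self-citation rate (X16) | 0.0818
CiteScore (X10) | 0.0563 | SJR (X7) | 0.0584
Immediacy index (X5) | 0.0463 | CiteScore (X10) | 0.0572
Submission acceptance rate (X14) | 0.0408 | Impact factor without self-citations (X3) | 0.0543
SJR (X7) | 0.0380 | Citing half-life (X12) | 0.0448
Citing half-life (X12) | 0.0293 | 5-year impact factor (X2) | 0.0372
Impact factor without self-citations (X3) | 0.0275 | Number of papers published (X15) | 0.0335
Impact factor (X1) | 0.0234 | Eigenfactor (X4) | 0.0328
SNIP (X8) | 0.0225 | Cited half-life (X11) | 0.0305
Eigenfactor (X4) | 0.0152 | Submission acceptance rate (X14) | 0.0255
Article influence score (X6) | 0.0150 | SNIP (X8) | 0.0209
Review cycle (X13) | 0.0099 | Review cycle (X13) | 0.0177
Cited half-life (X11) | 0.0089 | Article influence score (X6) | 0.0097
Normalized Eigenfactor (X9) | 0.0010 | Immediacy index (X5) | 0.0084
5-year impact factor (X2) | 0.0000 | Normalized Eigenfactor (X9) | 0.0012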
The Importance Score of Each Indicator
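One way to combine the Pearson correlation coefficient with the importance scores for feature screening is sketched below. It is an illustration under stated assumptions, not the authors' procedure: the function name `screen_features`, the 0.9 cutoff, and the correlation values are all hypothetical. Only the importance scores are taken from the medical-journal column of the table above.

```python
from itertools import combinations


def screen_features(corr, importance, threshold=0.9):
    """Drop the less important feature of every highly correlated pair.

    corr: dict mapping frozenset({a, b}) -> Pearson r between features a and b
    importance: dict mapping feature name -> importance score
    threshold: hypothetical |r| cutoff (an assumption, not from the paper)
    """
    dropped = set()
    # Rank features from most to least important so that, within a
    # correlated pair, the second element is always the weaker one.
    ranked = sorted(importance, key=importance.get, reverse=True)
    for a, b in combinations(ranked, 2):
        if a in dropped or b in dropped:
            continue
        if abs(corr.get(frozenset((a, b)), 0.0)) >= threshold:
            dropped.add(b)
    return [f for f in ranked if f not in dropped]


# Importance scores from the medical-journal column of the table above.
importance = {"X17": 0.4831, "X16": 0.1262, "X10": 0.0563,
              "X1": 0.0234, "X2": 0.0000}
# Hypothetical correlations, for illustration only.
corr = {frozenset(("X1", "X10")): 0.93, frozenset(("X2", "X10")): 0.91}

kept = screen_features(corr, importance)
print(kept)
```

With these assumed correlations, X1 and X2 are removed because each is strongly correlated with the more important X10.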
A Framework for Identification and Early Warning of Abnormal Behavior in International Journals
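Journal A (year) | Papers published | Impact factor | Immediacy index | Cited half-life | Citing half-life | Eigenfactor | Self-citation rate | SNIP | CiteScore | Review cycle | Submission acceptance rate | Single-country share
2013 | 204 | 1.095 | 0.113 | 4.7 | 7.4 | 0.0062 | 0.042 | 0.65 | 2.0 | 3.91 | 0.605 | 0.429
2014 | 146 | 1.438 | 0.247 | 4.8 | 8.3 | 0.0067 | 0.026 | 0.75 | 2.5 | 3.91 | 0.605 | 0.451
2015 | 89 | 1.431 | 0.303 | 5.2 | 8.4 | 0.0063 | 0.003 | 0.79 | 3.2 | 3.91 | 0.605 | 0.475
2016 | 177 | 1.323 | 0.124 | 6.1 | 8.1 | 0.0050 | 0.022 | 0.78 | 2.9 | 3.91 | 0.605 | 0.500
2017 | 211 | 1.023 | 0.180 | 7.0 | 8.0 | 0.0040 | 0.015 | 0.61 | 2.0 | 3.91 | 0.605 | 0.527
2018 | 549 | 1.351 | 0.255 | 7.1 | 8.2 | 0.0037 | 0.027 | 0.61 | 1.2 | 3.91 | 0.605 | 0.525
2019 | 655 | 1.287 | 0.188 | 7.1 | 7.6 | 0.0047 | 0.026 | 0.63 | 1.4 | 3.91 | 0.605 | 0.614
2020 | 1346 | 1.671 | 0.189 | 4.9 | 6.9 | 0.0069 | 0.035 | 0.66 | 1.4 | 3.91 | 0.605 | 0.691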
Original Indicator Matrix of Journal A
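As an illustration of the Pearson correlation coefficient used in feature selection, the sketch below computes it for two columns of Journal A's indicator matrix: papers published and single-country share. The helper `pearson` is a hypothetical name written for this example. The abstract describes correlation between indicators across journals, so pairing two columns of a single journal is an assumption made only to give the calculation concrete inputs.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Columns for 2013-2020 from the Journal A matrix above.
papers = [204, 146, 89, 177, 211, 549, 655, 1346]
single_country_share = [0.429, 0.451, 0.475, 0.500, 0.527, 0.525, 0.614, 0.691]

r = pearson(papers, single_country_share)
print(f"Pearson r = {r:.3f}")
```

For Journal A the two series rise together over the period, so the coefficient is strongly positive.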
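Sample data | Metric | LR | SVM | DecisionTree | RandomForest | XGBoost | SLX-Stacking
Original | Accuracy | 0.959 | 0.947 | 0.913 | 0.924 | 0.965 | 0.953
Original | Recall | 0.825 | 0.901 | 0.851 | 0.876 | 0.938 | 0.925
Original | F1-Score | 0.955 | 0.941 | 0.901 | 0.916 | 0.962 | 0.949
Feature screening | Accuracy | 0.947 | 0.924 | 0.913 | 0.913 | 0.959 | 0.947
Feature screening | Recall | 0.925 | 0.876 | 0.851 | 0.888 | 0.938 | 0.925
Feature screening | F1-Score | 0.943 | 0.916 | 0.901 | 0.905 | 0.955 | 0.943
Feature screening and expansion | Accuracy | 0.966 | 0.966 | 0.940 | 0.933 | 0.973 | 0.966
Feature screening and expansion | Recall | 0.986 | 0.986 | 0.891 | 0.918 | 0.986 | 0.986
Feature screening and expansion | F1-Score | 0.966 | 0.966 | 0.936 | 0.931 | 0.973 | 0.966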
Classification Accuracy of Early Warning Models for Medical Journals
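Sample data | Metric | LR | SVM | DecisionTree | RandomForest | XGBoost | SLX-Stacking
Original | Accuracy | 0.923 | 0.923 | 0.769 | 0.365 | 0.923 | 0.923
Original | Recall | 0.878 | 0.878 | 0.636 | 0.000 | 0.878 | 0.878
Original | F1-Score | 0.935 | 0.935 | 0.777 | 0.000 | 0.935 | 0.935
Feature screening | Accuracy | 0.903 | 0.884 | 0.769 | 0.365 | 0.903 | 0.923
Feature screening | Recall | 0.848 | 0.818 | 0.636 | 0.000 | 0.848 | 0.878
Feature screening | F1-Score | 0.918 | 0.900 | 0.777 | 0.000 | 0.918 | 0.935
Feature screening and expansion | Accuracy | 0.955 | 0.911 | 0.822 | 0.533 | 0.911 | 0.955
Feature screening and expansion | Recall | 0.916 | 0.833 | 0.708 | 1.000 | 0.875 | 0.916
Feature screening and expansion | F1-Score | 0.956 | 0.909 | 0.809 | 0.695 | 0.913 | 0.956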
Classification Accuracy of Early Warning Models for Engineering Science and Technology Journals
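For readers less familiar with the three metrics in the tables above, the sketch below computes accuracy, recall, and F1 from binary confusion-matrix counts using the standard definitions. The source does not report confusion matrices or state how its F1 scores were averaged, so the counts are hypothetical. They are chosen to show how a classifier that never flags a journal ends up with zero recall and zero F1 while still scoring a nonzero accuracy.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, recall, f1) for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, recall, f1


# Hypothetical test set: 52 journals, 33 of them abnormal (positive).
# This classifier predicts "normal" for every journal.
accuracy, recall, f1 = binary_metrics(tp=0, fp=0, fn=33, tn=19)
print(round(accuracy, 3), recall, f1)
```

Under these assumed counts, accuracy equals the share of normal journals (19/52), which is why accuracy alone can hide a model that issues no warnings at all.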
ROC Curve of Early Warning Model for Medical Journals
ROC Curve of Early Warning Model for Engineering Science and Technology Journals
Warning level | Probability P
High risk | [80%, 100%]
Medium risk | [60%, 80%)
Low risk | [40%, 60%)
Normal | [0%, 40%)
Classification of Warning Levels
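The thresholds in the table above can be expressed as a small lookup function. This is a minimal sketch: the name `warning_level` is hypothetical, and it assumes P is the model's predicted probability that a journal is abnormal, given as a fraction between 0 and 1.

```python
def warning_level(p):
    """Map a probability in [0, 1] to the warning level in the table above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p >= 0.8:
        return "High risk"    # [80%, 100%]
    if p >= 0.6:
        return "Medium risk"  # [60%, 80%)
    if p >= 0.4:
        return "Low risk"     # [40%, 60%)
    return "Normal"           # [0%, 40%)


for p in (0.95, 0.80, 0.65, 0.40, 0.10):
    print(p, warning_level(p))
```

Each interval is closed on the left, so a probability of exactly 80% falls in the high-risk band and exactly 40% in the low-risk band.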
Ranking of the Importance of Features in Medical Journals
Ranking of the Importance of Features in Engineering Science and Technology Journals