Automatic Identification and Early Warning of Abnormal Behaviors of International Journals<sup>*</sup>

doi:10.11925/infotech.2096-3467.2021.0949

Data Analysis and Knowledge Discovery

2022, Vol. 6

Issue (2/3): 385-395 https://doi.org/10.11925/infotech.2096-3467.2021.0949

Special Issue



Automatic Identifying Abnormal Behaviors of International Journals

Wu Jinhong, Mu Keliang

School of Management, Wuhan Textile University, Wuhan 430200, China


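Abstract

[Objective] To compute early-warning values for international journals with machine-learning models and predict trends in journal quality, providing an intelligent tool that reminds researchers to choose publication venues carefully and helps decision-making bodies review journal quality. [Methods] We built an early-warning indicator system covering four dimensions: journal impact strength, timeliness of journal impact, journal characteristics, and author origin. Features were screened by combining the Pearson correlation coefficient with XGBoost feature importance scores, and the screened features were then expanded with time-series features. To account for disciplinary differences, we used labeled datasets of medical journals and of engineering and technology journals to identify abnormal journal behavior and to compare XGBoost, SVM, logistic regression, and Stacking ensemble models. Finally, features were ranked by importance using XGBoost information gain. [Results] Results for three sample schemes on medical journals and on engineering and technology journals show that feature screening improves model generalization but slightly lowers early-warning performance; that screening followed by expansion improves the accuracy of the early-warning model; and that indicators such as the self-citation rate and the submission acceptance rate contribute substantially to the model. [Limitations] Because of constraints on data availability, the study covers few disciplines, uses a small training set, and omits journal features related to article processing charges. [Conclusions] The proposed early-warning model for abnormal behavior of international journals is applicable in multidisciplinary settings, can help institutions and experts make more targeted early-warning decisions, and offers a new approach to journal quality management.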



Abstract:

[Objective] This paper builds an early-warning mechanism for international journals that predicts changes in their quality and helps researchers choose publication venues more carefully. [Methods] We constructed an early-warning indicator system for scholarly journals covering impact strength, timeliness of impact, journal characteristics, and author origin. We then combined the Pearson correlation coefficient with XGBoost feature importance scores to select features, and expanded the selected features with time-series features. Next, we trained XGBoost, SVM, logistic regression, and Stacking ensemble models on labeled datasets of medical journals and of engineering and technology journals to identify abnormal behaviors. Finally, we ranked the features by XGBoost information gain. [Results] We evaluated the method with three sample schemes on medical journals and on engineering and technology journals. Feature screening improved the generalization of the model but slightly reduced early-warning performance. Feature screening followed by expansion improved the accuracy of the early-warning model. The self-citation rate and the submission acceptance rate contributed substantially to the model. [Limitations] Because of constraints on data availability, the study covers few disciplines, the training set is small, and journal features related to article processing charges are not included. [Conclusions] The proposed model can help institutions and researchers make better-informed decisions about the quality of international journals.

Key words: Early Warning of International Journals, Feature Selection, Indicator System, Machine Learning
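The abstract states that features were screened by combining Pearson correlation with XGBoost feature importance, but it does not say how the two were combined. The sketch below shows one plausible rule, offered only as an illustration: visit features from most to least important and keep a feature only if it is not strongly correlated with one already kept. The 0.9 threshold, the greedy rule, the function names, the indicator name `self_citation_count`, and all numbers are assumptions; the importance scores are taken as given, as if read from an already trained XGBoost model.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / (norm_x * norm_y)


def select_features(columns, importance, threshold=0.9):
    """Greedy screening (assumed rule, not taken from the paper).

    columns:    dict mapping feature name -> list of values per journal
    importance: dict mapping feature name -> importance score
    threshold:  absolute correlation at or above which two features are
                treated as redundant (hypothetical value)
    """
    kept = []
    # Most important features are considered first, so that within a
    # highly correlated group the most important member survives.
    for name in sorted(columns, key=lambda f: importance[f], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical indicator values for four journals.
columns = {
    "self_citation_rate": [0.1, 0.2, 0.3, 0.4],
    "self_citation_count": [10, 20, 30, 40],  # redundant with the rate
    "acceptance_rate": [0.5, 0.1, 0.4, 0.2],
}
# Hypothetical importance scores, e.g. gain from a trained XGBoost model.
importance = {
    "self_citation_rate": 0.5,
    "self_citation_count": 0.2,
    "acceptance_rate": 0.3,
}

print(select_features(columns, importance))
# ['self_citation_rate', 'acceptance_rate']
```

Ordering by importance before filtering is a design choice: it makes the result independent of column order and favors the indicator the model relies on most within each correlated group.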

Received: 2021-08-31 Published: 2022-04-14

ZTFLH (Chinese Library Classification number): TP18

Funding: * This work is one of the research outcomes of a 2020 Hubei Provincial Social Science Fund pre-funded project (20ZD053)

Corresponding authors: Wu Jinhong, ORCID: 0000-0003-2903-6372; Mu Keliang, ORCID: 0000-0002-6264-9732. E-mail: 14514576@qq.com; 704266922@qq.com

Cite this article:

Wu Jinhong, Mu Keliang. Automatic Identifying Abnormal Behaviors of International Journals[J]. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 385-395.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0949 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I2/3/385

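Table 1 Dimensions and corresponding indicators for international journal early warning

Fig.1 Correlation heatmap of indicators for medical journals

Fig.2 Correlation heatmap of indicators for engineering and technology journals

Table 2 Importance scores of each indicator

Fig.3 Framework for identifying and warning of abnormal behaviors of international journals

Table 3 Original indicator matrix of Journal A

Table 4 Comparison of classification accuracy of early-warning models for medical journals

Table 5 Comparison of classification accuracy of early-warning models for engineering and technology journals

Fig.4 ROC curves of early-warning models for medical journals

Fig.5 ROC curves of early-warning models for engineering and technology journals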

Warning level	Probability $P$
High risk	[80%,100%]
Medium risk	[60%,80%)
Low risk	[40%,60%)
Normal	[0%,40%)

Table 6 Classification of warning levels
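The intervals in Table 6 can be expressed as a small lookup. The following is a minimal sketch, assuming the classifier outputs a probability $P$ between 0 and 1; the function name `warning_level` and the English level labels are illustrative and do not come from the paper.

```python
def warning_level(p: float) -> str:
    """Map a predicted probability P in [0, 1] to a Table 6 warning level."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Intervals are closed on the left, so a boundary value such as 0.8
    # belongs to the higher level, as in Table 6.
    if p >= 0.8:
        return "High risk"
    if p >= 0.6:
        return "Medium risk"
    if p >= 0.4:
        return "Low risk"
    return "Normal"


for p in (0.95, 0.8, 0.65, 0.4, 0.1):
    print(p, warning_level(p))
```

Checking the thresholds from highest to lowest keeps each branch to a single comparison and makes the half-open intervals explicit.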

Fig.6 Feature importance ranking for medical journals

Fig.7 Feature importance ranking for engineering and technology journals

