Automatic Data Processing Strategy of Citation Anomie Based on Feature Fusion
Li Junlian1,2,3, Wu Yingjie3, Deng Panpan3, Leng Fuhai4
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3 Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China
4 Institute of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
[Objective] To normalize different expressions of the same cited document, enable standardized control and management of periodical citation data, and alleviate the data quality problems caused by citation anomie.
[Methods] Taking the construction of a periodical citation database as the target scenario, we analyzed the core characteristics of periodical citation data according to bibliographic reference standards. Effective feature subsets were obtained using a decision tree and an accuracy criterion, the execution priority of the resulting decision rules was specified, and an automatic data processing strategy based on multi-feature fusion was constructed.
[Results] A sample set of 10,000 periodical citation records and a validation set of 10,000 records were drawn from the Chinese Biomedical Citation Index (CBMCI) for the experiment. The proposed feature fusion approach achieved journal citation normalization accuracies of 99.72% and 98.70% on these two sets, respectively.
[Limitations] This study examined only Chinese periodical citation anomie data and does not yet cover citations in other languages or of other document types.
[Conclusions] The proposed method can automatically and efficiently standardize large-scale journal citation data, thereby reducing the burden of labor-intensive manual intervention. The idea of feature fusion can also be applied to automatic normalization strategies for other types of cited documents.
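The Methods step pairs feature subsets with decision rules that are executed in a fixed priority order. The Python sketch below illustrates that general idea only. It assumes a hypothetical six-field citation record (`Citation`), a simple normalization function (`norm`) and three hand-written rules (`R1_full_locator`, `R2_no_volume`, `R3_author_year_page`). These are placeholders, not the feature subsets or priorities the authors derived from their decision tree. A citation variant is mapped to a canonical record by the first rule whose features all agree.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """One periodical citation record (hypothetical field set, for illustration)."""
    first_author: str
    journal: str
    year: str
    volume: str
    issue: str
    first_page: str


def norm(text: str) -> str:
    """Case-fold and drop whitespace/punctuation so formatting variants compare equal."""
    return re.sub(r"[\W_]+", "", text).lower()


def agrees(a: Citation, b: Citation, field: str) -> bool:
    """A feature agrees only if both values are present and equal after normalization."""
    va, vb = norm(getattr(a, field)), norm(getattr(b, field))
    return va != "" and va == vb


# Hypothetical decision rules, listed from highest to lowest priority.
# Each rule fires when every feature in its subset agrees.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("R1_full_locator", ("journal", "year", "volume", "issue", "first_page")),
    ("R2_no_volume", ("first_author", "journal", "year", "issue", "first_page")),
    ("R3_author_year_page", ("first_author", "year", "first_page")),
]


def match_to_canonical(
    record: Citation, canonical: list[Citation]
) -> Optional[tuple[int, str]]:
    """Return (index of canonical record, rule name) for the first rule that fires.

    Rules are tried in priority order across all canonical records, so a
    high-priority match anywhere beats a low-priority match earlier in the list.
    """
    for rule_name, fields in RULES:
        for idx, canon in enumerate(canonical):
            if all(agrees(record, canon, f) for f in fields):
                return idx, rule_name
    return None  # no rule fired: leave the record for manual review


# Toy data: the first canonical record is this article's own citation.
CANONICAL = [
    Citation("Li Junlian", "Data Analysis and Knowledge Discovery", "2020", "4", "5", "38"),
    Citation("Jiang Lin", "Data Analysis and Knowledge Discovery", "2017", "1", "1", "47"),
]

# A variant with different casing/punctuation and a missing volume number.
variant = Citation("LI Junlian", "DATA ANALYSIS AND KNOWLEDGE DISCOVERY.", "2020", "", "5", "38")
unmatched = Citation("Wang Lingyun", "Journal of Library and Information Science", "2019", "4", "8", "64")

print(match_to_canonical(variant, CANONICAL))    # (0, 'R2_no_volume')
print(match_to_canonical(unmatched, CANONICAL))  # None
```

The outer loop runs over rules and the inner loop over canonical records. This ordering is what enforces rule priority: the variant above fails `R1_full_locator` because its volume is empty, so it is resolved by the next rule instead. A real system would replace the hand-written `RULES` list with rules and priorities learned from labeled citation pairs.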
Li Junlian,Wu Yingjie,Deng Panpan,Leng Fuhai. Automatic Data Processing Strategy of Citation Anomie Based on Feature Fusion. Data Analysis and Knowledge Discovery, 2020, 4(5): 38-45.