Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (5): 38-45    DOI: 10.11925/infotech.2096-3467.2020.0201
Automatic Data Processing Strategy of Citation Anomie Based on Feature Fusion
Li Junlian1,2,3, Wu Yingjie3, Deng Panpan3, Leng Fuhai4
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China
4Institute of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
[Objective] To normalize different expressions of the same cited document, achieve standard control and management of periodical citation data, and alleviate the data quality problems caused by citation anomie.
[Methods] Taking the construction of a periodical citation database as the target scenario, the core features of periodical citation data were analyzed according to the reference standards. Effective feature subsets were obtained based on a decision tree and accuracy, the execution priority of the decision rules was specified, and an automatic data processing strategy was constructed based on multi-feature fusion.
[Results] A sample set of 10,000 periodical citation records and a validation set of 10,000 records were selected from the Chinese Biomedical Citation Index (CBMCI) for the experiment. The proposed feature fusion approach achieved 99.72% and 98.70% accuracy in journal citation normalization on the two datasets, respectively.
[Limitations] This study examined only Chinese periodical citation anomie data and has not yet covered citations in other languages or of other types.
[Conclusions] The proposed method can automatically standardize large-scale journal citation data with high efficiency, thus reducing the burden of labor-intensive manual intervention. The idea of feature fusion can also be applied to automatic normalization strategies for other types of cited documents.

Key words: Citation Data; Citation Anomie; Standard Control; Feature Fusion
Received: 16 March 2020      Published: 15 June 2020
CLC number (ZTFLH): TP391

Li Junlian, Wu Yingjie, Deng Panpan, Leng Fuhai. Automatic Data Processing Strategy of Citation Anomie Based on Feature Fusion. Data Analysis and Knowledge Discovery, 2020, 4(5): 38-45.

Data Automatic Processing Strategy of Citation Anomie Based on Feature Fusion
Decision Tree of Effective Feature Subset
Decision rule  Effective feature subset            Features  Pr
Rule_1         ta,firstauthor,vi,dp,pg_start       5         0.94
Rule_2         ta,vi,ip,dp,pg_start                5         0.94
Rule_3         ta,firstauthor,vi,ip,dp             5         0.94
Rule_4         ta,firstauthor,ip,dp,pg_start       5         0.93
Rule_5         ta,firstauthor,vi,ip,pg_start       5         0.93
Rule_6         ti_format,ta,vi,ip,dp               5         0.91
Rule_7         ta,firstauthor,ip,dp                4         0.95
Rule_8         ta,ip,dp,pg_start                   4         0.94
Rule_9         ta,vi,ip,pg_start                   4         0.94
Rule_10        ta,firstauthor,dp,pg_start          4         0.94
Rule_11        ta,firstauthor,vi,pg_start          4         0.94
Rule_12        ta,firstauthor,vi,ip                4         0.94
Rule_13        ti_format,ta,firstauthor,dp         4         0.91
Rule_14        ti_format,firstauthor,dp,pg_start   4         0.90
Rule_15        ti_format,ta,dp,pg_start            4         0.90
Rule_16        firstauthor,dp,pg_start             3         0.96
Rule_17        firstauthor,vi,pg_start             3         0.95
Rule_18        ta,firstauthor,pg_start             3         0.95
Rule_19        ti_format,firstauthor,dp            3         0.92
Rule_20        ti_format,ta,dp                     3         0.92
Rule_21        ti_format,dp,pg_start               3         0.91
Rule_22        ti_format,ta,firstauthor            3         0.91
Decision Rules of Journal Citation Standardization
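
The way prioritized decision rules of this kind might be applied can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the order of the rules in the table is their execution priority, that a rule fires only when every feature in its subset is present and identical in both records, and that field names such as ta, vi, ip, dp and pg_start resemble MEDLINE-style tags (journal title abbreviation, volume, issue, publication date, start page); none of this is stated in the source. The function names and the example records are hypothetical.

```python
from typing import Optional, Tuple

# Three rules copied from the table, kept in table order.
# Treating table order as execution priority is an assumption of this sketch.
RULES = [
    ("Rule_1", ("ta", "firstauthor", "vi", "dp", "pg_start")),
    ("Rule_7", ("ta", "firstauthor", "ip", "dp")),
    ("Rule_16", ("firstauthor", "dp", "pg_start")),
]


def features_agree(citation: dict, record: dict, features: tuple) -> bool:
    """True only if every feature is non-empty in the citation and equal in both."""
    return all(
        bool(citation.get(f)) and citation.get(f) == record.get(f)
        for f in features
    )


def match_citation(citation: dict, records: list) -> Optional[Tuple[str, dict]]:
    """Try the rules in priority order; return (rule name, matched record) or None."""
    for name, features in RULES:
        for record in records:
            if features_agree(citation, record, features):
                return name, record
    return None


# Made-up records for illustration only.
records = [
    {"id": "A", "ta": "Example J", "firstauthor": "Zhang S", "vi": "12",
     "ip": "3", "dp": "2018", "pg_start": "101"},
]
# This citation lacks the volume and start page, so Rule_1 cannot fire.
citation = {"ta": "Example J", "firstauthor": "Zhang S", "ip": "3", "dp": "2018"}

result = match_citation(citation, records)
print(result[0] if result else "no match")
```

In this sketch a citation with missing fields simply falls through to a lower-priority rule whose feature subset it can still satisfy, which is one plausible reading of why several overlapping subsets are kept.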
Dataset             Size (records)  Accuracy (AC)
Sample dataset      10,000          99.72%
Validation dataset  10,000          98.70%
Results of Citation Standardization
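
To put the reported accuracies in absolute terms, the short sketch below converts them to record counts. It assumes that AC is the share of records normalized correctly (correct records divided by dataset size); the source does not define AC, so the counts are an interpretation and not reported figures. The function name is hypothetical.

```python
def correct_count(size: int, accuracy_percent: float) -> int:
    """Number of correctly normalized records implied by an accuracy percentage.

    Assumes accuracy = correct / size, which the source does not state explicitly.
    """
    return round(size * accuracy_percent / 100)


# Values taken from the results table.
for label, size, ac in [("sample", 10_000, 99.72), ("validation", 10_000, 98.70)]:
    ok = correct_count(size, ac)
    print(f"{label}: {ok} correct, {size - ok} incorrect")
```

Under that assumption, the gap between the two datasets amounts to roughly a hundred additional records per 10,000 that would still need manual review.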
