Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, Beijing 100038, China
[Objective] This paper proposes a new classification method based on a Convolutional Neural Network (CNN), aiming to improve indexing accuracy on skewed datasets. [Methods] Unlike conventional stacking fusion methods, we stacked the classification label probability distributions output by each base model and used them as CNN inputs. Our method does not require manually setting a weight for each base model. We evaluated the proposed model on the third-level categories of the Chinese Library Classification (CLC). [Results] The accuracy of our method reached up to 60%, which was 19% higher than that of the baseline models. [Limitations] Our method requires designing convolution kernels, which can only be determined experimentally. In addition, the complexity of classifier training at the fusion stage depends on the number of categories and base models. [Conclusions] The proposed method can effectively improve indexing accuracy on imbalanced datasets. Combined with a hierarchical classification strategy, it can automatically complete CLC classification and indexing tasks.
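The fusion step described in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the network architecture nor the kernel sizes, so the function names (`stack_probabilities`, `conv_over_models`), the three base-model outputs, and the kernel values below are all hypothetical. The sketch only shows the general idea of stacking each base model's label probability vector into a matrix and passing a convolution kernel over it, so that the per-model weights sit inside the kernel (and would be learned in training) rather than being set by hand.

```python
import math


def stack_probabilities(base_outputs):
    """Stack per-model probability vectors into an (n_models x n_classes) matrix."""
    n_classes = len(base_outputs[0])
    if any(len(p) != n_classes for p in base_outputs):
        raise ValueError("all base models must score the same label set")
    return [list(p) for p in base_outputs]


def conv_over_models(matrix, kernel, bias=0.0):
    """Slide an (n_models x width) kernel along the class axis (valid, stride 1)."""
    n_models, n_classes = len(matrix), len(matrix[0])
    width = len(kernel[0])
    out = []
    for start in range(n_classes - width + 1):
        total = bias
        for m in range(n_models):
            for k in range(width):
                total += kernel[m][k] * matrix[m][start + k]
        out.append(total)
    return out


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of three base classifiers for one document, four labels.
base_outputs = [
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.30, 0.55, 0.10],
    [0.10, 0.50, 0.30, 0.10],
]

# Placeholder kernel of width 1 spanning all base models. In a trained CNN
# these values would be learned; they are fixed here only for illustration.
kernel = [[0.5], [0.2], [0.3]]

stacked = stack_probabilities(base_outputs)
fused = softmax(conv_over_models(stacked, kernel))
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted, [round(p, 3) for p in fused])
```

With a kernel of width 1 the convolution reduces to a weighted vote per label. A wider kernel would also mix the probabilities of neighbouring labels, which is one reason kernel design has to be settled experimentally, as noted under [Limitations].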