Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, Beijing 100038, China
[Objective] This paper proposes a new classification method based on a Convolutional Neural Network (CNN), aiming to improve indexing accuracy on skewed datasets. [Methods] In contrast to existing stacking fusion methods, we stack the probability distributions over classification labels produced by each base model and use them as the CNN input, so the method does not require manually setting a weight for each base model. We evaluated the proposed model on the third-level categories of the Chinese Library Classification (CLC). [Results] Our method reached an accuracy of up to 60%, which was 19% higher than the baseline models. [Limitations] The convolution kernels must be designed, and their configuration can only be determined through experiments. In addition, the complexity of training the classifier at the fusion stage depends on the number of categories and the number of base models. [Conclusions] The proposed method can effectively improve indexing accuracy on imbalanced datasets. Combined with a hierarchical classification strategy, it can automatically complete CLC classification and indexing tasks.
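The fusion input described under [Methods] can be illustrated with a minimal pure-Python sketch, assuming each base model outputs one probability vector over the same C labels. The vectors from M base models are stacked into an M × C matrix, and a single convolution kernel spanning all M rows turns each label's column into one fused score, so the per-model weighting lives in kernel weights rather than being set by hand. The function names, the M × 1 kernel shape, and the kernel values are hypothetical; in a real CNN the weights would be learned and further layers would follow, and the authors' actual kernel design is not specified here.

```python
from typing import List

Matrix = List[List[float]]


def stack_probabilities(base_outputs: List[List[float]]) -> Matrix:
    """Stack each base model's label-probability vector into an M x C matrix
    (rows = base models, columns = class labels)."""
    n_classes = len(base_outputs[0])
    for probs in base_outputs:
        if len(probs) != n_classes:
            raise ValueError("all base models must score the same label set")
    return [list(p) for p in base_outputs]


def conv2d_valid(x: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Single-channel 2-D cross-correlation with 'valid' padding,
    which is what CNN libraries call a convolution layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = bias
            for a in range(kh):
                for b in range(kw):
                    s += x[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out


def argmax(scores: List[float]) -> int:
    """Index of the highest fused score, i.e. the predicted label."""
    return max(range(len(scores)), key=lambda k: scores[k])


if __name__ == "__main__":
    # Hypothetical outputs of 3 base models over 4 labels.
    x = stack_probabilities([
        [0.70, 0.10, 0.10, 0.10],
        [0.20, 0.50, 0.20, 0.10],
        [0.60, 0.20, 0.10, 0.10],
    ])
    # Hypothetical 3 x 1 kernel: stands in for weights a CNN would learn.
    kernel = [[0.5], [0.2], [0.3]]
    fused = conv2d_valid(x, kernel)  # shape 1 x 4: one score per label
    print(fused[0])
    print(argmax(fused[0]))  # predicted label index: 0
```

In this sketch the kernel height equals the number of base models and the output width equals the number of labels, which mirrors the limitation noted above: the fusion stage grows with both counts.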