[Objective] This study constructs a hierarchical multi-label classification model for children's literature, aiming to classify children's books automatically and to guide young readers toward books suited to their developmental needs. [Methods] We operationalized the concept of graded reading as a hierarchical classification label system for children's literature. We then built the ERNIE-HAM model with deep learning techniques and applied it to hierarchical multi-label text classification. [Results] Compared with four pre-trained models, ERNIE-HAM performed well at the second and third levels of the hierarchy for children's books. Compared with the single-level algorithm, the hierarchical algorithm improved second- and third-level results by about 11%. Compared with two hierarchical multi-label classification models, HFT-CNN and HMCN, ERNIE-HAM improved third-level classification results by 12.79% and 6.48%, respectively. [Limitations] The model's overall classification performance can be further improved; future work should expand the dataset and refine the algorithm design. [Conclusions] The ERNIE-HAM model is effective for hierarchical multi-label classification of children's literature.
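To make the [Methods] description concrete, the sketch below shows one common way to wire a hierarchical multi-label head on top of a pre-trained encoder such as ERNIE, where each level's classifier also sees the previous level's predictions so that coarse categories can guide finer ones. This is an illustrative reconstruction under stated assumptions, not the authors' ERNIE-HAM implementation: the class name, layer sizes, and label counts are hypothetical.

```python
# Minimal PyTorch sketch of a hierarchical multi-label classification head.
# NOT the authors' ERNIE-HAM code: the conditioning scheme (level t sees the
# sigmoid outputs of level t-1) and all sizes are assumptions for exposition.
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    def __init__(self, hidden_size: int, level_sizes: list[int]):
        super().__init__()
        self.classifiers = nn.ModuleList()
        prev = 0
        for n_labels in level_sizes:
            # Each level's classifier takes the encoder output concatenated
            # with the previous level's label probabilities.
            self.classifiers.append(nn.Linear(hidden_size + prev, n_labels))
            prev = n_labels

    def forward(self, pooled: torch.Tensor) -> list[torch.Tensor]:
        logits_per_level = []
        prev_probs = pooled.new_zeros(pooled.size(0), 0)  # empty at level 1
        for clf in self.classifiers:
            logits = clf(torch.cat([pooled, prev_probs], dim=-1))
            logits_per_level.append(logits)
            # Multi-label setting: independent sigmoid per candidate label.
            prev_probs = torch.sigmoid(logits)
        return logits_per_level

# Usage: pooled sentence vectors from a pre-trained encoder (e.g. ERNIE),
# three levels with hypothetical label counts 5, 20, and 60.
head = HierarchicalHead(hidden_size=768, level_sizes=[5, 20, 60])
pooled = torch.randn(2, 768)   # batch of 2 encoded book descriptions
outputs = head(pooled)         # [2x5, 2x20, 2x60] logit tensors
```

Passing each level's predictions into the next is one intuition for why hierarchical algorithms can outperform single-level ones at the deeper levels, as the [Results] comparison suggests.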
The Working Group of National Reading Survey of Chinese Academy of Press & Publications. The Main Findings of the 18th National Reading Survey[J]. Publishing Research, 2021(4): 19-24.
Ma Xiaocui, Bu Lu. Research on the Theory and Practice of Children's Book Graded Reading[J]. Library Science Research & Work, 2020(9): 50-53, 63.
[4] McGeown S P, Osborne C, Warhurst A, et al. Understanding Children's Reading Activities: Reading Motivation, Skill and Child Characteristics as Predictors[J]. Journal of Research in Reading, 2016, 39(1): 109-125. DOI: 10.1111/jrir.v39.1.
Zhang Xiaoqin, Li Xiaoying, Wang Hao. A Survey of Children's Reading Status and Graded Reading Needs in Nanjing[J]. Library Theory and Practice, 2019(8): 74-78.
Wang Hao, Yan Ming, Su Xinning. Research on Automatic Classification for Chinese Bibliography Based on Machine Learning[J]. Journal of Library Science in China, 2010, 36(6): 28-39.
Pan Hui. Automated Book Information Classification Technology Based on Extreme Learning Machine[J]. Modern Electronics Technique, 2019, 42(17): 183-186.
Deng Sanhong, Fu Yuyangzi, Wang Hao. Multi-Label Classification of Chinese Books with LSTM Model[J]. Data Analysis and Knowledge Discovery, 2017, 1(7): 52-60.
Jiang Yanting, Hu Renfen. Representation Learning and Multi-Label Classification of Books Based on BERT[J]. New Century Library, 2020(9): 38-44.
[13] Huang W, Chen E H, Liu Q, et al. Hierarchical Multi-Label Text Classification: An Attention-Based Recurrent Network Approach[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019: 1051-1060.
[14] Gong J B, Teng Z Y, Teng Q, et al. Hierarchical Graph Transformer-Based Deep Learning Model for Large-Scale Multi-Label Text Classification[J]. IEEE Access, 2020, 8: 30885-30896.
[15] Sinha K, Dong Y, Cheung J C K, et al. A Hierarchical Neural Attention-Based Text Classifier[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 817-823.
[16] Banerjee S, Akkaya C, Perez-Sorrosal F, et al. Hierarchical Transfer Learning for Multi-Label Text Classification[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6295-6300.
[17] Peng H, Li J X, Wang S Z, et al. Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(6): 2505-2519. DOI: 10.1109/TKDE.2019.2959991.
[18] Mao Y N, Tian J J, Han J W, et al. Hierarchical Text Classification with Reinforced Label Assignment[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 445-455.
[19] Wu J W, Xiong W H, Wang W Y. Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 4353-4363.
[20] Zhou J, Ma C P, Long D K, et al. Hierarchy-Aware Global Model for Hierarchical Text Classification[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 1106-1117.
[21] Piaget J. Principles of Genetic Epistemology[M]. Translated by Wang Xiantian. Beijing: The Commercial Press, 1981: 132-134.
[22] Ministry of Education of the People's Republic of China. Early Learning and Development Guideline[EB/OL]. [2012-10-09]. http://www.moe.gov.cn/jyb_xwfb/xw_zt/moe_357/jyzt_2015nztzl/xueqianjiaoyu/yaowen/202104/W020210820338905908083.pdf.
[23] Sun Y, Wang S H, Li Y K, et al. ERNIE: Enhanced Representation Through Knowledge Integration[OL]. arXiv Preprint, arXiv: 1904.09223.
[24] Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[25] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
[26] Paszke A, Gross S, Chintala S, et al. Automatic Differentiation in PyTorch[C]// Proceedings of the 31st Conference on Neural Information Processing Systems. 2017.
[27] He K M, Zhang X Y, Ren S Q, et al. Deep Residual Learning for Image Recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[28] Loshchilov I, Hutter F. Decoupled Weight Decay Regularization[OL]. arXiv Preprint, arXiv: 1711.05101.
[29] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[30] Lan Z Z, Chen M D, Goodman S, et al. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations[OL]. arXiv Preprint, arXiv: 1909.11942.
[31] Liu Y H, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[OL]. arXiv Preprint, arXiv: 1907.11692.
[32] Shimura K, Li J Y, Fukumoto F. HFT-CNN: Learning Hierarchical Category Structure for Multi-Label Short Text Categorization[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 811-816.
[33] Wehrmann J, Cerri R, Barros R C. Hierarchical Multi-Label Classification Networks[C]// Proceedings of the 35th International Conference on Machine Learning. 2018: 5075-5084.