Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (7): 156-169    DOI: 10.11925/infotech.2096-3467.2022.0649
Hierarchical Multi-label Classification of Children's Literature for Graded Reading
Cheng Quan, Dong Jia
School of Economics and Management, Fuzhou University, Fuzhou 350116, China
Abstract  

[Objective] This study constructs a hierarchical multi-label classification model for children's literature, aiming to classify children's books automatically and to guide young readers toward books suited to their developmental needs. [Methods] We operationalized the concept of graded reading as a hierarchical classification label system for children's literature, then built the ERNIE-HAM model with deep learning techniques and applied it to hierarchical multi-label text classification. [Results] Compared with four pre-training models, ERNIE-HAM performed well at the second and third levels of the hierarchy for children's books. Compared with the single-level algorithm, the hierarchical algorithm improved the $AU(\overline{PRC})$ values at the second and third levels by about 11%. Compared with two hierarchical multi-label classification models, HFT-CNN and HMCN, ERNIE-HAM improved third-level classification results by 12.79% and 6.48%, respectively. [Limitations] The overall classification performance can be further improved; future work should expand the dataset and refine the algorithm design. [Conclusions] The ERNIE-HAM model is effective for hierarchical multi-label classification of children's literature.
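The abstract names the model, but this excerpt does not spell out its internals. As a rough illustration of the general pattern, a hierarchical multi-label head over a pooled pretrained-encoder output can be sketched as follows. This is a minimal sketch, not the authors' ERNIE-HAM: the three-level label counts and the feeding of parent-level predictions into the next level are assumptions (only the 256-neuron fully connected layer and 0.3 dropout rate appear in the parameter table below).

```python
# Minimal sketch of a hierarchical multi-label head (not the authors' HAM).
# Assumed: three levels with hypothetical label counts (4, 12, 30); each
# level's predictions are concatenated to the encoder vector for the next.
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    def __init__(self, hidden_size=768, level_sizes=(4, 12, 30)):
        super().__init__()
        self.levels = nn.ModuleList()
        in_dim = hidden_size
        for n_labels in level_sizes:
            self.levels.append(nn.Sequential(
                nn.Linear(in_dim, 256), nn.ReLU(), nn.Dropout(0.3),
                nn.Linear(256, n_labels)))
            in_dim = hidden_size + n_labels  # next level also sees these predictions

    def forward(self, pooled):
        # pooled: (batch, hidden_size) sentence vector from an ERNIE/BERT-style encoder
        logits, feat = [], pooled
        for head in self.levels:
            out = head(feat)
            logits.append(out)
            feat = torch.cat([pooled, torch.sigmoid(out)], dim=-1)
        return logits  # one (batch, n_labels) logit tensor per hierarchy level
```

Each level would be trained against its multi-hot labels with nn.BCEWithLogitsLoss, so a book can carry several labels per level.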

Key words: Graded Reading; Classification of Children's Books; Hierarchical Multi-label Text Classification; Classification System
Received: 23 June 2022      Published: 07 September 2023
CLC Number (ZTFLH): G254
Fund: National Social Science Fund of China (19BTQ072)
Corresponding Author: Cheng Quan, ORCID: 0000-0002-7302-4527, E-mail: chengquan@fzu.edu.cn

Cite this article:

Cheng Quan, Dong Jia. Hierarchical Multi-label Classification of Children's Literature for Graded Reading. Data Analysis and Knowledge Discovery, 2023, 7(7): 156-169.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0649     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I7/156

[Figure] Text Data Annotation Process of a Single Children's Book
[Figure] Hierarchical Classification Label System and Data Distribution of Children's Books
[Figure] The Model Structure of ERNIE-HAM
[Figure] Schematic Diagram of ERNIE Model Input Layer
[Figure] Internal Structure of HAM
[Figure] Text Length Distribution in Corpus
Parameter | Setting
Text length | 512
Number of ERNIE hidden-layer units | 728
Number of fully connected layer neurons | 256
Optimizer | AdamW
L2 regularization coefficient | 0.001
Learning rate | 2×10⁻⁵
Learning-rate warm-up steps | 500
Dropout rate | 0.3
[Table] Experimental Parameter Setting
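These settings map directly onto a standard PyTorch/Transformers training setup. The sketch below wires up AdamW with the reported learning rate, L2 coefficient, and 500 warm-up steps; the linear decay after warm-up and the total step count are assumptions, since the table fixes only the warm-up length.

```python
# Optimizer and schedule matching the reported hyperparameters.
# Assumption: linear decay after warm-up (the table specifies only 500 warm-up steps).
import torch
from transformers import get_linear_schedule_with_warmup

def build_optimizer(model, total_steps):
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=2e-5, weight_decay=0.001)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=500, num_training_steps=total_steps)
    return optimizer, scheduler
```

Note that AdamW implements decoupled weight decay [28], which the table labels as an L2 regularization coefficient.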
[Figure] Training Process of ERNIE-HAM
Comparison item | ERNIE | BERT | ALBERT | RoBERTa
Checkpoint name | ERNIE 1.0 | BERT-Base-Chinese | ALBERT-tiny | RoBERTa-wwm
Training corpus | Heterogeneous Chinese corpora | Chinese Wikipedia | Chinese Wikipedia; CNA Chinese news | Chinese Wikipedia
Vocabulary size | 17,965 | 21,128 | 21,128 | 21,128
Hidden-layer units | 768 | 768 | 312 | 768
MLM task variant | MLM with prior knowledge | MLM | MLM | Dynamic Chinese whole-word masking
NSP-style task | Sentence coherence prediction | | |
[Table] Comparison of Four Pre-training Models
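For reproduction, the four checkpoints could be loaded through the Hugging Face hub roughly as follows; the repository IDs are illustrative assumptions, since this excerpt does not state the authors' checkpoint sources.

```python
# Illustrative checkpoint IDs only; the paper does not name its download sources.
from transformers import AutoModel, BertTokenizer

CHECKPOINTS = {
    "ERNIE":   "nghuyong/ernie-1.0-base-zh",
    "BERT":    "bert-base-chinese",
    "ALBERT":  "voidful/albert_chinese_tiny",
    "RoBERTa": "hfl/chinese-roberta-wwm-ext",
}

def load(name):
    # All four Chinese checkpoints ship BERT-style vocab files, so a
    # BertTokenizer works even for the ALBERT and RoBERTa weights.
    model_id = CHECKPOINTS[name]
    return BertTokenizer.from_pretrained(model_id), AutoModel.from_pretrained(model_id)
```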
Model | Level 1 | Level 2 | Level 3
ERNIE | 78.14 | 66.37 | 52.59
BERT | 78.63 | 63.74 | 51.12
ALBERT | 66.81 | 57.92 | 48.66
RoBERTa | 78.44 | 62.43 | 51.28
[Table] Classification Performance of Different Pre-training Models ($AU(\overline{PRC})$, %)
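$AU(\overline{PRC})$ denotes the area under the average precision-recall curve, the evaluation metric popularized by HMCN [33]. One common way to compute it is to pool every (example, label) decision at a hierarchy level into a single micro-averaged PR curve and integrate it; whether the paper pools decisions exactly this way is an assumption. A scikit-learn sketch:

```python
# Area under the micro-averaged precision-recall curve for one hierarchy level.
# Assumption: AU(PRC-bar) pools all (example, label) pairs, as in the HMCN
# evaluation protocol [33].
from sklearn.metrics import precision_recall_curve, auc

def au_prc_bar(y_true, y_score):
    # y_true:  (n_samples, n_labels) binary indicator matrix
    # y_score: (n_samples, n_labels) predicted probabilities
    precision, recall, _ = precision_recall_curve(y_true.ravel(), y_score.ravel())
    return auc(recall, precision)
```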
[Figure] Comparison Results of Hierarchical and Single-level Algorithms
Model | Level 1 | Level 2 | Level 3
HFT-CNN | 69.32 | 56.28 | 39.80
HMCN | 74.42 | 60.24 | 46.11
ERNIE-HAM | 78.14 | 66.37 | 52.59
[Table] Classification Performance of Multi-label Models at Different Levels ($AU(\overline{PRC})$, %)
[1] 中国新闻出版研究院全国国民阅读调查课题组. 第十八次全国国民阅读调查主要发现[J]. 出版发行研究, 2021(4): 19-24.
[1] (The Working Group of National Reading Survey of Chinese Academy of Press & Publications. The Main Findings of the 18th National Reading Survey[J]. Publishing Research, 2021(4): 19-24.)
[2] 周力虹, 刘芳. 图书馆未成年人数字分级阅读服务研究[J]. 图书馆建设, 2014(12): 59-62.
[2] (Zhou Lihong, Liu Fang. Research on the Digital Grade Reading Service Oriented to Minors in the Library[J]. Library Development, 2014(12): 59-62.)
[3] 马小翠, 卜璐. 少儿图书分级阅读的理论与实践研究[J]. 图书馆研究与工作, 2020(9): 50-53, 63.
[3] (Ma Xiaocui, Bu Lu. Research on the Theory and Practice of Children's Book Graded Reading[J]. Library Science Research & Work, 2020(9): 50-53, 63.)
[4] McGeown S P, Osborne C, Warhurst A, et al. Understanding Children's Reading Activities: Reading Motivation, Skill and Child Characteristics as Predictors[J]. Journal of Research in Reading, 2016, 39(1): 109-125. DOI: 10.1111/jrir.v39.1.
[5] 黄宁. 浅析图书分级对儿童阅读的影响[J]. 图书馆工作与研究, 2015(3): 102-104.
[5] (Huang Ning. The Influence of Book Classification to the Children's Reading[J]. Library Work and Study, 2015(3): 102-104.)
[6] 张小琴, 李孝滢, 王昊. 南京地区儿童阅读现状及分级阅读需求调查研究[J]. 图书馆理论与实践, 2019(8): 74-78.
[6] (Zhang Xiaoqin, Li Xiaoying, Wang Hao. A Survey of Children's Reading Status and Graded Reading Needs in Nanjing[J]. Library Theory and Practice, 2019(8): 74-78.)
[7] 王昊, 严明, 苏新宁. 基于机器学习的中文书目自动分类研究[J]. 中国图书馆学报, 2010, 36(6): 28-39.
[7] (Wang Hao, Yan Ming, Su Xinning. Research on Automatic Classification for Chinese Bibliography Based on Machine Learning[J]. Journal of Library Science in China, 2010, 36(6): 28-39.)
[8] 邹鼎杰. 基于知识图谱和贝叶斯分类器的图书分类[J]. 计算机工程与设计, 2020, 41(6): 1796-1801.
[8] (Zou Dingjie. Book Classification Based on Knowledge Graph and Bayesian Classifier[J]. Computer Engineering and Design, 2020, 41(6): 1796-1801.)
[9] 潘辉. 基于极限学习机的自动化图书信息分类技术[J]. 现代电子技术, 2019, 42(17): 183-186.
[9] (Pan Hui. Automated Book Information Classification Technology Based on Extreme Learning Machine[J]. Modern Electronics Technique, 2019, 42(17): 183-186.)
[10] 潘峻. 基于双向LSTM的图书分类系统的设计与实现[J]. 信息技术, 2020, 44(1): 67-70, 74.
[10] (Pan Jun. Development of Book Classification System Based on Bi-LSTM[J]. Information Technology, 2020, 44(1): 67-70, 74.)
[11] 邓三鸿, 傅余洋子, 王昊. 基于LSTM模型的中文图书多标签分类研究[J]. 数据分析与知识发现, 2017, 1(7): 52-60.
[11] (Deng Sanhong, Fu Yuyangzi, Wang Hao. Multi-Label Classification of Chinese Books with LSTM Model[J]. Data Analysis and Knowledge Discovery, 2017, 1(7): 52-60.)
[12] 蒋彦廷, 胡韧奋. 基于BERT模型的图书表示学习与多标签分类研究[J]. 新世纪图书馆, 2020(9): 38-44.
[12] (Jiang Yanting, Hu Renfen. Representation Learning and Multi-Label Classification of Books Based on BERT[J]. New Century Library, 2020(9): 38-44.)
[13] Huang W, Chen E H, Liu Q, et al. Hierarchical Multi-Label Text Classification: An Attention-Based Recurrent Network Approach[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019: 1051-1060.
[14] Gong J B, Teng Z Y, Teng Q, et al. Hierarchical Graph Transformer-Based Deep Learning Model for Large-Scale Multi-Label Text Classification[J]. IEEE Access, 2020(8): 30885-30896.
[15] Sinha K, Dong Y, Cheung J C K, et al. A Hierarchical Neural Attention-Based Text Classifier[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 817-823.
[16] Banerjee S, Akkaya C, Perez-Sorrosal F, et al. Hierarchical Transfer Learning for Multi-Label Text Classification[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6295-6300.
[17] Peng H, Li J X, Wang S Z, et al. Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(6): 2505-2519. DOI: 10.1109/TKDE.2019.2959991.
[18] Mao Y N, Tian J J, Han J W, et al. Hierarchical Text Classification with Reinforced Label Assignment[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 445-455.
[19] Wu J W, Xiong W H, Wang W Y. Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 4353-4363.
[20] Zhou J, Ma C P, Long D K, et al. Hierarchy-Aware Global Model for Hierarchical Text Classification[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 1106-1117.
[21] 皮亚杰. 发生认识论原理[M]. 王宪钿,译. 北京: 商务印书馆, 1981:132-134.
[21] (Piaget J. Principles of Genetic Epistemology[M]. Translated by Wang Xiantian. Beijing: The Commercial Press, 1981: 132-134.)
[22] 中华人民共和国教育部.3-6 岁儿童学习与发展指南[EB/OL]. [2012-10-09]. http://www.moe.gov.cn/jyb_xwfb/xw_zt/moe_357/jyzt_2015nztzl/xueqianjiaoyu/yaowen/202104/W020210820338905908083.pdf.
[22] (Ministry of Education of the People's Republic of China. Early Learning and Development Guideline[EB/OL]. [2012-10-09]. http://www.moe.gov.cn/jyb_xwfb/xw_zt/moe_357/jyzt_2015nztzl/xueqianjiaoyu/yaowen/202104/W020210820338905908083.pdf.)
[23] Sun Y, Wang S H, Li Y K, et al. ERNIE: Enhanced Representation Through Knowledge Integration[OL]. arXiv Preprint, arXiv: 1904.09223.
[24] Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[25] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
[26] Paszke A, Gross S, Chintala S, et al. Automatic Differentiation in PyTorch[C]// Proceedings of the 31st Conference on Neural Information Processing Systems. 2017.
[27] He K M, Zhang X Y, Ren S Q, et al. Deep Residual Learning for Image Recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[28] Loshchilov I, Hutter F. Decoupled Weight Decay Regularization[OL]. arXiv Preprint, arXiv: 1711.05101.
[29] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[30] Lan Z Z, Chen M D, Goodman S, et al. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations[OL]. arXiv Preprint, arXiv: 1909.11942.
[31] Liu Y H, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[OL]. arXiv Preprint, arXiv: 1907.11692.
[32] Shimura K, Li J Y, Fukumoto F. HFT-CNN: Learning Hierarchical Category Structure for Multi-Label Short Text Categorization[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 811-816.
[33] Wehrmann J, Cerri R, Barros R C. Hierarchical Multi-Label Classification Networks[C]// Proceedings of the 35th International Conference on Machine Learning. 2018:5075-5084.
Related Articles
[1] Hu Zhengyin, Fang Shu, Wen Yi, Zhang Xian, Liang Tian. Study on Automatic Classification of Patents Oriented to TRIZ[J]. New Technology of Library and Information Service, 2015, 31(1): 66-74.
[2] Xiao Xin, Yuan Zhongzhi, Liao Jufang. The Investigations of Classification System of Chemical Resource on Internet[J]. New Technology of Library and Information Service, 2002, 18(2): 69-71.
[3] Mao Jun. Use of Classification System in the OPAC[J]. New Technology of Library and Information Service, 2001, 17(4): 14-16.