Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (7): 156-169    DOI: 10.11925/infotech.2096-3467.2022.0649
Hierarchical Multi-label Classification of Children's Literature for Graded Reading
Cheng Quan, Dong Jia
School of Economics and Management, Fuzhou University, Fuzhou 350116, China
Abstract  

[Objective] This study constructs a hierarchical multi-label classification model for children's literature, aiming to classify children's books automatically and to guide young readers toward books suited to their developmental needs. [Methods] We operationalized the concept of graded reading as a hierarchical classification label system for children's literature, then built the ERNIE-HAM model with deep learning techniques and applied it to hierarchical multi-label text classification. [Results] Compared with four pre-training models, ERNIE-HAM performed well at the second and third levels of the hierarchy for children's books. Compared with the single-level algorithm, the hierarchical algorithm improved the $AU(\overline{PRC})$ values at the second and third levels by about 11%. Compared with two hierarchical multi-label classification models, HFT-CNN and HMCN, ERNIE-HAM improved third-level classification results by 12.79% and 6.48%, respectively. [Limitations] The overall classification performance can be further improved; future work should expand the dataset and refine the algorithm design. [Conclusions] The ERNIE-HAM model is effective for hierarchical multi-label classification of children's literature.
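The abstract names the model, but this excerpt does not spell out its internals. As a rough illustration of the general pattern, a hierarchical multi-label head over a pooled pretrained-encoder output can be sketched as follows. This is a minimal sketch, not the authors' ERNIE-HAM: the three-level label counts and the feeding of parent-level predictions into the next level are assumptions (only the 256-neuron fully connected layer and 0.3 dropout rate appear in the parameter table below).

```python
# Minimal sketch of a hierarchical multi-label head (not the authors' HAM).
# Assumed: three levels with hypothetical label counts (4, 12, 30); each
# level's predictions are concatenated to the encoder vector for the next.
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    def __init__(self, hidden_size=768, level_sizes=(4, 12, 30)):
        super().__init__()
        self.levels = nn.ModuleList()
        in_dim = hidden_size
        for n_labels in level_sizes:
            self.levels.append(nn.Sequential(
                nn.Linear(in_dim, 256), nn.ReLU(), nn.Dropout(0.3),
                nn.Linear(256, n_labels)))
            in_dim = hidden_size + n_labels  # next level also sees these predictions

    def forward(self, pooled):
        # pooled: (batch, hidden_size) sentence vector from an ERNIE/BERT-style encoder
        logits, feat = [], pooled
        for head in self.levels:
            out = head(feat)
            logits.append(out)
            feat = torch.cat([pooled, torch.sigmoid(out)], dim=-1)
        return logits  # one (batch, n_labels) logit tensor per hierarchy level
```

Each level would be trained against its multi-hot labels with nn.BCEWithLogitsLoss, so a book can carry several labels per level.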

Key words: Graded Reading; Classification of Children's Books; Hierarchical Multi-label Text Classification; Classification System
Received: 23 June 2022      Published: 07 September 2023
CLC Number (ZTFLH): G254
Fund: National Social Science Fund of China (19BTQ072)
Corresponding Author: Cheng Quan, ORCID: 0000-0002-7302-4527, E-mail: chengquan@fzu.edu.cn

Cite this article:

Cheng Quan, Dong Jia. Hierarchical Multi-label Classification of Children's Literature for Graded Reading. Data Analysis and Knowledge Discovery, 2023, 7(7): 156-169.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0649     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I7/156

[Figure] Text Data Annotation Process of a Single Children's Book
[Figure] Hierarchical Classification Label System and Data Distribution of Children's Books
[Figure] The Model Structure of ERNIE-HAM
[Figure] Schematic Diagram of ERNIE Model Input Layer
[Figure] Internal Structure of HAM
[Figure] Text Length Distribution in Corpus
Parameter | Setting
Text length | 512
Number of ERNIE hidden-layer units | 728
Number of fully connected layer neurons | 256
Optimizer | AdamW
L2 regularization coefficient | 0.001
Learning rate | 2×10⁻⁵
Learning-rate warm-up steps | 500
Dropout rate | 0.3
[Table] Experimental Parameter Setting
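These settings map directly onto a standard PyTorch/Transformers training setup. The sketch below wires up AdamW with the reported learning rate, L2 coefficient, and 500 warm-up steps; the linear decay after warm-up and the total step count are assumptions, since the table fixes only the warm-up length.

```python
# Optimizer and schedule matching the reported hyperparameters.
# Assumption: linear decay after warm-up (the table specifies only 500 warm-up steps).
import torch
from transformers import get_linear_schedule_with_warmup

def build_optimizer(model, total_steps):
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=2e-5, weight_decay=0.001)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=500, num_training_steps=total_steps)
    return optimizer, scheduler
```

Note that AdamW implements decoupled weight decay [28], which the table labels as an L2 regularization coefficient.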
[Figure] Training Process of ERNIE-HAM
Comparison item | ERNIE | BERT | ALBERT | RoBERTa
Checkpoint name | ERNIE 1.0 | BERT-Base-Chinese | ALBERT-tiny | RoBERTa-wwm
Training corpus | Heterogeneous Chinese corpora | Chinese Wikipedia | Chinese Wikipedia; CNA Chinese news | Chinese Wikipedia
Vocabulary size | 17,965 | 21,128 | 21,128 | 21,128
Hidden-layer units | 768 | 768 | 312 | 768
MLM task variant | MLM with prior knowledge | MLM | MLM | Dynamic Chinese whole-word masking
NSP-style task | Sentence coherence prediction | | |
[Table] Comparison of Four Pre-training Models
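For reproduction, the four checkpoints could be loaded through the Hugging Face hub roughly as follows; the repository IDs are illustrative assumptions, since this excerpt does not state the authors' checkpoint sources.

```python
# Illustrative checkpoint IDs only; the paper does not name its download sources.
from transformers import AutoModel, BertTokenizer

CHECKPOINTS = {
    "ERNIE":   "nghuyong/ernie-1.0-base-zh",
    "BERT":    "bert-base-chinese",
    "ALBERT":  "voidful/albert_chinese_tiny",
    "RoBERTa": "hfl/chinese-roberta-wwm-ext",
}

def load(name):
    # All four Chinese checkpoints ship BERT-style vocab files, so a
    # BertTokenizer works even for the ALBERT and RoBERTa weights.
    model_id = CHECKPOINTS[name]
    return BertTokenizer.from_pretrained(model_id), AutoModel.from_pretrained(model_id)
```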
Model | Level 1 | Level 2 | Level 3
ERNIE | 78.14 | 66.37 | 52.59
BERT | 78.63 | 63.74 | 51.12
ALBERT | 66.81 | 57.92 | 48.66
RoBERTa | 78.44 | 62.43 | 51.28
[Table] Classification Performance of Different Pre-training Models ($AU(\overline{PRC})$, %)
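$AU(\overline{PRC})$ denotes the area under the average precision-recall curve, the evaluation metric popularized by HMCN [33]. One common way to compute it is to pool every (example, label) decision at a hierarchy level into a single micro-averaged PR curve and integrate it; whether the paper pools decisions exactly this way is an assumption. A scikit-learn sketch:

```python
# Area under the micro-averaged precision-recall curve for one hierarchy level.
# Assumption: AU(PRC-bar) pools all (example, label) pairs, as in the HMCN
# evaluation protocol [33].
from sklearn.metrics import precision_recall_curve, auc

def au_prc_bar(y_true, y_score):
    # y_true:  (n_samples, n_labels) binary indicator matrix
    # y_score: (n_samples, n_labels) predicted probabilities
    precision, recall, _ = precision_recall_curve(y_true.ravel(), y_score.ravel())
    return auc(recall, precision)
```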
[Figure] Comparison Results of Hierarchical and Single-level Algorithms
Model | Level 1 | Level 2 | Level 3
HFT-CNN | 69.32 | 56.28 | 39.80
HMCN | 74.42 | 60.24 | 46.11
ERNIE-HAM | 78.14 | 66.37 | 52.59
[Table] Classification Performance of Multi-label Models at Different Levels ($AU(\overline{PRC})$, %)
[1] 中国新闻出版研究院全国国民阅读调查课题组. 第十八次全国国民阅读调查主要发现[J]. 出版发行研究, 2021(4): 19-24.
[1] (The Working Group of National Reading Survey of Chinese Academy of Press & Publications. The Main Findings of the 18th National Reading Survey[J]. Publishing Research, 2021(4): 19-24.)
[2] 周力虹, 刘芳. 图书馆未成年人数字分级阅读服务研究[J]. 图书馆建设, 2014(12): 59-62.
[2] (Zhou Lihong, Liu Fang. Research on the Digital Grade Reading Service Oriented to Minors in the Library[J]. Library Development, 2014(12): 59-62.)
[3] 马小翠, 卜璐. 少儿图书分级阅读的理论与实践研究[J]. 图书馆研究与工作, 2020(9): 50-53, 63.
[3] (Ma Xiaocui, Bu Lu. Research on the Theory and Practice of Children's Book Graded Reading[J]. Library Science Research & Work, 2020(9): 50-53, 63.)
[4] McGeown S P, Osborne C, Warhurst A, et al. Understanding Children's Reading Activities: Reading Motivation, Skill and Child Characteristics as Predictors[J]. Journal of Research in Reading, 2016, 39(1): 109-125. DOI: 10.1111/jrir.v39.1.
[5] 黄宁. 浅析图书分级对儿童阅读的影响[J]. 图书馆工作与研究, 2015(3): 102-104.
[5] (Huang Ning. The Influence of Book Classification to the Children's Reading[J]. Library Work and Study, 2015(3): 102-104.)
[6] 张小琴, 李孝滢, 王昊. 南京地区儿童阅读现状及分级阅读需求调查研究[J]. 图书馆理论与实践, 2019(8): 74-78.
[6] (Zhang Xiaoqin, Li Xiaoying, Wang Hao. A Survey of Children's Reading Status and Graded Reading Needs in Nanjing[J]. Library Theory and Practice, 2019(8): 74-78.)
[7] 王昊, 严明, 苏新宁. 基于机器学习的中文书目自动分类研究[J]. 中国图书馆学报, 2010, 36(6): 28-39.
[7] (Wang Hao, Yan Ming, Su Xinning. Research on Automatic Classification for Chinese Bibliography Based on Machine Learning[J]. Journal of Library Science in China, 2010, 36(6): 28-39.)
[8] 邹鼎杰. 基于知识图谱和贝叶斯分类器的图书分类[J]. 计算机工程与设计, 2020, 41(6): 1796-1801.
[8] (Zou Dingjie. Book Classification Based on Knowledge Graph and Bayesian Classifier[J]. Computer Engineering and Design, 2020, 41(6): 1796-1801.)
[9] 潘辉. 基于极限学习机的自动化图书信息分类技术[J]. 现代电子技术, 2019, 42(17): 183-186.
[9] (Pan Hui. Automated Book Information Classification Technology Based on Extreme Learning Machine[J]. Modern Electronics Technique, 2019, 42(17): 183-186.)
[10] 潘峻. 基于双向LSTM的图书分类系统的设计与实现[J]. 信息技术, 2020, 44(1): 67-70, 74.
[10] (Pan Jun. Development of Book Classification System Based on Bi-LSTM[J]. Information Technology, 2020, 44(1): 67-70, 74.)
[11] 邓三鸿, 傅余洋子, 王昊. 基于LSTM模型的中文图书多标签分类研究[J]. 数据分析与知识发现, 2017, 1(7): 52-60.
[11] (Deng Sanhong, Fu Yuyangzi, Wang Hao. Multi-Label Classification of Chinese Books with LSTM Model[J]. Data Analysis and Knowledge Discovery, 2017, 1(7): 52-60.)
[12] 蒋彦廷, 胡韧奋. 基于BERT模型的图书表示学习与多标签分类研究[J]. 新世纪图书馆, 2020(9): 38-44.
[12] (Jiang Yanting, Hu Renfen. Representation Learning and Multi-Label Classification of Books Based on BERT[J]. New Century Library, 2020(9): 38-44.)
[13] Huang W, Chen E H, Liu Q, et al. Hierarchical Multi-Label Text Classification: An Attention-Based Recurrent Network Approach[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019: 1051-1060.
[14] Gong J B, Teng Z Y, Teng Q, et al. Hierarchical Graph Transformer-Based Deep Learning Model for Large-Scale Multi-Label Text Classification[J]. IEEE Access, 2020(8): 30885-30896.
[15] Sinha K, Dong Y, Cheung J C K, et al. A Hierarchical Neural Attention-Based Text Classifier[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 817-823.
[16] Banerjee S, Akkaya C, Perez-Sorrosal F, et al. Hierarchical Transfer Learning for Multi-Label Text Classification[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6295-6300.
[17] Peng H, Li J X, Wang S Z, et al. Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(6): 2505-2519. DOI: 10.1109/TKDE.2019.2959991.
[18] Mao Y N, Tian J J, Han J W, et al. Hierarchical Text Classification with Reinforced Label Assignment[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 445-455.
[19] Wu J W, Xiong W H, Wang W Y. Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 4353-4363.
[20] Zhou J, Ma C P, Long D K, et al. Hierarchy-Aware Global Model for Hierarchical Text Classification[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 1106-1117.
[21] 皮亚杰. 发生认识论原理[M]. 王宪钿,译. 北京: 商务印书馆, 1981:132-134.
[21] (Piaget J. Principles of Genetic Epistemology[M]. Translated by Wang Xiantian. Beijing: The Commercial Press, 1981: 132-134.)
[22] 中华人民共和国教育部.3-6 岁儿童学习与发展指南[EB/OL]. [2012-10-09]. http://www.moe.gov.cn/jyb_xwfb/xw_zt/moe_357/jyzt_2015nztzl/xueqianjiaoyu/yaowen/202104/W020210820338905908083.pdf.
[22] (Ministry of Education of the People's Republic of China. Early Learning and Development Guideline[EB/OL]. [2012-10-09]. http://www.moe.gov.cn/jyb_xwfb/xw_zt/moe_357/jyzt_2015nztzl/xueqianjiaoyu/yaowen/202104/W020210820338905908083.pdf.)
[23] Sun Y, Wang S H, Li Y K, et al. ERNIE: Enhanced Representation Through Knowledge Integration[OL]. arXiv Preprint, arXiv: 1904.09223.
[24] Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[25] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
[26] Paszke A, Gross S, Chintala S, et al. Automatic Differentiation in PyTorch[C]// Proceedings of the 31st Conference on Neural Information Processing Systems. 2017.
[27] He K M, Zhang X Y, Ren S Q, et al. Deep Residual Learning for Image Recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[28] Loshchilov I, Hutter F. Decoupled Weight Decay Regularization[OL]. arXiv Preprint, arXiv: 1711.05101.
[29] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[30] Lan Z Z, Chen M D, Goodman S, et al. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations[OL]. arXiv Preprint, arXiv: 1909.11942.
[31] Liu Y H, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[OL]. arXiv Preprint, arXiv: 1907.11692.
[32] Shimura K, Li J Y, Fukumoto F. HFT-CNN: Learning Hierarchical Category Structure for Multi-Label Short Text Categorization[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 811-816.
[33] Wehrmann J, Cerri R, Barros R C. Hierarchical Multi-Label Classification Networks[C]// Proceedings of the 35th International Conference on Machine Learning. 2018:5075-5084.
Related Articles
[1] Hu Zhengyin, Fang Shu, Wen Yi, Zhang Xian, Liang Tian. Study on Automatic Classification of Patents Oriented to TRIZ[J]. New Technology of Library and Information Service, 2015, 31(1): 66-74.
[2] Xiao Xin, Yuan Zhongzhi, Liao Jufang. The Investigations of Classification System of Chemical Resource on Internet[J]. New Technology of Library and Information Service, 2002, 18(2): 69-71.
[3] Mao Jun. Use of Classification System in the OPAC[J]. New Technology of Library and Information Service, 2001, 17(4): 14-16.