Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue (7): 156-169. https://doi.org/10.11925/infotech.2096-3467.2022.0649
Research Paper
Hierarchical Multi-label Classification of Children's Literature for Graded Reading
Cheng Quan, Dong Jia
School of Economics and Management, Fuzhou University, Fuzhou 350116, China

Abstract

[Objective] This study constructs a hierarchical multi-label classification model for children's literature, aiming to automate the classification of children's books and guide young readers toward books suited to their stage of development. [Methods] We operationalized the concept of graded reading as a hierarchical classification label system for children's literature, then built the ERNIE-HAM model with deep learning techniques and applied it to hierarchical multi-label text classification of children's books. [Results] Compared with four pre-trained models, ERNIE-HAM performed well at the second and third levels of the hierarchy. Compared with the single-level algorithm, the hierarchical algorithm improved the $AU(\overline{PRC})$ values at the second and third levels by about 11 percentage points. Compared with two hierarchical multi-label classification models, HFT-CNN and HMCN, ERNIE-HAM improved the third-level $AU(\overline{PRC})$ by 12.79 and 6.48 percentage points, respectively. [Limitations] The overall classification performance still has room for improvement; future work should expand the dataset and refine the algorithm design. [Conclusions] The ERNIE-HAM model is effective for hierarchical multi-label classification of children's literature.

Key words: Graded Reading; Classification of Children's Books; Hierarchical Multi-label Text Classification; Classification System
Received: 2022-06-23    Published: 2023-09-07
CLC Number: G254
Funding: Supported by the National Social Science Fund of China (Grant No. 19BTQ072).
Corresponding author: Cheng Quan, ORCID: 0000-0002-7302-4527, E-mail: chengquan@fzu.edu.cn.
Cite this article:
Cheng Quan, Dong Jia. Hierarchical Multi-label Classification of Children's Literature for Graded Reading. Data Analysis and Knowledge Discovery, 2023, 7(7): 156-169.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0649      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I7/156
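
As the abstract notes, each book in the hierarchical label system carries one set of labels per level of the taxonomy, with lower levels refining upper ones. The following is a minimal sketch of the per-level multi-label training objective, assuming three levels with purely illustrative label counts; the paper's actual taxonomy is defined by its label system (Fig.2) and the HAM wiring by Fig.5:

```python
import torch
from torch import nn

# Illustrative label counts per hierarchy level; the real sizes come
# from the paper's label system (Fig.2), not from here.
LEVEL_SIZES = [4, 12, 30]

# One prediction head per level (batch of 8 books), plus multi-hot targets.
logits = [torch.randn(8, n) for n in LEVEL_SIZES]
targets = [torch.randint(0, 2, (8, n)).float() for n in LEVEL_SIZES]

# Multi-label classification treats each label as an independent binary
# decision, so binary cross-entropy applies; summing the per-level losses
# trains all three levels jointly.
criterion = nn.BCEWithLogitsLoss()
loss = sum(criterion(l, t) for l, t in zip(logits, targets))
```

Treating every label as an independent binary decision is what makes the task multi-label rather than multi-class, and joint training over all levels is what distinguishes a hierarchical model from three separate flat classifiers.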
Fig.1  Annotation process for a single children's book text record
Fig.2  Hierarchical classification label system for children's books and its data distribution
Fig.3  Structure of the ERNIE-HAM model
Fig.4  Input layer of the ERNIE model
Fig.5  Internal structure of the HAM
Fig.6  Distribution of text lengths in the corpus
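
Fig.6 motivates the fixed input length of 512 tokens adopted in Table 1 below. As a hedged illustration (the checkpoint name and preprocessing details are assumptions, not the authors' published code), truncating and padding to that length with a Hugging Face tokenizer might look like:

```python
# A minimal sketch, assuming a Hugging Face tokenizer; bert-base-chinese
# is a stand-in Chinese checkpoint, not necessarily the one used here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

def encode_book_text(text: str, max_len: int = 512):
    """Truncate/pad a children's book description to the fixed input length."""
    return tokenizer(
        text,
        max_length=max_len,    # matches the text-length setting in Table 1
        truncation=True,       # clip texts longer than 512 tokens
        padding="max_length",  # pad shorter texts so batches are rectangular
        return_tensors="pt",
    )

batch = encode_book_text("一本面向学龄前儿童的图画书简介……")
print(batch["input_ids"].shape)  # torch.Size([1, 512])
```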
Parameter | Setting
Text length | 512
ERNIE hidden units | 768
Fully connected layer neurons | 256
Optimizer | AdamW
L2 regularization coefficient | 0.001
Learning rate | 2×10⁻⁵
Learning-rate warmup steps | 500
Dropout rate | 0.3
Table 1  Experimental parameter settings
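
The following sketch shows how Table 1's settings map onto a PyTorch training setup; the stand-in classifier head and total step count are assumptions, and only the hyperparameter values come from the table:

```python
import torch
from torch import nn
from transformers import get_linear_schedule_with_warmup

# Stand-in head: 768-d encoder output -> 256-unit FC layer -> label logits.
# The real encoder is ERNIE; the 20 output labels are purely illustrative.
model = nn.Sequential(
    nn.Linear(768, 256),
    nn.ReLU(),
    nn.Dropout(p=0.3),          # dropout rate from Table 1
    nn.Linear(256, 20),
)

total_steps = 10_000            # assumed; depends on dataset size and epochs
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,                    # learning rate from Table 1
    weight_decay=0.001,         # L2 regularization coefficient from Table 1
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,       # warmup steps from Table 1
    num_training_steps=total_steps,
)
```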
Fig.7  Training process of the ERNIE-HAM model
Aspect | ERNIE | BERT | ALBERT | RoBERTa
Checkpoint | ERNIE 1.0 | BERT-Base-Chinese | ALBERT-tiny | RoBERTa-wwm
Training corpus | Heterogeneous Chinese corpora | Chinese Wikipedia | Chinese Wikipedia + CNA Chinese news | Chinese Wikipedia
Vocabulary size | 17,965 | 21,128 | 21,128 | 21,128
Hidden-layer units | 768 | 768 | 312 | 768
MLM variant | MLM with prior knowledge | Standard MLM | Standard MLM | Dynamic Chinese whole-word masking
NSP-style task | NSP | NSP | Sentence-order prediction | None
Table 2  Comparison of the four pre-trained models
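
The vocabulary and hidden sizes in Table 2 are properties of each checkpoint's configuration and can be read programmatically. A sketch, assuming Hugging Face hosting; bert-base-chinese is a canonical hub ID, while the ERNIE, ALBERT, and RoBERTa checkpoints are published under varying community namespaces, so only BERT is queried here:

```python
from transformers import AutoConfig

# Reads the published configuration without downloading model weights.
config = AutoConfig.from_pretrained("bert-base-chinese")
print(config.vocab_size)   # 21128, matching Table 2
print(config.hidden_size)  # 768, matching Table 2
```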
Model | Level 1 | Level 2 | Level 3   ($AU(\overline{PRC})$, %)
ERNIE | 78.14 | 66.37 | 52.59
BERT | 78.63 | 63.74 | 51.12
ALBERT | 66.81 | 57.92 | 48.66
RoBERTa | 78.44 | 62.43 | 51.28
Table 3  Classification performance of different pre-trained models at each level
Fig.8  Comparison of multi-level and single-level algorithms
Model | Level 1 | Level 2 | Level 3   ($AU(\overline{PRC})$, %)
HFT-CNN | 69.32 | 56.28 | 39.80
HMCN | 74.42 | 60.24 | 46.11
ERNIE-HAM | 78.14 | 66.37 | 52.59
Table 4  Classification performance of different hierarchical multi-label models at each level
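
Tables 3 and 4 report $AU(\overline{PRC})$, the area under the average precision-recall curve used to evaluate HMCN [33]: predictions for every (example, label) pair at a level are pooled into a single micro-averaged PR curve, which is then integrated. A minimal sketch of this computation with scikit-learn, using toy data:

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def au_prc_bar(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Area under the average PR curve: pool every (example, label) pair
    at one level, trace a single micro-averaged PR curve, integrate it."""
    precision, recall, _ = precision_recall_curve(y_true.ravel(), y_score.ravel())
    return auc(recall, precision)

# Toy example: 3 books, 4 labels at one hierarchy level.
y_true = np.array([[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]])
y_score = np.array([[0.9, 0.2, 0.1, 0.7], [0.3, 0.8, 0.2, 0.1], [0.6, 0.7, 0.4, 0.2]])
print(f"{100 * au_prc_bar(y_true, y_score):.2f}")  # value in %, as in Tables 3-4
```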
[1] The Working Group of National Reading Survey of Chinese Academy of Press & Publications. The Main Findings of the 18th National Reading Survey[J]. Publishing Research, 2021(4): 19-24.
[2] Zhou Lihong, Liu Fang. Research on the Digital Grade Reading Service Oriented to Minors in the Library[J]. Library Development, 2014(12): 59-62.
[3] Ma Xiaocui, Bu Lu. Research on the Theory and Practice of Children's Book Graded Reading[J]. Library Science Research & Work, 2020(9): 50-53, 63.
[4] McGeown S P, Osborne C, Warhurst A, et al. Understanding Children's Reading Activities: Reading Motivation, Skill and Child Characteristics as Predictors[J]. Journal of Research in Reading, 2016, 39(1): 109-125.
[5] Huang Ning. The Influence of Book Classification to the Children's Reading[J]. Library Work and Study, 2015(3): 102-104.
[6] Zhang Xiaoqin, Li Xiaoying, Wang Hao. A Survey of Children's Reading Status and Graded Reading Needs in Nanjing[J]. Library Theory and Practice, 2019(8): 74-78.
[7] Wang Hao, Yan Ming, Su Xinning. Research on Automatic Classification for Chinese Bibliography Based on Machine Learning[J]. Journal of Library Science in China, 2010, 36(6): 28-39.
[8] Zou Dingjie. Book Classification Based on Knowledge Graph and Bayesian Classifier[J]. Computer Engineering and Design, 2020, 41(6): 1796-1801.
[9] Pan Hui. Automated Book Information Classification Technology Based on Extreme Learning Machine[J]. Modern Electronics Technique, 2019, 42(17): 183-186.
[10] Pan Jun. Development of Book Classification System Based on Bi-LSTM[J]. Information Technology, 2020, 44(1): 67-70, 74.
[11] Deng Sanhong, Fu Yuyangzi, Wang Hao. Multi-Label Classification of Chinese Books with LSTM Model[J]. Data Analysis and Knowledge Discovery, 2017, 1(7): 52-60.
[12] Jiang Yanting, Hu Renfen. Representation Learning and Multi-Label Classification of Books Based on BERT[J]. New Century Library, 2020(9): 38-44.
[13] Huang W, Chen E H, Liu Q, et al. Hierarchical Multi-Label Text Classification: An Attention-Based Recurrent Network Approach[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019: 1051-1060.
[14] Gong J B, Teng Z Y, Teng Q, et al. Hierarchical Graph Transformer-Based Deep Learning Model for Large-Scale Multi-Label Text Classification[J]. IEEE Access, 2020, 8: 30885-30896.
[15] Sinha K, Dong Y, Cheung J C K, et al. A Hierarchical Neural Attention-Based Text Classifier[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 817-823.
[16] Banerjee S, Akkaya C, Perez-Sorrosal F, et al. Hierarchical Transfer Learning for Multi-Label Text Classification[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6295-6300.
[17] Peng H, Li J X, Wang S Z, et al. Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(6): 2505-2519. DOI: 10.1109/TKDE.2019.2959991.
[18] Mao Y N, Tian J J, Han J W, et al. Hierarchical Text Classification with Reinforced Label Assignment[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 445-455.
[19] Wu J W, Xiong W H, Wang W Y. Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019: 4353-4363.
[20] Zhou J, Ma C P, Long D K, et al. Hierarchy-Aware Global Model for Hierarchical Text Classification[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 1106-1117.
[21] Piaget J. Principles of Genetic Epistemology[M]. Translated by Wang Xiantian. Beijing: The Commercial Press, 1981: 132-134.
[22] Ministry of Education of the People's Republic of China. Early Learning and Development Guideline[EB/OL]. [2012-10-09]. http://www.moe.gov.cn/jyb_xwfb/xw_zt/moe_357/jyzt_2015nztzl/xueqianjiaoyu/yaowen/202104/W020210820338905908083.pdf.
[23] Sun Y, Wang S H, Li Y K, et al. ERNIE: Enhanced Representation Through Knowledge Integration[OL]. arXiv Preprint, arXiv: 1904.09223.
[24] Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017: 6000-6010.
[25] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
[26] Paszke A, Gross S, Chintala S, et al. Automatic Differentiation in PyTorch[C]// Proceedings of the 31st Conference on Neural Information Processing Systems. 2017.
[27] He K M, Zhang X Y, Ren S Q, et al. Deep Residual Learning for Image Recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[28] Loshchilov I, Hutter F. Decoupled Weight Decay Regularization[OL]. arXiv Preprint, arXiv: 1711.05101.
[29] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[30] Lan Z Z, Chen M D, Goodman S, et al. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations[OL]. arXiv Preprint, arXiv: 1909.11942.
[31] Liu Y H, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[OL]. arXiv Preprint, arXiv: 1907.11692.
[32] Shimura K, Li J Y, Fukumoto F. HFT-CNN: Learning Hierarchical Category Structure for Multi-Label Short Text Categorization[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 811-816.
[33] Wehrmann J, Cerri R, Barros R C. Hierarchical Multi-Label Classification Networks[C]// Proceedings of the 35th International Conference on Machine Learning. 2018:5075-5084.