Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue 12: 68-75. DOI: 10.11925/infotech.2096-3467.2020.0400
Automatic Extraction of Traditional Music Terms of Intangible Cultural Heritage
Liu Liu, Qin Tianyun, Wang Dongbo
College of Information Management, Nanjing Agricultural University, Nanjing 210095, China
Abstract  

[Objective] This study addresses entity recognition for traditional music terms of intangible cultural heritage (ICH). [Methods] We built a corpus of national ICH projects from the China Intangible Cultural Heritage Network. On this corpus we constructed entity recognition models for traditional music terms based on CRF, LSTM, LSTM-CRF, and BERT. [Results] BERT performed best of the four models, with an average F1 score of 91.77%. [Limitations] The study extracts only unique terms, and the training set is small. [Conclusions] A BERT-based entity recognition model is effective for automatically extracting traditional music terms of intangible cultural heritage, and it can serve as a reliable reference for related ICH research.

Key words: Intangible Cultural Heritage; Digital Humanities; BERT; Term Extraction; Entity Recognition
Received: 08 May 2020      Published: 25 December 2020
CLC number: TP391
Corresponding author: Liu Liu   E-mail: liuliu@njau.edu.cn

Cite this article:

Liu Liu, Qin Tianyun, Wang Dongbo. Automatic Extraction of Traditional Music Terms of Intangible Cultural Heritage. Data Analysis and Knowledge Discovery, 2020, 4(12): 68-75.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0400     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I12/68
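The abstract reports an average F1 of 91.77%, but this page does not say how matches are counted. Named entity recognition commonly uses entity-level exact match: a predicted entity is correct only if both its boundaries and its category agree with the gold annotation. The minimal Python sketch below computes scores that way. It assumes character-level BIO tags built from category labels such as ICH-PLACE and ICH-TERM in the entity category table. The helpers `bio_to_spans` and `prf1` and the example tag sequences are hypothetical. They are not the authors' code or data.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (a stray I- tag opens a new entity)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")  # "B-ICH-PLACE" -> ("B", "-", "ICH-PLACE")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if prefix in ("B", "I") and not continues:
            start, etype = i, label
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level exact-match precision, recall and F1 over a set of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # same start, end and type
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative gold/predicted tags for the string 卫藏地区酒歌 (place + term).
gold = [["B-ICH-PLACE", "I-ICH-PLACE", "I-ICH-PLACE", "I-ICH-PLACE", "B-ICH-TERM", "I-ICH-TERM"]]
pred = [["B-ICH-PLACE", "I-ICH-PLACE", "O", "O", "B-ICH-TERM", "I-ICH-TERM"]]
print(prf1(gold, pred))  # (0.5, 0.5, 0.5): the truncated place name does not count
```

Under exact matching, a partially recognized entity counts as both a false positive and a false negative. Lenient, overlap-based scoring would give a higher number, so it matters which convention a reported F1 uses.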

Entity category        Tag            Examples                                Count
ICH project name       ICH-TITLE      江河号子, 老河口丝弦, 回族宴席曲        1,518
Unique term            ICH-TERM       横抱琵琶, 酒歌, 六字真言歌              8,881
Inheritor name         ICH-INHERITOR  陈子敬, 沈浩初, 沈肇州                  217
Place name             ICH-PLACE      青海省玉树藏族自治州, 卫藏地区, 安多    2,772
Work title             ICH-WORKS      《对歌》, 《口提哥哥》, 《英雄的村庄》  1,396
Instrument/tool name   ICH-INST       琵琶, 笛子, 唢呐, 小管                  2,122
Summary of Traditional Music Terminology
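CRF, LSTM and BERT taggers all predict one label per character, so annotated entity spans like those in the table above must first be converted into a label sequence. This page does not state which tagging scheme was used. The sketch below assumes the common BIO scheme, which gives 2 x 6 + 1 = 13 labels for the six categories. The function name `spans_to_bio` is hypothetical. The input string 卫藏地区酒歌 is illustrative only and simply joins two examples from the table.

```python
from typing import List, Tuple

# The six entity categories and their tag strings, as listed in the table above.
ENTITY_TYPES = ["ICH-TITLE", "ICH-TERM", "ICH-INHERITOR", "ICH-PLACE", "ICH-WORKS", "ICH-INST"]

# Assumption: character-level BIO tagging, i.e. "O" plus a B- and I- label per category.
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end_exclusive, type) character spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        tags[start] = f"B-{etype}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining characters of the entity
    return tags


text = "卫藏地区酒歌"  # ICH-PLACE example + ICH-TERM example from the table
tags = spans_to_bio(text, [(0, 4, "ICH-PLACE"), (4, 6, "ICH-TERM")])
for char, tag in zip(text, tags):
    print(char, tag)
```

Because the B- tag marks where each entity starts, two adjacent entities of different types, like the place name and the term here, remain separable in the tag sequence.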
Run         Precision (%)  Recall (%)  F1 (%)
1           94.94          84.29       89.30
2           94.59          85.10       89.59
3           94.32          89.02       91.59
4           93.92          86.96       90.31
5           96.12          85.64       90.58
6           95.03          91.55       93.26
7           95.71          84.55       89.78
8           96.77          85.27       90.66
9           96.34          86.23       91.00
10          95.31          89.71       92.42
Macro avg.  95.31          86.83       90.85
Recognition Results of Traditional Music Entities Based on CRF
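A linear-chain CRF does not learn its own features. It scores hand-designed feature functions of the input around each position. The feature templates used for the CRF baseline are not given on this page, so the sketch below assumes a typical character-window template. The template covers the current character, its neighbours up to two positions away, and adjacent character bigrams, with `<BOS>`/`<EOS>` markers at the sentence edges. The sketch produces one feature dictionary per character, which is the list-of-dicts layout accepted by CRF libraries such as sklearn-crfsuite. All function names are hypothetical.

```python
from typing import Dict, List


def char_at(sentence: str, j: int) -> str:
    """Character at position j, or a boundary marker outside the sentence."""
    if j < 0:
        return "<BOS>"
    if j >= len(sentence):
        return "<EOS>"
    return sentence[j]


def char_features(sentence: str, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical template: unigrams c[-2..2] and bigrams c[k]c[k+1] for k in -2..1."""
    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        feats[f"c[{off}]"] = char_at(sentence, i + off)
    for off in range(-window, window):
        feats[f"c[{off}]c[{off + 1}]"] = char_at(sentence, i + off) + char_at(sentence, i + off + 1)
    return feats


def sentence_features(sentence: str) -> List[Dict[str, str]]:
    """One feature dict per character, the input layout a linear-chain CRF trainer expects."""
    return [char_features(sentence, i) for i in range(len(sentence))]


feats = sentence_features("卫藏地区酒歌")
print(feats[4])  # features of 酒, including the bigram c[0]c[1] = 酒歌
```

With sklearn-crfsuite, one such list per sentence would be passed as `X` to `CRF.fit`, together with the corresponding BIO tag lists as `y`.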
Run         Precision (%)  Recall (%)  F1 (%)
1           84.12          84.12       84.12
2           83.38          85.80       84.57
3           87.80          88.67       88.24
4           84.32          84.54       84.43
5           88.22          81.87       84.92
6           83.52          85.63       84.56
7           89.02          84.75       86.83
8           85.68          84.58       85.12
9           84.66          86.68       85.62
10          86.90          89.46       88.16
Macro avg.  85.76          85.61       85.66
Recognition Results of Traditional Music Entities Based on LSTM
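Unlike the CRF, an LSTM learns its own character representations. At each step, the input, forget and output gates decide what the cell state keeps and what the hidden state exposes: c_t = f_t * c_(t-1) + i_t * g_t and h_t = o_t * tanh(c_t). The NumPy sketch below runs a forward and a backward LSTM over six stand-in character embeddings. It then maps each concatenated hidden state to scores over 13 labels. Bidirectionality, the dimensions and the random weights are assumptions for illustration, since the page only names the model "LSTM". Without a CRF layer, each character's label is the independent argmax of its own scores.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(xs, W, U, b):
    """Run one LSTM direction over xs (T, D); return hidden states (T, H).

    W (4H, D), U (4H, H) and b (4H,) stack the gates in the order
    input, forget, candidate, output.
    """
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[:H])            # input gate
        f = sigmoid(z[H:2 * H])       # forget gate
        g = np.tanh(z[2 * H:3 * H])   # candidate cell values
        o = sigmoid(z[3 * H:])        # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def bilstm_forward(xs, fwd_params, bwd_params):
    """Concatenate a left-to-right and a right-to-left pass: (T, 2H)."""
    fwd = lstm_forward(xs, *fwd_params)
    bwd = lstm_forward(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([fwd, bwd], axis=1)


def init_params(rng, D, H):
    """Small random weights; a trained model would learn these."""
    return (rng.normal(scale=0.1, size=(4 * H, D)),
            rng.normal(scale=0.1, size=(4 * H, H)),
            np.zeros(4 * H))


rng = np.random.default_rng(0)
T, D, H, N_LABELS = 6, 8, 4, 13          # 6 characters, 13 BIO labels
xs = rng.normal(size=(T, D))             # stand-in character embeddings
states = bilstm_forward(xs, init_params(rng, D, H), init_params(rng, D, H))
W_out = rng.normal(scale=0.1, size=(N_LABELS, 2 * H))
emissions = states @ W_out.T             # (T, N_LABELS) label scores per character
pred = emissions.argmax(axis=1)          # plain LSTM tagger: independent argmax
print(pred)
```

Because each character is labeled independently, nothing in this decoder prevents an I- label from following O. The CRF layer in LSTM-CRF addresses exactly that gap.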
Run         Precision (%)  Recall (%)  F1 (%)
1           86.14          86.35       86.25
2           84.96          88.41       86.65
3           87.77          90.15       88.94
4           88.42          86.60       87.50
5           90.96          85.87       88.34
6           85.68          90.99       88.25
7           89.76          84.18       86.88
8           91.29          83.55       87.25
9           91.79          83.99       87.71
10          87.53          91.18       89.32
Macro avg.  88.43          87.13       87.71
Recognition Results of Traditional Music Entities Based on LSTM-CRF
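The CRF layer in LSTM-CRF contributes a transition score between consecutive labels. Decoding therefore picks the best whole label sequence rather than the best label for each character in isolation. The sketch below implements Viterbi decoding for that step in pure Python. The three labels, the emission scores, and the hand-set penalty of -100 on the illegal O to I-ICH-INST transition are invented for illustration. In a trained model, the transition matrix is learned.

```python
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Best label path for a linear-chain CRF.

    emissions[t][y]: score of label y at position t (e.g. LSTM output).
    transitions[a][b]: score of label a followed by label b.
    """
    n = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each label
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for b in range(n):
            cands = [score[a] + transitions[a][b] for a in range(n)]
            best_a = max(range(n), key=cands.__getitem__)
            ptrs.append(best_a)
            new_score.append(cands[best_a] + emissions[t][b])
        score = new_score
        backpointers.append(ptrs)
    last = max(range(n), key=score.__getitem__)
    path = [last]
    for ptrs in reversed(backpointers):  # follow the pointers back to position 0
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


LABELS = ["O", "B-ICH-INST", "I-ICH-INST"]
NEG = -100.0  # hand-set penalty standing in for a forbidden O -> I transition
transitions = [[0.0, 0.0, NEG],   # from O
               [0.0, 0.0, 0.0],   # from B-ICH-INST
               [0.0, 0.0, 0.0]]   # from I-ICH-INST
emissions = [[2.0, 0.5, 0.0],
             [0.0, 1.0, 1.2],
             [0.0, 0.0, 1.5]]

greedy = [max(range(3), key=row.__getitem__) for row in emissions]
path, best = viterbi(emissions, transitions)
print([LABELS[y] for y in greedy])      # ['O', 'I-ICH-INST', 'I-ICH-INST']
print([LABELS[y] for y in path], best)  # ['O', 'B-ICH-INST', 'I-ICH-INST'] 4.5
```

The per-character argmax yields an ill-formed sequence, with I-ICH-INST directly after O. Viterbi instead returns the well-formed path, whose total score is 2.0 + 1.0 + 1.5 = 4.5.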
Run         Precision (%)  Recall (%)  F1 (%)
1           87.96          93.07       90.44
2           89.30          92.15       90.70
3           91.73          95.44       93.55
4           89.35          92.47       90.89
5           91.60          93.82       92.70
6           89.22          94.03       91.56
7           94.48          93.12       93.80
8           91.27          92.99       92.12
9           89.25          90.13       89.69
10          92.73          91.81       92.27
Macro avg.  90.69          92.90       91.77
Recognition Results of Traditional Music Entities Based on BERT
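For BERT, Chinese text is usually fed character by character, because Chinese BERT vocabularies split CJK text into single characters. The special tokens [CLS] and [SEP] are added at the two ends. The sketch below builds such an input and aligns the character tags with it. It assumes the 13-label BIO set and a maximum length of 128. It also assumes the PyTorch convention of labeling positions excluded from the loss with -100, which is the default `ignore_index` of `torch.nn.CrossEntropyLoss`. Converting tokens to vocabulary ids is left to a real tokenizer, and `build_bert_example` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

ENTITY_TYPES = ["ICH-TITLE", "ICH-TERM", "ICH-INHERITOR", "ICH-PLACE", "ICH-WORKS", "ICH-INST"]
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]
LABEL2ID: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_bert_example(text: str, tags: List[str], label2id: Dict[str, int],
                       max_len: int = 128) -> Tuple[List[str], List[int]]:
    """Character tokens wrapped in [CLS]/[SEP], with one label id per token.

    Hypothetical helper: truncates to max_len including the two special tokens,
    and gives the special tokens IGNORE_INDEX so they do not contribute to the loss.
    """
    if len(text) != len(tags):
        raise ValueError("need exactly one tag per character")
    keep = max_len - 2
    tokens = ["[CLS]"] + list(text[:keep]) + ["[SEP]"]
    labels = [IGNORE_INDEX] + [label2id[t] for t in tags[:keep]] + [IGNORE_INDEX]
    return tokens, labels


text = "卫藏地区酒歌"
tags = ["B-ICH-PLACE", "I-ICH-PLACE", "I-ICH-PLACE", "I-ICH-PLACE", "B-ICH-TERM", "I-ICH-TERM"]
tokens, labels = build_bert_example(text, tags, LABEL2ID)
print(tokens)
print(labels)  # [-100, 7, 8, 8, 8, 3, 4, -100]
```

Because each character maps to exactly one token, the alignment is one-to-one. Only the special tokens need the ignore label, and sequences longer than the model's limit are truncated.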
Model      Avg. precision (%)  Avg. recall (%)  Avg. F1 (%)
CRF        95.31               86.83            90.85
LSTM       85.76               85.61            85.66
LSTM-CRF   88.43               87.13            87.71
BERT       90.69               92.90            91.77
Average Results of the Four Models
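The summary table can be reproduced from the per-run F1 values in the four result tables. The sketch below recomputes each model's mean F1 and ranks the models. It also adds the sample standard deviation across the ten runs, which this page does not report. Finally, it shows that the summary F1 is the mean of the per-run F1 values, not the harmonic mean of the averaged precision and recall. For CRF, averaging the per-run F1 gives 90.85, whereas 2PR/(P+R) with P = 95.31 and R = 86.83 gives 90.87.

```python
from statistics import mean, stdev

# Per-run F1 (%) copied from the four result tables on this page.
F1_BY_MODEL = {
    "CRF": [89.30, 89.59, 91.59, 90.31, 90.58, 93.26, 89.78, 90.66, 91.00, 92.42],
    "LSTM": [84.12, 84.57, 88.24, 84.43, 84.92, 84.56, 86.83, 85.12, 85.62, 88.16],
    "LSTM-CRF": [86.25, 86.65, 88.94, 87.50, 88.34, 88.25, 86.88, 87.25, 87.71, 89.32],
    "BERT": [90.44, 90.70, 93.55, 90.89, 92.70, 91.56, 93.80, 92.12, 89.69, 92.27],
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# (mean, sample standard deviation) of F1 over the ten test runs
summary = {m: (round(mean(v), 2), round(stdev(v), 2)) for m, v in F1_BY_MODEL.items()}
ranking = sorted(summary, key=lambda m: summary[m][0], reverse=True)
for m in ranking:
    print(f"{m:<9} F1 = {summary[m][0]:.2f} +/- {summary[m][1]:.2f}")

# Mean of per-run F1 vs. F1 recomputed from the averaged precision and recall (CRF)
print(round(mean(F1_BY_MODEL["CRF"]), 2), round(f1(95.31, 86.83), 2))  # 90.85 90.87
```

The recomputed means match the summary table, and the ranking is BERT, CRF, LSTM-CRF, LSTM. The spread across runs gives some sense of how large the gap between models is relative to run-to-run variation.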