Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (12): 68-75    DOI: 10.11925/infotech.2096-3467.2020.0400
Automatic Extraction of Traditional Music Terms of Intangible Cultural Heritage
Liu Liu, Qin Tianyun, Wang Dongbo
College of Information Management, Nanjing Agricultural University, Nanjing 210095, China
Abstract  

[Objective] This study addresses entity recognition for traditional music terms of intangible cultural heritage. [Methods] This research constructed a corpus of national intangible cultural heritage projects from the China Intangible Cultural Heritage Network and built entity recognition frameworks for traditional music terms based on CRF, LSTM, LSTM-CRF, and BERT. [Results] In the performance comparison, the BERT model achieved the best result for recognizing traditional music terms, with an average F1 value of 91.77%. [Limitations] This study extracts only unique terms, and the training set is small. [Conclusions] The BERT-based entity recognition model is effective for automatically extracting traditional music terms of intangible cultural heritage and can provide a reliable reference for related research on intangible cultural heritage.

Key words: Intangible Cultural Heritage, Digital Humanities, BERT, Term Extraction, Entity Recognition
Received: 08 May 2020      Published: 25 December 2020
ZTFLH:  TP391  
Corresponding Authors: Liu Liu     E-mail: liuliu@njau.edu.cn

Cite this article:

Liu Liu, Qin Tianyun, Wang Dongbo. Automatic Extraction of Traditional Music Terms of Intangible Cultural Heritage. Data Analysis and Knowledge Discovery, 2020, 4(12): 68-75.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0400     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I12/68

Entity category         Tag              Examples                                  Count
ICH item name           ICH-TITLE        江河号子、老河口丝弦、回族宴席曲            1,518
Unique term             ICH-TERM         横抱琵琶、酒歌、六字真言歌                  8,881
Inheritor name          ICH-INHERITOR    陈子敬、沈浩初、沈肇州                      217
Place name              ICH-PLACE        青海省玉树藏族自治州、卫藏地区、安多        2,772
Work title              ICH-WORKS        《对歌》、《口提哥哥》、《英雄的村庄》      1,396
Instrument/tool name    ICH-INST         琵琶、笛子、唢呐、小管                      2,122
The Summary of Traditional Music Terminology
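
To make the category tags above concrete, the following sketch shows how annotated spans might be turned into character-level sequence labels for training a tagger. The BIO labeling scheme, the `bio_tags` helper, and the sample sentence are assumptions made for illustration; the page does not state which labeling scheme the authors used.

```python
def bio_tags(text, spans):
    """Convert entity spans into character-level BIO tags.

    spans: list of (start, end, label) with end exclusive; spans are
    assumed not to overlap. The BIO scheme itself is an assumption,
    not something the source page specifies.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters
    return tags


# Hypothetical sentence built from two instrument names listed in the table.
sentence = "琵琶与笛子合奏"
spans = [(0, 2, "ICH-INST"), (3, 5, "ICH-INST")]

tags = bio_tags(sentence, spans)
for char, tag in zip(sentence, tags):
    print(char, tag)
```

Each character receives exactly one tag, so a sequence model such as a CRF or a BERT token classifier can be trained on the resulting (character, tag) pairs.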

Test No.         Precision (%)   Recall (%)   F1 (%)
1                94.94           84.29        89.30
2                94.59           85.10        89.59
3                94.32           89.02        91.59
4                93.92           86.96        90.31
5                96.12           85.64        90.58
6                95.03           91.55        93.26
7                95.71           84.55        89.78
8                96.77           85.27        90.66
9                96.34           86.23        91.00
10               95.31           89.71        92.42
Macro average    95.31           86.83        90.85
Recognition Results of Traditional Music Entities Based on CRF
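
The relationship between the columns of the CRF table can be checked with a short script. This is a minimal sketch, assuming that F1 is the usual harmonic mean of precision and recall and that the macro average is the unweighted mean over the ten test runs; the function and variable names are illustrative and do not come from the paper.

```python
# (precision, recall) in percent for the ten CRF test runs, copied from the table.
crf_runs = [
    (94.94, 84.29), (94.59, 85.10), (94.32, 89.02), (93.92, 86.96),
    (96.12, 85.64), (95.03, 91.55), (95.71, 84.55), (96.77, 85.27),
    (96.34, 86.23), (95.31, 89.71),
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean(values):
    """Unweighted mean, used here as the macro average over test runs."""
    values = list(values)
    return sum(values) / len(values)


f1_per_run = [f1_score(p, r) for p, r in crf_runs]

macro_precision = mean(p for p, _ in crf_runs)
macro_recall = mean(r for _, r in crf_runs)
macro_f1 = mean(f1_per_run)                     # mean of the per-run F1 values
f1_of_means = f1_score(macro_precision, macro_recall)

print(macro_precision, macro_recall, macro_f1, f1_of_means)
```

Because the table lists precision and recall already rounded to two decimals, the recomputed per-run F1 values agree with the reported ones only to within about 0.01. The reported macro F1 is consistent with averaging the per-run F1 values; applying the F1 formula to the macro-averaged precision and recall gives a slightly higher number.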

Test No.         Precision (%)   Recall (%)   F1 (%)
1                84.12           84.12        84.12
2                83.38           85.80        84.57
3                87.80           88.67        88.24
4                84.32           84.54        84.43
5                88.22           81.87        84.92
6                83.52           85.63        84.56
7                89.02           84.75        86.83
8                85.68           84.58        85.12
9                84.66           86.68        85.62
10               86.90           89.46        88.16
Macro average    85.76           85.61        85.66
Recognition Results of Traditional Music Entities Based on LSTM

Test No.         Precision (%)   Recall (%)   F1 (%)
1                86.14           86.35        86.25
2                84.96           88.41        86.65
3                87.77           90.15        88.94
4                88.42           86.60        87.50
5                90.96           85.87        88.34
6                85.68           90.99        88.25
7                89.76           84.18        86.88
8                91.29           83.55        87.25
9                91.79           83.99        87.71
10               87.53           91.18        89.32
Macro average    88.43           87.13        87.71
Recognition Results of Traditional Music Entities Based on LSTM-CRF

Test No.         Precision (%)   Recall (%)   F1 (%)
1                87.96           93.07        90.44
2                89.30           92.15        90.70
3                91.73           95.44        93.55
4                89.35           92.47        90.89
5                91.60           93.82        92.70
6                89.22           94.03        91.56
7                94.48           93.12        93.80
8                91.27           92.99        92.12
9                89.25           90.13        89.69
10               92.73           91.81        92.27
Macro average    90.69           92.90        91.77
Recognition Results of Traditional Music Entities Based on BERT

Model       Average precision (%)   Average recall (%)   Average F1 (%)
CRF         95.31                   86.83                90.85
LSTM        85.76                   85.61                85.66
LSTM-CRF    88.43                   87.13                87.71
BERT        90.69                   92.90                91.77
The Average Results of Four Models
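
The comparison in the summary table can be read off programmatically. The sketch below simply ranks the four models by each reported metric; the dictionary layout and helper name are illustrative assumptions, and the numbers are copied from the table.

```python
# Macro-averaged scores in percent, copied from the summary table.
results = {
    "CRF":      {"precision": 95.31, "recall": 86.83, "f1": 90.85},
    "LSTM":     {"precision": 85.76, "recall": 85.61, "f1": 85.66},
    "LSTM-CRF": {"precision": 88.43, "recall": 87.13, "f1": 87.71},
    "BERT":     {"precision": 90.69, "recall": 92.90, "f1": 91.77},
}


def best_model(metric):
    """Return the name of the model with the highest value for a metric."""
    return max(results, key=lambda name: results[name][metric])


best = {metric: best_model(metric) for metric in ("precision", "recall", "f1")}
ranking_by_f1 = sorted(results, key=lambda name: results[name]["f1"], reverse=True)
f1_gap = results["BERT"]["f1"] - results["CRF"]["f1"]

print(best)
print(ranking_by_f1)
```

On these figures CRF has the highest average precision, while BERT has the highest average recall and F1, leading CRF by 0.92 percentage points in F1.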