Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (5): 118-126    DOI: 10.11925/infotech.2096-3467.2019.0907
Identifying Scenic Spot Entities Based on Improved Knowledge Transfer
Zhao Ping1, Sun Lianying2, Tu Shuai1, Bian Jianling3, Wan Ying1
1Smart City College, Beijing Union University, Beijing 100101, China
2College of Urban Rail Transit and Logistics, Beijing Union University, Beijing 100101, China
3Beijing China-Power Information Technology Co., LTD, Beijing 100192, China
Abstract  

[Objective] This paper addresses the problems caused by limited labeled data in scenic spot entity recognition. [Methods] We propose an improved knowledge transfer algorithm for entity recognition and evaluate the new model on datasets from the People's Daily. [Results] The precision of our method was 1.62 percentage points higher than that of the model trained on all labeled data. [Limitations] More research is needed on expanding the samples. [Conclusions] The proposed method requires less labeled data for entity recognition and provides better technical support for tourism recommendation.

Key words: Transfer Learning; BERT; Conditional Random Fields; Scenery Spot Recognition
Received: 05 August 2019      Published: 15 June 2020
ZTFLH:  TP393  
Corresponding Authors: Sun Lianying     E-mail: sunlychina@163.com

Cite this article:

Zhao Ping, Sun Lianying, Tu Shuai, Bian Jianling, Wan Ying. Identifying Scenic Spot Entities Based on Improved Knowledge Transfer. Data Analysis and Knowledge Discovery, 2020, 4(5): 118-126.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0907     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I5/118

[Figure: BBC Entity Recognition Model]
[Figure: Algorithm Structure]
Table: Data Distribution
Category        Count
Entities        74,430
Non-entities    457,040
Total           531,470
Table: Labeling Examples
POS    Label
r      O
t      O
t      O
v      O
ul     O
n      B-SE
n      I-SE
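The B-SE / I-SE / O labels above follow the common BIO tagging convention, where B- opens an entity, I- continues it, and O marks tokens outside any entity. As a minimal sketch (not the authors' code), the hypothetical helper `extract_spans` below shows how such a tag sequence is usually decoded into entity spans. Reading "SE" as the scenic spot entity type is an assumption based on the paper's topic.

```python
from typing import List, Tuple


def extract_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode a BIO tag sequence into (start, end_exclusive, type) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity that is still open.
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            # The current entity continues.
            continue
        else:
            # "O" or an inconsistent I- tag ends the open entity, if any.
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# Tag column from the labeling example above.
tags = ["O", "O", "O", "O", "O", "B-SE", "I-SE"]
spans = extract_spans(tags)  # one SE entity covering the last two tokens
```

Entity-level precision and recall are typically computed by comparing spans decoded this way from predicted and gold tag sequences.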
[Figure: Data Characteristics]
Table: Model Layer Verification
Method        P         R         F1
CRF           86.67%    87.84%    87.25%
BiLSTM        93.25%    87.98%    90.53%
BiLSTM+CRF    94.97%    92.10%    93.52%
BBC           96.79%    96.85%    96.74%
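The F1 column in these tables is presumably the usual harmonic mean of precision (P) and recall (R), F1 = 2PR / (P + R). The sketch below, with the hypothetical helper `f1_score`, recomputes F1 from two of the rounded (P, R) pairs listed above. Because the inputs are already rounded, the recomputed values can differ from the reported column in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs in percent, copied from the CRF and BiLSTM rows above.
rows = {"CRF": (86.67, 87.84), "BiLSTM": (93.25, 87.98)}
recomputed = {name: round(f1_score(p, r), 2) for name, (p, r) in rows.items()}
```

For CRF the recomputed value agrees with the reported 87.25%; for BiLSTM it lands within 0.01 of the reported 90.53%.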
Table: Experimental Results with Different Values of i
i       P         R         F1
0.40    84.64%    64.01%    72.89%
0.45    87.26%    69.28%    77.24%
0.50    90.93%    53.85%    67.64%
0.55    93.14%    55.42%    69.49%
0.60    91.41%    55.74%    69.25%
Table: Experimental Results with Different Values of simsen
simsen    P         R         F1
0.40      89.01%    56.07%    68.80%
0.45      91.30%    57.79%    70.78%
0.50      92.05%    58.16%    71.28%
0.55      91.03%    55.99%    69.33%
0.60      90.81%    56.50%    69.66%
Table: Experimental Results with Different Values of SEA
SEA     P         R         F1
0.40    87.26%    79.28%    83.07%
0.45    90.93%    83.85%    87.24%
0.50    93.14%    85.42%    89.11%
0.55    91.41%    85.74%    88.48%
0.60    90.81%    83.50%    87.00%
Table: Experimental Results with Different Values of μ
μ      P         R         F1
1/5    93.14%    85.42%    89.11%
1/4    95.06%    82.12%    88.12%
1/3    97.91%    89.15%    93.30%
1/2    98.41%    88.09%    92.97%
Table: Comparison Between All Annotations and a Few Annotations
Model       μ      P         R         F1
BBC         1      96.79%    96.85%    96.74%
AttTrBBC    1/5    93.14%    85.42%    89.11%
AttTrBBC    1/4    95.06%    82.12%    88.12%
AttTrBBC    1/3    97.91%    89.15%    93.30%
AttTrBBC    1/2    98.41%    88.09%    92.97%
Table: Comparative Analysis of Four Methods
Method      P         R         F1
HMM[11]     85.49%    90.14%    87.75%
CRF[10]     83.40%    95.70%    89.10%
CNN[12]     95.03%    92.80%    93.90%
AttTrBBC    98.41%    88.09%    92.97%
[1] Grishman R, Sundheim B. Message Understanding Conference-6: A Brief History[C]// Proceedings of the 16th International Conference on Computational Linguistics, Copenhagen, Denmark. Stroudsburg, PA: ACL, 1996: 466-471.
[2] Hanisch D, Fundel K, Mevissen H T, et al. ProMiner: Rule-based Protein and Gene Entity Recognition[J]. BMC Bioinformatics, 2005,6(1):S14.
[3] Lample G, Ballesteros M, Subramanian S, et al. Neural Architectures for Named Entity Recognition[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA. Stroudsburg, PA: ACL, 2016: 260-270.
[4] Dong C, Zhang J, Zong C, et al. Character-based LSTM-CRF with Radical-level Features for Chinese Named Entity Recognition[C]// Proceedings of the Natural Language Understanding and Intelligent Applications, Kunming, China. Berlin, Germany: Springer, 2016: 239-250.
[5] Patil N V, Patil A S, Pawar B V. HMM Based Named Entity Recognition for Inflectional Language[C]// Proceedings of the 2017 International Conference on Computer, Communications and Electronics, Jaipur, India. Piscataway, NJ: IEEE, 2017: 565-572.
[6] Xue Zhengshan, Guo Jianyi, Yu Zhengtao, et al. Recognition of HMM-Based Chinese Tourist Attractions[J]. Journal of Kunming University of Science and Technology: Science and Technology, 2009,34(6):44-48. (in Chinese)
[7] Guo Jianyi, Xue Zhengshan, Yu Zhengtao, et al. Named Entity Recognition for the Tourism Domain Based on Cascaded Conditional Random Fields[J]. Journal of Chinese Information Processing, 2009,23(5):47-52. (in Chinese)
[8] Chiu J P C, Nichols E. Named Entity Recognition with Bidirectional LSTM-CNNs[J]. Transactions of the Association for Computational Linguistics, 2016,4:357-370.
[9] Huang Han, Wang Hongyu, Wang Xiaoguang. Automatic Recognizing Legal Terminologies with Active Learning and Conditional Random Field Model[J]. Data Analysis and Knowledge Discovery, 2019,3(6):66-74. (in Chinese)
[10] Greenberg N, Bansal T, Verga P, et al. Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium. Stroudsburg, PA: ACL, 2018: 2824-2829.
[11] Liu Xiaoan, Peng Tao. Research on Chinese Scenic Spot Named Entity Recognition Based on Convolutional Neural Network[J/OL]. Computer Engineering and Applications. [2019-08-01]. http://kns.cnki.net/kcms/detail/11.2127.TP.20190307.1807.007.html. (in Chinese)
[12] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, USA. Stroudsburg, PA: ACL, 2019: 4171-4186.
[13] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997,9(8):1735-1780.
[14] Sutton C, McCallum A. An Introduction to Conditional Random Fields[J]. Foundations and Trends® in Machine Learning, 2012,4(4):267-373.
[15] Peng D L, Wang Y R, Liu C, et al. TL-NER: A Transfer Learning Model for Chinese Named Entity Recognition[J]. Information Systems Frontiers, 2019. https://doi.org/10.1007/s10796-019-09932-y.
[16] Gomaa W H, Fahmy A A. A Survey of Text Similarity Approaches[J]. International Journal of Computer Applications, 2013,68(13):13-18.
[17] Zhang W, Yoshida T, Tang X. A Comparative Study of TF*IDF, LSI and Multi-Words for Text Classification[J]. Expert Systems with Applications, 2011,38(3):2758-2765.
[18] Yu Shiwen, Duan Huiming, Wu Yunfang. Corpus of Multi-Level Processing for Modern Chinese[DS/OL]. [2019-01-03]. http://dx.doi.org/10.18170/DVN/SEYRX5. (in Chinese)