Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue 5: 118-126. DOI: 10.11925/infotech.2096-3467.2019.0907
Identifying Scenic Spot Entities Based on Improved Knowledge Transfer
Zhao Ping¹, Sun Lianying², Tu Shuai¹, Bian Jianling³, Wan Ying¹
¹Smart City College, Beijing Union University, Beijing 100101, China
²College of Urban Rail Transit and Logistics, Beijing Union University, Beijing 100101, China
³Beijing China-Power Information Technology Co., Ltd., Beijing 100192, China
Abstract  

[Objective] This paper addresses the shortage of labeled data for recognizing scenic spot entities.
[Methods] We propose an improved knowledge transfer algorithm for entity recognition and evaluate the new model on datasets from the People's Daily.
[Results] The precision of our method was 1.62 percentage points higher than that of the model trained on all labeled data.
[Limitations] Methods for expanding the training samples need further study.
[Conclusions] The proposed method requires less labeled data for entity recognition and can provide technical support for tourism recommendation.

Key words: Transfer Learning; BERT; Conditional Random Fields; Scenic Spot Recognition
Received: 05 August 2019; Published: 15 June 2020
CLC Number: TP393
Corresponding Author: Sun Lianying, E-mail: sunlychina@163.com

Cite this article:

Zhao Ping, Sun Lianying, Tu Shuai, Bian Jianling, Wan Ying. Identifying Scenic Spot Entities Based on Improved Knowledge Transfer. Data Analysis and Knowledge Discovery, 2020, 4(5): 118-126.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0907 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I5/118

BBC Entity Recognition Model
Algorithm Structure
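Judging from the keywords (BERT, Conditional Random Fields) and the layer ablation in the Model Layer Verification table (CRF, BiLSTM, BiLSTM+CRF, BBC), BBC appears to stack BERT, a BiLSTM and a CRF output layer; that expansion is our reading and is not spelled out on this page. At inference time a linear-chain CRF layer chooses the best-scoring tag sequence by Viterbi decoding. The following is a minimal generic sketch over the O/B-SE/I-SE tag set from the labeling example, not the authors' implementation; the `START` pseudo-tag, the function name and all scores are illustrative assumptions.

```python
from typing import Dict, List, Tuple

TAGS = ["O", "B-SE", "I-SE"]
NEG_INF = float("-inf")


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Best tag path for a linear-chain CRF (all scores in log space).

    emissions[t][tag]     score of `tag` at position t
    transitions[(a, b)]   score of moving from tag a to tag b;
                          ("START", b) scores the first tag.
    Pairs missing from `transitions` are treated as forbidden.
    """
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] + transitions.get(("START", tag), NEG_INF)
             for tag in TAGS}]
    back: List[Dict[str, str]] = []  # back-pointers for positions 1..n-1
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            # pick the predecessor that maximises path score + transition score
            prev, score = max(
                ((p, best[-1][p] + transitions.get((p, cur), NEG_INF)) for p in TAGS),
                key=lambda item: item[1],
            )
            scores[cur] = score + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # start from the best final tag and follow the back-pointers
    tag = max(best[-1], key=best[-1].get)
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Illustrative scores only; O -> I-SE and START -> I-SE are omitted, i.e. forbidden.
transitions = {
    ("START", "O"): 0.0, ("START", "B-SE"): 0.0,
    ("O", "O"): 0.0, ("O", "B-SE"): 0.0,
    ("B-SE", "O"): 0.0, ("B-SE", "B-SE"): -1.0, ("B-SE", "I-SE"): 0.5,
    ("I-SE", "O"): 0.0, ("I-SE", "B-SE"): -1.0, ("I-SE", "I-SE"): 0.5,
}
emissions = [
    {"O": 2.0, "B-SE": 0.5, "I-SE": 0.0},
    {"O": 0.2, "B-SE": 1.0, "I-SE": 1.5},
    {"O": 0.3, "B-SE": 0.2, "I-SE": 1.2},
]
print(viterbi(emissions, transitions))  # ['O', 'B-SE', 'I-SE']
```

Taking the per-token argmax of these emissions would give O, I-SE, I-SE, which is not a valid BIO sequence; because the O → I-SE transition is forbidden, decoding returns O, B-SE, I-SE instead. This kind of label-consistency constraint is what a CRF layer typically adds on top of a BiLSTM.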
Type Count
Entities 74,430
Non-entities 457,040
Total 531,470
Data Distribution
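The counts above show a strong class imbalance. The page does not say what unit is counted (characters, words or tokens), so the sketch below simply treats them as labeled units and computes each label's share; the function name `label_shares` is hypothetical.

```python
from typing import Dict


def label_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all labeled units that falls under each label."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


counts = {"entity": 74_430, "non_entity": 457_040}
assert sum(counts.values()) == 531_470          # matches the "Total" row
for label, share in label_shares(counts).items():
    print(f"{label}: {share:.2%}")              # entity: 14.00%, non_entity: 86.00%
print(f"non-entity per entity: {counts['non_entity'] / counts['entity']:.2f}")  # 6.14
```

Roughly six non-entity units occur for every entity unit, which is one reason the tables that follow report precision and recall separately rather than a single accuracy figure.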
POS Label
r O
t O
t O
v O
ul O
n B-SE
n I-SE
Labeling Examples
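In the labeling example, tokens outside any scenic spot carry O, and a scenic spot entity is tagged B-SE on its first token and I-SE on the following ones (a BIO scheme; reading SE as "scenic entity" is our assumption). The word column did not survive extraction, so this minimal sketch uses hypothetical placeholder tokens with the same tag sequence and shows how entity spans are read back from BIO tags; `bio_to_spans` is an illustrative name.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str], entity_type: str = "SE") -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, for each B-/I- entity."""
    begin, inside = f"B-{entity_type}", f"I-{entity_type}"
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == begin:
            if start is not None:          # a new B- closes the previous entity
                spans.append((start, i))
            start = i
        elif tag == inside and start is not None:
            continue                       # still inside the current entity
        else:                              # any other tag: O, or an I- with no open entity
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:                  # entity running to the end of the sentence
        spans.append((start, len(tags)))
    return spans


# Tag column of the labeling example; the words are hypothetical placeholders.
tags = ["O", "O", "O", "O", "O", "B-SE", "I-SE"]
tokens = ["w1", "w2", "w3", "w4", "w5", "w6", "w7"]
for start, end in bio_to_spans(tags):
    print(tokens[start:end])  # ['w6', 'w7']
```

A stray I-SE with no preceding B-SE is ignored here; other decoders promote it to the start of a new entity, and the choice affects span-level P, R and F1.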
Data Characteristics
Method P R F1
CRF 86.67% 87.84% 87.25%
BiLSTM 93.25% 87.98% 90.53%
BiLSTM+CRF 94.97% 92.10% 93.52%
BBC 96.79% 96.85% 96.74%
Model Layer Verification
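The page does not define F1; the standard definition is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch applying it to the P and R columns of the first three rows above (it reproduces the CRF row's 87.25%):

```python
def f1(p: float, r: float) -> float:
    """F1 as the harmonic mean of precision p and recall r (same units in and out)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# P and R of the first three rows above, in percent
rows = {"CRF": (86.67, 87.84), "BiLSTM": (93.25, 87.98), "BiLSTM+CRF": (94.97, 92.10)}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1(p, r):.2f}%")  # the CRF line prints 87.25%
```

Because P and R are reported to two decimals, an F1 recomputed from them can differ from the printed value in the last digit.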
i P R F1
0.40 84.64% 64.01% 72.89%
0.45 87.26% 69.28% 77.24%
0.50 90.93% 53.85% 67.64%
0.55 93.14% 55.42% 69.49%
0.60 91.41% 55.74% 69.25%
Experimental Results with Different Values of i
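The page does not say what i denotes; the table only shows a sweep from 0.40 to 0.60. A setting is typically picked from such a sweep by its best F1, but that criterion is our assumption. A minimal sketch using the values in the table above (`best_setting` is a hypothetical helper):

```python
from typing import Dict, Tuple

# i -> (P, R, F1) in percent, copied from the table above
sweep_i: Dict[float, Tuple[float, float, float]] = {
    0.40: (84.64, 64.01, 72.89),
    0.45: (87.26, 69.28, 77.24),
    0.50: (90.93, 53.85, 67.64),
    0.55: (93.14, 55.42, 69.49),
    0.60: (91.41, 55.74, 69.25),
}


def best_setting(sweep: Dict[float, Tuple[float, float, float]], metric: int = 2) -> float:
    """Parameter value with the largest result at index `metric` (0 = P, 1 = R, 2 = F1)."""
    return max(sweep, key=lambda value: sweep[value][metric])


print(best_setting(sweep_i))            # 0.45 (highest F1, 77.24%)
print(best_setting(sweep_i, metric=0))  # 0.55 (highest P, 93.14%)
```

The two criteria disagree: precision peaks at 0.55 while F1 peaks at 0.45, because recall drops sharply above 0.45.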
simsen P R F1
0.40 89.01% 56.07% 68.80%
0.45 91.30% 57.79% 70.78%
0.50 92.05% 58.16% 71.28%
0.55 91.03% 55.99% 69.33%
0.60 90.81% 56.50% 69.66%
Experimental Results with Different Values of simsen
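The name simsen suggests a sentence-similarity threshold, and the reference list includes a survey of text similarity approaches [16] and a study of TF*IDF representations [17]. The page does not say how the similarity is computed, so the following is only a sketch under that assumption: it keeps candidate sentences whose TF-IDF cosine similarity to at least one reference sentence reaches the threshold. The function names, the whitespace tokenization and the example sentences are hypothetical; Chinese text would need word or character tokenization instead.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF with raw term counts and smoothed idf = ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def select_similar(candidates: List[str], references: List[str],
                   threshold: float = 0.5) -> List[str]:
    """Keep candidates whose best similarity to any reference is >= threshold."""
    vecs = tfidf_vectors([s.split() for s in candidates + references])
    cand_vecs, ref_vecs = vecs[:len(candidates)], vecs[len(candidates):]
    return [s for s, cv in zip(candidates, cand_vecs)
            if max((cosine(cv, rv) for rv in ref_vecs), default=0.0) >= threshold]


cands = ["visit the summer palace in beijing", "stock prices fell sharply today"]
refs = ["tourists visit the palace museum in beijing"]
print(select_similar(cands, refs, threshold=0.5))  # ['visit the summer palace in beijing']
```

The first candidate scores about 0.66 against the reference and the second 0.0, so raising the threshold to 0.70 would drop both. IDF is estimated over the pooled candidates and references; the smoothed formula matches the default of scikit-learn's `TfidfVectorizer` (`smooth_idf=True`).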
SEA P R F1
0.40 87.26% 79.28% 83.07%
0.45 90.93% 83.85% 87.24%
0.50 93.14% 85.42% 89.11%
0.55 91.41% 85.74% 88.48%
0.60 90.81% 83.50% 87.00%
Experimental Results with Different Values of SEA
μ P R F1
1/5 93.14% 85.42% 89.11%
1/4 95.06% 82.12% 88.12%
1/3 97.91% 89.15% 93.30%
1/2 98.41% 88.09% 92.97%
Experimental Results with Different Values of μ
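The comparison table that follows lists BBC at μ = 1 as the all-annotations baseline and AttTrBBC at μ from 1/5 to 1/2, which suggests μ is the fraction of labeled data used for training. Reading μ that way (an interpretation, not stated on this page), this sketch draws a reproducible labeled subset of that size; `labeled_subset` and the placeholder corpus are hypothetical.

```python
import random
from fractions import Fraction
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def labeled_subset(sentences: Sequence[T], mu: Fraction, seed: int = 0) -> List[T]:
    """Random subset of round(mu * len(sentences)) items, drawn without replacement."""
    if not 0 < mu <= 1:
        raise ValueError("mu must be in (0, 1]")
    k = round(mu * len(sentences))
    rng = random.Random(seed)            # fixed seed -> same subset on every run
    return rng.sample(list(sentences), k)


corpus = [f"sentence-{i}" for i in range(1000)]  # placeholder data
for mu in (Fraction(1, 5), Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)):
    print(mu, len(labeled_subset(corpus, mu)))   # 1/5 200, 1/4 250, 1/3 333, 1/2 500
```

Using `Fraction` keeps values such as 1/3 exact until the final rounding, and the fixed seed makes runs at different μ comparable.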
Model μ P R F1
BBC 1 96.79% 96.85% 96.74%
AttTrBBC 1/5 93.14% 85.42% 89.11%
AttTrBBC 1/4 95.06% 82.12% 88.12%
AttTrBBC 1/3 97.91% 89.15% 93.30%
AttTrBBC 1/2 98.41% 88.09% 92.97%
Comparison Between All Annotations and a Few Annotations
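The abstract's 1.62% figure can be traced in this table: it is the precision gap, in percentage points, between AttTrBBC at μ = 1/2 and BBC at μ = 1. The same AttTrBBC row trails BBC in recall and F1. A short check of the differences:

```python
bbc = {"P": 96.79, "R": 96.85, "F1": 96.74}              # mu = 1 (all annotations)
att_tr_bbc_half = {"P": 98.41, "R": 88.09, "F1": 92.97}  # mu = 1/2

# difference in percentage points, rounded to the table's precision
gaps = {m: round(att_tr_bbc_half[m] - bbc[m], 2) for m in bbc}
print(gaps)  # {'P': 1.62, 'R': -8.76, 'F1': -3.77}
```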
Method P R F1
HMM[11] 85.49% 90.14% 87.75%
CRF[10] 83.40% 95.70% 89.10%
CNN[12] 95.03% 92.80% 93.90%
AttTrBBC 98.41% 88.09% 92.97%
Comparative Analysis of Four Methods
References
[1] Grishman R, Sundheim B. Message Understanding Conference-6: A Brief History[C]// Proceedings of the 16th International Conference on Computational Linguistics, Copenhagen, Denmark. Stroudsburg, PA: ACL, 1996: 466-471.
[2] Hanisch D, Fundel K, Mevissen H T, et al. ProMiner: Rule-based Protein and Gene Entity Recognition[J]. BMC Bioinformatics, 2005, 6(1): S14.
[3] Lample G, Ballesteros M, Subramanian S, et al. Neural Architectures for Named Entity Recognition[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA. Stroudsburg, PA: ACL, 2016: 260-270.
[4] Dong C, Zhang J, Zong C, et al. Character-based LSTM-CRF with Radical-level Features for Chinese Named Entity Recognition[C]// Proceedings of the Natural Language Understanding and Intelligent Applications, Kunming, China. Berlin, Germany: Springer, 2016: 239-250.
[5] Patil N V, Patil A S, Pawar B V. HMM Based Named Entity Recognition for Inflectional Language[C]// Proceedings of the 2017 International Conference on Computer, Communications and Electronics, Jaipur, India. Piscataway, NJ: IEEE, 2017: 565-572.
[6] Xue Zhengshan, Guo Jianyi, Yu Zhengtao, et al. Recognition of HMM-Based Chinese Tourist Attractions[J]. Journal of Kunming University of Science and Technology: Science and Technology, 2009, 34(6): 44-48. (in Chinese)
[7] Guo Jianyi, Xue Zhengshan, Yu Zhengtao, et al. Named Entity Recognition for the Tourism Domain Based on Cascaded Conditional Random Fields[J]. Journal of Chinese Information Processing, 2009, 23(5): 47-52. (in Chinese)
[8] Chiu J P C, Nichols E. Named Entity Recognition with Bidirectional LSTM-CNNs[J]. Transactions of the Association for Computational Linguistics, 2016, 4: 357-370.
[9] Huang Han, Wang Hongyu, Wang Xiaoguang. Automatic Recognizing Legal Terminologies with Active Learning and Conditional Random Field Model[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 66-74. (in Chinese)
[10] Greenberg N, Bansal T, Verga P, et al. Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium. Stroudsburg, PA: ACL, 2018: 2824-2829.
[11] Liu Xiaoan, Peng Tao. Research on Chinese Scenic Spot Named Entity Recognition Based on Convolutional Neural Network[J/OL]. Computer Engineering and Applications. [2019-08-01]. http://kns.cnki.net/kcms/detail/11.2127.TP.20190307.1807.007.html. (in Chinese)
[12] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, USA. Stroudsburg, PA: ACL, 2019: 4171-4186.
[13] Hochreiter S, Schmidhuber J. Long Short-Term Memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[14] Sutton C, McCallum A. An Introduction to Conditional Random Fields[J]. Foundations and Trends® in Machine Learning, 2012, 4(4): 267-373.
[15] Peng D L, Wang Y R, Liu C, et al. TL-NER: A Transfer Learning Model for Chinese Named Entity Recognition[J]. Information Systems Frontiers, 2019. https://doi.org/10.1007/s10796-019-09932-y.
[16] Gomaa W H, Fahmy A A. A Survey of Text Similarity Approaches[J]. International Journal of Computer Applications, 2013, 68(13): 13-18.
[17] Zhang W, Yoshida T, Tang X. A Comparative Study of TF*IDF, LSI and Multi-Words for Text Classification[J]. Expert Systems with Applications, 2011, 38(3): 2758-2765.
[18] Yu Shiwen, Duan Huiming, Wu Yunfang. Corpus of Multi-Level Processing for Modern Chinese[DS/OL]. [2019-01-03]. http://dx.doi.org/10.18170/DVN/SEYRX5.