Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue (2/3): 165-172. DOI: 10.11925/infotech.2096-3467.2019.0640
Reconstructing Tour Routes Based on Travel Notes
Gao Yuan1, Shi Yuanlei2, Zhang Lei2, Cao Tianyi2, Feng Jun2
1School of Economics and Management, Northwest University, Xi’an 710127, China
2School of Information Science and Technology, Northwest University, Xi’an 710127, China

[Objective] This study reconstructs tourists' itineraries from their travel notes and scenic-spot information. [Methods] First, we combined the TF-IDF and Word2Vec models. Then, we built a named entity recognition method based on text similarity to identify scenic spots in travel notes. Finally, we proposed a model based on the Markov property, prior knowledge and spatial characteristics to reconstruct tour itineraries. [Results] The proposed method achieved a recall of 90.72%, a precision of 89.65% and an F1 value of 0.9018, all higher than those of the method based on Conditional Random Fields (CRF). The similarity between the reconstructed routes and the actual ones was 83.27%. [Limitations] The completeness of the scenic-spot information may affect the model's performance. [Conclusions] The proposed method can automatically identify scenic spots and effectively reconstruct travel itineraries.
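The abstract names three ingredients of the reconstruction model (the Markov property, prior knowledge and spatial characteristics) but does not spell out how they are combined. The sketch below is a minimal illustration under stated assumptions, not the authors' model: the prior is a hypothetical table `PRIOR` of first-order transition probabilities P(next | current), as might be estimated from earlier travel notes; the spatial term is a great-circle distance penalty weighted by a hypothetical parameter `lam`; and the reconstructed route is the ordering of the recognized spots with the highest total score. The spot names and coordinates are toy values.

```python
import math
from itertools import permutations

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def step_score(a, b, prior, coords, lam, floor=1e-6):
    """Score of moving from spot a to spot b.

    First-order Markov assumption: the score depends only on the current spot.
    prior[a][b] is a hypothetical transition probability; unseen transitions
    get a small floor probability. lam trades the prior off against distance.
    """
    p = prior.get(a, {}).get(b, floor)
    return math.log(p) - lam * haversine_km(coords[a], coords[b])


def reconstruct_route(spots, prior, coords, lam=0.01):
    """Exhaustively pick the visiting order with the highest total step score."""
    best, best_score = list(spots), -math.inf
    for order in permutations(spots):
        score = sum(step_score(a, b, prior, coords, lam)
                    for a, b in zip(order, order[1:]))
        if score > best_score:
            best, best_score = list(order), score
    return best, best_score


# Toy data (hypothetical spots A, B, C about 111 km apart along the equator).
COORDS = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 2.0)}
PRIOR = {"A": {"B": 0.8}, "B": {"C": 0.7}}
```

With this toy data, `reconstruct_route(["C", "A", "B"], PRIOR, COORDS)` returns the order `["A", "B", "C"]`, since that is the only order in which both transitions are supported by the prior. Exhaustive search costs O(n!) and is only reasonable for a handful of spots; for longer notes, a Held-Karp style dynamic program over subsets would be the natural replacement.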

Key words: Named Entity Recognition; Text Similarity; Markov Property; Travel Reconfiguration
Received: 10 June 2019; Published: 26 April 2020
CLC number: TP393
Corresponding Author: Jun Feng

Cite this article:

Gao Yuan, Shi Yuanlei, Zhang Lei, Cao Tianyi, Feng Jun. Reconstructing Tour Routes Based on Travel Notes. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 165-172.


Figure: Framework of the Automatic Tourist Itinerary Reconstruction Method
Figure: Technical Roadmap of the Attraction Identification Method
Figure: Word2Vec Model
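The abstract says TF-IDF and Word2Vec are combined into a text-similarity-based recognizer for scenic spots, but the exact combination is not given in this section. One common way to combine them, shown here as an assumption rather than the authors' method, is to represent a phrase as the TF-IDF-weighted average of its word vectors and match a candidate mention to the gazetteer entry with the highest cosine similarity, accepting the match only above a threshold. The toy 3-dimensional vectors in `EMB`, the weights in `IDF`, the `GAZETTEER` entries and the `threshold` value are all hypothetical; in practice the vectors would come from a trained Word2Vec model.

```python
import math


def weighted_vector(tokens, emb, idf, dim):
    """TF-IDF-weighted average of token embeddings; unknown tokens are skipped."""
    vec = [0.0] * dim
    total = 0.0
    for tok in tokens:
        if tok in emb:
            w = idf.get(tok, 1.0)  # hypothetical default weight if no idf value
            vec = [v + w * e for v, e in zip(vec, emb[tok])]
            total += w
    return [v / total for v in vec] if total else vec


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_attraction(mention, gazetteer, emb, idf, dim, threshold=0.8):
    """Return (best gazetteer name or None, its similarity)."""
    mv = weighted_vector(mention, emb, idf, dim)
    best_name, best_sim = None, -1.0
    for name, tokens in gazetteer.items():
        sim = cosine(mv, weighted_vector(tokens, emb, idf, dim))
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim < threshold:
        return None, best_sim
    return best_name, best_sim


# Toy embeddings, idf weights and gazetteer (made up for illustration only).
EMB = {
    "jade": [1.0, 0.0, 0.0],
    "lake": [0.0, 1.0, 0.0],
    "park": [0.0, 0.0, 1.0],
    "temple": [0.6, 0.0, 0.8],
}
IDF = {"jade": 2.0, "lake": 1.0, "park": 0.5, "temple": 1.0}
GAZETTEER = {
    "Jade Lake Park": ["jade", "lake", "park"],
    "Jade Temple": ["jade", "temple"],
}
```

Here the shortened mention `["jade", "lake"]` still matches "Jade Lake Park" (cosine about 0.98), while a mention containing only the low-weight word "park" falls below the threshold and is rejected. Weighting by IDF keeps generic words such as "park" or "temple" from dominating the phrase vector.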
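Attraction | tf-idf | Attraction | tf-idf
Dingxi Yuhu Park (定西玉湖公园) | 1.54 | Labrang Monastery (拉卜楞寺) | 3.13
Xiyan Temple (西岩寺) | 1.32 | Jiayuguan Fortress (嘉峪关关城) | 0.58
Milarepa Pavilion (米拉日巴佛阁) | 2.96 | Overhanging Great Wall (悬壁长城) | 0.67
Langmu Temple (郎木寺) | 2.35 | Boluozhuanjing (博罗转井) | 2.82
Gahai Lake (尕海湖) | 0.98 | Yardang National Geopark (雅丹国家地质公园) | 0.39
tf-idf Values of Attractions in the Travel Notes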
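The table lists tf-idf values for attraction names, but this section does not state which tf-idf variant was used, so the sketch below will not reproduce those numbers. It assumes the classic form tf(t, d) × log(N / df(t)), with tf as the term's relative frequency in one travel note, N the number of notes and df(t) the number of notes mentioning the term. Notes are assumed to be pre-tokenized lists in which each attraction name is a single token.

```python
import math
from collections import Counter


def tf_idf(notes, vocab):
    """Score each vocabulary term in each note with tf * log(N / df).

    notes: list of token lists (one list per travel note).
    vocab: attraction names to score.
    Returns one {name: score} dict per note, containing only names that occur.
    """
    n_docs = len(notes)
    vocab_set = set(vocab)
    df = Counter()
    for tokens in notes:
        for name in set(tokens) & vocab_set:
            df[name] += 1
    results = []
    for tokens in notes:
        counts = Counter(tokens)
        length = len(tokens) or 1
        scores = {}
        for name in vocab:
            if counts[name]:
                tf = counts[name] / length
                idf = math.log(n_docs / df[name])
                scores[name] = tf * idf
        results.append(scores)
    return results
```

A term mentioned in every note gets an idf of log(1) = 0, so a spot that every traveler names contributes nothing, while a spot mentioned repeatedly in only one note scores highly there.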
Figure: Relationship Between Precision and Similarity
Figure: Relationship Between the Number of Recognition Errors and Similarity
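The abstract reports an 83.27% similarity between reconstructed and actual routes, and the two figures relate similarity to precision and to the number of recognition errors, but the similarity measure itself is not defined in this section. As one plausible choice (an assumption, not necessarily the authors' metric), the sketch below scores a reconstructed route by the length of its longest common subsequence (LCS) with the actual route, divided by the length of the longer route, so both missing spots and wrongly ordered spots lower the score.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def route_similarity(predicted, actual):
    """LCS length normalized by the longer route (1.0 for two empty routes)."""
    if not predicted and not actual:
        return 1.0
    return lcs_length(predicted, actual) / max(len(predicted), len(actual))
```

For example, swapping two adjacent spots in a four-spot route leaves an LCS of length 3, giving a similarity of 0.75. Under this measure a misrecognized spot (a recognition error) also shortens the common subsequence, which is one way similarity could track recognition quality.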
Method | Average Recall | Average Precision | F1
Conditional Random Fields (CRF) | 81.38% | 75.33% | 0.7824
Proposed method | 90.72% | 89.65% | 0.9018
Attraction Recognition Results
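The F1 column can be checked against the other two columns: F1 is the harmonic mean of precision and recall, 2PR / (P + R). The sketch below recomputes it from the averaged values in the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (average precision, average recall) taken from the results table.
RESULTS = {
    "CRF": (0.7533, 0.8138),
    "Proposed method": (0.8965, 0.9072),
}

for method, (p, r) in RESULTS.items():
    print(f"{method}: F1 = {f1_score(p, r):.4f}")
```

Both recomputed values round to the reported 0.7824 and 0.9018, so the table is consistent with F1 being computed from the averaged precision and recall.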
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938