Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 165-172    DOI: 10.11925/infotech.2096-3467.2019.0640
Reconstructing Tour Routes Based on Travel Notes
Gao Yuan1, Shi Yuanlei2, Zhang Lei2, Cao Tianyi2, Feng Jun2
1School of Economics and Management, Northwest University, Xi’an 710127, China
2School of Information Science and Technology, Northwest University, Xi’an 710127, China
[Objective] This study aims to reconstruct tourists’ itineraries from their travel notes and scenic information. [Methods] First, we combined the TF-IDF and Word2Vec models. Then, we built a named entity recognition method based on text similarity to identify scenic spots in travel notes. Finally, we proposed a model based on the Markov property, prior knowledge, and spatial characteristics to reconstruct tour itineraries. [Results] The recall, precision, and F1 value of the proposed method were 90.72%, 89.65%, and 0.9018, respectively, all better than those of the method based on Conditional Random Fields. The similarity between the reconstructed routes and the actual ones was 83.27%. [Limitations] The completeness of the scenic information may affect the performance of our model. [Conclusions] The proposed method can automatically identify scenic spots and effectively reconstruct travel itineraries.

Key words: Named Entity Recognition; Text Similarity; Markov Property; Travel Reconfiguration
Received: 10 June 2019      Published: 26 April 2020
ZTFLH:  TP393  
Corresponding Authors: Jun Feng



Gao Yuan, Shi Yuanlei, Zhang Lei, Cao Tianyi, Feng Jun. Reconstructing Tour Routes Based on Travel Notes. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 165-172.



The Framework of Automatic Reconstruction Method of Tourist Itinerary
Technology Roadmap of Attractions Identification Method
Word2Vec Model
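The abstract states that TF-IDF and Word2Vec were combined and that scenic spots were recognized by text similarity, but it does not give the exact formulation. The following is a minimal sketch of one common way to do this, assuming a tf-idf weighted average of word vectors compared by cosine similarity. The vectors, weights, gazetteer, and function names are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical 3-dimensional word vectors; a trained Word2Vec model would supply real ones.
TOY_VECTORS = {
    "langmu": [0.9, 0.1, 0.0],
    "monastery": [0.7, 0.3, 0.1],
    "temple": [0.6, 0.4, 0.1],
    "gahai": [0.0, 0.2, 0.9],
    "lake": [0.1, 0.1, 0.8],
}
# Hypothetical tf-idf weights per token; tokens not listed default to 1.0.
TOY_WEIGHTS = {"langmu": 2.0, "gahai": 2.0}
# Hypothetical gazetteer of known scenic spots and their tokens.
GAZETTEER = {
    "Langmu Monastery": ["langmu", "monastery"],
    "Gahai Lake": ["gahai", "lake"],
}


def weighted_vector(tokens, vectors, weights):
    """Average the vectors of known tokens, weighting each by its tf-idf value."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in vectors:
            w = weights.get(tok, 1.0)
            total = [t + w * v for t, v in zip(total, vectors[tok])]
            weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: zero vector
    return [t / weight_sum for t in total]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(tokens, gazetteer, vectors, weights):
    """Return the gazetteer entry most similar to the candidate tokens, with its score."""
    query = weighted_vector(tokens, vectors, weights)
    scored = {
        name: cosine(query, weighted_vector(toks, vectors, weights))
        for name, toks in gazetteer.items()
    }
    name = max(scored, key=scored.get)
    return name, scored[name]


match, score = best_match(["langmu", "temple"], GAZETTEER, TOY_VECTORS, TOY_WEIGHTS)
print(match, round(score, 3))
```

In a sketch like this, a candidate phrase would be accepted as a scenic spot only when its best score exceeds a similarity threshold; the threshold would have to be tuned on labeled data.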
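Attraction                                  tf-idf    Attraction                                        tf-idf
Dingxi Yuhu Park (定西玉湖公园)             1.54      Labrang Monastery (拉卜楞寺)                      3.13
Xiyan Temple (西岩寺)                       1.32      Jiayuguan Fort (嘉峪关关城)                       0.58
Milarepa Buddha Pavilion (米拉日巴佛阁)     2.96      Overhanging Great Wall (悬壁长城)                 0.67
Langmu Monastery (郎木寺)                   2.35      Boluozhuanjing (博罗转井)                         2.82
Gahai Lake (尕海湖)                         0.98      Yadan National Geological Park (雅丹国家地质公园)  0.39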
tf-idf Values for Attractions in the Travel Notes
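The page does not state which tf-idf variant produced these values. As an illustration only, the sketch below computes the textbook form, term frequency times the natural log of inverse document frequency, over a hypothetical tokenized corpus. The corpus and the function name `tf_idf` are assumptions, and the scores it produces are not expected to match the table.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * ln(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized travel notes (not data from the paper).
notes = [
    ["labrang", "monastery", "visit"],
    ["gahai", "lake", "visit"],
    ["labrang", "monastery", "labrang", "visit"],
]
scores = tf_idf(notes)
for i, doc_scores in enumerate(scores):
    print(i, {term: round(value, 3) for term, value in doc_scores.items()})
```

A term that appears in every note, such as "visit" here, scores zero, while a name confined to a single note scores highest. That is the property that makes tf-idf useful for weighting attraction names.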
The Relationship Between Precision and Similarity
The Relationship Between Number of Recognition Errors and Similarity
Method                      Average recall    Average precision    F value
Conditional Random Field    81.38%            75.33%               0.7824
Proposed method             90.72%            89.65%               0.9018
Attractions Recognition Results
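As a consistency check on the recognition results, the F value can be recomputed from the reported average recall and precision. This sketch assumes the F value is the balanced F1, the harmonic mean of precision and recall, which the page does not state explicitly. The function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (average recall, average precision, reported F value) from the results table.
REPORTED = {
    "Conditional Random Field": (0.8138, 0.7533, 0.7824),
    "Proposed method": (0.9072, 0.8965, 0.9018),
}

for method, (recall, precision, reported_f) in REPORTED.items():
    computed = round(f_measure(precision, recall), 4)
    print(f"{method}: computed {computed}, reported {reported_f}")
```

Under this assumption, both recomputed values agree with the reported ones to four decimal places.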
