Please wait a minute...
Advanced Search
数据分析与知识发现  2023, Vol. 7 Issue (7): 58-73     https://doi.org/10.11925/infotech.2096-3467.2022.0704
  研究论文 本期目录 | 过刊浏览 | 高级检索 |
基于并置模式的轨迹热点挖掘研究*
颜瑞彬,尹德春(),顾益军
中国人民公安大学信息网络安全学院 北京 102600
Mining Trajectory Hotspots Based on Co-location Patterns
Yan Ruibin,Yin Dechun(),Gu Yijun
College of Information and Cyber Security, People's Public Security University of China, Beijing 102600, China
全文: PDF (3386 KB)   HTML ( 6
输出: BibTeX | EndNote (RIS)      
摘要 

【目的】 为降低轨迹热点挖掘的时空复杂度,针对不同的轨迹数据特征,分别提出基于N度路径表连接、基于N度路径表遍历和基于图数据库的轨迹热点挖掘算法。【方法】 如果轨迹数据不存在明显的图结构,基于N度路径表连接和基于N度路径表遍历的算法根据轨迹数据分布是否密集,选择连接或遍历的方式对路径表进行多次迭代,从而得到轨迹热点。如果轨迹数据明显存在图结构,基于图数据库的算法在图数据库中做遍历搜索和剪枝优化,从而得到轨迹热点。【结果】 在ChoroChronos开源真实数据集上展开实验。在时间复杂度上,基于图数据库的轨迹热点挖掘算法与表现最好的对比算法相比,运行时间减少1/4。在空间复杂度上,基于N度路径表连接和基于N度路径表遍历的算法与表现最好的对比算法相比,占用内存空间减少2/3。【局限】 未考虑轨迹序列包含的时序特征,未在更广泛的数据集上展开实验。【结论】 与其他的轨迹热点挖掘对比算法相比,本文算法能够有效降低时空复杂度。

服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
颜瑞彬
尹德春
顾益军
关键词 轨迹大数据热点路径挖掘轨迹序列有向图并置模式    
Abstract

[Objective] This paper proposes multiple Trajectory Traversal Hotspots Mining algorithms based on different trajectory characteristics like N-Degree Trajectory Table Join, N-Degree Trajectory Table Traversal, and graph databases. These algorithms will help us reduce the time and space complexity of trajectory hotspots mining, [Methods] If the trajectory data does not form a complete graph structure, we will use the N-Degree Trajectory Table Join algorithm or N-Degree Trajectory Table Traversal algorithm to iterate the path table multiple times. Based on the distribution density of the trajectory data, the algorithms help us obtain the hotspots. If the trajectory data forms a graph structure, the Trajectory Traversal Hotspots Search algorithm will perform traversal search and pruning optimization to obtain the trajectory hotspots. [Results] We conducted experiments with the ChoroChronos open-source dataset. Regarding time complexity, the running time of the Trajectory Traversal Hotspots Search algorithm was reduced by 25% compared with the best comparison algorithm. Regarding space complexity, the N-Degree Trajectory Table Join algorithm and N-Degree Trajectory Table Traversal algorithm consume 67% less memory space than the best comparison algorithm. [Limitations] We still need to fully utilize the temporal features in the trajectory sequences and should conduct experiments on a more comprehensive dataset. [Conclusions] Compared with other trajectory hotspots mining algorithms, the proposed one effectively reduces the space and time complexity.

Key wordsTrajectory Big Data    Hotspot Detection    Trajectory Sequence    Directed Graph    Co-Location Pattern
收稿日期: 2022-07-08      出版日期: 2022-11-09
ZTFLH:  TP301  
  G350  
基金资助:*公安部科技强警基础工作专项项目(2020GABJC02);中国人民公安大学基本科研业务费项目的研究成果之一(2022JKF02039)
通讯作者: 尹德春, E-mail: yindechun@ppsuc.edu.cn。   
引用本文:   
颜瑞彬, 尹德春, 顾益军. 基于并置模式的轨迹热点挖掘研究*[J]. 数据分析与知识发现, 2023, 7(7): 58-73.
Yan Ruibin, Yin Dechun, Gu Yijun. Mining Trajectory Hotspots Based on Co-location Patterns. Data Analysis and Knowledge Discovery, 2023, 7(7): 58-73.
链接本文:  
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0704      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I7/58
Fig.1  轨迹序列关系示例
Fig.2  有关轨迹数据的图结构示例
Fig.3  有关轨迹数据的疏密程度示例
N阶子路径 包含N阶子路径的轨迹序列集合
n1n2→…→nk {R1,…,Rj,…,Rn}
Table 1  N阶“路径长度-频繁度”表
Fig.4  NDTTJ算法流程
Fig.5  NDTTT算法流程
Fig.6  TTHS算法流程
算法的复杂度 NDTTJ NDTTT TTSH
时间复杂度 O ( i = 2 k ( c i 2 ) + i = 1 k ( c i + n ) ) O ( l o g 2 n + i = 1 k c i + i = 2 k ( c i × l o g 2 n ) ) O ( n )
空间复杂度 O (n + 2 c 1 + i = 2 k ( c i 2 ) ) O (2 n + 2 c 1 + i = 2 k c i ) O ( n + c f )
时间渐进复杂度 O ( n 2 ) O ( n × l o g n ) O ( n )
空间渐进复杂度 O ( n 2 ) O ( n ) O ( n )
Table 2  三种算法的时空复杂度
参数 意义
kmin 路径长度阈值
mmin 频繁度阈值
N 实例个数
α 剪枝量
T 挖掘时所用时间
M 运行时占用内存空间
Table 3  实验数据参数
软硬件环境 参数值
CPU Intel(R) Core(TM) i5-9300H, 2.40GHz
操作系统 Windows 10 64 bit
Python Python 3.6.5 (64-bit)
Neo4j Neo4j-Community-3.5.14
Table 4  实验平台参数
Fig.7  数据数量对时间的影响
算法 1 000条实验数据量下 5 000条实验数据量下
阈值
为3
阈值
为5
阈值
为7
阈值
为3
阈值
为5
阈值
为7
TSPMG-B 3 559 5 664 7 692 14 462 25 807 37 022
TSPMG-P 2 578 3 827 4 970 11 019 18 456 25 287
NDTTJ 2 537 2 523 2 519 10 208 9 893 9 619
NDTTT 2 545 2 529 2 521 10 012 9 907 9 662
Table 5  不同算法处理位置信息总数量对比
Fig.8  剪枝量与数据数量、阈值关系
Fig.9  阈值与时间关系
[1] 曾庆田, 戴明弟, 李超, 等. 轨迹数据融合用户表示方法的重要位置发现[J]. 数据分析与知识发现, 2019, 3(6): 75-82.
[1] (Zeng Qingtian, Dai Mingdi, Li Chao, et al. Discovering Important Locations with User Representation and Trace Data[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 75-82.)
[2] 陆妍玲, 韦晶闪, 赵雨萌, 等. 提取热点区域的时空轨迹数据聚类分析[J]. 数学的实践与认识, 2021, 51(13): 129-138.
[2] (Lu Yanling, Wei Jingshan, Zhao Yumeng, et al. Research on Cluster Analysis of Spatio-Temporal Trajectory Data for Extracting Hot Spots[J]. Mathematics in Practice and Theory, 2021, 51(13): 129-138.)
[3] Li H H, Liu J X, Wu K F, et al. Spatio-Temporal Vessel Trajectory Clustering Based on Data Mapping and Density[J]. IEEE Access, 2018, 6: 58939-58954.
doi: 10.1109/ACCESS.2018.2866364
[4] 王侃, 梅克进, 朱家辉, 等. 基于时空轨迹的热点区域提取[J]. 电子科技大学学报, 2019, 48(6): 925-930.
[4] (Wang Kan, Mei Kejin, Zhu Jiahui, et al. Hotspots Extraction Based on Spatial-Temporal Trajectory Data[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(6): 925-930.)
[5] Gariel M, Srivastava A N, Feron E. Trajectory Clustering and an Application to Airspace Monitoring[J]. IEEE Transactions on Intelligent Transportation Systems, 2011, 12(4): 1511-1524.
doi: 10.1109/TITS.2011.2160628
[6] Yuan G, Sun P H, Zhao J, et al. A Review of Moving Object Trajectory Clustering Algorithms[J]. Artificial Intelligence Review, 2017, 47(1): 123-144.
doi: 10.1007/s10462-016-9477-7
[7] 牟乃夏, 徐玉静, 张恒才, 等. 移动轨迹聚类方法研究综述[J]. 测绘通报, 2018(1): 1-7.
doi: 10.13474/j.cnki.11-2246.2018.0001
[7] (Mou Naixia, Xu Yujing, Zhang Hengcai, et al. A Review of the Mobile Trajectory Clustering Methods[J]. Bulletin of Surveying and Mapping, 2018(1): 1-7.)
doi: 10.13474/j.cnki.11-2246.2018.0001
[8] 高强, 张凤荔, 王瑞锦, 等. 轨迹大数据: 数据处理关键技术研究综述[J]. 软件学报, 2017, 28(4): 959-992.
[8] (Gao Qiang, Zhang Fengli, Wang Ruijin, et al. Trajectory Big Data: A Review of Key Technologies in Data Processing[J]. China Industrial Economics, 2017, 28(4): 959-992.)
[9] Pei J, Han J W, Mortazavi-Asl B, et al. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth[C]// Proceedings of the 17th International Conference on Data Engineering. IEEE, 2002: 215-224.
[10] 吕锋, 张炜玮. 4种序列模式挖掘算法的特性研究[J]. 武汉理工大学学报, 2006, 28(2): 57-60.
doi: 10.1007/s11595-013-0640-6
[10] (Lu Feng, Zhang Weiwei. Research on the Characters of Four Sequential Patterns Mining Algorithms[J]. Journal of Wuhan University of Technology, 2006, 28(2): 57-60.)
doi: 10.1007/s11595-013-0640-6
[11] 胡自松, 王丽珍, Vanha Tran, 等. 基于图数据库的空间频繁并置模式挖掘[J]. 计算机科学与探索, 2022, 16(4): 806-821.
doi: 10.3778/j.issn.1673-9418.2010015
[11] (Hu Zisong, Wang Lizhen, Vanha Tran, et al. Mining Spatial Prevalent Co-Location Patterns Based on Graph Databases[J]. Journal of Frontiers of Computer Science & Technology, 2022, 16(4): 806-821.)
doi: 10.3778/j.issn.1673-9418.2010015
[12] Zheng K, Zheng Y, Yuan N J, et al. Online Discovery of Gathering Patterns over Trajectories[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1974-1988.
doi: 10.1109/TKDE.2013.160
[13] Zhao P X, Qin K, Ye X Y, et al. A Trajectory Clustering Approach Based on Decision Graph and Data Field for Detecting Hotspots[J]. International Journal of Geographical Information Science, 2017, 31(6): 1101-1127.
[14] Yang Y F, Cui Z, Wu J, et al. Trajectory Analysis Using Spectral Clustering and Sequene Pattern Mining[J]. Journal of Computational Information Systems, 2012, 8(6):2637-2645.
[15] Khetarpaul S, Chauhan R, Gupta S K, et al. Mining GPS Data to Determine Interesting Locations[C]// Proceedings of the 8th International Workshop on Information Integration on the Web: In Conjunction with WWW 2011. New York: ACM, 2011: 1-6.
[16] Li F, Shi W Z, Zhang H. A Two-Phase Clustering Approach for Urban Hotspot Detection with Spatiotemporal and Network Constraints[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 14: 3695-3705.
doi: 10.1109/JSTARS.2021.3068308
[17] Koperski K, Han J W. Discovery of Spatial Association Rules in Geographic Information Databases[C]// Proceedings of the 4th International Symposium on Advances in Spatial Databases. New York: ACM, 1995: 47-66.
[18] Agrawal R, Srikant R. Mining Sequential Patterns[C]// Proceedings of the 11th International Conference on Data Engineering. IEEE, 2002: 3-14.
[19] Bao X G, Wang L Z. A Clique-Based Approach for Co-Location Pattern Mining[J]. Information Sciences, 2019, 490: 244-264.
doi: 10.1016/j.ins.2019.03.072
[20] Wang X X, Lei L, Wang L Z, et al. Spatial Co-Location Pattern Discovery Incorporating Fuzzy Theory[J]. IEEE Transactions on Fuzzy Systems, 2022, 30(6): 2055-2072.
doi: 10.1109/TFUZZ.2021.3074074
[21] Lei M T, Zhang X, Chu L Y, et al. Finding Route Hotspots in Large Labeled Networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(6): 2479-2492.
doi: 10.1109/TKDE.2019.2956924
[22] Lei M T, Chu L Y, Wang Z F, et al. Mining Top-K Sequential Patterns in Transaction Database Graphs[J]. World Wide Web, 2020, 23(1): 103-130.
doi: 10.1007/s11280-019-00686-w
[23] Yoo J S, Boulware D, Kimmey D. A Parallel Spatial Co-Location Mining Algorithm Based on MapReduce[C]// Proceedings of the IEEE International Congress on Big Data. IEEE, 2014: 25-31.
[24] Refaye E M, Hegazy O. Parallel Co-Location Pattern Mining Discovery Constraint Neighborhood Approach[J]. International Journal of Applied Engineering Research, 2016, 11(1): 586-591.
[25] Shekhar S, Huang Y. Discovering Spatial Co-Location Patterns: A Summary of Results[C]// Proceedings of the 7th International Symposium on Spatial and Temporal Databases. 2001: 236-256.
[26] Huang Y, Shekhar S, Xiong H. Discovering Colocation Patterns from Spatial Data Sets: A General Approach[J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(12): 1472-1485.
doi: 10.1109/TKDE.2004.90
[27] 徐文进, 管克航, 马越, 等. 基于K-means算法的轨迹数据热点挖掘算法[J]. 计算机与现代化, 2021(10): 23-28.
[27] (Xu Wenjin, Guan Kehang, Ma Yue, et al. Track Data Hot Spot Mining Algorithm Based on K-Means[J]. Computer and Modernization, 2021(10): 23-28.)
[28] 李旭东, 成烽. 一种基于密度峰值聚类的经典轨迹计算方法[J]. 中国电子科学研究院学报, 2019, 14(9): 967-972.
[28] (Li Xudong, Cheng Feng. Computing Classical Trajectories Using Density-Peak Based Clustering[J]. Journal of China Academy of Electronics and Information Technology, 2019, 14(9): 967-972.)
[29] 吴笛, 杜云艳, 易嘉伟, 等. 基于密度的轨迹时空聚类分析[J]. 地球信息科学学报, 2015, 17(10): 1162-1171.
doi: 10.3724/SP.J.1047.2015.01162
[29] (Wu Di, Du Yunyan, Yi Jiawei, et al. Density-Based Spatiotemporal Clustering Analysis of Trajectories[J]. Journal of Geo-Information Science, 2015, 17(10): 1162-1171.)
doi: 10.3724/SP.J.1047.2015.01162
[30] 沈志宏, 赵子豪, 王海波. 以图为中心的新型大数据技术栈研究[J]. 数据分析与知识发现, 2020, 4(7): 50-65.
[30] (Shen Zhihong, Zhao Zihao, Wang Haibo. Big Data Technology Stack Shifting: From SQL Centric to Graph Centric[J]. Data Analysis and Knowledge Discovery, 2020, 4(7): 50-65.)
[31] Lake P, Crowther P. Concise Guide to Databases[M]. London: Springer, 2013.
[32] 吕绍仟, 孟凡荣, 袁冠. 基于轨迹结构的移动对象热点区域发现[J]. 计算机应用, 2017, 37(1): 54-59.
doi: 10.11772/j.issn.1001-9081.2017.01.0054
[32] (Lyu Shaoqian, Meng Fanrong, Yuan Guan. Trajectory Structure-Based Moving Object Hotspots Discovery[J]. Journal of Computer Applications, 2017, 37(1): 54-59.)
doi: 10.11772/j.issn.1001-9081.2017.01.0054
[33] Pelekis N. Trucks Revised[EB/OL]. (2012-06-14). [2022-03-24]. https://ChoroChronos.datastories.org/?q=node/10.
[34] Zhidchenko T V, Seredina M N, Udintsova N M, et al. Design of Energy-Loaded Systems Using the Neo4j Graph Database[J]. IOP Conference Series: Earth and Environmental Science, 2021, 659(1): 012108.
doi: 10.1088/1755-1315/659/1/012108
[1] 仇丽青,贾玮,范鑫. 基于重叠社区的影响力最大化算法 *[J]. 数据分析与知识发现, 2019, 3(7): 94-102.
[2] 陈晓威, 史昱天. 社会网络中关键节点的识别——基于符号网络的PageRank算法改进[J]. 数据分析与知识发现, 2017, 1(8): 68-75.
[3] 薛福亮, 刘君玲. 基于用户间信任关系改进的协同过滤推荐方法*[J]. 数据分析与知识发现, 2017, 1(7): 90-99.
[4] 李志鹏, 李卫忠. 基于可拓小生境量子粒子群算法的特征选择*[J]. 数据分析与知识发现, 2017, 1(7): 82-89.
[5] 吴应良, 姚怀栋, 李成安. 一种引入间接信任关系的改进协同过滤推荐算法[J]. 现代图书情报技术, 2015, 31(9): 38-45.
[6] 姜书浩, 潘旭华, 薛福亮. 一种基于项目聚类的自主推荐多样性优化算法[J]. 现代图书情报技术, 2015, 31(5): 34-41.
[7] 原福永, 蔡红蕾. 一种在信任网络中随机游走的推荐算法[J]. 现代图书情报技术, 2014, 30(10): 70-75.
[8] 薛福亮, 张慧颖. 一种基于自组织映射与径向基函数预测补值的协同过滤推荐方法[J]. 现代图书情报技术, 2014, 30(7): 56-63.
[9] 姜书浩, 薛福亮. 一种利用协同过滤预测和模糊相似性改进的基于内容的推荐方法[J]. 现代图书情报技术, 2014, 30(2): 41-47.
[10] 张慧颖, 薛福亮. 一种利用Vague集理论改进的协同过滤推荐算法[J]. 现代图书情报技术, 2012, 28(3): 35-39.
[11] 张慧颖, 薛福亮. 一种集成客户终身价值与协同过滤的推荐方法[J]. 现代图书情报技术, 2012, 28(1): 46-52.
[12] 田金凤, 曾新红, 黄华军, 林伟明. 中文叙词表本体概念定义注释的自动构建研究[J]. 现代图书情报技术, 2011, (11): 9-16.
[13] 李英 吴园园 宁福锦. 基于PSO的K-means改进算法在证券客户细分中的应用[J]. 现代图书情报技术, 2010, 26(7/8): 88-94.
[14] 张巍,于洋,游宏梁. 面向词汇知识库自动构建的概念术语关系识别[J]. 现代图书情报技术, 2009, 25(11): 10-16.
[15] 丁振国,李成家,田宛欣. Web呼叫中心路由和排队策略研究[J]. 现代图书情报技术, 2009, 25(7-8): 65-69.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 2015 《数据分析与知识发现》编辑部
地址:北京市海淀区中关村北四环西路33号 邮编:100190
电话/传真:(010)82626611-6626,82624938
E-mail:jishu@mail.las.ac.cn