Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (2): 14-31    DOI: 10.11925/infotech.2096-3467.2020.1026
Identifying Relationship Between Pollution Sources and Cancer Cases with Spatial Ordered Pair Patterns
Xie Wang, Wang Lizhen, Chen Hongmei, Zeng Lanqing
School of Information Science and Engineering, Yunnan University, Kunming 650500, China
Abstract  

[Objective] This paper identifies the relationship between pollution sources and cancer cases, addressing the problem that methods based on spatial co-location patterns discover too many non-pertinent patterns. [Methods] First, we combined the properties of the Voronoi diagram with the star instance model. Second, we defined the proximity relationship between spatial instances and the concept of spatial ordered pair patterns. Third, we measured the prevalence and the influence of spatial ordered pair patterns based on the distance attenuation and influence superposition effects. Finally, we proposed a basic algorithm and an optimized algorithm to mine spatial ordered pair patterns. [Results] The proposed algorithms revealed more pertinent relationships that the traditional algorithms could not identify, and they returned far fewer patterns in total than the traditional algorithms. Compared with the basic algorithm, the optimized algorithm achieved a pruning rate above 80%. The larger the data set, the better the results. [Limitations] All data are treated as point spatial objects by default; extended spatial objects merit further study. [Conclusions] Spatial ordered pair patterns can effectively identify the relationship between pollution sources and cancer cases.

Key words: Spatial Data Mining; Spatial Ordered Pair Pattern; Voronoi Diagram; Pollution Source; Cancer Case
Received: 21 October 2020      Published: 11 March 2021
ZTFLH:  TP391  
Fund: National Natural Science Foundation of China (61966036); National Natural Science Foundation of China (61662086); Project of Innovative Research Team of Yunnan Province of China (2018HC019)
Corresponding Authors: Wang Lizhen ORCID:0000-0003-2214-2299     E-mail: lzhwang@ynu.edu.cn

Cite this article:

Xie Wang, Wang Lizhen, Chen Hongmei, Zeng Lanqing. Identifying Relationship Between Pollution Sources and Cancer Cases with Spatial Ordered Pair Patterns. Data Analysis and Knowledge Discovery, 2021, 5(2): 14-31.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.1026     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I2/14

An Example of Spatial Features and Instances Distribution
Voronoi Partition of Pollution Source Instance Set on Cancer Feature a
Voronoi Partition of Pollution Source Instance Set on Cancer Feature b
Curve Function f(x)=(cosπx)/2+0.5
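The caption above gives only the curve f(x) = (cos πx)/2 + 0.5. As a minimal sketch, assuming x is a distance normalized to [0, 1] (this page does not state how x is defined), the function falls smoothly from 1 at x = 0 to 0 at x = 1, which is the shape one would expect of a distance-attenuation weight. The name `attenuation` and the clamping outside [0, 1] are illustrative assumptions, not the authors' definition.

```python
import math


def attenuation(x: float) -> float:
    """Evaluate f(x) = cos(pi * x) / 2 + 0.5 from the figure caption.

    Assumption: x is a normalized distance. Values outside [0, 1] are
    clamped so the returned weight always stays within [0, 1].
    """
    x = min(max(x, 0.0), 1.0)
    return math.cos(math.pi * x) / 2 + 0.5


# Print the weight at a few normalized distances.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"f({x:.2f}) = {attenuation(x):.4f}")
```

Under this reading, instances close to a source contribute a weight near 1 and instances at the edge of the neighborhood contribute a weight near 0.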
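| Data type | Number of features | Number of instances | Range |
|---|---|---|---|
| Cancer data | 26 | 5,238 | Longitude 102.5~105.5; latitude 25~27 |
| Pollution source data | 7 | 986 | |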
Real Data Set Parameters
Distribution of the Real Data Set
| Parameter | Default value |
|---|---|
| α | 0.2 |
| min_prev | 0.3 |
| min_pii | 0.6 |
Default Parameter Description
Effect of α on the Number of Candidate Patterns on the Real Data Set
Effect of α on Execution Time on the Real Data Set
Effect of min_prev on the Number of Candidate Patterns on the Real Data Set
Effect of min_prev on Execution Time on the Real Data Set
Effect of min_pii on the Number of Candidate Patterns on the Real Data Set
Effect of min_pii on Execution Time on the Real Data Set
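| α | 0.10 | 0.12 | 0.14 | 0.16 | 0.18 |
|---|---|---|---|---|---|
| Distance threshold (m) | 2,324.49 | 2,789.39 | 3,254.28 | 3,719.18 | 4,184.08 |
| Patterns mined by Algorithm 2 | 9 | 13 | 13 | 15 | 15 |
| Patterns mined by the join-less algorithm | 227 | 344 | 531 | 897 | 1,290 |
| Patterns mined by the fraction-score algorithm | 41 | 51 | 62 | 72 | 79 |
| Meaningful patterns mined by the join-less algorithm | 27 | 40 | 74 | 94 | 125 |
| Meaningful patterns mined by the fraction-score algorithm | 3 | 5 | 5 | 8 | 8 |
| Identical patterns | 0 | 0 | 0 | 1 | 1 |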
Mining Results of Algorithm 2, join-less Algorithm and fraction-score Algorithm
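The counts in the table above can be turned into the share of mined patterns that were judged meaningful for the two baseline algorithms. The following is a small sketch that uses only the numbers from that table; the dictionary keys and the function name `meaningful_share` are illustrative.

```python
# Counts copied from the comparison table, one entry per alpha value.
alphas = [0.10, 0.12, 0.14, 0.16, 0.18]
mined = {
    "join-less": [227, 344, 531, 897, 1290],
    "fraction-score": [41, 51, 62, 72, 79],
}
meaningful = {
    "join-less": [27, 40, 74, 94, 125],
    "fraction-score": [3, 5, 5, 8, 8],
}


def meaningful_share(algorithm: str) -> list[float]:
    """Return meaningful / mined for each alpha, in table order."""
    return [m / t for m, t in zip(meaningful[algorithm], mined[algorithm])]


# Print one row per algorithm and alpha value.
for name in mined:
    for alpha, share in zip(alphas, meaningful_share(name)):
        print(f"{name:15s} alpha={alpha:.2f} meaningful share={share:.1%}")
```

For both baselines the share stays below 15% at every α in the table, which is consistent with the abstract's point that co-location methods return many non-pertinent patterns.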
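| Pattern size | Pattern | PI | PII |
|---|---|---|---|
| Size 2 | [{Metal processing plant}, {Secondary malignant tumor of multiple systems}] | 0.7 | 0.69 |
| | [{Chemical plant}, {Secondary malignant tumor of multiple systems}] | 0.7 | 0.7 |
| Size 3 | [{Metal processing plant, Chemical plant}, {Secondary malignant tumor of multiple systems}] | 0.6 | 0.6 |
| | [{Metal processing plant, Textile mill}, {Secondary malignant tumor of multiple systems}] | 0.6 | 0.6 |
| | [{Metal processing plant, Power plant}, {Secondary malignant tumor of multiple systems}] | 0.6 | 0.6 |
| | [{Chemical plant, Textile mill}, {Secondary malignant tumor of multiple systems}] | 0.6 | 0.6 |
| | [{Chemical plant, Power plant}, {Secondary malignant tumor of multiple systems}] | 0.6 | 0.6 |
| | [{Textile mill, Power plant}, {Secondary malignant tumor of multiple systems}] | 0.6 | 0.66 |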
Mining Results on Real Data Set
Pollution Source Instances and Cancer Instances Distribution
Effect of Number of Features on Execution Time over Synthetic Data Sets
Effect of Number of Instances on Execution Time over Synthetic Data Sets
Effect of Number of Features and Number of Instances on Execution Time over Synthetic Data Sets at the Same Time